Structured text files are a common file format, and tasks for handling them are equally common. One obvious approach is to import such a file into a database for manipulation, but that is time-consuming and puts a heavy load on the database; in some cases there is not even a database available. It is convenient to process the file directly, but unfortunately high-level languages generally don't provide class libraries for structured data handling, so hardcoded computations become complicated and cumbersome, and the code is difficult to maintain.
esProc is designed for computing structured data. It encapsulates a rich set of functions and supports set operations, association-based computations and order-related computations, which makes it easy to manipulate structured text files when it serves as a class library. It also offers a JDBC interface for invocation from Java applications (see How to Use esProc as the Class Library for Java).
Below are typical examples of handling structured text files and their esProc solutions.
Insert a record at the position of the second row of sales.txt. Below is the original file:
The insert function adds one or more records. Its first parameter is the position where the record is inserted; if the parameter is 0, the record is appended.
To append a record to a file, there’s no need to import the file. Just use the export@a function as follows:
The create function creates a new two-dimensional table. The record function appends a record to the table.
Use insert@r to insert a batch of records, as the following code does:
To save space, the code for exporting the file will be omitted in the following sections.
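Outside esProc, the same row insertion and appending can be sketched in plain Python by splicing rows into the parsed table. The field names and sample rows below are hypothetical stand-ins for sales.txt:

```python
# A tab-separated table held as a list of rows: header first, then data.
# Sample data is hypothetical, standing in for sales.txt.
rows = [
    ["OrderID", "Client", "SellerId", "Amount", "OrderDate"],
    ["1", "DSG", "10", "2000", "2009-01-01"],
    ["2", "EGU", "3", "1400", "2009-01-02"],
]

# Insert a new record as the second data row (index 2 counting the header),
# mirroring insert with a positional first parameter.
new_rec = ["3", "JFS", "2", "2000", "2009-01-03"]
rows.insert(2, new_rec)

# Appending needs no positional work -- rows.append(...) plays the role of
# export@a, which appends to the file without re-importing it.
text = "\n".join("\t".join(r) for r in rows)
```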
Delete the second record from sales.txt. The code is as follows:
You can use the delete function to delete records in batches. For example, to delete records 2, 3, 5, 6 and 7, use A1.delete([2,3]|to(5,7)).
Or you can perform conditional deletion. For example, to delete records whose Amount is less than 1000, use A1.delete(A1.select(Amount<1000)).
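Both styles of deletion, positional and conditional, have direct counterparts in a general-purpose language. A minimal Python sketch over hypothetical in-memory records:

```python
# Hypothetical in-memory stand-in for the imported sales.txt.
records = [
    {"OrderID": 1, "Amount": 1200},
    {"OrderID": 2, "Amount": 800},
    {"OrderID": 3, "Amount": 3000},
    {"OrderID": 4, "Amount": 500},
]

# Positional deletion: remove the 2nd record (1-based), like A1.delete(2).
del records[1]

# Conditional deletion, like A1.delete(A1.select(Amount<1000)):
records = [r for r in records if not r["Amount"] < 1000]
```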
Change the SellerId of the second record of sales.txt to 100, and its Amount to 1000. The code is as follows:
You can also perform a batch modification. For example, to add 10 to the Amount of each of the first 10 records, use A1.modify(1:10,Amount+10:Amount).
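Single-record and batch modification amount to in-place field updates; a Python sketch with hypothetical records:

```python
# Hypothetical records standing in for the imported sales.txt.
records = [{"SellerId": 5, "Amount": 900}, {"SellerId": 7, "Amount": 1100}]

# Modify the 2nd record's fields, like the esProc modify call above.
records[1].update(SellerId=100, Amount=1000)

# Batch modification: add 10 to the Amount of each of the first 10 records.
for r in records[:10]:
    r["Amount"] += 10
```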
Add a computed column year to sales.txt. The code is as follows:
Physically removing columns is inefficient. Usually the operation is achieved through the equivalent action of retrieving the desired columns. Suppose you want to retain only the OrderID, Amount and OrderDate fields of sales.txt:
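Adding a computed column and "removing" columns by projection look like this in a Python sketch (field names hypothetical):

```python
# Hypothetical record standing in for a sales.txt row.
records = [{"OrderID": 1, "Amount": 1200, "OrderDate": "2009-01-05", "Client": "DSG"}]

# Derive a computed column "year" from OrderDate, like a derive/new step.
for r in records:
    r["year"] = int(r["OrderDate"][:4])

# "Remove" columns by projecting only the ones to keep.
kept = [{k: r[k] for k in ("OrderID", "Amount", "OrderDate")} for r in records]
```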
Increase each value of the Amount column of sales.txt by 10%. The code is as follows:
Alternative methods include the modify and new functions.
Query sales.txt using parameters that define a time period. The code is as follows:
A2=A1.select(OrderDate>=startDate && OrderDate<=endDate)
startDate and endDate are input parameters, such as 2010-01-01 and 2010-12-31 that define a certain time period. Here’s the result:
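The parameterized range filter corresponds to a simple comparison chain; a Python sketch with hypothetical orders:

```python
from datetime import date

# Hypothetical orders standing in for sales.txt.
orders = [
    {"OrderID": 1, "OrderDate": date(2009, 11, 20)},
    {"OrderID": 2, "OrderDate": date(2010, 3, 2)},
    {"OrderID": 3, "OrderDate": date(2011, 1, 1)},
]

# startDate/endDate play the role of the esProc input parameters.
start_date, end_date = date(2010, 1, 1), date(2010, 12, 31)
result = [o for o in orders if start_date <= o["OrderDate"] <= end_date]
```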
Sort sales.txt by client code in descending order, and by year and month in ascending order.
Use the sign "-" to sort in descending order. The year and month need to be computed from the order date.
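A mixed ascending/descending multi-key sort can be sketched in Python with two stable sort passes, applying the minor (ascending) key first (sample rows hypothetical):

```python
from datetime import date

# Hypothetical rows: Client code plus an order date.
orders = [
    {"Client": "B", "OrderDate": date(2009, 3, 1)},
    {"Client": "A", "OrderDate": date(2009, 2, 1)},
    {"Client": "B", "OrderDate": date(2009, 1, 15)},
]

# Pass 1: ascending by (year, month).
orders.sort(key=lambda o: (o["OrderDate"].year, o["OrderDate"].month))
# Pass 2: descending by Client; the sort is stable, so the date
# order within each client is preserved.
orders.sort(key=lambda o: o["Client"], reverse=True)
```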
Grouping and aggregation
Calculate the annual sales amount and the annual number of orders for each salesperson.
The groups function can group data and perform aggregation at the same time. ~ stands for each group or the current group. count(~) is equal to count(OrderID).
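Grouping with simultaneous aggregation, as groups does, can be sketched with a dictionary of accumulators (sample orders hypothetical):

```python
from collections import defaultdict

# Hypothetical orders standing in for sales.txt.
orders = [
    {"SellerId": 1, "OrderID": 101, "Amount": 500, "OrderDate": "2009-03-01"},
    {"SellerId": 1, "OrderID": 102, "Amount": 700, "OrderDate": "2009-06-01"},
    {"SellerId": 2, "OrderID": 103, "Amount": 900, "OrderDate": "2010-01-10"},
]

# Group by (salesperson, year) and aggregate sum(Amount) and count in one pass.
agg = defaultdict(lambda: {"amount": 0, "count": 0})
for o in orders:
    key = (o["SellerId"], int(o["OrderDate"][:4]))
    agg[key]["amount"] += o["Amount"]
    agg[key]["count"] += 1
```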
Getting distinct values
Display the clients listed by sales.txt.
Removing duplicate values
Find out the first record of each client and of each salesperson from sales.txt.
The group function groups data (without having to aggregate each group). The @1 option gets the first record of each group.
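Getting distinct values and keeping only the first record per group can both be sketched in Python (sample rows hypothetical):

```python
# Hypothetical orders standing in for sales.txt.
orders = [
    {"Client": "DSG", "SellerId": 1, "OrderID": 1},
    {"Client": "EGU", "SellerId": 1, "OrderID": 2},
    {"Client": "DSG", "SellerId": 2, "OrderID": 3},
]

# Distinct clients, preserving first-seen order.
clients = list(dict.fromkeys(o["Client"] for o in orders))

# First record per (Client, SellerId) group, like group@1:
# setdefault keeps only the first record seen for each key.
first = {}
for o in orders:
    first.setdefault((o["Client"], o["SellerId"]), o)
```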
Finding Top N
Find the top 3 orders for each salesperson in terms of amount.
The top function filters out the top N in each group. The sign “-” means performing sorting in a reverse direction. The conj function concatenates records.
To find the order with the biggest amount, you can use the maxp function.
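The group-then-top-N-then-concatenate pattern, and the single-maximum case, can be sketched with heapq (sample orders hypothetical):

```python
import heapq
from collections import defaultdict

# Hypothetical orders standing in for sales.txt.
orders = [
    {"SellerId": 1, "OrderID": 1, "Amount": 100},
    {"SellerId": 1, "OrderID": 2, "Amount": 300},
    {"SellerId": 1, "OrderID": 3, "Amount": 200},
    {"SellerId": 1, "OrderID": 4, "Amount": 50},
    {"SellerId": 2, "OrderID": 5, "Amount": 400},
]

groups = defaultdict(list)
for o in orders:
    groups[o["SellerId"]].append(o)

# Top 3 orders by Amount within each group, then concatenate (like conj).
top3 = []
for sid in sorted(groups):
    top3.extend(heapq.nlargest(3, groups[sid], key=lambda o: o["Amount"]))

# The single biggest order overall, playing the role of maxp.
biggest = max(orders, key=lambda o: o["Amount"])
```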
Set operations (for file comparison)
There are two files, f1.txt and f2.txt, in which the first rows are the column names. The requirement is to find the intersection of the two files' Name fields. Below are selections from the files:
The f1.txt file:
The f2.txt file:
The isect function performs intersect operation on sets. A1.(Name) retrieves A1’s Name field to form a set. B1.(Name) retrieves B1’s Name field to form a set. The final result is as follows:
Likewise, the union function gets the union of sets, the diff function finds the difference between sets, and the conj function calculates the concatenation of sets (equal to union all). You can also use operators to replace the functions to produce cleaner and simpler code. Below are counterparts of the isect, union, diff and conj functions:
A1.(Name) ^ B1.(Name)
A1.(Name) & B1.(Name)
A1.(Name) \ B1.(Name)
A1.(Name) | B1.(Name)
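The four operations map directly onto Python's set operators, with concatenation (union all) done on lists; the Name columns below are hypothetical:

```python
# Hypothetical Name columns extracted from f1.txt and f2.txt.
a = ["Tom", "Ann", "Bob", "Tom"]
b = ["Bob", "Eve", "Tom"]

sa, sb = set(a), set(b)
isect = sorted(sa & sb)   # intersection  (^)
union = sorted(sa | sb)   # union         (&)
diff  = sorted(sa - sb)   # difference    (\)
conj  = a + b             # concatenation (|), i.e. union all, keeps duplicates
```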
The above is the set operations on columns. To perform set operations on rows, just retrieve the file by rows as shown below:
Table1.txt and Table2.txt have been sorted by their shared logical primary key (sort them first if they are not ordered). The requirement is to update Table1 with Table2 by comparing primary key values: when the key values are identical, Table2's record replaces Table1's; when they are different, Table2's record is inserted into Table1.
Below are the source tables:
The expected result of updating Table1 with Table2 is as follows:
Retrieve Table1.txt and Table2.txt as cursors, and update A1 with B1 according to their logical primary keys. The merge function combines the data and keeps the result ordered. @x means cursor manipulation, and @u means getting the union. Then write the result set to a new file.
With cursors, you can handle files of any size, including extremely large ones.
If the files are originally not ordered, they need to be sorted before being merged. In that case, A2 can be written as follows:
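The ordered merge-update itself is a single two-pointer pass over the sorted inputs; a Python sketch with hypothetical keys and values:

```python
# Both tables sorted by the primary key "ID"; rows are hypothetical.
table1 = [{"ID": 1, "Val": "a"}, {"ID": 2, "Val": "b"}, {"ID": 4, "Val": "d"}]
table2 = [{"ID": 2, "Val": "B"}, {"ID": 3, "Val": "C"}]

# Merge-update: where keys match, Table2's record wins (update);
# where they differ, both records are kept (insert); output stays ordered.
result, i, j = [], 0, 0
while i < len(table1) and j < len(table2):
    if table1[i]["ID"] < table2[j]["ID"]:
        result.append(table1[i]); i += 1
    elif table1[i]["ID"] > table2[j]["ID"]:
        result.append(table2[j]); j += 1
    else:
        result.append(table2[j]); i += 1; j += 1
result += table1[i:] + table2[j:]
```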
Order-related set operations
The f1.txt file and f2.txt file have been sorted by Name and Dept, and you are required to find their intersection. Below are the source files:
The f1.txt file:
The f2.txt file:
If files are already sorted, you can use the merge algorithm to achieve the same result as the set operations, but with better performance. Here's the code:
The merge function combines cursors together, with @i used for getting intersection, @u for union and @d for difference.
If the files to be merged are not ordered, they need to be sorted with the sort function first. Note, however, that sorting and merging small files is slower than handling them through set operations, so it is only reasonable to use the merge algorithm for large files.
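An ordered intersection over whole rows (here keyed on the pair Name, Dept) is the same two-pointer walk, advancing whichever side is smaller; sample rows hypothetical:

```python
# Both row lists sorted by (Name, Dept); rows are hypothetical.
t1 = [("Ann", "HR"), ("Bob", "IT"), ("Eve", "QA")]
t2 = [("Bob", "IT"), ("Eve", "Dev")]

# Merge-style intersection: only rows present in both survive.
inter, i, j = [], 0, 0
while i < len(t1) and j < len(t2):
    if t1[i] < t2[j]:
        i += 1
    elif t1[i] > t2[j]:
        j += 1
    else:
        inter.append(t1[i]); i += 1; j += 1
```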
emp.txt is a tab-separated text file whose EId field corresponds to sales.txt's SellerId field. The requirement is to create sOrder.txt by joining the two files, adding emp.txt's Name, Dept and Gender fields to the matching sales records.
Below is the source data:
A4=A3.new(s.OrderID, s.Client, s.SellerId, s.Amount, s.OrderDate, e.Name, e.Dept, e.Gender)
In the above code, the join function performs the table join and renames the tables s and e. It performs an inner join by default; use @1 for a left join and @f for a full join. After the join is done, get the needed fields from the join result to form a new two-dimensional table. Here's the final result:
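The inner join plus projection can be sketched in Python by indexing the employee table by its key (all rows hypothetical):

```python
# Hypothetical rows standing in for sales.txt (s) and emp.txt (e),
# with emp indexed by EId for O(1) lookup.
sales = [{"OrderID": 1, "SellerId": 10, "Amount": 500},
         {"OrderID": 2, "SellerId": 11, "Amount": 800}]
emp = {10: {"Name": "Tom", "Dept": "Sales", "Gender": "M"},
       11: {"Name": "Ann", "Dept": "HR", "Gender": "F"}}

# Inner join on SellerId == EId, then merge the wanted fields per record.
joined = [{**s, **emp[s["SellerId"]]}
          for s in sales if s["SellerId"] in emp]
```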
The source data includes 5 files, in which Orders is the fact table and Customer, Product, Region and Supplier are dimension tables. Now find out the orders whose clients and suppliers are in the same region, group them by city and calculate the number of orders and order amount in each city.
Their relationships are as follows:
Import every file and establish the relationships between the fact table and each of the dimension tables; filter out the desired orders according to these relationships, then group and aggregate them. The switch function establishes the foreign-key relationships.
Finding modified records from the CSV file
old.csv is the original file, and new.csv is the file that results from adding, deleting and modifying certain records of the old file. Both use userName and date as their logical primary key. The requirement is to find which records in old.csv have been modified.
Below is a selection of the source file:
It's clear that in new.csv the second and third rows are newly added and the fourth row is a modification of an old record, and that old.csv's third row has been deleted.
Import each file with the comma as the separator and sort it by the key. Then find the newly added records (those whose primary key values appear only in the new file) and store them in the variable new; next, find all records that differ between the two files (those in which any field differs) and store them in the variable diff; finally, take the difference between diff and new, which yields the records that have been modified (same primary key, different non-key fields). Here's the result:
Extended explanation: To find the deleted record, use [A2,B2].merge@d(userName,date).
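The added/deleted/modified classification reduces to key-set comparisons once each file is keyed on (userName, date); a Python sketch with hypothetical records:

```python
# Both files keyed on (userName, date); rows are hypothetical.
old = {("tom", "2014-01-01"): {"saleValue": 100},
       ("ann", "2014-01-02"): {"saleValue": 200}}
new = {("tom", "2014-01-01"): {"saleValue": 150},   # modified
       ("bob", "2014-01-03"): {"saleValue": 300}}   # added

added    = [k for k in new if k not in old]                      # new keys
deleted  = [k for k in old if k not in new]                      # vanished keys
modified = [k for k in new if k in old and new[k] != old[k]]     # same key, new values
```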
Comparison of files with different structures
Data.txt is a tab-separated file with 6 fields, in which the here field contains semicolon-separated strings. List.txt has only a single column. Now compare the two files: if any of the values obtained by splitting a record's here field matches a row of List.txt, export that record to result.txt.
Below is the source data:
In the above code, the select function does the querying, the array function splits a string, the operator "^" means intersection, and [] stands for an empty set.
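The split-then-intersect filter is a one-line comprehension in Python; the field values and list entries below are hypothetical:

```python
# Hypothetical records from Data.txt (only the relevant fields shown)
# and the single-column rows of List.txt as a set.
data = [{"id": 1, "here": "NY;LA"}, {"id": 2, "here": "SF"}]
targets = {"LA", "Chicago"}

# Keep records whose split "here" values intersect the target list;
# a non-empty set is truthy, matching the "result != empty set" test.
matched = [r for r in data if set(r["here"].split(";")) & targets]
```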
Retrieving data from files in a multilevel directory
The directory “D:\files” contains multilevel subdirectories, each of which includes many files of text format. The requirement is to retrieve a specified row (say, the second row) from each file and write it to a new file result.txt.
The initial value of the parameter path should be "D:\files", meaning the retrieval starts from this directory. The script (readfile.dfx) then calls itself recursively with a different path value each time.
The directory function gets the list of files under the directory given by the path parameter. The @p option returns file names with full paths, and the @d option gets directory names instead. ~.skip(1) skips a row; ~.fetch@x(1) fetches the row at the current position (i.e. the second row) and closes the cursor.
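The recursive walk plus "skip one row, fetch the next" step can be sketched with os.walk; the directory tree built below is a hypothetical stand-in for D:\files:

```python
import os
import tempfile

# Build a small hypothetical directory tree with a subdirectory,
# then pull the 2nd row of every file found by a recursive walk.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel, lines in [("a.txt", ["r1", "r2"]),
                   (os.path.join("sub", "b.txt"), ["x1", "x2"])]:
    with open(os.path.join(root, rel), "w") as f:
        f.write("\n".join(lines))

second_rows = []
for dirpath, _dirs, files in os.walk(root):      # recurses into subdirectories
    for name in sorted(files):
        with open(os.path.join(dirpath, name)) as f:
            f.readline()                          # skip the first row
            second_rows.append(f.readline().strip())  # fetch the second row
```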
Exporting the file by group
Divide sales.txt, which holds a large amount of order data, into multiple smaller files according to year and month, with the file names in the format "year-month.txt".
Group the file by the year and month and parse the date, then loop through each group to write it into a file. Below is the exported file of 2009-01.txt:
If the source data exceeds the memory capacity, import it with the cursor function. If the data in a single group still can't be held in memory, use the groupx function to perform the grouping. Neither case changes the structure of the code.
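The group-by-month step that precedes the per-group export can be sketched in Python; the order rows are hypothetical, and the actual file writing per group is indicated in a comment:

```python
from collections import defaultdict

# Hypothetical (OrderDate, record) pairs standing in for sales.txt rows.
orders = [("2009-01-05", "rec1"), ("2009-01-09", "rec2"), ("2009-02-01", "rec3")]

# Bucket records by the "YYYY-MM" prefix of the date.
groups = defaultdict(list)
for d, rec in orders:
    groups[d[:7]].append(rec)

# Each group would then be written to a file named f"{key}.txt",
# e.g. 2009-01.txt -- omitted here to keep the sketch self-contained.
```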
Calculating the inventory
The Stock.txt file holds warehouse-management records. On a given day, the stock of a product may be added to or removed from multiple times, and there may be days on which no product moves in or out at all. The initial quantity of every product is 0; In increases it and Out decreases it. Now find the daily inventory of every product. Below is the source data:
Explanation: A2 calculates the daily In and Out quantities for every product. Then the complete list of dates, from the smallest date to the largest, is built and stored in B3. Next, group the data by product, loop through each group, and align it with B3 to compute the inventory for each day. Here's the result:
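The same steps, daily net change per product, a complete date list, then a running balance aligned with that list, can be sketched in Python over hypothetical stock movements:

```python
from datetime import date, timedelta
from collections import defaultdict

# Hypothetical movements: (product, date, quantity in, quantity out).
moves = [
    ("A", date(2014, 1, 1), 10, 0),
    ("A", date(2014, 1, 3), 0, 4),
]

# Step 1: daily net change per (product, day); missing days default to 0.
net = defaultdict(int)
for p, d, qin, qout in moves:
    net[(p, d)] += qin - qout

# Step 2: the complete date range, including days with no movement.
days = [date(2014, 1, 1) + timedelta(n) for n in range(3)]

# Step 3: running balance per product, aligned with the full date list.
stock, bal = {}, defaultdict(int)
products = {m[0] for m in moves}
for d in days:
    for p in products:
        bal[p] += net[(p, d)]
        stock[(p, d)] = bal[p]
```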