Structured text files are a common file format, and the tasks of handling them are common too. One approach that comes to mind easily is to import such a file into the database for manipulation. But this is time-consuming and causes heavy database usage. In some cases, there is even not a database available. It’s convenient to process the file directly, but unfortunately, high-level languages generally don’t provide the essential class libraries for structured data handling. Hardcoded computations are complicated and cumbersome, and the code is difficult to maintain.
esProc is geared for calculating structured data by encapsulating rich functions and providing support for set operations, related computing and order-related computations, making it easy to manipulate the structured text files when serving as the class library. Besides, it offers the JDBC interface for invocation from the Java application. (See How to Use esProc as the Class Library for Java).
Below are typical examples of handling structured text files and their esProc solutions.
Maintenance
Adding records
Insert a record at the position of the second row of sales.txt. Below is the original file:
The code:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.insert(2,200,”MS”,20,2000,date(“2015-02-02”)) |
3 | =file(“D:\\sales.txt”).export@t(A1) |
Result:
The insert function adds one or more records. Its first parameter is the position where the record is inserted, and if the parameter is 0, just append the record.
To append a record to a file, there’s no need to import the file. Just use the export@a function as follows:
A | |
1 | =create(OrderID,Client,SellerId,Amount,OrderDate).record([200,”MS”,20,2000,date(“2015-02-02”)]) |
2 | =file(“D:\\sales.txt”).export@a(A1) |
The create function creates a new two-dimensional table. The record function appends a record to the table.
Use insert@r to insert a batch of records, as the following code does:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =create(OrderID,Client,SellerId,Amount,OrderDate) |
3 | =A2.record([200,”MS”,20,2000,date(“2015-02-02″),300,”Ora”,30,3000,date(“2015-03-03”)]) |
4 | =A1.insert@r(2:A2) |
To save space, the code for exporting the file will be omitted in the following sections.
Deleting records
Delete the second record from sales.txt. The code is as follows:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.delete(2) |
You can use the delete function to delete records in batches. Suppose you want to delete records 2,3,5,6,7, you can use the code A1.delete([2,3]|to(5,7)).
Or you can perform the conditional deletion. For example, to delete records whose Amount is less than 1000, use A1.delete(A1.select(Amount<1000)).
Modifying records
Modify the sellerId of the second record of sales.txt into 100, and the Amount to 1000. The code is as follows:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.modify(2,100:sellerId,1000:Amount) |
You can also perform a batch modification. For example, to add 10 to the Amount of each of the first 10 records, use: A1.modify(1:10,Amount+10:Amount)
Adding columns
Add a computed column year to sales.txt. The code is as follows:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.derive(year(OrderDate):year) |
Result:
Deleting columns
It’s physically inefficient to remove certain columns. Usually the operation is realized through the equal action of “retrieving desired columns”. Supposing you want to retain the OrderID, Amount and OrderDate fields of sales.txt:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.new(OrderID,Amount,OrderDate) |
Result:
Modifying columns
Increase each value of the Amount column of sales.txt by 10%. The code is as follows:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.run(Amount*1.1:Amount) |
Result:
The alternative methods include modify and new functions.
Basic operations
Querying
Query sales.txt using parameters that define a time period. The code is as follows:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.select(OrderDate>=startDate && OrderDate<=endDate) |
startDate and endDate are input parameters, such as 2010-01-01 and 2010-12-31 that define a certain time period. Here’s the result:
Sorting
You are expected to sort sales.txt by the client code in descending order, and by the month and year in ascending order.
The code:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.sort(-Client,year(OrderDate),month(OrderDate)) |
Use the sign “-” to sort data in descending order. The year and month need to be obtained through computation.
Result:
Grouping and aggregation
Calculate the annual sales amount and the annual number of orders for each salesperson.
The code:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.groups(SellerId,year(OrderDate);sum(Amount),count(~)) |
The groups function can group data and perform aggregation at the same time. ~ stands for each group or the current group. count(~) is equal to count(OrderID).
Result:
Getting distinct values
Display the clients listed by sales.txt.
The code:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.id(Client) |
Result:
Removing duplicate values
Find out the first record of each client and of each salesperson from sales.txt.
The code:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.group@1(Client,SellerId) |
The group function performs data grouping (without having to aggregate each group). @1 option works to get the first record of each group.
Result:
Finding Top N
Finding the 3 top orders for each salesperson in terms of amount.
The code:
A | |
1 | =file(“D:\\sales.txt”).import@t() |
2 | =A1.group(SellerId;~.top(3,-Amount):t).conj(t) |
The top function filters out the top N in each group. The sign “-” means performing sorting in a reverse direction. The conj function concatenates records.
To find the order with the biggest amount, you can use the maxp function.
Result:
Binary operations
Set operations (for file comparison)
There are f1.txt and f2.txt, in which the first rows are the column names. The requirement is to find the intersection of the two file’s Name fields. Below are the selections of the files:
The f1.txt file:
The f2.txt file:
The code:
A | B | |
1 | =file(“E:\\f1.txt”).import@t() | =file(“E:\\f2.txt”).import@t() |
2 | =[A1.(Name),B1.(Name)].isect() |
The isect function performs intersect operation on sets. A1.(Name) retrieves A1’s Name field to form a set. B1.(Name) retrieves B1’s Name field to form a set. The final result is as follows:
Likewise, the union function gets the union of sets, the diff function finds the difference between sets, and the conj function calculates the concatenation of sets (equal to union all). You can also use operators to replace the functions to produce cleaner and simpler code. Below are counterparts of the isect, union, diff and conj functions:
A1.(Name) ^ B1.(Name)
A1.(Name) & B1.(Name)
A1.(Name) \ B1.(Name)
A1.(Name) | B1.(Name)
The above is the set operations on columns. To perform set operations on rows, just retrieve the file by rows as shown below:
A | B | |
1 | =file(“D:\\f1.txt”).import@tis() | =file(“D:\\f2.txt”).import@tis() |
2 | =A1^B1 |
Result:
Merge operation
Table1.txt and Table2.txt have been sorted by their respective logical primary key (sort them first if they are not ordered). The requirement is to update Table1 with Table2 by comparing the primary key values. Specific rule is that you update Table1 when the key values are identical but insert the record into it when the key values are different.
Below are the source tables:
The expected result of updating Table1 with Table2 is as follows:
The code:
A | ||
1 | =file(“D:\\table1.txt”).cursor@t() | =file(“D:\\table2.txt”).cursor@t() |
2 | =[B1,A1].merge@xu(A,B) | |
3 | =file(“D:\\result.txt”).export@t(A2) |
Retrieve Table1.txt and Table2.txt respectively as cursor, and update A1 using B1 according to their logical primary keys. The merge function combines data together and keeps the result ordered as well. @x means cursor manipulation, @u means getting their union. Then write the result set to a new file.
With the use of cursor, you can handle files that have any amount of data, including those extremely large files of course.
If the files are originally not ordered, they need to be sorted before being merged. In that case, A2 can be written as follows:
[B1.sortx(A,B),A1.sortx(A,B)].merge@xu(A,B)
Order-related set operations
The f1.txt file and f2.txt file have been sorted by Name and Dept, and you are required to find their intersection. Below are the source files:
The f1.txt file:
The f2.txt file: