In some cases, data must be stored in the file system, rather than in a database. That requires handling file-based data computing manually. Since Java lacks the related class library, you have to hardcode the structured file computing, which produces complicated and unreadable code.
Yet, you can make up for this deficiency with esProc (free edition is available). esProc encapsulates plenty of functions for processing structured files and provides the JDBC interface. A Java application will treat an esProc script as a database stored procedure, execute it after passing parameter and get the result set via JDBC. You can learn more details from How to Use esProc as the Class Library for Java.
Now let’s look at some familiar scenarios involving structured computing in Java and their esProc solutions.
Querying text file
sOrder.txt is a text file separated by tabs. You need to query orders in a specified time period in Java.
Below is the source data:
esProc code:
|
A |
1 |
=file(“D:\\sOrder.txt”).import@t() |
2 |
=A1.select(OrderDate>=startDate && OrderDate<=endDate) |
Import the file, using tab as the default separator. @t means importing the first row as column names. Then perform the conditional filtering. startDate and endDate are input parameters, like a period from 2010-01-01 to 2010-12-31. Here’s the result:
Sorting text file
Based on the above sOrder.txt, you need to sort records in Java by client numbers in descending order and by year and month in ascending order.
esProc code: =A1.sort(-Client,year(OrderDate),month(OrderDate))
Explanation: Use “-” to sort data in descending order. You need to do some calculation to get the years and the months.
Related information: To perform sort after the querying, you can use =A2.sort(…), or =A1.select(…).sort(…) .
Here’s the result:
Grouping and aggregations
You need to calculate the sales amount and number of orders per seller per year.
esProc code: =A1.groups(SellerId,year(OrderDate);sum(Amount),count(~))
Explanation: group function can perform aggregate while grouping data. ~ represents each group or the current group. count(~) is equivalent to count(OrderID).
Here’s the result:
Getting distinct value
You need to make a client list according to sOrder.
esProc code: =A1.id(Client)
Here’s the result:
Removing duplicate values
You need to get the first record for each client and for each seller
esPrco code: =A1.group@1(Client,SellerId)
Explanation: group function is used to group records (without having to aggregate them). @1 means getting the first record from each group.
Here’s the result:
Finding topN
You need to get the 3 orders with the greatest sales amount for each seller.
esProc code: =A1.group(SellerId;~.top(3,-Amount):t).conj(t)
Explanation: top function filters records to get TopN. The sign “-” means a reversed order. conj function is used for concatenation.
Related information: To get the order with the greatest sales amount, use maxp function.
Here’s the result:
Related computing
emp.txt is a text file, separated by tabs. Its EId field corresponds to sOrder’s SellerId field. Now with Java, you need to align Name, Dept and Gender fields in emp.txt with sOrder.txt.
Below is the source data:
esProc code:
1 |
=sOrder=file(“D:\\sOrder.txt”).import@t() |
2 |
=emp=file(“D:\\emp.txt”).import@t(EId,Name,Dept,Gender) |
3 |
=join@1(sOrder:s,SellerId;emp:e,EId) |
4 |
=A3.new(s.OrderID, s.Client, s.SellerId, s.Amount, s.OrderDate,
e.Name, e.Dept, e.Gender) |
The join function performs a join and changes names of the two tables into s and e respectively; @1 means a left join. Then you retrieve the desired fields from the joined table to create a new structured two-dimensional table. Here’s the result:
Related information: @1 means a left join; @f means a full join. No option is needed for an inner join.
All the above examples assume that the size of the file is relatively small. If the file is too big to be entirely loaded into memory, you can use the esProc cursor to handle it. See related documents for detailed information.