Problem Source: http://stackoverflow.com/questions/26820118/text-file-processing-using-java .
JAVA lacks the class library for handling text file JOIN operation. So it’s quite complicated to write code specifically for it, especially when the files are too big to be loaded into the memory all at once. esProc can help JAVA do this kind of operation. Below is an example showing the steps in detail.
order.txt is a text file, in which the first row contains column names and SellerId column is the logical foreign key that points to EID column in employee.txt. It is required to retrieve OrderID, Client, Name and Dept to create a new file. Below are selections of the source data:
Order.txt
Employee.txt
Suppose both files are not big, then we can perform an inner join using the following esProc code:
|
A |
1 |
=file(“D:\\Order.txt”).import@t() |
2 |
=file(“D:\\employee.txt”).import@t() |
3 |
=join(A1:o,SellerId; A2:e,EId) |
4 |
=A3.new(o.OrderID,o.Client,e.Name,e.Dept) |
5 |
=file(“D:\\result.txt”).export@t(A4) |
A1, A2:Retrieve files to memory. @t indicates making the first row as column names.
A3:Perform inner join and the result is as follows:
A4: Retrieve the desired columns from A3.
A5: Export them to a new file. The result is as follows:
If left join is required, then just use @1 option with JOIN function. The code will be =join@1(A1:o,SellerId; A2:e,EId) and the result file is this:
Similarly we can use @f option for a full join.
However if one of the files is big, or both are big, we can handle the operation with esProc cursor. As the following code shows:
|
A |
1 |
=file(“D:\\OrderBig.txt”).cursor@t() |
2 |
=file(“D:\\employeeBig.txt”).cursor@t() |
3 |
=join@x(A1.sortx(SellerId):o,SellerId; A2.sortx(EId):e,EId) |
4 |
=A3.new(o.OrderID,o.Client,e.Name,e.Dept) |
5 |
=file(“D:\\result.txt”).export@t(A4) |
A1,A2:Open the big files with cursors.
A3: Inner join of cursors, which, different from the in-memory inner join, requires each cursor is ordered. So sortx function is used here to sort.
A4: Retrieve the desired columns to create a new cursor.
A5: Export the cursor data to a new file.
Similarly, @1 indicates the left join and @f the full join.
So far all the data processing has been done. Then the above script needs to be integrated with JAVA via JDBC, with the following JAVA code:
// establish a connection via esProc JDBC
Class.forName(“com.esproc.jdbc.InternalDriver”);
con= DriverManager.getConnection(“jdbc:esproc:local://”);
// call esProc script whose name is test
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall(“call test()”);
st.execute();//execute esProc stored procedure
ResultSet set = st.getResultSet();//get the result