esProc Assists Java to Join Text Files

Uncategorized 904 0

Problem Source: http://stackoverflow.com/questions/26820118/text-file-processing-using-java .

JAVA lacks the class library for handling text file JOIN operation. So it’s quite complicated to write code specifically for it, especially when the files are too big to be loaded into the memory all at once. esProc can help JAVA do this kind of operation. Below is an example showing the steps in detail.

order.txt is a text file, in which the first row contains column names and SellerId column is the logical foreign key that points to EID column in employee.txt. It is required to retrieve OrderID, Client, Name and Dept to create a new file. Below are selections of the source data:

Order.txt

esProc_java_join_text_1

Employee.txt

esProc_java_join_text_2

Suppose both files are not big, then we can perform an inner join using the following esProc code:

        

A

1

=file(“D:\\Order.txt”).import@t()

2

=file(“D:\\employee.txt”).import@t()

3

=join(A1:o,SellerId; A2:e,EId)

4

=A3.new(o.OrderID,o.Client,e.Name,e.Dept)

5

=file(“D:\\result.txt”).export@t(A4)

A1, A2:Retrieve files to memory. @t indicates making the first row as column names.

A3:Perform inner join and the result is as follows:

esProc_java_join_text_4

A4: Retrieve the desired columns from A3.

A5: Export them to a new file. The result is as follows:

esProc_java_join_text_5

If left join is required, then just use @1 option with JOIN function. The code will be =join@1(A1:o,SellerId; A2:e,EId) and the result file is this:

esProc_java_join_text_6

Similarly we can use @f option for a full join.

However if one of the files is big, or both are big, we can handle the operation with esProc cursor. As the following code shows:

        

A

1

=file(“D:\\OrderBig.txt”).cursor@t()

2

=file(“D:\\employeeBig.txt”).cursor@t()

3

=join@x(A1.sortx(SellerId):o,SellerId; A2.sortx(EId):e,EId)

4

=A3.new(o.OrderID,o.Client,e.Name,e.Dept)

5

=file(“D:\\result.txt”).export@t(A4)

A1,A2:Open the big files with cursors.

A3: Inner join of cursors, which, different from the in-memory inner join, requires each cursor is ordered. So sortx function is used here to sort.

A4: Retrieve the desired columns to create a new cursor.

A5: Export the cursor data to a new file.

Similarly, @1 indicates the left join and @f the full join.

So far all the data processing has been done. Then the above script needs to be integrated with JAVA via JDBC, with the following JAVA code:

         // establish a connection via esProc JDBC

         Class.forName(“com.esproc.jdbc.InternalDriver”);

         con= DriverManager.getConnection(“jdbc:esproc:local://”);

         // call esProc script whose name is test

         st =(com.esproc.jdbc.InternalCStatement)con.prepareCall(“call test()”);

         st.execute();//execute esProc stored procedure

         ResultSet set = st.getResultSet();//get the result

FAVOR (0)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website