This article tests the performance of esProc in processing text files. The test case is a data query with filtering, and the same task is implemented in Java and Perl for comparison.
The test data consists of order records stored in the file orders.txt. The first rows of the imported data look like this:
ORDERID CLIENT SELLERID AMOUNT ORDERDATE NOTE
1 287 47 5825 2013-05-31 gafcaghafdgie f ci…
2 89 22 8681 2013-05-04 gafcaghafdgie f ci…
3 47 67 7702 2009-11-22 gafcaghafdgie f ci…
4 76 85 8717 2011-12-13 gafcaghafdgie f ci…
5 307 81 8003 2008-06-01 gafcaghafdgie f ci…
6 366 39 6948 2009-09-25 gafcaghafdgie f ci…
7 295 8 1419 2013-11-11 gafcaghafdgie f ci…
8 496 35 6018 2011-02-18 gafcaghafdgie f ci…
9 273 37 9255 2011-05-04 gafcaghafdgie f ci…
10 212 0 2155 2009-03-22 gafcaghafdgie f ci…
…
The note field is a string field that only serves to lengthen each record; it has no practical meaning.
Criteria for data query and filtering: client is 191 and orderdate is between 2013-09-01 and 2013-11-01.
Data volume: 28 GB
Hardware configuration of the test machine: ordinary PC
CPU: Core(TM) i5-3450, four cores, four threads
Memory capacity: 16 GB
Storage: SSD
esProc script select.dfx for data filtering:
  | A | B | C
1 | 4 | =date("2013-09-01") | =date("2013-11-01")
2 | fork to(A1) | =file("/ssd/data/orders.txt").cursor@tz(orderid:string,client:string,sellerid:string,amount:float,orderdate:date,note:string;,A2:A1) |
3 | | =B2.select(client=="191" && orderdate>B1 && orderdate<C1).fetch() |
4 | | result B3 |
5 | =A2.conj() | |
Java program for data filtering:
package files;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BigFilter {
    public static void myBigFilter(Date start, Date end, String client) throws Exception {
        String path = "/ssd/data/";
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        File file = new File(path + "orders.txt");
        long len = 0;
        int index = -1;
        List<Map<String, Object>> resultList = new ArrayList<Map<String, Object>>();
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file)))) {
            String line = null;
            // Skip the header line
            if ((line = br.readLine()) == null) return;
            while ((line = br.readLine()) != null) {
                len++;
                // Split on tabs with indexOf()/substring() instead of String.split()
                index = line.indexOf("\t");
                String orderid1 = line.substring(0, index);
                line = line.substring(index + 1);
                index = line.indexOf("\t");
                String client1 = line.substring(0, index);
                line = line.substring(index + 1);
                index = line.indexOf("\t");
                String sellerid1 = line.substring(0, index);
                line = line.substring(index + 1);
                index = line.indexOf("\t");
                float amount1 = Float.parseFloat(line.substring(0, index));
                line = line.substring(index + 1);
                index = line.indexOf("\t");
                Date orderdate1 = sdf.parse(line.substring(0, index));
                line = line.substring(index + 1);
                String note1 = line;
                // Keep records of the given client with start < orderdate < end
                if (client1.equals(client)
                        && orderdate1.after(start) && orderdate1.before(end)) {
                    Map<String, Object> emp = new HashMap<String, Object>();
                    emp.put("orderid", orderid1);
                    emp.put("client", client1);
                    emp.put("sellerid", sellerid1);
                    emp.put("amount", amount1);
                    emp.put("orderdate", orderdate1);
                    emp.put("note", note1);
                    resultList.add(emp);
                }
            }
        }
        System.out.println("len=" + len);
    }

    public static void main(String[] args) throws Exception {
        SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date begin = new Date();
        System.out.println("begin:" + df.format(begin));
        Date start = new SimpleDateFormat("yyyy-MM-dd").parse("2013-09-01");
        Date end = new SimpleDateFormat("yyyy-MM-dd").parse("2013-11-01");
        String client = "191";
        myBigFilter(start, end, client);
        // Elapsed time in seconds
        long diff = (new Date()).getTime() - begin.getTime();
        System.out.println("::end::time=" + diff / 1000);
    }
}
Perl program for data filtering:
#!/usr/bin/perl -w
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
$year += 1900;
$mon += 1;
$begin=time();
my $datetime = sprintf ("begin:%d-%02d-%02d %02d:%02d:%02d", $year,$mon,$mday,$hour,$min,$sec);
print $datetime."\n";
# Dates are compared as yyyymmdd integers to avoid slow date conversion functions
$start= 20130901;
$end = 20131101;
open(FILE_IN,"/ssd/data/orders.txt") or die "Can't open txt, $!";
# Skip the header line
<FILE_IN>;
while(defined($perIns = <FILE_IN>))
{
    @row=split("\t" , $perIns);
    $orderid=$row[0];
    $client=$row[1];
    $sellerid=$row[2];
    $amount=$row[3];
    $orderdate = $row[4];
    @row1=split(" ",$row[4]);#print "@row1" . "\n";
    @row2=split("-",$row1[0]);#print "@row2" . "\n";
    $orderdate= $row2[0]*10000+$row2[1]*100+$row2[2];
    if ($client==191 && $orderdate>$start && $orderdate<$end)
    {
        push(@inputFileArray,[@row]);
    }
}
close(FILE_IN);
$t=time()-$begin;
print "time:$t". "\n";
Test result:
  | esProc (single thread) | Java | Perl | esProc (4 threads)
Execution time | 534 seconds | 394 seconds | 604 seconds | 159 seconds
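To put the timings above in perspective, the following minimal sketch derives throughput and relative speed from them. It assumes that "28G" means 28 × 1024 MB, which the test description does not state; the class and method names (TimingSummary, throughputMBps, speedup) are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Derives throughput and relative speed from the timings in the results table.
// Assumption: the data volume is 28 * 1024 MB; the article only says "28G".
class TimingSummary {
    static final double DATA_MB = 28 * 1024.0;

    // Throughput in MB per second for a run that took `seconds`.
    static double throughputMBps(double seconds) {
        return DATA_MB / seconds;
    }

    // How many times faster the run taking `fasterSeconds` is
    // than the run taking `slowerSeconds`.
    static double speedup(double slowerSeconds, double fasterSeconds) {
        return slowerSeconds / fasterSeconds;
    }

    public static void main(String[] args) {
        // Execution times in seconds, copied from the results table.
        Map<String, Integer> seconds = new LinkedHashMap<>();
        seconds.put("esProc (single thread)", 534);
        seconds.put("Java", 394);
        seconds.put("Perl", 604);
        seconds.put("esProc (4 threads)", 159);

        for (Map.Entry<String, Integer> e : seconds.entrySet()) {
            System.out.printf("%-24s %4d s  %6.1f MB/s%n",
                    e.getKey(), e.getValue(), throughputMBps(e.getValue()));
        }
        System.out.printf("Java vs single-threaded esProc: %.2fx%n", speedup(534, 394));
        System.out.printf("esProc 4 threads vs 1 thread:   %.2fx%n", speedup(534, 159));
    }
}
```

Under that assumption, single-threaded Java comes out roughly 1.4 times as fast as single-threaded esProc, and four esProc threads roughly 3.4 times as fast as one.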
Conclusion:
esProc is an interpreted language implemented in Java. In a single thread it loses only a little performance to hand-written Java on a task like data filtering: the Java program is less than twice as fast as the esProc script (394 seconds versus 534 seconds). Both esProc and Perl are interpreted languages, yet the Java-based esProc is faster than the C-based Perl.
With multithreading, esProc's performance improves significantly (159 seconds with four threads) while its code stays simple. Java would also run much faster with multiple threads, but the code would be complex. Perl offers no advantage here: its multithreading code is complex and its single-threaded performance is merely adequate.
On the whole, esProc is the most practical choice, combining satisfactory performance with simple code.
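To give a sense of what a multithreaded Java version involves, here is a minimal sketch that splits the file into byte ranges and filters each range in its own thread. It is not the program used in the test and not a description of how esProc works internally. The names ParallelFilter, field, matches, filterSegment and filter are hypothetical. For clarity it reads with RandomAccessFile.readLine(), which is unbuffered and assumes single-byte characters, so a tuned version would need buffered reading.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFilter {
    // Returns the n-th (0-based) tab-separated field of a well-formed line.
    static String field(String line, int n) {
        int start = 0;
        for (int i = 0; i < n; i++) {
            start = line.indexOf('\t', start) + 1;
        }
        int end = line.indexOf('\t', start);
        return end < 0 ? line.substring(start) : line.substring(start, end);
    }

    // True if the record belongs to `client` and start < orderdate < end.
    static boolean matches(String line, String client, LocalDate start, LocalDate end) {
        if (!field(line, 1).equals(client)) return false;
        LocalDate d = LocalDate.parse(field(line, 4));
        return d.isAfter(start) && d.isBefore(end);
    }

    // Filters the lines that begin inside the byte range [from, to).
    static List<String> filterSegment(String path, long from, long to, String client,
                                      LocalDate start, LocalDate end) throws IOException {
        List<String> hits = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // Segment 0 discards the header; later segments discard the rest of
            // the line that began before `from` (it belongs to the previous segment).
            raf.seek(from == 0 ? 0 : from - 1);
            raf.readLine();
            String line;
            while (raf.getFilePointer() < to && (line = raf.readLine()) != null) {
                if (!line.isEmpty() && matches(line, client, start, end)) hits.add(line);
            }
        }
        return hits;
    }

    // Splits the file into `threads` byte ranges and concatenates the results in order.
    static List<String> filter(String path, int threads, String client,
                               LocalDate start, LocalDate end) throws Exception {
        long size;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            size = raf.length();
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final long from = size * i / threads;
                final long to = size * (i + 1) / threads;
                parts.add(pool.submit(() -> filterSegment(path, from, to, client, start, end)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> p : parts) all.addAll(p.get());
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as ParallelFilter.filter("/ssd/data/orders.txt", 4, "191", LocalDate.parse("2013-09-01"), LocalDate.parse("2013-11-01")) would mirror the four-thread test. Even this simplified sketch needs explicit segment-boundary handling and thread-pool management.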
One more point to note: the test showed that the Perl functions for converting strings to date/time values (Mktime in the Date::Calc package and timegm in the Time::Piece package) perform very poorly. With them, Perl takes 103 seconds to filter a 1 GB text file, compared with 18 seconds for esProc and 7 seconds for Java. For this reason the Perl code above compares dates as numerical values instead of using conversion functions, which brings the time for the 1 GB file down to 20 seconds. The drawback is that this approach cannot handle more complicated date and time computations, such as finding the day of the week on which the last day of a given month falls. There may be a more efficient Perl package for handling date/time data, but its impact on the result would be small, so it is not discussed here. The Java function String.split() also performs poorly, so indexOf() and substring() are used in the above code to split each line.
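The two techniques mentioned here, splitting with indexOf()/substring() and comparing dates as plain numbers, can be sketched in Java as follows. The class and method names (FastFields, splitFields, dateKey) are invented for this illustration, and the code assumes well-formed, zero-padded yyyy-MM-dd dates.

```java
import java.util.ArrayList;
import java.util.List;

class FastFields {
    // Splits a line on a single delimiter character using indexOf() and
    // substring(), without calling String.split().
    static List<String> splitFields(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = line.indexOf(delimiter, start)) >= 0) {
            fields.add(line.substring(start, end));
            start = end + 1;
        }
        fields.add(line.substring(start)); // last field has no trailing delimiter
        return fields;
    }

    // Turns "yyyy-MM-dd" into the integer yyyyMMdd without creating a date object.
    // Assumes a well-formed, zero-padded date string.
    static int dateKey(String isoDate) {
        int year = Integer.parseInt(isoDate.substring(0, 4));
        int month = Integer.parseInt(isoDate.substring(5, 7));
        int day = Integer.parseInt(isoDate.substring(8, 10));
        return year * 10000 + month * 100 + day;
    }

    public static void main(String[] args) {
        // A made-up record in the same layout as orders.txt.
        String line = "7\t191\t8\t1419\t2013-10-11\tsome note";
        List<String> f = splitFields(line, '\t');
        int key = dateKey(f.get(4));
        boolean hit = f.get(1).equals("191") && key > 20130901 && key < 20131101;
        System.out.println(f.size() + " fields, date key " + key + ", match=" + hit);
    }
}
```

Integer keys of this kind preserve date order, which is all a range filter needs, but they carry no calendar knowledge. That is why the approach cannot answer questions such as the weekday of a month's last day.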