Performance Test of File Retrieval in esProc

This article aims to test esProc performance of retrieving files and compares it with the result got by using Oracle JDBC.

The test data is some order records saved in the orders.b file. Import the file with esProc and its content would be like this:

ORDERID CLIENT SELLERID AMOUNT ORDERDATE NOTE

1 287 47 5825 2013-05-31 gafcaghafdgie f ci…

2 89 22 8681 2013-05-04 gafcaghafdgie f ci…

3 47 67 7702 2009-11-22 gafcaghafdgie f ci…

4 76 85 8717 2011-12-13 gafcaghafdgie f ci…

5 307 81 8003 2008-06-01 gafcaghafdgie f ci…

6 366 39 6948 2009-09-25 gafcaghafdgie f ci…

7 295 8 1419 2013-11-11 gafcaghafdgie f ci…

8 496 35 6018 2011-02-18 gafcaghafdgie f ci…

9 273 37 9255 2011-05-04 gafcaghafdgie f ci…

10 212 0 2155 2009-03-22 gafcaghafdgie f ci…

…

note field is the string field for increasing each record’s length and doesn’t have practical meaning. Test process: Write a Java program to call esProc JDBC and Oracle JDBC respectively for retrieving and traversing the same data, and record the time taken. The Java program, esProc script, the data file and Oracle are all held in one machine, so no network is needed to transmit them.

Data volume: 890M

Hardware and software configuration for the test machine: Normal PC

CPU: Core(TM) i5-3450 4 cores, 4 threads

Memory: 16GB

SSD

OS：CENT OS

Oracle：11g

esProc version: 3.1

esProc script esProcjdbc.dfx for data retrieval (cellset parameter fileName is the file name):

	A
1	=file(fileName).import@b()
2	result A1

The test program of calling esProc JDBC by Java:

import java.sql.*;

import java.text.SimpleDateFormat;

import java.util.Date;

public class esproc {

public static void testDataServer(String input){

Connection con = null;

try{

SimpleDateFormat df = new SimpleDateFormat(“yyyy-MM-dd HH:mm:ss”); Date begin=new Date();

System.out.println(“begin”+df.format(begin));

long diff=0;

Class.forName(“com.esproc.jdbc.InternalDriver”);

con= DriverManager.getConnection(“jdbc:esproc:local://”);

com.esproc.jdbc.InternalCStatement st=

(com.esproc.jdbc.InternalCStatement)con.prepareCall(“call esProcjdbc(?)”);

st.setObject(1,input);

ResultSet set =st.executeQuery();

int len=0;

while(set.next())

{

String orderid=set.getString(“ORDERID”);

String client=set.getString(“CLIENT”);

String sellerid=set.getString(“SELLERID”);

float amount=set.getFloat(“AMOUNT”);

Timestamp ORDERDATE=set.getTimestamp(“ORDERDATE”);

String note=set.getString(“NOTE”);

len++;

}

System.out.println(“len=”+len);

diff=((new Date()).getTime()-begin.getTime())/1000;

System.out.println(“time of all=”+diff);

}

catch(Exception e){

System.out.println(e);

}

finally{

if (con!=null) {

try {

con.close();

}

catch(Exception e) {

System.out.println(e);

}

public static void main(String[] args) {

testDataServer(“/ssd/data/orders.b”);

}

The SQL statement that Oracle implements: Select * from orders;

The test program of calling Oracle JDBC by Java:

package jdbc;

import java.sql.*;

import java.util.Date;

import java.text.SimpleDateFormat;

public class oracle {

public static void testDataServer(String input){

Connection con = null;

try{

SimpleDateFormat df = new SimpleDateFormat(“yyyy-MM-dd HH:mm:ss”); Date begin=new Date();

System.out.println(“begin”+df.format(begin));

long diff=0;

Class.forName(“oracle.jdbc.driver.OracleDriver”);

con= DriverManager.getConnection(“jdbc:oracle:thin:@192.168.1.21:1521:orcl”,”test”,”test”);

Statement st=con.createStatement();

ResultSet set =st.executeQuery(input);

int len=0;

while(set.next())

{

String orderid=set.getString(“ORDERID”);

String client=set.getString(“CLIENT”);

String sellerid=set.getString(“SELLERID”);

float amount=set.getFloat(“AMOUNT”);

Timestamp ORDERDATE=set.getTimestamp(“ORDERDATE”);

String note=set.getString(“NOTE”);

len++;

}

System.out.println(“len=”+len);

diff=((new Date()).getTime()-begin.getTime())/1000;

System.out.println(“time of all=”+diff);

}

catch(Exception e){

System.out.println(e);

}

finally{

if (con!=null) {

try {

con.close();

}

catch(Exception e) {

System.out.println(e);

}

public static void main(String[] args) {

testDataServer(“select * from orders”);

}

Test result:

	esProc	Oracle
Execution time	13 seconds	43 seconds

Conclusion:

Data retrieval using Oracle JDBC is more than one time slower than that via esProc JDBC. The reason is that data flow needs to be converted to object through Oracle JDBC, which takes time. There is no need for Java-based esProc to do the same thing, so it works much faster.

One point to note is that the time taken by the first execution of calling Oracle JDBC to retrieve data is not used by the test. On the other hand, because the data volume is well within 6G, Oracle’s maximum memory capacity, it can make full use of the cache memory to save data. Therefore most of the execution time is devoted to conversion to object via JDBC.

This shows that esProc approach of retrieving data into the external memory is more efficient than Oracle’s throughput capacity.

Generally data computing within database will achieve higher performance. But in some cases you need to retrieve a great amount of data from the database. For example, data outside the database is involved, or the algorithm is too complex and specialized to be expressed in SQL. In these cases, JDBC for database connectivity will be the key factor to determine the performance.

Yet better performance can be achieved by using esProc to retrieve data from the database and save it in files for use in later computation. To handle or display a large amount of historical data by report applications, it is more efficient to retrieve data into files before processing than to process it by retrieving data directly from the database. Within the log analysis system, you can save data as files and use esProc to handle the computation if you must retrieve data from the database using a programming language to perform certain operations involving log files.

Hi，You need to fill in the Username and Email！