Performance Test of Concurrent File Processing with esProc

Blog 1075 0

This article aims to test esProc performance in processing files concurrently, based on filter, group and aggregate operations and comparison with Oracle.

Test data includes tables of ordering information of different years. With esProc, files (Orders2009.b, Orders2010.b, Orders2011.b and Orders2012.b) of binary format are imported. Below is a selection from the source data:

ORDERID CLIENT     SELLERID AMOUNT ORDERDATE NOTE

1       287  47     5825         2013-05-31       gafcaghafdgie f ci…

2       89     22     8681         2013-05-04       gafcaghafdgie f ci…

3       47     67     7702         2009-11-22       gafcaghafdgie f ci…

4       76     85     8717         2011-12-13       gafcaghafdgie f ci…

5       307  81     8003         2008-06-01       gafcaghafdgie f ci…

6       366  39     6948         2009-09-25       gafcaghafdgie f ci…

7       295  8       1419         2013-11-11       gafcaghafdgie f ci…

8       496  35     6018         2011-02-18       gafcaghafdgie f ci…

9       273  37     9255         2011-05-04       gafcaghafdgie f ci…

10     212  0       2155         2009-03-22       gafcaghafdgie f ci…

In which note is a string field for increasing each record’s length and hasn’t practical meaning.

Oracle imports the four tables Orders2009, Orders2010, Orders2011 and Orders2012. The data is the same as that of those files. The number of parallel threads is 4. esProc and Oracle use single thread, 2 threads and 4 threads respectively to process the computational task. Each subtask has an input parameter year.

esProc script group.dfx for data grouping and aggregation:

  A B
1 =debug(“group.dfx?year=”+string(year))
2 4  
3 fork to(A2) =file(“/ssd/chenhu/data/orders”+string(year)+”.b”).cursor@bz(;,A3:A2)
4   =B3.groups(CLIENT;sum(AMOUNT):amount,count(ORDERID):count)
5   result B4
6 =A3.conj()  
7 =A6.groups(CLIENT;sum(amount):amount,sum(count):count)
8 result A7  

Oracle SQL1 for data grouping and aggregation:

“select /*+ Parallel(4) */ client,sum(amount),count(orderid) from orders”+string(year)+” group by client” .

esProc script select.dfx for data filtering:

  A B
1 =debug(“select.dfx?year=”+string(year))
2 4  
3 fork to(A2) =file(“/ssd/chenhu/data/orders”+string(year)+”.b”).cursor@bz(;,A3:A2)
4   =B3.select(SELLERID==1 && CLIENT==50).fetch()
5   result B4
6 =A3.conj()  
7 result A6   

Oracle SQL2 for data filtering:”select * from orders1g”+string(year)+” where SELLERID=1 and CLIENT=50” .

Hardware description:

CPU: Core(TM) i5-3450  4 cores and 4 threads

Memory capacity: 16GB

SSD

Note:

1. This performance test focuses on cases in which the result set is small and can be entirely loaded into memory.

2. The input parameters for the four concurrent tasks are four different years.

3. The test result adopts the average value, the maximum value and the minimum value of results of four consecutive computations. Both esProc and Oracle might use the disk cache.

4. Oracle’s available memory space is 6G.

Test result one: Each subtask handles 1G data. The total data is 4G.

Time unit: Second

1. Group and aggregate

  Single thread 2 threads 4 threads
Average Max Min Average Max Min Average Max Min
Oracle 1.9 3 1 1.6 3 1 1.4 3 1
esProc 3.6 3 4 3.5 4 3 3.4 4 4

2. Query and filter

  Single thread 2 threads 4 threads
Average Max Min Average Max Min Average Max Min
Oracle 3.8 10 1 1.0 2 0.5 3.1 10 1
esProc 2.6 3 2 2.5 3 2 2.4 3 2

Test result two: Each subtask handles 4G data. The total data is 16G.

Time unit: Second

1. Group and aggregate

  Single thread 2 threads 4 threads
Average Max Min Average Max Min Average Max Min
Oracle 76 84 51 72 83 43 66 82 44
esProc 50 50 50 54.8 55 54 36 39 31

2. Query and filter

  Single thread 2 threads 4 threads
Average Max Min Average Max Min Average Max Min
Oracle 62 68 38 66 79 51 72 75 66
esProc 54 55 54 56 56 55 53 62 45

Conclusion

1. Oracle cache can be brought into full play in handling the 4G total data that is within the Oracle memory capacity. In this case Oracle gets a higher performance.

2. When the total data is 16G, which exceeds Oracle’s available memory, esProc is generally better than Oracle if using the parallel processing.

3. On the same conditions, esProc has a steady computational time, while Oracle has a big gap between the maximum time and the minimum time. 

FAVOR (0)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website