This article aims to test esProc performance in processing files concurrently, based on filter, group and aggregate operations and comparison with Oracle.
Test data includes tables of ordering information of different years. With esProc, files (Orders2009.b, Orders2010.b, Orders2011.b and Orders2012.b) of binary format are imported. Below is a selection from the source data:
ORDERID CLIENT SELLERID AMOUNT ORDERDATE NOTE
1 287 47 5825 2013-05-31 gafcaghafdgie f ci…
2 89 22 8681 2013-05-04 gafcaghafdgie f ci…
3 47 67 7702 2009-11-22 gafcaghafdgie f ci…
4 76 85 8717 2011-12-13 gafcaghafdgie f ci…
5 307 81 8003 2008-06-01 gafcaghafdgie f ci…
6 366 39 6948 2009-09-25 gafcaghafdgie f ci…
7 295 8 1419 2013-11-11 gafcaghafdgie f ci…
8 496 35 6018 2011-02-18 gafcaghafdgie f ci…
9 273 37 9255 2011-05-04 gafcaghafdgie f ci…
10 212 0 2155 2009-03-22 gafcaghafdgie f ci…
…
In which note is a string field for increasing each record’s length and hasn’t practical meaning.
Oracle imports the four tables Orders2009, Orders2010, Orders2011 and Orders2012. The data is the same as that of those files. The number of parallel threads is 4. esProc and Oracle use single thread, 2 threads and 4 threads respectively to process the computational task. Each subtask has an input parameter year.
esProc script group.dfx for data grouping and aggregation:
A | B | |
1 | =debug(“group.dfx?year=”+string(year)) | |
2 | 4 | |
3 | fork to(A2) | =file(“/ssd/chenhu/data/orders”+string(year)+”.b”).cursor@bz(;,A3:A2) |
4 | =B3.groups(CLIENT;sum(AMOUNT):amount,count(ORDERID):count) | |
5 | result B4 | |
6 | =A3.conj() | |
7 | =A6.groups(CLIENT;sum(amount):amount,sum(count):count) | |
8 | result A7 |
Oracle SQL1 for data grouping and aggregation:
“select /*+ Parallel(4) */ client,sum(amount),count(orderid) from orders”+string(year)+” group by client” .
esProc script select.dfx for data filtering:
A | B | |
1 | =debug(“select.dfx?year=”+string(year)) | |
2 | 4 | |
3 | fork to(A2) | =file(“/ssd/chenhu/data/orders”+string(year)+”.b”).cursor@bz(;,A3:A2) |
4 | =B3.select(SELLERID==1 && CLIENT==50).fetch() | |
5 | result B4 | |
6 | =A3.conj() | |
7 | result A6 |
Oracle SQL2 for data filtering:”select * from orders1g”+string(year)+” where SELLERID=1 and CLIENT=50” .
Hardware description:
CPU: Core(TM) i5-3450 4 cores and 4 threads
Memory capacity: 16GB
SSD
Note:
1. This performance test focuses on cases in which the result set is small and can be entirely loaded into memory.
2. The input parameters for the four concurrent tasks are four different years.
3. The test result adopts the average value, the maximum value and the minimum value of results of four consecutive computations. Both esProc and Oracle might use the disk cache.
4. Oracle’s available memory space is 6G.
Test result one: Each subtask handles 1G data. The total data is 4G.
Time unit: Second
1. Group and aggregate
Single thread | 2 threads | 4 threads | |||||||
Average | Max | Min | Average | Max | Min | Average | Max | Min | |
Oracle | 1.9 | 3 | 1 | 1.6 | 3 | 1 | 1.4 | 3 | 1 |
esProc | 3.6 | 3 | 4 | 3.5 | 4 | 3 | 3.4 | 4 | 4 |
2. Query and filter
Single thread | 2 threads | 4 threads | |||||||
Average | Max | Min | Average | Max | Min | Average | Max | Min | |
Oracle | 3.8 | 10 | 1 | 1.0 | 2 | 0.5 | 3.1 | 10 | 1 |
esProc | 2.6 | 3 | 2 | 2.5 | 3 | 2 | 2.4 | 3 | 2 |
Test result two: Each subtask handles 4G data. The total data is 16G.
Time unit: Second
1. Group and aggregate
Single thread | 2 threads | 4 threads | |||||||
Average | Max | Min | Average | Max | Min | Average | Max | Min | |
Oracle | 76 | 84 | 51 | 72 | 83 | 43 | 66 | 82 | 44 |
esProc | 50 | 50 | 50 | 54.8 | 55 | 54 | 36 | 39 | 31 |
2. Query and filter
Single thread | 2 threads | 4 threads | |||||||
Average | Max | Min | Average | Max | Min | Average | Max | Min | |
Oracle | 62 | 68 | 38 | 66 | 79 | 51 | 72 | 75 | 66 |
esProc | 54 | 55 | 54 | 56 | 56 | 55 | 53 | 62 | 45 |
Conclusion
1. Oracle cache can be brought into full play in handling the 4G total data that is within the Oracle memory capacity. In this case Oracle gets a higher performance.
2. When the total data is 16G, which exceeds Oracle’s available memory, esProc is generally better than Oracle if using the parallel processing.
3. On the same conditions, esProc has a steady computational time, while Oracle has a big gap between the maximum time and the minimum time.