Performance Test of Concurrent File Processing with esProc

This article aims to test esProc performance in processing files concurrently, based on filter, group and aggregate operations and comparison with Oracle.

Test data includes tables of ordering information of different years. With esProc, files (Orders2009.b, Orders2010.b, Orders2011.b and Orders2012.b) of binary format are imported. Below is a selection from the source data:

ORDERID CLIENT SELLERID AMOUNT ORDERDATE NOTE

1 287 47 5825 2013-05-31 gafcaghafdgie f ci…

2 89 22 8681 2013-05-04 gafcaghafdgie f ci…

3 47 67 7702 2009-11-22 gafcaghafdgie f ci…

4 76 85 8717 2011-12-13 gafcaghafdgie f ci…

5 307 81 8003 2008-06-01 gafcaghafdgie f ci…

6 366 39 6948 2009-09-25 gafcaghafdgie f ci…

7 295 8 1419 2013-11-11 gafcaghafdgie f ci…

8 496 35 6018 2011-02-18 gafcaghafdgie f ci…

9 273 37 9255 2011-05-04 gafcaghafdgie f ci…

10 212 0 2155 2009-03-22 gafcaghafdgie f ci…

…

In which note is a string field for increasing each record’s length and hasn’t practical meaning.

Oracle imports the four tables Orders2009, Orders2010, Orders2011 and Orders2012. The data is the same as that of those files. The number of parallel threads is 4. esProc and Oracle use single thread, 2 threads and 4 threads respectively to process the computational task. Each subtask has an input parameter year.

esProc script group.dfx for data grouping and aggregation:

	A	B
1	=debug(“group.dfx?year=”+string(year))
2	4
3	fork to(A2)	=file(“/ssd/chenhu/data/orders”+string(year)+”.b”).cursor@bz(;,A3:A2)
4		=B3.groups(CLIENT;sum(AMOUNT):amount,count(ORDERID):count)
5		result B4
6	=A3.conj()
7	=A6.groups(CLIENT;sum(amount):amount,sum(count):count)
8	result A7

Oracle SQL1 for data grouping and aggregation:

“select /*+ Parallel(4) */ client,sum(amount),count(orderid) from orders”+string(year)+” group by client” .

esProc script select.dfx for data filtering:

	A	B
1	=debug(“select.dfx?year=”+string(year))
2	4
3	fork to(A2)	=file(“/ssd/chenhu/data/orders”+string(year)+”.b”).cursor@bz(;,A3:A2)
4		=B3.select(SELLERID==1 && CLIENT==50).fetch()
5		result B4
6	=A3.conj()
7	result A6

Oracle SQL2 for data filtering:”select * from orders1g”+string(year)+” where SELLERID=1 and CLIENT=50” .

Hardware description:

CPU: Core(TM) i5-3450 4 cores and 4 threads

Memory capacity: 16GB

SSD

Note:

1. This performance test focuses on cases in which the result set is small and can be entirely loaded into memory.

2. The input parameters for the four concurrent tasks are four different years.

3. The test result adopts the average value, the maximum value and the minimum value of results of four consecutive computations. Both esProc and Oracle might use the disk cache.

4. Oracle’s available memory space is 6G.

Test result one: Each subtask handles 1G data. The total data is 4G.

Time unit: Second

1. Group and aggregate

	Single thread			2 threads			4 threads
	Average	Max	Min	Average	Max	Min	Average	Max	Min
Oracle	1.9	3	1	1.6	3	1	1.4	3	1
esProc	3.6	3	4	3.5	4	3	3.4	4	4

2. Query and filter

	Single thread			2 threads			4 threads
	Average	Max	Min	Average	Max	Min	Average	Max	Min
Oracle	3.8	10	1	1.0	2	0.5	3.1	10	1
esProc	2.6	3	2	2.5	3	2	2.4	3	2

Test result two: Each subtask handles 4G data. The total data is 16G.

Time unit: Second

1. Group and aggregate

	Single thread			2 threads			4 threads
	Average	Max	Min	Average	Max	Min	Average	Max	Min
Oracle	76	84	51	72	83	43	66	82	44
esProc	50	50	50	54.8	55	54	36	39	31

2. Query and filter

	Single thread			2 threads			4 threads
	Average	Max	Min	Average	Max	Min	Average	Max	Min
Oracle	62	68	38	66	79	51	72	75	66
esProc	54	55	54	56	56	55	53	62	45

Conclusion

1. Oracle cache can be brought into full play in handling the 4G total data that is within the Oracle memory capacity. In this case Oracle gets a higher performance.

2. When the total data is 16G, which exceeds Oracle’s available memory, esProc is generally better than Oracle if using the parallel processing.

3. On the same conditions, esProc has a steady computational time, while Oracle has a big gap between the maximum time and the minimum time.

Hi，You need to fill in the Username and Email！