In this article, we’ll test the performance of esProc in handling in-memory small data computing, and compare with that of Oracle when performing the same computation.
The test involves two cases: normal simple computing and complicated related computing:
The test data used in normal computing is order information, as shown below:
ORDERID CLIENT SELLERID AMOUNT ORDERDATE
1 287 47 5825 2013-05-31
2 89 22 8681 2013-05-04
3 47 67 7702 2009-11-22
4 76 85 8717 2011-12-13
5 307 81 8003 2008-06-01
6 366 39 6948 2009-09-25
7 295 8 1419 2013-11-11
8 496 35 6018 2011-02-18
9 273 37 9255 2011-05-04
10 212 0 2155 2009-03-22
esProc script for normal computing:
A | B | |
1 | =date(“2013-09-01”) | =date(“2013-11-01”) |
2 | =file(“/ssd/data/orders.b”).import@b() | |
3 | =A2.select(ORDERDATE>A1 && ORDERDATE<B1) | |
4 | =A3.groups@u(CLIENT,SELLERID;sum(AMOUNT):S,count(ORDERID):C) | |
5 | =A4.select(C==13) | |
6 | result A5 |
Note about the esProc script: A2 imports data in one go and its result can be used repeatedly. So it will not be counted in the execution time of in-memory computing, which only includes the time spent in executing steps from A3 to A6.
SQL script that Oracle uses:
select client,sellerid,sum(amount) s,count(orderid) c
from orders
where orderdate>to_date(‘2013-09-01′,’yyyy-MM-dd’)
and orderdate<to_date(‘2013-11-01′,’yyyy-MM-dd’)
group by client,sellerid
having count(orderid)=13
There are 8 million rows of data in orders.
Test result:
esProc | Oracle | |
Execution time | 0.570 seconds | 0.623 seconds |
Note: Execute Oracle’s SQL script four times and get the average of the last three times. In this way Oracle can make full use of its in-memory cached data. As for esProc script, we’ll execute it three times and get the average time.
During related computing testing, esProc imports the following files – orders.b, orders_detail.b, employee.b, department.b and performance.b. The relationships among them are as follows:
esProc script for data importing:
A | B | |
1 | =”/ssd/data/” | |
2 | =file(A1+”orders_detail.b”).import@b() | |
3 | =file(A1+”orders.b”).import@b().primary(ORDERID).index() | |
4 | =file(A1+”employee.b”).import@b().primary(EID).index() | |
5 | =file(A1+”department.b”).import@b().primary(DEPT).index() | |
6 | =file(A1+”performance.b”).import@b().primary(EID).index() | |
7 | =A2.switch(ORDERID,A3:ORDERID) | =A3.switch(SELLERID,A4:EID) |
8 | =A4.switch(DEPT,A5:DEPT) | =A5.switch(MANAGER,A4:EID) |
9 | =A4.switch(EID,A6:EID) | =A2.select(ORDERID.SELLERID.DEPT.MANAGER.EID.BONUS>6000) |
10 | =B9.groups@u(ORDERID.CLIENT:CLIENT,ORDERID.SELLERID.EID.EID:EID,PRODUCT;sum(PRICE*QUANTITY):S,count(ORDERID):C) | |
11 | =A10.select(S>24500 && S<25000) | result A11 |
Note about esProc script: The code from A1 to A9 imports files and establishes the relationships (import and switch). This is done at one time and the result can be used repeatedly. This part of execution will not be included in the execution time of in-memory computing. The execution time is the time spent in executing steps from B9 to B11.
Oracle’s SQL:
select o.client,o.sellerid,od.product,sum(od.price*od.quantity) s,count(o.orderid) c
from orders_detail od left join orders o on o.orderid=od.orderid
left join employee e on o.sellerid=e.eid
left join department d on d.dept=e.dept
left join employee e1 on d.manager=e1.eid
left join performance p on p.employeeid=e1.eid
where p.bonus>=6000
group by o.client,o.sellerid,od.product
having sum(od.price*od.quantity) between 24500 and 25000
Test one: Both the orders table and the orders_detail table have 8 million rows. There are less than 1,000 rows in other tables.
Test result:
esProc | Oracle | |
Execution time | 4.9 seconds | 9.7 seconds |
Test two: Both the orders table and the orders_detail table have 4 million rows. There are less than 1,000 rows in other tables.
Test result:
esProc | Oracle | |
Execution time | 2.3 seconds | 5.1 seconds |
Similarly, we execute Oracle’s SQL script four times and get the average of the last three times, in order to let Oracle make full use of its in-memory cached data. We execute esProc script three times and get the average.
Conclusion:
For small data in-memory computing, when the computation is simple without file relating, esProc is a little faster than Oracle. This is probably because, though Oracle can cache data, it still needs to convert the cached data (without using the physical disk) before processing it. When handling the complicated related computing, esProc works much faster than Oracle. The reason is that Oracle uses the hash algorithm to join two tables, while esProc uses a pointer to reference foreign key values, and thus does not need to compute hash values and query matching values.
Therefore, when there is enough available memory capacity, esProc can achieve much higher performance than conventional relational database if we import data into memory and arrange it beforehand. esProc is great in achieving high performance for in-memory computing.
Test environment:
CPU: Core(TM) i5-3450 four cores, four threads
Memory capacity: 16GB
OS:CENT OS
Oracle11g:Maximum memory capacity is 14G
esProc 3.1 version: Maximum memory capacity is 14G