Sometimes we may encounter computations too complicated to be handled solely using library functions. We need to hardcode the computational logic using scripts. In that case the interpretative and executional performance of the script becomes really important. Not a few traditional database tasks, being unable to be expressed directly in SQL, require stored procedure to perform data traversal.
We’ll test performance of executing esProc scripts and compare it with executing script handling the same task with Oracle’s stored procedure.
Test data:
ORDERID CLIENT SELLERID AMOUNT ORDERDATE NOTE
1 287 47 5825 2013-05-31 gafcaghafdgie f ci…
2 89 22 8681 2013-05-04 gafcaghafdgie f ci…
3 47 67 7702 2009-11-22 gafcaghafdgie f ci…
4 76 85 8717 2011-12-13 gafcaghafdgie f ci…
5 307 81 8003 2008-06-01 gafcaghafdgie f ci…
6 366 39 6948 2009-09-25 gafcaghafdgie f ci…
7 295 8 1419 2013-11-11 gafcaghafdgie f ci…
8 496 35 6018 2011-02-18 gafcaghafdgie f ci…
9 273 37 9255 2011-05-04 gafcaghafdgie f ci…
10 212 0 2155 2009-03-22 gafcaghafdgie f ci…
…
The test involves two cases: small data and big data. The former will be loaded into memory in one go to be processed by loop. The latter will be imported into memory in batches or row by row to be processed by loop.
Case one: Small data
Below is an esProc script:
A | B | C | |
1 | =date(“20150101″,”yyyyMMdd”) | ||
2 | =file(“/ssd/data/orders.b”).import@b() | ||
3 | for 5 | for A2 | =interval(B3.ORDERDATE,A1) |
4 | =int(pos(B3.NOTE,”a”)) | ||
5 | =B3.CLIENT+B3.SELLERID | ||
6 | =C3*C4/if(C5!=0,C5,1) |
The execution time of the esProc script only includes the execution of A3’s loop, without the data importing in B2 counted. As the execution time is short, we’ll repeat the execution five times in case that large error will be produced due to the tiny result value.
Below is the Oracle’s stored procedure for doing the same task:
create or replace procedure orderssp is
type v_table is table of orders%rowtype index by binary_integer;
my_table v_table;
ret2 number;num1 number;num2 number;num3 number;date1 date;
begin
date1:=to_date(‘20150101′,’yyyymmdd’);
select * bulk collect into my_table from orders;
for j in 1..5
loop
for i in 1..my_table.count
loop
num1:= floor(date1-my_table(i).orderdate);
num2:= instr(my_table(i).note,’a’);
num3:= (my_table(i).client+my_table(i).sellerid);
if num3<>0 then ret2:=num1* num2/num3;
else ret2:=num1*num2;
end if;
end loop;
end loop;
end orderssp;
Similarly, the execution time of the Oracle’s stored procedure will not include the data importing time (select * bulk collect into my_table from orders;). It is the time taken to complete the for loop, and executions will repeat five times.
Test data amount: 4 million rows. Oracle data table is 457M and esProc data file is 418M.
Test results (measured by seconds):
Execution time | |
esProc | 14 |
Oracle | 16 |
Case two: Big data
esProc script:
A | B | C | |
1 | =date(“20150101″,”yyyyMMdd”) | ||
2 | =file(“/ssd/data/orders.b”).cursor@b() | ||
3 | for A2,10000 | for A3 | =interval(B3.orderdate,A1) |
4 | =int(pos(B3.note,”a”)) | ||
5 | =B3.client+B3.sellerid | ||
6 | =C3*C4/if(C5!=0,C5,1) |
esProc execution time = Total execution time – data fetching time. The data fetching time is the time spent in fetching data, without real computation counted, i.e. the execution time after removing code from C3 to C6. We’ll do the execution twice to get the total execution time and the data fetching time respectively.
Below is the Oracle’s stored procedure for doing the same task:
create or replace procedure mem is
cursor mem_cur is
select * from orders;
l_mem orders%rowtype;
ret2 number;num1 number;num2 number;num3 number;date1 date;
begin
date1:=to_date(‘20150101′,’yyyymmdd’);
open mem_cur;
loop
fetch mem_cur into l_mem;
exit when mem_cur%notfound;
num1:= floor(date1-l_mem.orderdate);
num2:= instr(l_mem.note,’a’);
num3:= (l_mem.client+l_mem.sellerid);
if num3<>0 then ret2:=num1* num2/num3;
else ret2:=num1*num2;
end if;
end loop;
close mem_cur;
end mem;
Likewise the time of executing Oracle’s stored procedure computing= Total execution time – data fetching time. The data fetching time is the time spent in fetching data, without real computation counted, that is, the time spent in executing the loop body where only fetch statement and exit when statement exist.
Test data amount: 2.4 billion rows. Oracle data table is 28G and esProc data file is 25G.
Test result (measured by seconds):
Total execution | Data fetching | Result | |
esProc | 198 | 113 | 85 |
Oracle | 1731 | 1331 | 400 |
Conclusion:
For small data that can be loaded into memory all at once, esProc and Oracle are equal in their computing performance.
For big data that cannot be entirely loaded into memory, esProc excels in both data fetching and computing performance. The interpreter of Oracle’s stored procedure performs poorly.
In real world business, when dealing with complicated computations that cannot be directly coded using SQL and require handling by the stored procedure after data are fetched row by row, we can move the data out of the database and use esProc to process it. By doing so, higher computing performance can be achieved.
Test environment:
CPU: Core(TM) i5-3450 Four cores, four threads
Memory capacity: 16GB
OS:CENT OS
Oracle11g:Maximum memory 14G
esProc 3.1 version:Maximum memory 14G