1. Testing purposes
Test esProc and Oracle on the same hardware to compare their single machine performances in carrying out a single big task and small concurrent tasks.
2. Testing contents and methods
Data volume:
Small data volume: Single fact table around 10G. To prevent the testing result from being affected by the operating system cache, different fact tables are accessed with multiple concurrent requests.
Large data volume: Single fact table around 100G.
Algorithms:
Several typical SQL algorithms, including data scan, join, data grouping with small result set and with large result set. The use of these simple algorithms is to make the comparison more intuitively and easier to understand, not because esProc and SQL are identical to each other. In fact, the two of them focus on different things. esProc is good at procedure computation for tasks with complicated business logic, while SQL is suitable for computations of average complexity. A complicated algorithm that can be realized with different SQL execution plans makes execution paths uncontrollable and lacks comparability. So the testing doesn’t include one.
Use Cases:
Multiple use cases are designed for testing each algorithm according to table widths and data types.
Among them, the purpose of the use cases of data grouping with large result set is to test the performances of Oracle and esProc in handling the scenarios where the result sets of the group operations are too large to be entirely accommodated by the memory. So this report tests the parallel processing of a task with an amount of 100G data, not the concurrent processing of small tasks.
Note: An appendix of esProc Oracle Single Machine Performance Comparison Process is added to this report. Please refer to it for details such as data structures, test code, and test reproduction.
3. Testing Environment
Testing Machine:Dell Power Edge T610
CPU:Intel Xeon E5620*2
RAM:20G
HDD:Raid5 1T
Operating System:CentOS 6.4
JDK:1.6
Oracle Version:11g
esProc Version:3.1
4. Data Description
The volume of the data is decided by the amount allowed by the exported text file.
For the single machine test, esProc uses the proprietary file format.
As the purpose is mainly to test the performance of big data computation and full table scan performance, no primary key or index is built for any table in the database. The primary key field described in data structure isn’t a physical key but a logical one defining that data repetition isn’t allowed.
4.1Data tables and table association
Fact Tables T1, T11, T12, and T13
They are wide data tables that each has 100 fields to simulate the real-world fact tables with a large number of fields. The four tables have identical structures. T11, T12, T13 are used for small concurrent accesses to different tables to avoid system cache.
Fact Tables T2, T21, T22, and T23
These are narrow data tables having identical structures and 11 fields for simulating the real-world facts tables with a small number of fields. T21, T22, T23 are used for small concurrent accesses to different tables to avoid system cache.
Fact tables are the main data source for this test, and are used in data scan, data grouping, and join operations. Tests are done for both large and small data volume, depending on the numbers of records inserted to the tables.
Dimension tables DL2, DL6, DD2, DD6, DC2, and DC6
The dimension tables are only used in the use cases of the join (and multi-join) operations where dimension tables and fact tables are joined or the dimension tables are joined between themselves. These tables have fixed data volume.
4.2. Data volume
The following statistics are the numbers of records in the testing tables, not the numbers of bytes that they occupy.
Table | Big single task | Small concurrent tasks |
T1 | 150M | 15M |
T11/T12/T13 | — | 15M |
T2 | 1180M | 118M |
T21/T22/T23 | — | 118M |
DL2 | 100 | 100 |
DL6 | 1M | 1M |
DD2 | 100 | 100 |
DD6 | 1M | 1M |
DC2 | 100 | 100 |
DC6 | 1M | 1M |
Note: The big-task test on a single machine is intended to measure the performance when data requires an amount of memory that is far larger than the physical memory. The standard used by the testing is that a fact table is approximately 5 times larger than the physical memory and a dimension table is far smaller than the physical memory. The actual number of records in a table can be adjusted according to the machine configuration.
The test of small concurrent tasks on a single machine is intended to measure the performance when the amount of memory occupied by data is less than the physical memory, and the computation could be done completely in the memory. The standard used by the testing is that a fact table is approximately 50% of the size of the physical memory and a dimension table is far smaller than the physical memory. The actual number of records in a table can be adjusted according to the machine configuration. T11/T12/T13 and T21/T22/T23 are only used for the test of the concurrent computation of small tasks.
To make sure that esProc and Oracle are computing exactly the same data, the data generated in Oracle is exported to a file with the esProc-defined binary format, which is esProc’s data source for the test.
5. Use Cases Description
For better understanding, all the logics in the test are described in SQL.
During the test Oracle executes SQL statements directly, whereas functionally equivalent esProc programs are written to complete the same computation.
5.1. Use Case for big-data scan
This use case is about big-task single machine test. It’s designed to test the computational performance of full table scan for a large volume of data. Performances for full table counting, integer sum, float sum, value sum, integer filter, number filter, character filter, date filter are considered respectively.
Category | Use Case Number | SQL Description |
Full Table Counting | 101 | select count(*) from T1 |
102 | select count(*) from T2 | |
Integer sum | 103 | select sum(TL11) from T1 |
104 | select sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 | |
105 | select sum(TL2) from T2 | |
106 | select sum(L11),sum(FL6),sum(to_number(FD6)),sum(TL1),sum(TL2), sum(TL1*TL2) from T2 | |
Float Sum | 107 | select sum(TN21) from T1 |
108 | select sum(TN21), sum(TN22), sum(TN23), sum(TN24), sum(TN25), sum(TN26), sum(TN27), sum(TN28), sum(TN29) from T1 | |
109 | select sum(TN3) from T2 | |
110 | select sum(TN3*0.1), sum(TN3*0.5), sum(TN3), sum(TN3*1.5), sum(TN3*2), sum(TN3*10), sum(TN3*100), sum(TN3*1000), sum(TN3*10000), sum(TN3*100000) fromT2 | |
Value Sum | 111 | select sum(TN11) from T1 |
112 | select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 | |
113 | select sum(TN2) from T2 | |
114 | select sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 | |
Integer Filter | 115 | select count(*) from T1 where FL6 between 100000 and 599999 |
116 | select sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 where FL6 between 100000 and 599999 | |
117 | select count(*) from T2 where FL6 between 100000 and 599999 | |
118 | select sum(L11),sum(FL6),sum(to_number(FD6)),sum(TL1),sum(TL2), sum(TL1*TL2) from T2 where FL6 between 100000 and 599999 | |
Number Filter | 119 | select count(*) from T1 where FD6 between ‘100000’ and ‘599999’ |
120 | select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where FD6 between ‘100000’ and‘599999’ | |
121 | select count(*) from T2 where FD6 between ‘100000’ and ‘599999’ | |
122 | select sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 where FD6 between ‘100000’ and ‘599999’ | |
Character Filter | 123 | select count(*) from T1 where FC6 between ‘A00000’ and ‘E99999’ |
124 | select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where FC6 between ‘A00000’ and‘E99999’ | |
125 | select count(*) from T2 where FC6 between ‘A00000’ and ‘E99999’ | |
126 | select sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 where FC6 between ‘A00000’ and ‘E99999’ | |
Date Filter | 127 | select count(*) from T1 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’) and to_date(‘1980-01-01’,’yyyy-mm-dd’) + 5000 |
128 | select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’) and to_date (‘1980-01-01’,’yyyy-mm-dd’) + 5000 | |
129 | select count(*) from T2 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’) and to_date(‘1980-01-01’,’yyyy-mm-dd’) + 5000 | |
130 | select sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’) and to_date (‘1980-01-01’, ’yyyy-mm-dd’) + 5000 |
5.2. Use Case for big-data grouping with small result set
This use case is about big-task single machine test. It’s designed to test the performance of grouping computation involving large data volume, where the number of records in each group is small, smaller than the physical memory. Performances for integer grouping, number grouping, character grouping and date grouping are considered respectively.
Category | Use Case Number | SQL Description |
Integer Grouping | 201 | select FL6,count(*) from T1 group by FL6 |
202 | select FL6, sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 group by FL6 | |
203 | select FL6,count(*) from T2 group by FL6 | |
204 | select FL6, sum(L11),sum(FL6),sum(to_number(FD6)), sum(TL1), sum(TL2), sum(TL1*TL2) from T2 group by FL6 | |
Number Grouping | 205 | select FD6, count(*) from T1 group by FD6 |
206 | select FD6, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by FD6 | |
207 | select FD6, count(*) from T2 group by FD6 | |
208 | select FD6, sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 group by FD6 | |
Character Grouping | 209 | select FC6, count(*) from T1 group by FC6 |
210 | select FC6, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by FC6 | |
211 | select FC6, count(*) from T2 group by FC6 | |
212 | select FC6, sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 group by FC6 | |
Date Grouping | 213 | select R2,R4, count(*) from T1 group by R2,R4 |
214 | select R2,R4, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by R2,R4 | |
215 | select R2,R4,count(*) from T2 group by R2,R4 | |
216 | select R2,R4, sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 group by R2,R4 |
5.3. Use Case for big-data grouping with large result set
This use case is about big-task single machine test. It’s designed to test the performance of grouping computation involving large data volume, where the number of records in each group is large, larger than the physical memory, and thus the computation cannot be entirely completed within the memory.
Category | Use Case Number | SQL Description |
Integer Grouping | 301 | select mod(FL11,10000000), count(*) from T1 group by mod(FL11,10000000) |
302 | select mod(FL11,10000000), sum(TL11),sum(TL12),sum(TL13), sum(TL14), sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 group by mod(FL11,10000000) |
5.4. Use Case for big-data joins
This use case is about big-task single machine test. It’s designed to test the performance of join computations involving large data volume. Performances for integer, number, character joins, as well as single-level and multi-level joins, are considered respectively.
Category | Use Case Number | SQL Description |
Integer Join | 401 | select count(*) from T1 join DL6 on T1.FL6=DL6.P |
402 | select sum(T1.TL11),sum(T1.TL12), sum(T1.TL13), sum(T1.TL14), sum(T1.TL15), sum(T1.TL16), sum(T1.TL17), sum(T1.TL18), sum(T1.TL19), sum(DL6.L2) from T1 join DL6 on T1.FL6=DL6.P |
|
403 | select count(*) from T2 join DL6 on T2.FL6=DL6.P | |
404 | select sum(T2.L11), sum(T2.FL6), sum(to_number(T2.FD6)), sum(T2.TL1),sum(T2.TL2), sum(T2.TL1*T2.TL2), sum(DL6.L2) from T2 join DL6 on T2.FL6=DL6.P | |
Integer Multi-level Join | 405 | select count(*) from T1 join DL6 on T1.FL6=DL6.P join DL2 on DL6.FL2=DL2.P |
406 | select sum(T1.TL11),sum(T1.TL12), sum(T1.TL13), sum(T1.TL14), sum(T1.TL15), sum(T1.TL16), sum(T1.TL17), sum(T1.TL18), sum(T1.TL19), sum(DL6.L2), sum (DL2.L2) from T1 join DL6 on T1.FL6=DL6.P join DL2 on DL6.FL2=DL2.P |
|
407 | select count(*) from T2 join DL6 on T2.FL6=DL6.P join DL2 on DL6.FL2=DL2.P | |
408 | select sum(T2.L11), sum(T2.FL6), sum(to_number(T2.FD6)), sum(T2.TL1),sum(T2.TL2), sum(T2.TL1*T2.TL2), sum(DL6.L2), sum(DL2.L2) from T2 join DL6 on T2.FL6=DL6.P join DL2 on DL6.FL2=DL2.P |
|
Number Join | 409 | select count(*) from T1 join DD6 on T1.FD6=DD6.P |
410 | select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DD6.N5) from T1 join DD6 on T1.FD6=DD6.P |
|
411 | select sum(T2.TL2*DD6.L2) from T2 join DD6 on T1.FD6=DD6.P | |
412 | select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1*T2.TN1), sum(T2.TL1*T2.TN2), sum(T2.TL2*T2.TN1), sum(T2.TL2*T2.TN2), sum(DD6.N5) from T2 join DL6 on T2.FL6=DL6.P |
|
Number Multi-level Join | 413 | select count(*) from T1 join DD6 on T1.FD6=DD6.P join DD2 on DD6.FD2=DD2.P |
414 | select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DD6.N5), sum (DD2.N5) from T1 join DD6 on T1.FD6=DD6.P join DD2 on DD6.FD2=DD2.P |
|
415 | select count(*) from T2 join DD6 on T2.FD6=DD6.P join DD2 on DD6.FD2=DD2.P | |
416 | select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1*T2.TN1), sum(T2.TL1*T2.TN2), sum(T2.TL2*T2.TN1), sum(T2.TL2*T2.TN2), sum(DD6.N5), sum(DD2.N5) from T2 join DD6 on T2.FD6=DD6.P join DD2 on DD6.FD2=DD2.P |
|
Character Join | 417 | select count(*) from T1 join DC6 on T1.FC6=DC6.P |
418 | select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DC6.N5) from T1 join DC6 on T1.FC6=DC6.P |
|
419 | select count(*) from T2 join DC6 on T2.FC6=DC6.P | |
420 | select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1*T2.TN1), sum(T2.TL1*T2.TN2), sum(T2.TL2*T2.TN1), sum(T2.TL2*T2.TN2), sum(DC6.N5) from T2 join DC6 on T2.FC6=DC6.P |
|
Character Multi-level Join | 421 | select count(*) from T1 join DC6 on T1.FC6=DC6.P join DC2 on DC6.FC2=DC2.P |
422 | select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DC6.N5), sum (DC2.N5) from T1 join DC6 on T1.FC6=DC6.P join DC2 on DC6.FC2=DC2.P |
|
423 | select count(*) from T2 join DC6 on T2.FC6=DC6.P join DC2 on DC6.FC2=DC2.P | |
424 | select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1*T2.TN1), sum(T2.TL1*T2.TN2), sum(T2.TL2*T2.TN1), sum(T2.TL2*T2.TN2), sum(DC6.N5), sum(DC2.N5) from T2 join DC6 on T2.FC6=DC6.P join DC2 on DC6.FC2=DC2.P |
5.5. Use cases for performing small concurrent tasks
These use cases test the performance of carrying out small concurrent tasks that access various physical tables or files to avoid OS cache. Tests are done for three types operations – data scanning, data grouping and joins – using the above use cases 5.1, 5.2 and 5.4 where the data volume is set as small (See the section of Data Volume).
Scenarios of single task, dual-task concurrency, and four-task concurrency are tested respectively.
6. Use Cases for Testing
6.1. Small concurrent scans
This use case tests Oracle and esProc for scanning performance against small tables (files). The test is done with a multi-task concurrent access mode. Each task is accessing a different table (file). Oracle executes 16-threads parallel processing, while esProc executes 4 tasks in parallel, because it is proved through the test that those are the very number of parallel tasks for highest performance.
Testing statistics:
Note: In the following table, the time unit is second. 2 Concurrencies indicates 2 SQL statements are executed simultaneously; and the same with 4 concurrencies.
Statistics interpretation:
1. Data with blue background is the peak values obtained by this use case. Both Oracle and esProc reach peak performance with 1 concurrency, but Oracle’s performance is several times higher than esProc.
2. Performance degrades significantly both for Oracle and esProc when the number of concurrencies increases to 2. Oracle still has advantage but much of the gap has been closed.
3. esProc handles each task with equal performance, while Oracle is extremely unstable.
6.2. Big-data scan
This use case tests Oracle and esProc for their performance in scanning big tables (files). Scenarios of 1, 2 and 4 parallel tasks are tested respectively.
Testing statistics:
Category | Use Case Number | Oracle | esProc | |||
1 Parallel | 2 Parallels | 1 Parallel | 2 Parallels | 4 Parallels | ||
Full Table Counting | 101 | 280 | 329 | 243 | 235 | 272 |
102 | 311 | 383 | 271 | 214 | 246 | |
Integer sum | 103 | 288 | 328 | 243 | 241 | 268 |
104 | 324 | 385 | 289 | 210 | 197 | |
105 | 375 | 412 | 298 | 220 | 246 | |
106 | 928 | 1011 | 725 | 416 | 210 | |
Float Sum | 107 | 321 | 350 | 247 | 247 | 267 |
108 | 389 | 424 | 297 | 206 | 200 | |
109 | 399 | 436 | 303 | 221 | 248 | |
110 | 1623 | 1816 | 885 | 492 | 321 | |
Value Sum | 111 | 313 | 347 | 260 | 248 | 270 |
112 | 423 | 450 | 483 | 251 | 197 | |
113 | 428 | 463 | 457 | 243 | 248 | |
114 | 1332 | 1420 | 2148 | 1188 | 549 | |
Integer Filter | 115 | 277 | 322 | 256 | 251 | 266 |
116 | 318 | 345 | 302 | 209 | 199 | |
117 | 335 | 364 | 376 | 231 | 245 | |
118 | 676 | 709 | 675 | 362 | 188 | |
Number Filter | 119 | 267 | 331 | 262 | 234 | 268 |
120 | 350 | 377 | 450 | 260 | 201 | |
121 | 338 | 367 | 433 | 253 | 246 | |
122 | 830 | 909 | 1540 | 770 | 390 | |
Character Filter | 123 | 313 | 322 | 264 | 236 | 268 |
124 | 361 | 388 | 486 | 236 | 201 | |
125 | 339 | 370 | 422 | 249 | 246 | |
126 | 887 | 973 | 1642 | 840 | 419 | |
Date Filter | 127 | 274 | 320 | 265 | 230 | 267 |
128 | 273 | 321 | 363 | 209 | 219 | |
129 | 346 | 376 | 1413 | 711 | 378 | |
130 | 801 | 912 | 2482 | 1322 | 647 |
Statistics interpretation:
1. Oracle is at peak performance with 1 thread, but performance starts to degrade with 2 parallels. However, esProc’s performance with 4 parallel tasks is generally higher than with 1 task, sometimes even several times higher, excepting for a few instances.
2. esProc demonstrates an overall advantage over Oracle in this use case test. This is particularly obvious with narrow tables and computation-intensive cases, such as 106, 110 and 114. Use cases 129 and 119 are exceptions, however, where Oracle performs slightly better than esProc.
3. esProc is observed to be capable of increasing the computational performance several times with the increase of parallel number, as in the use cases 106, 110, 114, 118, 122, 126 and 130. These are all about narrow table access, and with more computations.
4. In some cases esProc’s performance also degrade, as in the use cases 103, 107, 111, 115, 119, 119, 123 and 127. These are all about wide table access with fewer computations.
6.3. Small concurrent grouping tasks
This use case tests Oracle and esProc for grouping computation performance against small tables (files). The test is done in a multi-task concurrent access mode. Each task accesses different tables (files).
Testing statistics:
Note: According to the data volume, esProc can choose to do pure in-memory computation or mixed-mode-of-in-memory-and-external-memory computation. This use case uses in-memory computation, while the grouping use case with large data volume in later part of this report is handled with a mixed mode. Pure in-memory computation has the risk of memory overflow. Thus use case uses 1, 2 and 4 concurrencies to avoid that. Data in green is for 1 concurrency, data in red for 2 concurrencies and data in black for 4 concurrencies. Green character reflects parallel 1, red character is for parallel 2, and black for parallel 4. Oracle forces a mixed mode computation to avoid memory overflow. Each task is performed with a fixed number of 16 parallel threads to achieve the best performance.
Statistics interpretation:
1. Oracle performs better than esProc in general, especially with 1 concurrency when the performance of Oracle can be several times better.
2. Oracle is extremely unstable in the performances of concurrent tasks, usually several times faster or slower between one and another. esProc’s performance, however, is generally stable.
3. All date grouping operations are two-level grouping, which, as we can see, Oracle performs better.
6.4. Big-data grouping
This use case tests Oracle and esProc for grouping computation performance against large tables (files). The test is done in a non-concurrent mode using parallel threads of 1, 2 and 4.
Testing statistics:
Category | Use Case Number | Oracle | esProc | |||
1 Parallel | 2 Parallels | 1 Parallel | 2 Parallels | 4 Parallels | ||
Integer Filter Grouping | 201 | 365 | 399 | 401 | 252 | 285 |
202 | 769 | 673 | 542 | 302 | 255 | |
203 | 930 | 1042 | 1018 | 567 | 342 | |
204 | 2660 | 1712 | 1867 | 1115 | 607 | |
Number Grouping | 205 | 368 | 408 | 387 | 252 | 286 |
206 | 1028 | 1247 | 1124 | 690 | 486 | |
207 | 930 | 1041 | 1340 | 759 | 371 | |
208 | 4058 | 2781 | 4352 | 2459 | 1493 | |
Character Grouping | 209 | 365 | 409 | 374 | 258 | 293 |
210 | 1012 | 1184 | 1017 | 651 | 488 | |
211 | 906 | 1032 | 1312 | 710 | 391 | |
212 | 3579 | 2427 | 4409 | 2488 | 1474 | |
Date Grouping | 213 | 304 | 332 | 278 | 259 | 271 |
214 | 454 | 478 | 509 | 273 | 203 | |
215 | 1042 | 1164 | 1318 | 756 | 426 | |
216 | 4730 | 3224 | 4910 | 2542 | 1481 |
Statistics interpretation:
1. Oracle is at peak performance with 1 parallel, and performance starts to decrease with 2. However, esProc’s performance with 4 parallel tasks is normally higher than with 1 task, even several times higher in some cases.
2. esProc’s performance is several times higher than Oracle in this use case test.
6.5. Small concurrent joins
This use case tests Oracle and esProc for performance of joining small (files). The test is done in a multi-task concurrent access mode. Each task accesses different tables (files). Parallel processing won’t be used for Oracle because no performance improvement can be achieved with it. esProc uses 4-thread parallel processing because more threads won’t increase the performance.
Testing statistics:
Statistics interpretation:
1. esProc is one time or several times faster than Oracle.
2. esProc’s performance variation among concurrent tasks is very small. Oracle’s performance variation remains big, but less than the variations in other use cases.
3. Parallel mode yields no performance gain for Oracle, while esProc reaches peak performance with 4 parallel tasks.
6.6 Big-data joins
This use case tests Oracle and esProc for their performance in joining large tables (files) . The test is done in a single-task, non-concurrent mode. Oracle reaches peak performance with 1 parallel task, while esProc reaches its own with 4 parallel tasks.
Testing statistics:
Category | Use Case Number | Oracle | esProc | ||
1 Parallel | 2 Parallels | 2 Parallels | 4 Parallels | ||
Integer Join | 401 | 347 | 383 | 248 | 271 |
402 | 446 | 469 | 210 | 204 | |
403 | 837 | 910 | 505 | 271 | |
404 | 1555 | 1709 | 810 | 402 | |
Integer Multi-level Join | 405 | 352 | 377 | 247 | 270 |
406 | 453 | 507 | 213 | 205 | |
407 | 829 | 945 | 504 | 257 | |
408 | 1587 | 1810 | 847 | 412 | |
Number Join | 409 | 353 | 384 | 245 | 275 |
410 | 509 | 549 | 324 | 199 | |
411 | 1077 | 1152 | 840 | 451 | |
412 | 1767 | 1926 | 1704/1792 | 903/889 | |
Number Multi-level Join | 413 | 354 | 385 | 243 | 275 |
414 | 521 | 557 | 331 | 198 | |
415 | 863 | 946 | 827 | 450 | |
416 | 1830 | 2024 | 1825 | 943 | |
Character Join | 417 | 354 | 385 | 246 | 268 |
418 | 509 | 554 | 322 | 198 | |
419 | 868 | 944 | 513 | 278 | |
420 | 1757 | 1932 | 1689 | 909 | |
Character Multi-level Join | 421 | 356 | 388 | 251 | 277 |
422 | 523 | 549 | 320 | 201 | |
423 | 863 | 957 | 513 | 285 | |
424 | 1827 | 2027 | 1776 | 905 |
Statistics interpretation:
1. esProc performs better than Oracle.
2. With multi-level join, performance degradation for both is little.
6.7. Big-data grouping with big result set
This use case tests Oracle and esProc for grouping computation performance against large tables (files), with the grouping results are too big to be all stored in memory. The test is done in a single-task non-concurrent mode using 1, 2 and 4 parallel threads respectively.
Testing statistics:
Category | Use Case Number | Oracle | esProc | ||||
1 Parallel | 2 Parallels | 4 Parallels | 1 Parallel | 2 Parallels | 4 Parallels | ||
Integer Filter Grouping | 301 | 585 | 570 | 667 | 649 | 419 | 385 |
302 | 1059 | 1010 | 1098 | 1067 | 706 | 656 |
Statistics interpretation:
1. esProc performs better than Oracle.
2. Oracle is at peak performance with 2 parallel tasks. esProc is at peak performance with 4 parallel tasks.
7. Summary Analysis
From all use case tests we can generally reach the following conclusions about the testing statistics:
1. Oracle normally performs better than esProc with small concurrent tasks, even several times higher in some cases. But, there’s the exception of joins where esProc’s performance surpasses Oracle’s.
2. For big-data, single-task computation, esProc’s performance is always higher than Oracle’s, several times higher sometimes.
3. Performance degrades significantly both for Oracle and esProc when concurrent number changes from 1 to 2. Oracle still has advantage but the difference between the two is small enough.
4. For small concurrent tasks, esProc shows stable performance with each task completed using approximately the same time, while Oracle demonstrates great variation among different tasks (normally the variation is up to ten times). However, there are exceptions. Oracle has a very small performance variation among different tasks during join operations, and parallel processing won’t improve the performance in such situations.
Causes of the above conclusions could be as follows:
1. esProc is JAVA-coded and interpreted for execution, and thus has a lower efficiency than Oracle, which is coded in native C language. The latter has a sophisticated multi-tier cache mechanism which enables a much higher performance in in-memory computations. As small concurrent computations can normally be cached, Oracle generally performs better than esProc in such use cases.
2. For big-data, single-task computation, esProc’s performance is always higher than Oracle. This is because a large volume of data cannot be cached, and Oracle’s main method for performance improvement is inapplicable. But in such a situation, esProc has an obvious advantage with its more efficient function-based algorithm.
3. With the concurrent computation with small data volume, overall performance of both degrades significantly with the parallel number changed from 1 to 2, and then to 4. This is a signal of incompetence of the processing power of CPU. Under this circumstance, Oracle still has the advantage, but only slightly. This is because Oracle doesn’t have the time to read the cache, and the cache mechanism almost stops working though the only advantage of native code remains. From this we can see that, basically, the performance-enhancing method for Oracle is effectively leveraging cache.
4. Since Oracle’s cache mechanism and execution plan cannot be managed manually, thus with small concurrent tasks Oracle has a significant performance variation, which sometimes is up to ten times.
5. Small-data joins are an exception. In those cases, esProc’s performance is one or several times higher than Oracle’s and the Oracle performance variation among concurrent tasks is small with parallel processing being invalid. Probably this is caused by the same reason that Oracle’s cache mechanism and execution plan cannot be managed manually, making a join automatically treated as an external-memory computation. The external-memory computation involves competitions for hard disk resources, which leads to the lack of performance improvement with parallel processing. Parallel processing is effective in enhancing esProc’s performance because esProc can load the small dimension tables into the memory, and thus leverage the multi-core CPU for computation. The small variation among concurrent tasks is possibly also because that Oracle automatically treats a join as external-memory computation, which does not require cache, and that resource is equally distributed among concurrent processes. With join computations, Oracle’s advantage of using cache disappears, but the performance enhancement with native code is limited, whereas esProc gains an advantage with efficient function-based algorithm and thus achieves a higher performance.
Final conclusion: Oracle performs better in handling computational tasks with small data or with fewer computations and simple algorithms. esProc performs better in handling big-data tasks, computation-intensive tasks, or those with complicated logics.
Pleast Click here to download the entire document.