Oracle & esProc Performance Comparison Testing on Single Machine

Blog 2010 0

1. Testing purposes

Test esProc and Oracle on the same hardware to compare their single machine performances in carrying out a single big task and small concurrent tasks.

2. Testing contents and methods

Data volume:

Small data volume: Single fact table around 10G. To prevent the testing result from being affected by the operating system cache, different fact tables are accessed with multiple concurrent requests.

Large data volume: Single fact table around 100G.

Algorithms:

Several typical SQL algorithms, including data scan, join, data grouping with small result set and with large result set. The use of these simple algorithms is to make the comparison more intuitively and easier to understand, not because esProc and SQL are identical to each other. In fact, the two of them focus on different things. esProc is good at procedure computation for tasks with complicated business logic, while SQL is suitable for computations of average complexity. A complicated algorithm that can be realized with different SQL execution plans makes execution paths uncontrollable and lacks comparability. So the testing doesn’t include one.

Use Cases:

Multiple use cases are designed for testing each algorithm according to table widths and data types.

Among them, the purpose of the use cases of data grouping with large result set is to test the performances of Oracle and esProc in handling the scenarios where the result sets of the group operations are too large to be entirely accommodated by the memory. So this report tests the parallel processing of a task with an amount of 100G data, not the concurrent processing of small tasks.

Note: An appendix of esProc Oracle Single Machine Performance Comparison Process  is added to this report. Please refer to it for details such as data structures, test code, and test reproduction.

3. Testing Environment

Testing Machine:Dell Power Edge T610

CPU:Intel Xeon E5620*2

RAM:20G

HDD:Raid5 1T

Operating System:CentOS 6.4

JDK:1.6

Oracle Version:11g

esProc Version:3.1

4. Data Description

The volume of the data is decided by the amount allowed by the exported text file.

For the single machine test, esProc uses the proprietary file format.

As the purpose is mainly to test the performance of big data computation and full table scan performance, no primary key or index is built for any table in the database. The primary key field described in data structure isn’t a physical key but a logical one defining that data repetition isn’t allowed.

4.1Data tables and table association

Fact Tables T1, T11, T12, and T13

They are wide data tables that each has 100 fields to simulate the real-world fact tables with a large number of fields. The four tables have identical structures. T11, T12, T13 are used for small concurrent accesses to different tables to avoid system cache.

Fact Tables T2, T21, T22, and T23

These are narrow data tables having identical structures and 11 fields for simulating the real-world facts tables with a small number of fields. T21, T22, T23 are used for small concurrent accesses to different tables to avoid system cache.

Fact tables are the main data source for this test, and are used in data scan, data grouping, and join operations. Tests are done for both large and small data volume, depending on the numbers of records inserted to the tables.

Dimension tables DL2, DL6, DD2, DD6, DC2, and DC6

The dimension tables are only used in the use cases of the join (and multi-join) operations where dimension tables and fact tables are joined or the dimension tables are joined between themselves. These tables have fixed data volume.

4.2. Data volume

The following statistics are the numbers of records in the testing tables, not the numbers of bytes that they occupy.

Table Big single task Small concurrent tasks
T1 150M 15M
T11/T12/T13 15M
T2 1180M 118M
T21/T22/T23 118M
DL2 100 100
DL6 1M 1M
DD2 100 100
DD6 1M 1M
DC2 100 100
DC6 1M 1M

Note: The big-task test on a single machine is intended to measure the performance when data requires an amount of memory that is far larger than the physical memory. The standard used by the testing is that a fact table is approximately 5 times larger than the physical memory and a dimension table is far smaller than the physical memory. The actual number of records in a table can be adjusted according to the machine configuration.

The test of small concurrent tasks on a single machine is intended to measure the performance when the amount of memory occupied by data is less than the physical memory, and the computation could be done completely in the memory. The standard used by the testing is that a fact table is approximately 50% of the size of the physical memory and a dimension table is far smaller than the physical memory. The actual number of records in a table can be adjusted according to the machine configuration. T11/T12/T13 and T21/T22/T23 are only used for the test of the concurrent computation of small tasks.

To make sure that esProc and Oracle are computing exactly the same data, the data generated in Oracle is exported to a file with the esProc-defined binary format, which is esProc’s data source for the test.

5. Use Cases Description

For better understanding, all the logics in the test are described in SQL.

During the test Oracle executes SQL statements directly, whereas functionally equivalent esProc programs are written to complete the same computation.

5.1. Use Case for big-data scan

This use case is about big-task single machine test. It’s designed to test the computational performance of full table scan for a large volume of data. Performances for full table counting, integer sum, float sum, value sum, integer filter, number filter, character filter, date filter are considered respectively.

Category Use Case Number SQL Description
Full Table Counting 101 select count(*) from T1
  102 select count(*) from T2
Integer sum 103 select sum(TL11) from T1
  104 select sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1
  105 select sum(TL2) from T2
  106 select sum(L11),sum(FL6),sum(to_number(FD6)),sum(TL1),sum(TL2), sum(TL1*TL2) from T2
Float Sum 107 select sum(TN21) from T1
  108 select sum(TN21), sum(TN22), sum(TN23), sum(TN24), sum(TN25), sum(TN26), sum(TN27), sum(TN28), sum(TN29) from T1
  109 select sum(TN3) from T2
  110 select sum(TN3*0.1), sum(TN3*0.5), sum(TN3), sum(TN3*1.5), sum(TN3*2), sum(TN3*10), sum(TN3*100), sum(TN3*1000), sum(TN3*10000), sum(TN3*100000) fromT2
Value Sum 111 select sum(TN11) from T1
  112 select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1
  113 select sum(TN2) from T2
  114 select sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2
Integer Filter 115 select count(*) from T1 where FL6 between 100000 and 599999
  116 select sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 where FL6 between 100000 and 599999
  117 select count(*) from T2 where FL6 between 100000 and 599999
  118 select sum(L11),sum(FL6),sum(to_number(FD6)),sum(TL1),sum(TL2), sum(TL1*TL2) from T2 where FL6 between 100000 and 599999
Number Filter 119 select count(*) from T1 where FD6 between ‘100000’ and ‘599999’
  120 select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where FD6 between ‘100000’ and‘599999’
  121 select count(*) from T2 where FD6 between ‘100000’ and ‘599999’
  122 select sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 where FD6 between ‘100000’ and ‘599999’
Character Filter 123 select count(*) from T1 where FC6 between ‘A00000’ and ‘E99999’
  124 select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where FC6 between ‘A00000’ and‘E99999’
  125 select count(*) from T2 where FC6 between ‘A00000’ and ‘E99999’
  126 select sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 where FC6 between ‘A00000’ and ‘E99999’
Date Filter 127 select count(*) from T1 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’)  and to_date(‘1980-01-01’,’yyyy-mm-dd’) + 5000
  128 select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’)  and to_date (‘1980-01-01’,’yyyy-mm-dd’) + 5000
  129 select count(*) from T2 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’)  and to_date(‘1980-01-01’,’yyyy-mm-dd’) + 5000
  130 select sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’)  and to_date (‘1980-01-01’, ’yyyy-mm-dd’) + 5000

5.2. Use Case for big-data grouping with small result set

This use case is about big-task single machine test. It’s designed to test the performance of grouping computation involving large data volume, where the number of records in each group is small, smaller than the physical memory. Performances for integer grouping, number grouping, character grouping and date grouping are considered respectively.

Category Use Case Number SQL Description
Integer Grouping 201 select FL6,count(*) from T1 group by FL6
  202 select FL6, sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 group by FL6
  203 select FL6,count(*) from T2 group by FL6
  204 select FL6, sum(L11),sum(FL6),sum(to_number(FD6)), sum(TL1), sum(TL2), sum(TL1*TL2) from T2 group by FL6
Number Grouping 205 select FD6, count(*) from T1 group by FD6
  206 select FD6, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by FD6
  207 select FD6, count(*) from T2 group by FD6
  208 select FD6, sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 group by FD6
Character Grouping 209 select FC6, count(*) from T1 group by FC6
  210 select FC6, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by FC6
  211 select FC6, count(*) from T2 group by FC6
  212 select FC6, sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 group by FC6
Date Grouping 213 select R2,R4, count(*) from T1 group by R2,R4
  214 select R2,R4, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by R2,R4
  215 select R2,R4,count(*) from T2 group by R2,R4
  216 select R2,R4, sum(TN1), sum(TN2), sum(TL1*TN1), sum(TL1*TN2), sum(TL2*TN1), sum(TL2*TN2) from T2 group by R2,R4

5.3. Use Case for big-data grouping with large result set

This use case is about big-task single machine test. It’s designed to test the performance of grouping computation involving large data volume, where the number of records in each group is large, larger than the physical memory, and thus the computation cannot be entirely completed within the memory.

Category Use Case Number SQL Description
Integer Grouping 301 select mod(FL11,10000000), count(*) from T1 group by mod(FL11,10000000)
  302 select mod(FL11,10000000), sum(TL11),sum(TL12),sum(TL13), sum(TL14), sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 group by mod(FL11,10000000)

5.4. Use Case for big-data joins

This use case is about big-task single machine test. It’s designed to test the performance of join computations involving large data volume. Performances for integer, number, character joins, as well as single-level and multi-level joins, are considered respectively.

Category Use Case Number SQL Description
Integer Join 401 select count(*) from T1 join DL6 on T1.FL6=DL6.P
  402 select sum(T1.TL11),sum(T1.TL12), sum(T1.TL13), sum(T1.TL14), sum(T1.TL15), sum(T1.TL16), sum(T1.TL17), sum(T1.TL18), sum(T1.TL19), sum(DL6.L2) from T1
join DL6 on T1.FL6=DL6.P
  403 select count(*) from T2 join DL6 on T2.FL6=DL6.P
  404 select sum(T2.L11), sum(T2.FL6), sum(to_number(T2.FD6)), sum(T2.TL1),sum(T2.TL2), sum(T2.TL1*T2.TL2), sum(DL6.L2) from T2 join DL6 on T2.FL6=DL6.P
Integer Multi-level Join 405 select count(*) from T1 join DL6 on T1.FL6=DL6.P join DL2 on DL6.FL2=DL2.P
  406 select sum(T1.TL11),sum(T1.TL12), sum(T1.TL13), sum(T1.TL14), sum(T1.TL15), sum(T1.TL16), sum(T1.TL17), sum(T1.TL18), sum(T1.TL19), sum(DL6.L2), sum
(DL2.L2)  from T1 join DL6 on T1.FL6=DL6.P join DL2 on DL6.FL2=DL2.P
  407 select count(*) from T2 join DL6 on T2.FL6=DL6.P join DL2 on DL6.FL2=DL2.P
  408 select sum(T2.L11), sum(T2.FL6), sum(to_number(T2.FD6)), sum(T2.TL1),sum(T2.TL2), sum(T2.TL1*T2.TL2), sum(DL6.L2), sum(DL2.L2) from T2 join DL6 on
T2.FL6=DL6.P join DL2 on DL6.FL2=DL2.P
Number Join 409 select count(*) from T1 join DD6 on T1.FD6=DD6.P
  410 select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DD6.N5) from T1
join DD6 on T1.FD6=DD6.P
  411 select sum(T2.TL2*DD6.L2) from T2 join DD6 on T1.FD6=DD6.P
  412 select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1*T2.TN1), sum(T2.TL1*T2.TN2), sum(T2.TL2*T2.TN1), sum(T2.TL2*T2.TN2), sum(DD6.N5) from T2 join DL6 on
T2.FL6=DL6.P
Number Multi-level Join 413 select count(*) from T1 join DD6 on T1.FD6=DD6.P join DD2 on DD6.FD2=DD2.P
  414 select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DD6.N5), sum
(DD2.N5) from T1 join DD6 on T1.FD6=DD6.P join DD2 on DD6.FD2=DD2.P
  415 select count(*) from T2 join DD6 on T2.FD6=DD6.P join DD2 on DD6.FD2=DD2.P
  416 select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1*T2.TN1), sum(T2.TL1*T2.TN2), sum(T2.TL2*T2.TN1), sum(T2.TL2*T2.TN2), sum(DD6.N5), sum(DD2.N5) from T2 join
DD6 on T2.FD6=DD6.P join DD2 on DD6.FD2=DD2.P
Character Join 417 select count(*) from T1 join DC6 on T1.FC6=DC6.P
  418 select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DC6.N5) from T1
join DC6 on T1.FC6=DC6.P
  419 select count(*) from T2 join DC6 on T2.FC6=DC6.P
  420 select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1*T2.TN1), sum(T2.TL1*T2.TN2), sum(T2.TL2*T2.TN1), sum(T2.TL2*T2.TN2), sum(DC6.N5) from T2 join DC6 on
T2.FC6=DC6.P
Character Multi-level Join 421 select count(*) from T1 join DC6 on T1.FC6=DC6.P join DC2 on DC6.FC2=DC2.P
  422 select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DC6.N5), sum
(DC2.N5) from T1 join DC6 on T1.FC6=DC6.P join DC2 on DC6.FC2=DC2.P
  423 select count(*) from T2 join DC6 on T2.FC6=DC6.P join DC2 on DC6.FC2=DC2.P
  424 select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1*T2.TN1), sum(T2.TL1*T2.TN2), sum(T2.TL2*T2.TN1), sum(T2.TL2*T2.TN2), sum(DC6.N5), sum(DC2.N5) from T2 join
DC6 on T2.FC6=DC6.P join DC2 on DC6.FC2=DC2.P

5.5. Use cases for performing small concurrent tasks

These use cases test the performance of carrying out small concurrent tasks that access various physical tables or files to avoid OS cache. Tests are done for three types operations – data scanning, data grouping and joins – using the above use cases 5.1, 5.2 and 5.4 where the data volume is set as small (See the section of Data Volume).

Scenarios of single task, dual-task concurrency, and four-task concurrency are tested respectively.

6. Use Cases for Testing

6.1. Small concurrent scans

This use case tests Oracle and esProc for scanning performance against small tables (files). The test is done with a multi-task concurrent access mode. Each task is accessing a different table (file). Oracle executes 16-threads parallel processing, while esProc executes 4 tasks in parallel, because it is proved through the test that those are the very number of parallel tasks for highest performance.

Testing statistics:

Note: In the following table, the time unit is second. 2 Concurrencies indicates 2 SQL statements are executed simultaneously; and the same with 4 concurrencies.
esProc_oracle_test1
Statistics interpretation:

1. Data with blue background is the peak values obtained by this use case. Both Oracle and esProc reach peak performance with 1 concurrency, but Oracle’s performance is several times higher than esProc.

2. Performance degrades significantly both for Oracle and esProc when the number of concurrencies increases to 2. Oracle still has advantage but much of the gap has been closed.

3. esProc handles each task with equal performance, while Oracle is extremely unstable.

6.2. Big-data scan

This use case tests Oracle and esProc for their performance in scanning big tables (files). Scenarios of 1, 2 and 4 parallel tasks are tested respectively.

Testing statistics:

Category Use Case Number Oracle esProc
1 Parallel 2 Parallels 1 Parallel 2 Parallels 4 Parallels
Full Table Counting 101 280 329 243 235 272
  102 311 383 271 214 246
Integer sum 103 288 328 243 241 268
  104 324 385 289 210 197
  105 375 412 298 220 246
  106 928 1011 725 416 210
Float Sum 107 321 350 247 247 267
  108 389 424 297 206 200
  109 399 436 303 221 248
  110 1623 1816 885 492 321
Value Sum 111 313 347 260 248 270
  112 423 450 483 251 197
  113 428 463 457 243 248
  114 1332 1420 2148 1188 549
Integer Filter 115 277 322 256 251 266
  116 318 345 302 209 199
  117 335 364 376 231 245
  118 676 709 675 362 188
Number Filter 119 267 331 262 234 268
  120 350 377 450 260 201
  121 338 367 433 253 246
  122 830 909 1540 770 390
Character Filter 123 313 322 264 236 268
  124 361 388 486 236 201
  125 339 370 422 249 246
  126 887 973 1642 840 419
Date Filter 127 274 320 265 230 267
  128 273 321 363 209 219
  129 346 376 1413 711 378
  130 801 912 2482 1322 647

Statistics interpretation:

1. Oracle is at peak performance with 1 thread, but performance starts to degrade with 2 parallels. However, esProc’s performance with 4 parallel tasks is generally higher than with 1 task, sometimes even several times higher, excepting for a few instances.

2. esProc demonstrates an overall advantage over Oracle in this use case test. This is particularly obvious with narrow tables and computation-intensive cases, such as 106, 110 and 114. Use cases 129 and 119 are exceptions, however, where Oracle performs slightly better than esProc.

3. esProc is observed to be capable of increasing the computational performance several times with the increase of parallel number, as in the use cases 106, 110, 114, 118, 122, 126 and 130. These are all about narrow table access, and with more computations.

4. In some cases esProc’s performance also degrade, as in the use cases 103, 107, 111, 115, 119, 119, 123 and 127. These are all about wide table access with fewer computations.

6.3. Small concurrent grouping tasks

This use case tests Oracle and esProc for grouping computation performance against small tables (files). The test is done in a multi-task concurrent access mode. Each task accesses different tables (files).

Testing statistics:

Note: According to the data volume, esProc can choose to do pure in-memory computation or mixed-mode-of-in-memory-and-external-memory computation. This use case uses in-memory computation, while the grouping use case with large data volume in later part of this report is handled with a mixed mode. Pure in-memory computation has the risk of memory overflow. Thus use case uses 1, 2 and 4 concurrencies to avoid that. Data in green is for 1 concurrency, data in red for 2 concurrencies and data in black for 4 concurrencies. Green character reflects parallel 1, red character is for parallel 2, and black for parallel 4. Oracle forces a mixed mode computation to avoid memory overflow. Each task is performed with a fixed number of 16 parallel threads to achieve the best performance.
esProc_oracle_test2
Statistics interpretation:

1. Oracle performs better than esProc in general, especially with 1 concurrency when the performance of Oracle can be several times better.

2. Oracle is extremely unstable in the performances of concurrent tasks, usually several times faster or slower between one and another. esProc’s performance, however, is generally stable.

3. All date grouping operations are two-level grouping, which, as we can see, Oracle performs better.

6.4. Big-data grouping

This use case tests Oracle and esProc for grouping computation performance against large tables (files). The test is done in a non-concurrent mode using parallel threads of 1, 2 and 4.

Testing statistics:

Category Use Case Number Oracle esProc
1 Parallel 2 Parallels 1 Parallel 2 Parallels 4 Parallels
Integer Filter Grouping 201 365 399 401 252 285
  202 769 673 542 302 255
  203 930 1042 1018 567 342
  204 2660 1712 1867 1115 607
Number Grouping 205 368 408 387 252 286
  206 1028 1247 1124 690 486
  207 930 1041 1340 759 371
  208 4058 2781 4352 2459 1493
Character Grouping 209 365 409 374 258 293
  210 1012 1184 1017 651 488
  211 906 1032 1312 710 391
  212 3579 2427 4409 2488 1474
Date Grouping 213 304 332 278 259 271
  214 454 478 509 273 203
  215 1042 1164 1318 756 426
  216 4730 3224 4910 2542 1481

Statistics interpretation:

1. Oracle is at peak performance with 1 parallel, and performance starts to decrease with 2. However, esProc’s performance with 4 parallel tasks is normally higher than with 1 task, even several times higher in some cases.

2. esProc’s performance is several times higher than Oracle in this use case test.

6.5. Small concurrent joins

This use case tests Oracle and esProc for performance of joining small (files). The test is done in a multi-task concurrent access mode. Each task accesses different tables (files). Parallel processing won’t be used for Oracle because no performance improvement can be achieved with it. esProc uses 4-thread parallel processing because more threads won’t increase the performance.

Testing statistics:
esProc_oracle_test3
Statistics interpretation:

1. esProc is one time or several times faster than Oracle.

2. esProc’s performance variation among concurrent tasks is very small. Oracle’s performance variation remains big, but less than the variations in other use cases.

3. Parallel mode yields no performance gain for Oracle, while esProc reaches peak performance with 4 parallel tasks.

6.6 Big-data joins

This use case tests Oracle and esProc for their performance in joining large tables (files) . The test is done in a single-task, non-concurrent mode. Oracle reaches peak performance with 1 parallel task, while esProc reaches its own with 4 parallel tasks.

Testing statistics:

Category Use Case Number Oracle esProc
1 Parallel 2 Parallels 2 Parallels 4 Parallels
Integer Join 401 347 383 248 271
  402 446 469 210 204
  403 837 910 505 271
  404 1555 1709 810 402
Integer Multi-level Join 405 352 377 247 270
  406 453 507 213 205
  407 829 945 504 257
  408 1587 1810 847 412
Number Join 409 353 384 245 275
  410 509 549 324 199
  411 1077 1152 840 451
  412 1767 1926 1704/1792 903/889
Number Multi-level Join 413 354 385 243 275
  414 521 557 331 198
  415 863 946 827 450
  416 1830 2024 1825 943
Character Join 417 354 385 246 268
  418 509 554 322 198
  419 868 944 513 278
  420 1757 1932 1689 909
Character Multi-level Join 421 356 388 251 277
  422 523 549 320 201
  423 863 957 513 285
  424 1827 2027 1776 905

Statistics interpretation:

1. esProc performs better than Oracle.

2. With multi-level join, performance degradation for both is little.

6.7. Big-data grouping with big result set

This use case tests Oracle and esProc for grouping computation performance against large tables (files), with the grouping results are too big to be all stored in memory. The test is done in a single-task non-concurrent mode using 1, 2 and 4 parallel threads respectively.

Testing statistics:

 Category  Use Case Number Oracle esProc
1 Parallel 2 Parallels 4 Parallels 1 Parallel 2 Parallels 4 Parallels
Integer Filter Grouping 301 585 570 667 649 419 385
  302 1059 1010 1098 1067 706 656

Statistics interpretation:

1. esProc performs better than Oracle.

2. Oracle is at peak performance with 2 parallel tasks. esProc is at peak performance with 4 parallel tasks.

7. Summary Analysis

From all use case tests we can generally reach the following conclusions about the testing statistics:

1. Oracle normally performs better than esProc with small concurrent tasks, even several times higher in some cases. But, there’s the exception of joins where esProc’s performance surpasses Oracle’s.

2. For big-data, single-task computation, esProc’s performance is always higher than Oracle’s, several times higher sometimes.

3. Performance degrades significantly both for Oracle and esProc when concurrent number changes from 1 to 2. Oracle still has advantage but the difference between the two is small enough.

4. For small concurrent tasks, esProc shows stable performance with each task completed using approximately the same time, while Oracle demonstrates great variation among different tasks (normally the variation is up to ten times). However, there are exceptions. Oracle has a very small performance variation among different tasks during join operations, and parallel processing won’t improve the performance in such situations.

Causes of the above conclusions could be as follows:

1. esProc is JAVA-coded and interpreted for execution, and thus has a lower efficiency than Oracle,  which is coded in native C language. The latter has a sophisticated multi-tier cache mechanism which enables a much higher performance in in-memory computations. As small concurrent computations can normally be cached, Oracle generally performs better than esProc in such use cases.

2. For big-data, single-task computation, esProc’s performance is always higher than Oracle. This is because a large volume of data cannot be cached, and Oracle’s main method for performance improvement is inapplicable. But in such a situation, esProc has an obvious advantage with its more efficient function-based algorithm.

3. With the concurrent computation with small data volume, overall performance of both degrades significantly with the parallel number changed from 1 to 2, and then to 4. This is a signal of incompetence of the processing power of CPU. Under this circumstance, Oracle still has the advantage, but only slightly. This is because Oracle doesn’t have the time to read the cache, and the cache mechanism almost stops working though the only advantage of native code remains. From this we can see that, basically, the performance-enhancing method for Oracle is effectively leveraging cache.

4. Since Oracle’s cache mechanism and execution plan cannot be managed manually, thus with small concurrent tasks Oracle has a significant performance variation, which sometimes is up to ten times.

5. Small-data joins are an exception. In those cases, esProc’s performance is one or several times higher than Oracle’s and the Oracle performance variation among concurrent tasks is small with parallel processing being invalid. Probably this is caused by the same reason that Oracle’s cache mechanism and execution plan cannot be managed manually, making a join automatically treated as an external-memory computation. The external-memory computation involves competitions for hard disk resources, which leads to the lack of performance improvement with parallel processing. Parallel processing is effective in enhancing esProc’s performance because esProc can load the small dimension tables into the memory, and thus leverage the multi-core CPU for computation. The small variation among concurrent tasks is possibly also because that Oracle automatically treats a join as external-memory computation, which does not require cache, and that resource is equally distributed among concurrent processes. With join computations, Oracle’s advantage of using cache disappears, but the performance enhancement with native code is limited, whereas esProc gains an advantage with efficient function-based algorithm and thus achieves a higher performance.

Final conclusion: Oracle performs better in handling computational tasks with small data or with fewer computations and simple algorithms. esProc performs better in handling big-data tasks, computation-intensive tasks, or those with complicated logics.

Pleast Click here to download the entire document.

FAVOR (1)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website