Oracle & esProc Performance Comparison Testing on Single Machine

1. Testing purposes

Test esProc and Oracle on the same hardware to compare their single machine performances in carrying out a single big task and small concurrent tasks.

2. Testing contents and methods

Data volume:

Small data volume: Single fact table around 10G. To prevent the testing result from being affected by the operating system cache, different fact tables are accessed with multiple concurrent requests.

Large data volume: Single fact table around 100G.

Algorithms:

Several typical SQL algorithms, including data scan, join, data grouping with small result set and with large result set. The use of these simple algorithms is to make the comparison more intuitively and easier to understand, not because esProc and SQL are identical to each other. In fact, the two of them focus on different things. esProc is good at procedure computation for tasks with complicated business logic, while SQL is suitable for computations of average complexity. A complicated algorithm that can be realized with different SQL execution plans makes execution paths uncontrollable and lacks comparability. So the testing doesn’t include one.

Use Cases:

Multiple use cases are designed for testing each algorithm according to table widths and data types.

Among them, the purpose of the use cases of data grouping with large result set is to test the performances of Oracle and esProc in handling the scenarios where the result sets of the group operations are too large to be entirely accommodated by the memory. So this report tests the parallel processing of a task with an amount of 100G data, not the concurrent processing of small tasks.

Note: An appendix of esProc Oracle Single Machine Performance Comparison Process is added to this report. Please refer to it for details such as data structures, test code, and test reproduction.

3. Testing Environment

Testing Machine：Dell Power Edge T610

CPU：Intel Xeon E5620*2

RAM：20G

HDD：Raid5 1T

Operating System：CentOS 6.4

JDK：1.6

Oracle Version：11g

esProc Version:3.1

4. Data Description

The volume of the data is decided by the amount allowed by the exported text file.

For the single machine test, esProc uses the proprietary file format.

As the purpose is mainly to test the performance of big data computation and full table scan performance, no primary key or index is built for any table in the database. The primary key field described in data structure isn’t a physical key but a logical one defining that data repetition isn’t allowed.

4.1Data tables and table association

Fact Tables T1, T11, T12, and T13

They are wide data tables that each has 100 fields to simulate the real-world fact tables with a large number of fields. The four tables have identical structures. T11, T12, T13 are used for small concurrent accesses to different tables to avoid system cache.

Fact Tables T2, T21, T22, and T23

These are narrow data tables having identical structures and 11 fields for simulating the real-world facts tables with a small number of fields. T21, T22, T23 are used for small concurrent accesses to different tables to avoid system cache.

Fact tables are the main data source for this test, and are used in data scan, data grouping, and join operations. Tests are done for both large and small data volume, depending on the numbers of records inserted to the tables.

Dimension tables DL2, DL6, DD2, DD6, DC2, and DC6

The dimension tables are only used in the use cases of the join (and multi-join) operations where dimension tables and fact tables are joined or the dimension tables are joined between themselves. These tables have fixed data volume.

4.2. Data volume

The following statistics are the numbers of records in the testing tables, not the numbers of bytes that they occupy.

Table	Big single task	Small concurrent tasks
T1	150M	15M
T11/T12/T13	—	15M
T2	1180M	118M
T21/T22/T23	—	118M
DL2	100	100
DL6	1M	1M
DD2	100	100
DD6	1M	1M
DC2	100	100
DC6	1M	1M

Note: The big-task test on a single machine is intended to measure the performance when data requires an amount of memory that is far larger than the physical memory. The standard used by the testing is that a fact table is approximately 5 times larger than the physical memory and a dimension table is far smaller than the physical memory. The actual number of records in a table can be adjusted according to the machine configuration.

The test of small concurrent tasks on a single machine is intended to measure the performance when the amount of memory occupied by data is less than the physical memory, and the computation could be done completely in the memory. The standard used by the testing is that a fact table is approximately 50% of the size of the physical memory and a dimension table is far smaller than the physical memory. The actual number of records in a table can be adjusted according to the machine configuration. T11/T12/T13 and T21/T22/T23 are only used for the test of the concurrent computation of small tasks.

To make sure that esProc and Oracle are computing exactly the same data, the data generated in Oracle is exported to a file with the esProc-defined binary format, which is esProc’s data source for the test.

5. Use Cases Description

For better understanding, all the logics in the test are described in SQL.

During the test Oracle executes SQL statements directly, whereas functionally equivalent esProc programs are written to complete the same computation.

5.1. Use Case for big-data scan

This use case is about big-task single machine test. It’s designed to test the computational performance of full table scan for a large volume of data. Performances for full table counting, integer sum, float sum, value sum, integer filter, number filter, character filter, date filter are considered respectively.

Category	Use Case Number	SQL Description
Full Table Counting	101	select count(*) from T1
	102	select count(*) from T2
Integer sum	103	select sum(TL11) from T1
	104	select sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1
	105	select sum(TL2) from T2
	106	select sum(L11),sum(FL6),sum(to_number(FD6)),sum(TL1),sum(TL2), sum(TL1*TL2) from T2
Float Sum	107	select sum(TN21) from T1
	108	select sum(TN21), sum(TN22), sum(TN23), sum(TN24), sum(TN25), sum(TN26), sum(TN27), sum(TN28), sum(TN29) from T1
	109	select sum(TN3) from T2
	110	select sum(TN30.1), sum(TN30.5), sum(TN3), sum(TN31.5), sum(TN32), sum(TN310), sum(TN3100), sum(TN31000), sum(TN310000), sum(TN3*100000) fromT2
Value Sum	111	select sum(TN11) from T1
	112	select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1
	113	select sum(TN2) from T2
	114	select sum(TN1), sum(TN2), sum(TL1TN1), sum(TL1TN2), sum(TL2TN1), sum(TL2TN2) from T2
Integer Filter	115	select count(*) from T1 where FL6 between 100000 and 599999
	116	select sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 where FL6 between 100000 and 599999
	117	select count(*) from T2 where FL6 between 100000 and 599999
	118	select sum(L11),sum(FL6),sum(to_number(FD6)),sum(TL1),sum(TL2), sum(TL1*TL2) from T2 where FL6 between 100000 and 599999
Number Filter	119	select count(*) from T1 where FD6 between ‘100000’ and ‘599999’
	120	select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where FD6 between ‘100000’ and‘599999’
	121	select count(*) from T2 where FD6 between ‘100000’ and ‘599999’
	122	select sum(TN1), sum(TN2), sum(TL1TN1), sum(TL1TN2), sum(TL2TN1), sum(TL2TN2) from T2 where FD6 between ‘100000’ and ‘599999’
Character Filter	123	select count(*) from T1 where FC6 between ‘A00000’ and ‘E99999’
	124	select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where FC6 between ‘A00000’ and‘E99999’
	125	select count(*) from T2 where FC6 between ‘A00000’ and ‘E99999’
	126	select sum(TN1), sum(TN2), sum(TL1TN1), sum(TL1TN2), sum(TL2TN1), sum(TL2TN2) from T2 where FC6 between ‘A00000’ and ‘E99999’
Date Filter	127	select count(*) from T1 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’) and to_date(‘1980-01-01’,’yyyy-mm-dd’) + 5000
	128	select sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’) and to_date (‘1980-01-01’,’yyyy-mm-dd’) + 5000
	129	select count(*) from T2 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’) and to_date(‘1980-01-01’,’yyyy-mm-dd’) + 5000
	130	select sum(TN1), sum(TN2), sum(TL1TN1), sum(TL1TN2), sum(TL2TN1), sum(TL2TN2) from T2 where R4 between to_date(‘1980-01-01’, ‘yyyy-mm-dd’) and to_date (‘1980-01-01’, ’yyyy-mm-dd’) + 5000

5.2. Use Case for big-data grouping with small result set

This use case is about big-task single machine test. It’s designed to test the performance of grouping computation involving large data volume, where the number of records in each group is small, smaller than the physical memory. Performances for integer grouping, number grouping, character grouping and date grouping are considered respectively.

Category	Use Case Number	SQL Description
Integer Grouping	201	select FL6,count(*) from T1 group by FL6
	202	select FL6, sum(TL11),sum(TL12),sum(TL13),sum(TL14),sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 group by FL6
	203	select FL6,count(*) from T2 group by FL6
	204	select FL6, sum(L11),sum(FL6),sum(to_number(FD6)), sum(TL1), sum(TL2), sum(TL1*TL2) from T2 group by FL6
Number Grouping	205	select FD6, count(*) from T1 group by FD6
	206	select FD6, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by FD6
	207	select FD6, count(*) from T2 group by FD6
	208	select FD6, sum(TN1), sum(TN2), sum(TL1TN1), sum(TL1TN2), sum(TL2TN1), sum(TL2TN2) from T2 group by FD6
Character Grouping	209	select FC6, count(*) from T1 group by FC6
	210	select FC6, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by FC6
	211	select FC6, count(*) from T2 group by FC6
	212	select FC6, sum(TN1), sum(TN2), sum(TL1TN1), sum(TL1TN2), sum(TL2TN1), sum(TL2TN2) from T2 group by FC6
Date Grouping	213	select R2,R4, count(*) from T1 group by R2,R4
	214	select R2,R4, sum(TN11), sum(TN12), sum(TN13), sum(TN14), sum(TN15), sum(TN16), sum(TN17), sum(TN18), sum(TN19) from T1 group by R2,R4
	215	select R2,R4,count(*) from T2 group by R2,R4
	216	select R2,R4, sum(TN1), sum(TN2), sum(TL1TN1), sum(TL1TN2), sum(TL2TN1), sum(TL2TN2) from T2 group by R2,R4

5.3. Use Case for big-data grouping with large result set

This use case is about big-task single machine test. It’s designed to test the performance of grouping computation involving large data volume, where the number of records in each group is large, larger than the physical memory, and thus the computation cannot be entirely completed within the memory.

Category	Use Case Number	SQL Description
Integer Grouping	301	select mod(FL11,10000000), count(*) from T1 group by mod(FL11,10000000)
	302	select mod(FL11,10000000), sum(TL11),sum(TL12),sum(TL13), sum(TL14), sum(TL15), sum(TL16), sum(TL17), sum(TL18), sum(TL19) from T1 group by mod(FL11,10000000)

5.4. Use Case for big-data joins

This use case is about big-task single machine test. It’s designed to test the performance of join computations involving large data volume. Performances for integer, number, character joins, as well as single-level and multi-level joins, are considered respectively.

Category	Use Case Number	SQL Description
Integer Join	401	select count(*) from T1 join DL6 on T1.FL6=DL6.P
	402	select sum(T1.TL11),sum(T1.TL12), sum(T1.TL13), sum(T1.TL14), sum(T1.TL15), sum(T1.TL16), sum(T1.TL17), sum(T1.TL18), sum(T1.TL19), sum(DL6.L2) from T1 join DL6 on T1.FL6=DL6.P
	403	select count(*) from T2 join DL6 on T2.FL6=DL6.P
	404	select sum(T2.L11), sum(T2.FL6), sum(to_number(T2.FD6)), sum(T2.TL1),sum(T2.TL2), sum(T2.TL1*T2.TL2), sum(DL6.L2) from T2 join DL6 on T2.FL6=DL6.P
Integer Multi-level Join	405	select count(*) from T1 join DL6 on T1.FL6=DL6.P join DL2 on DL6.FL2=DL2.P
	406	select sum(T1.TL11),sum(T1.TL12), sum(T1.TL13), sum(T1.TL14), sum(T1.TL15), sum(T1.TL16), sum(T1.TL17), sum(T1.TL18), sum(T1.TL19), sum(DL6.L2), sum (DL2.L2) from T1 join DL6 on T1.FL6=DL6.P join DL2 on DL6.FL2=DL2.P
	407	select count(*) from T2 join DL6 on T2.FL6=DL6.P join DL2 on DL6.FL2=DL2.P
	408	select sum(T2.L11), sum(T2.FL6), sum(to_number(T2.FD6)), sum(T2.TL1),sum(T2.TL2), sum(T2.TL1*T2.TL2), sum(DL6.L2), sum(DL2.L2) from T2 join DL6 on T2.FL6=DL6.P join DL2 on DL6.FL2=DL2.P
Number Join	409	select count(*) from T1 join DD6 on T1.FD6=DD6.P
	410	select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DD6.N5) from T1 join DD6 on T1.FD6=DD6.P
	411	select sum(T2.TL2*DD6.L2) from T2 join DD6 on T1.FD6=DD6.P
	412	select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1T2.TN1), sum(T2.TL1T2.TN2), sum(T2.TL2T2.TN1), sum(T2.TL2T2.TN2), sum(DD6.N5) from T2 join DL6 on T2.FL6=DL6.P
Number Multi-level Join	413	select count(*) from T1 join DD6 on T1.FD6=DD6.P join DD2 on DD6.FD2=DD2.P
	414	select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DD6.N5), sum (DD2.N5) from T1 join DD6 on T1.FD6=DD6.P join DD2 on DD6.FD2=DD2.P
	415	select count(*) from T2 join DD6 on T2.FD6=DD6.P join DD2 on DD6.FD2=DD2.P
	416	select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1T2.TN1), sum(T2.TL1T2.TN2), sum(T2.TL2T2.TN1), sum(T2.TL2T2.TN2), sum(DD6.N5), sum(DD2.N5) from T2 join DD6 on T2.FD6=DD6.P join DD2 on DD6.FD2=DD2.P
Character Join	417	select count(*) from T1 join DC6 on T1.FC6=DC6.P
	418	select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DC6.N5) from T1 join DC6 on T1.FC6=DC6.P
	419	select count(*) from T2 join DC6 on T2.FC6=DC6.P
	420	select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1T2.TN1), sum(T2.TL1T2.TN2), sum(T2.TL2T2.TN1), sum(T2.TL2T2.TN2), sum(DC6.N5) from T2 join DC6 on T2.FC6=DC6.P
Character Multi-level Join	421	select count(*) from T1 join DC6 on T1.FC6=DC6.P join DC2 on DC6.FC2=DC2.P
	422	select sum(T1.N11), sum(T1.TN12), sum(T1.TN13), sum(T1.TN14), sum(T1.TN15), sum(T1.TN16), sum(T1.TN17), sum(T1.TN18), sum(T1.TN19), sum(DC6.N5), sum (DC2.N5) from T1 join DC6 on T1.FC6=DC6.P join DC2 on DC6.FC2=DC2.P
	423	select count(*) from T2 join DC6 on T2.FC6=DC6.P join DC2 on DC6.FC2=DC2.P
	424	select sum(T2.TN1), sum(T2.TN2), sum(T2.TL1T2.TN1), sum(T2.TL1T2.TN2), sum(T2.TL2T2.TN1), sum(T2.TL2T2.TN2), sum(DC6.N5), sum(DC2.N5) from T2 join DC6 on T2.FC6=DC6.P join DC2 on DC6.FC2=DC2.P

5.5. Use cases for performing small concurrent tasks

These use cases test the performance of carrying out small concurrent tasks that access various physical tables or files to avoid OS cache. Tests are done for three types operations – data scanning, data grouping and joins – using the above use cases 5.1, 5.2 and 5.4 where the data volume is set as small (See the section of Data Volume).

Scenarios of single task, dual-task concurrency, and four-task concurrency are tested respectively.

6. Use Cases for Testing

6.1. Small concurrent scans

This use case tests Oracle and esProc for scanning performance against small tables (files). The test is done with a multi-task concurrent access mode. Each task is accessing a different table (file). Oracle executes 16-threads parallel processing, while esProc executes 4 tasks in parallel, because it is proved through the test that those are the very number of parallel tasks for highest performance.

Testing statistics:

Note: In the following table, the time unit is second. 2 Concurrencies indicates 2 SQL statements are executed simultaneously; and the same with 4 concurrencies.

Statistics interpretation:

1. Data with blue background is the peak values obtained by this use case. Both Oracle and esProc reach peak performance with 1 concurrency, but Oracle’s performance is several times higher than esProc.

2. Performance degrades significantly both for Oracle and esProc when the number of concurrencies increases to 2. Oracle still has advantage but much of the gap has been closed.

3. esProc handles each task with equal performance, while Oracle is extremely unstable.

6.2. Big-data scan

This use case tests Oracle and esProc for their performance in scanning big tables (files). Scenarios of 1, 2 and 4 parallel tasks are tested respectively.

Testing statistics:

Category	Use Case Number	Oracle		esProc
Category	Use Case Number	1 Parallel	2 Parallels	1 Parallel	2 Parallels	4 Parallels
Full Table Counting	101	280	329	243	235	272
	102	311	383	271	214	246
Integer sum	103	288	328	243	241	268
	104	324	385	289	210	197
	105	375	412	298	220	246
	106	928	1011	725	416	210
Float Sum	107	321	350	247	247	267
	108	389	424	297	206	200
	109	399	436	303	221	248
	110	1623	1816	885	492	321
Value Sum	111	313	347	260	248	270
	112	423	450	483	251	197
	113	428	463	457	243	248
	114	1332	1420	2148	1188	549
Integer Filter	115	277	322	256	251	266
	116	318	345	302	209	199
	117	335	364	376	231	245
	118	676	709	675	362	188
Number Filter	119	267	331	262	234	268
	120	350	377	450	260	201
	121	338	367	433	253	246
	122	830	909	1540	770	390
Character Filter	123	313	322	264	236	268
	124	361	388	486	236	201
	125	339	370	422	249	246
	126	887	973	1642	840	419
Date Filter	127	274	320	265	230	267
	128	273	321	363	209	219
	129	346	376	1413	711	378
	130	801	912	2482	1322	647

Statistics interpretation:

1. Oracle is at peak performance with 1 thread, but performance starts to degrade with 2 parallels. However, esProc’s performance with 4 parallel tasks is generally higher than with 1 task, sometimes even several times higher, excepting for a few instances.

2. esProc demonstrates an overall advantage over Oracle in this use case test. This is particularly obvious with narrow tables and computation-intensive cases, such as 106, 110 and 114. Use cases 129 and 119 are exceptions, however, where Oracle performs slightly better than esProc.

3. esProc is observed to be capable of increasing the computational performance several times with the increase of parallel number, as in the use cases 106, 110, 114, 118, 122, 126 and 130. These are all about narrow table access, and with more computations.

4. In some cases esProc’s performance also degrade, as in the use cases 103, 107, 111, 115, 119, 119, 123 and 127. These are all about wide table access with fewer computations.

6.3. Small concurrent grouping tasks

This use case tests Oracle and esProc for grouping computation performance against small tables (files). The test is done in a multi-task concurrent access mode. Each task accesses different tables (files).

Testing statistics:

Note: According to the data volume, esProc can choose to do pure in-memory computation or mixed-mode-of-in-memory-and-external-memory computation. This use case uses in-memory computation, while the grouping use case with large data volume in later part of this report is handled with a mixed mode. Pure in-memory computation has the risk of memory overflow. Thus use case uses 1, 2 and 4 concurrencies to avoid that. Data in green is for 1 concurrency, data in red for 2 concurrencies and data in black for 4 concurrencies. Green character reflects parallel 1, red character is for parallel 2, and black for parallel 4. Oracle forces a mixed mode computation to avoid memory overflow. Each task is performed with a fixed number of 16 parallel threads to achieve the best performance.

Statistics interpretation:

1. Oracle performs better than esProc in general, especially with 1 concurrency when the performance of Oracle can be several times better.

2. Oracle is extremely unstable in the performances of concurrent tasks, usually several times faster or slower between one and another. esProc’s performance, however, is generally stable.

3. All date grouping operations are two-level grouping, which, as we can see, Oracle performs better.

6.4. Big-data grouping

This use case tests Oracle and esProc for grouping computation performance against large tables (files). The test is done in a non-concurrent mode using parallel threads of 1, 2 and 4.

Testing statistics:

Category	Use Case Number	Oracle		esProc
Category	Use Case Number	1 Parallel	2 Parallels	1 Parallel	2 Parallels	4 Parallels
Integer Filter Grouping	201	365	399	401	252	285
	202	769	673	542	302	255
	203	930	1042	1018	567	342
	204	2660	1712	1867	1115	607
Number Grouping	205	368	408	387	252	286
	206	1028	1247	1124	690	486
	207	930	1041	1340	759	371
	208	4058	2781	4352	2459	1493
Character Grouping	209	365	409	374	258	293
	210	1012	1184	1017	651	488
	211	906	1032	1312	710	391
	212	3579	2427	4409	2488	1474
Date Grouping	213	304	332	278	259	271
	214	454	478	509	273	203
	215	1042	1164	1318	756	426
	216	4730	3224	4910	2542	1481

Statistics interpretation:

1. Oracle is at peak performance with 1 parallel, and performance starts to decrease with 2. However, esProc’s performance with 4 parallel tasks is normally higher than with 1 task, even several times higher in some cases.

2. esProc’s performance is several times higher than Oracle in this use case test.

6.5. Small concurrent joins

This use case tests Oracle and esProc for performance of joining small (files). The test is done in a multi-task concurrent access mode. Each task accesses different tables (files). Parallel processing won’t be used for Oracle because no performance improvement can be achieved with it. esProc uses 4-thread parallel processing because more threads won’t increase the performance.

Testing statistics:

Statistics interpretation:

1. esProc is one time or several times faster than Oracle.

2. esProc’s performance variation among concurrent tasks is very small. Oracle’s performance variation remains big, but less than the variations in other use cases.

3. Parallel mode yields no performance gain for Oracle, while esProc reaches peak performance with 4 parallel tasks.

6.6 Big-data joins

This use case tests Oracle and esProc for their performance in joining large tables (files) . The test is done in a single-task, non-concurrent mode. Oracle reaches peak performance with 1 parallel task, while esProc reaches its own with 4 parallel tasks.

Testing statistics:

Category	Use Case Number	Oracle		esProc
Category	Use Case Number	1 Parallel	2 Parallels	2 Parallels	4 Parallels
Integer Join	401	347	383	248	271
	402	446	469	210	204
	403	837	910	505	271
	404	1555	1709	810	402
Integer Multi-level Join	405	352	377	247	270
	406	453	507	213	205
	407	829	945	504	257
	408	1587	1810	847	412
Number Join	409	353	384	245	275
	410	509	549	324	199
	411	1077	1152	840	451
	412	1767	1926	1704/1792	903/889
Number Multi-level Join	413	354	385	243	275
	414	521	557	331	198
	415	863	946	827	450
	416	1830	2024	1825	943
Character Join	417	354	385	246	268
	418	509	554	322	198
	419	868	944	513	278
	420	1757	1932	1689	909
Character Multi-level Join	421	356	388	251	277
	422	523	549	320	201
	423	863	957	513	285
	424	1827	2027	1776	905

Statistics interpretation:

1. esProc performs better than Oracle.

2. With multi-level join, performance degradation for both is little.

6.7. Big-data grouping with big result set

This use case tests Oracle and esProc for grouping computation performance against large tables (files), with the grouping results are too big to be all stored in memory. The test is done in a single-task non-concurrent mode using 1, 2 and 4 parallel threads respectively.

Testing statistics:

Category	Use Case Number	Oracle			esProc
Category	Use Case Number	1 Parallel	2 Parallels	4 Parallels	1 Parallel	2 Parallels	4 Parallels
Integer Filter Grouping	301	585	570	667	649	419	385
	302	1059	1010	1098	1067	706	656

Statistics interpretation:

1. esProc performs better than Oracle.

2. Oracle is at peak performance with 2 parallel tasks. esProc is at peak performance with 4 parallel tasks.

7. Summary Analysis

From all use case tests we can generally reach the following conclusions about the testing statistics:

1. Oracle normally performs better than esProc with small concurrent tasks, even several times higher in some cases. But, there’s the exception of joins where esProc’s performance surpasses Oracle’s.

2. For big-data, single-task computation, esProc’s performance is always higher than Oracle’s, several times higher sometimes.

3. Performance degrades significantly both for Oracle and esProc when concurrent number changes from 1 to 2. Oracle still has advantage but the difference between the two is small enough.

4. For small concurrent tasks, esProc shows stable performance with each task completed using approximately the same time, while Oracle demonstrates great variation among different tasks (normally the variation is up to ten times). However, there are exceptions. Oracle has a very small performance variation among different tasks during join operations, and parallel processing won’t improve the performance in such situations.

Causes of the above conclusions could be as follows:

1. esProc is JAVA-coded and interpreted for execution, and thus has a lower efficiency than Oracle, which is coded in native C language. The latter has a sophisticated multi-tier cache mechanism which enables a much higher performance in in-memory computations. As small concurrent computations can normally be cached, Oracle generally performs better than esProc in such use cases.

2. For big-data, single-task computation, esProc’s performance is always higher than Oracle. This is because a large volume of data cannot be cached, and Oracle’s main method for performance improvement is inapplicable. But in such a situation, esProc has an obvious advantage with its more efficient function-based algorithm.

3. With the concurrent computation with small data volume, overall performance of both degrades significantly with the parallel number changed from 1 to 2, and then to 4. This is a signal of incompetence of the processing power of CPU. Under this circumstance, Oracle still has the advantage, but only slightly. This is because Oracle doesn’t have the time to read the cache, and the cache mechanism almost stops working though the only advantage of native code remains. From this we can see that, basically, the performance-enhancing method for Oracle is effectively leveraging cache.

4. Since Oracle’s cache mechanism and execution plan cannot be managed manually, thus with small concurrent tasks Oracle has a significant performance variation, which sometimes is up to ten times.

5. Small-data joins are an exception. In those cases, esProc’s performance is one or several times higher than Oracle’s and the Oracle performance variation among concurrent tasks is small with parallel processing being invalid. Probably this is caused by the same reason that Oracle’s cache mechanism and execution plan cannot be managed manually, making a join automatically treated as an external-memory computation. The external-memory computation involves competitions for hard disk resources, which leads to the lack of performance improvement with parallel processing. Parallel processing is effective in enhancing esProc’s performance because esProc can load the small dimension tables into the memory, and thus leverage the multi-core CPU for computation. The small variation among concurrent tasks is possibly also because that Oracle automatically treats a join as external-memory computation, which does not require cache, and that resource is equally distributed among concurrent processes. With join computations, Oracle’s advantage of using cache disappears, but the performance enhancement with native code is limited, whereas esProc gains an advantage with efficient function-based algorithm and thus achieves a higher performance.

Final conclusion: Oracle performs better in handling computational tasks with small data or with fewer computations and simple algorithms. esProc performs better in handling big-data tasks, computation-intensive tasks, or those with complicated logics.

Pleast Click here to download the entire document.

1. Testing purposes

2. Testing contents and methods

3. Testing Environment

4. Data Description

4.1Data tables and table association

4.2. Data volume

5. Use Cases Description

5.1. Use Case for big-data scan

5.2. Use Case for big-data grouping with small result set

5.3. Use Case for big-data grouping with large result set

5.4. Use Case for big-data joins

5.5. Use cases for performing small concurrent tasks

6. Use Cases for Testing

6.1. Small concurrent scans

6.2. Big-data scan

6.3. Small concurrent grouping tasks

6.4. Big-data grouping

6.5. Small concurrent joins

6.6 Big-data joins

6.7. Big-data grouping with big result set

7. Summary Analysis

Hi，You need to fill in the Username and Email！