1.Testing purposes
Testing esProc and Oracle on the same hardware for single machine performance, to compare the two for performance difference either in large data volume single task computation or small volume multiple tasks concurrent computation use cases.
2.Testing contents and methods
Data volume:
Small data volume: single fact table around 10 G. To avoid the testing results being affected by operating system cache, multiple concurrent request will be accessing different fact tables.
Large data volume: single data table is about 100 G.
Algorithm category:
Testing the performance of several typical SQL algorithms, including data scan, grouping, join and large grouping, etc.. Note that the use of these simple algorithms is for better understanding and direct comparison of the performance, not because esProc and SQL are identical to each other. In fact, the two of them focus on different things. esProc is good at procedure computation with more complicated business logic, while SQL is for computation of average complexity. Complicated algorithm can be realized with different SQL execution plan, which is out of manual control, and thus not good for comparison. We’ll not do such test.
Category of Use Cases:
Each algorithm are tested with multiple use cases according to the width of the table, data types, etc..
Among them, the purpose for large grouping use cases is to test the scenario when the resulting data sets of the grouping is too large to be fit in the memory. So this report will only test the situation of parallel computing with 100 G large data volume, rather than concurrent computing with small data volume.
Note: the article esProc Oracle Single Machine Performance Comparison Process is an appendix of this report. Please refer to it for details such as data structures, test code, test reproduction, etc..
3.Testing Environment
Testing Machine:Dell Power Edge T610
CPU:Intel Xeon E5620*2
RAM:20G
HDD:Raid5 1T
Operating System:CentOS 6.4
JDK:1.6
Oracle Version:11g
esProc Version:3.1
4.Data Description
The volume of the data is decided by the amount allowed by exported text file.
During the single machine test, esProc uses proprietary file format.
As the purpose is mainly to test big data computation and whole table scan performance, no primary key or index is build for any table in the database. For purpose of big data computation test, the primary key field described in data structure simply means that this field is the logic primary key. No data repetition is allowed. Primary key is not physically built.
4.1.Data tables and the associated tables
Facts TableT1, T11, T12, T13
Wide table T1, T11, T12, T13, is to simulate the fact table with large numbers of data fields. Total number of designed fields is 100. The four tables have identical structures. T11, T12, T13 are used for small data volume multi-task concurrent access to different tables to avoid system cache.
Fact Table T2, T21, T22, T23
Narrow tables T2, T21, T22, T23 are used to simulate fact tables with less fields. Total number of designed fields is 11. The four tables have identical structures. T21, T22, T23 are used for small data volume multi-task concurrent access to different tables to avoid system cache.
Fact tables are the main data source for this test, and are used in scan, group, join computation. Tests are done for both large and small data volumes, with are controlled by inserting different rows of records.
Dimensional table DL2, DL6, DD2, DD6, DC2, DC6
Dimension table is only used to test the join (and multiple joins) use cases. These dimension tables and fact tables will be joined, so will the dimension tables. These tables have fixed data volume.
4.2.Data volume
Numbers in the table stands is the number of record rows, not the number of occupied spaces in bytes.
Note: the goal for large data volume single machine test is to test the performance when the memory used by the data is well above physical memory. Our test standard is the size of a single fact table to be approximately 5 times the size of physical memory. The memory used by dimension table is far below the physical memory. Actual number of record rows used could be adjusted according to the configuration of the machine.
The goal for small task concurrent single machine test is to show the performance when memory occupied by the data is less than the physical memory, and the computation could be done completely in the memory. Our test standard is to use a single fact table of 50% the size of the physical memory. Memory used by dimension table is far less than the physical memory. Actual number of record rows used could be adjusted according to the configuration of the machine. T11/T12/T13 and T21/T22/T23 are only used for small data volume multi-task concurrent test.
To make sure that esProc and Oracle is computing against exactly the same data, we will export the data generated in Oracle to a binary file format defined by esProc, as the data source for esProc during the test.
Related Topics:
esProc Oracle Single Machine Performance Comparison Test Report(II)
esProc Oracle Single Machine Performance Comparison Test Report(III)
esProc Oracle Single Machine Performance Comparison Test Report(IV)