Application of Index Sequences in esProc

In databases, creating appropriate indexes for tables can greatly increase query efficiency. Similarly, index sequences can be created for record sequences or table sequences in esProc to increase efficiency in querying data repeatedly.

For example, we need to access the food order Order_Foods.txt repeatedly.

	A	B	C
1	=file(“Order_Foods.txt”).import@t()	50000	1000
2	=A1(C1.(rand(B1)+1)).new(PName, Quantity)

Records of food orders queried by A1 are as follows (altogether 50,000 records):

esProc_deep_index_Sequence_1

A2 gets 1,000 random records of food orders with only their product names and purchase quantities. The two fields will be query conditions for use in the later test query (here repetitions are allowed). Data in A2 are as follows:

esProc_deep_index_Sequence_2

In the following, in order to explore the role of index sequences, we’ll query the data of food orders in A1 according to the 1,000 product names in A2 with and without an index sequence respectively.

First let’s look at the situation without an index sequence. Since records in Order_Foods.txt are sorted by Date, i.e. the order dates, binary search cannot be used when searching by product names, otherwise errors will occur.

	A	B	C
1	=file(“Order_Foods.txt”).import@t()	50000	1000
2	=A1(C1.(rand(B1)+1)).new(PName, Quantity)	=now()
3	=A2.(A1.select@1(PName==A2.PName && Quantity ==A2.Quantity))	=interval@ms(B2,now())

Expressions in B2 and B3 get the current time through now() function and roughly estimate query time (millisecond). Query results are stored in B3 as follows:

esProc_deep_index_Sequence_3

Estimated time computed in B3 is as follows:

esProc_deep_index_Sequence_4

Then let’s move to the situation where an index sequence is used:

	A	B	C
1	=file(“Order_Foods.txt”).import@t()	50000	1000
2	=A1(C1.(rand(B1)+1)).new(PName, Quantity)	=now()
3	=A2.(A1.select@1(PName==A2.PName && Quantity ==A2.Quantity))	=interval@ms(B2,now())	=now()
4	=A1.psort(PName, Quantity)
5	=A2.(A1(A4.select@b1(cmp(A1(~).([PName,Quantity]), A2.~.([PName,Quantity])))))	=interval@ms(C3,now())

First create an index sequence corresponding to PName and Quantity, thus binary search can be used to make query by making use of indexes. In order to compare the efficiency of the two situations, time for creating an index sequence is also included. The index sequence created in A4 is as follows:

esProc_deep_index_Sequence_5

As binary search is used in A5 to query data, the query condition should be modified to mode x==0. Results are the same as those in A3:

esProc_deep_index_Sequence_6

Estimated time computed in B5 is as follows:

esProc_deep_index_Sequence_7

By comparing results in B3 and B5, it can be seen that the second method is much more efficient. That is to say, query speed can be significantly increased by using binary search on the condition that an index sequence is created. Note that computations are involved to create index sequences. The more we use an index sequence to query, the more efficient the query becomes. It is not so necessary to create an index sequence if the computation is not query-intensive.

For specific databases and query modes, it is unnecessary to create the index sequence each time query is executed. The index sequence can be stored after it is created. For example:

	A
1	=file(“Order_Foods.txt”).import@t()
2	=A1.psort(PName, Quantity).new(~:Pos)
3	>file(“OF_Index_PQ”).export@b(A2)

Thus there is no need to recreate the index sequence for the next query. Simply import the index file.

	A	B	C
1	=file(“Order_Foods.txt”).import@t()	50000	1000
2	=A1(C1.(rand(B1)+1)).new(PName, Quantity)	=now()
3	=file(“OF_Index_PQ”).import@b()
4	=A2.(A1(A3.select@b1(cmp(A1(A3.Pos).([PName, Quantity]), A2.~.([PName,Quantity]))).Pos))	=interval@ms(B2,now())

In this way, the query speed is faster than creating a new one.

During establishing the foreign key relationship with switch function for a table sequence or a cursor, each record needs to find its corresponding reference in the dimension table. This is similar to the query-intensive situation in the above example. esProc will create the index sequence automatically for the execution of switch function. For example:

	A	B
1	=file(“PersonnelInfo.txt”)	=demo.query(“select STATEID, NAME, ABBR from STATES”)
2	=now()
3	=A1.cursor@t()	=A3.switch(State,B1:ABBR)
4	for 50	=B3.fetch(1000)
5	>B3.close()	=interval@ms(A2,now())
6	=now()
7	=A1.cursor@t()
8	for 50	=A7.fetch(1000)
9		>B8.run(State=B1.select@1(ABBR==State))
10	>A7.close()	=interval@ms(A6,now())

From the 2^nd to the 5^th line, switch function is used to relate personnel information stored in the cursor to corresponding states information. 1,000 rows are fetched each of the 50 times by loop. B5 computes the time consumed to perform the processing. From the 6^th to the 10^th line, run function is used to query the corresponding states data by loop, and similarly, 1,000 rows are fetched each of the 50 times of loop. B10 computes the time consumed for this processing. The results of B5 and B10 are shown below:

esProc_deep_index_Sequence_8

In this case, the processing efficiency is apparently higher with switch function, despite that there is only a small amount of states data. Actually the bigger the data in the dimension table, the more efficient the processing will be by using switch function.

Hi，You need to fill in the Username and Email！