Index Sequences in esProc

With databases, creating appropriate indexes for tables can greatly increase query efficiency. Similarly, you can create index sequences for record sequences or table sequences in esProc to enhance the efficiency of querying data repeatedly.

For example, you need to query the food order file Order_Foods.txt repeatedly.

	A	B	C
1	=file(“Order_Foods.txt”).import@t()	50000	1000
2	=A1(C1.(rand(B1)+1)).new(PName, Quantity)

The following are records of food orders imported by A1 (altogether 50,000 records):

esProc_index_sequence_1

A2 gets 1,000 random records of food orders to list only the product names and purchase quantities for being used as the query conditions in the later test query (here duplicate values are allowed). A2’s data are as follows:

esProc_index_sequence_2

Then in order to explore the role of index sequences in speeding up a query, we’ll query the food order table in A1 according to the 1,000 product names in A2 with and without an index sequence respectively. We specify that only the first-found record will be returned.

First let’s look at the situation without an index sequence. Since records in Order_Foods.txt are ordered by Date, binary search cannot be used when performing the query by product names, otherwise errors will occur.

	A	B	C
1	=file(“Order_Foods.txt”).import@t()	50000	1000
2	=A1(C1.(rand(B1)+1)).new(PName, Quantity)	=now()
3	=A2.(A1.select@1(PName==A2.PName && Quantity ==A2.Quantity))	=interval@ms(B2,now())

Expressions in B2 and B3 get the current time through now() function and estimate the query time (millisecond). B3 stores the query results as follows:

esProc_index_sequence_3

The estimated query time in B3 is as follows:

esProc_index_sequence_4

Then let’s move to the situation where an index sequence is used:

	A	B	C
1	=file(“Order_Foods.txt”).import@t()	50000	1000
2	=A1(C1.(rand(B1)+1)).new(PName, Quantity)	=now()
3	=A2.(A1.select@1(PName==A2.PName && Quantity ==A2.Quantity))	=interval@ms(B2,now())	=now()
4	=A1.psort(PName, Quantity)
5	=A2.(A1(A4.select@b1(cmp(A1(~).([PName,Quantity]), A2.~.([PName,Quantity])))))	=interval@ms(C3,now())

For the query, A4 creates an index sequence corresponding to both PName and Quantity, thus binary search can be used to make query based on the index. For the sake of comparing the all-around performances of the two situations, the time taken to create an index sequence is included. The index sequence created by A4 is as follows:

esProc_index_sequence_5

As A5 uses the binary search to query data, the query condition should be modified to the mode x==0. A5 gets the same results as A3:

esProc_index_sequence_6

The estimated query time in B5 is as follows:

esProc_index_sequence_7

By comparing results in B3 and B5, you see that the second method is much more efficient. That is to say, query speed can be significantly increased by using binary search based on an index sequence. Of course creating an index sequence is also one of the computing steps. The more you use an index sequence to query data, the more efficient the query becomes. It is not so necessary to create an index sequence if the computation is not query-intensive.

For specific databases and query modes, it is unnecessary to create the index sequence each time query is executed. You can store the index sequence after it is created. For example:

	A
1	=file(“Order_Foods.txt”).import@t()
2	=A1.psort(PName, Quantity).new(~:Pos)
3	>file(“OF_Index_PQ”).export@b(A2)

This way you just need to import the index file directly without having to re-create the index sequence for the next query.

	A	B	C
1	=file(“Order_Foods.txt”).import@t()	50000	1000
2	=A1(C1.(rand(B1)+1)).new(PName, Quantity)	=now()
3	=file(“OF_Index_PQ”).import@b()
4	=A2.(A1(A3.select@b1(cmp(A1(A3.Pos).([PName, Quantity]), A2.~.([PName,Quantity]))).Pos))	=interval@ms(B2,now())

By doing so, the query speed is faster than creating a new one.

When establishing a foreign key relationship with switch function for a table sequence or a cursor, each record needs to find its corresponding reference in the dimension table. This is similar to the query-intensive situation in the above example. esProc will create the index sequence automatically for the execution of switch function. For example:

	A	B
1	=file(“PersonnelInfo.txt”)	=demo.query(“select STATEID, NAME, ABBR from STATES”)
2	=now()
3	=A1.cursor@t()	=A3.switch(State,B1:ABBR)
4	for 50	=B3.fetch(1000)
5	>B3.close()	=interval@ms(A2,now())
6	=now()
7	=A1.cursor@t()
8	for 50	=A7.fetch(1000)
9		>B8.run(State=B1.select@1(ABBR==State))
10	>A7.close()	=interval@ms(A6,now())

From the 2^nd to the 5^th line, switch function relates the personnel information stored in the cursor to corresponding states information. The program cycles the association result 50 times, with 1,000 rows fetched each time. B5 computes the time taken to perform the processing. From the 6^th to the 10^th line, the run function cycles down through records of the table sequence 50 times to query the corresponding states data, and similarly, 1,000 rows are fetched each time for processing. B10 computes the time taken to perform this processing. The results of B5 and B10 are as follows:

esProc_index_sequence_8

In the example, the processing efficiency is apparently higher with switch function, despite that there is only a small amount of states data. Actually the bigger the data in the dimension table, the more efficient the processing will be by using switch function.

Similar to creating the index sequence, an index file can be created to increase efficiency when retrieving data from a large binary file. With the f.index(fi,k_i,…) function, you can first create an index file fi according to the primary field k_i,…, which is similar to storing an index sequence for later use. After that you can use the index file to retrieve data from the file as a cursor with f.icursor(fi,x;F_i,…) function. A much higher efficiency will be achieved in this way.

Hi，You need to fill in the Username and Email！