An Example of esProc Set Operations

There are a lot of occasions when set operations are needed to do structured data summarizing and analysis. For example, to list all students who have published papers, find the employees who have participated in all trainings and select students who passed the examination for the re-examination. esProc uses sets everywhere. Its most commonly used data types, such as sequences and table sequences, are all sets. Therefore, a better understanding and manipulation of sets helps to perform data computing in a more reasonable and faster way.

For example, the table below contains sales data:

esProc_inmemory_set_operation_1

We are to find the clients whose monthly sales amount is always in the top 20 in 2013. To do this we need to first retrieve the 2013 sales data, group it by the month, then select the top 20 clients of each month by loops and finally get the result by performing intersection. It is too difficult to handle such a complex problem in either SQL or the stored procedure.

esProc handles a complex problem by splitting it into different steps, and gets the final result step by step. First, from the sales data we retrieve that of 2013 and group it by the month:

	A
1	=file(“e:/SalesData.txt”).import@t()
2	=A1.select(year(Date)==2013)
3	=A2.group(month(Date))

esProc performs veritable data grouping by dividing data into multiple sets as required. This is different from SQL “group by” command, which returns only the aggregate results. The grouping result of A3 is as follows:

esProc_inmemory_set_operation_3

Data has been automatically sorted before it is grouped. Each group is a set of sales records. Below are the records of March:

esProc_inmemory_set_operation_4

To know the total sales amount of each client in each month, we need to group the data by clients. In esProc, we just loop through data of every month and group it by customers. We can use A.(x) to loop through set members without the need to write loop code.

4	=A3.(~.group(Client))

After the second grouping, in A4 every month’s data is a set of sets：

esProc_inmemory_set_operation_6

Now, the data of March is as follows:

esProc_inmemory_set_operation_7

We can see that each subgroup in March’s data is a set of transaction records of a certain client.

The set used in esProc is different from that in mathematical concept. esProc sets are ordered and therefore, based on them, sorting, selection by positions and other statistical operations can be performed. Then let’s find the top 20 clients of each month:

5	=A4.(~.top(-sum(Amount);20))
6	=A5.(~.new(Client,~.sum(Amount):MonthAmount))

A5 loops through each month’s data to get records of the top 20 clients of each month. A6 lists their names and monthly sales amount. The result of A6 is as follows:

esProc_inmemory_set_operation_9

Now it’s time to get the final result:

7	=A6.(~.(Client))
8	=A7.isect()

List names of the top 20 clients of every month in A7 and, finally, get the intersection of the 12 groups of top 20 names in A8:

esProc_inmemory_set_operation_11

From this example we can see that esProc ordered sets make the problem-solving more intuitive. With sets, we can easily perform grouping, sorting and other computations, making each step of data processing focused and easy to understand. Moreover, the use of set concept can reduce the complexity and coding workload of loop operation as well as the set operations like intersection.

Hi，You need to fill in the Username and Email！