esProc Assists SQL with the Non-Equi-Grouping

In SQL, usually we can only group a table automatically according to its own filed(s). When the grouping criterion comes from another table, or is an external parameter or a conditional list, SQL has to handle the grouping in a very roundabout way. Some cases even require the dynamic criteria, which need to be generated by resorting to the high-level languages. For some other cases, where the grouping criterion doesn’t entirely correspond to the source table (or the intervals involved don’t have any intersection), either the grouping result should be complemented or the difference should be excluded from it. Both scenarios are difficult to deal with in SQL.

With support of alignment grouping, enumeration grouping, and order-related computations, esProc can easily implement those scenarios of non-equi-grouping. esProc can operate alone, via the console, as well as through invocation by the reporting tool or Java code. More details are covered by How esProc Assists Writing SQL Queries.

Here are some typical problems involving non-equi-grouping in SQL development and their solutions in esProc.

Simple fixed grouping

The sales table stores ordering records, in which CLIENT column holds the client names and AMOUNT column has ordering amounts. We need to group the table according to the list of “potential clients” and to aggregate the AMOUNT column in each group. The list is an external parameter and contains some items that the CLIENT column doesn’t (Their aggregation results should be zero). Now suppose the list is TAS,BON,CHO,ZTOZ . Below is a selection of the source data:

esProc_sql_nonequi_group_1

When the grouping criterion is fixed and contains relatively few items, you can have union or decode work with SQL to perform the group. As with this example, the grouping criterion is a dynamic external parameter. We have to create a temporary table, parse each value of the parameter and insert the results into the temporary table before moving on to the subsequent computations. But with esProc, we don’t need to create a temporary table. Here’s the esProc code:

	A
1	=sales=db.query (“select * from sales”)
2	=arg1.array()
3	=sales.align@a(A2,Client)
4	=pjoin(A2,A3.(~.sum(Amount)))

The align function groups records by arg1, the external parameter list, and uses @a option to get all records from each group; without @a, the function will get the first record of each group. The pjoin function composes each record in order, as shown below:

esProc_sql_nonequi_group_3

Dynamic aggregate by intervals

Suppose we want to segment the sales table according to the ordering amounts and aggregate the amounts for each segment. The criterion of segmentation is a list parameter, which includes multiple items, such as four intervals separated by 0-1000, 1000-2000 and 2000-4000.

With a fixed criterion, we can write it directly in the SQL statement. But with a dynamic external parameter, we often need to compose the SQL statement using a high-level language such as JAVA. The process could be very complicated. esProc, however, supports the dynamic expression to produce smart code:

	A
1	=sales=db.query (“select * from sales”)
2	=sales.enum(byFac,Amount)
3	=pjoin(byFac,A2.(~.sum(Amount)))

byFac is a parameter, like [“?<=1000″ ,”?>1000 && ?<=2000″,”?>2000 && ?<=4000″,”?>4000″]. The enum function groups records according to the criterion. Here’s the result:

esProc_sql_nonequi_group_5

It happens that the above conditional intervals don’t overlap each other. But in practice it’s not uncommon to have overlapped conditions. For instance, suppose we want to group the ordering amounts according to the following rules:
1000 ~ 4000: Regular orders

Below 2000: Common orders

Above 3000: Important orders

Among these conditions, r1 and r2 overlap each other. Sometimes we don’t want duplicate records in the grouping result (that is, after finding records satisfying r1, then find records satisfying r2 from the rest of the records). Other times we do want the duplicate records (find records satisfying each condition from the whole table). For both scenarios, SQL needs a great amount of with, union, except or minus statements to produce extremely lengthy code. By default, the esProc enum function performs groups without duplicates; but it also allows them by using @r option.

Dynamic accumulation by intervals

The performance table holds the performance scores and performance bonuses of employees. Starting from zero, we divide the performance scores every 10 scores and then, from low to high, aggregate the bonuses cumulatively section by section. That is, the accumulated amount of the current section should cover the bonuses of all previous sections. Below is the source data:

esProc_sql_nonequi_group_6

Since the performance scores are not fixed, the number of intervals is unfixed too. Oracle and MSSQL are able to implement this by creating an auxiliary interval list and using the window functions, but they produce lengthy code. It’s even harder for databases that don’t support window functions, such as MySQL. esProc is a simpler alternative to deal with this type of non-equi-grouping cases. Below is the esProc code:

	A
1	$select score,bonus from performance order by score
2	=((A1.m(-1).score-1)\10+1).(~*10)
3	=A2.new(~:range,A1.select(score<range).sum(bonus):accum)

A2 generates intervals dynamically, in which m gets members by their sequence numbers, m(-1) gets the last member and “\” gets the integer part of the quotient. A3 creates a two-dimensional table based on A2 and accumulates bonuses by querying corresponding records in A1. Here’s the result:

esProc_sql_nonequi_group_8

Complementing the source table for fixed grouping

The building table contains records of project completion, in which year is the string type completion time in the format of “year the first half year\the second half year”. We need to count the number of buildings that have been completed for each type of project during every half year over a period specified by beginning and ending years. Below is the source data:

esProc_sql_nonequi_group_9

The years in the source table are inconsecutive. It’s not easy for SQL to fill in the missing years and then do the left join. We can use esProc to make the job simpler:

	A	B
1	$select type,year from building
2	=create(year,subtotal,type)
3	=to(argb,arge).conj([string(~)+” first half”,string(~)+” last half”])
4	for A1.group(type)	=A3.(A2.record(~\|A4.count(year==A3.~)\|A4.type))

A2 creates a new two-dimensional table. A3 creates a year list according to the beginning and ending parameters (argb, arge). A4 groups the building table by type and processes each group by loop. Each loop (B4) will insert into A2 a number of records that are the same with the number of items in the year list. Here’s A2’s final result:

esProc_sql_nonequi_group_11

Complement to the source table for fixed grouping and transposition

The onBusiness table stores the records of the employees’ business trips, in which Date and id_user are the main fields. The user table holds the user information, whose main fields are id and name. We want to show whether each employee has a business trip plan for each week in order, with a special requirement that each employee has a column. Below is a selection from the onBusiness table:

esProc_sql_nonequi_group_12

Suppose the beginning date and ending date are 2015-05-31 and 2015-06-28, the desired result would be like this:

esProc_sql_nonequi_group_133

esProc code:

	A
1	=$select user.name, onBusiness.date, onBusiness.id_user from onBusiness,user where onBusiness.id_user=user.id where date>=? And date<=?;begin,end
2	=A1.group(id_user)
3	=(interval(begin,end)\7).new(~:week,${A2.(“\”No\”:”+name).string()})
4	=A2.run(~.run(A3(inteval(begin,date)\7+1).field(A2.#,”yes”))

First query the desired data using a simple join statement. Next create a two-dimensional table A3 by the intervals, where each week has a row (automatically filling in missing values when dates are interrupted) and each employee has a column. The initial values are “No”. Then loop through A2’s groups to modify corresponding values in A3 to “Yes”.

Inter-group calculation by months

The work table stores information for a job. In the table, People field holds names of workers, Date field has the entry dates and Deleted field holds the quitting dates. We need to summarize from March to July how many people are on the job each month. Below is the source data:

esProc_sql_nonequi_group_15

The problem here is that the months in the source table may be discontinuous, yet we want a consecutive result. So we need to first generate the consecutive months and then perform the alignment grouping on the source table according to the consecutive months. A worker who quits the job will still be included in the on-the-job list of the current month, but will be excluded from that of the next month. To obtain the right number of the workers who are on the job in each month, we need to perform the inter-column calculation and inter-group accumulation.

esProc code:

	A
1	=$select * from tb
2	=to(3,7).new((m=~):month,A1.count(month(Date)<=m)-A1.count(Deleted && month(Deleted)<m):value)

The to function generates a consecutive sequence. The new function creates a two-dimensional table based on a sequence (or another two-dimensional table). ~ represents the current member in the original sequence. Here’s the final result:

esProc_sql_nonequi_group_17

Dynamic locating of an interval

The Transaction table records the time points of each transaction for each user. The Discount table holds the discount information after certain time points. There are multiple discount records, forming dynamic time intervals. Now we need to calculate the discount of each transaction for each user.

Below is a selection of the Transaction table:

esProc_sql_nonequi_group_18

Below is a selection of the Discount table:

esProc_sql_nonequi_group_19

In SQL, we need to first generate the time intervals through inter-row calculation and then perform the join, which is difficult to implement. Whereas in esProc, we can perform the equi-grouping on Transaction according to Discount, meaning the code is simpler:

	A
1	=$select TransID,TUser,Date from Transaction
2	=$select DiscountID,Discount from DISCOUNT order by Date
3	=A2.(Date)
4	=A1.new(TUser,Date,A2(A3.pseg(Date)).Discount:d)

The pseq function calculates the interval number to which a record belongs, and A2(…) gets records by their sequence numbers. Here’s the result:

esProc_sql_nonequi_group_21

In this example all users share the same discount. If each user has its own discount (there is a DUser field in the Discount table), the code should be like this:

	A
1	=$select TransID,TUser,Date from Transaction
2	=$select DiscountID,DUser,Date,Discount from DISCOUNT order by DUser,Date
3	=join(A1.group(TUser),TUser;A2.group(DUser),DUser)
4	=A3.conj((t1=#2,t2=t1.(Date), #1.new(TUser,Date,t1(t2.pseg(Date)).Discount:d) ))

Here’s the result:

esProc_sql_nonequi_group_223