The Standard esProc Method of Assisting SQL-style Transposition

371

In many cases SQL is used to implement a transposition algorithm, such as static/dynamic transposition, natural/inverse transposition, single-/multi-group data transposition, transposition after data alignment/complementing and transposition with/without computed columns. There are different SQL implementations for different scenarios. Some need the pivot function, some require group+case when only, some use the union, and others depend on high-level languages to achieve the dynamic SQL. The lack of a standard method tends to cause confusion among programmers, who find it difficult to write the code.

As esProc (free edition is available) supports dynamic scripting, order-related calculations and set operations, it has the ability to realize data transposition in various scenarios using a standard method, that is, retrieving data, creating empty result set and filling data in it. Moreover, esProc can operate independently, as well as support console execution, invocation by reporting tools and Java applications. More details can be found in How esProc Assists Writing SQL Queries.

These are common transposition problems in SQL programming and their solutions using standard esProc method.

Simple transposition

The database table SALES stores order data, a selection of which is shown below: You need to calculate the total order amount, the maximum/minimum order amount and the total number of orders each month in a specified year, and transpose the data into 13 columns and 4 rows. That is, the four algorithms constitute the first column named subtotal and each month is a column named 1, 2, 3, 4…

esProc script:

 A 1 \$select  month(ORDERDATE) as MONTH,sum(AMOUNT) as OSum,max(AMOUNT) as OMAX, min(AMOUNT) as OMIN ,count(ORDERID) as OCount from sales where year(ORDERDATE)=? group by MONTH order by MONTH; argYear 2 =create(subtotal,\${to(A1.len()).string()}) 3 >A1.fno().to(2,).run(A2.record(A1.fname(~)|A1.field(~)))

A1: Retrieve data and perform simple group and aggregate in SQL. Here’s the result: A2: Create an empty result set containing fields “subtotal,1,2,3,4,5,6,7,8,9,10,11,12”. set.string() concatenates members of a set into a string delimited by commas; \${} parses a string as an expression and execute it dynamically.

A3: Fill in records to the empty result set one by one. The fno function gets the number of fields; the fname function gets field names by sequence numbers; the field function gets a column by the sequence number; the run function loops through a set\result set; and the record function appends records. Here’s the result of A2 after data appending: According to this example, there are three steps to handle data transposition: retrieve data in SQL, create an empty result set using create function, and loop through the source data using run function and append records one by one using record function. The code for data appending will change a little in different scenarios. The following will explain each scenario in detail.

Dynamic transposition

The liquors table stores the production area, types and stock of liquors. Below is a selection from the source data: You need to calculate the stock for every type of liquor from every production area, with each type corresponding to a field.

esProc script:

 A 1 \$select Production, Type,sum(Stock)Amount from liquors group by Production,Type 2 =A1.id(Type) 3 =create(Production,\${A2.string()}) 4 =A1.group(Production) 5 >A4.run(A3.record(~.Production|~.align@a(A2,Type).(Amount)))

It can be handled still in steps of data retrieval, empty result set creation and data appending. A2 gets the names for the second to the Nth fields of the result set; the id function is equal to SQL distinct function. A4 groups A1’s data by Production. A5 loops through each group to append one record to A3 each time; the align function groups data according to a specified sequence and allows null members in a group.

A3 stores the final result, as shown below: Transposing multiple columns to one row

Below is the format of the Students table: You need to combine all records into one row in a format as shown below: esProc script:

 A 1 \$select name,age,sex,kg from student 2 =create(\${A1.(name).(~+”age,”+~+”sex,”+~+”kg”).string()}) 3 >A2.record(A1.(string(age)+”,”+sex+”,”+string(kg)).string().array())

A2 creates an empty table sequence dynamically. A3 composes A1 into a single record and appends it to A2. string(field) means converting data of other types into the string type data. set.string() means concatenating members of a set into a string. You can also append records with this line of code: >A2.record(A1.conj(~.array().to(2,4))) .
Multi-group transposition
In the database table kpi, every 4 records with the same f_site value is a group. You need to transpose each group of data to transform dataset_date values to fields and ioh_kpi、idh_kpi、iol_kpi fields to values of KPI Name field. Below is a selection from the source data: esProc script:

 A B 1 =myDB1.query(“select * from kpi order by f_site,dataset_date”) 2 =A1.id(dataset_date) =A1.fname().to(3,) 3 =create(site,’KPI Name’,\${A2.string()}) 4 for A1.group(f_site) = A4.align(A2,dataset_date) 5 >B2.run(A3.record([A4.f_site,~]|B4.field(~)))

A2 gets the distinct dataset_date values, that is [“2015-04-21 13:15″,”2015-04-21 13:30″,”2015-04-21 13:45″,”2015-04-21 14:00”]. B2 gets field names of A1 starting from the third field, i.e. [“ioh_kpi”,”idh_kpi”,”iol_kpi”]. A3 creates a two-dimensional table dynamically, with field names being site,KPI Name,”2015-04-21 13:15″,”2015-04-21 13:30″,”2015-04-21 13:45″,”2015-04-21 14:00″. A4-B5 use a loop statement to append data; the statement is equal to A1.group(f_site).run(…), but it is more clear than run function when steps are many. Here’s the result: Inverse transposition

A query finds that the tb1 table has one record with many fields, as shown below: You need to transpose the table into a two-dimensional table with three columns and multiple rows, as shown below: esProc script:

 A 1 \$select * from tb1 2 =create(project,operator,actionTime) 3 >((A1.fno()-1)/2).run(A2.record(A1.field(1)|A1.field(2*~)|A1.field(2*~+1)))

((A1.fno()-1)/2) calculates the number of records that need to be added to the result set and run function runs a loop to append them.

If there is more than one record in the tb1 table, you need to use the following code:

 A 1 \$select * from tb1 2 =create(project,operator,actionTime) 3 =((A1.fno()-1)/2) 4 >A1.run(A3.run(A2.record(A1.~.field(1)|A1.~.field(2*~)|A1.~.field(2*~+1))))

Here’s the result: Appending the subtable with irregular length into the main table

Table1 and Table2 are the main table and the subtable that are associated through ID field. Now you need to append the subtable as fields of the main table. The subtable has different numbers of rows with the same ID value to correspond one row in the main table.

Below is a selection from Table1: Below is a selection from Table2: The desired format: esProc script:

 A 1 =myDB1.query(“select  t1.ID,t1.Prob,t1.Cost,t1.Visible,t2.Item,t2.Count from table2 t2 left join table1 t1 on t1.ID=t2.ID”) 2 =A1.group(ID) 3 =A2.max(~.count()).(“item”+string(~)+”,”+”Count”+string(~)).string 4 =create(ID,Prob,Cost,Visible,\${A3}) 5 >A2.run(A4.record([ID,Prob,Cost,Visible]|~.([Item,Count]).conj()))

Join the main table and the subtable using a SQL statement; group the data by ID; create an empty table sequence dynamically in A4 according to the group with the maximum number of records; and then concatenate the records in the main table and field values in the subtable into new records and append them to A4. Unlike a SQL group operation, esProc group function is able to group data without aggregating it.

A4 gets the final result as follows: Inserting the subtable with specified maximum length dynamically into the main table

dColThread and dColQuestion are the main table and the subtable that are related through tID field. Each record of the main table corresponds to multiple but less than 5 status field values. You need to insert the status values between Phone and Decline fields and name them QuestionNo1、QuestionNo2…QuestionNo5.

Below is a selection of dColThread: Below is a selection of dColQuestion: esProc script:

 A B 1 =myDB1.query(“select * from dColThread t,dColQuestion q where t.tID=q.tID and t.ApplicationName=?”,arg1 ) 2 =A1.group(tID) 3 =create(ApplicationName,User,Phone,QuestionNo1,QuestionNo2,QuestionNo3,QuestionNo4,QuestionNo5,Decline) 4 for A2 =A4.(status)|[“”,””,””,””,””] 5 =A3.record(A4.ApplicationName|A4.User|A4.Phone|B4.to(5)|A4.Decline)

Create a two-dimensional table with fixed fields in A3; loop through each of A2’s groups to get the status value and complement the missing values to reach five; and then append the complete records to A3.

A3 gets the final result as follows: Complementing the missing months before transposition

The tb table, with two fields – time and quantity, contains the product’s sales quantity per day. In certain months, the quantity value may be null. Below is a selection from the source data: You need to rearrange the tb table into 12 rows, with fields including the fixed months (values are 1-12), the unfixed annual sales quantity (if a time period involves 2 or 3 consecutive years, create 2 or 3 fields), as shown below: esProc script:

 A 1 \$select year(time) Year,month(time) Mon,sum(quantity) Amount from tb group by Year,Mon 2 =A1.id(Year) 3 =A1.align@a(12,Mon) 4 =create(m,\${A2.(“Amount_for_”+string(~)).string()}) 5 >A3.run(A4.record(#|A2.(ifn(A3.~.select@1(Year==A2.~).Amount,0))))

A1 performs a SQL group and aggregate; A2 gets a list of years; A3 groups A1 according to the sequence of 12 months; A4 dynamically creates an empty two-dimensional table; A5 loops through every group of A3 and appends one record each time. Below is the result of A3: A4 gets the final result, as shown below: Transposition with computed columns

Below is a selection from the table tb1: You need to generate the total number of projects before a specified date in the current month. For instance, enter 2015-01-03 and here’s the desired result: esProc script:

 A 1 \$select * from tb1 where ddate>=? and ddate<=?; pdate@m(d_date) ,d_date 2 =create(Prjno,Subtask,\${periods(pdate@m(d_date),d_date,1).string()}) 3 =A1.group(Prjno,Subtask) 4 >A3.run(A2.record(Prjno|Subtask|~.groups(Ddate;sum(Num):Num).(Num)))

Query the source data and group it; loop through every group to insert one record to the empty two-dimensional table each time.
Transposition with dynamic locating
In the database table tb, every 3 records with the same userid are a group. You need to transform these groups into rows. Below is a selection from the table tb: The desired result: esProc script:

 A 1 \$select * from tb 2 =create(userid,type,hair,weight) 3 =A1.group(userid) 4 >A3.run(A11.record(userid|(t=~.align@n([“hair”,”weight”],type).(descr))(3)|t(1)|t(2) ))

The align function aligns data with members of a set ([‘hair’,’weight’]); @n means placing the unaligned data in a separate row. As with this example, such a row is composed of records corresponding to mouse and dog. Here’s the result: Column-to-row transposition with three related tables

There are three tables – Students table, Exam table and Retest table – which are related between each other through stu_id, as shown below:

Students Exam Retest You need to query the three tables to get the score of each subject, the total score and the retest score for every student. Below is the desired result: esProc script:

 A 1 \$select t1.stu_id,t1.stu_name,t2.subject,t2.score score1,t3.score score2 from stutents t1 left join exam t2 on  t1.stu_id=t2.stu_id left join retest t3 on t1.stu_id=t3.stu_id and t2.subject=t3.subject order by t1.stu_id,t2.subject 2 =A1.group(stu_id) 3 =A1.group(subject) 4 =create(stu_id,stu_name,\${(A3.(~.subject+”_score”)|”scores(scoresSum)”|A3.(~.subject+”Retest “)).string()}) 5 >A2.run(A4.record([stu_id,stu_name]|A3.(~(A2.#).score1)|A2.sum(score1)|A3.(~(A2.#).score2)))