esProc Simplifies Order-related SQL-style Computations: Examples

Blog 1119 0

It is difficult for SQL to handle order-related computations, for it does not support ordered sets. But as these computations are very common in real-world business, there are a lot of online discussions related to them. For example:

http://stackoverflow.com/questions/29750567/count-several-rows-as-one-if-the-next-row-also-are-less-then-a-value

https://www.linkedin.com/groups/Hello-All-can-anyone-help-86080.S.5904859230397018116?trk=groups_items_see_more-0-b-ttl

https://www.linkedin.com/groups/Hi-All-PLSQL-Developers-Please-4251681.S.5939282914863783936?trk=groups_items_see_more-0-b-ttl

http://stackoverflow.com/questions/29757904/my-sql-rally-up-rally-down

https://www.linkedin.com/groups/What-is-best-way-shuffle-4251681.S.5946674559905198082?trk=groups_items_see_more-0-b-ttl

      

Usually SQL approaches order-related computations by:

1.  Using window functions

In most cases, use of window functions can help simplify the SQL if the working database (such as Oracle MSSQL) supports window functions. The dependence on database support limits their working sphere.

2. Using complex statements

If the database does not support window functions, you need to use very complicated nested subqueries in composing SQL statements. Of course some databases (such as MySQL) that support variables and ordered execution could simplify certain part of the algorithm.

3.  Composing a stored procedure

But in handling a complicated scenario, it is really not easy to compile queries for the stored procedure step by step.

If the main program is Java-based, you can use esProc (free version is available) to assist the computing process. esProc supports ordered data. It is much simpler to perform order-related calculations with esProc than with SQL. Here are some examples.

Calculating year-on-year growth rate and link relative ratio

Inter-row calculations (such as calculating year-on-year growth rate and link relative ratio) are difficult to be performed if the current database does not support window functions. In that case you need to change the way of approaching the problem by performing join operations. But by doing so, the code becomes unreadable and inefficient. Even if you can use the window functions, you still have to write the nested subqueries, producing quite lengthy SQL statements.

Let’s look at how esProc handles the inter-row calculation. The sales table stores orders data in years. You need to calculate the link relative ratio and year-on-year growth rate of each month in a specified time period based on this table. Below is a selection of the source data:

esProc_sql_order_relate_1

esProc script:

  A
1 =esProc.query(“select sum(Amount) mAmount from sales group by year(OrderDate), month(OrderDate) where OrderDate>=? and OrderDate<=?”,begin,end)
2 =A2.derive(mAmount/mAmount[-1]:lrr,null:yoy)
3 =A3.sort(m)
4 =A4.run(if(m==m[-1],mAmount/mAmount[-1],null):yoy)

A1: Query the database by the specified time period, and group Amount by the year and month in OrderDate. Both begin and end are external parameters. Suppose begin=”2011-01-01 00:00:00″ and end=”2014-07-08 00:00:00″.

A2: Add a new field lrr to A1’s table for holding the monthly link relatives and the year-on-year growth rate of each month. The expression representing the former is mAmount/mAmount[-1]. esProc supports using [N] or [-N] to represent the Nth row after or before the current row. In the expression, mAmount represents the amount of the current month and mAmount[-1] represents that of the previous month. Note that the link relative ratio value of the first month (here it is January 2011) is null.

A3: Sort A2 by the month and the year to calculate year-on-year growth rate. The complete expression should’ve been =A2.sort(m,y), but as A2 has been sorted by the year, you can just sort it by the month with A2.sort(m), which is more efficient.

A4: Calculate year-on-year growth rate of the sales amount based on A3. Note that the calculation can only be performed between the same months.

The execution result of the esProc script can be the data source of the report, or can be called through JDBC from the Java application. The code for Java to call the esProc script is as follows:

     Class.forName(“com.esproc.jdbc.InternalDriver”);

con= DriverManager.getConnection(“jdbc:esproc:local://”);

//Call esProc script (which is similar to the stored procedure); p1 is the name of the script file

st =(com. esproc.jdbc.InternalCStatement)con.prepareCall(“call p1()”);

st.setObject(1,”2011-01-01 00:00:00″);

st.setObject(2,”2014-07-08 00:00:00″);

 

//Execute the script

st.execute();

//Get the result set

ResultSet rs = st.getResultSet();

……

The returned value is a ResultSet object in accordance with JDBC standard. The method of calling an esProc script is the same as that of accessing a database. Programmers can master it fast as long as they are familiar with JDBC.

More details about deploying esProc JDBC and calling script through it can be found in esProc Integration & Application: Java Invocation.

A special type of inter-row calculation

There is another type of inter-row calculation, which will be explained through an example. Below is the source data:

esProc_sql_order_relate_3

Target: To add columns store1 and store2. For the first record, the rule is store1=store and store2=store. From the second record up, it would be store1=store+store1[-1] and store2=store1+store2[-1] in which [-1] represents the previous record.

The targeted result:

esProc_sql_order_relate_4

esProc script:

  A
1 $select ID,store from tb order by ID
2 =A1.new(ID,store,store+store1[-1]:store1,store1+store2[-1]:store2)

A1: Execute the SQL statement to retrieve data.

A2: Create the result set and set the rule that store1=store+store1[-1] and store2=store1+store2[-1]. esProc uses [-1] to represent a position relative to the current one, that is, the previous record.

SQL approach for comparison:

SELECT T1.ID,T1.store

    ,SUM(DISTINCT T2.store)store1

    ,SUM(T3.store)store2

FROM test T1

    JOIN test T2 ON T1.ID>=T2.ID

    JOIN test T3 ON T2.ID>=T3.ID

GROUP BY T1.ID,T1.store

ORDER BY T1.ID

Calculating continuous rising/declining values

Below is the stock records, based on which you need to count the number of continuous days when Price values are positive or negative.

esProc_sql_order_relate_7

The targeted result:

esProc_sql_order_relate_8

esProc script:

  A
1 $select Date,Price from stock order by Date
2 =A1.derive(if(Price*Price[-1]<0,1,result[-1]+1):result)

A1: Execute the SQL statement to retrieve data and sort it by Date

A2: Add a result field. If both the current Price value and the previous Price value are positive or negative, add 1 to the result value; otherwise the result value remains 1.

esProc uses [-1] to represent the previous record relative to the current one, (-2) to reference the second-to-last record and {-1,1} to represent a dynamic interval. They can help simplify the complicated inter-row calculations.

Querying continuous same values

Below is the source data:

esProc_sql_order_relate_10

Target: To get the num values that have appeared continuously for at least three times.

The targeted result:

esProc_sql_order_relate_11

esProc script:

  A
1 $select id,num from tb order by id
2 >lx=1
3 =A1.select(lx=if(num==num[-1],lx+1,1):3).(num)

A1: Execute the SQL statement to retrieve data.

A2: Create the variable lx to which the initial value assigned is 1.

A3: Judge if the num value of the current record is the same as that of the previous record. If the result is true, add 1 to lx. Finally get num values whose corresponding lx value is 3.

Querying neighboring record

Below is the source data:

esProc_sql_order_relate_13

Target: To select the record whose Property2 value is greater than 100 and find its next record.

The targeted result:

esProc_sql_order_relate_14

esProc script:

  A
1 $select * from tb
2 =A1.pselect@a(Property2>100)
3 =A1(A2.union([~,~+1]))

A1: Execute the SQL statement to retrieve data.

A2: Get the position of the record where Property2>100.

A3: Get the record in the desired position and the one next to it based on A2’s sequence of position.

Counting number of rows by specified interval

Below is the table data:

esProc_sql_order_relate_16

Target: To count the number of rows where value<10 (or whatever value). If the value is less than 10 in multiple continuous rows, count them one row. Thus the computing result based on the above data should be 3.

esProc script:

  A
1 $select id,value from tb order by id
2 =A1.group@o(value<10).count(value<10)

A1: Execute the SQL statement to retrieve data.

A2: Group rows according to the condition that value>10 and value<10, in which @o means only comparing the neighboring rows. Then count the number of groups where value<10.

Different from SQL grouping which must be followed by an aggregate, esProc group operation can retain the members of each group for potential use. This design helps make an SQL algorithm simpler.

SQL approach:

select count(*)

from (select t.*,

             if(@pp := @prev,

                if(@prev := value, @pp, @pp)

               ) as prev_value

      from table t cross join

           (select @prev := NULL) vars

      order by id

     ) t

where value > 10 and (prev_value is null or prev_value < 10);

Filtering on dynamic intervals

Below is the source data:

esProc_sql_order_relate_19

Target: To sort the records by ID in ascending order and find the records where DDATE is non-ordered, that is, DDATE hasn’t been sorted in ascending order.

The targeted result:

esProc_sql_order_relate_20

esProc script:

  A
1 $select ID,DDATE from tb order by ID
2 =A1.select( DDATE!=max(DDATE{,0}) || DDATE!=min(DDATE{0,}))

A1: Execute the SQL statement to retrieve data and sort it by ID.

A2: Select the records where DDATE value is not the biggest compared with all previous records, and where DDATE value is not the smallest compared with all the following records. In this statement, DDATE{,0} represents a set consisting of all DDATE values from the first to the current record and DDATE{0,} represents a set consisting of all DDATE values from the current record to the last.

SQL approach:

WITH T1 as (

select id,ddate,max(ddate)over(order by id) d1

from t0042),T2 as (

select id,min(ddate)over(order by id desc) d2

from t0042)

select T1.id,T1.ddate

from T2,T1

WHERE T1.id=T2.id and (t1.ddate<>T1.d1 or t1.ddate<>T2.d2)

Finding lost values for an interval

Below is the source data:

esProc_sql_order_relate_23

Target: To find the lost numbers in an interval of numbers generated by the smallest value and the biggest value in field A

The targeted result:

esProc_sql_order_relate_24

esProc script:

  A
1 =db.query@i(“select A from table1 order by A”)
2 =[to(A1(1),A1.m(-1)),A1].merge@d()

A1: Execute the SQL statement to retrieve data and sort it by A.

A2: Generate an interval consisting of continuous numbers according to the value of the first record, and then get difference of this interval and A1.

SQL approach:

SELECT 1000+T2.number

FROM TB T1 RIGHT JOIN master..spt_values T2 ON T1.A=T2.number+1000

WHERE T2.type=’P’AND T2.number>0 AND 1000+T2.number<=(SELECT MAX(A)FROM TB) AND T1.A IS NULL

Merging intervals

Below is the source data:

esProc_sql_order_relate_27

Target: To judge whether intervals generated by values of num1 and num2 in each record are overlapped. Merge them if the result is true.

The targeted result:

esProc_sql_order_relate_28

The way of approaching it is to judge whether a row can merge with the previous row. If the result is false, make it a separate group (+1); if true, group them into one (+0). Finally get the biggest and the smallest values from each group to generate the interval.

esProc script:

  A
1 $select num1,num2 from tb order by num1
2 =a=0
3 =A1.groups(a+=if(num1>num2[-1],1,0):xh;min(num1):num1,max(num2):num2)

A1: Execute the SQL statement to retrieve data, and sort it by num1.

A2: Create a temporary variable a, and assign 0 to it as its initial value.

A3: In the sorted table, compare each num1 value with the num2 value in the previous row. If the num1 value is bigger, start a new group with the row; otherwise merge the row to which num1 belongs with the previous row.

SQL approach:

select row_number()over(order by rn) xh,

       min(num1) num1,

       max(num2) num2

  from

(

select num1,

       num2,

       last_value(case when rn is not null then rn end ignore nulls)over(order by rownum) rn

  from    

(

select num1,

       num2,

       lag(num2)over(order by xh),

       (case when num1 – lag(num2)over(order by xh) < 0 then null else xh end) rn

  from tt

) t

) t1

group by  rn

Generating intervals

Below is the source data:

esProc_sql_order_relate_31

Target: To divide RBD_VALUE values into segments and compose them into strings, which are like A-F,H,J-K,N-O,Q-S,U-V,X-Z.

Rules:

If RBD_VALUE= “ALL”, then discard the value;

According to the alphabetical order if certain RBD_VALUE values are consecutive letters , then make them a segment with both the first and last letter included and “-” between them, like “A-F”;

If two values of RBD_VALUE are inconsecutive, like K and N between which L and M are absent, then unite the former with the letter or letters before it and join the latter with the letter or letters after it. Separate the two segments with a comma. This is like “J-K”, “N-O”;

If a value in RBD_VALUE has not consecutive letters both before and after it, then take it as a separate segment, like the letter H. G and I are absent respectively before and after it.

The targeted result:

esProc_sql_order_relate_32

esProc script:

  A
1 $select * from tb
2 =A1.align(26.(char(64+~)),RBD_VALUE)
3 =A2.group@o(!RBD_VALUE)
4 =A3.select(RBD_VALUE)
5 =A4.(RBD_VALUE+if(~.len()>1, “-“+~.m(-1).RBD_VALUE))

A1: Execute the SQL statement to retrieve data.

A2: Align RBD_VALUE field in A1 with the alphabet.

A3: Group rows in A2. Group rows where RBD_VALUE is null together and put the other rows where RBD_VALUE is non-null into another group. @o means performing merge-style data grouping, which only compares the neighboring values.

A4: Select the group in which RBD_VALUE is non-null.

A5: Compose strings according to the number of members in each group. ~.m(-1) means getting the last row from the current group. 

FAVOR (0)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website