Related Computing in esProc – Alignment Grouping and Enumeration Grouping

esProc 940 0

In Related Computing in esProc – Data Grouping and Summarizing, we learned how to group data in a table as needed and to summarize the grouped data. Please note that, in data grouping and summarizing, data will be grouped by a single or multiple fields or an expression in ascending order based on equivalent values. Alignment grouping or enumeration grouping will be used in data analysis when sometimes data need to be sorted according to a specified order, or to be grouped by specified conditions.

1. Alignment

Usually sort function is used to sort data in a sequence or a table sequence, with a result being listed either in ascending order or descending order. If data need to be sorted in a specified order, esProc provides align function. For example:

  A B
1 [three,one,four,six,two] [one,two,three,four,five,six]
2 =A1.sort() =A1.align(B1)
3 =demo.query(“select EID,NAME,GENDER, STATE,DEPT from EMPLOYEE”) [California,Texas,Florida,Illinois]
4 =A3.align(B3,STATE)  

The sequence in A1 is composed of string members, and B1 provides a base sequence for alignment:

esProc_related_alignment_group_2

Both A2 and B2 re-sort members of the sequence in A1 according to the sequence in B1. A2 uses the common sort function, while B2 uses align function. Results of A2 and B2 are as follows after computing:

esProc_related_alignment_group_3

sort function only sorts the members alphabetically in ascending or descending order, while align function sorts them according to the positions of members of the base sequence. If there is not a corresponding one for a member in the base sequence, the corresponding position will be null.

align function is generally used to sort records of a table sequence or a record sequence. For example, A3 selects some employees, and A4 gets from them the employees of some states according to the specified order. The data in A3 are as follows:

esProc_related_alignment_group_4

According to the order of the sequence in B3, A4 gets the employees of some states:

esProc_related_alignment_group_5

With the alignment operation, the first employees of these states have been retrieved and a record sequence composed of these records has been returned.

2. Alignment grouping

In the previous example, only the first eligible row for each state has been got after the alignment operation. In many cases, however, we need to get all the eligible data. This requires the alignment grouping which can be realized with align@a. For example:

  A B
1 =demo.query(“select EID,NAME,GENDER, STATE,DEPT from EMPLOYEE”) [California,Texas,Florida,Illinois]
2 =A1.align@a(B1,STATE)  
3 =A2.new(STATE,~.count(GENDER==”M”):Male,~.count(GENDER==”F”):Female)  
4 =A1.align@n(B1,STATE)  

By adding @a option to the align function, A2 gets all eligible records for each state:

esProc_related_alignment_group_7

The return value of alignment grouping is a sequence consisting of all the groups, as the result of A2, which is a sequence composed of sub-sequences. Similar to the result of data grouping using group function, the result of alignment grouping using align@a is often used for further computation. For example, A3 further summarizes the data in A2 to get the number of male and female employees:

esProc_related_alignment_group_8

In alignment grouping, there may be some data that cannot be matched with members of the base sequence. For this problem, esProc provides another option for alignment grouping – align@n, the use of which will put all members that haven’t corresponding ones in the base sequence into a separate group. As the grouping result of A4 shown below:

esProc_related_alignment_group_9

By comparing the results of alignment grouping of A2 and A4, we can notice that the result of A4 has one more group which stores the records the employees who are not from the four specified states.

The base sequence for alignment grouping can be got by computing. For example:

  A
1 =demo.query(“select EID,NAME,GENDER, STATE,DEPT from EMPLOYEE”)
2 =A1.groups(STATE;count(~):Count)
3 =A2.sort(Count:-1)
4 =A1.align@a(A3:STATE,STATE)

A2 computes the number of employees of each state using groups function for data grouping and summarizing, and then A3 sorts the result by the number of employees in descending order. Results of A2 and A3 are as follows:

esProc_related_alignment_group_11

A4 gets the states from the result of A3 and computes the base sequence for alignment. The result is as follows:

esProc_related_alignment_group_12

Another method of alignment grouping is to divide data into n groups using P.align(n,y), and then directly compute the group numbers corresponding to these groups according to the grouping expression y. For example:

  A
1 =demo.query(“select EID,NAME,GENDER, STATE,DEPT from EMPLOYEE”)
2 =A1.align@a(26,asc(left(NAME,1))-64)
3 [HR,R&D,Finance,Marketing]
4 =A1.align@a(A3.len(),A3.pos(DEPT))

In the expression =A1.align@a(26,asc(left(NAME,1))-64) in A2, left(NAME,1) gets the initials of the names of the employees, which are converted by asc function into ASCII code, and, with the subtraction of 64 from it, the initials will correspond to the group numbers from 1 to 26. When the code for alignment grouping is executed, the following result will be got:

esProc_related_alignment_group_14

The employees in the same group have the same initials.

Sometimes position functions, like pos and pmax, are used to compute the group numbers while P.align(n,y) is used to perform alignment grouping. A4 groups the employees in alignment according to the sequence of specified departments. The result is as follows:

esProc_related_alignment_group_15

With group function for data grouping, each group of data will be sorted by values of the field for grouping in ascending order; while the alignment grouping can produce a result sorted in specified order.

3. Find satisfied condition(s)

Sometimes data are to be grouped not by the equivalent values but by some conditions, like the range of the order amount and the employees’ age groups.

The grouping conditions can be represented by strings, like “?>3“, “[2,3,5,7].pos(?)>0“, and etc. penum function is used to judge which condition(s) a number can satisfy in a sequence formed by strings of conditional expressions. For example:

  A
1 [?>=85,?>=70,?>=60]
2 =A1.penum(100)
3 =A1.penum(66)
4 =A1.penum(54)

There is a sequence consisting of strings of conditional expressions in A1:

esProc_related_alignment_group_17

Respectively, A2 and A3 find the condition(s) 100 and 66 satisfy. Results are as follows:

esProc_related_alignment_group_18

As can be seen from the result, when a certain number satisfy multiple conditions, only the sequence number of the first satisfied condition will be returned. 100, for example, can satisfy all the three conditions, but the returned result is 1.

As the number 54 in the expression in A4 doesn’t satisfy any of the conditions, the returned result will be null.

Similar to the alignment grouping, penum function also uses @n option, which returns k+1 when data don’t satisfy any of the k conditions. In particular, if a conditional expression is null, it can be satisfied by any data. For example:

  A
1 [?>=85,?>=70,?>=60]
2 =A1.penum@n(54)
3 [?>=85,?>=70,?>=60,null]
4 =A1.penum(54)

Results of A2 and A4 are as follows:

esProc_related_alignment_group_20

The number 54 in A2 doesn’t satisfy any of the three conditions, so penum@n function is used to return a result of 4. But in A4, it satisfies the fourth condition – a null. Please note if the null is listed in the conditions, it must be put at the end.

penum function also uses @r option to represent that the satisfied conditions for different numbers can be identical. In this case, a sequence composed of the sequence numbers of all satisfied conditions will be returned. For example:

  A
1 [?>=85,?>=70,?>=60,null]
2 =A1.penum(100)
3 =A1.penum(66)
4 =A1.penum(54)

A2, A3 and A4 return respectively a sequence composed of the sequence number(s) of the condition(s) 100, 66 and 54 satisfy. Results are as follows:

esProc_related_alignment_group_22

Note: Though the condition null is defined to be satisfied by any data, its sequence number won’t appear in the result if one (or above) more conditions can also be satisfied when penum@r function is doing its job. You can also notice that, when @r option works, the returned result is a sequence rather than a number even if there is only one satisfied condition.

4 . Enumeration grouping

Suppose there is a sequence E consisting of strings of conditional expression. You can group members of another sequence P using P.enum(E,y) function. The operation will compute the expression y using each member of sequence P, judge which condition(s) each computed result satisfies, and group members of P according to conditions they satisfy. By default, members of P will be grouped according to their first satisfied conditions. For example:

  A
1 =demo.query(“select EID,NAME,GENDER, BIRTHDAY,SALARY from EMPLOYEE”)
2 [?>=15000,?>=12000,?>=9000]
3 =A1.enum(A2,SALARY)

A1 gets a table sequence and A2 is the sequence of enumeration conditions. They are as follows:

esProc_related_alignment_group_24

According to the sequence of conditions, the result of enumeration grouping in A3 is as follows:

esProc_related_alignment_group_25

It can be seen that the data of each group produced by enumeration grouping don’t have the same value of SALARY, but they satisfy the same condition. Similarly, though a row may satisfy both the first condition and the other two conditions, it will, by default, be put into the group where only the first one is satisfied. For the data that don’t satisfy any of the conditions, like the first and the fourth row , they won’t be grouped.

Similar to the case where penum function is used, you can append a null condition at the end, under which all data that don’t satisfy the previous conditions can be placed, while performing enumeration grouping. Or you can use enum@n, which plays a similar role. For example:

  A
1 =demo.query(“select EID,NAME,GENDER, BIRTHDAY,SALARY from EMPLOYEE”)
2 [?>=15000,?>=12000,?>=9000]
3 =A1.enum@n(A2,SALARY)
4 [?>=15000,?>=12000,?>=9000,null]
5 =A1.enum(A2,SALARY)

Results of A3 and A5 are same:

esProc_related_alignment_group_27

It can be seen by comparing the result with that of the previous example, both the use of enum@n function and the appending of null at the end of the sequence of conditions will put all data that don’t satisfy the other conditions into a separate group.

By default, it is assumed that there are no overlapped groups when enum function is used to compute, that is, each member of P won’t satisfy two conditional expressions at the same time. @r option is needed if some members need to be grouped more than once. For example:

  A
1 =demo.query(“select EID,NAME,GENDER, BIRTHDAY,SALARY from EMPLOYEE”)
2 [?>=15000,?>=12000,?>=9000]
3 =A1.enum@r(A2,SALARY)

A3 performs overlapped enumeration grouping on members using enum@r. Result is as follows:

esProc_related_alignment_group_29

As can be seen from the result, members of the first group appear in all the three groups and members of the second group appear repeatedly in the latter two groups after the data are overlappedly grouped.

FAVOR (0)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website