The foreign key is a common concept for various relational databases. For a relational table, you can define one or multiple fields as the foreign key, through which an association with the data in another table can be created. For example, in a game scoring report, often players are recorded with their IDs. To get the detailed information of the players from another player profile table, you should query the player profile table according to the player IDs. In this case, the player ID field can be set as the foreign key.

The foreign key can link two relational tables and help ensure data consistency, facilitating the cascade operations. Take the above game scoring report as an example. If the player ID field is defined as the foreign key, you can obtain the names, nationalities, ages, and other information of the players while getting the game scoring records.

In this example, the foreign key definition is in effect equivalent to referencing the records in the player profile table from the game scoring table. Thus, **the essence of a foreign key lies in its mapping of records**.

Unlike a database, esProc sets no constraints on the data type of field values in records; any type of value is allowed. Thus a field of an esProc table sequence can be assigned the referenced records themselves, establishing a foreign key association straightforwardly.

Here’s an example of creating an employee education background table comprising the employee ID field and education field. The education field values are set in order through a simple loop, just for illustrating how to use the foreign key reference in esProc.

A | B | C | |

1 | =create(ID,Education) | [BS,BA,MS,MA,MBA,MEE,MPA,PhD] | =B1.len() |

2 | for 500 | =B1((A2-1)%C1+1) | |

3 | >A1.insert(0,A2,B2) |

To link the education background table with the EMPLOYEE table, a foreign key field Info can be added, whose values reference the records of the EMPLOYEE table.

A | B | C | |

1 | =create(ID,Education) | [BS,BA,MS,MA,MBA,MEE,MPA,PhD] | =B1.len() |

2 | for 500 | =B1((A2-1)%C1+1) | |

3 | >A1.insert(0,A2,B2) | ||

4 | $ select * from EMPLOYEE | =A1.derive(A4.select@1(EID==ID):Info) |

Below is the table sequence in B4 in which the Info field references the employee records. You can double-click the field values to view the detailed information.
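As a rough analogy outside esProc, the idea of a field that holds the referenced record itself can be sketched in Python. This is only an illustration: the sample rows and the helper `first_match` are hypothetical and are not part of esProc or of the EMPLOYEE demo data.

```python
# Hypothetical sample rows standing in for the EMPLOYEE table.
employees = [
    {"EID": 1, "NAME": "Rebecca", "STATE": "California"},
    {"EID": 2, "NAME": "Ashley", "STATE": "New York"},
]
education = [
    {"ID": 1, "Education": "BS"},
    {"ID": 2, "Education": "BA"},
]


def first_match(records, key, value):
    """Return the first record whose `key` equals `value`, or None."""
    return next((r for r in records if r[key] == value), None)


# Add an Info field that holds the employee record itself,
# roughly what derive(...:Info) with select@1 does in the grid above.
with_info = [
    dict(row, Info=first_match(employees, "EID", row["ID"]))
    for row in education
]

print(with_info[1]["Info"]["STATE"])  # New York
```

Because `Info` refers to the same object as the employee row, no separate join step is needed to reach the employee's fields.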

Through record referencing, the foreign-key join queries are easier to perform and the query statements become clearer and more intuitive. A field whose values are records can be handled as any other fields and the records can be processed normally. For example, the following lists the ID, full name, education background and department for employees from the state of Ohio.

A | B | C | |

1 | =create(ID,Education) | [BS,BA,MS,MA,MBA,MEE,MPA,PhD] | =B1.len() |

2 | for 500 | =B1((A2-1)%C1+1) | |

3 | >A1.insert(0,A2,B2) | ||

4 | $ select * from EMPLOYEE | =A1.derive(A4.select@1(EID==ID):Info) | |

5 | =B4.select(Info.STATE=="Ohio") | =A5.new(ID,Info.NAME+" "+Info.SURNAME:FullName,Education,Info.STATE:State,Info.DEPT:Dept) |

The following is B5’s result:

esProc allows assigning not only the records of other table sequences but also those of the current table sequence to the foreign key field.

In esProc, the **A.derive()** function can assign records to a newly-added field when generating a table sequence. And the **A.run()** function can assign records to an existing field. For example:

A | B | C | |

1 | =create(ID,Education) | [BS,BA,MS,MA,MBA,MEE,MPA,PhD] | =B1.len() |

2 | for 500 | =B1((A2-1)%C1+1) | |

3 | >A1.insert(0,A2,B2) | ||

4 | $ select * from EMPLOYEE | >A1.run(ID=A4.select@1(EID==A1.ID)) |

B4 modifies the employee ID field in the education background table sequence to the corresponding records in the EMPLOYEE table. After that, A1’s table sequence is as follows:

If the primary key of the corresponding table is a single field, you can also use the **A.switch()** function:

A | B | C | |

1 | =create(ID,Education) | [BS,BA,MS,MA,MBA,MEE,MPA,PhD] | =B1.len() |

2 | for 500 | =B1((A2-1)%C1+1) | |

3 | >A1.insert(0,A2,B2) | ||

4 | $ select * from EMPLOYEE | >A1.switch(ID,A4) |

The table sequence in A1 is the same, but B4’s expression is much more concise. Note that the **switch()** function links the specified field and the primary key in another table sequence automatically. If the primary key of the to-be-related table sequence isn’t the first field, the **T.primary()** function is needed to set a primary key for it. The following example creates an association between the STATE field of the EMPLOYEE table and the STATES table:

A | B | C | |

1 | =create(ID,Education) | [BS,BA,MS,MA,MBA,MEE,MPA,PhD] | =B1.len() |

2 | for 500 | =B1((A2-1)%C1+1) | |

3 | >A1.insert(0,A2,B2) | ||

4 | $ select * from EMPLOYEE | >A1.switch(ID,A4) | |

5 | $ select * from STATES | >A5.primary(NAME) | >A4.switch(STATE,A5) |

Now you can view the state information from the employee information referenced in A1’s table sequence:

At this point, you can handle multi-table association queries or computations, such as finding the employees who come from states with a population below 1,000,000:

A | B | C | |

1 | =create(ID,Education) | [BS,BA,MS,MA,MBA,MEE,MPA,PhD] | =B1.len() |

2 | for 500 | =B1((A2-1)%C1+1) | |

3 | >A1.insert(0,A2,B2) | ||

4 | $ select * from EMPLOYEE | >A1.switch(ID,A4) | |

5 | $ select * from STATES | >A5.primary(NAME) | >A4.switch(STATE,A5) |

6 | =A1.select(ID.STATE.POPULATION< 1000000) |

As can be seen, the record-referencing foreign key makes the multi-table association query syntax more concise and readable with an improved computing speed.
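The effect of chaining two key-to-record replacements can be sketched in Python as an analogy. The function `switch` below is a hypothetical stand-in written for this illustration, not the esProc implementation, and the sample rows and population figures are invented.

```python
# Invented sample data; population figures are placeholders.
states = [
    {"NAME": "Ohio", "POPULATION": 11_000_000},
    {"NAME": "Wyoming", "POPULATION": 500_000},
]
employees = [
    {"EID": 1, "NAME": "Rebecca", "STATE": "Ohio"},
    {"EID": 2, "NAME": "Ashley", "STATE": "Wyoming"},
]
education = [{"ID": 1, "Education": "BS"}, {"ID": 2, "Education": "MA"}]


def switch(rows, field, target, key):
    """Replace each row's key value in `field` with the matching target record."""
    index = {rec[key]: rec for rec in target}  # one lookup table per call
    for row in rows:
        row[field] = index.get(row[field])     # None when nothing matches


switch(education, "ID", employees, "EID")      # ID now holds an employee record
switch(employees, "STATE", states, "NAME")     # STATE now holds a state record

# Two-level dereference, comparable to ID.STATE.POPULATION in the grid.
small_state = [
    e for e in education
    if e["ID"]["STATE"]["POPULATION"] < 1_000_000
]
print([e["Education"] for e in small_state])   # ['MA']
```

Once the key values are replaced by references, the filter reads as a plain attribute path and no join condition has to be restated.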

The **switch()** function can also associate the foreign key with a field that is not the primary key. For example:

A | B | C | |

1 | =create(ID,Education) | [BS,BA,MS,MA,MBA,MEE,MPA,PhD] | =B1.len() |

2 | for 500 | =B1((A2-1)%C1+1) | |

3 | >A1.insert(0,A2,B2) | ||

4 | $ select * from EMPLOYEE | >A1.switch(ID,A4) | |

5 | $ select * from STATES | >A4.switch(STATE,A5:NAME) |

B5 specifies that the NAME field in A5’s STATES table sequence be related, instead of resetting its primary key, when calling the **switch()** function to create an association. The association is successfully created, and A1’s table sequence is as follows:

Note that the EMPLOYEE table’s STATE field value in blue stands for a state record. Here the STATES table’s primary key is a different field.

When linking two table sequences through the foreign key, sometimes not all key values can find their matching records. For example:

A | B | |

1 | [1,3,5,7] | [Alberta,Ontario,Nova Scotia,Manitoba] |

2 | $ select EID,NAME,SURNAME,STATE from EMPLOYEE | >A2(A1).field(4,B1) |

3 | $ select NAME,ABBR,CAPITAL from STATES | >A2.switch(STATE,A3) |

B2 modifies the STATE field values in several records of the EMPLOYEE table. Below are A2’s table sequences before and after the execution of B2:

The first field of STATES table sequence that A3 generates is the NAME. And B3 uses the **switch()** function to directly create an association. After the association, A2’s table sequence is as follows:

With the modification, these modified STATE values can’t find matching records in the STATES table and thus return nulls. To prevent this from happening, you can use **@i** option with the **switch()** function to return only the records that are matched successfully. For example:

A | B | |

1 | [1,3,5,7] | [Alberta,Ontario,Nova Scotia,Manitoba] |

2 | $ select EID,NAME,SURNAME,STATE from EMPLOYEE | >A2([1,3,5,7]).field(4,B1) |

3 | $ select NAME,ABBR,CAPITAL from STATES | >A2.switch@i(STATE,A3) |

In this case, the **switch()** function deletes those records that can’t find their matches, and A2’s table sequence is as follows:

But you might be interested only in those key values that can’t find matches. To get them, you can change B3’s code to **>A2.switch@d(STATE,A3)**. Now A2’s table sequence is as follows:
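The three behaviours described above (nulls for unmatched keys, keeping only matched records, keeping only unmatched records) can be compared side by side in a small Python sketch. The function `switch` and its `mode` parameter are hypothetical names chosen to mirror the options in the text; unlike esProc, this sketch returns new lists instead of modifying the table in place, and leaving the key value untouched in the `"d"` case is an assumption.

```python
def switch(rows, field, target, key, mode=""):
    """Link rows to target records; mode "i" keeps matches, "d" keeps non-matches."""
    index = {rec[key]: rec for rec in target}
    if mode == "d":
        # Keep only rows whose key value has no match; values stay as they were.
        return [row for row in rows if row[field] not in index]
    linked = [dict(row, **{field: index.get(row[field])}) for row in rows]
    if mode == "i":
        # Drop rows whose key value found no matching record.
        return [row for row in linked if row[field] is not None]
    return linked


states = [{"NAME": "Ohio"}, {"NAME": "Texas"}]
employees = [
    {"EID": 1, "STATE": "Ohio"},
    {"EID": 2, "STATE": "Alberta"},   # no such record in states
    {"EID": 3, "STATE": "Texas"},
]

plain = switch(employees, "STATE", states, "NAME")
matched = switch(employees, "STATE", states, "NAME", mode="i")
unmatched = switch(employees, "STATE", states, "NAME", mode="d")

print([row["EID"] for row in matched])     # [1, 3]
print([row["STATE"] for row in unmatched]) # ['Alberta']
```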

Apart from records, esProc also lets a field be assigned sets, that is, lets a foreign key field reference sets. In most cases, the sets referenced by a foreign key field are record sequences, each composed of certain records of another table sequence. Such an esProc foreign key field differs from the foreign key concept in databases: it is more like a SQL query in which a primary table references the data of one of its subtables.

Like a foreign key field assigned with records, a foreign key field can be assigned record sequences when it is added by the **derive()** function that generates a table sequence. For example:

A | |

1 | $ select NAME,ABBR from STATES |

2 | $ select * from EMPLOYEE |

3 | =A1.derive(A2.select(STATE==A1.NAME):Employees) |

A3 adds a foreign key field Employees to A1 and assigns records of the employees who come from each state to it, and returns a new table sequence. The values of the Employees field of A3’s table sequence are all record sequences, as shown below:

The foreign key field referencing sets can be directly called as any field of a table sequence. At the same time, computations can be performed on the set-type values of the foreign key field. The following example adds a Count field to calculate the number of the employees in each state:

A | |

1 | $ select NAME,ABBR from STATES |

2 | $ select * from EMPLOYEE |

3 | =A1.derive(A2.select(STATE==A1.NAME):Employees) |

4 | =A3.derive(Employees.count():Count) |

The total number of employees can be computed based on the foreign key field. Then the table sequence in A4 is as follows:

As with the foreign key field referencing records, you can also handle multi-table association queries or filtering operations with expressions that use a foreign key field referencing sets (that is, record sequences). To find the states having more than 50 employees, for example:

A | |

1 | $ select NAME,ABBR from STATES |

2 | $ select * from EMPLOYEE |

3 | =A1.derive(A2.select(STATE==A1.NAME):Employees) |

4 | =A3.derive(Employees.count():Count) |

5 | =A4.select(Employees.len()>50) |

By referencing sets in a foreign key field, the multi-table association query syntax becomes clearer and more intuitive. In A4’s table sequence, the sole purpose of introducing the Count field is to have a better viewing of the result. The field is useless for A5’s filtering operation. Below is A5’s result:
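A field whose value is a set of records can be imitated in Python with a list stored inside each row. The sketch below is an analogy only; the sample rows are invented, and the threshold is lowered from 50 to 1 so that the tiny sample produces a visible result.

```python
# Invented sample rows.
states = [{"NAME": "Ohio"}, {"NAME": "Texas"}, {"NAME": "Utah"}]
employees = [
    {"EID": 1, "STATE": "Ohio"},
    {"EID": 2, "STATE": "Texas"},
    {"EID": 3, "STATE": "Ohio"},
]

# Each state row gets an Employees field holding the list of matching records.
with_employees = [
    dict(s, Employees=[e for e in employees if e["STATE"] == s["NAME"]])
    for s in states
]

# A derived Count field, computed from the set-valued field.
with_count = [dict(s, Count=len(s["Employees"])) for s in with_employees]

# Filter directly on the set-valued field; Count is not needed for this.
busy = [s["NAME"] for s in with_employees if len(s["Employees"]) > 1]

print([s["Count"] for s in with_count])  # [2, 1, 0]
print(busy)                              # ['Ohio']
```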

The **switch()** function doesn’t apply to a table sequence field referencing sets, but the **A.run()** function can be used to assign the set-type values to an existing field. For example:

A | |

1 | =12.new(#:ID,string(date("2000-"+string(#)+"-1"),"MMMM"):Month,Employees) |

2 | $ select EID,NAME+' '+SURNAME FULLNAME, BIRTHDAY from EMPLOYEE |

3 | >A1.run(Employees=A2.select(month(BIRTHDAY)==A1.ID)) |

A3 assigns the null Employees field with the records of employees born in each month. Once the program is executed, the table sequence in A1 becomes as follows:

The result of grouping a record sequence is a sequence composed of sets, which is similar to a single field referencing sets. For example, the following groups the employee records by the birth month:

A | |

1 | $ select EID,NAME+' '+SURNAME FULLNAME, BIRTHDAY from EMPLOYEE |

2 | =A1.group(month(BIRTHDAY)) |

A2’s grouping result is as follows:

Like the result in the previous example, this grouping result can be used for further handling as necessary:

A | |

1 | $ select EID,NAME+' '+SURNAME FULLNAME, BIRTHDAY from EMPLOYEE |

2 | =A1.group(month(BIRTHDAY)) |

3 | =A2.new(#:Month,~.count():Count) |

4 | =A2.new(#:Month,~.minp(BIRTHDAY).FULLNAME:Oldest) |

A3 counts the employees born in each month. A4 lists the oldest employees born in each month. After computations, A3 and A4 get the following results:
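For readers more familiar with general-purpose languages, the same grouping idea (groups that keep their member records, then per-group aggregates) can be sketched in Python. The employee names and birthdays below are invented for the illustration.

```python
from datetime import date
from itertools import groupby

# Invented sample rows.
employees = [
    {"FULLNAME": "Rebecca Moore", "BIRTHDAY": date(1974, 11, 20)},
    {"FULLNAME": "Ashley Wilson", "BIRTHDAY": date(1980, 7, 19)},
    {"FULLNAME": "Rachel Johnson", "BIRTHDAY": date(1970, 11, 2)},
]


def birth_month(employee):
    return employee["BIRTHDAY"].month


# Each group keeps the full records, so any aggregate can be computed later.
groups = {
    month: list(members)
    for month, members in groupby(sorted(employees, key=birth_month), key=birth_month)
}

counts = {month: len(members) for month, members in groups.items()}
# The earliest birthday in a group belongs to its oldest member.
oldest = {
    month: min(members, key=lambda e: e["BIRTHDAY"])["FULLNAME"]
    for month, members in groups.items()
}

print(counts)  # {7: 1, 11: 2}
print(oldest)  # {7: 'Ashley Wilson', 11: 'Rachel Johnson'}
```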

With databases, creating appropriate indexes for tables can greatly increase query efficiency. Similarly, you can create index sequences for record sequences or table sequences in esProc to enhance the efficiency of querying data repeatedly.

For example, suppose you need to query the food order file **Order_Foods.txt** repeatedly.

A | B | C | |

1 | =file("Order_Foods.txt").import@t() | 50000 | 1000 |

2 | =A1(C1.(rand(B1)+1)).new(PName, Quantity) |

The following are records of food orders imported by A1 (altogether 50,000 records):

A2 gets 1,000 random records of food orders to list only the product names and purchase quantities for being used as the query conditions in the later test query (here duplicate values are allowed). A2’s data are as follows:

Then in order to explore the role of index sequences in speeding up a query, we’ll query the food order table in A1 according to the 1,000 product names in A2 with and without an index sequence respectively. We specify that only the first-found record will be returned.

First let’s look at the situation without an index sequence. Since records in **Order_Foods.txt** are ordered by **Date**, binary search cannot be used when performing the query by product names, otherwise errors will occur.

A | B | C | |

1 | =file("Order_Foods.txt").import@t() | 50000 | 1000 |

2 | =A1(C1.(rand(B1)+1)).new(PName, Quantity) | =now() | |

3 | =A2.(A1.select@1(PName==A2.PName && Quantity ==A2.Quantity)) | =interval@ms(B2,now()) |

Expressions in B2 and B3 get the current time through the **now()** function and measure the query time in milliseconds. A3 stores the query results as follows:

The estimated query time in B3 is as follows:

Then let’s move to the situation where an index sequence is used:

A | B | C | |

1 | =file("Order_Foods.txt").import@t() | 50000 | 1000 |

2 | =A1(C1.(rand(B1)+1)).new(PName, Quantity) | =now() | |

3 | =A2.(A1.select@1(PName==A2.PName && Quantity ==A2.Quantity)) | =interval@ms(B2,now()) | =now() |

4 | =A1.psort(PName, Quantity) | ||

5 | =A2.(A1(A4.select@b1(cmp(A1(~).([PName,Quantity]), A2.~.([PName,Quantity]))))) | =interval@ms(C3,now()) |

For the query, A4 creates an index sequence on both **PName** and **Quantity**, so binary search can be used to query based on the index. To compare the overall performance of the two situations fairly, the time taken to create the index sequence is included. The index sequence created by A4 is as follows:

As A5 uses binary search to query data, the query condition should be written in the form *x*==0. A5 gets the same results as A3:

The estimated query time in B5 is as follows:

By comparing results in B3 and B5, you see that the second method is much more efficient. That is to say, query speed can be significantly increased by using binary search based on an index sequence. Of course creating an index sequence is also one of the computing steps. The more you use an index sequence to query data, the more efficient the query becomes. It is not so necessary to create an index sequence if the computation is not query-intensive.
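The general technique (sort the positions rather than the data, then binary-search through the sorted positions) can be sketched in Python. This is an analogy, not esProc code: the sample orders are invented, positions are 0-based as usual in Python whereas esProc sequence numbers start at 1, and `find` is a hypothetical helper.

```python
from bisect import bisect_left

# Invented sample orders, deliberately not sorted by product name.
orders = [
    {"PName": "Tea", "Quantity": 5},
    {"PName": "Beer", "Quantity": 2},
    {"PName": "Tea", "Quantity": 1},
    {"PName": "Cake", "Quantity": 7},
]


def sort_key(rec):
    return (rec["PName"], rec["Quantity"])


# Index sequence: positions of the records, ordered by (PName, Quantity).
index = sorted(range(len(orders)), key=lambda i: sort_key(orders[i]))
sorted_keys = [sort_key(orders[i]) for i in index]


def find(pname, quantity):
    """Binary-search the index and return the first matching order, or None."""
    wanted = (pname, quantity)
    pos = bisect_left(sorted_keys, wanted)
    if pos < len(sorted_keys) and sorted_keys[pos] == wanted:
        return orders[index[pos]]
    return None


print(index)           # [1, 3, 2, 0]
print(find("Tea", 1))  # {'PName': 'Tea', 'Quantity': 1}
```

The original list keeps its order; only the index is sorted, which is why the cost of building it pays off only when many lookups follow.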

For specific databases and query modes, it is unnecessary to re-create the index sequence each time a query is executed. You can store the index sequence after it is created. For example:

A | |

1 | =file("Order_Foods.txt").import@t() |

2 | =A1.psort(PName, Quantity).new(~:Pos) |

3 | >file("OF_Index_PQ").export@b(A2) |

This way you just need to import the index file directly without having to re-create the index sequence for the next query.

A | B | C | |

1 | =file("Order_Foods.txt").import@t() | 50000 | 1000 |

2 | =A1(C1.(rand(B1)+1)).new(PName, Quantity) | =now() | |

3 | =file("OF_Index_PQ").import@b() | ||

4 | =A2.(A1(A3.select@b1(cmp(A1(A3.Pos).([PName, Quantity]), A2.~.([PName,Quantity]))).Pos)) | =interval@ms(B2,now()) |

Importing the stored index this way is faster than creating a new index sequence for each query.

When establishing a foreign key relationship with **switch** function for a table sequence or a cursor, each record needs to find its corresponding reference in the dimension table. This is similar to the query-intensive situation in the above example. esProc will create the index sequence automatically for the execution of **switch** function. For example:

A | B | |

1 | =file("PersonnelInfo.txt") | =demo.query("select STATEID, NAME, ABBR from STATES") |

2 | =now() | |

3 | =A1.cursor@t() | =A3.switch(State,B1:ABBR) |

4 | for 50 | =B3.fetch(1000) |

5 | >B3.close() | =interval@ms(A2,now()) |

6 | =now() | |

7 | =A1.cursor@t() | |

8 | for 50 | =A7.fetch(1000) |

9 | >B8.run(State=B1.select@1(ABBR==State)) | |

10 | >A7.close() | =interval@ms(A6,now()) |

From the 2nd to the 5th line, the **switch** function relates the personnel information in the cursor to the corresponding state information. The program fetches the association result 50 times, 1,000 rows each time. B5 computes the time taken to perform the processing. From the 6th to the 10th line, the rows are also fetched 50 times, 1,000 rows each time, and the **run** function loops through the records of each fetched table sequence to query the corresponding state data. B10 computes the time taken to perform this processing. The results of B5 and B10 are as follows:

In the example, processing is clearly more efficient with the **switch** function, even though there is only a small amount of state data. In fact, the bigger the dimension table, the greater the advantage of using the **switch** function.
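Why a prebuilt index helps can be seen by counting comparisons in a small Python sketch. This is an analogy under stated assumptions: a dictionary stands in for whatever index esProc builds internally, and the generated rows and the function names `link_by_scan` and `link_by_index` are invented for the illustration.

```python
# Generated sample data: 50 dimension records and 1,000 fact rows.
states = [{"ABBR": f"S{i}", "NAME": f"State {i}"} for i in range(50)]
people = [{"Name": f"P{i}", "State": f"S{i % 50}"} for i in range(1000)]


def link_by_scan(rows, dim):
    """Find each row's state by scanning the dimension list from the start."""
    comparisons = 0
    out = []
    for row in rows:
        match = None
        for rec in dim:
            comparisons += 1
            if rec["ABBR"] == row["State"]:
                match = rec
                break
        out.append(match)
    return out, comparisons


def link_by_index(rows, dim):
    """Build a lookup table once, then resolve each row with one lookup."""
    index = {rec["ABBR"]: rec for rec in dim}
    return [index.get(row["State"]) for row in rows]


scanned, comparisons = link_by_scan(people, states)
indexed = link_by_index(people, states)

print(comparisons)         # 25500 comparisons for the scan
print(scanned == indexed)  # True: same result, far fewer steps with the index
```

The scan cost grows with the size of the dimension list, while the indexed lookup stays roughly constant per row, which matches the observation that the advantage widens as the dimension table grows.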

Similar to creating the index sequence, an index file can be created to increase efficiency when retrieving data from a large binary file. With the **f.index(fi,k_{i},…)** function, you can first create an index file

An esProc set is an ordered set, so its members can be referenced by sequence numbers. Using sequence numbers flexibly makes the most of esProc’s capabilities and lets you handle a computation in a simpler and more efficient way.

For this reason, it is recommended to pass a sequence number or an integer sequence (ISeq) of sequence numbers as the parameter to certain esProc functions, such as **delete()**.

The simplest application is to access members with their sequence numbers directly. This is the same as handling an array in a programming language.

A | |

1 | [1,3,5,7,9] |

2 | =A1(1) |

3 | =A1(3) |

4 | >A1(2)=4 |

5 | >A1(4)=8 |

A2 and A3 get members at specified positions from A1. Here are their results:

A4 and A5 modify A1’s sequence. The following shows the changes step by step:

The **A.m(i)** function can get members backward or cyclically. It is a useful complement to the **A(i)** function.

A | |

1 | [1,3,5,7,9] |

2 | =A1.m(3) |

3 | =A1.m(-2) |

4 | =A1.m@r(6) |

5 | =A1.m@r(12) |

6 | =A1.m(6) |

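A2 and A3 use the **A.m()** function to get the members at the specified sequence numbers. The number -2 represents the second-to-last member. With the **@r** option, A4 and A5 get members cyclically when the sequence number exceeds the sequence’s length.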

Without the **@r** option, A6 returns the null value as the specified sequence number 6 exceeds the sequence’s length.
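The access rules described here (1-based positions, negative numbers counted from the end, optional wrap-around, null when out of range) can be mimicked in Python. The helper `m` below is a hypothetical imitation based only on the description in this section, not the esProc implementation; cyclic access is sketched for positive numbers only.

```python
def m(seq, i, cyclic=False):
    """Return the member at 1-based position i, following the rules described above."""
    n = len(seq)
    if cyclic:
        # Wrap around the end of the sequence (intended for positive i).
        return seq[(i - 1) % n]
    if i < 0:
        # Negative numbers count backward: -1 is the last member.
        i = n + 1 + i
    return seq[i - 1] if 1 <= i <= n else None


seq = [1, 3, 5, 7, 9]
print(m(seq, 3))                # 5
print(m(seq, -2))               # 7
print(m(seq, 6, cyclic=True))   # 1
print(m(seq, 12, cyclic=True))  # 3
print(m(seq, 6))                # None
```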

Additionally, esProc provides a series of functions whose names begin with **p** for finding the sequence numbers of members, as shown below:

A | |

1 | [3,5,1,9,7] |

2 | =A1.pos(5) |

3 | =A1.pmin() |

4 | =A1.pmax() |

5 | =A1.pselect(~%5==0) |

A2 finds the sequence number of a specified member. If there are multiple members with the same value, only the sequence number of the first one is returned. A3 and A4 respectively return the sequence numbers of the members having the minimum value and the maximum value. A5 finds the sequence number of the first member that is a multiple of 5. Results of A2 to A5 are as follows:

The pos() function returns null if a specified member can’t be found in a sequence. So it can be used to find whether a member is in a certain set or not.

A | |

1 | [3,5,1,9,7] |

2 | =A1.pos(1)!=null |

3 | =A1.pos(2)!=null |

Results of A2 and A3 are respectively as follows:

Through sequence numbers, you can obtain a subset of a set.

A | |

1 | [3,5,4,6,1] |

2 | =A1([1,3,5]) |

3 | =A1([3,5,2]) |

4 | =A1([4,1,3,1]) |

5 | >A1([1,3,5])=[12,43,28] |

6 | >A1([2,4,3])=0 |

A2, A3 and A4 get subsets of the sequence. Different from getting members, the functions use ISeqs composed of sequence numbers as their parameters. Results are as follows:

A5 and A6 modify members of the sequence in one go using sequence parameters. Below shows the changes of A1’s sequence step by step:

You can also use the m() function to get a subset by specifying a set of sequence numbers.

A | |

1 | [3,5,4,6,1] |

2 | =A1.m([1,-1]) |

3 | =A1.m@r([1,6,12]) |

4 | =A1.m@0([1,6,3]) |

You can use a negative number in the **A.m()** function’s ISeq parameter to represent a position counted backward, or you can use

If a locating function carries the @a option, the sequence numbers of all members satisfying the specified condition will be returned.

A | |

1 | [3,2,1,9,6,9,1,2,8] |

2 | =A1.pos@a(2) |

3 | =A1.pmin@a() |

4 | =A1.pmax@a() |

5 | =A1.pselect@a(~%2==0) |

With **@a** option, A2 returns all positions of the value 2 in A1’s sequence. A3 returns the sequence numbers of all members having the smallest value. A4 returns the sequence numbers of all members having the biggest value. A5 returns the sequence numbers of all members that are multiples of 2. Their results are as follows:
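The "return every matching position" behaviour can be reproduced in Python with `enumerate` starting at 1. The helper names below are hypothetical and only imitate what the text describes for the @a option.

```python
def pos_all(seq, value):
    """1-based positions of every member equal to value."""
    return [i for i, x in enumerate(seq, 1) if x == value]


def pselect_all(seq, predicate):
    """1-based positions of every member satisfying predicate."""
    return [i for i, x in enumerate(seq, 1) if predicate(x)]


def pmin_all(seq):
    return pos_all(seq, min(seq))


def pmax_all(seq):
    return pos_all(seq, max(seq))


seq = [3, 2, 1, 9, 6, 9, 1, 2, 8]
print(pos_all(seq, 2))                          # [2, 8]
print(pmin_all(seq))                            # [3, 7]
print(pmax_all(seq))                            # [4, 6]
print(pselect_all(seq, lambda x: x % 2 == 0))   # [2, 5, 8, 9]
```

Taking the first element of each list gives the single-position behaviour of the functions without the option.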

To get the positions of multiple members at one time using pos function, the @i option may be required in certain cases.

A | |

1 | [3,2,1,9,6,9,1,2,8] |

2 | =A1.pos@i([2,9,8]) |

3 | =A1.pos@i([3,1,1,1]) |

4 | =A1.pos@i([1,2,3]) |

5 | =A1.pos([1,2,3]) |

6 | =A1.pos([1,1,2,2]) |

7 | =A1.pos([1,1,1,2,2,2,3,3]) |

**A.pos@i()** finds the positions of the members given in the sequence parameter in order, searching in one direction only. Without @i, **A.pos()** simply determines whether every member of the parameter sequence is contained in sequence **A**. Results from A2 to A7 are as follows:

Both A3 and A4 return nulls, because A3 can’t find the third 1 specified by the parameter sequence in A1’s sequence and A4 can’t find 1, 2 and 3 in a one-way direction. In other words, **A.pos@i()** returns only an increasing ISeq; if not all members can be found in the rightward direction, it returns null.
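The one-way search can be sketched in Python as a scan that never moves backward. The function `pos_in_order` is a hypothetical imitation of the behaviour described for the @i option, written from this description alone.

```python
def pos_in_order(seq, wanted):
    """1-based positions of wanted's members, each found to the right of the previous one.

    Returns None as soon as one member cannot be found further right.
    """
    positions = []
    start = 0
    for value in wanted:
        try:
            idx = seq.index(value, start)  # search only from `start` onward
        except ValueError:
            return None
        positions.append(idx + 1)
        start = idx + 1
    return positions


seq = [3, 2, 1, 9, 6, 9, 1, 2, 8]
print(pos_in_order(seq, [2, 9, 8]))     # [2, 4, 9]
print(pos_in_order(seq, [3, 1, 1, 1]))  # None: there is no third 1
print(pos_in_order(seq, [1, 2, 3]))     # None: no 3 to the right of the 2
```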

The pos@i function returns null if a certain given member is not found in a sequence. Taking the order in which it works and the duplicate members into account, you cannot simply use this function to check if the specified subset is contained; instead, you should perform an intersection operation.

A | |

1 | [3,2,1,9,6,9,1,2,8] |

2 | =A1.pos@i([1,9,6])!=null |

3 | =A1.pos@i([1,2,3])==null |

4 | =A1.pos([1,2,3,4])==null |

5 | =A1.pos([1,2,3,3])!=null |

6 | [1,1,2,2] |

7 | [1,1,2,2,3,3] |

8 | =A6^A1==A6 |

9 | =A7^A1==A7 |

Results from A2 to A4 are as follows:

When **A.pos@i(B)** is used for querying and judgment, a non-null result indicates that the members of *B* can be found in *A* sequentially in one direction, and therefore that *A* contains *B*. But a null result only indicates that the members of *B* cannot be found in *A* sequentially in one direction; it does not necessarily mean that *A* does not contain *B*, as is the case in A3.

If the result of **A.pos(B)**

The following are results of A8 and A9:

It is feasible to determine if **A** contains

Similar to the symbol ~, # in a loop function is used to represent the sequence number of the current member.

A | |

1 | [5,4,3,2,1] |

2 | =A1.(#) |

3 | =A1.(#+~) |

4 | =A1.select(#%3==2) |

5 | =A1.group(int((#-1)/2)) |

A2 gets an ISeq consisting of sequence numbers. A3 generates a sequence by adding each member and its sequence number. Using **select** function, A4 finds the second one of every three members in A1’s sequence, i.e. members whose sequence numbers are 2, 5, 8,…, to form a sequence. A5 groups A1’s sequence every two members. The following lists their results:
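In Python, the role of # is played by the counter that `enumerate` supplies. The following sketch reproduces the four computations on the same sequence, with results derived from the descriptions above.

```python
seq = [5, 4, 3, 2, 1]

# Sequence numbers of the members (1-based, like # in a loop function).
numbers = [i for i, _ in enumerate(seq, 1)]

# Each member plus its own sequence number.
sums = [i + x for i, x in enumerate(seq, 1)]

# The second of every three members: sequence numbers 2, 5, 8, ...
picked = [x for i, x in enumerate(seq, 1) if i % 3 == 2]

# Group the members two at a time.
pairs = [seq[i:i + 2] for i in range(0, len(seq), 2)]

print(numbers)  # [1, 2, 3, 4, 5]
print(sums)     # [6, 6, 6, 6, 6]
print(picked)   # [4, 1]
print(pairs)    # [[5, 4], [3, 2], [1]]
```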

In a loop function, esProc uses the symbol [] to access a member in a relative position.

A | |

1 | [1,2,3,4,5] |

2 | =A1.(~[0]) |

3 | =A1.(~[1]) |

4 | =A1.((~-~[-1])/~[-1]) |

5 | =demo.query("select * from STOCKRECORDS where STOCKID=000062").sort(DATE) |

6 | =A5.((CLOSING-CLOSING[-1])/CLOSING[-1]) |

7 | 0 |

8 | =A5.max(if(CLOSING>CLOSING[-1],A7=A7+1,A7=0)) |

A2 lists each member of the sequence. A3 gets the member next to the current one. A4 calculates the growth rate of each member relative to the previous one. Their results are as follows:

A5 finds records of the stock with a specified ID. A6 calculates the growth rate of every day’s stock price. Then A8 calculates the maximum number of successive rising days. Results of A6 and A8 are as follows:
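The two stock computations rely on comparing each value with its predecessor. A Python sketch of the same idea follows; the closing prices are invented, and the first growth rate is left as `None` because the first day has no previous day.

```python
# Invented closing prices, ordered by date.
closing = [10.0, 10.5, 10.2, 10.4, 10.9, 11.0, 10.8]

# Day-over-day growth rate; pairs each price with the one before it.
growth = [None] + [
    (cur - prev) / prev for prev, cur in zip(closing, closing[1:])
]

# Longest run of consecutive rising days: reset the counter on any non-rise.
run = longest = 0
for prev, cur in zip(closing, closing[1:]):
    run = run + 1 if cur > prev else 0
    longest = max(longest, run)

print(round(growth[1], 4))  # 0.05
print(longest)              # 3
```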

You can use the symbol {} to get members of a subset according to relative positions.

A | |

1 | [1,2,3,4,5] |

2 | =A1.(~{-1,1}) |

3 | =A1.(~{-1,1}.avg()) |

4 | =A1.(~{1-#,0}.sum()) |

5 | =A1.(~{,0}.sum()) |

6 | =A1.(~{0,}.sum()) |

For each position, A2 gets the current member, the member immediately preceding it and the one next to it. A3 calculates the moving average at each position. Both A4 and A5 calculate the cumulative sum. A6 calculates the reversed cumulative sum, that is, the sum of the current member and all the members after it. Results from A2 to A6 are as follows:
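Relative-position windows can be expressed in Python with slices. The helper `window` is a hypothetical imitation of the ~{a,b} notation as described above, where an omitted bound (here `None`) means "up to the start" or "up to the end" of the sequence.

```python
def window(seq, i, lo, hi):
    """Members from relative offset lo to hi around 0-based index i, clipped to seq."""
    start = 0 if lo is None else max(i + lo, 0)
    stop = len(seq) if hi is None else min(max(i + hi + 1, 0), len(seq))
    return seq[start:stop]


seq = [1, 2, 3, 4, 5]
positions = range(len(seq))

neighbours = [window(seq, i, -1, 1) for i in positions]
moving_avg = [sum(w) / len(w) for w in neighbours]
running_sum = [sum(window(seq, i, None, 0)) for i in positions]
reverse_sum = [sum(window(seq, i, 0, None)) for i in positions]

print(neighbours)   # [[1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5]]
print(moving_avg)   # [1.5, 2.0, 3.0, 4.0, 4.5]
print(running_sum)  # [1, 3, 6, 10, 15]
print(reverse_sum)  # [15, 14, 12, 9, 5]
```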

The symbol # in a loop function represents the sequence number of the current member. So it is nothing but a number which can be calculated as any others. This sequence number can be used to map onto a member in another sequence, which is in effect accessing members in alignment.

A | |

1 | [1,2,3,4,5] |

2 | =A1.(A1(#)) |

3 | =A1.(A1.m(#-1)) |

4 | [5,4,3,2,1] |

5 | =A1.(~+A4(#)) |

6 | =A1++A4 |

7 | =10.(if(#%2==1,A1((#-1)/2+1),A4(#/2))) |

The sign # in an expression of a loop computation stands for the current sequence number. Results of A2, A3, A5, A6 and A7 are respectively as follows:

By accessing members of multiple sequences at the same time in alignment, an effect similar to accessing the fields of a record is created.

A | |

1 | [Bray,Jacob,Michael,John] |

2 | [65,87,98,72] |

3 | [76,82,78,88] |

4 | =A1.rank(A2(#)+A3(#)) |

5 | =A1.new(~:name,A4(#):rank) |

A4 calculates the ranking of total scores by getting scores stored at corresponding positions in different sequences. A5 creates a table sequence with name field and rank field by linking two sequences together according to positions of the members. Results of A4 and A5 are as follows:
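Accessing several sequences at the same position corresponds to `zip` in Python. The sketch below uses the same names and scores; ranking the highest total as 1 is an assumption made for this illustration and may not match the ordering that the esProc rank function uses by default.

```python
names = ["Bray", "Jacob", "Michael", "John"]
first_scores = [65, 87, 98, 72]
second_scores = [76, 82, 78, 88]

# Members at the same position are combined, like A2(#)+A3(#).
totals = [a + b for a, b in zip(first_scores, second_scores)]

# Assumed convention: rank 1 is the highest total.
ranks = [1 + sum(other > t for other in totals) for t in totals]

# Link names and ranks by position to form rows.
table = [{"name": n, "rank": r} for n, r in zip(names, ranks)]

print(totals)  # [141, 169, 176, 160]
print(ranks)   # [4, 2, 1, 3]
```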

Before an alignment access is executed, it is necessary that all the sequences are arranged in the same order. However, in practice, sequences are not always in the same order. Under such circumstances, you should use the align function to re-order the sequences according to the order of a certain standard sequence to allow proper alignment.

A | B | |

1 | =demo.query("select * from EMPLOYEE") | /The EMPLOYEE table |

2 | =demo.query("select * from ATTENDANCE").align(A1:EID,EMPLOYEEID) | /The ATTENDANCE table aligned with the EID field of the EMPLOYEE table |

3 | =demo.query("select * from PERFORMANCE").align(A1:EID,EMPLOYEEID) | /The PERFORMANCE table aligned with the EID field of the EMPLOYEE table |

4 | =A1.new(NAME,SALARY*(1+A2(#).ABSENCE+A3(#).EVALUATION):salaryPaid) | /A new table sequence is created to calculate salaries (here, A1, A2 and A3 are organized in the same order) |

5 | =demo.query("select * from GYMSCORE where EVENT='Vault'") | /The Vault score table |

6 | =demo.query("select * from GYMSCORE where EVENT='Floor'").align(A5:NAME,NAME) | /The Floor score table aligned with athlete NAME |

7 | =A5.(SCORE*0.6+A6(#).SCORE *0.4) | /Calculate weighted scores |

8 | =A7.rank() | /Return the ranking of weighted scores |

9 | =A5.new(NAME,A7(#):score,A8(#):rank) | /Create a table sequence having athlete names, weighted scores and ranks |

Both table sequences in A2 and A3 are aligned to A1’s employee EID. A4 creates a table sequence of employees’ salaries:

In A6, data is aligned with the athlete names in A5’s table sequence. Then A7 calculates weighted scores and A8 gets the ranking of weighted scores. Finally, A9 generates a resulting table sequence:

With the @a option, the align function also returns a sequence aligned with the standard sequence, but each of its members is a set to which the alignment access applies.

A | B | |

1 | =demo.query("select * from EMPLOYEE") | /The EMPLOYEE table |

2 | [California,Texas,Pennsylvania] | |

3 | =A1.align@a(A2,STATE) | /Alignment grouping by A2’s STATE field |

4 | =A3.new(A2(#):STATE,~.count():Count,round(~.avg(age(BIRTHDAY)),2):Age) | /Search A2 for the corresponding field value through the sign # that represents a group number in A3 |

When the computation finishes, A4 gets the following result:

Without an option, the **align** function fetches the first member corresponding to each member of the standard sequence from the source sequence being aligned and returns a set consisting of these first members, instead of returning a set consisting of subsets. If we already know that there is only one member in each group, using this function is equivalent to sorting these members according to a standard sequence.
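The difference between the two forms of alignment can be sketched in Python. The function `align` and its `group` flag are hypothetical names that imitate the behaviour described above; returning `None` when no record matches a standard value is an assumption of this sketch, and the sample rows are invented.

```python
def align(rows, standard, key, group=False):
    """Reorder rows to follow `standard`.

    group=False: the first matching row per standard value (None if absent).
    group=True:  the list of all matching rows per standard value.
    """
    if group:
        return [[r for r in rows if key(r) == s] for s in standard]
    return [next((r for r in rows if key(r) == s), None) for s in standard]


# Invented sample rows.
employees = [{"EID": 1}, {"EID": 2}, {"EID": 3}]
attendance = [
    {"EMPLOYEEID": 3, "ABSENCE": 2},
    {"EMPLOYEEID": 1, "ABSENCE": 0},
    {"EMPLOYEEID": 3, "ABSENCE": 1},
]

standard = [e["EID"] for e in employees]
by_employee = lambda r: r["EMPLOYEEID"]

first = align(attendance, standard, by_employee)
grouped = align(attendance, standard, by_employee, group=True)

print([r and r["ABSENCE"] for r in first])  # [0, None, 2]
print([len(g) for g in grouped])            # [1, 0, 2]
```

After alignment, position k in the result corresponds to position k in the standard sequence, which is what makes positional access across the tables safe.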

The alignment access also applies in enumeration grouping except in **enum@1**.

A | |

1 | =demo.query("select * from EMPLOYEE") |

2 | [AgeGroup1,AgeGroup2,AgeGroup3] |

3 | [?<=30,?>30 && ?<=40,?>40] |

4 | =A1.enum(A3,age(BIRTHDAY)) |

5 | =A4.new(A2(#):AgeInterval,~.count():Count) |

A5 calculates the number of employees in each of the three age groups:

An integer sequence is a special set that is applicable to all the set operations. In addition, its members can be used as sequence numbers to access a subset of another sequence. A flexible use of integer sequences is essential for approaching problems with sequence numbers.

A | |

1 | =to(10) |

2 | =to(3,8) |

3 | =A1.step(3,2) |

4 | =20.step(4,2,3) |

The **to** function generates a sequence consisting of consecutive integers. In the **step** function, you can set the interval between members of an ISeq and other parameters. Results from A1 to A4 are as follows:

You can process a subset according to an integer sequence consisting of the positions of the subset’s members in the source set.

A | B | |

1 | =to(100) | /Return a sequence consisting of numbers 1, 2, …100. |

2 | =A1(100.step(14,7))=0 | /Assign 0 to the members whose sequence numbers are multiple of 7 (from the 14th member). |

3 | =A1.run(if(~>1,A1(100.step(~,~+~))=0,0)) | |

4 | =A1.select(~>1) | /Generate a list of prime numbers. Assign 0 to the members whose sequence numbers are composite numbers, and then the rest of the members are prime numbers. |

5 | =100.(rand()) | /Generate 100 random numbers. |

6 | =A5(to(50)) | |

7 | =A5(to(51,100)) | |

8 | >A5(100.step(2,1))=A6 | |

9 | >A5(100.step(2,2))=A7 | /Shuffle A5 by alternatively exchanging members in the first half and the second half. |
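The prime-number part of the grid above is a sieve: for every surviving number greater than 1, the members at all of its further multiples are set to 0. A Python sketch of the same idea, offered as an analogy rather than a translation of the esProc expressions:

```python
n = 100
numbers = list(range(1, n + 1))  # position p (1-based) initially holds p

for p in range(2, n + 1):
    if numbers[p - 1] > 1:       # p has not been crossed out, so it is prime
        # Cross out the members at positions 2p, 3p, ... up to n.
        for pos in range(2 * p, n + 1, p):
            numbers[pos - 1] = 0

# What remains above 1 are the primes.
primes = [x for x in numbers if x > 1]

print(primes[:10])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(len(primes))  # 25
```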

After a sequence is sorted, the previous order of the members of the sequence will be lost. However, this order could be useful in certain situations. For example, you might want to know the hiring order of the three oldest employees in the company, or the growth ratios for the top three trading days in terms of price, and so on.

esProc offers the psort function to return the original sequence numbers of the sorted members.

A | |

1 | [c,b,a,d] |

2 | =A1.psort() |

3 | =A1(A2) |

4 | =A1.sort() |

5 | =A3==A4 |

Results from A2 to A5 are as follows:

In other words, in an integer sequence returned by the psort function, the first number is the sequence number of the member which is in the first place in the sorted sequence; the second member is the sequence number of the member which stands in the second place, and so on.
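The same idea is often called an "argsort" elsewhere. A minimal Python sketch, using 1-based sequence numbers to match the description above:

```python
seq = ["c", "b", "a", "d"]

# Sequence numbers of the members, listed in sorted order of the members.
order = sorted(range(1, len(seq) + 1), key=lambda i: seq[i - 1])

# Reading the sequence through these numbers yields the sorted sequence.
via_order = [seq[i - 1] for i in order]

print(order)                     # [3, 2, 1, 4]
print(via_order == sorted(seq))  # True
```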

From an ISeq returned by the **psort()** function, you can use the inv() function to get an ISeq that restores the original order of the members, undoing the sorting operation.

A | |

1 | [c,b,d,a] |

2 | =A1.sort() |

3 | =A1.psort().inv() |

4 | =A2(A3) |

5 | =A4==A1 |

Results from A2 to A5 are as follows:
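Undoing a sort amounts to inverting the permutation of sequence numbers. The Python sketch below shows this with a hypothetical `invert` helper, using 1-based sequence numbers as in the text.

```python
def invert(order):
    """Inverse of a permutation of 1-based sequence numbers."""
    inverse = [0] * len(order)
    for new_pos, old_pos in enumerate(order, 1):
        inverse[old_pos - 1] = new_pos
    return inverse


seq = ["c", "b", "d", "a"]
order = sorted(range(1, len(seq) + 1), key=lambda i: seq[i - 1])
sorted_seq = [seq[i - 1] for i in order]

# Reading the sorted sequence through the inverse restores the original order.
inverse = invert(order)
restored = [sorted_seq[i - 1] for i in inverse]

print(order)            # [4, 2, 1, 3]
print(inverse)          # [3, 2, 4, 1]
print(restored == seq)  # True
```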

With the psort function, it’s convenient to solve the above problems requiring that the original order be kept.

A | B | |

1 | =demo.query("select * from EMPLOYEE").sort(HIREDATE) | |

2 | =A1.psort(BIRTHDAY:-1) | /Return an ISeq of sequence numbers of the records ordered by birthday. |

3 | =A2(to(3)) | /An ISeq containing the sequence numbers of the records of three youngest employees in A1. |

4 | =demo.query("select * from STOCKRECORDS where STOCKID=000062").sort(DATE) | |

5 | =A4.psort(CLOSING:-1) | /Return an ISeq of the original sequence numbers (in A4) of the records sorted in descending order by closing price. |

6 | =A5(to(3)) | /The sequence numbers of the three records in A4 with the highest closing prices. |

7 | =A6.(A4(~).CLOSING/A4.m@0(~-1).CLOSING-1) | /The growth rates in price for the three days. They should be calculated with the sequence numbers in A4. |

8 | =A6.(A4.calc(~,(CLOSING-CLOSING[-1])/CLOSING[-1])) | /Use the calc function to simplify A7’s expression. |

A binary search is widely recognized for its high search efficiency; however, it requires that the sequence be sorted by the search key. So, before a binary search such as looking up a member can be executed, the original sequence must be sorted. This is not suitable for all situations. For example, if you want to find the sequence numbers of members in the original sequence, sorting before searching disrupts the original order; the psort() function lets you recover the original sequence numbers.

A | B | |

1 | =demo.query("select * from EMPLOYEE").sort(HIREDATE) | |

2 | =A1.psort(NAME) | /An ISeq of original sequence numbers of A1’s records in EMPLOYEE table sorted by NAME |

3 | =A1(A2) | /A record sequence formed after the EMPLOYEE table is sorted by NAME |

4 | =A3.pselect@b(NAME:"David") | /Use the binary search to search for the sequence number of David in A3 |

5 | =A2(A4) | /David’s sequence number in the original record sequence |

In this case, psort creates a binary search index for the original sequence. There could be one or more search indexes based on different keywords for a single sequence.
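The idea of a position index that enables binary search without reordering the source can be sketched in Python. This is an analogy, not esProc code; the sample names are made up for illustration and positions here are 0-based, following Python convention.

```python
from bisect import bisect_left

# Hypothetical data, kept in its original (for example, hiring) order.
names = ["Rebecca", "David", "Ashley", "Matthew"]

# The index: original positions listed in name order. The source is untouched.
index = sorted(range(len(names)), key=lambda i: names[i])
sorted_names = [names[i] for i in index]


def find(name):
    """Binary-search the sorted view, then map back to the original position."""
    k = bisect_left(sorted_names, name)
    if k < len(sorted_names) and sorted_names[k] == name:
        return index[k]
    return None
```

Several such indexes, each built on a different key, can coexist for one source list because none of them changes the source order.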

An alignment grouping function can also return ISeqs that each contain sequence numbers, instead of a sequence of groups containing records.

A | B | |

1 | =demo.query("select * from SALES").sort(AMOUNT:-1) | /Sort SALES by AMOUNT. |

2 | [QUICK,ERNSH,HANAR,SAVEA] | |

3 | =A1.align@1p(A2,CLIENT) | /Alignment grouping by A2’s sequence of clients; sequence numbers are returned. |

4 | =A3.new(A2(#):NAME,A1(~).AMOUNT: Amount,~:Rank) | /Use A3’s sequence numbers to search for the corresponding amounts and ranks. |

After getting sequence numbers of the desired records, you can achieve the computing goal by performing the locating computation **A**.calc(). The locating computation can avoid unnecessary computations and increase efficiency.

A | |

1 | =file("VoteRecord") |

2 | =A1.import@b() |

3 | [California,Ohio,Illinois] |

4 | =A2.pselect@a(A3.pos(State)>0) |

5 | =A2.calc(A4,Votes[-1]-Votes+1) |

Results of A2, A4 and A5 are as follows:

In this case, the binary file VoteRecord stores poll results sorted by the number of votes in descending order. A4 obtains the sequence numbers of the employees from the specified states. A5 uses A4’s sequence numbers to calculate the number of votes each of them needs in order to move up one place. For example, Ryan Williams, now ranking 3rd, needs another 69 votes to move up one place. An inter-row operation is needed here, so the computation requires more data than the selected employees’ records alone.
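A locating computation of this kind can be sketched in Python as an analogy (not esProc code). The vote counts and states below are invented sample data, not the contents of VoteRecord, and positions are 0-based.

```python
# Hypothetical poll data, already sorted by votes in descending order.
votes = [520, 470, 455, 300]
states = ["Texas", "Ohio", "California", "Ohio"]
wanted = {"California", "Ohio", "Illinois"}

# Step 1: locate the positions of the records of interest.
positions = [i for i, state in enumerate(states) if state in wanted]

# Step 2: compute only at those positions, while still reading the
# neighboring row, which may not be one of the selected records.
needed = [votes[i - 1] - votes[i] + 1 if i > 0 else None for i in positions]
```

The calculation touches the previous row of the full list, which is why it cannot be done on the selected records alone.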

esProc applies set theory more deeply and pervasively than traditional programming languages do. The sequence in esProc is in essence a type of set. It’s important to learn to think in sets all the time when working with esProc.

The sequence, as well as integer and string, is one of the most basic esProc data types. **A variable value, the result of an expression and the return value that a function returns could all be a sequence**.

For set type data, esProc provides basic operators for performing intersection, concatenation, union and difference operations between two sets *A* and *B*: *A^B*, *A|B*, *A&B* and *A\B* respectively. A deep understanding and skillful use of these set operations will help you approach problems with set thinking and make full use of the supplied data to obtain simpler solutions.

The following is an example of using set operations to simplify code:

A | |

1 | =demo.query("select * from EMPLOYEE") |

2 | =A1.select(GENDER=="M") |

3 | =A1.select(STATE=="California") |

4 | =A2^A3 |

5 | =A1.select(GENDER=="M" && STATE=="California") |

6 | =A2&A3 |

7 | =A1.select(GENDER=="M" || STATE=="California") |

8 | =A2\A3 |

9 | =A1.select(GENDER=="M" && STATE!="California") |

A4 and A5 use different methods to find male employees from the state of California; A6 and A7 select male employees or employees from the state of California using different approaches; A8 and A9 find male employees from states outside of California in different ways. Note that A6 and A7, despite returning the same employee information, present the records in different orders.

Unlike a mathematical set, the order of members in an esProc set, like a sequence or a table sequence, matters, and an esProc set can also have members which are identical.

A | |

1 | [1,2,3,4] |

2 | [1,3,3,2] |

3 | =[1,2,3]==[1,3,2] |

A2’s sequence has duplicate members. A3 returns a **false** because the two sequences are not equal, in that they have different orders.

Mathematically, the intersection operation and union operation are both commutative; in other words, A∩B = B∩A and A∪B = B∪A. However, this commutative property is not valid in esProc because the order of members counts. The order of members in the result sets of the intersection and union operations is determined by the order of the left operand.

A | |

1 | [1,2,3] |

2 | [3,1,5] |

3 | =A1^A2 |

4 | =A2^A1 |

5 | =A1&A2 |

6 | =A2&A1 |

The results of A3, A4, A5 and A6 are separately listed below:
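One plausible reading of "the order is determined by the left operand" can be sketched in Python. This is an analogy, not esProc code: the helper names are hypothetical, and the handling of duplicate members is deliberately left out, so the sketch assumes inputs without duplicates.

```python
def intersect(a, b):
    """Members of a that also occur in b, kept in a's order."""
    return [x for x in a if x in b]


def union(a, b):
    """All of a, followed by the members of b that are not in a."""
    return a + [x for x in b if x not in a]


def difference(a, b):
    """Members of a that do not occur in b, kept in a's order."""
    return [x for x in a if x not in b]


left, right = [1, 2, 3], [3, 1, 5]
```

Swapping the operands yields the same members in a different order, so the operations are not commutative once order matters.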

Because an esProc sequence is ordered, you can’t simply use the comparison operator == to find whether two sequences have the same members; use the **A.eq(B)** function instead.

A | |

1 | =[1,2,3]==[3,2,1] |

2 | =[1,2,3]==[3,2,1].sort() |

3 | =[1,2,3].eq([3,2,1]) |

4 | =[1,2,3].eq([3,2,2]) |

5 | =[1,2,2,3].eq([3,2,1,2]) |

6 | =[1,2,2,3].eq([3,2,3,1]) |

A1 and A2 check if two sequences are equal. The results are:

A3, A4, A5 and A6 determine if two sequences have same members. Results are respectively as follows:
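Comparing members while ignoring order, but still respecting how many times each member occurs, can be sketched in Python with a multiset comparison. This is an analogy that assumes the eq check behaves like a permutation test; it is not esProc code, and `same_members` is a hypothetical name.

```python
from collections import Counter


def same_members(a, b):
    """True when b is a reordering of a (same members, same multiplicities)."""
    return Counter(a) == Counter(b)
```

Plain list equality in Python, like == in the grid above, additionally requires the members to appear in the same order.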

With the set data type, you can handle many operations on members of sets in a single line of code, without having to write the loop code.

A | |

1 | [3,4,1,3,6] |

2 | =A1.sum() |

3 | =A1.avg() |

4 | =A1.max()-A1.min() |

A2 calculates the sum of members in the sequence. A3 calculates their average. A4 finds the difference between the maximum member and the minimum member. Their results are respectively as follows:

Sometimes, the loop functions perform operations not on members of the set but on the values computed based on each of the members. In this case, you can use arguments to specify an expression in which “~” represents the current member.

A | |

1 | [3,4,1,3,6] |

2 | =A1.sum(~*~) |

3 | =demo.query("select * from EMPLOYEE") |

4 | =A3.min(~.BIRTHDAY) |

5 | =A3.min(BIRTHDAY) |

6 | =A3.avg(age(BIRTHDAY,HIREDATE)) |

A2 calculates the sum of the squares of members. Here’s the result:

A4, A5 and A6 perform computations based on the table sequence of employee information got by A3’s query. A4 finds the birth date of the oldest employee. Here’s the result:

“~” in A4’s expression can be omitted. So A5 gets the same result as A4 does.

A6 calculates the average hiring age of the employees. Here’s the result:

The execution of an aggregate function with arguments can be divided into two steps:

1) Add a computed column according to the arguments;

2) Aggregate the column.

That is, *A*.f(*x*) = *A*.(*x*).f().
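The two-step view of an aggregate with an argument (compute a column, then aggregate it) can be sketched in Python as an analogy, not esProc code:

```python
members = [3, 4, 1, 3, 6]

# Step 1: compute a value from each member (the "computed column").
squares = [m * m for m in members]

# Step 2: aggregate the computed column.
total = sum(squares)

# The same thing written as one expression.
direct = sum(m * m for m in members)
```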

For the nested loop functions, “~” is interpreted as a member in the innermost sequence. In such cases, if you want to reference a member in an outer sequence, precede “~” with the name of the sequence.

A | |

1 | [1,2,3] |

2 | [-1,-2,-3] |

3 | =A1.sum(A2.sum(~*~)) |

4 | =A1.sum(A2.sum(A1.~*~)) |

5 | =A1.sum(~*A2.sum(~)) |

6 | =A1.sum(A1.sum(A1.~*~)) |

A3 calculates the result of multiplying 3 by the sum of squares of A2’s members. A4 calculates the sum of members of the cross product of sequence A1 and sequence A2, and A5 is another way of calculating the sum of members of the cross product of the two sequences. In A6, the expression in the inner loop can’t reference members of the sequence in the outer loop, so the result is 3 times the sum of squares of A1’s members. The following are results of A3~A6:

A6’s case is also applicable to the field reference where ~ is omitted. The field will be interpreted as one in an inner record sequence; if such a field cannot be found in the inner record sequence, the program will search the next outer layer.

The program cycles through members of the original sequence, performing its computation on one member after another according to the arguments. This is a feature of which we can make full use.

A | |

1 | [1,3,2,5,4,8,7] |

2 | 0 |

3 | =A1.(A2=A2+~) |

4 | [1,1,0,0,1,0,0,0,1,0,1,0,0,0] |

5 | 0 |

6 | =A4.max(if(~==0,A5=A5+1,A5=0)) |

By running a loop, A3 calculates a sequence of cumulative sums of members in A1’s sequence:

A6 calculates the maximum number of successive 0s in A4’s sequence:

So in many cases, you can use just a single expression to achieve what a simple loop statement can do.
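For comparison, the two computations above can be sketched in Python (an analogy, not esProc code). Python's idiomatic forms avoid assigning to an outer variable inside the expression, so the running state is handled by `itertools.accumulate` and by a small helper with the hypothetical name `longest_zero_run`.

```python
from itertools import accumulate

# Cumulative sums of the members.
running = list(accumulate([1, 3, 2, 5, 4, 8, 7]))


def longest_zero_run(bits):
    """Length of the longest run of consecutive zeros."""
    current = best = 0
    for b in bits:
        current = current + 1 if b == 0 else 0  # extend or reset the run
        best = max(best, current)
    return best


zeros = longest_zero_run([1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
```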

Unlike loop functions such as **sum** and **avg**, which return a single aggregate value, the *A*.(*x*) function returns a sequence made up of the values computed from each member.

A | |

1 | [1,2,3] |

2 | =A1.(~*~) |

3 | =A1.(~) |

4 | =A1.() |

5 | =A1.(1) |

6 | =A1.(if(~%2==0,~,0)) |

7 | I love you |

8 | =len(A7).(mid(A7,~,1)) |

9 | =A8.countif(;~=="o") |

The code from A2 to A6 performs computations based on A1’s sequence: A2 calculates each member’s square; both A3 and A4 list members of the sequence in order; A5 lists 1 cyclically; A6 loops through members to return 0 if the member is an odd number, but return the current member value otherwise. Their results are respectively as follows:

A8 splits A7’s string to form a sequence of single characters. A9 counts how many times the letter o appears. Their results are as follows:

The **new** function is used to return a table sequence obtained by sequence computing.

A | |

1 | [1,2,3,4,5] |

2 | =A1.new(~:Origin,~*~:Square) |

3 | =demo.query("select * from EMPLOYEE") |

4 | =A3.new(NAME,age(BIRTHDAY):Age) |

5 | =A3.new(NAME) |

6 | =A3.(NAME) |

A2 returns a new table sequence by cycling through members of A1’s sequence to compute each of them. “~” in the expression represents the current member. Here’s the result:

Based on A3’s table sequence, A4 and A5 respectively create a two-field table sequence and single-field table sequence. Here’re the results:

A6 runs a loop to get a sequence composed of values of NAME field, based on A3’s table sequence. Note that the result is different from that of A5:

In addition, there is a **run** function for sequence computing that returns the original sequence instead of the result of loop computation. Generally, it is used to assign values to fields in a record sequence (or table sequence).

A | |

1 | =demo.query("select * from EMPLOYEE") |

2 | =A1.new(NAME,age(BIRTHDAY):Age) |

3 | =A2.run(Age=Age+1) |

A2 creates a new table sequence where names of employees are listed and their ages are calculated. Based on this table sequence, A3 adds 1 to each employee’s age. Because the **run** function modifies A2’s table sequence in place and returns that same table sequence, A3’s result is A2’s table sequence itself. The following step-by-step execution shows how A2’s table sequence changes:

In a nested loop function, you can’t use ~, or *A*.~, to represent members of the same sequence in different layers of the loop. For example, to list all possible two-letter combinations of the letters A, B and C:

A | |

1 | [A,B,C] |

2 | =A1.(A1.(A1.~+~)).conj() |

3 | =A1.((a=~,A1.(a+~))).conj() |

4 | =A1.((a=~,A1.((b=~,A1.(a+b+~))))).conj().conj() |

A2 uses **A.~** to represent members of the outer loop. But since the inner loop cycles through the same sequence A1, A2’s expression is actually equivalent to **=A1.(A1.(~+~)).conj()**, which returns the following result:

To tackle the problem, you can use a pair of parentheses to enclose a series of expressions, in the form of (*x1*,*x2*,…,*xk*), when performing a loop computation. The expressions will be evaluated in order and the result of the last expression will be returned. With the parentheses, A3 first assigns the current member of the outer loop to the program variable a, and then references a in the inner loop.
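The difference between the two approaches can be sketched in Python as an analogy (not esProc code). In Python each loop level has its own named variable, so the outer member is captured by name in the same spirit as the program variables a and b in the grid; the `shadowed` list imitates the case where only the innermost member is visible.

```python
letters = ["A", "B", "C"]

# Only the innermost member is visible: every pair repeats one letter.
shadowed = [inner + inner for _ in letters for inner in letters]

# The outer member is kept under its own name, so all pairs appear.
pairs = [outer + inner for outer in letters for inner in letters]

# Three levels, each with its own name.
triples = [a + b + c for a in letters for b in letters for c in letters]
```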

esProc places no restriction on the data types of the members in a sequence, so numbers, strings and records can all be members of the same sequence.

A | |

1 | [1,a3,2,5.4,$[4.5],2011-8-8] |

2 | =[A1,4] |

A1’s sequence contains members of various data types. A2’s sequence is composed of A1’s sequence and a single-value member. The following lists data in A1 and A2:

Most of the time in real-world applications, however, there is little point in arranging data of various types in one sequence. Therefore, it is not necessary to pay too much attention to it.

Yet a record sequence – a sequence consisting of records – may consist of records from different table sequences. From this point of view, the feature offers concrete convenience.

A | |

1 | =demo.query("select * from EMPLOYEE") |

2 | =demo.query("select * from FAMILY") |

3 | =A1|A2 |

4 | =A3.count(left(GENDER,1)=="F") |

A4 counts the females in employees and their families. You can perform the computation provided that both employee table and family table contain a GENDER field, regardless of their data structures.

In esProc, it doesn’t matter that records in a record sequence originate from different table sequences. As long as they have fields with the same names, the records can be processed uniformly. This enables simpler program writing, higher efficiency and less memory usage. In SQL, however, two tables with different structures must be united into a new one by using the UNION clause before any operations can be performed.

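Since esProc lets any object be a member of a set, a set itself can be a member of a larger set. If A is a set consisting of other sets, you can use functions such as **A.conj()** and **A.isect()** to combine its member sets, for example by concatenating or intersecting them.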

A | |

1 | [[1,2,3,4,5],[1,3,5,7,9],[2,3,5,7]] |

2 | =A1.conj() |

3 | =A1.isect() |

4 | =A1.(~.sum()) |

5 | =A1.(~.(~*~)) |

A1 is a sequence comprising other sequences. According to A1’s sequence, A2, A3, A4 and A5 respectively obtain a concatenation sequence, an intersection sequence, a sequence containing sums of members in the sub-sequences, and a sequence composed of sequences generated each by calculating the squares of members in each sub-sequence. Their results are as follows:
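The four computations on a sequence of sequences can be sketched in Python as an analogy (not esProc code). The intersection is written as a fold that keeps the order of the first sub-sequence, which is an assumption about ordering rather than a statement of how esProc implements it.

```python
from functools import reduce

nested = [[1, 2, 3, 4, 5], [1, 3, 5, 7, 9], [2, 3, 5, 7]]

# Concatenate all sub-sequences into one flat sequence.
concatenated = [x for sub in nested for x in sub]

# Members common to every sub-sequence, in the order of the first one.
common = reduce(lambda acc, sub: [x for x in acc if x in sub], nested)

# One aggregate value per sub-sequence.
sums = [sum(sub) for sub in nested]

# One computed sub-sequence per sub-sequence.
squares = [[x * x for x in sub] for sub in nested]
```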

A record sequence can also be a member of a sequence.

A | |

1 | =demo.query("select * from EMPLOYEE") |

2 | =A1.select(STATE=="California") |

3 | =A1.select(STATE=="Indiana") |

4 | =A1.select(STATE=="Florida") |

5 | =[A2,A3,A4] |

6 | =A5.(~.count()) |

7 | =A5.(~(1).STATE) |

8 | =A5.(STATE) |

9 | =A5.new(STATE,~.count():Count) |

A2, A3 and A4 respectively retrieve records of employees from California, Indiana and Florida. A5 generates a sequence consisting of these three record sequences. It is a set of sets.

A6 counts the employees from each of the three states. Here’s the result:

A7 gets the names of the states. ~(1) in the expression can be omitted, so expressions in A8 and A7 are equivalent and they have the same results:

A9 creates a table sequence, based on A5’s sequence, to count the employees from each state. Here’s the result:

Data grouping is a common SQL operation. But its real meaning is still far from being fully grasped by most of us. In essence, grouping data is splitting a set into multiple subsets according to a certain rule. In other words, the return value of a group operation is a set consisting of subsets. Often what may be of interest to people isn’t the set itself, but certain aggregate values of its subsets. Hence, group operations are often followed by aggregate operations on subsets.

This is the very way that SQL handles data grouping. The SQL **GROUP BY** clause is always followed by an aggregate operation. Another reason SQL ties an aggregate operation to a group operation is that it has no explicit set data type with which to return a set of subsets directly.

This explains why people have become used to the idea that group operations must go hand in hand with aggregate operations, and have forgotten that they are in fact two independent operations.

However, sometimes you might still be interested in the subsets, instead of the aggregate values. Even if we might only care about the aggregate values, the subsets are worth retaining for reuse in subsequent computations, rather than being discarded once an aggregate operation is completed and thus being re-generated if needed.

This requires restoring the true sense of group operations. With set-based thinking fully embedded, esProc has achieved this requirement. It separates the aggregate operations from the group operations in its basic group functions.

A | B | |

1 | =demo.query("select * from EMPLOYEE") | |

2 | =A1.group(month(BIRTHDAY),day(BIRTHDAY)) | /Group by Birthday |

3 | =A2.select(~.len()>1) | /Employees with same birthday as others |

4 | =A3.conj() | |

5 | =A1.group(STATE) | /Group by State |

6 | =A5.new(~(1).STATE:State,~.count():Count) | /Use the grouping result to create a TSeq and count the employees from each state |

7 | =A5.new(STATE,~.avg(interval@y(BIRTHDAY,now())):Age) | /Use the grouping result again to create another TSeq and calculate the average age of employees from each state |
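Keeping the subsets and aggregating them later, possibly more than once, can be sketched in Python as an analogy (not esProc code). The `group_by` helper and the three sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records.
employees = [
    {"name": "Ann", "state": "Ohio", "age": 30},
    {"name": "Bob", "state": "California", "age": 40},
    {"name": "Cid", "state": "Ohio", "age": 50},
]


def group_by(records, key):
    """Split records into subsets by key; return the subsets in key order."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return [groups[k] for k in sorted(groups)]


# Grouping step: the result is a list of subsets, with nothing aggregated yet.
by_state = group_by(employees, lambda r: r["state"])

# Aggregation steps: the same subsets are reused for two different summaries.
counts = [(g[0]["state"], len(g)) for g in by_state]
average_age = [(g[0]["state"], sum(r["age"] for r in g) / len(g)) for g in by_state]
```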

The grouping result is itself a set, so it can be grouped again. Each of its members is also a set, so each member can be grouped further. Both operations will produce a multilayer set.

A | B | |

1 | =demo.query("select * from EMPLOYEE") | |

2 | =A1.group(year(BIRTHDAY)) | /Group by the year of birth |

3 | =A2.group(int(year(~(1).BIRTHDAY)%100/10)) | |

4 | =A2.group(int(year(BIRTHDAY)%100/10)) | |

5 | =A2.(~.group(month(BIRTHDAY))) | /Group members of the grouping result; A3,A4 and A5 respectively return a sequence consisting of RSeqs |

Because these results have too many layers, they are rarely used in real-world business. The example is cited only to show the set-based thinking pattern and the nature of the set operations.

esProc group function groups data and sorts the grouping result according to the grouping expression. For example:

A | B | |

1 | $ select EID,NAME+' '+SURNAME FULLNAME, DEPT from EMPLOYEE | |

2 | =A1.group(DEPT) | =A2.new(~.DEPT:DEPT,~.count():Count) |

3 | =A2.sort(~.DEPT:-1) | =A3.new(~.DEPT:DEPT,~.count():Count) |

4 | =A1.group@u(DEPT) | =A4.new(~.DEPT:DEPT,~.count():Count) |

5 | =A1.group@o(DEPT) | =A5.new(~.DEPT:DEPT,~.count():Count) |

A1 gets a table sequence as follows:

A2 groups employee records by department. By default, A2 will sort the grouping result by department names in alphabetically ascending order. For the sake of viewing, B2 counts how many employees are in each department after the data grouping. Here’re results of A2 and B2:

A3 re-sorts A2’s grouping result by department in descending order. B3 also calculates the number of employees in each department. Here’re results of A3 and B3:

Apart from re-sorting the grouping result, you can change the orders of the resulting groups by adding certain options to the group function. By using **@u** option, A4 is able to produce groups of departments that are arranged in order of their appearance in the original employee table. With **@o** option, A5 simply places the neighboring records with the same department values into one group without first sorting the records, resulting in duplicate department groups. B4 and B5 count the employees in each group for the two group operations respectively, and here’re the results:
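The three orderings described above (sorted by key, order of first appearance, and merging only neighboring records) can be sketched in Python as an analogy, not esProc code. The department list is invented sample data and the function names are hypothetical.

```python
from itertools import groupby

# Hypothetical department values in their original record order.
depts = ["Sales", "HR", "Sales", "Sales", "HR", "R&D"]


def group_sorted(values):
    """Groups ordered by key, similar in spirit to the default behavior."""
    return [(k, values.count(k)) for k in sorted(set(values))]


def group_first_seen(values):
    """Groups ordered by first appearance, similar in spirit to @u."""
    return [(k, values.count(k)) for k in dict.fromkeys(values)]


def group_adjacent(values):
    """Only neighboring equal values are merged, similar in spirit to @o."""
    return [(k, len(list(g))) for k, g in groupby(values)]
```

Merging only neighbors can produce several groups with the same key, which matches the duplicate department groups mentioned above.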

Besides common group functions, esProc also provides **align@a()** function for performing alignment grouping and **enum()** function for enumeration grouping.

The data grouping implemented by the **group** function is called equi-grouping, which has the following features:

1) Any member in the original set must and can only be in one subset;

2) There is no empty subset.

Neither alignment grouping nor enumeration grouping has these two features.

Alignment grouping is an operation that calculates the grouping expression with members of a set and groups the result set by mapping members with values of a specified sequence. To perform the alignment grouping, the following steps are required:

1) Specify a set of values;

2) Put the members of the set to be grouped whose attribute matches the same specified value into one subset;

3) Each resulting subset must correspond to a pre-defined value.

It is possible that a certain member exists in none of the subsets, or that an empty set appears, or that a member appears in two subsets.

The following example groups records of employees by a specified sequence of states:

A | |

1 | =demo.query("select * from EMPLOYEE") |

2 | [California,Florida,Chicago] |

3 | =A1.align@a(A2,STATE) |

A3 performs alignment grouping according to A2’s sequence. It maps the name of each state with members in A2. With this type of grouping, it is possible that certain employees don’t belong to any of the groups, or that empty groups appear. A3’s result is as follows:
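Alignment grouping against a fixed list of values can be sketched in Python as an analogy (not esProc code). The `align_all` helper and the sample records are hypothetical.

```python
def align_all(records, targets, key):
    """One subset per target value, in target order; unmatched records are dropped."""
    return [[r for r in records if key(r) == t] for t in targets]


# Hypothetical sample records.
records = [
    {"name": "Ann", "state": "California"},
    {"name": "Bob", "state": "Texas"},
    {"name": "Cid", "state": "Florida"},
    {"name": "Dee", "state": "California"},
]

groups = align_all(records, ["California", "Florida", "Chicago"], key=lambda r: r["state"])
names = [[r["name"] for r in g] for g in groups]
```

In this sample, the Texas record lands in no group and the group for "Chicago" is empty, which illustrates the two ways this differs from equi-grouping.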

Enumeration grouping is defined as follows: first, specify a set of conditions; then take the members of the set to be grouped as arguments to evaluate the conditional expressions, and group members satisfying different conditions into different subsets, each of which corresponds to one of the pre-defined conditions. A member may fit into none of these subsets, or into two subsets at the same time, and an empty subset may appear.

The following example groups records of employees by specified age groups:

A | |

1 | =demo.query("select * from EMPLOYEE") |

2 | [?<=30,? <=40,?>40] |

3 | =A1.enum(A2,age(BIRTHDAY)) |

4 | [?<=30,?>20 && ?<=40,?>50] |

5 | =A1.enum@r(A4,age(BIRTHDAY)) |

A3 performs enumeration grouping according to A2’s conditional sequence. An employee record in A1 can be in none of the groups but is not allowed to exist in two groups at the same time. A3’s result is as follows:

In this case, an employee is put only into the first group whose condition it satisfies. Although employees aged 30 or below also satisfy the condition of 40 or below, they are not placed in the second group.

A5 performs enumeration grouping according to A4’s conditional sequence. Here the **@r** option is used to allow repeated grouping, so an employee record may appear in more than one group. A5’s result is as follows:

Though these two functions seem to differ greatly from the **group** function, once you keep the nature of group operations in mind you will see that they actually do the same thing: splitting a set into multiple subsets. The only difference is that they split sets in different ways.
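The first-match and repeated variants of enumeration grouping described above can be sketched in Python as an analogy (not esProc code). The `enum_groups` helper, its `repeat` flag and the sample ages are hypothetical.

```python
def enum_groups(values, conditions, repeat=False):
    """One subset per condition; with repeat=False a value joins only its first match."""
    groups = [[] for _ in conditions]
    for value in values:
        for group, condition in zip(groups, conditions):
            if condition(value):
                group.append(value)
                if not repeat:
                    break  # first matching condition wins
    return groups


ages = [25, 35, 45, 55]  # hypothetical ages

cascading = [lambda a: a <= 30, lambda a: a <= 40, lambda a: a > 40]
overlapping = [lambda a: a <= 30, lambda a: 20 < a <= 40, lambda a: a > 50]

first_match = enum_groups(ages, cascading)
with_gaps = enum_groups(ages, overlapping)
repeated = enum_groups(ages, overlapping, repeat=True)
```

With the overlapping conditions, the age 45 matches nothing, and with `repeat=True` the age 25 appears in two groups.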