Using Sequence Numbers and Locating Data in esProc

Blog 2004 0

An esProc set is an ordered set. Therefore, its members can be referenced by sequence numbers. A flexible exploitation of sequence numbers will give full play to esProc’s capability, handling a computation in a simpler and more efficient way.

1. Accessing members

Therefore it is recommended that certain esProc functions, like delete(), use a sequence number or an integer sequence (ISeq) of sequence numbers as a parameter.

The simplest application is to access members with their sequence numbers directly. This is the same as handling an array in a programming language.

  A
1 [1,3,5,7,9]
2 =A1(1)
3 =A1(3)
4 >A1(2)=4
5 >A1(4)=8

A2 and A3 get members at specified positions from A1. Here’re their results:

esProc_number_locate_1

A4 and A5 modify A1’s sequence. The following shows the changes step by step:

esProc_number_locate_2

The m(i) function can get members backward or cyclically. It is a useful complement to A(i) function.

  A
1 [1,3,5,7,9]
2 =A1.m(3)
3 =A1.m(-2)
4 =A1.m@r(6)
5 =A1.m@r(12)
6 =A1.m(6)

A2 and A3 use A.m() function to get values of members with specified sequence numbers. The number -2 represents the second-to-last member. With @r option, A4 and A5 find members cyclically if the specified sequence number exceeds the number of members. For example the sequence number 12 locates the second member after it cycles through A1’s members twice. Computations from A2 to A5 get the following results:

esProc_number_locate_3

Without the @r option, A6 returns the null value as the specified sequence number 6 exceeds the sequence’s length.

Additionally, esProc provides a series of p-headed functions for finding sequence numbers of members, as shown below:

  A
1 [3,5,1,9,7]
2 =A1.pos(5)
3 =A1.pmin()
4 =A1.pmax()
5 =A1.pselect(~%5==0)

A2 finds the sequence number of a specified member. If there are multiple members with the same value, return only the sequence number of the first member. A3 and A4 respectively return sequence numbers of members having the maximum value and minimum value. A5 finds the sequence number of the first member that is a multiple of 5. Results of A2 to A5 are as follows:

esProc_number_locate_4

The pos() function returns null if a specified member can’t found in a sequence. So it can be used to find whether a member is in a certain set or not.

  A
1 [3,5,1,9,7]
2 =A1.pos(1)!=null
3 =A1.pos(2)!=null

Results of A2 and A3 are respectively as follows:

esProc_number_locate_5

2. Accessing subsets

Through sequence numbers, you can obtain a subset of a set.

  A
1 [3,5,4,6,1]
2 =A1([1,3,5])
3 =A1([3,5,2])
4 =A1([4,1,3,1])
5 >A1([1,3,5])=[12,43,28]
6 >A1([2,4,3])=0

A2, A3 and A4 get subsets of the sequence. Different from getting members, the functions use ISeqs composed of sequence numbers as their parameters. Results are as follows:

esProc_number_locate_6

A5 and A6 modify members of the sequence in one go using sequence parameters. Below shows the changes of A1’s sequence step by step:

esProc_number_locate_7

You can also use the m() function to get a subset by specifying a set of sequence numbers.

  A
1 [3,5,4,6,1]
2 =A1.m([1,-1])
3 =A1.m@r([1,6,12])
4 =A1.m@0([1,6,3])

You can use a negative number in A.m() function’s ISeq parameter to represent a position counted backward, or you can use @r option with the function to indicate a turn-back if a specified sequence number exceeds the boundary of the sequence. Besides, @0 option means the null value corresponding to a sequence number that is beyond the sequence’s boundary will not appear in the result. Results of A2, A3 and A4 are as follows:

esProc_number_locate_8

If there is the @a option in a locate function, the sequence numbers of all members satisfying the specified condition will be returned.

  A
1 [3,2,1,9,6,9,1,2,8]
2 =A1.pos@a(2)
3 =A1.pmin@a()
4 =A1.pmax@a()
5 =A1.pselect@a(~%2==0)

With @a option, A2 returns all positions of the value 2 in A1’s sequence. A3 returns the sequence numbers of all members having the smallest value. A4 returns the sequence numbers of all members having the biggest value. A5 returns the sequence numbers of all members that are multiples of 2. Their results are as follows:

esProc_number_locate_9

To get the positions of multiple members at one time using pos function, the @i option may be required in certain cases.

  A
1 [3,2,1,9,6,9,1,2,8]
2 =A1.pos@i([2,9,8])
3 =A1.pos@i([3,1,1,1])
4 =A1.pos@i([1,2,3])
5 =A1.pos([1,2,3])
6 =A1.pos([1,1,2,2])
7 =A1.pos([1,1,1,2,2,2,3,3])

A.pos@i() finds positions of members given in the sequence parameter in one-way direction in order. Without @i, A.pos () simply determines if every member of the parameter sequence is contained in sequence A. Results from A2 to A7 are as follows:

esProc_number_locate_10

Both A3 and A4 return nulls, because A3 can’t find the third 1 specified by the parameter sequence in A1’s sequence and A4 can’t find 1, 2 and 3 in a one-way direction. In other words, A.pos@i() returns only an increasing ISeq; if not all members can found in the rightward direction, it returns null.

The pos@i function returns null if a certain given member is not found in a sequence. Taking the order in which it works and the duplicate members into account, you cannot simply use this function to check if the specified subset is contained; instead, you should perform an intersection operation.

  A
1 [3,2,1,9,6,9,1,2,8]
2 =A1.pos@i([1,9,6])!=null
3 =A1.pos@i([1,2,3])==null
4 =A1.pos([1,2,3,4])==null
5 =A1.pos([1,2,3,3])!=null
6 [1,1,2,2]
7 [1,1,2,2,3,3]
8 =A6^A1==A6
9 =A7^A1==A7

Results from A2 to A4 are as follows:

esProc_number_locate_11

When A.pos@i(B) is used to make query and judgement, the non-null result indicates that members of B can be found in A in one-way direction sequentially and that A must contain B. But a null result only indicates that members of B cannot be found in A in one-way direction sequentially, it does not necessarily mean A does not contain B, as the case in A3.

If the result of A.pos(B) is null, there must be certain members of B that you cannot find in A and it is necessary that A does not contain B. But from the non-null result you cannot deduce that A contains B, because it’s possible that there are duplicate members in B. A5 is such an example.

The following are results of A8 and A9:

esProc_number_locate_12

It is feasible to determine if A contains B using the formula B^A==B. According to the results of A8 and A9, A1 contains A6 but it does not contain A7. Note that the operands in the formula can’t be commuted, otherwise the order of members in the result of A^B could be different from the order of members in B and the judgment won’t be rightly made.

3. Locating data with loop functions

Similar to the symbol ~, # in a loop function is used to represent the sequence number of the current member.

  A
1 [5,4,3,2,1]
2 =A1.(#)
3 =A1.(#+~)
4 =A1.select(#%3==2)
5 =A1.group(int((#-1)/2))

A2 gets an ISeq consisting of sequence numbers. A3 generates a sequence by adding each member and its sequence number. Using select function, A4 finds the second one of every three members in A1’s sequence, i.e. members whose sequence numbers are 2, 5, 8,…, to form a sequence. A5 groups A1’s sequence every two members. The following lists their results:

esProc_number_locate_13

In a loop function, esProc uses the symbol [] to access a member in a relative position.

  A
1 [1,2,3,4,5]
2 =A1.(~[0])
3 =A1.(~[1])
4 =A1.((~-~[-1])/~[-1])
5 =demo.query(“select * from STOCKRECORDS where STOCKID=000062”).sort(DATE)
6 =A5.((CLOSING-CLOSING[-1])/CLOSING[-1])
7 0
8 =A5.max(if(CLOSING>CLOSING[-1],A7=A7+1,A7=0))

A2 lists each member of the sequence. A3 gets a member that is next to the current member. A4 calculates the increasing ratio of each member to the previous one. Their results are as follows:

esProc_number_locate_14

A5 finds records of the stock with a specified ID. A6 calculates the growth rate of every day’s stock price. Then A8 calculates the maximum number of successive rising days. Results of A6 and A8 are as follows:

esProc_number_locate_14

You can use the symbol {} to get members of a subset according to relative positions.

  A
1 [1,2,3,4,5]
2 =A1.(~{-1,1})
3 =A1.(~{-1,1}.avg())
4 =A1.(~{1-#,0}.sum())
5 =A1.(~{,0}.sum())
6 =A1.(~{0,}.sum())

For each position, A2 gets the current member, the member immediately preceding it and the one next to it. A3 calculates the moving average at each position. Both A4 and A5 calculate the cumulative sum. A6 calculates the reversed cumulative sum, that is, the sum of all the existing members as the position moves forward. Results from A2 to A6 are as follows:

esProc_number_locate_15

4. Alignment access

The symbol # in a loop function represents the sequence number of the current member. So it is nothing but a number which can be calculated as any others. This sequence number can be used to map onto a member in another sequence, which is in effect accessing members in alignment.

  A
1 [1,2,3,4,5]
2 =A1.(A1(#))
3 =A1.(A1.m(#-1))
4 [5,4,3,2,1]
5 =A1.(~+A4(#))
6 =A1++A4
7 =10.(if(#%2==1,A1((#-1)/2+1),A4(#/2)))

The sign # in an expression of a loop computation stands for the current sequence number. Results of A2, A3, A5, A6 and A7 are respectively as follows:

esProc_number_locate_17

By accessing members of multiple sequences at the same time in alignment, an effect similar to accessing the fields of a record is created.

  A
1 [Bray,Jacob,Michael,John]
2 [65,87,98,72]
3 [76,82,78,88]
4 =A1.rank(A2(#)+A3(#))
5 =A1.new(~:name,A4(#):rank)

A4 calculates the ranking of total scores by getting scores stored at corresponding positions in different sequences. A5 creates a table sequence with name field and rank field by linking two sequences together according to positions of the members. Results of A4 and A5 are as follows:

esProc_number_locate_18

5. Aligning sequences

Before an alignment access is executed, it is necessary that all the sequences are arranged in the same order. However, in practice, sequences are not always in the same order. Under such circumstances, you should use the align function to re-order the sequences according to the order of a certain standard sequence to allow proper alignment.

  A B
1 =demo.query(“select * from EMPLOYEE”) /The EMPLOYEE table
2 =demo.query(“select * from ATTENDANCE”).align(A1:EID,EMPLOYEEID) /The ATTENDANCE table aligned with the EID field of the EMPLOYEE table
3 =demo.query(“select * from PERFORMANCE”).align(A1: EID, EMPLOYEEID) /The PERFORMANCE table aligned with the EID field of the EMPLOYEE table
4 =A1.new(NAME,SALARY*(1+A2(#).ABSENCE+A3(#).EVALUATION):salaryPaid) /A new table sequence is created to calculate salaries(here, A1, A2 and A3 are organized in the same order)
5 =demo.query(“select * from GYMSCORE where EVENT=’Vault'”) /The Vault score table
6 =demo.query(“select * from GYMSCORE where EVENT=’Floor'”).align(A5:NAME,NAME) /The Floor score table aligned with athlete NAME
7 =A5.(SCORE*0.6+A6(#).SCORE *0.4) /Calculate weighted scores
8 =A7.rank() /Return the ranking of weighted scores
9 =A5.new(NAME,A7(#):score,A8(#):rank) /Create a table sequence having athlete names, weighted scores and ranks

Both table sequences in A2 and A3 are aligned to A1’s employee EID. A4 creates a table sequence of employees’ salaries:

esProc_number_locate_19

In A6, data is aligned with the athlete names in A5’s table sequence. Then A7 calculates weighted scores and A8 gets the ranking of weighted scores. Finally, A9 generates a resulting table sequence:

esProc_number_locate_20

In fact, the align function align@a will also return a sequence aligned with the standard sequence. Each of its members is a set to which the alignment access applies.

  A B
1 =demo.query(“select * from EMPLOYEE”) //The EMPLOYEE table

/Alignment grouping by A2’s STATE field

2 [California,Texas,Pennsylvania]  
3 =A1.align@a(A2,STATE) /Alignment grouping by A2’s STATE field
4 =A3.new(A2(#):STATE,~.count():Count,round(~.avg(age(BIRTHDAY)),2):Age) /Search A2 for the corresponding field value through the sign # that represents a group number in A3

When the computation finishes, A4 gets the following result:

esProc_number_locate_21

Without an option, the align function fetches the first member corresponding to each member of the standard sequence from the source sequence being aligned and returns a set consisting of these first members, instead of returning a set consisting of subsets. If we already know that there is only one member in each group, using this function is equivalent to sorting these members according to a standard sequence.

The alignment access also applies in enumeration grouping except in enum@1.

  A
1 =demo.query(“select * from EMPLOYEE”)
2 [AgeGroup1,AgeGroup2,AgeGroup3]
3 [?<=30,?>30 && ?<=40,?>40]
4 =A1.enum(A3,age(BIRTHDAY))
5 =A4.new(A2(#):AgeInterval,~.count():Count)

A5 calculates the number of employees in each of the three age groups:

esProc_number_locate_22

6. Interval integer sequence

An integer sequence is a special set that is applicable to all the set operations. In addition, its members can be used as sequence numbers to access a subset of another sequence. A flexible use of integer sequences is vital for starting approaching problems with sequence numbers.

  A
1 =to(10)
2 =to(3,8)
3 =A1.step(3,2)
4 =20.step(4,2,3)

The to function generates a sequence consisting of consecutive integers. In step function, you can set the interval between members of an ISeq and other parameters. Results from A1 to A4 are as follows:

esProc_number_locate_23

You can process a subset according to an integer sequence consisting of the positions of the subset’s members in the source set.

  A B
1 =to(100) /Return a sequence consisting of numbers 1, 2, …100.
2 =A1(100.step(14,7))=0 /Assign 0 to the members whose sequence numbers are multiple of 7 (from the 14th member).
3 =A1.run(if(~>1,A1(100.step(~,~+~))=0,0))  
4 =A1.select(~>1) /Generate a list of prime numbers. Assign 0 to the members whose sequence numbers are composite numbers, and then the rest of the members are prime numbers.
5 =100.(rand()) /Generate 100 random numbers.
6 =A5(to(50))  
7 =A5(to(51,100))  
8 >A5(100.step(2,1))=A6  
9 >A5(100.step(2,2))=A7 /Shuffle A5 by alternatively exchanging members in the first half and the second half.

7. ISeq comprising sequence numbers

After a sequence is sorted, the previous order of the members of the sequence will be lost. However, this order could be useful in certain situations. For example, you might want to know the hiring order of the three oldest employees in the company, or the growth ratios for the top three trading days in terms of price, and so on.

esProc offers the psort function to return the original sequence numbers of the sorted members.

  A
1 [c,b,a,d]
2 =A1.psort()
3 =A1(A2)
4 =A1.sort()
5 =A3==A4

Results from A2 to A5 are as follows:

esProc_number_locate_24

In other words, in an integer sequence returned by the psort function, the first number is the sequence number of the member which is in the first place in the sorted sequence; the second member is the sequence number of the member which stands in the second place, and so on.

From an ISeq returned by psort() function, you can use the inv() function to get an ISeq with the original order of members adjusted to undo the sorting operation.

  A
1 [c,b,d,a]
2 =A1.sort()
3 =A1.psort().inv()
4 =A2(A3)
5 =A4==A1

Results from A2 to A5 are as follows:

esProc_number_locate_25

With the psort function, it’s convenient to solve the above problems requiring that the original order be kept.

  A B
1 =demo.query(“select * from EMPLOYEE “).sort(HIREDATE)  
2 =A1.psort(BIRTHDAY:-1) /Return an ISeq of sequence numbers of the records ordered by birthday.
3 =A2(to(3)) /An ISeq containing the sequence numbers of the records of three youngest employees in A1.
4 =demo.query(“select * from STOCKRECORDS where STOCKID=000062”).sort(DATE)  
5 =A4.psort(CLOSING:-1) /Return an ISeq of the original sequence numbers (in A4) of the records sorted in descending order by closing price.
6 =A5(to(3)) /The sequence numbers of the three records in A4 with the highest closing prices.
7 =A6.(A4(~).CLOSING/A4.m@0(~-1).CLOSING-1) /The growth rates in price for the three days. They should be calculated with the sequence numbers in A4.
8 =A6.(A4.calc(~,(CLOSING-CLOSING[-1])/CLOSING[-1])) /Use the calc function to simplify A7’s expression.

A binary search is widely recognized for its high search efficiency; however, it requires that an original sequence is sorted by the keyword. So, before a binary search is executed, the original sequence must be sorted, like searching for a member in the sequence. However, this is not suitable for all situations. For example, if you want to find the sequence numbers of the members in the original sequence, sorting before searching will disrupt the original order, which should be recovered by using the psort() function.

  A B
1 =demo.query(“select * from EMPLOYEE “).sort(HIREDATE)  
2 =A1.psort(NAME) /An ISeq of original sequence numbers of A1’s records in EMPLOYEE table sorted by NAME
3 =A1(A2) /A record sequence formed after the EMPLOYEE table is sorted by NAME
4 =A3.pselect@b(NAME:”David”) /Use the binary search to search for the sequence number of David in A3
5 =A2(A4) /David’s sequence number in the original record sequence

In this case, psort creates a binary search index for the original sequence. There could be one or more search indexes based on different keywords for a single sequence.

An alignment grouping function can also return ISeqs that each contain sequence numbers, instead of a sequence of groups containing records.

  A B
1 =demo.query(“select * from SALES”).sort(AMOUNT:-1) /Sort SALES by AMOUNT.
2 [QUICK,ERNSH,HANAR,SAVEA]  
3 =A1.align@1p(A2,CLIENT) /Alignment grouping by A2’s sequence of clients; sequence numbers are returned.
4 =A3.new(A2(#):NAME,A1(~).AMOUNT: Amount,~:Rank) /Use A3’s sequence numbers to search for the corresponding amounts and ranks.

8. Locating computation

After getting sequence numbers of the desired records, you can achieve the computing goal by performing the locating computation A.calc().The locating computation can avoid unnecessary computations and increase efficiency.

  A
1 =file(“VoteRecord”)
2 =A1.import@b()
3 [Califonia,Ohio,Illinois]
4 =A2.pselect@a(A3.pos(State)>0)
5 =A2.calc(A4,Votes[-1]-Votes+1)

Results of A2, A4 and A5 are as follows:

esProc_number_locate_26

In this case, the binary file VoteRecord stores the data of poll results, with a descending sort by the number of votes. A4 obtains a sequence of EIDs of the employees from the specified states. A5 calculates the number of votes they need in order to moving up according to A4’s EID sequence. For example, Ryan Williams, now ranking 3rd, needs another 69 votes to move up one place. Here inter-row operation is needed, and the computation needs more than the data of selected employees.

FAVOR (0)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website