Using Sequence Numbers and Locating Data in esProc

Course 1027 0

1. Accessing members

Members in a set of esProc (sequence) are organized in order. Therefore, you can reference a member in a set with its sequence number. The more flexible use of sequence numbers, the better esProc functions to make operations simpler and more efficient. Thus sequence numbers or an integer sequence (ISeq) of sequence numbers are recommended to work with certain esProc functions, such as delete() function for deleting records, and compose() function for re-sorting a table sequence (TSeq). 

The simplest application is to access members with their sequence numbers directly; this is the same as what you would do to access an array with a programming language.

  A
1 [1,3,5,7,9]
2 =A1(1)
3 =A1(3)
4 >A1(2)=4
5 >A1(4)=8

A2 and A3 get members by specified positions. Their results are:

esProc_deep_number_locate_2

A4 and A5 modified A1’s sequence. The following shows the changing step by step:

esProc_deep_number_locate_3

You can use m(i) function to get members backwards or in a loop manner. It is a serviceable complement to A(i) function.

  A
1 [1,3,5,7,9]
2 =A1.m(3)
3 =A1.m(-2)
4 =A1.m@r(6)
5 =A1.m@r(12)
6 =A1.m(6)

A2 and A3 get values of members by specified sequence numbers, using A.m() function. -2 represents the second-to-last member. With @r option, A4 and A5 will get members cyclically if the specified sequence number exceeds the number of members. For example the sequence number 12 will get the second member after it circulates through A1’s members twice. Computations from A2 to A5 get the following results:

esProc_deep_number_locate_5

Without the @r option, A6 returns the null value as its specified sequence number 6 exceeds the sequence’s length.

In addition, esProc provides a series of p-headed functions for finding sequence numbers of members, as given below:

  A
1 [3,5,1,9,7]
2 =A1.pos(5)
3 =A1.pmin()
4 =A1.pmax()
5 =A1.pselect(~%5==0)

A2 finds the sequence number by a specified member value. If there are multiple members with the same value, return only the sequence number of the first member. A3 and A4 respectively return sequence numbers of members of the maximum and minimum value. A5 finds the sequence number of the first member that is a multiple of 5. Computations from A2 to A5 are as follows:

esProc_deep_number_locate_7

pos function will return null if a specified member is not found in a sequence. This function can be used to judge whether a member is in a certain set or not.

  A
1 [3,5,1,9,7]
2 =A1.pos(1)!=null
3 =A1.pos(2)!=null

Computations of A2 and A3 are respectively as follows:

esProc_deep_number_locate_9

2. Accessing subsets

With sequence numbers, you can access the subsets of a set.

  A
1 [3,5,4,6,1]
2 =A1([1,3,5])
3 =A1([3,5,2])
4 =A1([4,1,3,1])
5 >A1([1,3,5])=[12,43,28]
6 >A1([2,4,3])=0

A2, A3 and A4 get subsets of the sequence. Different from getting members, here the operations use ISeqs composed of sequence numbers as their parameters. Results are as follows:

esProc_deep_number_locate_11

A5 and A6 modify members of the sequence in one go using sequence parameters. A1’s sequence changes step by step as follows:

esProc_deep_number_locate_12

You can also use the m() function to access a subset by specifying the corresponding sequence numbers.

  A
1 [3,5,4,6,1]
2 =A1.m([1,-1])
3 =A1.m@r([1,6,12])
4 =A1.m@0([1,6,3])

You can use a negative number in A.m() function’s ISeq parameter to represent a position counted backwards, or you can use @r option with the function to indicate a turn-back if a specified sequence number exceeds the boundary. Besides, @0 option means the corresponding null value of a sequence number that is beyond the sequence, if any, will not appear in the result. Computations of A2, A3 and A4 are:

esProc_deep_number_locate_14

If there is the @a option in a locate function, the sequence numbers of all members satisfying the specified condition will be returned.

  A
1 [3,2,1,9,6,9,1,2,8]
2 =A1.pos@a(2)
3 =A1.pmin@a()
4 =A1.pmax@a()
5 =A1.pselect@a(~%2==0)

With @a option, A2 returns all positions of 2 in A1’s sequence; A3 returns sequence numbers of all members having the smallest value; A4 returns sequence numbers of all members having the biggest value; and A5 returns sequence numbers of all members that are multiples of 2. Their computations are:

esProc_deep_number_locate_16

If you want to get the positions of multiple members at one time using pos function, the @i option may be required in certain cases.

  A
1 [3,2,1,9,6,9,1,2,8]
2 =A1.pos@i([2,9,8])
3 =A1.pos@i([3,1,1,1])
4 =A1.pos@i([1,2,3])
5 =A1.pos([1,2,3])
6 =A1.pos([1,1,2,2])
7 =A1.pos([1,1,1,2,2,2,3,3])

A.pos@i() finds positions of members given in the sequence parameter in one-way sequence. Without @i,  A.pos () simply judges if every member of the parameter sequence is included in sequence A. Computations from A2 to A7 are:

esProc_deep_number_locate_18

Both A3 and A4 return nulls, because A3 didn’t find the third 1 specified by the parameter sequence in A1’s sequence and A4 couldn’t find 1, 2, 3 in a one-way sequence. A.pos@i() returns only an increasing ISeq; if not all members are found, it returns null.

The pos@i function returns null if a certain given member is not found in a sequence. Considering the order it requires and the duplicate members, you cannot simply use this function to judge if the specified subset is contained; instead, you should use an intersection operation.

  A
1 [3,2,1,9,6,9,1,2,8]
2 =A1.pos@i([1,9,6])!=null
3 =A1.pos@i([1,2,3])==null
4 =A1.pos([1,2,3,4])==null
5 =A1.pos([1,2,3,3])!=null
6 [1,1,2,2]
7 [1,1,2,2,3,3]
8 =A6^A1==A6
9 =A7^A1==A7

Computations from A2 to A4 are:

esProc_deep_number_locate_20

When A.pos@i(B) is used to query and judge data, the non-null result indicates that members of B can be found in A sequentially and that A must contain B. While a null result only indicates that B cannot be found in A sequentially, it does not necessarily mean A does not contain B. One example is A3’s case.

If result of A.pos () is null, there must be certain members of B that you cannot find in A and it is necessary that A does not contain B. But from the non-null result you cannot deduce that A contains B, if there are duplicate members in B. A5 is such an example.

The following are computations of A8 and A9:

esProc_deep_number_locate_21

It is feasible to judge if A contains B using the formula B^A==B. Results of A8 and A9 prove that A1 contains A6 but it does not contain A7. Note that the operands in the formula should not be commuted, otherwise the order of members in computation of A^B may be different from that in B and the judgment cannot be rightly made.

3. Locating data using loop functions

Like the symbol ~, # in a loop function represents the sequence number of the current member.

  A
1 [5,4,3,2,1]
2 =A1.(#)
3 =A1.(#+~)
4 =A1.select(#%3==2)
5 =A1.group(int((#-1)/2))

A2 gets an ISeq consisting of sequence numbers; A3 gets a result sequence by adding each member and its sequence number. Using select function, A4 finds the second ones of every three members in A1’s sequence, i.e. members whose sequence numbers are 2, 5, 8,…, and creates a sequence with them. A5 groups A1’s sequence every two members. The following lists their computations:

esProc_deep_number_locate_23

In a loop function, esProc allows using the symbol [] to access members in relative positions.

  A
1 [1,2,3,4,5]
2 =A1.(~[0])
3 =A1.(~[1])
4 =A1.((~-~[-1])/~[-1])
5 =demo.query(“select * from STOCKRECORDS where STOCKID=000062”).sort(DATE)
6 =A5.((CLOSING-CLOSING[-1])/CLOSING[-1])
7 0
8 =A5.max(if(CLOSING>CLOSING[-1],A7=A7+1,A7=0))

A2 lists each member of the sequence; A3 gets every member that is next to the current member; A4 calculates the ratio between each member and the previous one. Their computations are:

esProc_deep_number_locate_25

A5 finds records of the stock with a specified ID; A6 calculates growth rate of every day’s stock price; then A8 calculates the maximum number of successive rising days. Computations of A6 and A8 are:

esProc_deep_number_locate_26

In addition, you can use the symbol {} to get members of subsets by relative positions.

  A
1 [1,2,3,4,5]
2 =A1.(~{-1,1})
3 =A1.(~{-1,1}.avg())
4 =A1.(~{1-#,0}.sum())
5 =A1.(~{,0}.sum())
6 =A1.(~{0,}.sum())

For each position, A2 gets the current member, the member before it and the one after it. A3 calculates moving average for each position. Both A4 and A5 calculate the cumulative sum. A6 calculates the reversed cumulative sum, that is, the sum of a decreasing number of members. Computations from A2 to A6 are:

esProc_deep_number_locate_28

4. Alignment access

As we know, the symbol # in a loop function represents the sequence number of the current member. In fact, it is nothing but a number which can be calculated as any others. In particular, it can be used as a sequence number to access a member in another sequence, which makes alignment access possible.

  A
1 [1,2,3,4,5]
2 =A1.(A1(#))
3 =A1.(A1.m(#-1))
4 [5,4,3,2,1]
5 =A1.(~+A4(#))
6 =A1++A4
7 =10.(if(#%2==1,A1((#-1)/2+1),A4(#/2)))

In loop computations, you can use # in an expression to represent the current sequence number. Computations of A2, A3, A5, A6 and A7 are respectively as follows:

esProc_deep_number_locate_30

When independent sequences are arranged in the same order, the use of alignment access may create an effect similar to accessing fields of a record.

  A
1 [Bray,Jacob,Michael,John]
2 [65,87,98,72]
3 [76,82,78,88]
4 =A1.rank(A2(#)+A3(#))
5 =A1.new(~:name,A4(#):rank)

A4 calculates ranks of total scores, getting related scores by positions. A5 creates a table sequence with name field and rank field by linking two sequences together according to member positions. Computations of A4 and A5 are:

esProc_deep_number_locate_32

5. Aligning sequences

Before an alignment access is executed, it is necessary that all the sequences are arranged in the same order. However, in practice, sequences are not always in the same order. Under such circumstances, you should use the align function to re-order sequences according to the order of a certain sequence to make them in the same order.

  A B
1 =demo.query(“select * from EMPLOYEE”) /The EMPLOYEE table
2 =demo.query(“select * from ATTENDANCE”).align(A1:EID,EMPLOYEEID) /The ATTENDANCE table is aligned with the EID field of the EMPLOYEE table
3 =demo.query(“select * from PERFORMANCE”).align(A1: EID, EMPLOYEEID) /The PERFORMANCE table is aligned with the EID field of the EMPLOYEE table
4 =A1.new(NAME,SALARY*(1+A2(#).ABSENCE+A3(#).EVALUATION):salaryPaid) /A new table sequence is created to calculate salaries(here, A1, A2 and A3 are organized in the same order)
5 =demo.query(“select * from GYMSCORE where EVENT=’Vault'”) /The score table for Vault
6 =demo.query(“select * from GYMSCORE where EVENT=’Floor'”).align(A5:NAME,NAME) / The score table for Floor; it is aligned with NAME
7 =A5.(SCORE*0.6+A6(#).SCORE *0.4) / Calculate weighted scores
8 =A7.rank() /Return a ranking of weighted scores
9 =A5.new(NAME,A7(#):score,A8(#):rank) / Create a table sequence to get athlete names, weighted scores and ranks

Both table sequences in A2 and A3 have been aligned to A1’s employee IDs. A4 gets the table sequence of employees’ salaries:

esProc_deep_number_locate_34

In A6, data is aligned with names of the athletes in A5’s table sequence. Then A7 calculates weighted scores and A8 gets the ranking of weighted scores. Finally, A9 organizes the data and gets the result table sequence:

esProc_deep_number_locate_35

In fact, an align function align@a will also return a sequence aligned with the standard sequence. With members being sets, it can also be accessed with the alignment access approach.

  A B
1 =demo.query(“select * from EMPLOYEE”) /The EMPLOYEE table
2 [California,Texas,Pennsylvania]  
3 =A1.align@a(A2,STATE) /Align grouping by A2’s STATE field
4 =A3.new(A2(#):STATE,~.count():Count,round(~.avg(age(BIRTHDAY)),2):Age) /Use # to search for, in A2, the values of the fields by group numbers in A3

When computations are completed, A4 gets the following result:

esProc_deep_number_locate_37

Without option, align function fetches the first member corresponding to each member of the standard sequence from the source sequence and returns a set consisting of these first members, instead of returning a set consisting of subsets. If we already know there is only one member in each subgroup, using this function is to sort these members according to a standard sequence.

Similarly, the alignment access can be used in enumeration group operation; here, enum@1 is not commonly used.

  A
1 =demo.query(“select * from EMPLOYEE”)
2 [AgeGroup1,AgeGroup2,AgeGroup3]
3 [?<=30,?>30 && ?<=40,?>40]
4 =A1.enum(A3,age(BIRTHDAY))
5 =A4.new(A2(#):AgeInterval,~.count():Count)

A5 calculates the number of employees for each of the three age groups:

esProc_deep_number_locate_39

6. Interval integer sequence

An integer sequence is a special set that is applicable to all the set operations. In addition, it can be used as sequence numbers for accessing subsets in another sequence. A flexible use of integer sequences is vital for starting approaching problems with sequence numbers.

  A
1 =to(10)
2 =to(3,8)
3 =A1.step(3,2)
4 =20.step(4,2,3)

to function generates a sequence consisting of consecutive integers. With step function, you can set intervals between members of an ISeq and other parameters. Computations from A1 to A4 are:

esProc_deep_number_locate_41

You can process subsets by an integer sequence consisting of the subsets’ positions in the original set.

  A B
1 =to(100) /Return a sequence consisting of numbers 1, 2, …100
2 =A1(100.step(14,7))=0 /Assign 0 to the members whose sequence numbers are multiple of 7 (from the 14th member)
3 =A1.run(if(~>1,A1(100.step(~,~+~))=0,0))  
4 =A1.select(~>1) /Generate a list of prime numbers; then, assign 0 to the members whose sequence numbers are composite numbers; other non-zero numbers are prime numbers
5 =100.(rand()) /Generate 100 random numbers
6 =A5(to(50))  
7 =A5(to(51,100))  
8 >A5(100.step(2,1))=A6  
9 >A5(100.step(2,2))=A7 /Shuffle A5; that is, alternatively exchange members in the first half and the second half

7. ISeq consisting of sequence numbers

After a sequence is ordered, the previous order of the members in the sequence will be lost. However, in certain conditions, this order information may be required. For example, we may need to know the entry order of the three oldest employees in the company, the amount of increase of a share’s price for the three trading days when the share prices are on the highest level, and so on.

This problem can be solved by using the psort function in esProc; the function returns the previous order of the ordered members.

  A
1 [c,b,a,d]
2 =A1.psort()
3 =A1(A2)
4 =A1.sort()
5 =A3==A4

Computations from A2 to A5 are:

esProc_deep_number_locate_44

In plain words, in an integer sequence returned by the psort function, the first number is, relative to the original sequence, the sequence number of the member which should be placed in the first place in the sorted sequence; the second member is, relative to the original sequence, the sequence number of the member which should be placed in the second place; the rest may be deduced by analogy.

For the sequence resulting from the sequence number ISeq, you can also use the inv() function to get the inverse ISeq of the sequence number ISeq, which can be used to undo operations.

  A
1 [c,b,d,a]
2 =A1.sort()
3 =A1.psort().inv()
4 =A2(A3)
5 =A4==A1

Computations from A2 to A5 are:

esProc_deep_number_locate_46

You can use the psort function to solve the above problem which requires that the original sequence numbers should be kept.

  A B
1 =demo.query(“select * from EMPLOYEE “).sort(HIREDATE)  
2 =A1.psort(BIRTHDAY:-1) /Return an ISeq of sequence numbers of the records ordered by Birthday
3 =A2(to(3)) /The youngest three employees’ positions in A1
4 =demo.query(“select * from STOCKRECORDS where STOCKID=000062”).sort(DATE)  
5 =A4.psort(CLOSING:-1) /Return an ISeq of sequence numbers of the records reversely sorted by ClosingPrice
6 =A5(to(3)) /The positions of the highest three CLOSING in A4
7 =A6.(A4(~).CLOSING/A4.m@0(~-1).CLOSING-1) /The increase for these three days; the increase shall be computed according to the positions of records sorted by TradeDate, that is, the positions in A4
8 =A6.(A4.calc(~,(CLOSING-CLOSING[-1])/CLOSING[-1])) /Use the calc function to simplify A7’s expression

A binary search is widely recognized for its high efficiency; however, it requires that an original sequence is sorted by keywords. So, before a binary search is executed, the original sequence must be sorted. However, this is not suitable for all. For example, if you want to search for a member in a sequence, you can of course run the sort function before the searching; but if you want to search for an index of a member, running the sort function before searching would damage order; in this case ,you should use psort() function.

  A B
1 =demo.query(“select * from EMPLOYEE “).sort(HIREDATE)  
2 =A1.psort(NAME) /The integer sequence of sequence numbers, after the EMPLOYEE table is sorted by NAME
3 =A1(A2) /The record sequence formed after the EMPLOYEE table is sorted by NAME
4 =A3.pselect@b(NAME:”David”) / Use the binary search to search for the sequence number of David in A3
5 =A2(A4) / David’s sequence number in the original record sequence

In this case, psort creates a binary search index for the sequence; there could be one or more search indexes, depending on different keywords, for a single sequence.

An alignment group function can also return an ISeq consisting of sequence numbers, instead of the sequence aligned.

  A B
1 =demo.query(“select * from SALES”).sort(AMOUNT:-1) /The ranking of AMOUNT
2 [QUICK,ERNSH,HANAR,SAVEA]  
3 =A1.align@1p(A2,CLIENT) /Alignment grouping by A2’s sequence of clients; sequence numbers are returned
4 =A3.new(A2(#):NAME,A1(~).AMOUNT: Amount,~:Rank) / Use A3’s group number to search for the corresponding amount and ranks

8. Locating computation

After working out sequence numbers of records needed, we can compute the required results with locating computation A.calc().The locating computation can avoid unnecessary computation and increase efficiency.

  A
1 =file(“VoteRecord”)
2 =A1.import@b()
3 [Califonia,Ohio,Illinois]
4 =A2.pselect@a(A3.pos(State)>0)
5 =A2.calc(A4,Votes[-1]-Votes+1)

Computations of A2, A4 and A5 are as follows:

esProc_deep_number_locate_51

In this case, the binary file VoteRecord stores poll results, with a descending sort of the votes. A4 is the computed result of employee ID sequence of a specified state. A5 calculates the number of votes they need in order to moving up. For example, Ryan Williams, now ranking 3rd, needs another 69 votes to move up one place. Here inter-row operation is needed, because the computation needs more than the data of selected employees.

FAVOR (0)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website