esProc Variable Scope

esProc uses several kinds of variables, including cell variables, in its operations. Each kind has its own scope, which is the topic of this article.

1. esProc Parameters and Variables

esProc allows globally used data to be defined as a cellset constant, like the following sequence-type constant weekdays:

esProc_variable_scope_1

The constant-type data that should be set before writing a cellset program is often defined as a program parameter, like the following settingDate:

esProc_variable_scope_2

The following program computes the day of the week for the predefined settingDate:

  A B
1 =B1 =settingDate
2 =day@w(B1) =weekdays(A2)

Besides the cellset constant weekdays and the program parameter settingDate, the expressions in A1, A2 and B2 use cell names as variables that reference cell values. With the default value of settingDate, A1, B1, A2 and B2 get the following results:

esProc_variable_scope_3

As you can see, cellset constants, program parameters and cell variables can all be referenced anywhere in the cellset program. Cellset constants and program parameters are predefined and won’t change during the computation, while a cell variable can hold different values at different times. For instance, A1 obtains null as B1’s value because B1 has not been computed yet, but A2 gets the specified date from B1. How cell values change is covered in detail in a later section.
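The evaluation order described above can be imitated in Python. This is only a rough analogy, not esProc code: the WEEKDAYS list, the date value and the day_w helper are hypothetical stand-ins (the real constant and parameter appear only in the screenshots), and the Sunday=1 numbering for day@w is an assumption.

```python
from datetime import date

# Hypothetical stand-in for the cellset constant weekdays.
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]

def day_w(d):
    """Weekday number with Sunday=1 ... Saturday=7 (assumed day@w convention)."""
    return d.isoweekday() % 7 + 1

setting_date = date(2024, 1, 1)    # hypothetical parameter value, a Monday

cells = {}                         # cell name -> cell value
cells["A1"] = cells.get("B1")      # B1 is not computed yet, so A1 is None
cells["B1"] = setting_date         # =settingDate
cells["A2"] = day_w(cells["B1"])   # =day@w(B1)
cells["B2"] = WEEKDAYS[cells["A2"] - 1]  # =weekdays(A2), 1-based in esProc
```

Cells are filled in the order A1, B1, A2, B2, which is why A1 ends up empty while A2 sees the date.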

Apart from cell variables, esProc lets programmers use user-defined cellset variables. For example:

  A B
1 [1,2,3,4,5]  
2 >average=A1.avg() =A1.(~-average)

A2 defines average as the average value of members of A1’s sequence, and, by referencing average, B2 calculates the difference between each of A1’s members and the average value. Below is B2’s result:

esProc_variable_scope_4
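For readers more familiar with general-purpose languages, the computation in A2 and B2 corresponds roughly to the following Python sketch (an analogy only; the variable names are chosen for illustration):

```python
a1 = [1, 2, 3, 4, 5]               # A1
average = sum(a1) / len(a1)        # >average=A1.avg()
b2 = [x - average for x in a1]     # =A1.(~-average)
```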

Only an assigned cellset variable can be used, so average must be assigned before it is referenced; otherwise an error occurs. For example:

  A B
1 [1,2,3,4,5] =average
2 >average=A1.avg() =A1.(~-average)

The computation fails because the program can’t identify average when it parses B1, and an error is reported:

esProc_variable_scope_5

To prevent the error, you can check if the variable has been defined. For example:

  A B
1 [1,2,3,4,5] =if(ifv(average),average,null)
2 >average=A1.avg() =if(ifv(average),average,null)

B1 and B2 contain the same expression, which first uses ifv(average) to check whether the variable average exists and then sets the cell value to average or null accordingly. B1 is computed before A2 assigns average, so it gets null, while B2 gets the average value.
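The same "check before use" pattern can be sketched in Python with a dictionary acting as the variable table. This is an illustration under stated assumptions: the ifv and read_or_null helpers below are hypothetical Python functions that only mimic the behavior described above.

```python
def ifv(variables, name):
    """Return True if the named variable has been assigned."""
    return name in variables

def read_or_null(variables, name):
    """Mimic =if(ifv(x),x,null): the value if defined, otherwise None."""
    return variables[name] if ifv(variables, name) else None

variables = {}                              # no cellset variables yet
a1 = [1, 2, 3, 4, 5]
b1 = read_or_null(variables, "average")     # evaluated before the assignment
variables["average"] = sum(a1) / len(a1)    # >average=A1.avg()
b2 = read_or_null(variables, "average")     # evaluated after the assignment
```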

2. The cell variable in loop procedures

Among esProc’s constants, parameters and variables, the cell value is the intrinsic variable of a cellset and needs no declaration in advance. It is convenient, safe, and the most frequently used. In a loop procedure, the members of a sequence are assigned one by one to the master cell, where the loop code can access them. For example:

  A B C
1 $ select * from CITIES =A1.(NAME)  
2 for B1 if left(A2,1)=="C" >C1=C1+1
3 =A2    

B1 obtains a sequence of city names. A2 loops through these city names to calculate the number of those starting with “C” and stores the result in C1. Below are B1’s sequence and the results of C1 and A3:

esProc_variable_scope_6

The result shows that B2 gets each city name from A2 in turn during the loop and completes the computation, with a result of 14. After the loop finishes, A2’s data has been cleared, so A3’s result is null.

This indicates that the data in the master cell of a loop procedure can only be used by the loop code.

The value of the master cell of a loop procedure is cleared only when the loop ends normally. If the loop is terminated partway through, the value is kept. For example:

  A B C
1 $ select * from CITIES =A1.(NAME)  
2 for B1 if left(A2,1)=="C" break
3 =A2    

The above cellset program uses the break statement to end the loop as soon as it finds a city name starting with “C”. In this case the master cell keeps its value when the loop exits, so the first city name beginning with “C” can be obtained from it. A2 and A3 have the same value:

esProc_variable_scope_7
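The two behaviors of the master cell (cleared after a normal finish, kept after break) can be mimicked in Python with a for/else construct. This is a sketch, not esProc semantics: Python’s own loop variable is never cleared, so the else branch clears it by hand, and the city list is made up for illustration.

```python
def count_initial(names, initial, stop_at_first=False):
    """Count names starting with `initial`; return (count, master cell value)."""
    count = 0
    master = None                  # plays the role of the master cell A2
    for master in names:
        if master[:1] == initial:
            count += 1
            if stop_at_first:
                break              # early exit: master keeps its value
    else:
        master = None              # normal finish: the master cell is cleared
    return count, master

names = ["Boston", "Chicago", "Denver", "Cleveland"]   # hypothetical data
full = count_initial(names, "C")
early = count_initial(names, "C", stop_at_first=True)
```

Here full corresponds to the counting example, where A3 is null afterwards, and early corresponds to the break example, where A3 still sees the first matching name.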

The for statement in a cellset can loop over not only the members of a sequence but also the records of a table sequence or a cursor. For example:

  A B C
1 $ select * from CITIES    
2 for A1 if left(A2.NAME,1)=="C" >C1=C1|A2
3 =demo.cursor("select * from CITIES")    
4 for A3 if left(A4.NAME,1)=="C" >C3=C3|#A4

A2 loops through the records of A1’s table sequence, finds the records of cities whose names start with “C”, and collects them into a record sequence in C1. A4 loops through the records of A3’s cursor and stores in C3 the sequence numbers of the records of cities whose names start with “C”. Instead of referencing the records themselves, C4 gets their sequence numbers through #A4, which is also the number of the current loop iteration. Below are the results of C1 and C3:

esProc_variable_scope_8

When the loop finishes, values of both A2 and A4 are cleared.
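As a rough Python analogy, looping over a table sequence resembles iterating a list of records, and looping over a cursor resembles consuming a generator while tracking the iteration number. The records and the cursor helper below are hypothetical and only illustrate the idea.

```python
cities = [                         # hypothetical stand-in for the CITIES table
    {"NAME": "Chicago"},
    {"NAME": "Boston"},
    {"NAME": "Cleveland"},
]

def cursor(rows):
    """Yield records one at a time, like reading from a cursor."""
    for row in rows:
        yield row

# Like C1: collect the matching records themselves.
c1 = [rec for rec in cities if rec["NAME"][:1] == "C"]

# Like C3: collect the 1-based loop numbers (the role of #A4).
c3 = [n for n, rec in enumerate(cursor(cities), start=1)
      if rec["NAME"][:1] == "C"]
```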

3. The cell variable in subroutines

  A B C D
1 [A,B] [a,b] =func(A2,[A1,B1])  
2 func      
3   =A2(1) =A2(2)  
4   for B3 for C3 >D3=D3|(B4/C4)
5   return D3    

A2 is the master cell of the subroutine, which covers lines 2 through 5. When the subroutine executes, the parameters passed in the call are copied to the master cell, from which B3 and C3 get the first and second parameters respectively. The subroutine concatenates each member of the first sequence with each member of the second sequence in order.

C1 calls the subroutine to cross combine the two sequences A1 and B1. Here’s C1’s result:

esProc_variable_scope_9
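In Python terms, the subroutine behaves roughly like the function below. The sketch assumes the sequence members are strings and that B4/C4 joins two strings; the names are illustrative only.

```python
def func(params):
    """Cross-combine the first two sequences in params."""
    first, second = params[0], params[1]    # =A2(1), =A2(2)
    result = []                             # local, like cell D3
    for x in first:                         # for B3
        for y in second:                    # for C3
            result = result + [x + y]       # >D3=D3|(B4/C4)
    return result                           # return D3

a1 = ["A", "B"]
b1 = ["a", "b"]
c1 = func([a1, b1])
```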

While a subroutine is running, it can reference not only its parameters through the master cell but also the cells within its own code block. When the subroutine call is over, however, the values of all cells within that code block are cleared. For this reason, cell variables used in a subroutine are valid only within the subroutine’s code block.

A cell variable used in a subroutine is assigned separately for each subroutine call, so values from different calls won’t interfere with each other when the subroutine is called recursively. For example:

  A B C D
1 [A,B] =func(A2,[A1,A1]) =func(A2,[A1,A1,A1]) =func(A2,[A1,A1,A1,A1])
2 func      
3   =A2(1) =A2(2)  
4   for B3 for C3 >D3=D3|(B4/C4)
5   if A2.len()==2 return D3  
6   =[D3]|A2.to(3,) return func(A2,B6)  

The subroutine in lines 2 through 6 interlaces characters from multiple sequences through recursive calls. When the parameters specify more than two sequences, the subroutine first interlaces the first two sequences; the 6th line then places that result in front of the remaining sequences and calls A2’s subroutine recursively, so the remaining sequences are interlaced one at a time.

B1, C1 and D1 perform the interlaced concatenation on two, three and four sequences respectively. Here are their results:

esProc_variable_scope_10
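The recursive subroutine can be approximated in Python as follows. Each call has its own local combined list, which plays the role of D3, so recursive calls do not disturb one another. This is a sketch that assumes string members and string concatenation for B4/C4.

```python
def interlace(seqs):
    """Recursively cross-combine two or more sequences of strings."""
    first, second = seqs[0], seqs[1]                   # =A2(1), =A2(2)
    combined = [x + y for x in first for y in second]  # local, like D3
    if len(seqs) == 2:                                 # if A2.len()==2
        return combined                                # return D3
    # =[D3]|A2.to(3,) then return func(A2,B6)
    return interlace([combined] + seqs[2:])

a1 = ["A", "B"]
b1 = interlace([a1, a1])
c1 = interlace([a1, a1, a1])
d1 = interlace([a1, a1, a1, a1])
```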

Cells outside a subroutine’s code block, however, are shared by all calls of that subroutine. If the subroutine references such a cell and each call alters its value, errors will appear. For example:

  A B C D
1 [A,B]   =func(A2,[A1,A1,A1])  
2 func      
3   =A2(1) =A2(2)  
4   for B3 for C3 >D1=D1|(B4/C4)
5   if A2.len()==2 return D1  
6   =[D1]|A2.to(3,) return func(A2,B6)  

This subroutine differs from the previous one in that the result of interlacing two sequences is stored in cell D1, which belongs to the main program, instead of cell D3 within the subroutine. Below is the result of C1:

esProc_variable_scope_11

C1 gets the wrong result for interlacing three sequences because each recursive subroutine call alters D1’s value.
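The same kind of mistake can be reproduced in Python by accumulating into a module-level variable instead of a local one. The sketch below is an analogy only, and the exact wrong output in esProc may differ from what this Python version produces; it merely shows how results of an earlier call leak into a later one.

```python
d1 = []   # shared state outside the function, like cell D1 in the main program

def interlace_shared(seqs):
    """Buggy variant: every call appends to the same outer variable."""
    global d1
    for x in seqs[0]:
        for y in seqs[1]:
            d1 = d1 + [x + y]          # >D1=D1|(B4/C4)
    if len(seqs) == 2:
        return d1
    return interlace_shared([d1] + seqs[2:])

a1 = ["A", "B"]
result = interlace_shared([a1, a1, a1])
```

A correct interlacing of three two-member sequences has 8 members, but here the 4 members produced by the first call remain in front of the 8 produced by the recursive call.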

4. The variable scope in parallel processing

With parallel processing, a cellset program is executed simultaneously by several threads. Since each thread performs its part of the computation independently, the cell variables used in different threads don’t affect each other. For example:

  A B C
1 $ select * from CITIES [C,D,E]  
2 fork B1 =A1.select(left(NAME,1)==A2)  
3   return B2  

A2 performs a multithreaded computation and gets the following result:

esProc_variable_scope_12
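A comparable pattern in Python uses a thread pool: each task filters the records for one initial and returns its own result, and the results are gathered in task order. This is a sketch with made-up records, not a description of how fork is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

cities = [                             # hypothetical stand-in for CITIES
    {"NAME": "Chicago"}, {"NAME": "Dallas"},
    {"NAME": "El Paso"}, {"NAME": "Cleveland"},
]

def select_by_initial(initial):
    """Like B2: select the records whose NAME starts with the given letter."""
    return [rec for rec in cities if rec["NAME"][:1] == initial]

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns the results in the order of the inputs
    a2 = list(pool.map(select_by_initial, ["C", "D", "E"]))
```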

In a parallel computation, there is no problem if multiple threads access the same cell in the main program’s code block. This differs from a subroutine call. For example:

  A B C
1 $ select * from CITIES [C,D,E]  
2 fork B1 >C1=A1.select(left(NAME,1)==A2)  
3   return C1  

Here A2 gets the same result as in the previous example, while C1 has no value. This is because each thread in a multithreaded computation handles its task on its own: the cell variables used by a thread or by the main program are valid only within their own scope and hold independent values.
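Python’s threading.local gives a similar effect and can serve as a mental model: each thread sees its own copy of an attribute, so assignments made in worker threads never reach the main thread. The names and data below are illustrative assumptions, not esProc internals.

```python
import threading

cells = threading.local()      # per-thread storage, like per-thread cells
cells.C1 = None                # the main thread's C1

cities = [{"NAME": "Chicago"}, {"NAME": "Dallas"}, {"NAME": "El Paso"}]
results = {}                   # collects what each thread returns

def worker(initial):
    # This assignment only affects the worker thread's own C1.
    cells.C1 = [rec["NAME"] for rec in cities if rec["NAME"][:1] == initial]
    results[initial] = cells.C1            # like "return C1"

threads = [threading.Thread(target=worker, args=(i,)) for i in ["C", "D", "E"]]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_c1 = cells.C1             # still None in the main thread
```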

Cluster computing is one type of parallel computing, in which threads on different servers execute the cellset program separately. To share data across a cluster, you can define a global parameter or a task parameter. Here’s an example, RandomAddup.dfx:

  A B
1 =file("record.txt") =zone("z")
2 =if(ifv(Total),Total,0) =A2+rand(1000)
3 >A1.write@a("Total "/A2/" to "/B2) >env(Total,B2)

This cellset program uses a global parameter Total to accumulate random values on one server, and records the result of each calculation in the record.txt file.

Now let’s set up 3 servers whose numbers of parallel tasks are 2, 1 and 4 respectively, and call them from the main program mainAddup.dfx as follows:

  A
1 [192.168.0.86:4001,192.168.0.66:4001,192.168.0.86:4004]
2 =callx("RandomAddup.dfx",to(10);A1)

Execute the program twice, and the three servers get the following record.txt files:

esProc_variable_scope_13

According to the log, the global parameter can be accessed by every task on a server. The value of the global parameter Total continues to accumulate on each server when the cluster program executes again, which shows that the parameter remains valid until the server restarts. Calling RandomAddup.dfx this way is not safe, however, because a conflict can occur if multiple threads on a server access the global parameter at the same time.

To avoid conflicts between threads, set the number of parallel tasks to 1 on all three servers and restart them. In addition, modify RandomAddup.dfx as follows:

  A B
1 =file("record.txt") =zone("z")
2 =if(ifv(Total),Total,0) =A2+rand(1000)
3 >A1.write@a("Total "/A2/" to "/B2) >env@j(Total,B2)

The @j option added to B3’s env function makes Total a task parameter. Execute mainAddup.dfx twice, and the three servers get the following record.txt files:

esProc_variable_scope_14

You see that a task parameter in a cluster computation stays valid on each server for the whole duration of one call of the main program, but becomes invalid when the main program finishes. It must be initialized again the next time the program is called.
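The difference in lifetime between the two kinds of parameters can be modeled with a small Python sketch. Everything here is hypothetical: the Server class, the run_main helper and the fixed increments (used instead of random numbers so the outcome is predictable) only illustrate the scoping rule described above and say nothing about how esProc servers are built.

```python
class Server:
    """Toy server whose global parameters live until it is restarted."""
    def __init__(self):
        self.global_params = {}

    def add_up(self, params, increment):
        """Like RandomAddup.dfx: read Total (or 0), add to it, store it back."""
        before = params.get("Total", 0)       # =if(ifv(Total),Total,0)
        params["Total"] = before + increment  # env(...) or env@j(...)
        return before, params["Total"]

def run_main(server, increments, task_scope):
    """One call of the main program on a single server."""
    job_params = {}                # exists only for this call
    params = job_params if task_scope else server.global_params
    return [server.add_up(params, inc) for inc in increments]

# Global parameter: the second run continues from the first run's total.
g = Server()
global_run1 = run_main(g, [1, 2], task_scope=False)
global_run2 = run_main(g, [3], task_scope=False)

# Task parameter: every run of the main program starts from scratch.
t = Server()
task_run1 = run_main(t, [1, 2], task_scope=True)
task_run2 = run_main(t, [3], task_scope=True)
```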
