Reference-based Thinking in esProc

Blog 2537 0

1.Foreign key reference

The foreign key is a common concept for various relational databases. For a relational table, you can define one or multiple fields as the foreign key, through which an association with the data in another table can be created. For example, in a game scoring report, often players are recorded with their IDs. To get the detailed information of the players from another player profile table, you should query the player profile table according to the player IDs. In this case, the player ID field can be set as the foreign key.

The foreign key can link two relational tables and help ensure data consistency, facilitating the cascade operations. Take the above game scoring report as an example. If defining the player ID field as the foreign key, then you can obtain the names, nationalities, ages, and other information of the players while getting the game scoring records.

In this example, the foreign key definition is in effect equivalent to referencing the records in the player profile table from the game scoring table. Thus, the essence of a foreign key lies in its mapping of records.

Unlike the database, esProc sets no constrains on the data type of the field values of the records, and any type of value is allowable. Thus a certain field of an esProc table sequence can get assigned with the referenced records, establishing a foreign key association straightforwardly.

Here’s an example of creating an employee education background table comprising the employee ID field and education field. The education field values are set in order through a simple loop, just for illustrating how to use the foreign key reference in esProc.

  A B C
1 =create(ID,Education) [BS,BA,MS,MA,MBA,MEE,MPA,PhD] =B1.len()
2 for 500 =B1((A2-1)%C1+1)  
3   >A1.insert(0,A2,B2)  

To link the education background table with the EMPLOYEE table, a foreign key field Info can be added, whose values references the records of the EMPLOYEE table.

  A B C
1 =create(ID,Education) [BS,BA,MS,MA,MBA,MEE,MPA,PhD] =B1.len()
2 for 500 =B1((A2-1)%C1+1)  
3   >A1.insert(0,A2,B2)  
4 $ select * from EMPLOYEE =A1.derive(A4.select@1(EID==ID):Info)  

Below is the table sequence in B4 in which the Info field references the employee records. You can double-click the field values to view the detailed information.

esProc_dataset_reference_1

Through record referencing, the foreign-key join queries are easier to perform and the query statements become clearer and more intuitive. A field whose values are records can be handled as any other fields and the records can be processed normally. For example, the following lists the ID, full name, education background and department for employees from the state of Ohio.

  A B C
1 =create(ID,Education) [BS,BA,MS,MA,MBA,MEE,MPA,PhD] =B1.len()
2 for 500 =B1((A2-1)%C1+1)  
3   >A1.insert(0,A2,B2)  
4 $ select * from EMPLOYEE =A1.derive(A4.select@1(EID==ID):Info)  
5 =B4.select(Info.STATE== “Ohio”) =A5.new(ID,Info.NAME+” “+Info.SURNAME:FullName,Education,Info.STATE:State,Info.DEPT:Dept)  

The following is B5’s result:

esProc_dataset_reference_2

esProc allows assigning not only the records of other table seqeunces but also those of the current table sequence to the foreign key field.

In esProc, A.derive() function can assign records to a newly-added field when generating a table sequence. And the A.run() function can assign value to an existing field. For example:

  A B C
1 =create(ID,Education) [BS,BA,MS,MA,MBA,MEE,MPA,PhD] =B1.len()
2 for 500 =B1((A2-1)%C1+1)  
3   >A1.insert(0,A2,B2)  
4 $ select * from EMPLOYEE >A1.run(ID=A4.select@1(EID==A1.ID))  

B4 modifies the employee ID field in the education background table sequence to the corresponding records in the EMPLOYEE table. After that, A1’s table sequence is as follows:

esProc_dataset_reference_3

If the primary key of the corresponding table is a single field, you can also use the function A.switch():

  A B C
1 =create(ID,Education) [BS,BA,MS,MA,MBA,MEE,MPA,PhD] =B1.len()
2 for 500 =B1((A2-1)%C1+1)  
3   >A1.insert(0,A2,B2)  
4 $ select * from EMPLOYEE >A1.switch(ID,A4)  

The table sequence in A1 is the same, but B4’s expression is much more concise. Note that the switch() function links the specified field and the primary key in another table sequence automatically. If the primary key of the to-be-related table sequence isn’t the first field, T.primary() function is needed to set a primary for it. The following example creates an association between the STATE field of the EMPLOYEE table and the STATE table:

  A B C
1 =create(ID,Education) [BS,BA,MS,MA,MBA,MEE,MPA,PhD] =B1.len()
2 for 500 =B1((A2-1)%C1+1)  
3   >A1.insert(0,A2,B2)  
4 $ select * from EMPLOYEE >A1.switch(ID,A4)  
5 $ select * from STATES >A5.primary(NAME) >A4.switch(STATE,A5)

Now you can view the state information from the employee information referenced in A1’s table sequence:

esProc_dataset_reference_4

At this point, you can handle multi-table association queries or computations, like finding which states from which the employees come from have a population below 1,000,000:

  A B C
1 =create(ID,Education) [BS,BA,MS,MA,MBA,MEE,MPA,PhD] =B1.len()
2 for 500 =B1((A2-1)%C1+1)  
3   >A1.insert(0,A2,B2)  
4 $ select * from EMPLOYEE >A1.switch(ID,A4)  
5 $ select * from STATES >A5.primary(NAME)  >A4.switch(STATE,,A5)
6 =A1.select(ID.STATE.POPULATION< 1000000)    

As can be seen, the record-referencing foreign key makes the multi-table association query syntax more concise and readable with an improved computing speed.

The switch() function can also switch over between the foreign key and a non-primary field. For example:

  A B C
1 =create(ID,Education) [BS,BA,MS,MA,MBA,MEE,MPA,PhD] =B1.len()
2 for 500 =B1((A2-1)%C1+1)  
3   >A1.insert(0,A2,B2)  
4 $ select * from EMPLOYEE >A1.switch(ID,A4)  
5 $ select * from STATES >A4.switch(STATE,A5:NAME)  

B5 specifies that the NAME field in A5’s STATES table sequence be related, instead of resetting its primary key, when calling the switch() function to create an association. The association is successfully created, and A1’s table sequence is as follows:

esProc_dataset_reference_5

Note that the EMPLOYEE table’s STATE field value in blue stands for a state record. Here the STATES table’s primary key is a different field.

When linking two table sequences through the foreign key, sometimes not all key values can find their matching records. For example:

  A B
1 [1,3,5,7] [Alberta,Ontario,Nova Scotia,Manitoba]
2 $ select EID,NAME,SURNAME,STATE from EMPLOYEE >A2(A1).field(4,B1)
3 $ select NAME,ABBR,CAPITAL from STATES >A2.switch(STATE,A3)

B2 modifies the STATE field values in several records of the EMPLOYEE table. Below are A2’s table sequences before and after the execution of B2:

esProc_dataset_reference_6

The first field of STATES table sequence that A3 generates is the NAME. And B3 uses the switch() function to directly create an association. After the association, A2’s table sequence is as follows:

esProc_dataset_reference_7

With the modification, these modified STATE values can’t find matching records in the STATES table and thus return nulls. To prevent this from happening, you can use @i option with the switch() function to return only the records that are matched successfully. For example:

  A B
1 [1,3,5,7] [Alberta,Ontario,Nova Scotia,Manitoba]
2 $ select EID,NAME,SURNAME,STATE from EMPLOYEE >A2([1,3,5,7]).field(4,B1)
3 $ select NAME,ABBR,CAPITAL from STATES >A2.switch@i(STATE,A3)

In this case, the switch() function deletes those records that can’t find their matches, and A2’s table sequence is as follows:

esProc_dataset_reference_8

But you might be only interested in those key values that can’t find matches. To get them, you can modify B3’s code into >A2.switch@d(STATE,A3). Now A2’s table sequence is as follows:

esProc_dataset_reference_9

2.Referencing sets

Apart from records, esProc also lets a field get assigned with sets – that is, referencing sets in a foreign key field. In most of the cases, the sets referenced by a foreign key field are record sequences that each is composed of certain records of another table sequence. esProc foreign key field is different from the foreign key concept in the context of the databases, in that the former is more like the SQL query where a primary table references data of one of its subtables.

Like the foreign key got assigned with records, a foreign key field can also get assigned with records sequences when it is added by the derive() function generating a table sequence. For example:

  A
1 $ select NAME,ABBR from STATES
2 $ select * from EMPLOYEE
3 =A1.derive(A2.select(STATE==A1.NAME):Employees)

A3 adds a foreign key field Employees to A1 and assigns records of the employees who come from each state to it, and returns a new table sequence. The values of the Employees field of A3’s table sequence are all record sequences, as shown below:

esProc_dataset_reference_10

The foreign key field referencing sets can be directly called as any field of a table sequence. At the same time, computations can be performed on the set-type values of the foreign key field. The following example adds a Count field to calculate the number of the employees in each state:

  A
1 $ select NAME,ABBR from STATES
2 $ select * from EMPLOYEE
3 =A1.derive(A2.select(STATE==A1.NAME):Employees)
4 =A3.derive(Employees.count():Count)

The total number of employees can be computed based on the foreign key field. Then the table sequence in A4 is as follows:

esProc_dataset_reference_11

As with the foreign key field referencing records, you can also handle multi-table-association-based queries or filtering operations with expressions referencing a foreign key field that references sets, which mean record sequences. To find the states having more than 50 employees, for example:

  A
1 $ select NAME,ABBR from STATES
2 $ select * from EMPLOYEE
3 =A1.derive(A2.select(STATE==A1.NAME):Employees)
4 =A3.derive(Employees.count():Count)
5 =A4.select(Employees.len()>50)

By referencing sets in a foreign key field, the multi-table association query syntax becomes clearer and more intuitive. In A4’s table sequence, the sole purpose of introducing the Count field is to have a better viewing of the result. The field is useless for A5’s filtering operation. Below is A5’s result:

esProc_dataset_reference_12

The switch() function doesn’t apply to a table sequence field referencing sets, but the A.run() function can be used to assign the set-type values to an existing field. For example:

  A
1 =12.new(#:ID,string(date(“2000-“+string(#)+”-1″),”MMMM”):Month,Employees)
2 $ select EID,NAME+’ ‘+SURNAME FULLNAME, BIRTHDAY from EMPLOYEE
3 >A1.run(Employees=A2.select(month(BIRTHDAY)==A1.ID))

A3 assigns the null Employees field with the records of employees born in each month. Once the program is executed, the table sequence in A1 becomes as follows:

esProc_dataset_reference_13

Here the result of grouping a record sequence is a sequence composed of sets. This is similar to getting a single field referencing sets. For example, the following groups the employee records by the birth month:

  A
1 $ select EID,NAME+’ ‘+SURNAME FULLNAME, BIRTHDAY from EMPLOYEE
2 =A1.group(month(BIRTHDAY))

A2’s grouping result is as follows:

esProc_dataset_reference_14

Like the result in the previous example, this grouping result can be used for further handling as necessary:

  A
1 $ select EID,NAME+’ ‘+SURNAME FULLNAME, BIRTHDAY from EMPLOYEE
2 =A1.group(month(BIRTHDAY))
3 =A2.new(#:Month,~.count():Count)
4 =A2.new(#:Month,~.minp(BIRTHDAY).FULLNAME:Oldest)

A3 counts the employees born in each month. A4 lists the oldest employees born in each month. After computations, A3 and A4 get the following results:

esProc_dataset_reference_15

FAVOR (1)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website