Rules of Resource Release in esProc

Program Language 6094 4

During data computing in esProc, raw data, intermediate data, and the final resulting data consume certain system resources. That includes the memory consumption, database connection usage and the space occupation by the file objects. Too much resource consumption may cause memory overflow, database access failure and other problems to the computation. It’s important during the computing process to timely remove useless raw data and intermediate data, close unwanted database connection and delete unnecessary file objects so that resources can be released.

1. Variables

esProc supports user-defined variables and using cell values as variables. Both types of variables occupy memory space with data stored in them. If they consume a large amount of memory by holding a big volume of data, computations would thus be affected, resulting in low execution speed and even memory overflow.

The following program is used to test the normal memory consumption by variables:

  A B
1 $ select NAME from CITIES 15000000
2 =A1.(NAME) =A2.len()
3 2000-1-1 =create(ID,Name,City,Birthday)
4 for B1 >B3.insert(0,#A4,”N”/#A4,A2(rand(B2)+1),after@d(A3,-rand(10000)))
5 / =create(ID,Name,City,Birthday)
6 for B1 >B5.insert(0,#A6,”N”/#A6,A2(rand(B2)+1),after@d(A3,-rand(10000)))
7 / =create(ID,Name,City,Birthday)
8 for B1 >B7.insert(0,#A8,”N”/#A8,A2(rand(B2)+1),after@d(A3,-rand(10000)))
9 / =create(ID,Name,City,Birthday)
10 for B1 >B9.insert(0,#A10,”N”/#A10,A2(rand(B2)+1),after@d(A3,-rand(10000))

B3, B5, B7, B9 and B11 respectivley create a 4-field testing table sequence. Line 4,6,8,10 and 12 run loops to insert 15,000,000 records to each table sequence. The data types of the field values include integer, string and date. Each record will occupy space of about 100 bytes after it is objectified. A table sequence should be cleared from the cell used as the variable as soon as it is generated for the sake of smooth execution. With a memory space of 4G, momory overflow happens as B7 tries to geneate the third table sequence.

esProc_rules_sources_release_1

In actual execution, a table sequence, after it is generated, will be stored in a file and this step of computation is over. At that point it can be deleted from the cell, as shown below:

  A B
1 $ select NAME from CITIES 15000000
2 =A1.(NAME) =A2.len()
3 2000-1-1 =create(ID,Name,City,Birthday)
4 for B1 >B3.insert(0,#A4,”N”/#A4,A2(rand(B2)+1),after@d(A3,-rand(10000)))
5 >B3=null =create(ID,Name,City,Birthday)
6 for B1 >B5.insert(0,#A6,”N”/#A6,A2(rand(B2)+1),after@d(A3,-rand(10000)))
7 >B5=null =create(ID,Name,City,Birthday)
8 for B1 >B7.insert(0,#A8,”N”/#A8,A2(rand(B2)+1),after@d(A3,-rand(10000)))
9 >B7=null =create(ID,Name,City,Birthday)
10 for B1 >B9.insert(0,#A10,”N”/#A10,A2(rand(B2)+1),after@d(A3,-rand(10000))

As this is just a test, generated table sequences won’t be exported but just be cleared in A5, A7 and A9 by setting the values of B3, B5 and B7 as nulls. By removing these table sequences, resources are released and memory overlow is avoided to pave the way for subsequent compuations. After the code is exected, the intermediate data in B3, B5 and B7 will have been cleared:

esProc_rules_sources_release_2

But if multiple variables store the same data object, the actual storage consumption will be very small and won’t burden the memory. For example:

  A B
1 $ select NAME from CITIES 15000000
2 =A1.(NAME) =A2.len()
3 2000-1-1 =create(ID,Name,City,Birthday)
4 for B1 >B3.insert(0,#A4,”N”/#A4,A2(rand(B2)+1),after@d(A3,-rand(10000)))
5 =B3 =B3
6 =B3 =B3
7 >testTable=B3 >testTable.to(-10).run(Name=”Change”)

In this cellset program, all variables of B3, A5, B5, A6 and B6, as well as the cellset variable testTable contain a table sequence that has 15,000,000 records. But as they store the same data object there is no risk of memory overflow during the computation. When B7 modifies the last ten records stored in testTable, the table sequences stored in B3, A5, B5, A6 and B6 are also modified at the same time because they are actually the same object.

esProc_rules_sources_release_3

2. Database connections

Generally a database puts a limit on the number of connections to it. A number exceeding the limit will make the effort of establishing a new connection fail. Even if a database has no such restriction, too many connections will slow down the execution of database query and affect the efficiency.

In the following the MySQL database is used to make a test in which the maximum number of database connections is set as 10:

max_connections=10

The setting can be viewed in MySQL’s console:

esProc_rules_sources_release_4

max_connections, the maximum number of connections the database allows, has been modified. The default maximum number of connections MySQL allows is over 100.

Below is the esProc test code for adding database connections:

  A B C
1   connect(“mysql”)  
2 for 20 =”arg”/A2/”=”/B1 =eval(B2)

A2 runs a loop, and B2 composes a statement to establish a connection to MySQL database with each loop. With the first execution, B2 generates the following expression:

esProc_rules_sources_release_5

For each loop C2 executes B2’s expression with the eval function to establish a database connection and name it arg1, arg2, arg3,… The testing aims simply to establish the connections without querying data through them.

But soon an error arises:

esProc_rules_sources_release_6

The error appears because the number of connections exceeds the maximum number of connections. A view of the MySQL console tells a clear story:

esProc_rules_sources_release_7

The number of current connections, Threads_connected, is greater than the maximum number of connecitons, so an extra database connection fails. The database connection in esProc can’t close automatically to release resource unless the esProc program exits:

esProc_rules_sources_release_8

The remaining connection in Threads_connected is the console connection.

To completely remove the database connections, call the db.close() function. For example:

  A B C
1   connect(“mysql”)  
2 for 20 =”arg”/A2/”=”/B1 =eval(B2)
3   =”arg”/A2/”.close()” =eval(B3)

In the above cellset code, the database connection established in C2 is closed in C3. This way the program can be executed normally. The connection status can be viewd from the MySQL console:

esProc_rules_sources_release_9

Therefore, the database connection established in esProc should be closed using the db.close() function as soon as it has done its job so that resource can be released timely. Alternatively, you can effectively control the number of database connections by using the default connection to query the database.

3. Cursors

Cursors are commonly used in esProc to handle big data through batch processing.

Within one database connection, only one database cursor is allowed, as the following program shows:

  A B C
1 =connect(“mysql”) =A1.cursor(“select * from States”) =A1.cursor(“select * from States”)
2 >A1.close()    

In this cellset program, an error is reported when C1 tries to create a second cursor:

esProc_rules_sources_release_10

Similar to a database connection, a database cursor should be specifically closed using the cs.close() function to release the database resource it occupies after it completes its mission. For example:

  A B C
1 =connect(“mysql”)    
2 =A1.cursor(“select * from States”) =A2.fetch(5) >A2.close()
3 =A1.cursor(“select * from States”) =A3.fetch(5) >A3.close()
4 >A1.close()    

Since A2’s cursor is closed in C2 after desired data is fetched from it, A3’s cursor is able to work normally.

Note that when a cursor’s data is entirely fetched out, the cursor will close automatically. For example:

  A B C
1 =connect(“mysql”)    
2 =A1.cursor(“select * from States”) =A2.fetch()  
3 =A1.cursor(“select * from States”) =A3.fetch()  
4 >A1.close()    

This cellset program can be executed normally because B2 will fetch all data from the cursor with the fetch function and then A2’s cursor will close automatically.

However, there’s no such a limit on database cursors as on file cursors. esProc allows multiple cursors created based on one file and data retrieval from them won’t affect each other. For example:

  A B C
1 =file(“PersonnelInfo”)    
2 =A1.cursor@b() =A2.fetch(5)  
3 =A1.cursor@b() =A3.fetch(5)  

Since a file cursor uses very small resource, it is unnecessary to close it during the computation.

But if new data is written to a file, then the multiple cursors generated based on it will have mutual interference between them. More information about file objects will be covered in the next section.

4. File objects

According to the previous section, there’s no need to close a file cursor because a file object won’t consume database connection resource as a database cursor does. Besides, multiple processes are allowed to retrieve data from the file object. But when additional data is written into the file, the result of data retrieval in a same way will be different. For example:

  A B C
1 =file(“test”) =demo.query(“select * from EMPLOYEE”).(NAME/” “/SURNAME) =create(ID,Name)
2 for 10000 =rand(500)+1 >C1.insert(0,A2,B1(B2))
3 >A1.export@z(C1) =A1.import@b(;2:5)  
4 for 10000 =rand(500)+1 >C1.insert(0,A4+10000,B1(B4))
5 >A1.export@z(C1) =A1.import@b(;2:5)  

A3 and A5 respectively export data to a binary file test. Both B3 and B5 import the second segment of the same file by dividing the file into five segments, but B5’s result is different from that of B3 due to the updating of the data in the file:

esProc_rules_sources_release_11

It’s not necessary to close a file object during the computation as it consumes very small resource. The resource it uses will be freed after the cellset program exits.

It’s a different situation when a file cursor is used:

  A B C
1 =file(“test”) =demo.query(“select * from EMPLOYEE”).(NAME/” “/SURNAME) =create(ID,Name)
2 for 10000 =rand(500)+1 >C1.insert(0,A2,B1(B2))
3 >A1.export@z(C1) =A1.cursor@b(;2:5) =A1.cursor@b(;2:5)
4   =B3.fetch(10) =C3.fetch(10)
5 for 10000 =rand(500)+1 >C1.insert(0,A4+10000,B1(B4))
6 >A1.export@z(C1) =A1.import@b()  

B3 and C3 respectively create a cursor to retrieve data from the file. With open file cursors, the source file refuses data to be written into it with the export function. After the code in line 5 is executed, the number of records in C1’s table sequence increases to 20,000. But with open cursors in B3 and C3, A6’s export statement can’t actually be executed. So the number of all records imported from the file by B6 is still 10,000.

esProc_rules_sources_release_12

5. Resource release in cellset invocation

Besides executing a single dfx file, esProc also supports calling another cellset program using call function or callx function. Here involves the rules of resource release during the invocation of the cellset program.

Below is a cellset progam CountSub.dfx:

  A B
1 $ select NAME from CITIES 15000000
2 =A1.(NAME) =A2.len()
3 2000-1-1 =create(ID,Name,City,Birthday)
4 for B1 >B3.insert(0,#A4,”N”/#A4,A2(rand(B2)+1),after@d(A3,-rand(10000)))
5 =B3.groups(year(Birthday):Year;count(~):Count)         return A5

The first four lines create a table sequence containing 15,000,000 records and store it in cell B3, the variable. A5 calculates the number of people born in each year based on B3’s table sequence. B5 returns the aggregate result to the main program, without first clearing the table sequence from B3.

This cellset program is then called in the main program:

  A
1 =10.(call(“CountSub.dfx”))
2 =A1.conj().groups(Year;sum(Count):Count)

A1 calls the subroutine CountSub.dfx ten times. A2 concatenates the aggregate results to get the count of the people born in each year for the 150,000,000 random records. Here are results of A1 and A2:

esProc_rules_sources_release_13

The main program runs smoothly without memory overflow. This means the variable is cleared from the cellset program as soon as the invocation of it is completed. For an executed subroutine, its variable will be regarded as useless and automatically cleared to avoid memory consumption, without specifically setting the variable as null. In the meantime, the data returned to the main program from the subroutine is still available. But if the subroutine CountSub.dfx returns the B3’s source table sequence instead of A5’s aggregate result, then the memory will be unable to hold so much data, leading to overflow.

Here’s another cellset program QuerySub.dfx:

  A
1 =connect(“mysql”)
2 =A1.query(“select * from States where left(Capital,1) = ?”,arg1)
3 return A2

A2 establishes a connection to MySQL database for querying records of states whose capitals begin with the specified letter. The program uses a parameter:

esProc_rules_sources_release_14

It is then called by the main program:

  A
1 =20.(call(“QuerySub.dfx”,char(64+#)))
2 =A1.conj()

Before the invocation, you can view the conncetion status through MySQL console:

esProc_rules_sources_release_15

An error occurs during the execution of the main program:

esProc_rules_sources_release_16

Because the current number of connections exceeds the maximum number allowed, as can be seen from the console:

esProc_rules_sources_release_17

During the invocation of the cellset program, the database connections in it won’t automatically close. To solve the issue, make some modification on QuerySub.dfx, where the database connections will be closed after the query is accomplished.

  A
1 =connect(“mysql”)
2 =A1.query(“select * from States where left(Capital,1) = ?”,arg1)
3 >A1.close()
4 return A2

Now restart MySQL database to execute the main program. This time the computation performs well. Here’s A2’s result in the main program:

esProc_rules_sources_release_18

Below is the connection status that MySQL console displays:

esProc_rules_sources_release_19

Since the database connections in the subroutine are closed once they’ve done their jobs, Max_used_connections only increases by 1 while 20 database connections are added during the execution of the main program.

Apart from closing database connections timely in the subroutine, an efficient way of using database connection is through default database connection, like the the database configured in the data source manager and the one connected by default on the server-side.

esProc_rules_sources_release_20

The following subroutine DefQuerySub.dfx queries data through a default database:

  A
1 =mysql.query(“select * from States where left(Capital,1) = ?”,arg1)
2 return A1

A1 queries the default MySQL database to get the records where the capitals begin with the specified letter. A2 retruns the result to the main program:

The subroutine is then called by the main program:

  A
1 =20.(call(“DefQuerySub.dfx”,char(64+#)))
2 =A1.conj()

In this way the main program works as well and gets the same result:

Here’s the connection status on the MySQL console:

esProc_rules_sources_release_21

Compared with the previous connection status, only 1 connection is added and Threads_connected is 2 because the default conncetion to MySQL is still on. Therefore the dabase connection can be effectively used with the default connection during a query, reducing the uncessary resource consumption.

The following subroutine QueryOneSub.dfx uses a cursor to query data but doesn’t close it after the query finishes:

  A
1 =mysql.cursor(“select * from States where left(Capital,1) = ?”,arg1)
2 =A1.fetch(1)
3 return A2

The subroutine is called in the main program as follows:

  A
1 =20.(call(“QueryOneSub.dfx”,char(64+#)))
2 =A1.conj()

But an error is reported during the execution:

esProc_rules_sources_release_22

This is because each database connection only allows one database cursor, which causes resource consumption problem if it isn’t closed in time.

The problem can be sovled by modifying the subroutine QueryOneSub.dfx:

  A
1 =mysql.cursor(“select * from States where left(Capital,1) = ?”,arg1)
2 =A1.fetch(1)
3 >A1.close()
4 return A2

Alternatively, you can make the subroutine return a database cursor directly but let the main program conroll the query, as shown by the cellset program QueryCursorSub.dfx

  A
1 =mysql.cursor(“select * from States where left(Capital,1) = ?”,arg1)
2 return A1

It is called in the main program like this:

  A B C
1 []    
2 for 20 =call(“testCellValueMemSub4.dfx”,char(64+A2))  
3   >A1=A1|B2.fetch(1) >B2.close()

A2 runs a loop in which for each circle B2 obtains the cursor returned from the subroutine, B3 feches the first-found record from the cursor and C3 closes the cursor. With an open cursor that uses certain resource, error will arise when the subroutine is called next time to generate a cursor.

A main program can operate normally if it closes the cursor each time the data retrievel is accomplished. After execution A1’s result is as follows:

esProc_rules_sources_release_23

There are some differences between the uses of file objects and those of the database connection and the cursor. A file object occupies very small memory space and won’t consume any database resource, so it’s no need to close it to release system resource. But, similar to a database cursor, a file cursor should be closed after it has done its job; otherwise the export of data to it could fail.

FAVOR (2)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website
(4)
  1. test
    admin2018-04-20 16:40 Reply
    • test2
      admin2018-04-20 16:49 Reply
    • test3Q
      mars2018-04-20 16:55 Reply
    • test4
      mars2018-04-20 17:05 Reply