esProc Accesses Databases: Using SQL

In esProc, we can use not only the SQL to retrieve data from databases, but also the preliminary database query results to perform further analyses and operations to solve some complicated problems which are difficult to deal with in SQL alone.

1. Database connection and disconnection

When SQL accesses databases, it should first connect to the designated database. Usually, there are two ways to connect to the database: direct connection in the data source manager or calling functions to connect in cellsets.

esProc_database_sql_1

	A
1	=connect(“demo”)

In practice, a database connection in the data source manager can be referenced by directly using data source name and is valid while the connection is on; for a database connected through functions, the connection object will be stored as cell value, and it can be referenced by using the cell name and is valid before the referenced connection object closes.

	A
1	=connect(“demo”)
2	=demo.query(“select * from STATES”)
3	=A1.query(“select * from STATES”)
4	>A1.close()

Similarly, there are two ways for database disconnection as well. Besides calling db.close() function to close the connection mentioned above, another way is to close the specified data source in data source manager:

esProc_database_sql_3

2. Using simple SQL

Using db.query() function can execute SQL commands in designated database. SQL statements can contain various query sub-clauses and database functions.

	A
1	=connect(“demo”)
2	=A1.query(“select * from STATES where ABBR like ‘N%’ order by POPULATION desc”)
3	>A1.close()

A2 queries states whose abbreviations begin with N, and sorts them by population in descending order. Results are as follows:

esProc_database_sql_5

A SQL statement can also uses other data in the cellset as parameters:

	A
1	=connect(“demo”)
2	[CA,ME,NM,SC,LA]
3	=A1.query(“select * from STATES where ABBR in (?) order by AREA”,A2)
4	>A1.close()

A4 queries states whose abbreviations fall into designated sequences and sorts them by area in ascending order. Results are as follows:

esProc_database_sql_7

In particular, we can make query function return only the first record by using @1 option with it. In this case, the query result will be returned as a sequence whose members are values of columns of the first record. If the record has only one field, a single value will be returned. For example:

	A
1	=demo.query@1(“select * from CITIES”)
2	=demo.query@1(“select NAME from CITIES where STATEID=5”)

Results of A1 and A2 will be those as follows when @1 option is used:

esProc_database_sql_9

Note that the return value is a single value or a sequence, instead of a table sequence, by using this method.

3. SQL that returns no results

If SQL statements that return no results are to be used to access a database, like create, update, delete and so on, then db.execute() function should be used. Meanwhile, since it is not necessary to assign values to cells, expressions begin with “>” instead of “=”. For example:

	A
1	=connect(“demo”)
2	>A1.execute(“update STATES set ABBR=’CAA’ where ABBR=’CA'”)
3	=A1.query(“select * from STATES where NAME = ‘California'”)
4	>A1.close()

After the statement in A2 modifies records in database table STATES, query result of A3 is as follows：

esProc_database_sql_11

SQL that returns no results can use parameters, too:

	A
1	=connect(“demo”)
2	CA
3	CAA
4	>A1.execute(“update STATES set ABBR=? where ABBR=?”,A2,A3)
5	=A1.query(“select * from STATES where NAME = ‘California'”)
6	>A1.close()

The statement in A4 restores the modified record in database STATES to its original values, and query result of A5 is as follows:

esProc_database_sql_13

4. Use SQL directly in cells

Besides db.query(sql) function and db.execute(sql) function, esProc can also use $(db)sql;… to directly execute SQL. If (db) is omitted, connect to the last database used. If parameters are used in the sql statement, put them after the semicolons. This method doesn’t add an equal sign before the sql statement and surround the statement with double quotation marks; nor does it support @1 option any more. In this case, execute function or query function is not needed to judge whether the statement returns a result set or not. select statement will return a result set, but other statements will return different values respectively. For example:

	A
1	$(demo)select * from STATES where ABBR like ‘N%’ order by POPULATION desc
2	[CA,ME,NM,SC,LA]
3	$select * from STATES where ABBR in (?) order by AREA;A2

The code has the same effect as that in section 2: Using simple SQL and gets the same result when executed. The code in A3 doesn’t specify the datasource name and thus use the last database used, which is still (demo).

Another situation:

	A
1	$(demo) create table TESTTEMP (ID int, NAME varchar(20))
2	$(demo) insert into TESTTEMP values (2,’Tom Smith’)
3	=demo.query(“select * from TESTTEMP”)
4	$(demo)update TESTTEMP set NAME=’Jane White’ where ID=2
5	=demo.query(“select * from TESTTEMP”)
6	$(demo)drop table TESTTEMP

A1 creates a new table sequence. A2 inserts a record into it. A4 modifies the records and A6 drops the table sequence. The updating of the database in A3 and A5 is as follows:

esProc_database_sql_16

This seems different from the case in the above section. Their return values are as follows:

esProc_database_sql_17

Values of A1 and A6 mean that the SQL statements don’t return result sets. Values of A2 and A4 mean one record has been updated respectively.

Code will become more concise using the format of $(db)sql;…. Please note that parameters should be separated from each other by semicolons and different returned results need to be identified.

5. Usage of query results of SQL

Query results of SQL can be used in esProc to perform operations, like filtering, sorting and combination, etc., to increase query efficiency or solve some complicated problems.

In the following examples, executions are performed by connecting to data source demo in data source manager based on query results of cell A1:

	A
1	=demo.query(“select * from STATES order by POPULATION desc”)

esProc_database_sql_18

For example, filter the data and select states with designated abbreviations:

	A
1	=demo.query(“select * from STATES order by POPULATION desc”)
2	[CA,ME,NM,SC,LA]
3	=A1.select(A2.pos(ABBR)>0)

esProc_database_sql_20

Also, aggregation computations can be performed on the data. For example, count the number of states whose abbreviations begin with C:

	A
1	=demo.query(“select * from STATES order by POPULATION desc”)
2	=A1.count(left(ABBR,1)==”C”)

esProc_database_sql_21

More significantly, we can group data in databases according to certain requirements, for example, group the data according to initials of abbreviations:

	A
1	=demo.query(“select * from STATES order by POPULATION desc”)
2	=A1.group(left(ABBR,1))

esProc_database_sql_22

A2 groups the data according to the initial of each state’s abbreviation. Double-click each group and see details.

It is thus clear that, different from the SQL grouping method that doesn’t provide real grouping and summarizing function, grouping with esProc is the real one, on which further computations can be performed. For example, select groups that contain three or more states, compute the total number and population of the states in each of these groups:

	A
1	=demo.query(“select * from STATES order by POPULATION desc”)
2	=A1.group(left(ABBR,1))
3	=A2.select(~.count()>=3)
4	=A3.new(left(ABBR,1):Initial, ~.count():Count, ~.sum(POPULATION):TotalPopulation)

The final results of A4 are:

esProc_database_sql_24

6. Comparison between common SQL statements and esProc syntax

1) Select * from

	A
1	=demo.query(“select * from EMPLOYEE”)

Query results are as follows:

esProc_database_sql_26

2) Select … from

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.new(EID, NAME, SURNAME, GENDER, BIRTHDAY, DEPT)
3	=demo.query(“select EID, NAME, SURNAME, GENDER, BIRTHDAY, DEPT from EMPLOYEE”)

Get designated fields from the table. A2 and A3 have the same query results as follows:

esProc_database_sql_28

3) As

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.new(EID, NAME+” “+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)
3	=demo.query(“select EID, NAME+’ ‘+SURNAME as FULLNAME, GENDER, year(now())-year(BIRTHDAY) as AGE, DEPT from EMPLOYEE”)

Compute FULLNAME according to NAME and SURNAME, and meanwhile, compute AGE according to BIRTHDAY. Basically, A2 and A3 have the same query results as follows:

esProc_database_sql_30

Note that AGE is computed in A3 by simply subtracting years and exact computation will be complicated since SQL hasn’t functions to directly compute age.

4) Where

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.new(EID, NAME+” “+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)
3	=A2.select(AGE<30)
4	=demo.query(“select EID, NAME+’ ‘+SURNAME as FULLNAME, GENDER, year(now())-year(BIRTHDAY) as AGE, DEPT from EMPLOYEE where year(now())-year(BIRTHDAY)<30”)

Select employees who are younger than 30 years old. In esProc, we can perform further computations using the existing results. Query results of A3 are as follows:

esProc_database_sql_32

Query the same results in A4 with SQL but the syntax is much more complicated. And we cannot get accurate results due to the inexact method of computing age.

5) Count, sum, avg, max and min

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.new(EID, NAME+” “+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)
3	=A2.count(AGE<30)
4	=demo.query(“select count(EID) from EMPLOYEE where year(now()) – year(BIRTHDAY)-(case when month(now())<month(BIRTHDAY) then 1 WHEN month(now())=month(BIRTHDAY) and day(now())<day(BIRTHDAY) then 1 else 0 end)<30”)

Count the total number of employees who are younger than 30 years old. In esProc, the result can be used to perform further computation. Query result of A3 is as follows:

esProc_database_sql_34

This time, A4 uses a more exact method to compute AGE and gets a query result that is consistent with that of A3. But with this method, we cannot use the existing results and the statements are more complicated.

The usage of SQL functions, such as sum, avg, max and min, is similar to that of count.

6) Distinct

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.id(DEPT)
3	=demo.query(“select distinct DEPT from EMPLOYEE”)

Query which departments the employee information come from. A2 and A3 have the same results. Query results are as follows:

esProc_database_sql_36

7) Order by

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.new(EID, NAME+” “+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)
3	=A2.select(AGE<30).sort(AGE:-1, FULLNAME ).new(FULLNAME, AGE)
4	=demo.query(“select FULLNAME, AGE from (select NAME+’ ‘+SURNAME as FULLNAME, year(now())-year(BIRTHDAY)-(case when month(now()) < month(BIRTHDAY) then 1 WHEN month(now())=month(BIRTHDAY) and day(now())<day(BIRTHDAY) then 1 else 0 end) as AGE from EMPLOYEE) where AGE<30 order by AGE desc, FULLNAME”)

Select employees who are younger than 30 years old, sort them by age in descending order; meanwhile, sort employees of the same age by FULLNAME in ascending order. A3 and A4 have the same query results as follows:

esProc_database_sql_38

In SQL, it is complicated to compute age and existing results cannot be used, so this time A4 tries to simplify the statements with nested query. However, the process is still complex.

8) And, or, not and <>

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.new(EID, NAME+” “+SURNAME: FULLNAME, GENDER, age(BIRTHDAY):AGE, DEPT)
3	=A2.select(AGE<30 && left(FULLNAME,1)== “S” ).new(FULLNAME, AGE)
4	=demo.query(“select FULLNAME, AGE from (select NAME+’ ‘+SURNAME as FULLNAME, year(now())-year(BIRTHDAY)-(case when month(now()) < month(BIRTHDAY) then 1 WHEN month(now())=month(BIRTHDAY) and day(now())<day(BIRTHDAY) then 1 else 0 end) as AGE from EMPLOYEE) where AGE<30 and left(FULLNAME, 1)=’S'”)

Select employees who are younger than 30 years old and whose initials of full names are S. Results are as follows:

esProc_database_sql_40

It can be seen that and is represented in esProc by the operator && and two equal signs == are used to judge whether things are equal or not. These are in line with the customs of many program languages. Similarly, in esProc, or is represented by the operator “||“, not by“!”, and<>by“!=”.

9) Like

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.select(like(NAME,”*a”)).new(NAME+” “+SURNAME:FULLNAME)
3	=demo.query(“select NAME+’ ‘+SURNAME as FULLNAME from EMPLOYEE where NAME like ‘%a'”)

Query the full names of employees whose names are ended by a. Query results are as follows:

esProc_database_sql_42

The use of like function requires different wildcard characters in different databases. In this example, for instance, percent “%” is used to represent zero or multiple arbitrary characters; while in some other databases, asterisk “*” is used to represent the same things. esProc defines the same syntax for any database.

10) Group

	A
1	=demo.query(“select * from EMPLOYEE”)
2	=A1.new(NAME+” “+SURNAME: FULLNAME, DEPT).group(DEPT)
3	=A1.groups(DEPT;count(~):ECOUNT)
4	=demo.query(“select DEPT, count(*) as ECOUNT from EMPLOYEE group by DEPT order by DEPT”)

group function can be used in esProc to group records of employees by departments as shown below:

esProc_database_sql_44

It can be seen that the result of grouping with esProc is that records are divided into multiple groups. These groups can be used to perform further computations as required in esProc.

A3 directly computes data grouping and summarizing with the esProc function while A4 does the job with SQL. They get the same results. SQL doesn’t have the real “group” concept, so it can only perform aggregate computations during query by groups. Results are as follows:

esProc_database_sql_45