Structured data processing is the most common computation mode in applications. Such computations are generally performed in databases with SQL, but we often get into such situations where SQL fails to work or is unworkable, for example, it is not suitable to use SQL in the case of complex computation logic and enormous process steps. And SQL cannot be used directly where the original data are stored in the files other than in the database, or multiple heterogeneous databases are involved, it is high time that we adopt script languages to achieve these computations.
There are four common script languages: esProc,Perl,Python,R, with their own characteristics, each of which can be used to complete such operations. In this paper, the writer, as a beginner, will make a test on the 4 script languages for 20 common structured data operations. In the end, their performances will be presented hereby comparison in a simple way.
What is described in this paper covers only the structured data computation capability, the full functions of these 4 script languages are not mentioned here. Most of parts only involve some information about 2D structured data table. As I got to study some products only for a short time, the related information can be available only from the official website or documents. It would be possible that some results could be flawed.
Part (I):Retrieving Table from Database
Connect the database, load 2D table from the database into memory, and store the datain the objects provided by the product.
esProc
>db=connect(“oracle”);
//esProc configures detail definition for data connection in the environment information; only the name is written here
=db.query(“select * from tbl”);
//A sql statement is executed to return the table sequence (object of 2D structured data table in esProc) automatically, and then generate column name automatically
>db.close()
Perl
$dbh = DBI->connect(“dbi:Oracle:host=192.168.1.11;sid=ORCL”,$oracleid,”,{AutoCommit=>0})||
die “Don’t connect database ! “; #Create a database connection
$sql=”select * from tbl”;
$sth=$dbh->prepare($sql);
$sth->execute(); # Execute sql
while(@result=$sth->fetchrow){ # Fetch data row-by-row
push @e,[@result]; # The row array pointer is stored in another array, thus to form a 2D table; [] represents array reference (pointer)
}
$sth->finish;
$dbh->disconnect; #Close the connection
Notes:
1. The syntax of [@result]makes the beginner confused.
2. As there is no column name, the column number needs to be noted manually.
3. The array itself is @array, while the array element value uses $array[] with different prefix. Beginners tend to make mistakes.
4. If an associative array or structure is used to store the column name, this array will be twice as long as it is used before and take up too much memory.
Python
It is a little more advanced than Perl in terms of the interface. When executing sql, it returns a 2D array directly.
#!python
import cx_Oracle
import os
def connectDB(dbname=’ORCL’): #Define a function to connect the database
if dbname==’ORCL’:
connstr=’yj/yj@192.168.1.11:1521/ORCL’
db=cx_Oracle.connect(connstr)
returndb
def sqlSelect(sql,db): # Define a function to execute select statement
cr=db.cursor()
cr.execute(sql)
rs=cr.fetchall() #Return a 2D array directly, which does not include metadata
cr.close()
returnrs
if __name__==’__main__’: #Main program, to connect the database, and execute sql
db=connectDB()
sql=’select * from dave’
rs=sqlSelect(sql,db)
for x in rs:
print (x)
The object cursor of Python also provides the property of column name, which can be obtained by cursor.description, for example:
def sqlSelect(sql,db):
#include:select
cr=db.cursor()
cr.execute(sql)
names=[f[0] for f in cr.description]
for row in cr.fetchall():
for pair in zip(names,row):
print (‘%s:%s’ %pair)
Output by calling this function is as follows:
ID:1
NAME:anhui
PHONE:13888888888
ID:2
NAME:dave
PHONE:138888888888
……
Some beginners will be also puzzled over program blocks of Python, as it does not use {} to define the blocks, no semicolon at the end of each statement. It relies on indentation to define the blocks, so that indentation becomes more significant. When I first wrote the Python program, the Tab is used to indent the same spaces as the previous line(whose code is copied).As a result, an error occurred. Then I changed to use the space for indentation, but the error still repeated. In the end, this issue was removed by duplicating indentation from previous line. Later I found out that all the indentations in a program block must be kept consistent, whether you use Tab, or space.
Although esProc also defines the code blocks by using natural indentation instead of symbols such as {}, etc., the code has been written into the cell to make sure the scope of the code block will be clear at a glance, so that it is much easier for us to avoid malposition due to misused characters.
R
As convenient as esProc, R can load data into the object of data frame directly, while data frame is similar to the table sequence in esProc.
library(RODBC)
odbcDataSources()
conn<-odbcConnect(“sqlsvr”,uid=”sa”,pwd=”root”)
originalData<-sqlQuery(conn,”select * from Customers”) #Directly generate a data frame
odbcClose(conn)
Conslusions: