The More Convenient Data Computing Scripts I
III Ordered Set
Case Description
Suppose that a telecommunication equipment manufacturer needs to compute the monthly link relative ratio of sales value (i.e. the increase percent of sales value of each month compared with that of the previous month). The sales data is stored in the sales table with the main fields including salesMonth, and salesAmount.
SQL solution
select salesAmount, salesMonth,
(case when
prev_price !=0 then ((salesAmount)/prev_price)-1
else 0
end) compValue
from (select salesMonth, salesAmount,
lag(salesAmount,1,0) over(order by salesMonth) prev_price
from sales) t
The popular SQL-92 has not introduced the concept of serial number, which adds many difficulties to the computation. Considering this, the designer of SQL-2003 has partly remedied this drawback. For example, the window function lag() is used to retrieve the next record in this example.
In addition, in the above statement, the “case when” statement is used to avoid the error of division by zero on the first record.
esProc Solution
sales.derive(salesAmount / salesAmount [-1]-1: compValue)
The derive() is an esProc function to insert the newly computed column to the existing data. The new column is compValue by name, and the algorithm is “(Sales value of this month/Sales value of previous month)-1”. The “[n]” is used to indicate the relative position, and so [-1] is to represent the data of the previous month.
On the other hand, for the data of the first record, the additional procedure for division by zero is not required in esProc.
From the above example, even if using SQL-2003, the solution to such problem is lengthy and complex, while the esProc solution is simple and clear owing to its support for the ordered set.
Moreover, SQL-2003 only provides the extremely limited computation capability. For example, esProc user can simply use the ”{startPosition,endPosition}” to represent the seeking of a range, and simply use ”(-1)” to represent the seeking of the last record. Regarding the similar functionality, it will be much harder for SQL user to implement.
In the practical data analysis, a great many of complex computations are related to the order of data. SQL users are unable to handle such type of computations as easily as esProc users because SQL lacks of the concept of Being Ordered.
IV Object Reference
An insurance enterprise has the below analysis demands: to pick out the annual outstanding employees (Employee of the Year) whose Department Manager has been awarded with the President Honor. The data are distributed in two tables: department table (main fields are deptName, and manager), and employee table (main fields are empName, empHonor, and empDept).
empHonor has three types of values: null value; ”president’s award”, PA for short; and ”employee of the year”, EOY for short. There are 2 groups of correspondence relations: empDept and deptName, and Manager and empName.
SQL solution
SELECT A.*
FROM employee A,department B,employee C
WHERE A.empDept=B.deptName AND B.manager=C.empName AND A.empHonor=‘EOY’ AND C.empHornor=‘PA’
SQL users can use the nested query or associated query to solve such kind of problems. In this case, we choose the association query that is both concise and clear. The association statement behind the “where” has established the one-to-many relation between deptName and empDept, and the one-to-one relation between manager and empName.
esProc Solution
employee.select(empHonor:”EOY”,empDept.manager.empHornor:”PA”)
esProc solution is intuitive: select the employee of “EOY” whose Department Manager has be awarded with “PA”.
The SQL statement to solve such kind of question is lengthy and not intuitive. In fact, the complete association query language is “inner join…on…” style. This statement is simplified in the above example. Otherwise it will be much hard to understand.
esProc users can use ”.” for object reference. Such style is intuitive and easy to understand. The complex and lengthy association statement for multiple tables can thus be converted to the simple object access, which is unachievable for SQL. When there are more and more tables, the complexity of SQL association query will rise in geometric series. By comparison, the esProc user can always access the data intuitively and easily by taking the advantage of object reference.
Regarding the multi-table associations of complex computation, esProc can handle it more intuitively and conveniently than SQL.
From the comparison of the above four examples, we can see that esProc is not only characterized with step-by-step computation, complete set-lization, sorted sets, and object reference. The analysis style is intuitive, the syntax style is agile, and the function is powerful. esProc is a tool especially designed for mass data computation, and a more convenientl database computing script.