SQL is a mature, general-purpose database language that makes most structured-data computing painless. Yet some computations remain difficult to express in SQL.
Here’s an example. duty is a MySQL table holding a shift schedule, in which an employee ...
2016-03-16 154 0
There are many types of report data sources, including relational databases, NoSQL databases, local files, HDFS files and JSON data streams. It’s easy to build a report on a single data source, but difficult to build one that needs data from more than one type of data source, i.e. heterogeneous data sou...
2016-03-15 115 0
Multilevel, semi-structured JSON data is common in internet applications. Java offers class libraries only for parsing JSON data; performing in-depth calculations on it requires complex hardcoding.
esProc supports set-operations, order-related calculations and dynamic script execution, so it can be u...
2016-03-11 148 0
A reporting architecture consists of three layers, from bottom to top: the storage layer, the computing layer and the display layer. The storage layer contains raw data, which may be stored in a relational database (RDB), a NoSQL database, or a local or HDFS file, or may simply be a JSON stream. The computing layer can access th...
2016-03-07 123 0
This article tests esProc’s performance in processing text files, using a data query and filtering example and comparing the results with Java and Perl doing the same processing.
The test data consists of order records stored in the file orders.txt. The imported data is as follows:
ORDERID CLIENT ...
2016-02-15 232 0
Sometimes you need to transpose a database table (or a text file) in Java before exporting it. Different types of transposition require different SQL techniques, and at times you have to resort to low-level programming in Java, which is quite difficult.
Because esProc, which can serve as a Java class library, supports dynamic scripting...
2015-12-14 247 0
Group operations on tabular data loaded from text files include grouping and aggregation, obtaining distinct values, group merging and so on, all of which can be implemented with basic Java class libraries. But Java provides only limited support for structured-data computing, generating comp...
2015-12-11 148 0
It’s hard to write code for file comparisons, such as finding common values or modified records, or comparing big files, multiple fields or files with different structures, because these tasks generally involve set operations, structured-data handling and multithreaded parallel pr...
2015-12-11 167 0
Many text files have complex formats and non-standardized data, so they cannot be computed on directly. When used as a data source, they need preprocessing to be converted into structured data or a database table for further querying or statistics. Though we can perform this conversion using high-level languages like Java, or scr...
2015-11-09 200 0
Group operations on tabular data loaded from text files include grouping and aggregation, obtaining distinct values, group merging and so on, which can be implemented in high-level languages like Java or scripting languages like Python. But these two types of languages provide only limited...
2015-11-07 108 0
In esProc, constants can be directly stored in cells and be referenced in expressions. For the basic usage of constants, please refer to esProc Getting Started: Constants. Actually, when a constant stored in a cell is referenced, it means the value of the cell is used as a variable. The parameters and variables in esPr...
2015-10-28 94 0
esProc External Memory Computing: Merge and Join Cursor Data discussed how to have several big data tables joined and retrieve data from them. But we notice that, though it’s convenient to handle big data with the cursor, there’s the restriction of one-way retrieval from front to back. Hence data sorting becomes the pr...
2015-10-16 140 0
From esProc Parallel Computing: Data Redundancy, we know that in cluster computation data files can be stored on separate servers under a data redundancy scheme. The parallel program then finds the appropriate node for a subtask according to where the needed files are stored. The fact is, during real wo...
2015-10-13 188 0
esProc Parallel Computing: Cluster Computing explains how to use cluster computing to process massive data or handle tasks involving a large amount of computation.
1. The problem with parallel computing
Parallel computing requires that the dfx file to be invoked be stored in all nodes, and that the data files the subta...
2015-10-09 165 0
In SQL, we can usually group a table automatically only by its own field(s). When the grouping criterion comes from another table, or is an external parameter or a conditional list, SQL has to handle the grouping in a very roundabout way. Some cases even require dynamic criteria, which need to be generate...
2015-09-29 118 0