esProc Integrates Heterogeneous Data Sources for Report Development

In addition to conventional databases, data sources of a reporting tool could also involve JSON files, MongoDB, txt files, Excel and HDFS files. Normally reporting tools can handle a single data source, but they are unable to manage various data sources requiring consolidation. Even though the data sources are of the same type, you still need to write a lot of code for the report development if they come from a database without effective computability.

However, esProc (free edition is available) can solve both problems. It offers a large number of functions for manipulating (semi)structured data, and supports heterogeneous data sources with the ability of integrating them. Besides, esProc provides a simple and easy-to-use JDBC interface, through which a reporting tool will call an esProc script as a database stored procedure, pass parameters to it, execute it and get the result set.

Below is the structure of integration of an esProc script and a reporting tool:

esProc_report_heterogeneous_datasource_1

This is an example of how esProc implements querying a multi-level subdocument in a JSON file for creating a report:

jsonstr.json has a subdocument, runners field, which has three fields – horseId, ownerColours and trainer – in which trainer contains a subfield – trainerId. The report needs to present the horseId, ownerColours and trainerId field for each subdocument within runners filed according to its serial number.

The source data:

[

{

“race”: {

“raceId”: “1.33.1141109.2”,

“startDate”: “2014-11-09T13:15:00.000Z”,

“raceClassification”: {

“classification”: “Novices'”

“raceType”: {

“key”: “H”

“raceClass”: 4,

“course”: {

“courseId”: “1.33”

“meetingId”: “1.33.1141109”

“numberOfRunners”: 2,

“runners”: [

{

“horseId”: “1.00387464”,

“trainer”: {

“trainerId”: “1.00034060”

“ownerColours”: “Maroon, pink sleeves, dark blue cap.”

{

“horseId”: “1.00373620”,

“trainer”: {

“trainerId”: “1.00010997”

“ownerColours”: “Black, emerald green cross of lorraine, striped sleeves.”

}

]

……

]

esProc script:

1	=file(“D: \\jsonstr.json”).read().import@j()
2	=A1(which).runners
3	=A2.new(horseId,trainer.trainerId:trainerId,ownerColours)

A1: Read in the JSON file.

A2: Retrieve runners field according to the serial number of each of its subdocument. Here which is a report parameter. The result is like this:

esProc_report_heterogeneous_datasource_3

A3: Get the desired fields to generate the result set the report needs. The result is as follows:

esProc_report_heterogeneous_datasource_4

The reporting tool calls the esProc script via JDBC, in a same manner as it calls the stored procedure from a normal database. The syntax is this: call esProc script name (para1…paraN). The result returned from the script participates in report creation in the form of a normal data set. Details are covered in the following documents: esProc Integration & Application: Integration with JasperReport and esProc Integration & Application: Integration with BIRT.

As a professional tool for processing data sources of reports, esProc can be used to implement more scenarios, as shown by the following examples.

Create a grouped report from a multi-level JSON file

Cells.json is a multi-level nested JSON file, which you want to display with a grouped report. The grouping fields are name, type and image.”xlink:href”. There is also a field with 3 subdocuments: custom. Identifier, custom. Classifier and custom. Output, which are of the same structure but contain different number of documents each.

The source data:

{

“cells”: [

{

“name”: “b”,

“type”: “basic.Sensor”,

“custom”: {

“identifier”: [

{

“name”: “Name1”,

“URI”: “Value1”

{

“name”: “Name4”,

“URI”: “Value4”

}

“classifier”: [

{

“name”: “Name2”,

“URI”: “Value2”

}

“output”: [

{

“name”: “Name3”,

“URI”: “Value3”

}

]

“image”: {

“width”: 50,

“height”: 50,

“xlink:href”: “data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAIAAAACACAY
AAADDPmHLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAABEJAAARCQBQGfEV
AAAABl0RVh0U29mdHdhcmUAd3Vi8f+k/EREURQtsda2Or/+nFLqP6T5Ecd
i0aJFL85msz2Qxyf4JIumMAx/ClmWt23GmL1kO54CXANAVH+WiN4Sx7EoN
VkU3Z41BDHMeXAxjvOxNr7RJjzHX7S/jAflwBxkJr/RwiOpWZ883Nzd+Wp
ld7tkBr/SJr7ZHZbHZeuVweSnPfniocMAWYwcGBafH0OoPamFGAaY4ZBZj
mmFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY7/B94QnX08zxKLAAAAAElFTkSuQmCC”

}

……

]

}

esProc merges the three subdocuments into a single two-dimensional table, gives them a new field name ctype to be identified and joins them with the grouping fields. By doing so, a typical “table with subtables” will be created. esProc code is as follows:

	A
1	=file(“D:\\cells.json”).read().import@j()
2	=A1.cells.conj(custom.( identifier.new(name:cname,URI:cURI,”identifiler”:cType,A1.cells.name:name,A1.cells.type:type, A1.cells.image.’xlink:href’:image)\|classifier.new(name:cname,URI:cURI,”classifier”:cType, A1.cells.name:name,A1.cells.type:type,A1.cells.image.’xlink:href’:image)\|output.new (name:cname,URI:cURI,”output”:cType,A1.cells.name:name,A1.cells.type:type,A1.cells.image. ‘xlink:href’:image)))

A1: Import the JSON file. The relationships between different fields are shown below:

esProc_report_heterogeneous_datasource_6

A2: Convert the multi-level nested JSON file to a simple two-dimensional table. The sign “|” means concatenation. new function creates a two-dimensional table based on the source data. conj function calculates based on each record of the source table and concatenates the results. A2’s resulting two-dimensional table is what you need to create the report, as shown below:

esProc_report_heterogeneous_datasource_7

Then it’s easy for you to build a grouped report according to this esProc result.

Create a report with subreports using different JSON files

You want to create a report containing multiple subreports, where the main report and each subreport use different JSON files as their sources. Below is a selection of the source data:

MainReport.json

{“menu”: [

{

“id”: “A1”,

“value”: “File”,

“popup”: “Yes”

{

“id”: “A2”,

“value”: “Edit”,

“popup”: “No”

}

]

}

SubReport1.json

{“menuitem”: [

{“value”: “New”, “onclick”: “CreateNewDoc()”},

{“value”: “Open”, “onclick”: “OpenDoc()”},

{“value”: “Close”, “onclick”: “CloseDoc()”}

]

}

SubReport2.json

{“menuitem”: [

{“value”: “Undo”, “onclick”: “onUndo()”},

{“value”: “Redo”, “onclick”: “onRedo()”},

{“value”: “Copy”, “onclick”: “onTextCopy()”},

{“value”: “Past”, “onclick”: “onTextPast()”}

]

}

A reporting tool with support only for a single data source, such as Jasper and BIRT, would combine the multiple sources into one using JAVA classes, while esProc would use a simple script as follows:

	A
1	=file(“D:\\”+argFileName).read().import@j().#1

Read in the JSON file and get its first field, which is represented by “.#1”. By assigning different file names to the parameter argFileName, the report will receive different data sets, as the following shows:

esProc_report_heterogeneous_datasource_8

Perform a join between MongoDB and MySQL

emp1 is a MongoDB collection, whose CityID field is the logical foreign key pointing to CItyID field of cities, a MySQL table that has two fields – CityID and CityName. You need to query employee records from emp1 according to specified time period and switch its CityID field to CityName of cities.

esProc script:

	A
1	=MongoDB(“mongo://localhost:27017/test?user=root&password=sa”)
2	=A1.find@x(“emp1″,”{‘$and’:[{‘Birthday’:{‘$gte’:'”+string(begin)+”‘}},{‘Birthday’:{‘$lte’:'”+string(end)+”‘}}]}”,”{_id:0}”)
3	=myDB1.query(“select * from cities”)
4	=A2.switch(CityID,A3)
5	=A4.new(EID,Dept,CityID.CityName:CityName,Name,Gender)

A1: Connect to MongoDB.

A2: Query emp1 using MongoDB syntax by the specified time period. find function returns a cursor. @x option means closing the MongoDB connection automatically after the data is all fetched. The result would be like this:

esProc_report_heterogeneous_datasource_10

A3: Execute SQL statement to query the MySQL database. Here is the result:

esProc_report_heterogeneous_datasource_11

A4: Replace A2’s CityID field with the corresponding records in A3. switch function works as a left join does. To perform an inner join, use @i option. By performing field replacement using switch function, the key field linking the two tables can be accessed through the object. This object-type access is simple and intuitive, whose merits are especially obvious when performing a multi-level, multi-table join. Here is the result of switch:

esProc_report_heterogeneous_datasource_12

A5: Retrieve the desired fields to generate a table as follows:

esProc_report_heterogeneous_datasource_13

A7: By default the esProc script will return the last calculation cell (here is A5) to the reporting tool.

Perform joins between MongoDB collections

Both sales and emp are two-dimensional MongoDB collections. sales has SellerId field as its logical foreign key that points to emp’s EId field. You need to query orders in sales by the specified time period and associate with emp through a left join, and then present the result in a report.

esProc script:

	A
1	=MongoDB(“mongo://localhost:27017/test?user=root&password=sa”)
2	=A1.find(“sales”,”{‘$and’:[{‘OrderDate’:{‘$gte’:'”+string(begin)+”‘}},{‘OrderDate’:{‘$lte’:'”+string(end)+”‘}}]}”,”{_id:0}”).fetch()
3	=A1.find(“emp”,,”{_id:0}”).fetch()
4	=A1.close()
5	=join@1(A2:sales,SellerId;A3:emp,EId)
6	=A5.new(sales.OrderID:OrderID,sales.Client:Client,sales.Amount:Amount, sales.OrderDate:OrderDate,emp.Name:Name,emp.Dept:Dept,emp.Gender:Gender)

A1, A4: Connect to/disconnect from MongoDB.

A2: Query the sales collection using MongoDB syntax and fetch the cursor data into memory using fetch function (as the data size is small). Here is the result:

esProc_report_heterogeneous_datasource_15

A3: Retrieve data from the emp collection. Here is the result:

esProc_report_heterogeneous_datasource_16

A5: Join the two collections together. join function performs the join operation. @1 means left join and @f means full join. Without any of the options, this function performs an inner join. The result is as follows:

esProc_report_heterogeneous_datasource_17

A6: Retrieve the fields of interest from the result of join to generate a new two-dimensional table, as shown below:

esProc_report_heterogeneous_datasource_18

Join an Oracle table and an Excel file

Here are table1, which is stored in an Oracle database, and table2, an .xlsx file. Both have the same structure. Below are selections from them:

esProc_report_heterogeneous_datasource_19

You need to group table1 and table2 respectively by name, count the number of members in each group, calculate the sum for each group by active field, and then present the results from the two tables in sequence. The expected report layout is as follows:

esProc_report_heterogeneous_datasource_20

esProc script:

	A
1	=myDB1.query(“select name,count(*) Count,sum(isActive) Amount from table1 group by name”)
2	=file(“D:\\table2.xlsx”).importxls@t()
3	=A2.groups(name;count(~):Count,sum(active):Amount)
4	=join@f(A1,name;A3,name)
5	=A4.new(ifn(_1,_2).name:name,_1.Count:Count1,_1.Amount:Amount1, _2.Count:Count2,_2.Amount:Amount2)

A1: Execute the SQL statement to group and aggregate data from table1. Here is the result:

esProc_report_heterogeneous_datasource_22

A2: Import the Excel file and make the first row the column headers.

A3: Group and aggregate A2’s data. Here is the result:

esProc_report_heterogeneous_datasource_23

A4: Perform a left join between A1 and A3. You’ll get the following result:

esProc_report_heterogeneous_datasource_24

A5: Retrieve the fields you want from A4 and rename them. This is the result you’ll get:

esProc_report_heterogeneous_datasource_25

Join a txt file and a JSON file

structure.txt is a structured text separated by tabs. json.txt contains unstructured JSON strings. There is a foreign key relationship between the second field of structure.txt and part of the text in json.txt. Below are selections from them:

structure.txt

Name1 BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302

Name2 DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298

Json.txt

[

{ “Cluster A”: { “member”: { “Cluster A”: “BBBBBBBBBBBB This is Animal A” }, “name”: “Cluster A” } },

{ “Cluster B”: { “member”: { “Cluster B”: “DDDDDDDDDDDD This is Animal B” }, “name”: “cluster B” } }

]

You need to create a report to present the above relationship. This is the expected report layout:

Name1 BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302 Cluster A This is Animal A

Name2 DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298 Cluster B This is Animal B

esProc script:

	A
1	=file(“D: \\json.txt”).read().import@j()
2	=A1.new(#1.name:name,#1.member.(#1):cluster)
3	=A2.derive(left(cluster,(firstblank=pos(cluster,” “)-1)):key,right(cluster,len(cluster)-firstblank):value)
4	=file(“D:\\ structure.txt”).import()
5	=join(A4,_2;A3,key)
6	=A5.new(_1._1,_1._2,_1._3,_1._4,_1._5,_1._6,_1._7,_1._8,_2.name,_2.value)

A1-A3: Read in the JSON file, get the desired data and append a calculated column. Here’s the result:

esProc_report_heterogeneous_datasource_27

A4: Import the text file as a two-dimensional table. Note that esProc can import not only a local file, but a file stored on LANs or in the HDFS file system.

A5: A join operation. The result is as follows:

esProc_report_heterogeneous_datasource_28

A6: Retrieve the desired fields to generate a table as follows:

esProc_report_heterogeneous_datasource_29

Hi，You need to fill in the Username and Email！