JSON format multilevel semi-structured data is commonly seen in internet applications. Java provides just the class library for parsing JSON data, but to perform in-depth calculations, complex hardcoding is required.
esProc supports set-operations, order-related calculations and dynamic script execution, so it can be used as the class library for Java to make JSON data manipulation much easier. The Java application can access the esProc result through JDBC. For details, see How to Use esProc as the Class Library for Java.
Following is the typical JSON data manipulation cases, and their esProc solutions.
Grouping and aggregating JSON data
order.JSON contains the ordering records. The requirement is to calculate the sales amount that each client contributes per month in a specified time period. Below is a selection of the source data:
[{OrderID:”26″,Client:”TAS”,SellerId:”1″,Amount:2142,OrderDate:”05-08-2009 00:00:00″},{OrderID:”33″,Client:”DSGC”,SellerId:”1″,Amount:613,OrderDate:”14-08-2009 00:00:00″},
{OrderID:”84″,Client:”GC”,SellerId:”1″,Amount:89,OrderDate:”16-10-2009 00:00:00″}, {OrderID:”133″,Client:”HU”,SellerId:”1″,Amount:1420,OrderDate:”12-12-2010 00:00:00″}, {OrderID:”32″,Client:”JFS”,SellerId:”3″,Amount:468,OrderDate:”13-08-2009 00:00:00″}… |
The code:
A | |
1 | =file(“D:\\order.JSON”).read().import@j() |
2 | =A1.select(OrderDate>=argBegin && OrderDate<=argEnd) |
3 | =A2.groups(month(OrderDate):Month,Client;sum(Amount):subtotal) |
Import the JSON file as a two-dimensional table and perform a conditional query and then the grouping and aggregation. argBegin and argEnd are parameters. Here’s the result:
Handling JSON format HTTP flows
testServlet returns JSON format strings of employee information. Now query the data according to the given condition and return result to Java main program in JSON format.
A | |
1 | =httpfile(“http://localhost:6080/myweb/servlet/testServlet?table=employee&type=json”) |
2 | =A1.read() |
3 | =A2.import@j() |
4 | =A3.select(${where}) |
5 | =export@j(A4) |
Read in the http file object and parse the JSON format strings to generate a two-dimensional table; and then filter the table according to the condition and export the result as JSON format strings. The parameter where is the dynamic querying condition, like BIRTHDAY>=date(1981,1,1) && GENDER==”F”.
Converting cell phone logs into structured data
The cellphone.log file contains the JSON format cell phone log, in which general field and ping field are subdocuments that each has one record. Another subdocument cap_main has multiple records. The requirement is to display cap_main in a list table, with each record appended by general’s mtmodel field and networktype field and ping’s ssid field.
Below is the source data:
{“cap_main”:[{“pci”:0,”snr”:0,”rxlev”:-81,”locationtype”:161,”frequency”:0,”dl_speed”:0,”longitude_baidu”:103282882, “networktype”:”WIFI”,”lac”:35042,”capid”:1395672278839,”lanip”:”192.168.1.129″,”longitude_wgs84″:0, “bssid”:”00:14:d5:00:21:1a”,”dstype”:”EDGE”,”time”:”2014-03-24 22:44:38″, “encrypttype”:””,”ci”:52532,”ssid”:”\”open\””,”radius”:71.69451141357422,”rssi”:-47,”ul_speed” :0,”channelnumber”:0,”latitude_wgs84″:0,”latitude_baidu”:24770267},{“pci”:0,”snr”:0,”rxlev”: -81,”locationtype”:161,”frequency”:0,”dl_speed”:0,”longitude_baidu”:103282882,”networktype” :”WIFI”,”lac”:35042,”capid”:1395672279839,”lanip”:”192.168.1.129″,”longitude_wgs84″:0, “bssid”:”00:14:d5:00:21:1a”,”dstype”:”EDGE”,”time”:”2014-03-24 22:44:39″,”encrypttype”:””,”ci”:52532,”ssid”:”\”open\””,”radius”:71.69451141357422,”rssi”:-47,”ul_speed”:0, “channelnumber”:0,”latitude_wgs84″:0,”latitude_baidu”:24770267}… |
The code:
A | |
1 | =file(“D:\\ cellphone.log”).read().import@j() |
2 | =A1.cap_main.derive(A1.general.mtmodel,A1.general.networktype,A1.ping.ssid) |
esProc can read in the JSON file while keeping its multilevel structure, and then it can use periods to access fields through the multiple levels. The derive function adds the computed columns. Here’s the result:
Querying multilevel JSON subdocuments
JSONstr.JSON’s runners field is a subdocument that has 3 fields: horseId, ownerColours and trainer, in which trainer has a subfield trainerId. The requirement is to query the subdocument’s horseId, ownerColours and trainer fields according to the corresponding document’s sequence number.
Below is the source data:
[ { “race”: { “raceId”: “1.33.1141109.2”, “startDate”: “2014-11-09T13:15:00.000Z”, “raceClassification”: { “classification”: “Novices'” }, “raceType”: { “key”: “H” }, “raceClass”: 4, “course”: { “courseId”: “1.33” }, “meetingId”: “1.33.1141109” }, “numberOfRunners”: 2, “runners”: [ { “horseId”: “1.00387464”, “trainer”: { “trainerId”: “1.00034060” }, “ownerColours”: “Maroon, pink sleeves, dark blue cap.” }, { “horseId”: “1.00373620”, “trainer”: { “trainerId”: “1.00010997” }, “ownerColours”: “Black, emerald green cross of lorraine, striped sleeves.” } ] }, …… ]
|
The code:
1 | =file(“D: \\JSONstr.JSON”).read().import@j() |
2 | =A1(which).runners |
3 | =A2.new(horseId,trainer.trainerId:trainerId,ownerColours) |
Import the JSON file, retrieve the document’s runners field (the subdocument), and get the desired fields from runners. trainerId comes from a child document of the subdocument. A3 gets the result as follows:
Converting multilevel JSON to structured data
Cells.JSON is a multilevel nested JSON file which needs to be converted to a two-dimensional structured table according to the grouping fields name, type and image. “xlink:href”. The field containing detailed data has 3 subdocuments: custom. Identifier, custom. Classifier and custom. Output, which are of the same structure but contain different number of documents each.
Below is the source data:
{ “cells”: [
{ “name”: “b”, “type”: “basic.Sensor”, “custom”: { “identifier”: [ { “name”: “Name1”, “URI”: “Value1” }, { “name”: “Name4”, “URI”: “Value4” } ], “classifier”: [ { “name”: “Name2”, “URI”: “Value2” } ], “output”: [ { “name”: “Name3”, “URI”: “Value3” } ] }, “image”: { “width”: 50, “height”: 50, “xlink:href”: “data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAIAAAACACAYAAADDPmHLAAAABHNC SVQICAgIfAhkiAAAAAlwSFlzAABEJAAARCQBQGfEVAAAABl0RVh0U29mdHdhcmUAd3Vi8f+k/ERE URQtsda2Or/+nFLqP6T5Ecdi0aJFL85msz2Qxyf4JIumMAx/ClmWt23GmL1kO54CXANAVH+WiN4 Sx7EoNVkU3Z41BDHMeXAxjvOxNr7RJjzHX7S/jAflwBxkJr/RwiOpWZ883Nzd+Wpld7tkBr/SJr7ZHZ bHZeuVweSnPfniocMAWYwcGBafH0OoPamFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY4ZBZjmmFGA aY7/B94QnX08zxKLAAAAAElFTkSuQmCC” } }, …… ] }
|
The code:
A | |
1 | =file(“D:\\cells.JSON”).read().import@j() |
2 | =A1.cells.conj(custom.(identifier.new(name:cname,URI:cURI,”identifiler”:cType,A1.cells.name:name,A1.cells.type:type, A1.cells.image.’xlink:href’:image)|classifier.new(name:cname,URI:cURI,”classifier”:cType, A1.cells.name:name,A1.cells.type:type,A1.cells.image.’xlink:href’:image)|output.new(name:cname, URI:cURI,”output”:cType,A1.cells.name:name,A1.cells.type:type,A1.cells.image.’xlink:href’:image))) |
esProc merges the three subdocuments into a single two-dimensional table, gives them a new field name ctype to be identified and appends the grouping fields to the table to finally generate a typical “table with subtables”. Below is the concatenation result of A2:
Exporting the multilevel JSON file as CSV file
Below is the original json file:
{ “Title” : {
“name” : “ABC”, “id” : “1”, “job”: “Teacher” }, “Circle”:{ “area”:”2R” }, “Triangle”:{ “length”:”45″ } }
|
The expected CSV file will present the source data in two columns, as shown below:
|
The difficulties are that the root field names should be combined with the subfield names and that the subfield names and values should each be presented in a separate column. Here’s the code:
1 | =file(“d:\\source.json”).read().import@j() |
2 | =A1.fno().(pjoin([f=A1.field(~)].fname():key,f.array():value)) |
3 | =A2.(~.record@i([A1.fname(#),null],1)) |
4 | =A3.conj() |
5 | =file(“D:\\result.csv”).export@c(A4) |
Read in the JSON file and divide data into 3 groups according to the 3 root documents, as shown below (A2):
Next, insert [root field name,null] into each group as the first record and concatenate all the groups of records and finally export them. The exported CSV file is as follows:
Joining a txt file and a JSON file
structure.txt is a tab-separated structured file. JSON.txt contains unstructured JSON strings. The second field of structure.txt and part of the json.txt are related through the foreign key. The requirement is to join the two files into a two-dimensional table. Below are selections of them.
structure.txt
Name1 BBBBBBBBBBBB 99.40 166 1 0 1 166 334 499 3e-82 302 Name2 DDDDDDDDDDDD 98.80 167 2 0 1 167 346 512 4e-81 298 |
JSON.txt
[ { “Cluster A”: { “member”: { “Cluster A”: “BBBBBBBBBBBB This is Animal A” }, “name”: “Cluster A” } }, { “Cluster B”: { “member”: { “Cluster B”: “DDDDDDDDDDDD This is Animal B” }, “name”: “cluster B” } } ] |
The code:
A | |
1 | =file(“D: \\JSON.txt”).read().import@j() |
2 | =A1.new(#1.name:name,#1.member.(#1):cluster) |
3 | =A2.derive(left(cluster,(firstblank=pos(cluster,” “)-1)):key,right(cluster,len(cluster)-firstblank):value) |
4 | =file(“D:\\ structure.txt”).import() |
5 | =join(A4,_2;A3,key) |
6 | =A5.new(_1._1,_1._2,_1._3,_1._4,_1._5,_1._6,_1._7,_1._8,_2.name,_2.value) |
Read in the JSON file, retrieve the desired fields and subdocument to create a two-dimensional table and append a computed column to it. Below is A3’s result:
Then import the text file as a table, perform a join between the table and A3 and retrieve the desired fields to generate the final result A6, as shown below:
Dynamically exporting JSON data to the database
s.json contains multilevel subdocuments, in which the LIST subdocument and SERVICES subdocument each have a variable number of fields. The requirement is to write s.json into two database tables – groups and service – by filling the values of GROUPNAME and the names of fields under SERVICES in the database table groups, and entering the detailed fields under SERVICES into the database table service. The two database tables are related through groupid.
Below is the source data:
{ “SUCCESS”: [ { “MESSAGE”: “IMEI Service List”, “LIST”: { “MOVISTAR SPAIN”: { “GROUPNAME”: “MOVISTAR SPAIN”, “SERVICES”: { “3”: { “SERVICEID”: 32, “SERVICENAME”: “MOVISTAR NOKIA INSTANTE”, “CREDIT”: 4, “TIME”: “1-30 Minutes”, “INFO”: “<p style=\”text-align: center;\”>…… </p>”, “Requires.Network”: “None”, “Requires.Mobile”: “None”, “Requires.Provider”: “None”, “Requires.PIN”: “None”, “Requires.KBH”: “None”, “Requires.MEP”: “None”, “Requires.PRD”: “None”, “Requires.Type”: “None”, “Requires.Locks”: “None”, “Requires.Reference”: “None” }, “8”: { “SERVICEID”: 77, “SERVICENAME”: “MOVISTAR NOKIA 20 NCK”, … |
The code:
A | B | C | D | |
1 | =file(“E:\\s.json”).read().import@j() | |||
2 | =create(Groupname,groupid) | |||
3 | =create(Serviceid,Servicename,groupid,Credit,Time,INFO,Network,Mobile,Provider,PIN,KBH,MEP,PRD,Type, Locks,Reference) |
|||
4 | for A1.SUCCESS | =A4.LIST | ||
5 | for B4.fno() | =B4.field(B5) | ||
6 | =C5.SERVICES | |||
7 | for C6.fno() | =C6.fname(C7) | ||
8 | =C6.field(C7) | |||
9 | =A2.record([C5.GROUPNAME,D7]) | |||
10 | =A3.record([D8.#1,D8.#2,D7,D8.#3,D8.#4,D8.#5,D8.#6,D8.#7,D8.#8, D8.#9,D8.#10,D8.#11,D8.#12,D8.#13,D8.#14,D8.#15]) |
|||
11 | =mssql.update(A2,groups,Groupname,groupid;groupid) | |||
12 | =mssql.update(A3,services,Serviceid,Servicename,groupid,Credit,Time,INFO,Network,Mobile,Provider,PIN,KBH,MEP, PRD,Type,Locks,Reference;Serviceid) |
Read in the JSON file while retaining its original structure; loop through each subdocument of SUCCESS (B-D10), each field of LIST (C5-D10) and each field of SERVICES (D7-D10) to append the desired records to the empty two-dimensional tables A2 and A3 respectively, and then export the tables to the database. In the above code, several functions for accessing the structural elements of a two-dimensional table are used. The fno function finds the number of fields, the fname function gets the field names by serial numbers, and the field function gets the field values by serial numbers.
Below are A2 and A3, with desired records appended