category:Text Processing

Code Examples of Handling Large Text Files in esProc

A large text file that can’t fit in memory has to be imported in segments, with each segment processed separately, which is tricky to code. Sometimes multithreaded parallel processing is also needed to improve performance. But since most programming languages don’t support basic class libr...

2016-03-23 45 0 0
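
The segment-by-segment approach the excerpt above describes can be sketched in Python. This is only an illustration under stated assumptions, not esProc code and not the article's own method: the function names `read_in_segments` and `sum_column`, the tab separator, and the default segment size are all hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_segments(lines: Iterable[str], segment_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `segment_size` lines."""
    iterator = iter(lines)
    while True:
        segment = list(islice(iterator, segment_size))
        if not segment:
            return
        yield segment


def sum_column(path: str, column: int, segment_size: int = 100_000, sep: str = "\t") -> float:
    """Sum one numeric column of a delimited file, one segment at a time.

    Only `segment_size` lines are held in memory at once. The file layout
    (no header row, `sep`-separated fields) is an assumption of this sketch.
    """
    total = 0.0
    with open(path, encoding="utf-8") as handle:
        for segment in read_in_segments(handle, segment_size):
            total += sum(
                float(line.rstrip("\n").split(sep)[column])
                for line in segment
                if line.strip()  # skip blank lines
            )
    return total
```

Because each segment is an independent list of lines, the same generator could feed a worker pool if parallel processing were needed.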

Examples of Handling JSON Data with esProc

Multilevel, semi-structured JSON data is common in internet applications. Java provides class libraries only for parsing JSON data; in-depth calculations require complex hardcoding. esProc supports set operations, order-related calculations and dynamic script execution, so it can be u...

2016-03-11 22 0 0

Performing Group Operations on Text-based Tabular Data in JAVA

Group operations on tabular data generated from text files include grouping and aggregation, obtaining distinct values, group merging and so on, all of which can be implemented with basic JAVA class libraries. But JAVA provides only limited support for structured-data computing, generating comp...

2015-12-11 22 0 0

Performing File Comparisons in Java: Cases and Solutions

File comparisons are hard to code, whether the task is finding common values or modified records, comparing big files, comparing multiple fields, or comparing files with different structures, because they generally involve set operations, structured-data handling and multithreaded parallel pr...

2015-12-11 20 0 0

Examples on How esProc Converts Text Files to Structured Data

With complex formats and unstandardized data, many text files can’t be computed on directly. When used as a data source, they need preprocessing to convert them into structured data or database tables for further querying or statistics. Though we can perform this conversion using high-level languages like JAVA, or scr...

2015-11-09 45 0 0

Performing Group Operations on Text-based Tabular Data in esProc

Group operations on tabular data generated from text files include grouping and aggregation, obtaining distinct values, group merging and so on, all of which can be implemented in high-level languages like JAVA or scripting languages like Python. But these two types of languages provide only limited...

2015-11-07 18 0 0

Perform File Comparisons using esProc

You can handle simple file comparisons with console commands, Java, Python and Perl. But none of them is good at set operations or structured computations, which leads to complicated code for multithreaded processing and a cumbersome process for comparing multiple fields, big files and the files w...

2015-08-20 32 0 0

How esProc Implements Text Processing

Encapsulating many functions for structured file computing, esProc can import text files with complex formats, process big files in cursor style and simplify multithreaded parallel processing. Usually there are three modes in which esProc can be applied: a standalone mode, the execution from c...

2015-08-19 26 0 0

esProc Improves Text Processing – Insert Summary values into Grouped Data

The usual way to insert summary values into grouped data is to process the data group by group: import one group, append its records and their summary value to a new file, then do the same with the next group, and so on. But this is not easy to implement by hardcoding. esProc, however, supports group cursor with wh...

2015-01-05 18 0 0
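
The group-by-group procedure in the excerpt above might look like this in Python. It is a minimal sketch rather than esProc's group cursor: it assumes the input is already sorted so that rows of one group are adjacent, and the names `with_group_totals` and `insert_summaries`, the tab delimiter, and the `[key, "TOTAL", sum]` layout of the summary row are all hypothetical.

```python
import csv
from itertools import groupby
from typing import Iterable, Iterator, List


def with_group_totals(rows: Iterable[List[str]], key_col: int, value_col: int) -> Iterator[List[str]]:
    """Yield every row unchanged, then one summary row after each group.

    Assumes rows sharing a key are adjacent (input sorted by the key column).
    Rows are streamed one at a time; only the running total is kept.
    """
    for key, group in groupby(rows, key=lambda row: row[key_col]):
        total = 0.0
        for row in group:
            total += float(row[value_col])
            yield row
        yield [key, "TOTAL", str(total)]  # hypothetical summary row layout


def insert_summaries(src: str, dst: str, key_col: int = 0, value_col: int = 1, sep: str = "\t") -> None:
    """Copy `src` to `dst`, appending a summary row after each group."""
    with open(src, newline="", encoding="utf-8") as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin, delimiter=sep)
        writer = csv.writer(fout, delimiter=sep)
        writer.writerows(with_group_totals(reader, key_col, value_col))
```

Since `groupby` only merges adjacent rows, unsorted input would produce several partial groups for the same key; sorting first (or an external sort for big files) is the caller's responsibility in this sketch.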

esProc Improves Text Processing – Fetching Data from a Batch of Files

Sometimes text processing requires fetching certain data from multiple files in a multi-level directory. The operation is too complicated to perform well at the command line. It can be implemented in high-level languages, but the code is difficult to write, and the involvement of big files will increase the ...

2014-12-22 17 0 0
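
As a rough illustration of fetching data from files spread over a multi-level directory, the following Python sketch walks a directory tree and streams matching lines. It is not the esProc approach the article goes on to describe; the function name `matching_lines`, the `.txt` suffix filter, and the plain substring match are assumptions made for the example.

```python
import os
from typing import Iterator, Tuple


def matching_lines(root: str, keyword: str, suffix: str = ".txt") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line containing `keyword`.

    Walks every subdirectory under `root` and reads files line by line,
    so big files are never loaded into memory in full.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue  # skip files outside the assumed naming scheme
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if keyword in line:
                        yield path, number, line.rstrip("\n")
```

The generator yields results lazily, so a caller can write them to an output file or stop early without scanning the whole tree.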