Functions of esProc/R/Python/Perl in Structured Data Process by Comparison:Chapter 8. 2D Table Summary

Uncategorized 698 0

 

esProc

    =[1,2,3].sum()                                       // Evaluate the sum for the Sequences

    =tbl.sum(col1)                                      // Evaluate the sum for col1 fields in Table sequence tbl

    =tbl.select(col2>30).sum(col1)        // Evaluate the sum for the record field col1 for which the col2>30 is true in Table sequence tbl

    =tbl.sum(#3)                                         // Evaluate the sum for Column 3 in Table sequence tbl

 

For other max, min, count and so on, the aggregation syntaxes are all similar. As we can see, it is far more convenient to make aggregations whether you will do this action by Column Name or by Column Number since there is a concept for the column in the Table sequence.

Perl

Summary on a 1D array is quite simple, for example:

    use List::Utilqw/sum max min maxstrminstr/;
    my @array = (10, 20, 30, 40);
    my $sum = sum @array;                   # Get 100

    my $max = max @array;                  # Get 40
    my $min = min @array;                    # Get 10

    my @array = (“Beijing”, “Shanghai”, “Guangzhou”, “Chengdu”, “Nanjing”);
    my $maxstr = maxstr @array;          # Get Shanghai
    my $minstr = minstr @array;            # Get Beijing

For 2D array, you will end up in trouble. Since there is no concept for the column in 2D array of Perl, in order to summarize on a column, you need to slice by the columns, and then start to do a summary, for example:

    my @array = ([10,11], [20,21], [30,31], [40,41]);

    @array1=map{$array[$_][1]}0..$#array;        #First slice by the columns

    my $sum = sum @$array1;                                        # Get 104

The above syntax can be merged into one statement, that is:

    my $sum = sum map{$array[$_][1]}0..$#array;              # Get 104

Although it can be solved by one statement with the map syntax, it is a little too complex to understand and less intuitive and easy to read than for loop. You can also simplify the map syntax as below:

    my $sum = sum map{$_->[1]}@array;            # Get 104

However, here -> cannot be omitted, because $_ is a pointer.

And if you want to aggregate the records according to specific criterion, e.g., the sum of second field of the record which matches with specific criterion will be evaluated, the syntax is as follows:

    my @array = ([10,11], [20,21], [30,31], [40,41]);

    @array1=map{                 #First, find the elements of Column 2 in the record whose value in Column 1 is greater than and equal to 30

                            if($array[$_][0]>=30){

                                     $array[$_][1];

                            }

    }0..$#array;

    my $sum = sum @array1;        # Then make a summary

This syntax is not too much trouble, but for the beginner who has not well understood the knowledge in terms of map, it is somewhat long and unintelligible, even less intuitive than for.

Python

Python can achieve a column-based slice on a 2D array with zip function in order to facilitate the column-based summary, which runs the source codes simpler than that of Perl, for example:

    #coding=gbk

    a=[1,2,3,4,5,5,6,7,8,8,9,0]

    value=sum(a)            #Make a summary on a 1D array

    a=[[1,2],[3,4],[5,6],[7,8]]

    b,c=zip(*a)                 #Column-based slice, b=[1,3,5,7], c=[2,4,6,8]

    value=max(b)            #Make a summary right after slice

    value=min(c)

As can be seen from the above source codes, although the use of zip function facilitates the column-based aggregation, this function can only be used to perform the operation on the whole columns other than one or two columns, so that the efficiency of zip function will be lowered when a 2D array is much wider. Therefore, it is suggested that you had better be honest to write a loop statement for making a summary if the 2D array is wider and the columns to be counted are relatively lesser, the reference code is shown as below:

    a=[[1,2],[3,4],[5,6],[7,8]]

    value=0

    for i in range(0,len(a)):

         value+=a[i][0]

R

In terms of 2D table summary, R is more powerful than other languages. In addition to more extensive summary functions, it also supports to make a summary by the rows, for example:

    rowSums(dd2)                    # Evaluate the sum by the rows

    colSums(dd2)                      #Evaluate the sum by the columns

    rowMeans(dd2)                 #Evaluate the average value by the rows

    colMeans(dd2)                   #Evaluate the average value by the columns

When making the summary by the rows, the premise is that the data types of all columns are all numeric values.

R language features the summary by rows, but for other languages, it can be achieved only by writing its own loop statement.

esProc_structured_data_process

FAVOR (0)
Leave a Reply
Cancel
Icon

Hi,You need to fill in the Username and Email!

  • Username (*)
  • Email (*)
  • Website