Date and time data has its own characteristics for analysis and handling. In this article let’s look at how to perform date and time handling in esProc.

Usually the date and time data is entered or displayed as strings. With esProc, you can click Tool>Option to set the default format for date and time data on the **Environment** tab. For example:

Then the date and time data will be displayed as the default format in an esProc cellset. Here’s an example:

A | |

1 | =now() |

A1 gets result as follows:

The **now()** function is often used to obtain the current system date and time in date and time handling.

When entering a date/time/datetime constant, enter it in the default format as a string. esProc will automatically parse the constant into date and time data. For example:

A | B | C | |

1 | 02/01/2016 | 12:45:30 | 02/01/2016 10:30:00 |

A1, B1 and C1 will be parsed into the date data, time data and datetime data respectively, as shown below:

If the data is string type, instead of a constant directly entered in esProc, use **date()** function, **time()** function or **datetime()** function to convert the string data to the date data, time data or datetime data. For example:

A | B | C | |

1 | 2016 | 2 | 20 |

2 | =B1/"/"/C1/"/"/A1 | =12/":"/22/":00" | =A2+" "+B2 |

3 | =ifdate(A2) | =iftime(B2) | =ifdate(C2) |

4 | =date(A2) | =time(B2) | =datetime(C2) |

5 | =ifdate(A4) | =iftime(B4) | =ifdate(C4) |

Below are the strings in A2, B2 and C2:

The code in the third line uses the **ifdate()** and **iftime()** functions to check whether the strings in the second line have been converted to date, time or datetime data. Note that the **ifdate()** function checks whether a value is of date type or datetime type. A3, B3 and C3 respectively have the following results:

In the fourth line, strings are converted into the date data, time data and datetime data according to the specified formats. Here’re the results:

In the fifth line, same functions are used to check if the values in the fourth line are date type, time type or datetime type. A5, B5 and C5 get results as follows:

For external data, sometimes you need to handle various formats of date, time and datetime data. In this case, you can add a display format string after the string when using the **date()**, **time()** or **datetime()** function to convert the data type. For example:

A | B | C | |

1 | Feb 2, 2016 | '2:30:45 PM | 2016-2-20 2:30:45 PM |

2 | MMM d,yyyy | h:m:s a | yyyy-M-d h:m:s a |

3 | =date(A1,A2) | =time(B1,B2) | =datetime(C1,C2) |

Values in the first line are not in the default date, time and datetime formats. The value in B1 begins with the character ' to show that it is a string constant. Below are the values of A1, B1 and C1:

The second line lists the display format strings for the values in the first line. Then in the third line, functions for converting values to the date data, time data and datetime data specify the display formats for them. Below are the results of A3, B3 and C3:

With the type conversion, values will be shown in the default format when being viewed. For more details about the display formats of the date and time data, see **Data Display Formats in esProc**. By defining the display format with the **string()** function, date, time and datetime data can be converted into strings of the desired format. For example:

A | B | C | |

1 | 02/01/2016 | 12:45:30 | 02/01/2016 10:30:00 |

2 | MMMM d,yyyy | h:m:s a | MMM d,yyyy h:m:s a |

3 | =string(A1,A2) | =string(B1,B2) | =string(C1,C2) |

A3, B3 and C3 convert the date, time and datetime data into the string data:

You can also directly change the default display format for the date and time data as required.

When using the **date()**, **time()** and **datetime()** functions to generate data of the corresponding types, you can directly specify the values of the year, month, day, hour, minute and second. For example:

A | B | C | |

1 | =date(2016,2,20) | =time(13,5,0) | =datetime(2016,2,29,13,5,0) |

Below are the results of A1, B1 and C1:

Pay attention to the proper and reasonable range of each of the above integers when assigning quantities to them. For instance, the value range of hour is 0~23.

Date, time and datetime data hold a lot of information, such as the year, month, day, hour, minute and second. esProc provides functions such as year(), month(), day(), hour(), minute(), second() and millisecond() to get the values of these parts. For example:

A | B | C | |

1 | 02/21/2016 | 12:45:30 | =now() |

2 | =year(A1) | =month(A1) | =day(A1) |

3 | =hour(B1) | =minute(B1) | =second(B1) |

4 | =month(C1) | =hour(C1) | =millisecond(C1) |

Below are the date, time and datetime data in A1, B1 and C1:

A2, B2 and C2 obtain the value of each part from the date value:

A3, B3 and C3 obtain the value of each part from the time value:

A4, B4 and C4 obtain the values of month, hour and millisecond from the datetime value of the **now()** function:

As can be seen, the **now()** function returns a result accurate to the millisecond. But you can make it return results of different degrees of accuracy by adding different options. For example:

A | B | C | |

1 | =now@d() | =now@t() | |

2 | =now@m() | =now@s() | =millisecond(B2) |

A1 adds **@d** option to get only the date part, and B1 adds **@t** option to get merely the time part. Below are their results:

A2 uses **@m** option to have a result accurate to minute, and B2 uses **@s** option to get a result accurate to second. Below are the results:

You can see from the result of C2 that the value of millisecond part of B2 is 0:

The **@m** and **@s** options can also be used with the **datetime()** and **time()** functions to make the converted datetime or time value accurate to the minute or to the second.
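For readers who think in Python, the idea of limiting a datetime value to a given accuracy can be sketched as follows. This is only an analogue and not esProc code; the `truncate` helper and its `unit` names are made up for the illustration.

```python
from datetime import datetime


def truncate(dt: datetime, unit: str) -> datetime:
    """Drop every component finer than `unit` ('minute' or 'second')."""
    if unit == "minute":
        return dt.replace(second=0, microsecond=0)
    if unit == "second":
        return dt.replace(microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


# A fixed sample value with 123 milliseconds, used instead of the current time
sample = datetime(2016, 2, 21, 12, 45, 30, 123000)
to_minute = truncate(sample, "minute")   # seconds and milliseconds cleared
to_second = truncate(sample, "second")   # only milliseconds cleared
print(to_minute, to_second)
```

Truncating to the second leaves the millisecond part at zero, which is the effect the **@s** option is described as having.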

You can obtain the time part from date data, and get date part from time data. For example:

A | B | C | |

1 | 02/21/2016 | 12:45:30 | |

2 | =hour(A1) | =minute(A1) | =second(A1) |

3 | =year(B1) | =month(B1) | =day(B1) |

Below are the results of A2, B2 and C2:

It can be seen that the default time in a date value is 00:00:00.

Below are the results of A3, B3 and C3:

So you can see that the default date in a time value is January 1, 1970.

Apart from getting the component values directly from date/time/datetime data, there’re also some esProc functions for getting date information.

The **day()** function, which gets the day of the month, can use the **@w** option to find the ordinal number of the given date within its week:

A | B | C | |

1 | 02/14/2016 | 02/17/2016 | 02/20/2016 |

2 | =day@w(A1) | =day@w(B1) | =day@w(C1) |

3 | =string(A1,"EEEE") | =string(B1,"EEEE") | =string(C1,"EEEE") |

A2, B2 and C2 find the ordinal numbers of the given dates in their respective weeks. The results are as follows:

According to the results, a week begins on Sunday. To give you a clearer view, the code in the third line determines the day of the week for each of the given dates with a display format string:

The **pdate()** function can work with different options to get the corresponding dates:

A | B | C | |

1 | 08/17/2015 | ||

2 | =pdate@w(A1) | =pdate@m(A1) | =pdate@q(A1) |

3 | =pdate@we(A1) | =pdate@me(A1) | =pdate@qe(A1) |

Use **@w** option in **pdate()** function to get the date of the first day of the defined week that begins with Sunday; use **@m** option to get the date of the first day of the defined month; use **@q** option to get the date of the first day of the defined quarter; and use **@e** option with any of the other options to get the date of the last day of the defined time period, like a week and a quarter. Below are the respective results of A2, B2, C2, A3, B3 and C3:
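As a cross-check outside esProc, the same period boundaries can be computed with a rough Python sketch. It is not esProc code; the helper names are hypothetical, and the week is assumed to run from Sunday to Saturday, as described above.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    # isoweekday(): Monday=1 ... Sunday=7, so Sunday maps to offset 0
    return d - timedelta(days=d.isoweekday() % 7)


def week_end(d: date) -> date:
    return week_start(d) + timedelta(days=6)


def month_start(d: date) -> date:
    return d.replace(day=1)


def month_end(d: date) -> date:
    # Day 28 plus 4 days always lands in the next month
    first_of_next = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
    return first_of_next - timedelta(days=1)


def quarter_start(d: date) -> date:
    return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)


def quarter_end(d: date) -> date:
    return month_end(date(d.year, 3 * ((d.month - 1) // 3) + 3, 1))


given = date(2015, 8, 17)
for helper in (week_start, month_start, quarter_start,
               week_end, month_end, quarter_end):
    print(helper.__name__, helper(given))
```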

The **days()** function by default calculates the number of days in the month to which a certain date belongs. Along with **@q** option, the function can calculate the number of days in the quarter to which the date belongs; and by working with **@y** option, it calculates the number of days in the year to which the date belongs. For example:

A | B | C | |

1 | 2/21/2016 | ||

2 | =days(A1) | =days@q(A1) | =days@y(A1) |

Below are the results of A2, B2 and C2:

Besides obtaining information directly from the date and time data, you can also use them to perform various operations in esProc. The most common date handling operation is to calculate age:

A | B | C | |

1 | 3/30/1995 | =now@d() | |

2 | =age(A1) | =age@m(A1) | =age@y(A1) |

The values of A1 and B1 are as follows:

The **age()** function in the second line calculates the age according to A1’s date of birth and the current date. By default the function returns a result accurate to the day, but it will return one accurate to the month or the year with **@m** option or **@y** option added. With different degrees of accuracy, the obtained age could be different. The results of A2, B2 and C2 are as follows:

What the **age()** function gets is similar to calculating the number of years between the date of birth and the current date. But to find the interval between two dates, the **interval()** function is more widely used. It by default finds the number of days between two dates, but, by working with options such as **@y**, **@q**, **@m**, **@s** and **@ms**, it can find the number of years, quarters, months, seconds and milliseconds between the two dates. For example:

A | B | C | |

1 | 3/30/1995 | 2/15/2016 | |

2 | =interval(A1,B1) | =interval@y(A1,B1) | =B1-A1 |

If you just need to find the number of days between two dates, do it through subtraction. Below are the results of A2, B2 and C2:

By the way, each date/time/datetime value can be converted to a long integer, which is in effect the number of milliseconds between the value itself and 0:00:00 January 1, 1970 GMT. For example:

A | B | |

1 | =datetime("1/1/1970 0:00:00 GMT","m/d/yyyy H:mm:ss z") | 3/30/1995 |

2 | =interval@ms(A1,B1) | =long(B1) |

B2 uses **long()** function to convert the date data to a long integer directly. Both A2 and B2 get the same result:

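The **after(t,k)** function finds the datetime value that is **k** days after the given value **t**. With options such as **@y** and **@m**, the interval is counted in years or months instead, and a negative **k** finds an earlier date. For example: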

A | B | C | |

1 | 3/30/1995 | ||

2 | =after(A1,10) | =after@y(A1,20) | =after@m(A1,-1) |

A2, B2 and C2 respectively find the date ten days later, the date 20 years later, and the date one month before. Below are their results:

To find the date that is a certain number of days before or after a given date, you can simply use addition or subtraction, such as =A1+10 and =A1-10.
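The same day-based arithmetic looks like this in Python, shown here only as an analogue for comparison and not as esProc code. The variable names are illustrative.

```python
from datetime import date, timedelta

birthday = date(1995, 3, 30)

# Adding or subtracting a number of days
ten_days_later = birthday + timedelta(days=10)
ten_days_earlier = birthday - timedelta(days=10)

# Subtracting two dates gives the number of days between them
days_between = (date(2016, 2, 15) - birthday).days

print(ten_days_later, ten_days_earlier, days_between)
```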

Since not all months have the same number of days, the function for getting the date that is **k** months after a given date will by default check whether the given date is the last day of its month and, if so, adjust the result to the last day of the resulting month. Use the **@e** option to turn off this adjustment. For example:

A | B | C | |

1 | 2/29/2016 | =after@m(A1,3) | =after@me(A1,3) |

Below are results of B1 and C1:

As February 29, 2016 is the last day of the month, B1 also gets the last day of May which is 3 months later. But by using **@e** option, C1 only finds the date which is exactly 3 months after the given date without making any adjustment.
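The month-end rule described here can be sketched in Python as follows. This is a minimal illustration of the logic, not esProc's implementation; `add_months` and its `keep_month_end` flag are hypothetical names, with the flag standing in for the presence or absence of the adjustment.

```python
import calendar
from datetime import date


def add_months(d: date, k: int, keep_month_end: bool = True) -> date:
    """Return the date k months after d (k may be negative)."""
    total = d.year * 12 + (d.month - 1) + k
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    was_month_end = d.day == calendar.monthrange(d.year, d.month)[1]
    if keep_month_end and was_month_end:
        day = last_day                 # month end stays month end
    else:
        day = min(d.day, last_day)     # same day, clamped to a valid date
    return date(year, month, day)


leap_day = date(2016, 2, 29)
adjusted = add_months(leap_day, 3)                        # month end kept
unadjusted = add_months(leap_day, 3, keep_month_end=False)  # same day number
print(adjusted, unadjusted)
```

Clamping with `min` also covers cases such as moving from March 30 back to February, where day 30 does not exist.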

Generally it's inconvenient to compare two datetime values directly. But esProc offers the **deq()** function to make the comparison. By default, if the dates in the two values are the same, the two values are regarded as equal. Options including **@y**, **@q**, **@m**, **@t** and **@w** can be added to set the degree of accuracy to the year, the quarter, the month, the period of ten days and the week. For example:

A | B | C | |

1 | 2/15/2016 12:05:00 | 2/15/2016 18:45:20 | 2/29/2016 12:05:00 |

2 | =deq(A1,B1) | =deq(A1,C1) | =deq@m(A1,C1) |

Below are the values of A2, B2 and C2:

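Another type of date and time handling is about workdays. The **workday(t,k,h)** function finds the date that is the **k**th workday after the date **t**, where **h** is a sequence of holidays and adjusted workdays. The **workdays()** function returns the workdays between two dates and accepts the same kind of sequence as an optional last parameter. For example: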

A | B | C | |

1 | 12/31/2015 | 1/14/2016 | 1/20/2016 |

2 | [1/1/2016,1/18/2016] | ||

3 | =workday(A1,2,A2) | =workdays(B1,C1) | =workdays(B1,C1,A2) |

A2 sets a sequence of public holidays in January, 2016, including the New Year’s Day and the Martin Luther King Day. Below is A3’s result:

Because January 1, 2016 is New Year's Day and the following January 2 and 3 fall on a weekend, the second workday after December 31, 2015 is January 5, 2016.

Below are the results of B3 and C3:

Without specifying a sequence of dates of adjusted workdays and holidays, January 18, 2016 is still a workday.
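The workday logic above can be reproduced with a short Python sketch. It is not esProc code and makes several assumptions: Saturday and Sunday are the only weekend days, both end dates of a range are included, `k` is positive, and adjusted workdays (weekend days that count as workdays) are left out for brevity. The function names are hypothetical.

```python
from datetime import date, timedelta


def is_workday(d: date, holidays) -> bool:
    # weekday(): Monday=0 ... Sunday=6
    return d.weekday() < 5 and d not in holidays


def nth_workday_after(start: date, k: int, holidays=()) -> date:
    """Return the k-th workday strictly after `start` (k > 0)."""
    d = start
    remaining = k
    while remaining > 0:
        d += timedelta(days=1)
        if is_workday(d, holidays):
            remaining -= 1
    return d


def workdays_between(begin: date, end: date, holidays=()) -> list:
    """List the workdays from `begin` to `end`, both included."""
    days = []
    d = begin
    while d <= end:
        if is_workday(d, holidays):
            days.append(d)
        d += timedelta(days=1)
    return days


holidays = [date(2016, 1, 1), date(2016, 1, 18)]
second = nth_workday_after(date(2015, 12, 31), 2, holidays)
plain = workdays_between(date(2016, 1, 14), date(2016, 1, 20))
with_holidays = workdays_between(date(2016, 1, 14), date(2016, 1, 20), holidays)
print(second, len(plain), len(with_holidays))
```

Passing the holiday list removes January 18 from the range, while the call without it still counts that Monday as a workday.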

Though the **workdays()** function is used to generate a sequence of workdays, another function, **periods(s,e,i)**, is more commonly used to generate a sequence of date, time or datetime values between the given starting date **s** and ending date **e** at an interval of **i**. For example:

A | B | C | |

1 | 1/14/2016 | 1/20/2016 | 4/1/2016 |

2 | =periods(A1,B1) | =periods@m(B1,C1) | |

3 | =periods@xm(B1,C1) | =periods@om(B1,C1) | =periods@oxm(B1,C1) |

Below are the results of A2 and B2:

The unit of time interval in A2 is the day, and that in B2 is the month.

A3 discards the ending date using the **@x** option, B3 uses the **@o** option so that the generated dates are not automatically adjusted to the first day of the month, and C3 uses both options. Below are the results of A3, B3 and C3:

In data analysis, comparison operations are employed to check whether a datum is greater/less or equal to another datum, as well as to perform operations including query, select, sort and group. Here we discuss the uses of comparison operations in esProc and solve possible related problems.

esProc supports various data types such as integer, long integer, floating point number, big decimal, boolean, string, date, time and datetime. We can compare datum **a** and datum **b** with the **cmp(a,b)** function. For example:

A | B | |

1 | =cmp(1234,1243) | =cmp(3.145,3.142) |

2 | =cmp(6/2,pi()) | =cmp(true,true) |

3 | =cmp("New Jersey","New York") | =cmp(date(2016,3,1),date(2016,2,29)) |

Generally the comparison happens between data of the same type. But, integers, long integers, floating point numbers and big decimals are all real numbers, so they can be compared freely with each other. Below are results of A1, B1, A2, B2, A3 and B3:

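If the value of **cmp(a,b)** is greater than 0, **a** is greater than **b**.

If the value of **cmp(a,b)** is 0, **a** is equal to **b**.

If the value of **cmp(a,b)** is less than 0, **a** is less than **b**.

Similarly, **cmp(a,b)>=0** means **a** is greater than or equal to **b**.

**cmp(a,b)<=0** means **a** is less than or equal to **b**.

**cmp(a,b)!=0** means **a** is not equal to **b**.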

As can be seen from the cellset results above, comparing real numbers means comparing their values. The **cmp()** function can also compare the results of two expressions. For string comparisons, the function compares the ASCII values of each pair of characters in order until a pair with different values appears. The comparison does not depend on the lengths of the strings. If all their characters are identical, the two strings are equal. In comparing two date/time/datetime values, the one located at a later point on the timeline is greater than the one located at an earlier point.
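The three-way comparison described above can be mimicked in Python for comparison. This is an analogue only: Python has no built-in `cmp`, so the small helper below is defined for the illustration, and Python compares strings by Unicode code point, which coincides with ASCII order for these examples.

```python
import math
from datetime import date


def cmp(a, b) -> int:
    """Return 1 if a > b, 0 if a == b, -1 if a < b."""
    return (a > b) - (a < b)


results = [
    cmp(1234, 1243),                            # integers
    cmp(3.145, 3.142),                          # floating point numbers
    cmp(6 / 2, math.pi),                        # results of two expressions
    cmp(True, True),                            # booleans
    cmp("New Jersey", "New York"),              # strings: 'J' sorts before 'Y'
    cmp(date(2016, 3, 1), date(2016, 2, 29)),   # later date is greater
]
print(results)
```

As in esProc, Python refuses to order values of unrelated types: an expression such as `"New Jersey" > 1234` raises a `TypeError`.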

The above cellset values can be expressed using the comparison operators in the following ways:

A | B | |

1 | =1234<1243 | =3.145>3.142 |

2 | =6/2<pi() | =true==true |

3 | ="New Jersey"<"New York" | =date(2016,3,1)>date(2016,2,29) |

Expressions like **a>b** and **a<b** return a boolean value, **true** or **false**.

Date, time and datetime data sometimes need to be compared in a special way. If we just want to find whether two values fall on the same day or in the same month, use the **deq()** function. For example:

A | B | |

1 | =deq@y(date(2016,3,1),date(2016,2,29)) | =deq@m(date(2016,3,1),date(2016,3,15)) |

With the **@y** option, the function determines that the two objects are equal as long as they are in the same year. With the **@m** option, it finds whether the two objects are in the same month. Below are the results of A1 and B1:

Apart from the various types of real numbers, esProc forbids comparisons between different types of data. For example:

A | |

1 | =cmp("New Jersey",1234) |

In this case the error information appears and the computation terminates:

To compare data having different data types, first a type conversion should be performed. For example:

A | B | |

1 | =cmp("New Jersey",string(1234)) | =cmp(long(date(2016,3,1)),12345678987654L) |

2 | =cmp(true,bool(1)) | =cmp(true,bool("New Jersey")) |

For type conversion, a date/time/datetime value can be converted to a long integer, which indicates the number of milliseconds between the value itself and 0:00:00 January 1, 1970 GMT. Any real number can be converted to a boolean value **true**; and except the string “**false**” that will be converted to **false**, any other string will be converted to **true**.

A sequence has members. By comparing members of two sequences, we can perform locate, select and sort operation. For example:

A | B | |

1 | [Rebecca Moore,Ashley Wilson,Rachel Johnson,Ryan Williams,Richard] | |

2 | =A1.pos("Ashley Wilson") | =A1.select(~>"Re") |

3 | =A1.sort() | =A1.sort(right(~,1)) |

A2 locates the position of Ashley Wilson in the sequence. B2 selects members that are greater than “Re”. Below are results of A2 and B2:

A3 sorts members of the sequence in ascending order. B3 sorts the members by comparing the last letters of members. Below are results of A3 and B3:

The sequence comparison is similar to string comparison. Members of two sequences will be compared in alignment until different members are found. The result of comparing these two different members is the result of sequence comparison. If every two members with the same position are equal, then the two sequences are equal. For example:

A | B | C | |

1 | [1,2,3,4,5] | [1,2,3,5] | [5,3,2,1,4] |

2 | =cmp(A1,B1) | =cmp(A1,C1) | |

3 | =A1<B1 | =A1==C1 | =A1.eq(C1) |

A2 and B2 compare the two sequences using **cmp()** function. Here are the results:

In comparing the sequences in A1 and B1, their first three members are the same, but the fourth members are 4 and 5 respectively. So the returned result is -1, and there is no need to consider their lengths. Members in both A1 and C1 are the numbers from 1 to 5, but in different orders, so the two sequences are regarded as unequal when compared.
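Python lists are compared member by member in a similar way, so the example can be replayed there as an analogue. This is not esProc code, and the `cmp` helper is defined only for the illustration. One difference worth noting as a caveat: when one Python list is a prefix of the other, the shorter one is considered smaller, a case the text above does not cover.

```python
def cmp(a, b) -> int:
    """Return 1 if a > b, 0 if a == b, -1 if a < b."""
    return (a > b) - (a < b)


a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 5]
c = [5, 3, 2, 1, 4]

a_vs_b = cmp(a, b)   # decided at the fourth members, 4 < 5
a_vs_c = cmp(a, c)   # decided at the first members, 1 < 5
same_order = a == c                    # positions differ, so not equal
same_members = sorted(a) == sorted(c)  # equal when order is ignored
print(a_vs_b, a_vs_c, same_order, same_members)
```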

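As with single value comparison, the relationship of two sequences **A** and **B** can also be expressed with comparison operators, as A3 and B3 do.

To determine if sequence **A** and sequence **B** have the same members regardless of order, use the **A.eq(B)** function, as C3 does.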

Since a sequence allows its members to use different data types, members of the two sequences in the same position need to have comparable data types. For example:

A | B | C | |

1 | [1,2,3,4,5] | [1,3,two] | [1,two,3] |

2 | =cmp(A1,B1) | =cmp(A1,C1) |

A2’s operation works normally because the comparison finishes at the second members. But when trying to compare A1 and C1, because their second members **2** and **two** are of different data types, they can’t be compared and the error information appears to terminate the computation:

We can perform operations such as locate, select and sort on a sequence by comparing each of its members with a given condition. In the same way, we can perform these operations on a table sequence by doing the same with the records. In data processing, the comparison of records usually involves only certain fields. For example:

A | B | |

1 | $ select NAME,ABBR,POPULATION from STATES | =A1.sort(POPULATION:-1) |

2 | =B1(1) | =B1(2) |

3 | =cmp(A2.POPULATION,B2.POPULATION) | =cmp(A2.ABBR,B2.ABBR) |

4 | =cmp(A2,B2) | =A1.sort(~) |

A1 and B1 contain respectively a table sequence generated from the demo database and a record sequence sorted by population in descending order:

A2 and B2 respectively retrieve the state record with the largest population and the one with the second largest population:

Actually B1 sorts records by comparing every two of the records’ POPULATION field values. A3 and B3 compare the POPULATION fields and ABBR fields of the two retrieved records. Here’re the results:

As can be seen, different results may be obtained by comparing different fields of the same two records.

A4 compares the two records themselves. B4 sorts A1’s records in ascending order by comparing every two of them. Their results are as follows:

That A4 returns a result means records themselves can be compared. But by examining B4’s result, we can see that records aren’t compared according to any of the fields and the result seems disordered. In fact, without specifying the sorting field(s), record sorting makes no sense and simply causes disorder. esProc, however, compares two records in a table sequence based on their intrinsic hash values.

Though it’s meaningless to purely compare two records, we can still use the comparison to check whether they are equal from a particular point of view so that we can realize some operations like grouping. For example:

A | B | |

1 | $ select STATEID,NAME,ABBR from STATES | $ select CID,NAME,STATEID as STATE from CITIES |

2 | =B1.switch(STATE,A1) | |

3 | =A2.group(STATE) | =A3.new(STATE.STATEID:SID,STATE.ABBR:State, ~.(NAME):Cities,~.count():Count) |

4 | =A2.group(STATE.ABBR) | =A4.new(STATE.STATEID:SID,STATE.ABBR:State, ~.(NAME):Cities,~.count():Count) |

A1 and B1 retrieve the state information and city information separately from the demo database. A2 associates the STATE field with the records in the *states* table sequence using the **switch** function. Here’s A2’s result:

A3 groups the city records by state to learn more about the cities in each state. The group operation is performed by directly comparing the state records. In B3, A3’s grouping result is presented clearly and in detail, as shown below:

Because the grouping is carried out by simply comparing the records according to their intrinsic hash values, the result is chaotic but is achieved faster.

To make the grouping result ordered, we should specify the grouping criterion. For example, A4 groups records by ABBR field. Its result is presented by B4 as follows:

With the grouping criterion, the resulting records will be ordered by a certain desired field, like the state abbreviation.

esProc provides the **regex()** function to match a string, or a string field value in a record sequence, against a pattern defined by a regular expression. The aim is to analyze and examine the string in order to find the pattern in it and replace it. Here we look at the uses of regular expressions in esProc.

A regular expression is a string specifying a pattern. Its most basic use is to match a string **s** with the regular expression **rs** through the **s.regex(rs)** function. For example:

A | B | C | |

1 | ="a12b".regex("(a[0-9])") | ="a12b".regex("(a[0-9]*)") | ="a12b".regex("[0-9]b") |

2 | ="a12b".regex("\\S*([0-9][a-z])") | '\S*([0-9][a-z]) | ="a12b".regex(B2) |

Here are the results of A1, B1 and C1:

The regular expression used in A1 is "(a[0-9])". **a** is the literal character a, [0-9] matches any single character in the range 0-9, and the parentheses () specify that a string with the pattern "letter a plus one digit" will be retrieved and returned. The result is a sequence comprising one member, a1. In B1 the * following [0-9] matches the preceding character consecutively and repeatedly, which here means matching any number of characters between 0 and 9 appearing in a row. So B1 returns a result of a12. C1 searches for a string starting with a digit followed by the letter b. But a12b doesn't start with a digit, so they don't match and the returned result is null.

The regular expression A2 uses is "\\S*([0-9][a-z])", in which **[a-z]** stands for any letter from a to z, i.e. any lower-case letter, and **\S*** stands for any number of non-whitespace characters appearing in a row. Since the \ also means the escape character in a string, it needs to be escaped to be a literal – that is the \\S*. The parentheses specify that the returned substring is the match of [0-9][a-z] and that the match of **\S*** is discarded. B2 is the string constant representing the regular expression in A2, without needing to use an escape character. C2 thus gets the same result as A2 does. Below are their values:

esProc uses **()** to define the substring to be returned when matching a string with the regular expression. The returned result is a **sequence** of the members in the parentheses. Without the parentheses, the string itself will be returned if the matching is successful.
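The behavior described so far can be imitated with Python's `re` module, shown here purely as an analogue. The `regex` helper is hypothetical, and it assumes that matching is anchored at the start of the string but not at the end (Python's `re.match`), since that reproduces the results described above; esProc's exact matching rules may differ.

```python
import re


def regex(s: str, pattern: str):
    """Return captured groups as a list, the string itself if the
    pattern has no groups, or None when there is no match."""
    m = re.match(pattern, s)   # anchored at the start only (assumption)
    if m is None:
        return None
    groups = m.groups()
    return list(groups) if groups else s


one_digit = regex("a12b", "(a[0-9])")
many_digits = regex("a12b", "(a[0-9]*)")
no_match = regex("a12b", "[0-9]b")
tail = regex("a12b", r"\S*([0-9][a-z])")   # raw string avoids doubling the \
no_groups = regex("a12b", "a[0-9]")
print(one_digit, many_digits, no_match, tail, no_groups)
```

The raw string `r"..."` plays the same role as the string constant in B2: the backslash does not have to be escaped a second time.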

By using **@c** option, the **regex()** function becomes case-insensitive in string matching according to regular expression. For example:

A | B | C | |

1 | ="a12b".regex@c("(A[0-9])") | ="a12b".regex@c("([A-Z][0-9])") | ="a12b".regex("([A-Z][0-9])") |

Below are the results of A1, B1 and C1:

A1's regular expression includes the uppercase letter A, and B1's matches any uppercase letter. Both return **a1** because the **regex()** function uses the **@c** option. C1, however, returns **null** because the function works without the **@c** option and can't find a match for the regular expression.

In short, a sequence consisting of the matching result will be returned when regular expression **rs** finds its match in string **s**; and **null** will be returned when it can’t find a match.

esProc allows using the universal unicode symbols to display non-English characters in regular expression so that the character set setting can’t intervene. To do that, remember adding **@u** option to the **regex()** function. For example:

A | B | |

1 | ="Gerente de Fábrica".regex(".* (.*á.*)") | ="Gerente de Fábrica".regex@u(".* (.*\\u00e1.*)") |

The regular expression .* (.*á.*) in A1 includes a dot **.** and the **.***. The **.** represents any single character except carriage return and new line; **.*** represents any number of arbitrary characters. A1's regular expression finds the last word containing the character á. So does B1's. But the **regex()** function in B1 adds the **@u** option so that á can be written as \u00e1. Usually **regex@u()** is used to parse a string from an external source, keeping the regular expression from being affected by the character set setting. Both A1 and B1 get the same result:

A regular expression comprises literal characters and metacharacters that have special meanings. In the regular expression (a[0-9]) in the previous section’s first example, **a** is a literal character that matches itself while both [0-9] and () are metacharacters that respectively match one digit and return one matching result.

Table of common esProc metacharacters:

There are also other metacharacters, such as **\t** (which matches the tab character), **\W** (which matches a non-word character) and **\s** (which matches a whitespace character). But they are not so often used in esProc.

These metacharacters are used in regular expressions for string matching. For example:

A | B | C | |

1 | '(\+?[1-9]\d*)$ | ="15432".regex(A1) | ="-132".regex(A1) |

2 | ="3.14".regex(A1) | ="0032".regex(A1) | ="123b".regex(A1) |

In the regular expression in A1, **\+** means a literal plus sign, and the question mark **?** after it allows the plus sign to be missing; **[1-9]** denotes a non-zero digit; **\d*** indicates any number of digits appearing in a row; and the **$** at the end asserts the end of a line, meaning there are no extra characters after the match. The regular expression matches a string holding a positive integer and returns it through **()**. So only B1 has a matching result, while C1, A2, B2 and C2, which respectively contain a negative number, a decimal, a string beginning with a zero and a string with a non-digit character, can't find a match and thus return nulls. B1's result is as follows:

Here’s another example:

A | |

1 | ="Tom and Jerry".regex(".*(\\w{5}).*") |

In A1's regular expression .*(\\w{5}).*, **.*** represents any number of characters following one after another, and **\w{5}** matches five consecutive word characters, which are returned through **()**. A1 returns the 5-letter word in the string as follows:

To make the matching more precise, the regular expression should be written as ".*(\\b\\w{5}\\b).*", where the word boundaries are matched to strictly specify a 5-character word.
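The effect of the word boundaries is easier to see with a string that contains a longer word. The Python sketch below is an analogue and not esProc code; the sample string "Jerry and Thomas" is chosen here for illustration and does not come from the cellset above.

```python
import re

sample = "Jerry and Thomas"

# Without boundaries, any five consecutive word characters qualify,
# including a fragment of a longer word.
loose = re.match(r".*(\w{5}).*", sample).group(1)

# With \b on both sides, only a whole 5-character word qualifies.
strict = re.match(r".*(\b\w{5}\b).*", sample).group(1)

print(loose, strict)
```

Because the leading `.*` is greedy, the loose pattern picks up the last five word characters of the string, which here are part of "Thomas".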

Apart from using a regular expression in string matching to get one member, esProc also uses one to generate a sequence of multiple members from a string, which are returned separately through **()**. For example:

A | |

1 | ="2016-1-30".regex("(\\d{4})-(\\d{1,2})-(\\d{1,2})") |

2 | ="Jerry".regex("(.)(.)(.)(.)(.)") |

3 | ="Tom and Jerry".regex("(\\w*) (\\w*) (\\w*)") |

A1 extracts the year, the month and the day from a date string. A2 splits a word to form a sequence composed of 5 characters. A3 extracts three words from a string. Below are their results:

If the target string has its own format, as with the case of A1, we can split it conveniently according to the regular expression. If we don’t know how many members there will be in the resulting sequence when dividing a string or a word, we can use a loop statement. For example:

A | B | C | |

1 | Do one thing at a time, and do well. | [] | =A1.words() |

2 | for len(A1)>0 | =A1.regex("(\\w*)[ ,.]+(.*)") | >B1=B1|B2(1) |

3 | >A1=B2(2) |

In B2, the regular expression (\\w*)[ ,.]+(.*) divides A1’s string into two parts – one word and the rest without the comma and whitespaces – over each loop. A2 runs the loop until the splitting process finishes, and C2 adds each extracted word to B1. Finally B1 gets a result as follows:

An alternative esProc method of extracting words from a string is using the **words()** function. So we can see the same result obtained by C1.
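The loop above translates almost line for line into Python, which may help in following its logic. This is an analogue rather than esProc code; the function name is hypothetical, and `re.findall` stands in for a ready-made word splitter in the same way **words()** does.

```python
import re

text = "Do one thing at a time, and do well."


def split_words_loop(s: str) -> list:
    """Peel one word off the front of s on each pass."""
    words = []
    while len(s) > 0:
        m = re.match(r"(\w*)[ ,.]+(.*)", s)
        if m is None:
            # No separator left: whatever remains is the last word
            words.append(s)
            break
        words.append(m.group(1))   # the extracted word
        s = m.group(2)             # the rest, separators removed
    return words


looped = split_words_loop(text)
direct = re.findall(r"\w+", text)
print(looped)
print(looped == direct)
```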

In addition to parsing a single string, esProc also allows analyzing a sequence of strings with the regular expression. For example:

A | |

1 | [Rebecca Moore,Ashley Wilson,Rachel Johnson,Ryan Williams,Richard] |

2 | =A1.regex("R.* .*") |

3 | =A1.regex("(R.*) .*") |

A2 uses **A**.regex() function to perform the regular expression matching on each member of A1. The regular expression R.* .* specifies a matching string that begins with R and contains a whitespace, but it doesn’t use the **()** to define the returned string. In this case, A2 will return all matching name strings in A1 that begin with R and contain a whitespace. The operation is similar to **A**.select(), which returns the members satisfying the condition specified by a regular expression. A3 has a seemingly similar regular expression where the **()** specifies that the first word be returned. Therefore, a new table sequence made up of the selected strings will be created. Below are the target sequence, A2’s result and A3’s result separately:

The **regex()** function can be also used to parse the string field values of a table sequence. For example:

A | B | |

1 | [Rebecca Moore,Ashley Wilson,Rachel Johnson,Ryan Williams,Richard] | =A1.new(#:ID,~:Name) |

2 | =B1.regex("R.* .*",Name) | |

3 | =B1.regex("(R.*) .*",Name) | =B1.regex("(R.*) (.*)",Name;Firstname,Surname) |

B1 generates a table sequence as follows:

The absence of **()** in A2’s regular expression enables an operation that is equivalent to the **select** operation. A record sequence consisting of the records in which the Name field values match the regular expression will be returned. A3 retrieves the first word of each name to return a new table sequence. Here’re results of A2 and A3:

The regular expression in B3 divides the name into two separate words to return them together. When calling the **regex()** function, two other parameters can be used after the semicolon to specify names for the newly-generated fields. Here’s the result:

Similar to the table sequences, the **regex()** function can handle string field values stored in a cursor according to a regular expression. For example:

A | B | C | |

1 | =file("Cities.txt") | ||

2 | =A1.cursor@t(CID,NAME) | =A2.regex(".*n$",NAME) | =B2.fetch() |

3 | =A1.cursor@t(CID,NAME) | =A3.regex("(.*in)",NAME) | =B3.fetch() |

B2 matches the NAME field values against a regular expression specifying that a city name ends with n, without extracting any item. The effect is like filtering the cursor data. C2 gets a result as follows:

The regular expression in B3 specifies extracting the letters "in" together with the preceding part when matching the NAME field values. The result will be a new table sequence comprising the extracted strings. Here is C3's result:
