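I am trying to count orders by a specific part of a substring. My step A is correct, but I am having trouble getting step B to work. I have included the lab's comments to help explain some of the code.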
data = LOAD '/dualcore/orders' AS (order_id:int,
cust_id:int,
order_dtm:chararray);
/*
* Include only records where the 'order_dtm' field matches
* the regular expression pattern:
*
* ^ = beginning of string
* 2013 = literal value '2013'
* 0[2345] = 0 followed by 2, 3, 4, or 5
* - = a literal character '-'
* \\d{2} = exactly two digits
* \\s = a single whitespace character
* .* = any number of any characters
* $ = end of string
*
* If you are not familiar with regular expressions and would
* like to know more about them, see the Regular Expression
* Reference at the end of the Exercise Manual.
*/
recent = FILTER data by order_dtm matches '^2013-0[2345]-\\d{2}\\s.*$';
-- TODO (A): Create a new relation with just the order's year and month
A = FOREACH data GENERATE SUBSTRING(order_dtm,0,7);
-- TODO (B): Count the number of orders in each month
B = FOREACH data GENERATE COUNT_STAR(A);
-- TODO (C): Display the count by month to the screen.
DUMP C;
1 Answer
You can solve this in two ways.
Option 1: Using SUBSTRING, as you mentioned
Input:
Pig script:
Output:
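As a minimal sketch of the SUBSTRING approach (not necessarily the exact script this answer used), the steps could look like the following. It assumes the filtered relation recent from the question is what should feed step A, and the names year_month, grouped, and num_orders are invented here for illustration. The main change from the question's attempt is that COUNT_STAR is applied to the bag produced by GROUP, rather than directly inside FOREACH data.

```pig
-- Sketch only: relation and field names other than data, recent, A, B, C
-- are assumptions made for this example.
data = LOAD '/dualcore/orders' AS (order_id:int,
                                   cust_id:int,
                                   order_dtm:chararray);

-- Keep only orders from February through May 2013
recent = FILTER data BY order_dtm matches '^2013-0[2345]-\\d{2}\\s.*$';

-- (A) Keep just the 'YYYY-MM' prefix; the stop index (7) is exclusive
A = FOREACH recent GENERATE SUBSTRING(order_dtm, 0, 7) AS year_month;

-- (B) Group the rows by month, then count the rows in each group
grouped = GROUP A BY year_month;
B = FOREACH grouped GENERATE group AS year_month,
                             COUNT_STAR(A) AS num_orders;

-- (C) Sort by month and display the counts
C = ORDER B BY year_month;
DUMP C;
```

An aggregate such as COUNT_STAR expects a bag, which is why the GROUP step is needed before the count.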
Option 2: Using a regex function
Output:
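A comparable sketch for the regex approach, assuming the built-in REGEX_EXTRACT is the function meant here (the answer does not name it). The field names year, month, and num_orders and the relation name grouped are assumptions, and this version flattens the group key rather than a separate year field, so it may differ from the answer's own script.

```pig
-- Sketch only: assumes REGEX_EXTRACT; field names are illustrative.
data = LOAD '/dualcore/orders' AS (order_id:int,
                                   cust_id:int,
                                   order_dtm:chararray);

-- Keep only orders from February through May 2013
recent = FILTER data BY order_dtm matches '^2013-0[2345]-\\d{2}\\s.*$';

-- (A) Capture group 1 is the four-digit year, group 2 the two-digit month
A = FOREACH recent GENERATE
        REGEX_EXTRACT(order_dtm, '^(\\d{4})-(\\d{2})-.*$', 1) AS year,
        REGEX_EXTRACT(order_dtm, '^(\\d{4})-(\\d{2})-.*$', 2) AS month;

-- (B) Group on the (year, month) pair and count the rows in each group
grouped = GROUP A BY (year, month);
B = FOREACH grouped GENERATE FLATTEN(group) AS (year, month),
                             COUNT_STAR(A) AS num_orders;

-- (C) Sort and display the counts
C = ORDER B BY year, month;
DUMP C;
```

In this sketch, grouping by month alone (GROUP A BY month) would leave the year out of the result.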
In both cases I included the year in the final output. If you don't want it, remove FLATTEN(year) from the script.