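-- TextLoader reads each whole line as one chararray (the default PigStorage would split lines on tabs)
A = LOAD '/path/text*.txt' USING TextLoader() AS (lines:chararray);
-- split each line into words using TOKENIZE's default delimiters
B = FOREACH A GENERATE FLATTEN(TOKENIZE(lines)) AS words;
-- the empty regex matches between every character, so '|' is inserted around each letter; splitting on '|' gives one letter per row
C = FOREACH B GENERATE FLATTEN(TOKENIZE(REPLACE(words, '', '|'), '|')) AS letters;
-- each row is a single character here, so '[abc]' keeps exactly the letters a, b and c
D = FILTER C BY letters matches '[abc]';
E = GROUP D BY letters;
F = FOREACH E GENERATE group AS letters, COUNT(D) AS cnt;
DUMP F;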
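This loads every file matching the wildcard * into a single chararray field. It splits each line into words and then splits each word into letters. REPLACE(words, '', '|') does the letter split: the empty regular expression matches at every position, so for example 'cab' becomes '|c|a|b|', and TOKENIZE on '|' then yields one token per letter. Finally the FILTER keeps only a, b and c, and GROUP with COUNT gives the number of occurrences of each letter.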