从表中选择行,其中列包含分隔符分隔的字符串,这些字符串至少与配置单元中另一组字符串中的一个字符串匹配

zphenhs4  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(319)


在表中,“app”列包含一个长字符串(例如-“aa;bb型;抄送”)。我还有另外一组字符串,它们有几个字符串('aa','bb','cc')。我想选择至少包含('aa','bb','cc')字符串之一的行。

ee7vknir

ee7vknir1#

如果测试字符串的元素数量有限,可以拆分它,并使用array\u contains()检查应用程序中是否存在每个元素。
演示:

with your_data as 
 ( --Test dataset, replace it with your table
  select stack(5, --5 tuples
                  'aa\\;bb\\;cc',
                  'dd\\;cc\\;ff',
                  'ww\\;xx\\;yy',
                  'ww\\;aa\\;yy',
                  'pp\\;bb\\;qq'                                
               ) as App
 ),

 test_string as 
 ( --This can be also a dataset, or replace it with string parameter
   --according to your requirements
 select "'aa','bb','cc'" as str2 --contains values in single quotes delimited by comma
 )

 select App from
 (
 select App, split(App, '\\;') AppArray, 
        split(regexp_replace(t.str2,"'",''),',') as TestArray2 --remove single quotes and split
   from your_data 
        cross join test_string t 
        --cross join here. This is one string example only. Rewrite join according to your dataset.
 )s
 where array_contains(AppArray, TestArray2[0]) or --First element exists
       array_contains(AppArray, TestArray2[1]) or --second
       array_contains(AppArray, TestArray2[2])    --third
 ;

结果:

OK
aa;bb;cc
dd;cc;ff
ww;aa;yy
pp;bb;qq
Time taken: 41.853 seconds, Fetched: 4 row(s)

相关问题