配置单元sql：如何按前缀计算唯一字符串

e4eetjau 于 2021-06-28 发布在 Hive

关注(0)|答案(3)|浏览(276)

想象一下这张table

text
----
h
he
hel     // All above are prefixes 
helll123   // hel is a prefix of helll123; this is the first occurrence of helll123
helll123   // second helll123 
f
fa
fals
falst0    // fals is a prefix of falst0

下面的查询是伪代码来演示我所追求的

SELECT
    unique_by_prefix(text) AS unique_text, // pseudo code
    COUNT(*)
FROM
    my_table
GROUP BY 
    1

应生成以下结果

unique_text count
helll123       2
falst0         1

基本上，我们将忽略前缀，只计算唯一的文本。

sql Hive hiveql

来源：https://stackoverflow.com/questions/39217063/hive-sql-how-to-count-unique-strings-by-prefix

3条答案

按热度按时间

xfb7svmp1#

这里有一种方法。

select distinct text into my_table1 from my_table
            alter table my_table1 add sno int identity

            create table my_table2 (text varchar(max), counter int)

            declare @i int = 0
            While (@i < (select COUNT(*) from my_table1))
            Begin
            set @i = @i + 1
            declare @text varchar(max) = (select text FROM my_table1 where sno = @i)
            insert into my_table2 values(
            (select text from my_table1 where sno = @i),
            (select COUNT(*) from my_table1 where text like @text + '%'))
            End

            select A.text, count(*) from my_table A left join my_table2 B on A.text = B.text where B.counter = 1 group by A.text

赞(0）回复(0）举报 2021-06-28

zazmityj2#

开窗功能https://cwiki.apache.org/confluence/display/hive/languagemanual+windowingandanalytics

select text, 
  lead(text) over ( order by text )  as next_text, 
  lag(text) over ( order by text )  as pre_text 
from my_table;

结果将是：

text  next_text  pre_text
h      he        NULL
he     hel       h
hel    helll123   he
helll123  helll123   hel 
helll123  f   helll123
f      NULL    helll123

然后您可以比较这些值：如果下一个文本以文本开头，则此记录不是您想要的记录，否则获取此记录。

case when instr(next_text, text) = 1 then null else text as text_u_want

然后删除null并加入我的表，就可以得到文本计数了

赞(0）回复(0）举报 2021-06-28

x7rlezfr3#

我不认为你可以在Hive里用一个查询来完成这个任务。
有一种可能性：

select text, count(*)
from t
where not exists (select 1
                  from t t2
                  where t2.text <> t.text and t2.text like t1.text || '%'
                 )
group by text;

虽然这抓住了逻辑，但我怀疑hive想要correlation子句的等式。

赞(0）回复(0）举报 2021-06-28

我来回答

配置单元sql：如何按前缀计算唯一字符串

3条答案

相关问题

热门标签

最新问答