How to get the maximum row number for each group in Hive SQL

Posted by 6psbrbz9 on 2021-05-29 in Hadoop


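Using row_number() in Hive SQL, I can filter out duplicates and pick the first instance of each id by selecting seq = 1 in the where clause, as shown below. What I need is a way to find the last instance in each group.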

select * from 
(select c1,c2,c3,c4,c5,id, row_number() over(partition by id ORDER BY id) as seq
from 
table) as cnt where seq = 1;

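For example, if id 1212 has 3 instances and id 1313 has 5 instances, as in the table below, the query above returns only one instance per id because the where clause selects seq = 1. What I want instead is 3 for id 1212 and 5 for id 1313.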

c1,  c2,  c3,  c4,  c5,  ID     seq
2020 2020 2020 2020 2020 1212     1
2021 2020 2021 2020 2021 1212     2
2022 2020 2022 2020 2022 1212     3
2023 2020 2023 2020 2023 1313     1
2024 2020 2024 2020 2024 1313     2
2025 2020 2025 2020 2025 1313     3
2026 2020 2026 2020 2026 1313     4
2026 2020 2026 2020 2026 1313     5

sql hadoop Hive hiveql

Source: https://stackoverflow.com/questions/51085920/how-to-get-the-maximum-row-number-for-each-group-in-hive-sql

4 Answers


xzabzqsa1#

Add an extra column, COUNT(*) OVER (PARTITION BY id) AS cnt. It holds the number of rows in the group, which is also the maximum row_number() value for that group.
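A minimal sketch of this suggestion, applied to the query from the question. The table name `my_table` is a placeholder, and ordering by c1 is an assumption, since the question does not say which column defines the row order within an id.

```sql
-- Sketch only: my_table is a placeholder name, and ordering by c1 is assumed.
select c1, c2, c3, c4, c5, id, seq, cnt
from (
  select c1, c2, c3, c4, c5, id,
         -- position of the row inside its id group
         row_number() over (partition by id order by c1) as seq,
         -- number of rows in the id group, identical on every row of the group
         count(*)     over (partition by id)             as cnt
  from my_table
) t
where seq = cnt;  -- keep only the row whose number equals the group size
```

Dropping the final where clause keeps every row together with its group size. With it, only the last-numbered row of each id survives, so for the sample data that would be seq 3 for id 1212 and seq 5 for id 1313.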

2021-05-29

yshpjwxd2#

select id, max(seq) as max_seq
from
(select *, row_number() over(partition by id ORDER BY id) as seq
from
table) maxseq
group by id

2021-05-29

isr3a4wc3#

List all the columns in the group by and take max over row_number():
select c1,c2,c3,c4,c5,id,max(r_no)
from
(
select c1,c2,c3,c4,c5,id, row_number() over (partition by id ORDER BY c1,c2,c3,c4,c5,id) as r_no
from
table
) a
group by c1,c2,c3,c4,c5,id

2021-05-29

prdp8dxp4#

Change the ascending sort to a descending sort:

select t.* 
from (select c1, c2, c3, c4, c5, id,
             row_number() over (partition by id ORDER BY id desc) as seqnum
------------------------------------------------------------^
      from table
    ) t
where seqnum = 1;
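One caveat worth spelling out: when the window is ordered by the same column it is partitioned by, every row in a partition ties, so which row receives seqnum = 1 is not well defined. The sketch below is a variant that assumes c1 determines the order within each id. That is a guess based on the sample data, and `my_table` is a placeholder name.

```sql
-- Sketch only: my_table is a placeholder, and c1 is assumed to define
-- the row order within each id (the question does not state this).
select t.*
from (
  select c1, c2, c3, c4, c5, id,
         -- descending order, so the "last" row by c1 gets seqnum = 1
         row_number() over (partition by id order by c1 desc) as seqnum
  from my_table
) t
where seqnum = 1;
```

If no single column orders the rows uniquely, more columns can be added to the order by to break ties.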

2021-05-29
