Complex SQL query grouping in Hive

k7fdbhmy  posted on 2021-06-26  in Hive

I have a dataset in a Hive table as follows:

date           col1      col2  col3
2016-02-01      A         X     5 
2016-02-03      A         X     5
2016-02-04      A         X     5
2016-03-01      A         X     6
2016-04-01      A         X     5
2016-04-02      A         Y     5
2016-04-03      A         Y     5

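I need to group selectively on col1 and col2, starting a new group whenever the col3 value changes. For example, col3 changes from 5 to 6 in row 4, so row 4 starts a new group. For each group I need the min and max of the date column. The output should look like this: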

col1 col2 col3    minDate           maxDate
A     X    5     2016-02-01         2016-02-04
A     X    6     2016-03-01         2016-03-01
A     X    5     2016-04-01         2016-04-01
A     Y    5     2016-04-02         2016-04-03

I am sure a simple group by on col1 and col2 will not work. Can anyone suggest a way to achieve this?

dnph8jn4  #1

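select      col1,col2,col3
           ,min(date) as minDate
           ,max(date) as maxDate

from       (select      *
                        -- position of the row within its (col1,col2) partition
                       ,row_number () over
                        (
                            partition by    col1,col2
                            order by        date
                        ) as rn_part_1_2

                        -- position of the row among rows sharing the same col3
                       ,row_number () over
                        (
                            partition by    col1,col2,col3
                            order by        date
                        ) as rn_part_1_2_3

            from        mytable
            ) t

-- the difference stays constant within a run of consecutive rows
-- that share the same col3, so it identifies each run
group by    col1,col2,col3
           ,rn_part_1_2 - rn_part_1_2_3

order by    col1,col2
           ,minDate
;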
+------+------+------+------------+------------+
| col1 | col2 | col3 |  mindate   |  maxdate   |
+------+------+------+------------+------------+
| A    | X    |    5 | 2016-02-01 | 2016-02-04 |
| A    | X    |    6 | 2016-03-01 | 2016-03-01 |
| A    | X    |    5 | 2016-04-01 | 2016-04-01 |
| A    | Y    |    5 | 2016-04-02 | 2016-04-03 |
+------+------+------+------------+------------+
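
The answer gives no explanation, so here is a minimal sketch of why grouping on rn_part_1_2 - rn_part_1_2_3 separates the runs. It runs the answer's inner subquery against the question's sample rows and prints both row numbers together with their difference. Python's sqlite3 module (SQLite 3.25 or later, for window functions) stands in for Hive purely as an assumption so the sketch can run locally; the helper name island_keys and the alias diff are hypothetical and do not appear in the original answer.

```python
import sqlite3

# Sample rows from the question: (date, col1, col2, col3).
ROWS = [
    ("2016-02-01", "A", "X", 5),
    ("2016-02-03", "A", "X", 5),
    ("2016-02-04", "A", "X", 5),
    ("2016-03-01", "A", "X", 6),
    ("2016-04-01", "A", "X", 5),
    ("2016-04-02", "A", "Y", 5),
    ("2016-04-03", "A", "Y", 5),
]

# The inner subquery of the answer, with the columns listed explicitly.
INNER_QUERY = """
select  date, col1, col2, col3
       ,row_number() over (partition by col1, col2 order by date)       as rn_part_1_2
       ,row_number() over (partition by col1, col2, col3 order by date) as rn_part_1_2_3
from    mytable
"""


def island_keys(rows):
    """Return (date, col2, col3, rn_part_1_2, rn_part_1_2_3, diff) for each row."""
    # Assumption: an in-memory SQLite database replaces the Hive table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (date text, col1 text, col2 text, col3 integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)
    result = conn.execute(
        "select date, col2, col3, rn_part_1_2, rn_part_1_2_3, "
        "rn_part_1_2 - rn_part_1_2_3 as diff "
        "from (" + INNER_QUERY + ") t "
        "order by col1, col2, date"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in island_keys(ROWS):
        print(row)
```

For the (A, X) rows the difference comes out as 0, 0, 0, 3, 1, and for the (A, Y) rows as 0, 0. The two separate runs of col3 = 5 under (A, X) get different values (0 and 1), which is why the outer query can keep them apart by adding the difference to the group by.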
