Merging rows in Redshift and keeping the last value

Posted by yqlxgs2m on 2021-07-27 in Java

I have a table like this:

+-----+-----+--------------------------+--------+------+-----------+
| uid | sid |           url            |  name  | age  | timestamp |
+-----+-----+--------------------------+--------+------+-----------+
|   1 | a   | null                     | Fred   | null |       123 |
|   1 | a   | null                     | null   | 23   |       124 |
|   1 | a   | google.com/index.html    | null   | null |       125 |
|   1 | a   | null                     | Freddie| null |       126 |
|   1 | a   | null                     | null   | 23   |       127 |
|   1 | a   | null                     | null   | 24   |       128 |
|   1 | a   | null                     | Freddy | null |       129 |
|   1 | a   | null                     | null   | 22   |       130 |
|   1 | a   | google.com/features.html | null   | null |       131 |
|   1 | a   | null                     | Freddy | null |       132 |
|   1 | a   | null                     | null   | 24   |       133 |
|   1 | b   | google.com/index.html    | null   | null |       134 |
|   1 | b   | null                     | Freddy | null |       135 |
|   1 | b   | null                     | null   | 24   |       136 |
+-----+-----+--------------------------+--------+------+-----------+

I want to transform it into:

+-----+-----+--------------------------+--------+-----+-----------+
| uid | sid |           url            |  name  | age | timestamp |
+-----+-----+--------------------------+--------+-----+-----------+
|   1 | a   | null                     | Fred   |  23 |       123 |
|   1 | a   | google.com/index.html    | Freddy |  24 |       125 |
|   1 | a   | google.com/features.html | Freddy |  24 |       131 |
|   1 | b   | google.com/index.html    | Freddy |  24 |       134 |
+-----+-----+--------------------------+--------+-----+-----------+

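Essentially, I am merging records that share the same uid and sid, but also collapsing the records that follow each url change into that url's row. I want to keep the last value of name and the maximum value of age.
Is this possible in SQL/Redshift? I have looked into GROUP BY and/or window functions to achieve this, but cannot seem to get the expected result. If it is possible, could someone describe how to approach it?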

Answer #1 (xzlaal3s)

OK, after slightly modifying @gordonlinoff's answer, I got the result I wanted. The query looks like this:

SELECT
    DISTINCT group_id, 
    last_value(uid IGNORE NULLS) OVER (
        PARTITION BY group_id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
    ) AS uid, 
    last_value(sid IGNORE NULLS) OVER (
        PARTITION BY group_id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
    ) AS sid, 
    last_value(url IGNORE NULLS) OVER (
        PARTITION BY group_id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
    ) AS url, 
    last_value(name IGNORE NULLS) OVER (
        PARTITION BY group_id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
    ) AS name, 
    -- MAX and MIN already skip NULLs, so IGNORE NULLS is not needed on them.
    max(age) OVER (
        PARTITION BY group_id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
    ) AS age, 
    min(timestamp) OVER (
        PARTITION BY group_id ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
    ) AS timestamp
FROM (
    SELECT
        t.*,
        CONCAT(
            COUNT(url) OVER (
                PARTITION BY uid, sid ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            ),
            CONCAT(uid, sid)
        ) AS group_id 
    FROM t
) AS t_grouped 
ORDER BY timestamp ASC
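
To make the grouping step in the inner query easier to follow, here is a minimal Python sketch of the same idea: a running count of non-NULL url values per (uid, sid), taken in timestamp order. It is only an illustration, not Redshift code. The function name `assign_groups` and the shortened sample (a subset of the question's rows, with name and age dropped) are assumptions made for this example.

```python
from collections import defaultdict

# Subset of the question's rows as (uid, sid, url, timestamp); None stands for NULL.
rows = [
    (1, "a", None, 123),
    (1, "a", None, 124),
    (1, "a", "google.com/index.html", 125),
    (1, "a", None, 126),
    (1, "a", "google.com/features.html", 131),
    (1, "b", "google.com/index.html", 134),
    (1, "b", None, 135),
]


def assign_groups(rows):
    """Return one (uid, sid, running_count) key per row, in timestamp order."""
    running = defaultdict(int)  # non-NULL urls seen so far per (uid, sid)
    keys = []
    for uid, sid, url, ts in sorted(rows, key=lambda r: r[3]):
        if url is not None:
            running[(uid, sid)] += 1  # a new url starts a new group
        keys.append((uid, sid, running[(uid, sid)]))
    return keys


group_keys = assign_groups(rows)
for key in group_keys:
    print(key)
```

Rows that precede the first url of a session get count 0, which is why the expected output has a row with a NULL url. The sketch keeps the key as a tuple; a concatenated string key such as the one built with CONCAT could in principle collide (count 1 with uid 11 versus count 11 with uid 1) unless a separator is added.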

Answer #2 (irlmq6kh)

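I think you can define the groups using a cumulative count of the non-NULL values in url. You then need a way to get the last non-NULL value in each group, for which you can use last_value() or first_value():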

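select distinct
       uid, sid,
       max(url) over (partition by uid, sid, grp) as url,
       -- Descending order plus a full-partition frame returns the latest non-NULL name on every row of the group.
       first_value(name ignore nulls) over (partition by uid, sid, grp order by timestamp desc rows between unbounded preceding and unbounded following) as name,
       -- The question asks for the maximum age, so max() is used instead of the latest value.
       max(age) over (partition by uid, sid, grp) as age,
       min(timestamp) over (partition by uid, sid, grp) as timestamp
from (select t.*,
             -- Running count of non-NULL urls; Redshift needs an explicit frame when an aggregate window uses ORDER BY.
             count(url) over (partition by uid, sid order by timestamp rows unbounded preceding) as grp
      from t
     ) t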
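
As a cross-check of the logic (not a replacement for the SQL), the whole transformation can be sketched in plain Python using the sample data from the question. The function name `merge_rows` and the tuple layout are assumptions made for this illustration. Each group is keyed by (uid, sid, running count of non-NULL urls), and within a group the sketch keeps the url, the last non-NULL name, the maximum age, and the earliest timestamp.

```python
from itertools import groupby

# Sample data from the question as (uid, sid, url, name, age, timestamp); None stands for NULL.
rows = [
    (1, "a", None, "Fred", None, 123),
    (1, "a", None, None, 23, 124),
    (1, "a", "google.com/index.html", None, None, 125),
    (1, "a", None, "Freddie", None, 126),
    (1, "a", None, None, 23, 127),
    (1, "a", None, None, 24, 128),
    (1, "a", None, "Freddy", None, 129),
    (1, "a", None, None, 22, 130),
    (1, "a", "google.com/features.html", None, None, 131),
    (1, "a", None, "Freddy", None, 132),
    (1, "a", None, None, 24, 133),
    (1, "b", "google.com/index.html", None, None, 134),
    (1, "b", None, "Freddy", None, 135),
    (1, "b", None, None, 24, 136),
]


def merge_rows(rows):
    """Collapse rows into one row per (uid, sid, url-change group)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[5]))

    # Step 1: tag each row with a running count of non-NULL urls per (uid, sid).
    keyed = []
    count, prev_partition = 0, None
    for r in ordered:
        partition = (r[0], r[1])
        if partition != prev_partition:
            count, prev_partition = 0, partition
        if r[2] is not None:
            count += 1
        keyed.append(((r[0], r[1], count), r))

    # Step 2: aggregate each group (rows of a group are already adjacent).
    merged = []
    for (uid, sid, _), items in groupby(keyed, key=lambda kr: kr[0]):
        group = [r for _, r in items]
        urls = [r[2] for r in group if r[2] is not None]
        names = [r[3] for r in group if r[3] is not None]
        ages = [r[4] for r in group if r[4] is not None]
        merged.append((
            uid,
            sid,
            urls[-1] if urls else None,    # at most one url per group
            names[-1] if names else None,  # last non-NULL name
            max(ages) if ages else None,   # maximum age
            min(r[5] for r in group),      # earliest timestamp
        ))
    return merged


merged = merge_rows(rows)
for row in merged:
    print(row)
```

For the sample data this yields the four rows of the desired table in the question. Note that the second group contains ages 23, 24 and 22, so taking the last value instead of the maximum would give 22 rather than the expected 24.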
