由两列生成sql会话id

lsmepo6l  于 2021-07-24  发布在  Java
关注(0)|答案(1)|浏览(276)

我用sql为一个包含用户、组和事件时间的表生成会话id。会话定义为10分钟的窗口。我当前的实现生成会话ID;但是,有一个警告:可能有属于多个组的用户,而这不会反映在会话id分类中

架构示例:

userid |  group  | event_time
001        A      2020-06-20 02:04:50.000
001        A.     2020-06-20 02:06:12.000
001.       A      2020-06-20 02:17:16.000
001.       B.     2020-06-20 02:20:10.000
001.       A.     2020-06-20 02:28:13.000
002.       A.     2020-06-20 04:13:97.000

sql代码段:

tmp_table AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY
                userid
            ORDER BY
                event_time
        ) AS user_row,
        LAG(userid) OVER (
            PARTITION BY
                userid
            ORDER BY
                event_time
        ) AS prev_user,
        LAG(event_time) OVER (
            PARTITION BY userid
            ORDER BY
                event_time
        ) AS prev_action
    FROM table
    ORDER BY
        userid,
        event_time
)

SELECT
    *,
    CASE
        WHEN prev_user = user_row AND DATE_DIFF('minute', prev_action, event_time) < 10
            THEN LAG(user_row) OVER (
            PARTITION BY
                userid
            ORDER BY
                user_row
        )
        ELSE user_row
    END AS session_id
FROM tmp_table

然而,这会产生

userid |  group  | event_time             | session_id
001        A      2020-06-20 02:04:50.000.  1
001        A.     2020-06-20 02:06:12.000.  1
001.       A      2020-06-20 02:17:16.000.  2
001.       B.     2020-06-20 02:20:10.000.  2
001.       A.     2020-06-20 02:28:13.000.  2
002.       A.     2020-06-20 04:13:97.000.  1

应该什么时候

userid |  group  | event_time             | session_id
001        A      2020-06-20 02:04:50.000.  1
001        A.     2020-06-20 02:06:12.000.  1
001.       A      2020-06-20 02:17:16.000.  2
001.       B.     2020-06-20 02:20:10.000.  1
001.       A.     2020-06-20 02:28:13.000.  3
002.       A.     2020-06-20 04:13:97.000.  1

因为userid001同时属于a和b,在a和b中发生的事情是相互独立的。

mlmc2os5

mlmc2os51#

您可以简化会话的计算。只需查看每个事件的前一个事件时间 userid / group 组合。当差值大于或等于10时,新会话开始:

WITH tmp_table AS (
      SELECT t.*,
             LAG(event_time) OVER (PARTITION BY userid, group ORDER BY event_time) as prev_event_time      
      FROM table t
    )
SELECT t.*,
       SUM(CASE WHEN DATE_DIFF('minute', prev_event_time, event_time) < 10
                THEN 0 ELSE 1
           END) OVER (PARTITION BY userid ORDER BY event_time)
FROM tmp_table t;

我不知道你的代码应该如何工作。但我希望 grouppartition by 如果你想重新开始每组的计数。

相关问题