分区上的Hive SQL

n53p2ov0  于 4个月前  发布在  Hive
关注(0)|答案(1)|浏览(68)

我有一个列的row_number()over(partition...)

row_number() OVER (partition BY customer_id ORDER BY order_date, order_number) AS order_row,

字符串
另一列使用over(partition...)计算“累计”支出

sum(spend_amount) OVER (partition BY customer_id ORDER BY order_date, order_number) as cumulative_spend,


如何在另一列中标记哪个order_row先达到$100?我想标记的是order_number 123458,order_row = 3($110.00)。
eg.

customer_number    order_date  order_number  spend_amount order_row  cumulative_spend
abcdefg            01/01/2023  123456        10.00        1          10.00
abcdefg            01/01/2023  123457        50.00        2          60.00
abcdefg            14/01/2023  123458        50.00        3          110.00
abcdefg            23/01/2023  123459        20.00        4          130.00


我尝试了以下操作,但与预期的不完全相同,我希望看到空值,然后对于每个cumulative_spend值超过100的1、2、3等,但我看到的是空值以及3和4

case when cumulative_spend >= 100 then row_number() OVER (partition BY customer_number ORDER BY order_row) end AS transaction_threshold_reached

customer_number    order_date  order_number  spend_amount order_row  cumulative_spend   transaction_threshold_reached
abcdefg            01/01/2023  123456        10.00        1          10.00              null
abcdefg            01/01/2023  123457        50.00        2          60.00              null
abcdefg            14/01/2023  123458        50.00        3          110.00             3
abcdefg            23/01/2023  123459        20.00        4          130.00             4

vaqhlq81

vaqhlq811#

只需从累积和中减去当前行的值,即可获得前一行的总和,并检查当前行是否大于100,而前一行是否小于100。
请注意,在重复的情况下,您应该使用ROWS UNBOUNDED PRECEDING,它也更快。

SELECT
  *,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date, order_number) AS order_row,
  SUM(spend_amount) OVER (PARTITION BY customer_id
      ORDER BY order_date, order_number ROWS UNBOUNDED PRECEDING) as cumulative_spend,
  CASE WHEN spend_amount >= 100
    AND SUM(spend_amount) OVER (PARTITION BY customer_id
      ORDER BY order_date, order_number ROWS UNBOUNDED PRECEDING) - spend_amound < 100
       THEN CAST(1 AS bit) END as cumulative_spend_reached_100
FROM YourTable;

字符串
也可以在CTE或子查询中执行此操作

WITH summed AS (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date, order_number) AS order_row,
      SUM(spend_amount) OVER (PARTITION BY customer_id
          ORDER BY order_date, order_number ROWS UNBOUNDED PRECEDING) as cumulative_spend
    FROM YourTable
)
SELECT
  *,
  CASE WHEN spend_amount >= 100
    AND cumulative_spend - spend_amound < 100
       THEN CAST(1 AS bit) END as cumulative_spend_reached_100
FROM summed;

相关问题