hive 使用SparkSQL时PARTITION BY的“RANGE BETWEEN”问题

ct2axkht  于 4个月前  发布在  Hive
关注(0)|答案(1)|浏览(50)

当我尝试在sparkSQL中使用PARTITION BY时,我有这个复杂的查询:
对于当前的每一行,使用user_id,product_id,[create_date-3day,create_date+3day]作为数据窗口,进行一些数据查询(例如LAST_VALUE())。一个关键部分是在查询时,我需要按列ORDER BY NEW_DATE进行进一步的排序,这样我就可以确保每个窗口中的数据都是按New_date排序的--这是我想要查询的真实的列。
所以一开始我的想法是使用这样的子句,它看起来像这样:
LAST_VALUE() over(PARTITION BY user_id,product_id ORDER BY create_date RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING)
使用range between与ORDER BY组合应该可以。但是这里使用order by与range between组合在窗口中进行过滤。我需要进一步的ORDER BY来实现ORDER BY NEW_DATE
但这样的查询不起作用:
LAST_VALUE() over(PARTITION BY user_id,product_id ORDER BY create_date RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING ORDER BY NEW_DATE)
在这个子句中使用两个orderby不会编译。我如何进一步实现这一点?或者有其他方法来实现这一点?

9cbw7uwe

9cbw7uwe1#

您可以使用model子句为窗口和计算定义不同的计算上下文。
请参见下面的代码

SQL> select *
  2  from t
  3  model
  4    /*Partition by as in window spec*/
  5    partition by (user_id, product_id)
  6    /*Dimension column is what should be used to define window size*/
  7    dimension by (create_date)
  8    /*Put original value to use in FIRST_VALUE and ORDER BY column*/
  9    measures(value_, value_*0 as new_value, new_date)
 10    rules update (
 11      new_value[any] = max(value_) keep(dense_rank first order by new_date asc)[
 12          /*Window size applied to all "measures"
 13            in the above function, including NEW_DATE*/
 14          create_date between cv(create_date) - interval '3' day and cv(create_date) + interval '3' day
 15        ]
 16    )
 17  order by 1,2,3 asc

字符串
对于此示例数据:

> SQL> create table t(
>   2   user_id number, 
>   3   product_id number, 
>   4   create_date date, 
>   5   new_date date, 
>   6   value_ number
>   7  )
>   8  /
> 
> Table T created.
> 
> SQL> 
> SQL> begin
>   2  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-17',date '2023-12-17','1');
>   3  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-16',date '2023-12-18','2');
>   4  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-15',date '2023-12-15','3');
>   5  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-14',date '2023-12-16','4');
>   6  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-13',date '2023-12-12','5');
>   7  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-12',date '2023-12-14','6');
>   8  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-11',date '2023-12-11','7');
>   9  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-10',date '2023-12-11','8');
>  10  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','0',date '2023-12-09',date '2023-12-14','9');
>  11  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-12-08',date '2023-12-16','10');
>  12  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-12-07',date '2023-12-15','11');
>  13  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-12-06',date '2023-12-14','12');
>  14  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-12-05',date '2023-12-14','13');
>  15  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-12-04',date '2023-12-17','14');
>  16  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-12-03',date '2023-12-18','15');
>  17  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-12-02',date '2023-12-10','16');
>  18  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-12-01',date '2023-12-09','17');
>  19  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-11-30',date '2023-12-11','18');
>  20  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('0','1',date '2023-11-29',date '2023-12-16','19');
>  21  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-28',date '2023-12-15','20');
>  22  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-27',date '2023-12-14','21');
>  23  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-26',date '2023-12-17','22');
>  24  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-25',date '2023-12-16','23');
>  25  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-24',date '2023-12-11','24');
>  26  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-23',date '2023-12-15','25');
>  27  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-22',date '2023-12-16','26');
>  28  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-21',date '2023-12-10','27');
>  29  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-20',date '2023-12-12','28');
>  30  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','2',date '2023-11-19',date '2023-12-14','29');
>  31  Insert into T (USER_ID,PRODUCT_ID,CREATE_DATE,NEW_DATE,VALUE_) values ('1','3',date '2023-11-18',date '2023-12-11','30');
>  32  commit;
>  33  end;
>  34  /
> 
> PL/SQL procedure successfully completed.


返回

>    USER_ID PRODUCT_ID CREATE_DATE             VALUE_  NEW_VALUE NEW_DATE           
> ---------- ---------- ------------------- ---------- ---------- -------------------
>          0          0 2023-12-09 00:00:00          9          8 2023-12-14 00:00:00
>          0          0 2023-12-10 00:00:00          8          8 2023-12-11 00:00:00
>          0          0 2023-12-11 00:00:00          7          8 2023-12-11 00:00:00
>          0          0 2023-12-12 00:00:00          6          8 2023-12-14 00:00:00
>          0          0 2023-12-13 00:00:00          5          8 2023-12-12 00:00:00
>          0          0 2023-12-14 00:00:00          4          7 2023-12-16 00:00:00
>          0          0 2023-12-15 00:00:00          3          5 2023-12-15 00:00:00
>          0          0 2023-12-16 00:00:00          2          5 2023-12-18 00:00:00
>          0          0 2023-12-17 00:00:00          1          3 2023-12-17 00:00:00
>          0          1 2023-11-29 00:00:00         19         17 2023-12-16 00:00:00
>          0          1 2023-11-30 00:00:00         18         17 2023-12-11 00:00:00
>          0          1 2023-12-01 00:00:00         17         17 2023-12-09 00:00:00
>          0          1 2023-12-02 00:00:00         16         17 2023-12-10 00:00:00
>          0          1 2023-12-03 00:00:00         15         17 2023-12-18 00:00:00
>          0          1 2023-12-04 00:00:00         14         17 2023-12-17 00:00:00
>          0          1 2023-12-05 00:00:00         13         16 2023-12-14 00:00:00
>          0          1 2023-12-06 00:00:00         12         13 2023-12-14 00:00:00
>          0          1 2023-12-07 00:00:00         11         13 2023-12-15 00:00:00
>          0          1 2023-12-08 00:00:00         10         13 2023-12-16 00:00:00
>          1          2 2023-11-19 00:00:00         29         27 2023-12-14 00:00:00
>          1          2 2023-11-20 00:00:00         28         27 2023-12-12 00:00:00
>          1          2 2023-11-21 00:00:00         27         27 2023-12-10 00:00:00
>          1          2 2023-11-22 00:00:00         26         27 2023-12-16 00:00:00
>          1          2 2023-11-23 00:00:00         25         27 2023-12-15 00:00:00
>          1          2 2023-11-24 00:00:00         24         27 2023-12-11 00:00:00
>          1          2 2023-11-25 00:00:00         23         24 2023-12-16 00:00:00
>          1          2 2023-11-26 00:00:00         22         24 2023-12-17 00:00:00
>          1          2 2023-11-27 00:00:00         21         24 2023-12-14 00:00:00
>          1          2 2023-11-28 00:00:00         20         21 2023-12-15 00:00:00
>          1          3 2023-11-18 00:00:00         30         30 2023-12-11 00:00:00
> 
> 30 rows selected.

相关问题