如果连续行之间的差异小于10天,则生成秩/数

bsxbgnwa  于 2021-06-28  发布在  Hive
关注(0)|答案(2)|浏览(302)

需要配置单元查询来计算连续记录的日期差异,但对于相同的txn类型,如果差异小于10,则生成相同的数字,否则生成新的数字。

输入表

+--------+----------+-------------+
| Txn_id | Txn_type |  Txn_date   |
+--------+----------+-------------+
|      1 | T100     | 26-Aug-2015 |
|      2 | T100     | 03-Nov-2015 |
|      3 | T100     | 05-Dec-2015 |
|      4 | T100     | 08-Dec-2015 |
|      5 | T100     | 25-Jan-2016 |
|      6 | T111     | 26-Jan-2016 |
|      7 | T200     | 02-Feb-2016 |
|      8 | T200     | 07-May-2016 |
|      9 | T200     | 12-May-2016 |
|     10 | T200     | 20-May-2016 |
+--------+----------+-------------+

预期产量

+--------+----------+-------------+--------+
| Txn_id | Txn_type |  Txn_date   | Number |
+--------+----------+-------------+--------+
|      1 | T100     | 26-Aug-2015 |      1 |
|      2 | T100     | 03-Nov-2015 |      2 |
|      3 | T100     | 05-Dec-2015 |      3 |
|      4 | T100     | 08-Dec-2015 |      3 |
|      5 | T100     | 25-Jan-2016 |      4 |
|      6 | T111     | 26-Jan-2016 |      1 |
|      7 | T200     | 02-Feb-2016 |      1 |
|      8 | T200     | 07-May-2016 |      2 |
|      9 | T200     | 12-May-2016 |      2 |
|     10 | T200     | 20-May-2016 |      2 |
+--------+----------+-------------+--------+
wtlkbnrh

wtlkbnrh1#

使用公共表表达式标记差异超过10天的行,然后对这些行进行计数以获得新的数字。

with test_data as (
 SELECT 1 txn_id, 'T100' txn_type, to_date('26-AUG-2015','DD-MON-YYYY') txn_date from dual union all
 SELECT 2 txn_id, 'T100', to_date('03-NOV-2015','DD-MON-YYYY') from dual union all
 SELECT 3 txn_id, 'T100', to_date('05-DEC-2015','DD-MON-YYYY') from dual union all
 SELECT 4 txn_id, 'T100', to_date('08-DEC-2015','DD-MON-YYYY') from dual union all
 SELECT 5 txn_id, 'T100', to_date('25-JAN-2016','DD-MON-YYYY') from dual union all
 SELECT 6 txn_id, 'T111', to_date('26-JAN-2016','DD-MON-YYYY') from dual union all
 SELECT 7 txn_id, 'T200', to_date('02-FEB-2016','DD-MON-YYYY') from dual union all
 SELECT 8 txn_id, 'T200', to_date('07-MAY-2016','DD-MON-YYYY') from dual union all
 SELECT 9 txn_id, 'T200', to_date('12-MAY-2016','DD-MON-YYYY') from dual union all
 SELECT 10 txn_id, 'T200', to_date('20-MAY-2016','DD-MON-YYYY') from dual),
 markers as (
select td.*, 
       case when td.txn_date - nvl(lag(td.txn_date) 
           over ( partition by txn_type order by txn_id ), td.txn_date-9999) > 10 
           THEN 'Y' ELSE NULL end new_txn_marker from test_data td )
SELECT txn_id, txn_type,txn_date, 
       count(new_txn_marker) over ( partition by txn_type order by txn_id ) "NUMBER" 
FROM  markers;
qncylg1j

qncylg1j2#

不确定“少于10天”是严格不等式还是非严格不等式,否则:

with
     inputs ( txn_id, txn_type, txn_date ) as (
       select  1, 'T100', to_date('26-Aug-2015', 'dd-Mon-yy') from dual union all
       select  2, 'T100', to_date('03-Nov-2015', 'dd-Mon-yy') from dual union all 
       select  3, 'T100', to_date('05-Dec-2015', 'dd-Mon-yy') from dual union all
       select  4, 'T100', to_date('08-Dec-2015', 'dd-Mon-yy') from dual union all
       select  5, 'T100', to_date('25-Jan-2016', 'dd-Mon-yy') from dual union all
       select  6, 'T111', to_date('26-Jan-2016', 'dd-Mon-yy') from dual union all
       select  7, 'T200', to_date('02-Feb-2016', 'dd-Mon-yy') from dual union all
       select  8, 'T200', to_date('07-May-2016', 'dd-Mon-yy') from dual union all
       select  9, 'T200', to_date('12-May-2016', 'dd-Mon-yy') from dual union all
       select 10, 'T200', to_date('20-May-2016', 'dd-Mon-yy') from dual
     ),
     prep ( txn_id, txn_type, txn_date, ct ) as (
       select txn_id, txn_type, txn_date, 
              case when txn_date < lag(txn_date) over (partition by txn_type
                           order by txn_date) + 10 then 0 else 1 end
       from   inputs
     )
select txn_id, txn_type, txn_date, 
       sum(ct) over (partition by txn_type order by txn_date) as number_
from prep;

我曾经 number_ 作为列名;不要对表或列名使用甲骨文保留字,除非你的生命依赖于它,即使这样也不行。

相关问题