使用etl的过滤器逻辑-(matillion或python)

kadbb459  于 2021-08-09  发布在  Java
关注(0)|答案(1)|浏览(271)

数据集:
在此处输入图像描述

email   brand   startdate   response_no     need logic for conditions to select/filter response
abc     wi      4/1/2019     1              (select)
abc     wi      9/4/2019     2           (compare with 1st),since less than 6 month, filter out)
abc     wi      11/22/2019   3           (compare with 1st), more than 6 month, select)
xyz     wi      3/2/2019     1           (select)
xyz     wi      10/23/2019   2           (compare with 1st , more than 6 month , select)
xyz     wi      11/27/2019   3           (compare with 2nd, less than 6 month , filter out)
xyz     msw     2/21/2019    1            (select)
xyz     msw     2/20/2020    2           (compare with 1st , more than 6 month , select)

根据以上数据,我需要为每封邮件和每个品牌写一个逻辑,过滤掉从之前选择的回复开始日期起6个月内的回复号。例如,对于电子邮件和品牌wi,我在2019年4月1日有第一次回复(回复1),第二次回复是在2019年9月24日(从第一次回复起5个月),所以我需要过滤掉它,下一次第三次回复是2019年11月22日(从第一次回复起超过6个月),所以不要过滤掉。如果第二次回复比第一次晚6个月,我需要避免过滤它,然后第三次回复必须与第二次而不是第一次进行比较,基本上,要过滤的检查条件应该在当前回复日期和以前的回复日期之间,每个品牌的每封邮件都没有过滤掉

omhiaaxx

omhiaaxx1#

我相信你期望得到以下结果是吗?

因此,从sql的Angular 来看,我做了以下工作(也可以用其他方法解决):

Create Table DB_NM.SCHEMA_NM.TEST (
  email varchar(255),
  brand varchar(4),  
  startdate date, 
  response_no numeric(10,0)
); 

Insert Into DB_NM.SCHEMA_NM.TEST VALUES('abc', 'wi', '2019-04-01', 1);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('abc', 'wi', '2019-09-04', 2);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('abc', 'wi', '2019-11-22', 3);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'wi', '2019-03-02', 1);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'wi', '2019-10-23', 2);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'wi', '2019-11-27', 3);         
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'msw', '2019-02-21', 1);          
Insert Into DB_NM.SCHEMA_NM.TEST VALUES('xyz', 'msw', '2020-02-20', 2);

sql语句:

Select SRC.EMAIL, 
       SRC.BRAND,
       SRC.STARTDATE
From 
(
Select EMAIL, 
       BRAND,
       STARTDATE,
       lag(STARTDATE) over (partition by EMAIL, BRAND order by STARTDATE) as PREV_DATE,
       Case When PREV_DATE Is Null Then -1 Else (-1*(datediff(month,STARTDATE,PREV_DATE))) End as DATE_DIFF_MTH, 
       Case When PREV_DATE Is Null Then '1949-01-01'
            Else Case When (-1*(datediff(month,STARTDATE,PREV_DATE))) < 6 
                      Then lead(STARTDATE) over (partition by EMAIL, BRAND order by STARTDATE)
                      Else STARTDATE End End as DATE_TO_CONSIDER
From DB_NM.SCHEMA_NM.TEST
Order By 1,2  
) P
Inner Join DB_NM.SCHEMA_NM.TEST SRC ON SRC.STARTDATE = P.DATE_TO_CONSIDER
Where P.PREV_DATE IS NOT NULL
Order BY 1,2

就像我说的,还有其他方法可以解决这个问题。希望这有帮助!
谢谢

相关问题