用hql计算两个字段之差的范围

flmtquvp  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(297)

我需要你帮忙找张table。我在配置单元中使用hql查询来加载表。有人想把table装起来吗?
表用户

START_TIME_DATE |    END_TIME_DATE |  USER   | START_DAY_ID  | END_DAY_ID  
    (String)             (String)    (Bigint)     (Int)           (Int)
210241              231236              1         01092019      01092019
234736              235251              2         01092019      01092019
223408              021345              3         01092019      02092019

开始\u时间\u日期、结束\u时间\u日期字段表示用户在该位置的时间。这样做的目的是在不同的行中指出用户的每一个小时,只有“小时”字段中的前两个数字。
表userhour

DATE               |    HOUR    |    ID    
 (Bigint)                (String)    (Bigint)   
 01092019                   21            1
 01092019                   22            1
 01092019                   23            1
 01092019                   23            2            
 01092019                   22            3          
 01092019                   23            3            
 02092019                   00            3
 02092019                   01            3
 02092019                   02            3

目前我的查询是这个,但它不工作。我在尝试“联合所有人”

insert overwrite table USERHOUR
(select [start_time_date] ,[end_time_date]
    from user
    union all
    select [start_time_date]+1,[end_time_date]
    where [start_time_date]+1<=[end_time_date]
    )
as hour) --generate a range between start_time_date and end_time_date and before cast to Hours,
end_day_id a date,
user as id
from table USER;
s1ag04yj

s1ag04yj1#

为此,我计算了以小时为单位的差异,使用posexplode(space(hours))生成行,计算开始时间戳+(explode的位置)*3600,并从结果时间戳中提取小时和日期。
使用您的示例查看此演示:

with mydata as(--this is your data
select stack(3,
'210241', '231236', 1, '01092019', '01092019',
'234736', '235251', 2, '01092019', '01092019',
'223408', '021345', 3, '01092019', '02092019'
) as (START_TIME_DATE,END_TIME_DATE,USER,START_DAY_ID,END_DAY_ID))

select --extract date, hour from timestamp calculated
       --this can be done in previous step
       --this subquery is to make code cleaner
       date_format(dtm, 'ddMMyyyy') as DATE, 
       date_format(dtm, 'HH')       as HOUR, 
       user                         as ID
from
(
select user, 
       start, h.i, hours, --these columns are for debugging
       from_unixtime(start+h.i*3600) dtm --add hour (in seconds) to the start unix timestamp
                                         --and convert to timestamp
from
(
select user,
       --start timestamp (unix timestamp in seconds) 
       unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH') as start, 
       floor((unix_timestamp(concat(END_DAY_ID, ' ', substr(END_TIME_DATE,1,2)),'ddMMyyyy HH')-
              unix_timestamp(concat(START_DAY_ID, ' ', substr(START_TIME_DATE,1,2)),'ddMMyyyy HH')
             )/ --diff in seconds
        3600) as hours --diff in hours
  from mydata
)s
lateral view posexplode(split(space(cast(s.hours as int)),' ')) h as i,x --this will generate rows
)s
;

结果:

OK
01092019        21      1
01092019        22      1
01092019        23      1
01092019        23      2
01092019        22      3
01092019        23      3
02092019        00      3
02092019        01      3
02092019        02      3
Time taken: 3.207 seconds, Fetched: 9 row(s)

相关问题