Hive连接具有相同列名的数据

sg2wtvxw  于 2021-04-03  发布在  Hive
关注(0)|答案(1)|浏览(687)

我有两张日线表,第一张是这样的。

| yyyy_mm_dd | x_id | feature     | impl_status   |
|------------|------|-------------|---------------|
| 2020-08-18 | 1    | Basic       | first_contact |
| 2020-08-18 | 1    | Last Minute | first_contact |
| 2020-08-18 | 1    | Geo         | first_contact |
| 2020-08-18 | 2    | Basic       | implemented   |
| 2020-08-18 | 2    | Last Minute | first_contact |
| 2020-08-18 | 2    | Geo         | no_contact    |
| 2020-08-18 | 3    | Basic       | no_contact    |
| 2020-08-18 | 3    | Last Minute | no_contact    |
| 2020-08-18 | 3    | Geo         | implemented   |

而第二种则是这样的:

| yyyy_mm_dd | x_id | payment |
|------------|------|---------|
| 2020-08-18 | 1    | 0       |
| 2020-08-18 | 2    | 0       |
| 2020-08-18 | 3    | 1       |
| 2020-08-19 | 1    | 0       |
| 2020-08-19 | 2    | 0       |
| 2020-08-19 | 3    | 1       |

我想建立一个查询,使 "payment "成为第一个表中的 "feature",由于 "payment "是一个布尔值(1/0),所以没有 "first_contact "状态。

select
    yyyy_mm_dd,
    t1.x_id
    t1.impl_status
from
    schema.table1 t1
left join(
    select
        yyyy_mm_dd,
        x_id,
        'payment' as feature,
        if(payment=1, 'implemented', 'no_contact') as impl_status
    from
         schema.table2
 ) t2 on t2.yyyy_mm_dd = t1.yyyy_mm_dd and t2.x_id = t1.x_id

然而,这样做,我将需要选择t1.impl_statust2.impl_status,因为不明确。
考虑到这一点,预期的输出将是这样的。

| yyyy_mm_dd | x_id | feature     | impl_status   |
|------------|------|-------------|---------------|
| 2020-08-18 | 1    | Basic       | first_contact |
| 2020-08-18 | 1    | Last Minute | first_contact |
| 2020-08-18 | 1    | Geo         | first_contact |
| 2020-08-18 | 1    | Payment     | no_contact    |
| 2020-08-18 | 2    | Basic       | implemented   |
| 2020-08-18 | 2    | Last Minute | first_contact |
| 2020-08-18 | 2    | Geo         | no_contact    |
| 2020-08-18 | 2    | Payment     | no_contact    |
| 2020-08-18 | 3    | Basic       | no_contact    |
| 2020-08-18 | 3    | Last Minute | no_contact    |
| 2020-08-18 | 3    | Geo         | implemented   |
| 2020-08-18 | 3    | Payment     | implemented   |
| 2020-08-19 ...
 ...
k0pti3hp

k0pti3hp1#

你可以使用 "union all"。

select yyyy_mm_dd, x_id, feature, impl_status from table1 t1
union all
select yyyy_mm_dd, x_id, 'Payment', case when payment = 0 then 'no_contact' else 'implemented' end from table2

相关问题