从“锚定表”左外部联接看到重复项

fslejnso  于 2021-05-27  发布在  Hadoop
关注(0)|答案(2)|浏览(300)

我正在尝试在5个表之间进行完整的otuer连接。我尝试使用一个“锚定表”,然后执行一个左外连接。从每个表中选择所有不同的键字段,然后从5个表中的每个表中选择左外部联接。用Hive做这个。这是我的。。。

select
COALESCE(a.key,b.key,c.key,d.key,e.key,NULL) key,
a.field_1,
b.field_2,
c.field_3,
d.field_4,
e.field_5
from (
    select distinct key from table_1
    union
    select distinct key from table_2
    union
    select distinct key from table_3
    union
    select distinct key from table_4
    union
    select distinct key from table_5
) as keyvalues
    LEFT OUTER JOIN table_1 as a on a.key = keyvalues.key
    LEFT OUTER JOIN table_2 as b on b.key = keyvalues.key
    LEFT OUTER JOIN table_3 as c on c.key = keyvalues.key
    LEFT OUTER JOIN table_4 as d on d.key = keyvalues.key
    LEFT OUTER JOIN table_5 as e on e.key = keyvalues.key

上面的代码是通用的,我有更多的领域,但这是想法。我的结果显示的是重复的键值。这是因为键不一定是这些表中的主键。所以我的结果我希望没有重复。几乎如果我能在开始做那些左外连接之前对“keyvalues”变量进行区分。

i5desfxk

i5desfxk1#

如果您喜欢联接,则行号为的子查询将执行以下操作:

with 

table1 as (
select key, col1...coln
  from
      (select key, col1...coln,
              row_number() over(partition by key order by... ) rn
       from table_1)s
 where s.rn=1
),

table2 as (
select key, col1...coln
  from
      (select key, col1...coln,
              row_number() over(partition by key order by... ) rn
       from table_2)s
 where s.rn=1
),

table3 ... and so on ...

keyvalues as( --can use UNION ALL, because subqueries are already contain unique keys
select key from table1
union all 
select key from table2
union all
... and so on
)

select
COALESCE(a.key,b.key,c.key,d.key,e.key) key,
a.col1,
b.coln,
...
from keyvalues --join with de-duped sub-queries
    LEFT OUTER JOIN table1 as a on a.key = keyvalues.key
    LEFT OUTER JOIN table2 as b on b.key = keyvalues.key
    ...
j8yoct9x

j8yoct9x2#

我会建议一种非常不同的方法,我将用3个表来说明:

select key,
       max(field1) as field1, max(field2) as field2, max(field3) as field3
from ((select key, field1, null as field2, null as field3,
              row_number() over (partition by key order by key) as seqnum
       from table1
      ) union all
      (select key, null as field1, field2, null as field3,
              row_number() over (partition by key order by key) as seqnum
       from table2
      ) union all
      (select key, null as field1, null as field2, field3,
              row_number() over (partition by key order by key) as seqnum
       from table3
      )
     ) t
group by key, seqnum;

这将为每个键的每个字段创建一个“垂直列表”。

相关问题