Hive join understanding issue

Posted by pxq42qpu on 2021-05-29 in Hadoop


I created two tables in Hive, as shown below:

create table test1(id string);

create table test2(id string);

test1 contains the following values:
1
1
test2 contains the following values:
1
1
When I join these two tables, I get this output:
1
1
1
1
This is the query I used:

select a.id from test1 a,test2 b where a.id=b.id;

Please help. The output I expected is:
1
1
I am using the Cloudera distribution.

sql hadoop Hive hdfs apache-spark

Source: https://stackoverflow.com/questions/45812801/hive-join-understanding-issue

1 Answer


i7uq4tfw1#

It is better to use ANSI join syntax:

select a.id 
  from test1 a 
       inner join test2 b on a.id=b.id

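The expected output cannot be the result of this join, because for each a.id every matching row from b is selected. The first row of a matches two rows in b, and the second row of a also matches two rows in b, which gives four rows in total (2 x 2).
You can, for example, apply distinct to the second table before the join: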

select a.id 
  from test1 a 
       inner join (select distinct b.id from test2 b) b on a.id=b.id

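In this case, each row of table a has exactly one matching row in table b, so the join returns two rows.
See this course for a better understanding of joins: https://www.coursera.org/learn/analytics-mysql/lecture/kydcf/joins-with-many-to-many-relationships-and-duplicates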
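
To check the row counts yourself, here is a minimal sketch that rebuilds the two tables from the question and counts the rows each query returns. It assumes a Hive version that supports INSERT ... VALUES (0.14 or later). The DROP TABLE and INSERT statements are not part of the original question and are included only to make the script self-contained. The LEFT SEMI JOIN at the end is an alternative that the answer does not mention, shown here as one more way to get one output row per row of a.

```sql
-- Rebuild the sample data from the question (assumed setup, not from the post).
DROP TABLE IF EXISTS test1;
DROP TABLE IF EXISTS test2;
CREATE TABLE test1 (id STRING);
CREATE TABLE test2 (id STRING);
INSERT INTO TABLE test1 VALUES ('1'), ('1');
INSERT INTO TABLE test2 VALUES ('1'), ('1');

-- Plain inner join: each of the 2 rows in test1 matches both rows in test2.
-- 2 x 2 = 4 rows.
SELECT COUNT(*) AS plain_join_rows
  FROM test1 a
       INNER JOIN test2 b ON a.id = b.id;

-- Deduplicate test2 first: each row in test1 now matches exactly one row.
-- 2 x 1 = 2 rows.
SELECT COUNT(*) AS distinct_join_rows
  FROM test1 a
       INNER JOIN (SELECT DISTINCT b.id FROM test2 b) b ON a.id = b.id;

-- Alternative (not in the original answer): a semi join keeps each row of
-- test1 at most once when a match exists in test2, which also gives 2 rows.
SELECT a.id
  FROM test1 a
       LEFT SEMI JOIN test2 b ON a.id = b.id;
```

The semi join is worth considering when test2 is only used as a filter, because no columns from b can be selected in that form anyway.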
