sql where not exists查询不返回结果

cpjpxq1n  于 2021-06-26  发布在  Impala
关注(0)|答案(2)|浏览(358)

我在构造一个查询并让它在impala上运行时遇到问题。我创建了以下工作查询以联接两个表:

SELECT *
FROM illuminavariant as vcf, ensembl_genes as ens 
WHERE vcf.filter = "PASS" 
AND vcf.qual > 100 
AND vcf.chromosome = ens.chromosome
AND vcf.position BETWEEN ens.start AND ens.stop

现在我正在尝试编写一个查询,查找vcf.filter=“pass”和vcf.qual>100的所有变体,但没有按染色体和位置匹配的变体。
我试过这个:

SELECT * 
FROM p7dev.illumina_test, p7dev.ensembl_test
WHERE NOT EXISTS(
  SELECT * 
  FROM p7dev.illumina_test as vcf, p7dev.ensembl_test as ens
  WHERE vcf.chromosome = ens.chromosome  
  AND vcf.position BETWEEN ens.start AND ens.stop  
 )

但这并没有返回任何结果。我认为with子句可能会起作用,但如果有人能帮我理解它的工作原理,我将不胜感激。非常感谢!

egmofgnx

egmofgnx1#

试试这个。。。

SELECT * 
FROM p7dev.illumina_test vcf
WHERE NOT EXISTS( SELECT 1
                  FROM p7dev.ensembl_test as ens
                  WHERE vcf.chromosome = ens.chromosome  
                  AND vcf.position BETWEEN ens.start AND ens.stop
                 )
AND vcf.filter = 'PASS' 
AND vcf.qual > 100
ivqmmu1c

ivqmmu1c2#

因为您正在寻找与任何集合都不相关的变体,所以奇怪的是,您会形成变体和集合的交叉连接,从中过滤出行。不过,如果你真的想这样做的话,那么就应该这样做:

SELECT *
FROM illuminavariant as vcf, ensembl_genes as ens 
WHERE vcf.filter = "PASS" 
AND vcf.qual > 100 
AND (
    vcf.chromosome != ens.chromosome
    OR vcf.position < ens.start
    OR vcf.position > ens.stop
  )

这就否定了将变量行与集成行相关联的条件。
不过,我怀疑你真正想要的是:

SELECT vcf.*
FROM
  illuminavariant as vcf
  LEFT JOIN ensembl_genes as ens 
    ON vcf.chromosome = ens.chromosome
    AND vcf.position BETWEEN ens.start AND ens.stop
WHERE
  vcf.filter = "PASS" 
  AND vcf.qual > 100
  AND ens.chromosome IS NULL

执行与第一个查询相同的联接,但作为左联接。然后,实际表示匹配的行被 ens.chromosome IS NULL 条件。它只返回variant表的列,因为关键是要查找在集成表中没有相应行的变量。

相关问题