sql: How can I run a Hive query that returns 0 for the count when no record is present on the right side of a join?

Posted by yzuktlbb on 2021-05-29 in Hadoop


I have some time-series data, and I want to count the number of records per minute. This is the closest I have been able to get:

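```
select
  2016 as year,
  9 as month,
  dhm.day,
  dhm.hour,
  dhm.min,
  count(case isnotnull(id) when true then 1 else 0 end)
from crdb.dayhourmin dhm
left join tracelog
  on (cast(substr(tracestart, 20, 2) as int) = dhm.min
  and cast(substr(tracestart, 17, 2) as int) = dhm.hour
  and cast(substr(tracestart, 9, 2) as int) = dhm.day
  and substr(tracestart, 12, 4) = '2016'
  and substr(tracestart, 5, 3) = 'Sep')
group by
  dhm.day, dhm.hour, dhm.min
sort by 1,2,3,4,5
```

`tracestart` is a string field, and there is nothing I can do about that. The `dayhourmin` table contains one entry for every day, hour, and minute in a month: 24*60*31 = 44640 rows. However, when I run the query above, I get a lot of `1` values (about 5k), which tells me there are a lot of nulls on the right side; exactly one record per minute is highly improbable in this data. If I take out the null check and just get a count, I get about 39k rows. September should really be 43200 rows, since it has only 30 days, but I do not have to worry about that now; I can easily drop the last day in Excel.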
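How can I write a query like this so that every row in `crdb.dayhourmin` (every time slice) has a row in the output, with an accurate count of the matching rows on the right side of the join?

Is this even possible? A script that loops over every possible value and runs one query per value seems like a natural choice to me, and writing it would surely be faster than flailing at this SQL all day, but it would be a big problem unless there is some way to invoke each query from the `hive` command line and append the output to a file.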
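
A minimal sketch of one likely fix, not a verified answer. `count(expr)` counts every row where `expr` is non-NULL, and the `else 0` branch is non-NULL, so a time slice with no match on the right still counts as 1. Counting the right-side column directly should give 0 for those slices instead. The sketch assumes `id` is never NULL for a real `tracelog` row; the table alias `t` and the column aliases `day_of_month`, `hour_of_day`, `minute_of_hour`, and `records` are illustrative names, not part of the original schema.

```sql
-- Sketch only: assumes tracelog.id is non-NULL for every real record.
select
  2016 as year,
  9 as month,
  dhm.day  as day_of_month,
  dhm.hour as hour_of_day,
  dhm.min  as minute_of_hour,
  -- count(column) skips NULLs, so a time slice with no matching
  -- tracelog row contributes 0 instead of 1.
  count(t.id) as records
from crdb.dayhourmin dhm
left join tracelog t
  on  cast(substr(t.tracestart, 20, 2) as int) = dhm.min
  and cast(substr(t.tracestart, 17, 2) as int) = dhm.hour
  and cast(substr(t.tracestart, 9, 2) as int) = dhm.day
  and substr(t.tracestart, 12, 4) = '2016'
  and substr(t.tracestart, 5, 3) = 'Sep'
group by
  dhm.day, dhm.hour, dhm.min
-- order by gives a total ordering; sort by only orders rows within each reducer.
order by day_of_month, hour_of_day, minute_of_hour;
```

An equivalent form keeps the `case` expression but replaces `count` with `sum`, so the `else 0` branch adds nothing to the total. The sketch orders by named aliases rather than `1,2,3,4,5` because, depending on the Hive version and settings, positional numbers may be treated as constants rather than column positions.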

sql hadoop Hive aggregate-functions

Source: https://stackoverflow.com/questions/41175886/how-can-i-run-a-hive-query-that-will-return-0s-for-count-when-no-record-is-pres