SQL: sum of unique values from a reused column

Posted by qij5mzcb on 2021-06-27 in Hive


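I need the sum of shoes and hats from a table that contains user, filename, and payload. Duplicate records should be ignored, where two records are duplicates if they have the same user, the same payload, and the same portion of the filename after the "/". In the sample table below, record #3 is a duplicate of record #2 under this rule. The desired result is the sum of shoes and the sum of hats, as shown in the expected output.
Sample data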

+---+------+----------+-----------+
| # | User | Filename |  Payload  |
+---+------+----------+-----------+
| 1 | A    | a/123    | Shoes = 3 |
| 2 | A    | a/123    | Hats = 2  |
| 3 | A    | b/123    | Hats = 2  |
| 4 | B    | a/123    | Shoes = 1 |
| 5 | B    | a/123    | Hats = 1  |
+---+------+----------+-----------+

Expected output

+-------+------+
| Shoes | Hats |
+-------+------+
|     4 |    3 |
+-------+------+

sql Hive hiveql

Source: https://stackoverflow.com/questions/53026580/sql-sum-of-unique-values-from-reused-column

1 Answer


xxe27gdn1#

Hive happens to support substring_index(), so you can do:

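-- `user` is backtick-quoted because USER is a reserved keyword in recent Hive versions.
-- The extracted count is cast to int so both CASE branches have the same type.
select sum(case when payload like 'Shoes%'
                then cast(substring_index(payload, ' = ', -1) as int)
                else 0
           end) as num_shoes,
       sum(case when payload like 'Hats%'
                then cast(substring_index(payload, ' = ', -1) as int)
                else 0
           end) as num_hats
from (select t.*,
             -- one partition per (user, payload, part of filename after '/');
             -- rows numbered 2 and up within a partition are duplicates
             row_number() over (partition by `user`, payload, substring_index(filename, '/', -1)
                                order by `user`
                               ) as seqnum
      from t
     ) t
where seqnum = 1;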
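
To check the deduplication logic against the sample data without a Hive cluster, the query can be replayed in SQLite from Python. This is only a sketch under stated assumptions: SQLite stands in for Hive, it needs SQLite 3.25 or newer for window functions, and `substring_index` is registered here as a hand-written Python function that approximates the Hive built-in. The names `SAMPLE_ROWS` and `shoe_hat_totals` are hypothetical.

```python
import sqlite3

# Rows from the question's sample table: (user, filename, payload).
SAMPLE_ROWS = [
    ("A", "a/123", "Shoes = 3"),
    ("A", "a/123", "Hats = 2"),
    ("A", "b/123", "Hats = 2"),   # duplicate of the previous row under the rule
    ("B", "a/123", "Shoes = 1"),
    ("B", "a/123", "Hats = 1"),
]


def substring_index(s, delim, count):
    """Approximation of Hive's substring_index(str, delim, count)."""
    if s is None or delim is None or count is None:
        return None
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # text before the count-th delimiter
    return delim.join(parts[count:])       # text after the count-th delimiter from the right


QUERY = """
select sum(case when payload like 'Shoes%'
                then cast(substring_index(payload, ' = ', -1) as integer)
                else 0 end) as num_shoes,
       sum(case when payload like 'Hats%'
                then cast(substring_index(payload, ' = ', -1) as integer)
                else 0 end) as num_hats
from (select t.*,
             row_number() over (partition by user, payload,
                                             substring_index(filename, '/', -1)
                                order by user) as seqnum
      from t) d
where seqnum = 1
"""


def shoe_hat_totals(rows):
    """Load rows into an in-memory table t and return (num_shoes, num_hats)."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("substring_index", 3, substring_index)
    conn.execute("create table t (user text, filename text, payload text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    print(shoe_hat_totals(SAMPLE_ROWS))  # (4, 3)
```

Rows 2 and 3 fall into the same partition (user A, payload "Hats = 2", file part "123"), so only one of them survives the `seqnum = 1` filter, which gives the expected 4 shoes and 3 hats.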

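I would strongly recommend changing your data model so that the payload is not stored as a string. Numbers should be stored as numbers. Names should be stored as names. They should not be combined into a single string if you can avoid it.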
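
One possible shape for such a model is sketched below, again using SQLite from Python as a stand-in for Hive. The table `item_counts`, its columns, and the helpers `parse_payload` and `totals_by_item` are all hypothetical names chosen for illustration. The item name and the quantity become separate typed columns, so the aggregation no longer needs any string parsing.

```python
import sqlite3

# Rows from the question's sample table: (user, filename, payload).
SAMPLE_ROWS = [
    ("A", "a/123", "Shoes = 3"),
    ("A", "a/123", "Hats = 2"),
    ("A", "b/123", "Hats = 2"),
    ("B", "a/123", "Shoes = 1"),
    ("B", "a/123", "Hats = 1"),
]


def parse_payload(payload):
    """Split a payload such as 'Shoes = 3' into ('Shoes', 3)."""
    item, _, quantity = payload.partition("=")
    return item.strip(), int(quantity)


def totals_by_item(rows):
    """Store rows in a normalized table and return {item: total quantity}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table item_counts "
        "(user text, file_key text, item text, quantity integer)"
    )
    for user, filename, payload in rows:
        item, quantity = parse_payload(payload)
        file_key = filename.rsplit("/", 1)[-1]  # part of the filename after '/'
        conn.execute(
            "insert into item_counts values (?, ?, ?, ?)",
            (user, file_key, item, quantity),
        )
    # Same user, same file part, and same item and quantity count as one record.
    query = """
        select item, sum(quantity)
        from (select distinct user, file_key, item, quantity from item_counts)
        group by item
        order by item
    """
    totals = dict(conn.execute(query).fetchall())
    conn.close()
    return totals


if __name__ == "__main__":
    print(totals_by_item(SAMPLE_ROWS))  # {'Hats': 3, 'Shoes': 4}
```

With typed columns the duplicate rule reduces to a plain `select distinct`, and adding a new item type does not require another `case` branch.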

Answered 2021-06-27
