Hive Pivot表-将行列转换示例

x33g5p2x  于2020-11-13 发布在 Hive  
字(3.1k)|赞(0)|评价(0)|浏览(4418)

许多关系数据库(如Oracle、Snowflake)支持PIVOT函数,您可以使用该函数将行转换为列。但是,Apache Hive还不支持Pivot函数。作为一种替代方法,您可以使用CASE和DECODE语句,根据您的需求将表行转换为列,或将列转换为行。在本文中,我们将用透视函数替代法来检验Hive表的转置方法的不同,并举例说明。
在数据仓库中,创建数据透视表是一项非常常见的任务。在加载到目标表之前,可能必须将行转换为列,或将列转换为行。Microsoft excel是一个流行的工具,它允许您根据自己的需求来透视列。Apache Hive还不支持Pivot或unpivot功能。在Hive中要实现同样的事情并不容易。你必须使用一种变通方法将行转换成列,反之亦然。
但是,您可以使用CASE或DECODE语句在Hive中转换行到列。

行转置到列的例子

如前所述,将行转置到列是加载到目标表之前的常见预处理需求。
在本例中,我们将使用下面的student表。

> select class,section,pass_fail,count from students;
+---------+----------+------------+--------+--+
|  class  | section  | pass_fail  | count  |
+---------+----------+------------+--------+--+
| CLASS2  | B        | FAIL       | 333    |
| CLASS2  | B        | PASS       | 543    |
| CLASS1  | A        | FAIL       | 123    |
| CLASS1  | A        | PASS       | 321    |
+---------+----------+------------+--------+--+

考虑以下示例,以将PASS和FAIL行转换为列。

SELECT class, 
       section, 
       Sum(CASE 
             WHEN pass_fail = 'PASS' THEN count 
             ELSE 0 
           END) AS PASS, 
       Sum(CASE 
             WHEN pass_fail = 'FAIL' THEN count 
             ELSE 0 
           END) AS FAIL 
FROM   students 
GROUP  BY class, 
          section; 

+---------+----------+-------+-------+--+
|  class  | section  | pass  | fail  |
+---------+----------+-------+-------+--+
| CLASS1  | A        | 321   | 123   |
| CLASS2  | B        | 543   | 333   |
+---------+----------+-------+-------+--+

Hive UNPIVOT表–将列转换为行示例

就像前面的示例一样,将列转置或转置到行也是数据仓库环境中非常常见的需求。
例如,考虑下面一个示例销售表。该表包含每月销售数量。

> select month_id,sale_qty from sales_by_month;
+-----------+-----------+--+
| month_id  | sale_qty  |
+-----------+-----------+--+
| 201601    | 897456    |
| 201603    | 267156    |
| 201602    | 232253    |
| 201604    | 265646    |
| 201606    | 265654    |
| 201606    | 123457    |
+-----------+-----------+--+

现在,让我们将月份值转置为列,将数量转置为行。

SELECT Sum(CASE 
             WHEN month_id = 201601 THEN sale_qty 
             ELSE NULL 
           END) AS m_201601, 
       Sum(CASE 
             WHEN month_id = 201602 THEN sale_qty 
             ELSE NULL 
           END) AS m_201602, 
       Sum(CASE 
             WHEN month_id = 201603 THEN sale_qty 
             ELSE NULL 
           END) AS m_201603, 
       Sum(CASE 
             WHEN month_id = 201604 THEN sale_qty 
             ELSE NULL 
           END) AS m_201604, 
       Sum(CASE 
             WHEN month_id = 201605 THEN sale_qty 
             ELSE NULL 
           END) AS m_201605, 
       Sum(CASE 
             WHEN month_id = 201606 THEN sale_qty 
             ELSE NULL 
           END) AS m_201606 
FROM   sales_by_month; 

+---------+---------+---------+---------+-------+---------+--+
|  m_01   |  m_02   |  m_03   |  m_04   | m_05  |  m_06   |
+---------+---------+---------+---------+-------+---------+--+
| 897456  | 232253  | 267156  | 265646  | NULL  | 389111  |
+---------+---------+---------+---------+-------+---------+--+

Apache Hive中的替代Oracle PIVOT 像Oracle和Snowflake这样的关系数据库提供了FOR子句PIVOT函数。您可以使用CASE语句作为替代方法。
下面是一个带有FOR子句的Oracle PIVOT函数示例。

SELECT * FROM order_stats
PIVOT(
    COUNT(order_id)
    FOR category_name
    IN ( 
        'CPU',
        'VC',
        'MB',
        'Storage'
    )
)
ORDER BY status;

+------------+------+-----+-----+----------+--+
|   status   | cpu  | vc  | mb  | storage  |
+------------+------+-----+-----+----------+--+
| cancelled  | 2    | 0   | 0   | 0        |
| pending    | 0    | 1   | 0   | 0        |
| shipped    | 1    | 1   | 1   | 1        |
+------------+------+-----+-----+----------+--+

以下是Apache Hive的等效示例。

select status,
count( case when category_name = 'CPU' then order_id else null end ) as CPU,
count( case when category_name = 'VC' then order_id else null end ) as VC,
count( case when category_name = 'MB' then order_id else null end ) as MB,
count( case when category_name = 'Storage' then order_id else null end ) as Storage
from order_stats
group by status
order by status
;

+------------+------+-----+-----+----------+--+
|   status   | cpu  | vc  | mb  | storage  |
+------------+------+-----+-----+----------+--+
| cancelled  | 2    | 0   | 0   | 0        |
| pending    | 0    | 1   | 0   | 0        |
| shipped    | 1    | 1   | 1   | 1        |
+------------+------+-----+-----+----------+--+

相关文章

微信公众号

最新文章

更多