Hive Parquet Snappy compression not working

Posted by 4nkexdtk on 2021-05-29 in Hadoop


I am creating a table with the table property TBLPROPERTIES('PARQUET.COMPRESSION'='SNAPPY') (because the files are in Parquet format), and I set a few parameters before creating the table, as shown below:

set hive.exec.dynamic.partition.mode=nonstrict;
set parquet.enable.dictionary=false;
set hive.plan.serialization.format=javaXML;
SET hive.exec.compress.output=true;
SET mapred.output.compression.type=BLOCK;
set avro.output.codec=snappy;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
add jar /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p1168.923/lib/sentry/lib/hive-metastore.jar;

The table is still not compressed. Could you tell me why the table is not being compressed?
Thanks in advance for your input.

hadoop Hive parquet hiveql snappy

Source: https://stackoverflow.com/questions/48395999/hive-parquet-snappy-compression-not-working

5 answers


tv6aics1 #1

I have seen this mistake made several times. Here is what needs to be done (this works only for Hive, not with Spark):
Old property:
TBLPROPERTIES('parquet.compression'='snappy')
Correct property:
TBLPROPERTIES('parquet.compress'='snappy')


ktca8awb #2

I recently created some tables stored as Parquet files and used the following commands:

set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
set hive.intermediate.compression.type=BLOCK;


wixjitnu #3

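Your Parquet table is probably compressed, but you are not seeing it directly. In Parquet files, compression is baked into the format. Instead of compressing the whole file, individual segments are compressed with the specified algorithm. A compressed Parquet file therefore looks the same from the outside as an uncompressed one. Parquet files normally carry no suffix of the kind ordinary compressed files have (e.g. .gz), because you cannot decompress them with the usual tools.
Having compression baked into the format is one of the many advantages of Parquet. It makes the files (Hadoop-)splittable regardless of the compression algorithm, and it allows fast access to specific parts of a file without decompressing the whole file. For a query engine processing a query on Parquet files, this means it often only needs to read the small, uncompressed file metadata (stored in the footer), see which segments are relevant to the query, and then decompress only those segments.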
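One indirect way to observe this from within Hive is to write the same data twice, once uncompressed and once with Snappy, and compare the sizes on disk. The following is a minimal sketch rather than a definitive procedure: the source table `src`, the table names `t_uncompressed` and `t_snappy`, and both warehouse paths are hypothetical, and the `dfs` command assumes the Hive CLI.

```sql
-- Hypothetical names throughout: src, t_uncompressed, t_snappy, and both paths.

-- Write one copy of the data without compression.
SET parquet.compression=UNCOMPRESSED;
CREATE TABLE t_uncompressed STORED AS PARQUET AS SELECT * FROM src;

-- Write a second copy of the same data with Snappy.
SET parquet.compression=SNAPPY;
CREATE TABLE t_snappy STORED AS PARQUET AS SELECT * FROM src;

-- Look up each table's Location, then compare the total sizes on disk.
DESCRIBE FORMATTED t_uncompressed;
DESCRIBE FORMATTED t_snappy;
dfs -du -s -h /user/hive/warehouse/t_uncompressed;
dfs -du -s -h /user/hive/warehouse/t_snappy;
```

If the second directory is noticeably smaller, the codec setting most likely took effect, even though the file names in both locations look alike.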


eh57zj3b #4

The solution is to use "TBLPROPERTIES ('parquet.compression'='SNAPPY')" in the DDL (case matters) instead of "TBLPROPERTIES ('PARQUET.COMPRESSION'='SNAPPY')". You can also enable compression with the following property in Hive.

set parquet.compression=SNAPPY
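As a rough sketch of the lowercase form described in this answer, the DDL might look like the following. The table name `example_parquet` and its columns are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table and column names, for illustration only.
-- The property key is written in lowercase, as this answer recommends.
CREATE TABLE example_parquet (
    id INT,
    name STRING
)
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY');
```

The property only affects data written after it is set. Files that already exist in the table location are not rewritten.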


xvw2m8pv #5

Set the parameters below, then perform the following steps:

SET parquet.compression=SNAPPY; 
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

CREATE TABLE Test_Parquet (
    customerID int, name string, ..etc
) STORED AS PARQUET Location '';

INSERT INTO Test_Parquet SELECT * FROM Test;

How do you tell a Parquet table with Snappy compression from one without it? You can run:

describe formatted tableName

Note: this will always show the compression as NO, because the compression format is not stored in the table's metadata. The best way is to run dfs -ls -R on the table location and check the files themselves for compression.

Note: currently, the default compression for Impala tables is Snappy.

If these steps do not resolve your issue, please post all the steps you are performing.

