How to retrieve data from a specific bucket in Hive

rmbxnbpk posted on 2021-05-30 in Hadoop

I created a table in Hive:

create table HiveMB 
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
   clustered by (Department) into 3 buckets 
   stored as orc TBLPROPERTIES ('transactional'='true') ;

My data file looks like this:

1,Anne,Admin,50000,A
2,Gokul,Admin,50000,B
3,Janet,Sales,60000,A
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C

The data was split into 3 buckets by the Department column.
When I checked the warehouse directory, I found the 3 buckets:

Found 3 items
-rwxr-xr-x   3 aibladmin hadoop     252330 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000
-rwxr-xr-x   3 aibladmin hadoop     100421 2014-11-28 14:45 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00001
-rwxr-xr-x   3 aibladmin hadoop     313047 2014-11-28 14:46 /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00002
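
For context, the bucket files sit under a delta_0000012_0000012 directory because the table was declared transactional: each transaction writes a new delta directory, and a compaction later folds the deltas into a base directory. A sketch of requesting that compaction manually (assuming the ACID compactor is configured and running):

-- Fold the delta directories into a single base directory;
-- requires the Hive ACID compactor to be enabled:
ALTER TABLE HiveMB COMPACT 'major';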

How can I retrieve the contents of one such bucket?
When I ran hadoop fs -cat on a bucket file, the output was not in a human-readable format. It showed:

`J�lj�(��rwNj��[��Y���gR�� ... (binary ORC data, truncated; not human-readable)

How can I see the data stored in each bucket?
My source file is in CSV format, not ORC, so as a workaround I tried this, but I still could not view the data in the buckets; it is not in a human-readable format.
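
For what it's worth, Hive can also read a single bucket back through a query instead of going to HDFS directly, using TABLESAMPLE; a minimal sketch against the table above (buckets are numbered from 1):

-- Scan only bucket 1 of the table's 3 buckets:
SELECT * FROM HiveMB TABLESAMPLE(BUCKET 1 OUT OF 3 ON Department);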

unftdfkk1#

Your table:

create table HiveMB 
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
   clustered by (Department) into 3 buckets 
   stored as orc TBLPROPERTIES ('transactional'='true') ;

You chose the ORC format, which means Hive compresses the actual data and stores it in that compressed binary form; that is why hadoop fs -cat prints unreadable bytes.
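
Because only the ORC reader can decode those bytes, the simplest way to see the rows is to query them through Hive rather than cat the raw files; a minimal sketch:

-- Hive's ORC reader decompresses and decodes the rows transparently:
SELECT * FROM HiveMB WHERE Department = 'A';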

l7wslrjt2#

One workaround is to build the bucketed table as plain text instead of ORC, so that the bucket files remain human-readable:

-- Stage the raw CSV in a plain text table first:
create table HiveMB1 
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
  row format delimited
  fields terminated by ',';

load data local inpath '/home/user17/Data/hive.txt'
overwrite into table HiveMB1;

-- Create the bucketed table as delimited text (no ORC),
-- so each bucket file stays readable:
create table HiveMB2
  (EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
  clustered by (Department) into 3 buckets
  row format delimited
  fields terminated by ',';

-- On older Hive versions, enforce bucketing before the insert:
-- set hive.enforce.bucketing = true;
insert overwrite table HiveMB2 select * from HiveMB1;

user17@BG17:~$ hadoop dfs -ls /user/hive/warehouse/hivemb2
Found 3 items
-rw-r--r--   1 user17 supergroup         22 2014-12-01 15:52 /user/hive/warehouse/hivemb2/000000_0
-rw-r--r--   1 user17 supergroup         44 2014-12-01 15:53 /user/hive/warehouse/hivemb2/000001_0
-rw-r--r--   1 user17 supergroup         43 2014-12-01 15:53 /user/hive/warehouse/hivemb2/000002_0

user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000000_0
2,Gokul,Admin,50000,B

user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000001_0
4,Hari,Admin,50000,C
5,Sanker,Admin,50000,C

user17@BG17:~$ hadoop dfs -cat /user/hive/warehouse/hivemb2/000002_0
1,Anne,Admin,50000,A
3,Janet,Sales,60000,A
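
Why the rows land where they do: Hive assigns each row to bucket hash(bucketing column) mod (number of buckets). A sketch of checking that assignment with the built-in hash() and pmod() functions (the exact hash Hive's bucketing uses can differ by version, so treat this as illustrative):

-- Map each distinct Department value to its expected bucket id:
SELECT DISTINCT Department,
       pmod(hash(Department), 3) AS bucket_id
FROM   HiveMB1;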
lb3vh1jj3#

You can inspect the ORC format of a bucket file with the following command:

hive --orcfiledump [path-to-the-bucket]
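
For example, run it against one of the bucket files from the question; on Hive 1.1.0 and later, adding -d dumps the row data as well, not just the metadata:

hive --orcfiledump /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000

# Hive 1.1.0+: also dump the rows themselves
hive --orcfiledump -d /user/hive/warehouse/hivemb/delta_0000012_0000012/bucket_00000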
pcww981p4#

I am uploading a screenshot of the ORC file that was produced by the following Hive statements:

create table stackOverFlow 
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
row format delimited
fields terminated by ',';

load data local inpath '/home/ravi/stack_file.txt'
overwrite into table stackOverFlow;

create table stackOverFlow6
(EmployeeID Int,FirstName String,Designation String,Salary Int,Department String)
clustered by (Department) into 3 buckets
row format delimited
fields terminated by ','
stored as orc tblproperties ("orc.compress"="ZLIB");

insert overwrite table stackOverFlow6 select * from stackOverFlow;

The ORC result file generated by the above Hive queries:
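
A quick way to confirm that stackOverFlow6 really came out bucketed and ORC-backed is DESCRIBE FORMATTED; a minimal sketch:

-- Look for "Num Buckets: 3", "Bucket Columns: [department]",
-- and the OrcSerde/OrcInputFormat entries in the output:
DESCRIBE FORMATTED stackOverFlow6;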
