Error when bucketing with CLUSTERED BY in Hive

a0x5cqrl  posted on 2021-05-29  in  Hadoop
DROP TABLE filtered_online_march_customers;
-- creating bucketed table with customer id
CREATE TABLE IF NOT EXISTS filtered_online_march_customers(
  customer_id           string,
  order_id              string
)
CLUSTERED BY(customer_id) INTO 32 BUCKETS;

--populating the table
set hive.enforce.bucketing = true;
FROM filtered_march_online_transactions
INSERT OVERWRITE TABLE filtered_online_march_customers
SELECT
  *;

I created this table, clustered by customer id. However, when I actually try to sample from a bucket, it does not work.

CREATE TABLE randomized_filtered_march_customers
AS
SELECT
  *
FROM
 filtered_online_march_customers
TABLESAMPLE(BUCKET 1 OUT OF 32 ON customer_id);

I get this error:

cannot find dir=maprfs:///hive/v0k0020.db/filtered_online_march_customers/000000_0 in pathToPartitionInfo: [maprfs:/hive/v0k0020.db/filtered_online_march_customers/000000_0]
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursive(HiveFileFormatUtils.java:344)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursive(HiveFileFormatUtils.java:306)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:108)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:455)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1098)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1090)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:176)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:931)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:856)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.io.IOException(cannot find dir=maprfs:///hive/v0k0020.db/filtered_online_march_customers/000000_0 in pathToPartitionInfo: [maprfs:/hive/v0k0020.db/filtered_online_march_customers/000000_0])'

If I change the query to

CREATE TABLE randomized_filtered_march_customers
AS
SELECT
  *
FROM
 filtered_online_march_customers
TABLESAMPLE(BUCKET 1 OUT OF 32 ON rand());

it works fine. Any idea how to fix this?
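
One thing that may be worth checking, offered as a sketch and not a confirmed fix: when the sampling column matches the CLUSTERED BY column, Hive can prune the input down to individual bucket files, which is consistent with the failing path ending in 000000_0, whereas sampling ON rand() scans the whole table directory. The trace also goes through CombineHiveInputFormat. The statements below assume, without having been verified on this MapR cluster, that switching the session to the non-combining HiveInputFormat sidesteps the failing path lookup. `hive.input.format` and the class name are standard Hive; whether they resolve this particular error is an assumption.

```sql
-- Inspect the table metadata; the output should list
-- "Num Buckets: 32" and "Bucket Columns: [customer_id]".
DESCRIBE FORMATTED filtered_online_march_customers;

-- Assumption: bypass CombineHiveInputFormat for this session only.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Retry the original bucket sample unchanged.
CREATE TABLE randomized_filtered_march_customers
AS
SELECT
  *
FROM
  filtered_online_march_customers
TABLESAMPLE(BUCKET 1 OUT OF 32 ON customer_id);
```

If the retry succeeds, the problem is likely in how the combined input format resolves the bucket file path and not in the bucketing itself; if it fails the same way, the metadata shown by DESCRIBE FORMATTED is the next thing to compare against the files actually present in the table directory.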
