How to store Hive query results in a file in JSON format?

Asked by bqujaahr on 2021-05-27 in Hadoop

I want to store the results of a Hive query in a file in JSON format. With the Brickhouse JAR I can get the query output as JSON, but I cannot store it in a file or a table. The query I am trying is given below; when the INSERT OVERWRITE query runs, it throws an error. How can I resolve this error? Is there any other way to store query results in JSON format from a query?
Query:

add jar hdfs:///mydir/brickhouse-0.7.1.jar;

INSERT OVERWRITE DIRECTORY '/mydir/textfile1'
STORED AS TEXTFILE
SELECT to_json( named_struct( "id",id,
            "name",name))
   FROM link_tbl;

Error:

INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: INSERT OVERWRITE DIRECTORY '/mydir/text...pl(Stage-1)
INFO :

INFO : Status: Running (Executing on YARN cluster with App id application_1571318954298_0001)

INFO : Map 1: -/-
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1571318954298_0001_1_00, diagnostics=[Vertex vertex_1571318954298_0001_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4536)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4300(VertexImpl.java:202)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3352)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3301)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3282)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1862)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:201)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1978)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1964)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://sandbox.hortonworks.com:8020/tmp/hive/hive/2eaf13cf-1f98-4a2d-8f76-4e9c839f355b/hive_2019-10-17_13-33-05_763_197979924455130156-2/hive/_tez_scratch_dir/d9d1df72-f68c-4c1f-b642-85a46f32a79f/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 19963874, Size: 113
Serialization trace:
_mainHash (org.codehaus.jackson.sym.BytesToNameCanonicalizer)
_rootByteSymbols (org.codehaus.jackson.JsonFactory)
jsonFactory (brickhouse.udf.json.ToJsonUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:472)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:311)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IndexOutOfBoundsException: Index: 19963874, Size: 113
Serialization trace:
_mainHash (org.codehaus.jackson.sym.BytesToNameCanonicalizer)
_rootByteSymbols (org.codehaus.jackson.JsonFactory)
jsonFactory (brickhouse.udf.json.ToJsonUDF)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:745)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1173)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:1062)
at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:1076)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:432)
... 32 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 19963874, Size: 113
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:820)
at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:743)
at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
... 65 more
]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

Answer 1 (yvfmudvl):

The solution is to create an external table on top of the target directory and let a JSON SerDe do the formatting.
Create the table:

CREATE EXTERNAL TABLE mydirectory_tbl(
  id   string,
  name string
)
ROW FORMAT SERDE
  'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/mydir' -- this is an HDFS/S3 directory location
;
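
Depending on your Hive distribution, the openx JsonSerDe may not be on the classpath by default. If the CREATE TABLE fails with a ClassNotFoundException, register the SerDe jar first, just as the question does for Brickhouse. The jar path and version below are placeholders for illustration, not from the original answer:

add jar hdfs:///mydir/json-serde-1.3.8-jar-with-dependencies.jar;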

Insert the data:

INSERT OVERWRITE TABLE mydirectory_tbl
SELECT id, name
  FROM link_tbl;
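
With the JsonSerDe on the table, each row written under /mydir is one JSON document per line, so the files can be consumed directly as newline-delimited JSON. A hypothetical sample of a result file (the values are invented for illustration):

{"id":"1","name":"first link"}
{"id":"2","name":"second link"}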

Note that you cannot specify a file name in place of a table or directory location; only a directory is allowed. If you need a single file, you can either concatenate the output files afterwards (usually better for performance) or force a single reducer, for example by adding ORDER BY id to the query, as sketched below.
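A minimal sketch of the single-reducer variant (the trade-off is that a global ORDER BY funnels all rows through one reducer, which can be slow on large tables):

-- ORDER BY imposes a total order, so Hive uses a single reducer
-- and the table directory ends up with a single output file
INSERT OVERWRITE TABLE mydirectory_tbl
SELECT id, name
  FROM link_tbl
 ORDER BY id;

Alternatively, hdfs dfs -getmerge /mydir /local/path/result.json concatenates all files in the directory into one local file after the fact.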
