Flink ParquetAvroWriters gives a wrong cast (ClassCastException)

Posted by z8dt9xmd on 2021-06-24 in Flink

I wrote some sample code that reads GenericRecords from Kafka and writes the stream to Parquet format:

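    import java.io.File;
    import java.util.Properties;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    Properties config = new Properties();
    config.setProperty("bootstrap.servers", "localhost:9092");
    config.setProperty("group.id", "1");
    config.setProperty("zookeeper.connect", "localhost:2181"); // not used by the 0.10 consumer
    String schemaRegistryUrl = "http://127.0.0.1:8081";

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Bulk formats such as Parquet only finalize part files on checkpoints,
    // so checkpointing must be enabled for finished files to appear.
    env.enableCheckpointing(60_000);

    // Local schema file, used below as the Parquet writer schema
    File file = new File(EventProcessor.class.getClassLoader().getResource("event.avsc").getFile());
    Schema schema = new Schema.Parser().parse(file);

    // KafkaGenericAvroDeserializationSchema is the poster's own class (not shown)
    DataStreamSource<GenericRecord> input = env
            .addSource(
                    new FlinkKafkaConsumer010<GenericRecord>("event_new",
                            new KafkaGenericAvroDeserializationSchema(schemaRegistryUrl),
                            config).setStartFromEarliest());

    Path path = new Path("/tmp");

    final StreamingFileSink<GenericRecord> sink = StreamingFileSink
            .forBulkFormat(path, ParquetAvroWriters.forGenericRecord(schema))
            .build();

    input.addSink(sink);
    env.execute("kafka-to-parquet");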

When I run this code, I get the following error:

Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator

Caused by: java.lang.ClassCastException: org.apache.avro.util.Utf8 cannot be cast to org.apache.avro.generic.IndexedRecord
    at org.apache.avro.generic.GenericData.getField(GenericData.java:697)
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:188)

I don't understand what is going wrong. Please help me understand and fix this problem.

Answer #1 (ccgok5k5):

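The most likely cause is that event.avsc does not match the records stored in Kafka. The Parquet writer is finding a string (org.apache.avro.util.Utf8) in a field where the schema requires a record.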
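A minimal, Avro-only sketch of that kind of mismatch follows. Both schemas are hypothetical stand-ins (a field payload that the local file declares as a nested record, while the producer wrote it as a plain string), and the class name is illustrative. GenericData.validate checks whether a record fits a given writer schema, which the Parquet writer does not do before casting.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;

public class SchemaMismatchCheck {

    // Hypothetical stand-in for event.avsc: "payload" is declared as a nested record.
    static Schema localSchema() {
        return new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"payload\",\"type\":{\"type\":\"record\",\"name\":\"Payload\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}}]}");
    }

    // Hypothetical schema the producer actually wrote with: "payload" is a plain string.
    static Schema producerSchema() {
        return new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"payload\",\"type\":\"string\"}]}");
    }

    // A record shaped like what an Avro deserializer emits: strings arrive as Utf8.
    static GenericRecord sampleRecord() {
        GenericRecord record = new GenericData.Record(producerSchema());
        record.put("payload", new Utf8("abc"));
        return record;
    }

    // True if the record's data fits writerSchema; returns false instead of throwing.
    static boolean conforms(Schema writerSchema, GenericRecord record) {
        return GenericData.get().validate(writerSchema, record);
    }

    public static void main(String[] args) {
        GenericRecord fromKafka = sampleRecord();
        System.out.println("matches producer schema: " + conforms(producerSchema(), fromKafka));
        System.out.println("matches local event.avsc: " + conforms(localSchema(), fromKafka));
    }
}
```

Here the record passes against the producer schema and fails against the local one. If the real data behaves the same way, one option (assuming the registry holds the schema the producer used) is to pass that schema to ParquetAvroWriters.forGenericRecord instead of a diverging local file.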
If you add the schema and a sample record from Kafka (for example, printed with the console consumer), I can help further.
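With both schemas at hand (event.avsc and the one the records were written with), comparing top-level field types is a quick way to find the field that triggers the cast. This is a hypothetical helper, not part of Flink or Avro. It only compares top-level field types, so nested records and union members (for example nullable fields, which show up as "union") would need a recursive comparison.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;

public class SchemaFieldDiff {

    // Lists top-level fields of `expected` that are missing from `actual`
    // or have a different Avro type there.
    static List<String> diff(Schema expected, Schema actual) {
        List<String> problems = new ArrayList<>();
        for (Schema.Field f : expected.getFields()) {
            Schema.Field other = actual.getField(f.name());
            if (other == null) {
                problems.add(f.name() + ": missing in actual schema");
            } else if (f.schema().getType() != other.schema().getType()) {
                problems.add(f.name() + ": expected " + f.schema().getType().getName()
                        + " but found " + other.schema().getType().getName());
            }
        }
        return problems;
    }

    // Usage: java SchemaFieldDiff event.avsc producer.avsc (paths are illustrative)
    public static void main(String[] args) throws IOException {
        // Separate parsers, because one Parser refuses to define the same record name twice
        Schema expected = new Schema.Parser().parse(new File(args[0]));
        Schema actual = new Schema.Parser().parse(new File(args[1]));
        for (String p : diff(expected, actual)) {
            System.out.println(p);
        }
    }
}
```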
