XML processing

Posted by a1o7rhls on 2021-06-21 in Flink


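I am new to Apache Flink and to distributed processing. I have gone through the Flink quick setup guide and understand the basics of MapFunctions, but I could not find a concrete example of XML processing. I have read about Hadoop's XmlInputFormat but could not work out how to use it.
My requirement is this: I have huge (100 MB) XML files in the following format: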

<Class>
    <student>.....</student>
    <student>.....</student>
    .
    .
    .
    <student>.....</student>
</Class>

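The Flink job would read the file from HDFS and start processing it (basically iterating over all the student elements).
I would like to know, in layman's terms, how to process the XML and create a list of student objects.
A simple explanation in layman's terms would be much appreciated.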

Java apache-flink flink-streaming xml-parsing

Source: https://stackoverflow.com/questions/40215387/xml-processing-using-apache-flink

1 answer


qlvxas9a1#

Apache Mahout's XmlInputFormat for Apache Hadoop extracts the text between two tags (in your case probably <student> and </student>). Flink provides wrappers for using Hadoop input formats, for example via the readHadoopFile() method of ExecutionEnvironment.
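To make the idea of a tag-delimited input format more concrete, here is a minimal plain-Java sketch of the concept. It is not Mahout's implementation and uses no Flink or Hadoop classes: the class name TagRecordExtractor and the method extract are hypothetical, and the sketch works on an in-memory string, whereas a real input format would read a stream and deal with file splits. In this sketch each returned record includes the surrounding tags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: cuts a document into
// records that start with startTag and end with endTag.
class TagRecordExtractor {

    static List<String> extract(String text, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = text.indexOf(startTag, pos);
            if (start < 0) {
                break; // no further records
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: ignored in this sketch
            }
            end += endTag.length();
            records.add(text.substring(start, end)); // tags are kept
            pos = end;
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<Class>\n"
                + "    <student>Ann</student>\n"
                + "    <student>Bob</student>\n"
                + "</Class>";
        for (String record : extract(xml, "<student>", "</student>")) {
            System.out.println(record);
        }
    }
}
```

Each extracted record could then be handed to an ordinary XML parser to build one student object, so no single step needs to hold the whole document tree.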
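If you do not want to use XmlInputFormat and your XML file is nicely formatted, i.e. each student record is on a single line, you can use Flink's regular TextInputFormat, which reads the file line by line. A subsequent FlatMap function can then parse all student lines and filter out all other lines.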
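The per-line logic of the line-by-line approach might look like the following plain-Java sketch. It deliberately leaves out the Flink wiring so that it runs on its own. The names StudentLineParser, Student and content are hypothetical, and because the question does not show what is inside a student element, the sketch simply keeps the raw inner text. The Consumer parameter stands in for the collector that a Flink flat-map function receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the "parse student lines, drop the rest" step.
class StudentLineParser {
    private static final String OPEN = "<student>";
    private static final String CLOSE = "</student>";

    // Emits zero or one Student per input line, like a flat-map step.
    static void flatMap(String line, Consumer<Student> out) {
        String t = line.trim();
        boolean isStudentLine = t.length() >= OPEN.length() + CLOSE.length()
                && t.startsWith(OPEN)
                && t.endsWith(CLOSE);
        if (isStudentLine) {
            String inner = t.substring(OPEN.length(), t.length() - CLOSE.length());
            out.accept(new Student(inner));
        }
        // Any other line (<Class>, </Class>, blank lines) is dropped.
    }

    static List<Student> parseAll(List<String> lines) {
        List<Student> students = new ArrayList<>();
        for (String line : lines) {
            flatMap(line, students::add);
        }
        return students;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<Class>",
                "    <student>Ann</student>",
                "    <student>Bob</student>",
                "</Class>");
        for (Student s : parseAll(lines)) {
            System.out.println(s.content);
        }
    }
}

// Hypothetical record type: holds the raw text inside <student>...</student>.
class Student {
    final String content;

    Student(String content) {
        this.content = content;
    }
}
```

In an actual Flink job the body of flatMap would sit inside an implementation of Flink's FlatMapFunction<String, Student>, emitting through the supplied Collector instead of a Consumer. This only works if every student record really fits on one line; records that span several lines need a tag-delimited input format instead.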

2021-06-22
