如何为一组操作/工作流创建dag表示?

az31mfrm  于 2021-06-24  发布在  Flink
关注(0)|答案(3)|浏览(278)

我们使用apacheflink进行流处理。文档中说,flink根据定义的操作(流数据的转换链)生成执行图/dag。我还可以在ui门户上看到dag表示。
我有点好奇这是怎么做到的。是否有任何可用的库为flink执行此操作或由flink自己实现。

fruv7luv

fruv7luv1#

如果要查看作业的执行计划,可以执行以下操作:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
...
System.out.println(env.getExecutionPlan());
env.execute();

您将希望在作业结束时捕获此计划,因为此时作业图已完全构建。
它看起来像这样:

{
  "nodes": [
    {
      "id": 1,
      "type": "Source: Collection Source",
      "pact": "Data Source",
      "contents": "Source: Collection Source",
      "parallelism": 1
    },
    {
      "id": 3,
      "type": "Map",
      "pact": "Operator",
      "contents": "Map",
      "parallelism": 4,
      "predecessors": [
        {
          "id": 1,
          "ship_strategy": "REBALANCE",
          "side": "second"
        }
      ]
    },
    {
      "id": 5,
      "type": "Source: Collection Source",
      "pact": "Data Source",
      "contents": "Source: Collection Source",
      "parallelism": 1
    },
    {
      "id": 6,
      "type": "Flat Map",
      "pact": "Operator",
      "contents": "Flat Map",
      "parallelism": 4,
      "predecessors": [
        {
          "id": 5,
          "ship_strategy": "REBALANCE",
          "side": "second"
        }
      ]
    },
    {
      "id": 8,
      "type": "Co-Process-Broadcast-Keyed",
      "pact": "Operator",
      "contents": "Co-Process-Broadcast-Keyed",
      "parallelism": 8,
      "predecessors": [
        {
          "id": 3,
          "ship_strategy": "HASH",
          "side": "second"
        },
        {
          "id": 6,
          "ship_strategy": "BROADCAST",
          "side": "second"
        }
      ]
    },
    {
      "id": 9,
      "type": "Sink: Print to Std. Out",
      "pact": "Data Sink",
      "contents": "Sink: Print to Std. Out",
      "parallelism": 8,
      "predecessors": [
        {
          "id": 8,
          "ship_strategy": "FORWARD",
          "side": "second"
        }
      ]
    }
  ]
}
8dtrkrch

8dtrkrch2#

它是由flink自己实现的。如果你深入研究代码,你会发现 org.apache.flink.streaming.api.graph.JSONGenerator 类是什么 @Internal 而且有一个 getJSON 方法。它用于生成 StreamGraph 示例(这里涉及到jackson库)。这个 StreamGraph 它本身表示一个完整的作业拓扑,可以以多种方式呈现。
Flink来源

tmb3ates

tmb3ates3#

除了大卫说的,你还可以用 planToDot() 方法将他提到的json转换为标准的图形格式( .dot 文件),然后可以用几个图形可视化程序中的任意一个打开。请注意,这是一个逻辑计划,因此您不会看到像flink的WebUI中显示的那样的操作符管道衬砌结果。

相关问题