How to access Spark executor, worker and master level metrics in Databricks?

tpxzln5u  posted 2021-05-27 in Spark

My question is specific to Databricks. I am trying to access Spark metrics in Databricks through a Graphite sink by passing the following under the Spark config, which I want to set when creating the cluster.

spark.metrics.conf.*.sink.graphite.class org.apache.spark.metrics.sink.GraphiteSink
spark.metrics.conf.*.sink.graphite.host myhost
spark.metrics.conf.*.sink.graphite.port 2003
spark.metrics.conf.*.sink.graphite.period 10
spark.metrics.conf.*.sink.graphite.unit seconds
spark.metrics.conf.*.source.jvm.class org.apache.spark.metrics.source.JvmSource

However, the configuration above only captures driver-level metrics. I have read in some posts that to get executor-level metrics, the following parameters have to be passed:

spark-submit <other parameters> --files metrics.properties \
  --conf spark.metrics.conf=metrics.properties
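
For reference, the metrics.properties file shipped via --files would typically contain the same sink and source settings as the Spark config above, only without the spark.metrics.conf. prefix. A minimal sketch (myhost is a placeholder for your Graphite host):

# metrics.properties - the "*" prefix applies the settings to all instances (master, worker, driver, executor)
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=myhost
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource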

My question is how to pass the --files parameter in Databricks when creating a cluster (since I am not running spark-submit myself), or whether there is another way to get executor, worker and master level metrics.
Cluster JSON:

{
    "num_workers": 0,
    "cluster_name": "mycluster",
    "spark_version": "5.5.x-scala2.11",
    "spark_conf": {
        "spark.metrics.conf.*.sink.graphite.unit": "seconds",
        "spark.metrics.conf.*.sink.graphite.class": "org.apache.spark.metrics.sink.GraphiteSink",
        "spark.metrics.conf.*.sink.graphite.period": "10",
        "spark.databricks.delta.preview.enabled": "true",
        "spark.metrics.conf.*.source.jvm.class": "org.apache.spark.metrics.source.JvmSource",
        "spark.metrics.conf.*.sink.graphite.host": "myhost",
        "spark.metrics.conf.*.sink.graphite.port": "2003"
    },
    "aws_attributes": {
        "first_on_demand": 0,
        "availability": "ON_DEMAND",
        "zone_id": "us-west-2c",
        "spot_bid_price_percent": 100,
        "ebs_volume_count": 0
    },
    "node_type_id": "dev-tier-node",
    "driver_node_type_id": "dev-tier-node",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "autotermination_minutes": 120,
    "enable_elastic_disk": false,
    "cluster_source": "UI",
    "init_scripts": [],
    "cluster_id": "0604-114345-oboe241"
}

ev7lccsx1#

The --jars, --py-files and --files parameters support DBFS paths.
You can specify the path to the metrics file in spark-submit as follows:

--files=dbfs:/yourPath/metrics.properties --conf spark.metrics.conf=./metrics.properties

Reference: Azure Databricks Jobs API - SparkSubmitTask
The linked documentation gives an example of how to monitor Apache Spark components using the Spark configurable metrics system; specifically, it shows how to set up a new source and enable a sink.
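
Since the interactive cluster UI has no --files field, one way to apply this is to run the workload as a spark_submit_task through the Jobs API (runs/submit), putting the spark-submit flags into the parameters array. The following is only a rough sketch of such a request body: the run name, main class and jar path are made-up placeholders, and dbfs:/yourPath/metrics.properties is assumed to already exist on DBFS.

{
    "run_name": "graphite-metrics-example",
    "new_cluster": {
        "spark_version": "5.5.x-scala2.11",
        "node_type_id": "dev-tier-node",
        "num_workers": 1
    },
    "spark_submit_task": {
        "parameters": [
            "--files",
            "dbfs:/yourPath/metrics.properties",
            "--conf",
            "spark.metrics.conf=./metrics.properties",
            "--class",
            "com.example.YourMainClass",
            "dbfs:/yourPath/yourApp.jar"
        ]
    }
}

Because --files distributes metrics.properties to the driver and executors, the executor JVMs should pick up the same Graphite sink configuration rather than only the driver.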
