Combine the mx values with the same name into one row in PySpark

Posted by l3zydbqr on 2021-05-19 in Spark


I want to convert these records:

{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"20 alt1.aspmx.l.google.com"}
{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"20 alt2.aspmx.l.google.com"}
{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"30 aspmx2.googlemail.com"}
{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"30 aspmx3.googlemail.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"20 alt1.aspmx.l.google.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"20 alt2.aspmx.l.google.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"30 aspmx2.googlemail.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"30 aspmx3.googlemail.com"}

test.printSchema()
root
 |-- name: string (nullable = true)
 |-- timestamp: string (nullable = true)
 |-- type: string (nullable = true)
 |-- value: string (nullable = true)

I want to combine the mx values that share the same name into a single row. This is the result I am after:

{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"alt1.aspmx.l.google.com, alt2.aspmx.l.google.com, aspmx2.googlemail.com, aspmx3.googlemail.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"alt1.aspmx.l.google.com, alt2.aspmx.l.google.com, aspmx2.googlemail.com, aspmx3.googlemail.com"}

python apache-spark pyspark apache-spark-sql

Source: https://stackoverflow.com/questions/64404813/combine-the-mx-value-with-same-name-in-one-line-pyspark

1 Answer


8dtrkrch #1

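You can use groupBy, agg, and collect_list (see the PySpark documentation). Note that this gives you a list of values per name, not a single string. If you need a string, the question "convert pyspark dataframe column from list to string" explains how to do the conversion.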

from pyspark.sql import functions as F
df_grouped = df.groupby('name').agg(F.collect_list('value').alias('values'))

The next question is how to handle the other columns, e.g. timestamp or type.

Answered on 2021-05-20
