I want to convert these values:
{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"20 alt1.aspmx.l.google.com"}
{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"20 alt2.aspmx.l.google.com"}
{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"30 aspmx2.googlemail.com"}
{"timestamp":"1601093713","name":"exmple1.com","type":"mx","value":"30 aspmx3.googlemail.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"20 alt1.aspmx.l.google.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"20 alt2.aspmx.l.google.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"30 aspmx2.googlemail.com"}
{"timestamp":"1601093713","name":"exmple2.com","type":"mx","value":"30 aspmx3.googlemail.com"}
test.printSchema()
root
|-- name: string (nullable = true)
|-- timestamp: string (nullable = true)
|-- type: string (nullable = true)
|-- value: string (nullable = true)
I want to combine the mx values that share the same name into a single row. This is the result I am after:
{ "timestamp":"1601093713", "name":"exmple1.com", "type":"mx", "value":" alt1.aspmx.l.google.com,alt2.aspmx.l.google.com , aspmx2.googlemail.com, aspmx3.googlemail.com" }
{ "timestamp":"1601093713", "name":"exmple2.com", "type":"mx", "value":" alt1.aspmx.l.google.com, alt2.aspmx.l.google.com , aspmx2.googlemail.com, aspmx3.googlemail.com" }
1 Answer
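You can use groupBy, agg, and collect_list (see the documentation). Note that this gives you a list of values rather than a single string. If you need a string, "Convert PySpark dataframe column from list to string" explains how to do the conversion. The remaining question is what to do with the other columns, e.g. timestamp or type.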