How can I do this without a Dataset-to-RDD conversion?

6fe3ivhb · asked 2021-05-27 · in Spark

Can anyone help me avoid the RDD conversion here?

val qksDistribution: Array[((String, Int), Long)] = tripDataset
      .map(i => ((i.getFirstPoint.getQk.substring(0, QK_PARTITION_LEVEL), i.getProviderId), 1L))
      .rdd
      .reduceByKey(_+_)
      .filter(_._2>maxCountInPartition/10)
      .collect

irtuqstp · answer #1

val qksDistribution: Array[((String, Int), Long)] = tripDataset
      .map(i => (i.getFirstPoint.getQk.substring(0, QK_PARTITION_LEVEL), i.getProviderId)) // no need to pair each record with 1L
      .groupByKey(x => x)   // similar to keyBy on an RDD
      .count()              // counts rows per key: Dataset[((String, Int), Long)]
      .filter(_._2 > maxCountInPartition / 10)
      .collect()
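For completeness, below is a minimal runnable sketch of the same Dataset-only pattern. The Trip case class, its field names, and the QK_PARTITION_LEVEL and maxCountInPartition values are assumptions for illustration only; substitute your own element type and thresholds.

// Sketch only: a flattened Trip(qk, providerId) record stands in for the
// original tripDataset element type, and the constants below are made up.
import org.apache.spark.sql.SparkSession

object QkDistributionSketch {
  case class Trip(qk: String, providerId: Int) // hypothetical stand-in record

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("qk-distribution-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val QK_PARTITION_LEVEL = 4    // assumed quadkey prefix length
    val maxCountInPartition = 10L // assumed threshold

    val tripDataset = Seq(
      Trip("120210233", 1),
      Trip("120210987", 1),
      Trip("023101455", 2)
    ).toDS()

    // Stay in the typed Dataset API: groupByKey + count replaces rdd.reduceByKey.
    val qksDistribution: Array[((String, Int), Long)] = tripDataset
      .map(t => (t.qk.substring(0, QK_PARTITION_LEVEL), t.providerId))
      .groupByKey(identity)   // key by the (qk prefix, providerId) pair
      .count()                // Dataset[((String, Int), Long)]
      .filter(_._2 > maxCountInPartition / 10)
      .collect()

    qksDistribution.foreach(println)
    spark.stop()
  }
}

Staying in the Dataset API keeps the tuple encoders and lets Catalyst plan the aggregation, instead of falling back to Java serialization on an RDD round-trip.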
