How can I do this without a Dataset-to-RDD conversion?

6fe3ivhb · asked 2021-05-27 · in Spark

Can anyone help me avoid the RDD conversion here?

val qksDistribution: Array[((String, Int), Long)] = tripDataset
      .map(i => ((i.getFirstPoint.getQk.substring(0, QK_PARTITION_LEVEL), i.getProviderId), 1L))
      .rdd
      .reduceByKey(_+_)
      .filter(_._2>maxCountInPartition/10)
      .collect

irtuqstp · answer #1

val qksDistribution: Array[((String, Int), Long)] = tripDataset
      .map(i => (i.getFirstPoint.getQk.substring(0, QK_PARTITION_LEVEL), i.getProviderId)) // no need to pair each record with 1L
      .groupByKey(x => x)   // similar to keyBy on an RDD
      .count()              // counts rows per key: Dataset[((String, Int), Long)]
      .filter(_._2 > maxCountInPartition / 10)
      .collect()
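For completeness, below is a minimal runnable sketch of the same Dataset-only pattern. The Trip case class, its field names, and the QK_PARTITION_LEVEL and maxCountInPartition values are assumptions for illustration only; substitute your own element type and thresholds.

// Sketch only: a flattened Trip(qk, providerId) record stands in for the
// original tripDataset element type, and the constants below are made up.
import org.apache.spark.sql.SparkSession

object QkDistributionSketch {
  case class Trip(qk: String, providerId: Int) // hypothetical stand-in record

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("qk-distribution-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val QK_PARTITION_LEVEL = 4    // assumed quadkey prefix length
    val maxCountInPartition = 10L // assumed threshold

    val tripDataset = Seq(
      Trip("120210233", 1),
      Trip("120210987", 1),
      Trip("023101455", 2)
    ).toDS()

    // Stay in the typed Dataset API: groupByKey + count replaces rdd.reduceByKey.
    val qksDistribution: Array[((String, Int), Long)] = tripDataset
      .map(t => (t.qk.substring(0, QK_PARTITION_LEVEL), t.providerId))
      .groupByKey(identity)   // key by the (qk prefix, providerId) pair
      .count()                // Dataset[((String, Int), Long)]
      .filter(_._2 > maxCountInPartition / 10)
      .collect()

    qksDistribution.foreach(println)
    spark.stop()
  }
}

Staying in the Dataset API keeps the tuple encoders and lets Catalyst plan the aggregation, instead of falling back to Java serialization on an RDD round-trip.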
