How do I find the top N rows most similar to a given row/ID in Spark SQL?

yqhsw0fo  posted on 2021-05-27  in Spark

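I want to find similar rows. I did some research but could not find much. I thought about using groupBy, but that does not seem efficient.
I have some sample data like this: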

{"customer":"customer-13","attributes":{"att-a":"att-a-6","att-b":"att-b-9","att-c":"att-c-15","att-d":"att-d-12","att-e":"att-e-10","att-f":"att-f-8","att-g":"att-g-1","att-h":"att-h-11","att-i":"att-i-14","att-j":"att-j-2"}}
{"customer":"customer-14","attributes":{"att-a":"att-a-4","att-b":"att-b-1","att-c":"att-c-2","att-d":"att-d-13","att-e":"att-e-4","att-f":"att-f-9","att-g":"att-g-10","att-h":"att-h-4","att-i":"att-i-15","att-j":"att-j-3"}}
{"customer":"customer-15","attributes":{"att-a":"att-a-9","att-b":"att-b-9","att-c":"att-c-15","att-d":"att-d-7","att-e":"att-e-10","att-f":"att-f-12","att-g":"att-g-5","att-h":"att-h-3","att-i":"att-i-5","att-j":"att-j-4"}}
{"customer":"customer-16","attributes":{"att-a":"att-a-15","att-b":"att-b-11","att-c":"att-c-13","att-d":"att-d-14","att-e":"att-e-7","att-f":"att-f-8","att-g":"att-g-7","att-h":"att-h-8","att-i":"att-i-3","att-j":"att-j-6"}}

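Given a customer ID and a number n, I want to find the n most similar customers.
For example, given customer-20, find the top 5 most similar customers. Does anyone know how to do this? I am fairly new to Spark.
Thanks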
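
To make "similar" concrete, here is a minimal sketch in plain Python (not Spark). It assumes that similarity means the number of attribute keys on which two customers have the same value; the question does not define a measure, so that is only one possible choice. The function name `top_n_similar` and the `records` list are hypothetical, and the sample records are abbreviated to four of the ten attributes from the data above.

```python
import json

# Abbreviated copies of the sample rows (only att-b, att-c, att-e, att-f kept).
raw_lines = [
    '{"customer":"customer-13","attributes":{"att-b":"att-b-9","att-c":"att-c-15","att-e":"att-e-10","att-f":"att-f-8"}}',
    '{"customer":"customer-14","attributes":{"att-b":"att-b-1","att-c":"att-c-2","att-e":"att-e-4","att-f":"att-f-9"}}',
    '{"customer":"customer-15","attributes":{"att-b":"att-b-9","att-c":"att-c-15","att-e":"att-e-10","att-f":"att-f-12"}}',
    '{"customer":"customer-16","attributes":{"att-b":"att-b-11","att-c":"att-c-13","att-e":"att-e-7","att-f":"att-f-8"}}',
]
records = [json.loads(line) for line in raw_lines]


def top_n_similar(records, customer_id, n):
    """Return up to n (customer, score) pairs most similar to customer_id.

    Score = number of attribute keys whose value equals the target's value.
    This is an assumed similarity measure, not one stated in the question.
    """
    by_id = {r["customer"]: r["attributes"] for r in records}
    target = by_id[customer_id]  # raises KeyError if the id is unknown

    scored = [
        (cust, sum(1 for key, value in target.items() if attrs.get(key) == value))
        for cust, attrs in by_id.items()
        if cust != customer_id  # never compare the target with itself
    ]
    # Highest score first; ties are broken by customer id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


print(top_n_similar(records, "customer-13", 2))
```

The same score could presumably be computed in Spark by pairing the target row with every other row, counting matching attributes, ordering by that count in descending order, and keeping the first n rows. Only the target row is compared against the rest, so no grouping over all pairs of customers is needed.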
