Using a function with a for loop in PySpark map

2admgd59  posted on 2021-05-24  in  Spark

I am trying to remove the special characters from every word in an RDD:

special_characters = '~!@#$%^&*()_+-=[]{};:,<.>/?'

def remove_special_characters(word):
    for character in special_characters[0: len(special_characters)]:
        word = word.replace(character, '')
        return word

words = lines.flatMap(lambda line: line.split(" "))
words_lower = words.map(lambda word: word.lower())

clean_words_1 = words_lower.map(lambda word: remove_special_characters(word))
clean_words_2 = words_lower.map(remove_special_characters)

Only the first special character is replaced in each word.
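
Judging from the snippet as posted, a likely cause is that `return word` sits inside the `for` loop, so the function returns at the end of the first iteration, after handling only `~`. Below is a minimal sketch of a fix, assuming that indentation is what actually runs. The names `remove_special_characters_translate` and `_delete_table` are introduced here for illustration and are not part of the original post.

```python
special_characters = '~!@#$%^&*()_+-=[]{};:,<.>/?'


def remove_special_characters(word):
    # Remove each listed character in turn.
    for character in special_characters:
        word = word.replace(character, '')
    # Return only after the loop has visited every character.
    return word


# Alternative (hypothetical helper): build a deletion table once and
# strip all listed characters in a single pass with str.translate.
_delete_table = str.maketrans('', '', special_characters)


def remove_special_characters_translate(word):
    return word.translate(_delete_table)


# With the RDD from the question, either function can be passed directly:
#   clean_words = words_lower.map(remove_special_characters)
print(remove_special_characters('he~l!l@o'))
```

Both versions are plain Python functions, so they can be passed to `map` without a wrapping `lambda`. The `str.translate` variant avoids one `replace` call per special character, which may matter on large RDDs.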
