How can I extract the full text from a list using FuzzyWuzzy?

yduiuuwa  posted on 2021-08-20  in Java

Here is my code:

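from fuzzywuzzy import fuzz

# Open the file for reading (append mode "a" cannot be iterated)
# and strip the trailing newline from each candidate line.
with open("text.txt", "r") as check:
    possible_words = [line.strip() for line in check]

MIN_MATCH_SCORE = 30
heard_word = 'i5-1135G7 '

guessed_word = [word for word in possible_words
                if fuzz.ratio(heard_word, word) >= MIN_MATCH_SCORE]
print('this one - ', guessed_word)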

Expected output:

11th Generation Intel® Core™ i5-1135G7 Processor

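Is it possible to get the whole sentence shown in the expected output by giving only "i5-1135G7"? Is there another solution that would achieve this? Thanks in advance.
Here is the link to text.txt: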
https://drive.google.com/file/d/1mo3qfmeoaqa3wppyg8spefvsjdx7aqbj/view

l7wslrjt1#

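To discount the extra length of longer sentences and make sure the overlap is measured at the word level, you should use token_set_ratio. Also, if you want complete word overlap, raise MIN_MATCH_SCORE to a value close to 100.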

from fuzzywuzzy import fuzz

MIN_MATCH_SCORE = 90
heard_word = 'i5-1135G7'

possible_words = ['11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to  4.20 GHz with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)', 
                   'windows 10 64 bit', 'intel i7']

print([word for word in possible_words
       if fuzz.token_set_ratio(heard_word, word) >= MIN_MATCH_SCORE])

Output:

['11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to  4.20 GHz with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)']
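
To illustrate why a token-level score picks out the long line while a whole-string score does not, here is a minimal sketch that uses only the standard library. The helper names simple_ratio and simple_token_set_ratio are hypothetical, and the logic is a simplified approximation of the idea behind token_set_ratio, not FuzzyWuzzy's actual implementation (which also normalizes punctuation before tokenizing).

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Whole-string similarity on a 0-100 scale (hypothetical helper)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def simple_token_set_ratio(a: str, b: str) -> int:
    """Simplified token-set score (an approximation, not FuzzyWuzzy's code).

    Splits both strings on whitespace, then compares the shared tokens
    against each side's full token set and keeps the best score.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    shared = " ".join(sorted(tokens_a & tokens_b))
    combined_a = (shared + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    combined_b = (shared + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    return max(
        simple_ratio(shared, combined_a),
        simple_ratio(shared, combined_b),
        simple_ratio(combined_a, combined_b),
    )


MIN_MATCH_SCORE = 90
heard_word = 'i5-1135G7'
candidates = [
    '11th Generation Intel® Core™ i5-1135G7 Processor (2.40 GHz,up to  4.20 GHz '
    'with Turbo Boost, 4 Cores, 8 Threads, 8 MB Cache)',
    'windows 10 64 bit',
    'intel i7',
]

# Keep only the candidates whose token-level score reaches the threshold.
matches = [c for c in candidates
           if simple_token_set_ratio(heard_word, c) >= MIN_MATCH_SCORE]
print(matches)
```

Because the query is an exact token of the processor line, the shared-token comparison scores 100 for that line regardless of its length, whereas the whole-string score stays low because the remaining characters of the long line count against it.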

nhhxz33t2#

token_set_ratio works fine!

from fuzzywuzzy import fuzz

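# df1 and g are not defined in this snippet; df1 is presumably a DataFrame
# of strings and g the list of candidate strings built from it.
s = []
for row in df1.values:
    s.append(', '.join(row))

s = ', '.join(s)
main = [x for x in g if x]  # drop empty entries
MIN_MATCH_SCORE = 60
heard_word = 'i5-11th gen'
guessed_word = [word for word in main
                if fuzz.token_set_ratio(heard_word, word) >= MIN_MATCH_SCORE]
print('this one - ', guessed_word)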
