Python: is there a way to get the values from a list that match values in a DataFrame?

Posted by cidc1ykv on 2021-08-25 in Java

I have two lists of words like this:

words1 = ['hi','my']
words2 = ['name','is']

I have a DataFrame df like this:

id Sentence
0  'my name was'
1  'hi i am'
2  'my phone is'
3  'what is this'
4  'her name was'

I am running the code below to get the DataFrame indices of the rows where a word matches.

matched_idx1 = df.loc[df.Sentence.str.contains('|'.join(words1)),:].index.array
matched_idx2 = df.loc[df.Sentence.str.contains('|'.join(words2)),:].index.array

So matched_idx1 gives the array:

[0,1,2]

and matched_idx2 gives the array:

[0,2,3,4]

Now I want to get a list or array of the values that were matched by the contains call.
For example, a new variable matched_idx1_values should contain:

['my','hi','my']

and matched_idx2_values should contain:

['name','is','is','name']

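Please let me know how to get these indices together with the values they matched. This example is simplified; my real lists contain many more words.
Thanks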

qacovj5a #1

Here is a complete example using spaCy:


# Sample data (the word lists are the ones from the question)
import pandas as pd
words1 = ['hi', 'my']
words2 = ['name', 'is']
df = pd.DataFrame({'id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}, 'Sentence': {0: 'my name was', 1: 'hi i am', 2: 'my phone is', 3: 'what is this', 4: 'her name was'}})

# Load spaCy and add an entity ruler to a blank English pipeline
import spacy
nlp = spacy.blank("en")
ruler = nlp.add_pipe('entity_ruler', config={"overwrite_ents": True}, last=True)

# Add one pattern per word, labeled with the list it came from
lst_all_patterns = []

for wrd in words1:
    lst_all_patterns.append({"label": "words1", "pattern": [{"LOWER": wrd}]})

for wrd in words2:
    lst_all_patterns.append({"label": "words2", "pattern": [{"LOWER": wrd}]})

ruler.add_patterns(lst_all_patterns)

# Example on a single sentence: print label, matched text, token start and end
doc_string = nlp('my name was')
for e in doc_string.ents:
    print(e.label_, e, e.start, e.end)
# words1 my 0 1
# words2 name 1 2

# Example on the DataFrame: token start positions of the matches in each row
df['docs'] = df['Sentence'].map(nlp)
df['docs'].map(lambda x: [e.start for e in x.ents])
# 0    [0, 1]
# 1       [0]
# 2    [0, 2]
# 3       [1]
# 4       [1]
# Name: docs, dtype: object
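
The question asks for the matched words themselves, not token positions. Below is a minimal sketch of one way to get them using only Python's re module. The helper names build_pattern and matched_values are hypothetical and not part of the original post. The sketch assumes whole-word matches are wanted, so it wraps the alternation in word boundaries; a plain substring pattern such as 'hi|my' would also find 'hi' inside 'this'. It also assumes that only the leftmost match per sentence is needed, as in the expected output in the question.

```python
import re

def build_pattern(words):
    # Compile an alternation of the escaped words, wrapped in word
    # boundaries so that only whole words match.
    escaped = (re.escape(w) for w in words)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b")

def matched_values(sentences, words):
    # Return (indices, values): the positions of the sentences that
    # match, and the leftmost matching word found in each of them.
    pattern = build_pattern(words)
    indices, values = [], []
    for i, sentence in enumerate(sentences):
        m = pattern.search(sentence)
        if m:
            indices.append(i)
            values.append(m.group(0))
    return indices, values

# Sample data copied from the question
sentences = ['my name was', 'hi i am', 'my phone is', 'what is this', 'her name was']
words1 = ['hi', 'my']
words2 = ['name', 'is']

matched_idx1, matched_idx1_values = matched_values(sentences, words1)
matched_idx2, matched_idx2_values = matched_values(sentences, words2)

print(matched_idx1, matched_idx1_values)  # [0, 1, 2] ['my', 'hi', 'my']
print(matched_idx2, matched_idx2_values)  # [0, 2, 3, 4] ['name', 'is', 'is', 'name']
```

With the DataFrame from the question, the sentences could be taken from df['Sentence'].tolist(). If every match per row is needed rather than just the leftmost one, the same pattern string (build_pattern(words1).pattern) should also work with df['Sentence'].str.findall.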
