DataFrame: replace the substring in col1 identified by the string in col2 with the string in col3

zvokhttg  posted on 2021-09-08  in  Java

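I have a pandas DataFrame, and my problem can be reduced to the example below. In each row, I want to replace the part of the string in one column that matches the string in a second column with the string given in a third column.
Example DataFrame: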

main_string          | target | replacement
Hello My Name is XXX | XXX    | John
Hello My name is YYY | YYY    | Mary
Hello my Name is Rob | Nan    | None
Hello My name is ZZZ | ZZZ    | Kate

My target output, in a new DataFrame column, is:

Hello My Name is John
Hello My name is Mary
Hello my Name is Rob 
Hello My name is Kate

vd2z7a6w1#

You can use pandas' replace() function with a dictionary, as in the examples at https://cmdlinetips.com/2021/04/pandas-replace-to-one-or-more-column-values-python/
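
The dictionary approach above could look roughly like the sketch below. It rests on assumptions that are not in the original answer: the missing cells are real missing values (NaN / None) rather than the strings "Nan" / "None", and the names `pairs` and `mapping` are made up for illustration. Note that the dictionary is applied to every row, so this only matches the per-row result when a target string does not also occur in other rows.

```python
import re

import numpy as np
import pandas as pd

# Sample data from the question; the missing target/replacement are assumed
# to be real missing values (NaN / None), not the strings "Nan" / "None".
df = pd.DataFrame({
    "main_string": [
        "Hello My Name is XXX",
        "Hello My name is YYY",
        "Hello my Name is Rob",
        "Hello My name is ZZZ",
    ],
    "target": ["XXX", "YYY", np.nan, "ZZZ"],
    "replacement": ["John", "Mary", None, "Kate"],
})

# Build a {target: replacement} dict from rows where both values are present.
# re.escape keeps regex metacharacters in a target from being interpreted.
pairs = df.dropna(subset=["target", "replacement"])
mapping = {re.escape(t): r for t, r in zip(pairs["target"], pairs["replacement"])}

# regex=True makes replace() match substrings instead of whole cell values.
# Caveat: backslashes in a replacement string would be treated as regex escapes.
df["out"] = df["main_string"].replace(mapping, regex=True)
print(df["out"].tolist())
```

Without `regex=True`, `Series.replace` only swaps cells whose entire value equals a dictionary key, which would leave every row in this example unchanged.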


zfciruhq2#

Use a custom lambda function with if-else to test for missing values (NaN, or None of type NoneType):

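import pandas as pd

# Replace target with replacement only where target is not missing (NaN/None).
# The parentheses are required for the expression to span several lines.
f = lambda x: (x['main_string'].replace(x['target'], x['replacement'])
               if pd.notna(x['target'])
               else x['main_string'])
df['out'] = df.apply(f, axis=1)
print(df)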
            main_string target replacement                    out
0  Hello My Name is XXX    XXX        John  Hello My Name is John
1  Hello My name is YYY    YYY        Mary  Hello My name is Mary
2  Hello my Name is Rob    Nan        None   Hello my Name is Rob
3  Hello My name is ZZZ    ZZZ        Kate  Hello My name is Kate

An alternative solution using a list comprehension:

df['out'] = [a.replace(b, c) if pd.notna(b) else a 
             for a,b,c in df[['main_string','target','replacement']].to_numpy()]
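
For anyone who wants to run the list-comprehension idea end to end, here is a minimal self-contained sketch. The helper name `replace_rowwise` and its parameters are hypothetical, the sample data assumes the missing cells are real NaN / None values, and the extra check on the replacement column is an addition that the answer above does not include.

```python
import numpy as np
import pandas as pd


def replace_rowwise(df, source="main_string", target="target",
                    replacement="replacement"):
    """Return a list where each source string has its own row's target replaced.

    Rows with a missing target or replacement keep the source string unchanged.
    """
    return [
        s.replace(t, r) if pd.notna(t) and pd.notna(r) else s
        for s, t, r in zip(df[source], df[target], df[replacement])
    ]


# Sample data from the question (missing values assumed to be NaN / None).
df = pd.DataFrame({
    "main_string": [
        "Hello My Name is XXX",
        "Hello My name is YYY",
        "Hello my Name is Rob",
        "Hello My name is ZZZ",
    ],
    "target": ["XXX", "YYY", np.nan, "ZZZ"],
    "replacement": ["John", "Mary", None, "Kate"],
})

df["out"] = replace_rowwise(df)
print(df["out"].tolist())
```

Because this uses plain `str.replace`, each target is matched literally and only within its own row, so no regex escaping is needed.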
