python - Function to add multiple columns to a DataFrame based on calculations - pandas

umuewwlo  posted on 2021-08-25  in  Java

I have a DataFrame like this:

name:   ...  line: 
bobo    ...   10
amy     ...   5
amanda  ...   15

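I want to create a function that can be used with multiple DataFrames and that adds new columns to a DataFrame based on calculations done inside the function. This is what I tried, but it doesn't work.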

def check(df, lines):

    for line in lines:
        df['big_line'] = (line*5, line)
        df['small_line'] = line*2
        df['massive_line'] = line*10
        df['line_word'] = line + ' line'

    return df

Essentially, what I want it to return is a DataFrame like the one shown below.
Function call:

check(df, df['line'])

Returns:

name:   ...  line: big_line: small_line: massive_line: line_word:
bobo    ...   10   (50, 10)         20           100         10 line
amy     ...   5     (25, 5)         10            50          5 line
amanda  ...   15  ...............................................

If anyone could point me in the right direction, that would be great. Thanks.
I get an error with big_line because it is a tuple-type object.
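
The error described in the question can be reproduced in isolation. The sketch below rests on assumptions: the elided `...` columns are left out, `line` is taken to hold integers, and the helper name `try_assign_tuple` is made up for illustration. It suggests that pandas reads a 2-tuple as two values for the column rather than as a single cell value, which cannot fit a 3-row frame.

```python
import pandas as pd

# Assumed stand-in for the question's data; the elided columns are omitted.
df = pd.DataFrame({'name': ['bobo', 'amy', 'amanda'], 'line': [10, 5, 15]})

def try_assign_tuple(frame, value):
    """Try frame['big_line'] = value; return the error message, or None on success."""
    frame = frame.copy()  # leave the caller's frame untouched
    try:
        frame['big_line'] = value
    except ValueError as exc:
        return str(exc)
    return None

# A 2-tuple is treated as two column values, which cannot fill 3 rows.
error_message = try_assign_tuple(df, (50, 10))
print(error_message)
```

A plain string, by contrast, is a scalar and is broadcast to every row, which is why the string-based workarounds in the answers do not hit this error.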

bxfogqkk1#

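You are assigning a sequence (the tuple) to a column, which is a Series object. The sequence has only 2 elements, but the DataFrame has more than 2 rows, so the lengths do not match. Storing the value as a formatted string avoids the error: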

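def check(df, lines):
    # Each pass assigns one scalar to the whole column, so after the loop
    # every row holds the values computed from the last element of lines.
    for line in lines.to_list():
        df['big_line'] = f"({line*5}, {line})"
        df['small_line'] = line*2
        df['massive_line'] = line*10
        df['line_word'] = f"{line} line"  # line is an int, so format it instead of using +
    return df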

check(df, df['line'])

Output:

     name  line  big_line  small_line  massive_line line_word
0    bobo    10  (75, 15)          30           150   15 line
1     amy     5  (75, 15)          30           150   15 line
2  amanda    15  (75, 15)          30           150   15 line

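Edit: Based on your comment, if you want to update each row of the original DataFrame, I suggest modifying your original function so that it addresses each row by its number with the loc method: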

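def check(df, lines):
    # enumerate yields 0, 1, 2, ..., so this assumes the default integer index
    for index, line in enumerate(lines.to_list()):
        df.loc[index, 'big_line'] = f"({line*5}, {line})"
        df.loc[index, 'small_line'] = line*2
        df.loc[index, 'massive_line'] = line*10
        df.loc[index, 'line_word'] = f"{line} line"  # line is an int; + ' line' would raise TypeError
    return df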

Output:

     name  line  big_line  small_line  massive_line line_word
0    bobo    10  (50, 10)          20           100   10 line
1     amy     5   (25, 5)          10            50    5 line
2  amanda    15  (75, 15)          30           150   15 line
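
The `enumerate` counter matches the row labels only when the DataFrame has the default 0, 1, 2, ... index. Below is a minimal sketch of a variant that walks the actual index labels with `Series.items()`. The name `check_by_label` and the sample data are hypothetical, `line` is assumed to hold integers, and the columns are created up front (an added design choice, not part of the answer) so each one has a suitable dtype before the row-wise writes.

```python
import pandas as pd

def check_by_label(df, lines):
    """Fill the new columns row by row, addressing rows by index label."""
    df = df.copy()  # assumption: return a new frame instead of mutating the input
    # Pre-create the columns so row-wise writes do not depend on enlargement rules.
    df['big_line'] = ''
    df['small_line'] = 0
    df['massive_line'] = 0
    df['line_word'] = ''
    for label, line in lines.items():  # yields (index label, value) pairs
        df.loc[label, 'big_line'] = f"({line*5}, {line})"
        df.loc[label, 'small_line'] = line * 2
        df.loc[label, 'massive_line'] = line * 10
        df.loc[label, 'line_word'] = f"{line} line"
    return df

# Hypothetical data with a non-default (string) index.
df = pd.DataFrame({'line': [10, 5, 15]}, index=['bobo', 'amy', 'amanda'])
result = check_by_label(df, df['line'])
print(result)
```

This still loops in Python, so it is likely to be slow on large frames; it is meant only to show label-based addressing.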

kyks70gy2#

Using a function that computes the output for each row

Input:

df = pd.DataFrame({'line': [10,5,15]}, index=['bobo', 'amy', 'amanda']).rename_axis(index='name')

        line
name        
bobo      10
amy        5
amanda    15

You can define a function that returns a Series:

def check(s):
    line = s['line']  # value of the 'line' column for this row
    return pd.Series({'big_line': (line*5, line),
                      'small_line': line*2,
                      'massive_line': line*10,
                      'line_word': str(line)+' line'
                     })

Then apply it to the rows:

df.apply(check, axis=1)

Output:

        big_line  small_line  massive_line line_word
name
bobo    (50, 10)          20           100   10 line
amy      (25, 5)          10            50    5 line
amanda  (75, 15)          30           150   15 line
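
`df.apply(check, axis=1)` returns only the computed columns. A minimal sketch of attaching them to the original frame with `DataFrame.join`, which aligns on the shared index, could look like this. The `check` function is restated so the block runs on its own, and the column names follow the question.

```python
import pandas as pd

df = pd.DataFrame({'line': [10, 5, 15]},
                  index=['bobo', 'amy', 'amanda']).rename_axis(index='name')

def check(s):
    """Compute the new values for one row (passed in as a Series)."""
    line = s['line']
    return pd.Series({'big_line': (line*5, line),
                      'small_line': line*2,
                      'massive_line': line*10,
                      'line_word': str(line) + ' line'})

# join aligns the computed columns with the original rows by index label.
result = df.join(df.apply(check, axis=1))
print(result)
```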

Using vectorized operations

df['big_line']     = df['line'].apply(lambda x: (5*x, x))
df['small_line']   = df['line']*2
df['massive_line'] = df['line']*10
df['line_word']    = df['line'].astype(str)+' line'
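
Since the question asks for something reusable across several DataFrames, the column-wise assignments can be wrapped in a function. This is only a sketch: the name `add_line_columns`, the `col` parameter and the sample frames are hypothetical, and the tuples are built with `zip` rather than `apply`.

```python
import pandas as pd

def add_line_columns(df, col='line'):
    """Return a copy of df with the four derived columns added."""
    line = df[col]
    return df.assign(
        big_line=list(zip(line * 5, line)),  # one (5*x, x) tuple per row
        small_line=line * 2,
        massive_line=line * 10,
        line_word=line.astype(str) + ' line',
    )

# Hypothetical sample frames; any frame with an integer `col` column should work.
df_a = pd.DataFrame({'name': ['bobo', 'amy', 'amanda'], 'line': [10, 5, 15]})
df_b = pd.DataFrame({'name': ['zed', 'yan'], 'line': [3, 7]})
print(add_line_columns(df_a))
print(add_line_columns(df_b))
```

`DataFrame.assign` returns a new frame, so the inputs are left unchanged; assign the result back (`df_a = add_line_columns(df_a)`) if the original variable should be updated.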

oymdgrw73#

If you only need a string, you can try:

df['big_line'] = df['line'].apply(lambda line: f'({5*line}, {line})')

If it needs to be a tuple, add this after creating the string:

df['big_line'] = df.big_line.apply(lambda x: eval(x))
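
`eval` executes whatever expression the string contains. As a more conservative alternative (a suggestion, not part of the answer above), the standard library's `ast.literal_eval` parses only Python literals such as tuples and numbers. A minimal sketch, with sample data assumed from the question:

```python
import ast
import pandas as pd

# Assumed sample data: integer values in a 'line' column.
df = pd.DataFrame({'line': [10, 5, 15]})

# Build the string form row by row, then parse each string back into a tuple.
df['big_line'] = df['line'].apply(lambda line: f'({5*line}, {line})')
df['big_line'] = df['big_line'].apply(ast.literal_eval)
print(df['big_line'].tolist())
```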
