Using pandas or numpy: within a group, how can I add data from every row to every row in the group?

Posted by ig9co6j1 5 months ago in Other


I have a dict like the one below, which represents a single horse race. The dataset contains many races, grouped by raceId:

data_orig = {
    'meetingId': [178515] * 6,
    'raceId': [879507] * 6,
    'horseId': [90001, 90002, 90003, 90004, 90005, 90006],
    'position': [1, 2, 3, 4, 5, 6],
    'weight': [51, 52, 53, 54, 55, 56],
}

I want to add the horse-specific data from every row to every other row in the race. The result should look like this:

data_new = {
    'meetingId': [178515] * 6,
    'raceId': [879507] * 6,
    'horseId_a':[90001, 90002, 90003, 90004, 90005, 90006],
    'position_a':[1, 2, 3, 4, 5, 6],
    'weight_a':[51, 52, 53, 54, 55, 56],
    'horseId_b':[90002, 90003, 90004, 90005, 90006, 90001],
    'position_b':[2, 3, 4, 5, 6, 1],
    'weight_b':[52, 53, 54, 55, 56, 51],
    'horseId_c':[90003, 90004, 90005, 90006, 90001, 90002],
    'position_c':[3, 4, 5, 6, 1, 2],
    'weight_c':[53, 54, 55, 56, 51, 52],
    'horseId_d':[90004, 90005, 90006, 90001, 90002, 90003],
    'position_d':[4, 5, 6, 1, 2, 3],
    'weight_d':[54, 55, 56, 51, 52, 53],
    'horseId_e':[90005, 90006, 90001, 90002, 90003, 90004],
    'position_e':[5, 6, 1, 2, 3, 4],
    'weight_e':[55, 56, 51, 52, 53, 54,],
    'horseId_f':[90006, 90001, 90002, 90003, 90004, 90005],
    'position_f':[6, 1, 2, 3, 4, 5],
    'weight_f':[56, 51, 52, 53, 54, 55],
}

I have tried the code below, as well as transposing the matrix.

data_orig_df = pd.DataFrame(data_orig)
new_df = pd.DataFrame()
for index, row_i in data_orig_df.iterrows():
    horseId = row_i['horseId']
    row_new = row_i.copy()
    for index, row_j in race_df.iterrows():
        if row_j['horseId']:
            continue
        row_new = pd.merge(row_new, row_j[getHorseSpecificCols()], suffixes=('', row_j['position']))
    new_df = pd.concat([new_df, row_new], axis=1)

Thanks for your help.

numpy

Source: https://stackoverflow.com/questions/77572335/using-pandas-or-numpy-within-a-group-how-can-i-add-data-from-every-row-to-eve

1 Answer


8qgya5xd1#

You can use numpy to roll/index the values easily:

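import numpy as np
import pandas as pd

def roll(g):
    a = g.to_numpy()
    x = np.arange(len(a))
    # (x[:,None] + x) % len(a) is a matrix of row positions:
    # row i starts at position i and wraps around the group
    return pd.DataFrame(a[((x[:,None] + x)%len(a)).ravel()].reshape(len(a), -1),
                        index=g.index,
                        columns=[f'{c}_{i+1}' for i in x for c in g.columns])

cols = ['meetingId', 'raceId']

# data_orig_df = pd.DataFrame(data_orig), as in the question
out = (data_orig_df.groupby(cols)
       # errors='ignore': newer pandas versions may already exclude the grouping columns
       .apply(lambda g: roll(g.drop(columns=cols, errors='ignore')))
       .reset_index(cols)
       )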

Output:

meetingId  raceId  horseId_1  position_1  weight_1  horseId_2  position_2  weight_2  horseId_3  position_3  weight_3  horseId_4  position_4  weight_4  horseId_5  position_5  weight_5  horseId_6  position_6  weight_6
0     178515  879507      90001           1        51      90002           2        52      90003           3        53      90004           4        54      90005           5        55      90006           6        56
1     178515  879507      90002           2        52      90003           3        53      90004           4        54      90005           5        55      90006           6        56      90001           1        51
2     178515  879507      90003           3        53      90004           4        54      90005           5        55      90006           6        56      90001           1        51      90002           2        52
3     178515  879507      90004           4        54      90005           5        55      90006           6        56      90001           1        51      90002           2        52      90003           3        53
4     178515  879507      90005           5        55      90006           6        56      90001           1        51      90002           2        52      90003           3        53      90004           4        54
5     178515  879507      90006           6        56      90001           1        51      90002           2        52      90003           3        53      90004           4        54      90005           5        55
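
The answer numbers the column groups `_1` to `_6`, while the question asked for `_a` to `_f`. Below is a minimal sketch of the same rolling idea with letter suffixes, with the index matrix pulled out as its own step so it is easier to follow. The name `roll_letters` and the three-row `demo` frame are hypothetical, introduced only for illustration, and the sketch assumes a group never has more than 26 rows.

```python
import string

import numpy as np
import pandas as pd


def roll_letters(g):
    """Hypothetical variant of roll() that uses _a, _b, ... suffixes."""
    a = g.to_numpy()
    n = len(a)
    x = np.arange(n)
    # idx[i, j] = (i + j) % n: row i lists the source rows starting at i, wrapping around
    idx = (x[:, None] + x) % n
    # a[idx] has shape (n, n, k); merge the last two axes into one wide row per horse
    wide = a[idx].reshape(n, -1)
    # Assumption: at most 26 rows per group, because suffixes come from the alphabet
    suffixes = string.ascii_lowercase[:n]
    columns = [f"{c}_{s}" for s in suffixes for c in g.columns]
    return pd.DataFrame(wide, index=g.index, columns=columns)


# Small illustrative frame (not the question's data)
demo = pd.DataFrame({"horseId": [90001, 90002, 90003], "weight": [51, 52, 53]})

x = np.arange(len(demo))
idx = (x[:, None] + x) % len(demo)  # the same index matrix built inside roll_letters
wide = roll_letters(demo)
```

To apply this per race, `roll_letters` can be passed in place of `roll` in the `groupby(cols).apply(...)` call shown in the answer.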
