Manipulating a DataFrame in pandas

xpszyzbs posted on 2022-11-20 in Other

Suppose I am working with the following dataset:

# dummy dataset
import pandas as pd
data = pd.DataFrame({"Name_id": ["John", "Deep", "Julia", "John", "Sandy", "Deep"],
                     "Month_id": ["December", "March", "May", "April", "May", "July"],
                     "Colour_id": ["Red", "Purple", "Green", "Black", "Yellow", "Orange"]})
data

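How can I transform this DataFrame into the following form:

where A_id is unique and new columns are created from the values of the other columns, based on their presence or absence (in order of appearance)? I tried using pivot, but I noticed that it seems to be meant more for numeric data than for categorical data.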
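On the concern about categorical data: as far as I know, DataFrame.pivot does not aggregate anything, it only moves values into place, so string columns work as long as every (index, columns) pair occurs at most once. The sketch below, using the dummy data above, shows (1) a pivot of string values that succeeds, (2) the ValueError pandas raises when a pair repeats, and (3) how numbering each name's rows with groupby().cumcount() makes the pairs unique. The column name slot is a hypothetical helper column introduced only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "Name_id": ["John", "Deep", "Julia", "John", "Sandy", "Deep"],
    "Month_id": ["December", "March", "May", "April", "May", "July"],
    "Colour_id": ["Red", "Purple", "Green", "Black", "Yellow", "Orange"],
})

# 1) Strings are fine: every (Name_id, Month_id) pair occurs once,
#    so pivot simply places each colour in its cell.
by_month = data.pivot(index="Name_id", columns="Month_id", values="Colour_id")
print(by_month)

# 2) A repeated (index, columns) pair is what pivot cannot handle:
#    with one shared column, John and Deep would each need two cells.
raised = False
try:
    data.assign(slot=1).pivot(index="Name_id", columns="slot", values="Month_id")
except ValueError as err:
    raised = True
    print("ValueError:", err)

# 3) Numbering each name's rows (1, 2, ...) in order of appearance
#    makes every (Name_id, slot) pair unique again.
numbered = data.assign(slot=data.groupby("Name_id").cumcount() + 1)
by_slot = numbered.pivot(index="Name_id", columns="slot", values="Month_id")
print(by_slot)
```

pivot_table is a different tool: it aggregates duplicate pairs (with a mean by default), which is where the "numeric data" impression usually comes from.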

Answer #1 by lfapxunr

Perhaps you should try pivot. Number each name's rows first so that every (Name_id, Rowid) pair is unique:

# number each person's rows in order of appearance: 1, 2, ...
data['Rowid'] = data.groupby('Name_id').cumcount() + 1
# one row per Name_id; columns become (value column, Rowid) pairs
d = data.pivot(index='Name_id', columns='Rowid', values=['Month_id', 'Colour_id'])
d.reset_index(inplace=True)
# build flat names from the actual column order, e.g. ('Month_id', 1) -> 'Month_id1'
d.columns = [f'{col}{rowid}' for col, rowid in d.columns]

which gives

  Name_id Month_id1 Month_id2 Colour_id1 Colour_id2
0    Deep     March      July     Purple     Orange
1    John  December     April        Red      Black
2   Julia       May       NaN      Green        NaN
3   Sandy       May       NaN     Yellow        NaN
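If the target layout needs the values interleaved per occurrence (Month_id1, Colour_id1, Month_id2, Colour_id2, ...), the column order can be generated instead of typed by hand, which also keeps working when a name appears three or more times. This is a minimal sketch; widen is a hypothetical helper, not a pandas function, and it assumes the key column plus a list of value columns as input.

```python
import pandas as pd


def widen(df: pd.DataFrame, key: str, value_cols: list) -> pd.DataFrame:
    """Hypothetical helper: one row per `key`, with value columns
    interleaved per occurrence (<col>1 for all cols, then <col>2, ...)."""
    # Number each key's rows in order of appearance: 1, 2, 3, ...
    slot = df.groupby(key).cumcount() + 1
    wide = df.assign(slot=slot).pivot(index=key, columns="slot", values=value_cols)
    # Reorder the (value column, slot) pairs: all columns for slot 1, then slot 2, ...
    order = [(col, i) for i in range(1, int(slot.max()) + 1) for col in value_cols]
    wide = wide.loc[:, order]
    # Flatten the MultiIndex into names such as "Month_id1".
    wide.columns = [f"{col}{i}" for col, i in wide.columns]
    return wide.reset_index()


data = pd.DataFrame({
    "Name_id": ["John", "Deep", "Julia", "John", "Sandy", "Deep"],
    "Month_id": ["December", "March", "May", "April", "May", "July"],
    "Colour_id": ["Red", "Purple", "Green", "Black", "Yellow", "Orange"],
})

wide_data = widen(data, "Name_id", ["Month_id", "Colour_id"])
print(wide_data)
```

Names with fewer occurrences get NaN in the later slots, as in the output above; rows come out sorted by the key because pivot sorts its index.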
