将 Dataframe 拆分为多个带

zed5wv10  于 2021-08-25  发布在  Java
关注(0)|答案(1)|浏览(302)

我有一个Pandas数据框。它是

Turnover      Outstanding
10c4d42c-72e5-4228-8760-1525f1efe1d8    2.262946e+05    196863.00
236ac7d7-e971-4599-b22d-d54880014914    3.303953e+06    1351439.03
bc50f973-9218-45bd-a297-111781f1a359    2.541676e+06    465144.46
fac0d84b-dbaa-42fb-8779-f36cfc52d20c    4.301824e+05    30054.26
46fc245a-e0cc-4c39-b0d5-e8490e204ecc    3.781342e+05    397482.40
... ... ...
dacce0ef-c7c4-4a90-b215-5635bbc25355    2.167455e+06    842684.59
6a9ffb14-9507-48bb-9563-828325e4fa6f    5.673843e+05    321210.31
12d763af-1ca1-4ce7-9995-3a4a6e72c19b    2.536324e+05    132747.83
218031da-5342-4df4-8881-1608dd5cdd7a    2.292235e+05    288300.16
7d47d4fe-a789-415f-a0c2-c1f2b9035f67    1.586166e+07    3689727.98

我想将此 Dataframe 细分为如下 Dataframe :

0-5K         5-10K        10-15k             450-500k

0-1M                     40           100         75                    0

1-2M                     70           170         87                    0

2-3M

基本上是基于两个不同的列对 Dataframe 进行装箱。有什么方便的方法吗?

vxf3dgd4

vxf3dgd41#

这里有一个想法:
为每一列定义合适的箱子(我选择的箱子只是为了展示想法):

bins_turnover = pd.IntervalIndex.from_breaks([i * 5_000_000 for i in range(6)], closed='left')
bins_outstanding = pd.IntervalIndex.from_breaks([i* 5_000_000 for i in range(6)], closed='left')

然后 groupby 在各自的 cut 对于每一列, count ,及 unstack 结果是:

df['Count'] = 1
df = df.Count.groupby([pd.cut(df.Turnover, bins_turnover),
                       pd.cut(df.Outstanding, bins_outstanding)]).count().unstack()

结果

Outstanding           [0, 5000000)  ...  [20000000, 25000000)
Turnover                            ...                      
[0, 5000000)                     9  ...                     0
[5000000, 10000000)              0  ...                     0
[10000000, 15000000)             0  ...                     0
[15000000, 20000000)             1  ...                     0
[20000000, 25000000)             0  ...                     0

[5 rows x 5 columns]

样品 df ```
Turnover Outstanding
Index
10c4d42c-72e5-4228-8760-1525f1efe1d8 226294.6 196863.00
236ac7d7-e971-4599-b22d-d54880014914 3303953.0 1351439.03
bc50f973-9218-45bd-a297-111781f1a359 2541676.0 465144.46
fac0d84b-dbaa-42fb-8779-f36cfc52d20c 430182.4 30054.26
46fc245a-e0cc-4c39-b0d5-e8490e204ecc 378134.2 397482.40
dacce0ef-c7c4-4a90-b215-5635bbc25355 2167455.0 842684.59
6a9ffb14-9507-48bb-9563-828325e4fa6f 567384.3 321210.31
12d763af-1ca1-4ce7-9995-3a4a6e72c19b 253632.4 132747.83
218031da-5342-4df4-8881-1608dd5cdd7a 229223.5 288300.16
7d47d4fe-a789-415f-a0c2-c1f2b9035f67 15861660.0 3689727.98

相关问题