我需要清理和转置一个来自excel的植物学数据的框架。这里是一个CSV摘录:
Transect Espèce Exigence 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
1 Acer pseudoplatanus Sylvatique 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 Agrimonia eupatoria Prairiale 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0
1 Agrostis stolonifera Sylvatique 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 Anagallis arvensis Pionnière 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 Apicaceae sp. 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 Cirsium vulgare Rudérale 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 Clematis vitalba Sylvatique calcicole 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0
1 Odontites vernus Hémiparasite 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
1 Ombellifère sp. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 Sanguisorba minor Pelouse calcicole 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
2 Avenula pubescens Prairiale calcicole 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Circaea lutetiana Sylvatique 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Cirsium arvense Rudérale 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
2 Lysimachia arvensis Pionnière 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Trifolium repens Prairiale 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Trifolium sp. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 Ulex europaeus Landicole 1 1 0 0 0 0 0 0 1 1 1 1 0 1 1 1 1 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0
3 Primula veris Prairiale 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 Prunella vulgaris Prairiale 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
3 Prunus avium Sylvatique 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 Prunus mahaleb Sylvatique calcicole 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
3 Prunus spinosa Sylvatique 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0
3 Rubus sp Rudérale 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1
3 Senecio vulgaris Rudérale 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 Sinapis arvensis Pionnière calcicole 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 Trifolium sp. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 Ulex europaeus Landicole 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
字符串
其思想是为每个唯一的“Transect”值创建一个嵌套框,然后使用“Buckence”对每个列的数据求和,然后转置嵌套框。此代码必须可在其他嵌套框上重用,其中“Buckence”列具有另一个名称,并且在它和“Transect”列之间存在更多列。
我做了一个for循环,但我想创建一个更干净的代码。我试图理解如何使用lambda函数来做到这一点和/或map/reduce/filter,但我有点迷路。这里是我到目前为止的代码:
def stat_pl(df,col):
# spliting df in multiple dataframes on Transect unique values
dfs = dict(tuple(df.groupby('Transect')))
#Empty dictionnary for resulting dataframes
dfr = {}
n = df.Transect.max()
for i in range(n):
# selecting 1 dataframe
ext=dfs[i+1]
# dropping nan and useless columns
ext=ext.dropna(axis=1)
col_list=list()
col_list.append(col)
col_list.extend(list(ext.columns[ext.columns.get_loc(1):len(ext.columns) - 1].values))
ext=ext[col_list]
ext=ext.groupby(col).sum()
ext.index.name = None
ext=ext.transpose()
ext.rename(columns = {0:'Sans exigence'}, inplace = True)
ext.to_csv('t'+str(i+1)+'.csv')
dfr[i] = ext
return dfr
型
2条答案
按热度按时间w3nuxt5m1#
我只使用
pivot_table
:字符串
另一种变体(* 可能更有效 *):
型
输出量:
的字符串
使用的输入:
型
eivnm1vs2#
非常感谢!我修改了一点你的代码(当我执行它时,nan值没有正确删除)
字符串