python-3.x For循环pandas的替代选择

6ss1mwsb  于 4个月前  发布在  Python
关注(0)|答案(2)|浏览(67)

我需要清理和转置一个来自excel的植物学数据的框架。这里是一个CSV摘录:

Transect    Espèce  Exigence    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37
1   Acer pseudoplatanus     Sylvatique  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                                           
1   Agrimonia eupatoria Prairiale   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   1   1   0                                           
1   Agrostis stolonifera    Sylvatique  0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0                                           
1   Anagallis arvensis  Pionnière   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                                           
1   Apicaceae sp.   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                                           
1   Cirsium vulgare Rudérale    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                                           
1   Clematis vitalba    Sylvatique calcicole    0   1   0   1   0   1   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1   1   1   0   0                                           
1   Odontites vernus    Hémiparasite    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0                                           
1   Ombellifère sp. 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                                           
1   Sanguisorba minor   Pelouse calcicole   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0                                           
2   Avenula pubescens   Prairiale calcicole 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                   
2   Circaea lutetiana   Sylvatique  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                   
2   Cirsium arvense     Rudérale    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0                   
2   Lysimachia arvensis     Pionnière   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                   
2   Trifolium repens    Prairiale   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                   
2   Trifolium sp.   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0                   
2   Ulex europaeus  Landicole   1   1   0   0   0   0   0   0   1   1   1   1   0   1   1   1   1   0   0   0   0   1   1   0   1   0   0   1   0   0   0   0                   
3   Primula veris   Prairiale   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
3   Prunella vulgaris   Prairiale   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   
3   Prunus avium    Sylvatique  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
3   Prunus mahaleb  Sylvatique calcicole    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0   1   0   0   0   
3   Prunus spinosa  Sylvatique  0   0   0   0   0   1   0   0   0   0   0   0   0   1   0   0   1   0   0   1   0   0   0   0   0   0   0   1   0   1   1   0   1   0   1   0   
3   Rubus sp    Rudérale    0   0   1   1   1   1   1   1   0   1   1   1   1   1   1   1   1   1   0   1   1   1   1   0   1   1   1   1   1   0   1   1   1   1   1   1   
3   Senecio vulgaris    Rudérale    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
3   Sinapis arvensis    Pionnière calcicole 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
3   Trifolium sp.   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
3   Ulex europaeus  Landicole   1   0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

字符串
其思想是为每个唯一的“Transect”值创建一个嵌套框,然后使用“Buckence”对每个列的数据求和,然后转置嵌套框。此代码必须可在其他嵌套框上重用,其中“Buckence”列具有另一个名称,并且在它和“Transect”列之间存在更多列。
我做了一个for循环,但我想创建一个更干净的代码。我试图理解如何使用lambda函数来做到这一点和/或map/reduce/filter,但我有点迷路。这里是我到目前为止的代码:

def stat_pl(df,col):
    # spliting df in multiple dataframes on Transect unique values
    dfs = dict(tuple(df.groupby('Transect')))
    #Empty dictionnary for resulting dataframes
    dfr = {}
    n = df.Transect.max()
    for i in range(n):
        # selecting 1 dataframe
        ext=dfs[i+1]
        # dropping nan and useless columns
        ext=ext.dropna(axis=1)
        col_list=list()
        col_list.append(col)
        col_list.extend(list(ext.columns[ext.columns.get_loc(1):len(ext.columns) - 1].values))
        ext=ext[col_list]
        ext=ext.groupby(col).sum()
        ext.index.name = None
        ext=ext.transpose()
        ext.rename(columns = {0:'Sans exigence'}, inplace = True)
        ext.to_csv('t'+str(i+1)+'.csv')
        dfr[i] = ext
    return dfr

w3nuxt5m

w3nuxt5m1#

  • 我做了一个for循环,但我想创建一个更干净的代码。我试图理解如何使用lambda函数来做到这一点和/或map/reduce/filter,但我有点迷路了。*

我只使用pivot_table

def stat_pl(df, col):
    return {
        tr: (g.drop(columns="Transect")
             .pivot_table(columns=col, aggfunc="sum")
             .rename_axis(columns=None))
        for tr, g in (df.fillna({col: "Sans exigence"})
                      .drop(columns="Espèce").groupby("Transect"))
    }

字符串
另一种变体(* 可能更有效 *):

def stat_pl(df, col):
    piv = (
        df.fillna({col: "Sans exigence"}).drop(columns="Espèce")
            .pivot_table(index="Transect", columns=col, aggfunc="sum")
            .stack(0)
    )
    
    return {
        tr: piv.loc[tr].rename_axis(columns=None).dropna(how="all", axis=1)
        for tr in df["Transect"].unique()
    }


输出量:

d = stat_pl(df, "Exigence")

for tr, sub_df in d.items():
    print(f"Transect: {tr}"); print(sub_df, end="\n"*2)
    # sub_df.to_csv(f"t{tr}.csv")# uncomment this line if needed
Transect: 1
    Hémiparasite  Pelouse calcicole  ...  Sylvatique  Sylvatique calcicole
1           0.00               0.00  ...        0.00                  0.00
10          0.00               0.00  ...        0.00                  0.00
..           ...                ...  ...         ...                   ...
8           0.00               0.00  ...        0.00                  0.00
9           0.00               0.00  ...        0.00                  0.00

[37 rows x 6 columns]

Transect: 2
    Landicole  Pionnière  ...  Sans exigence  Sylvatique
1        1.00       0.00  ...           0.00        0.00
10       1.00       0.00  ...           0.00        0.00
..        ...        ...  ...            ...         ...
8        0.00       0.00  ...           0.00        0.00
9        1.00       0.00  ...           0.00        0.00

[37 rows x 7 columns]

Transect: 3
    Landicole  Pionnière calcicole  ...  Sylvatique  Sylvatique calcicole
1        1.00                 0.00  ...        0.00                  0.00
10       0.00                 0.00  ...        0.00                  0.00
..        ...                  ...  ...         ...                   ...
8        0.00                 0.00  ...        0.00                  0.00
9        0.00                 0.00  ...        0.00                  0.00

[37 rows x 7 columns]

的字符串
使用的输入:

from io import StringIO
import pandas as pd

tmp = pd.read_csv(StringIO(text_op),
      sep=r"(?<=[a-z.])\s+(?=[01])", engine="python")

def adjust(block, df=tmp):
    return (tmp.iloc[:, block].str.split("\s\s+", expand=True)
            .set_axis(df.columns[block].split(), axis=1))

df = adjust(0).join(adjust(1)).apply(pd.to_numeric, errors="ignore")

eivnm1vs

eivnm1vs2#

非常感谢!我修改了一点你的代码(当我执行它时,nan值没有正确删除)

def stat_pl(df, col):
df=df.fillna(-1)
df.loc[df.Exigence==0, 'Exigence']='Sans exigence'
piv = (
    df.drop(columns="Espèce")
        .pivot_table(index="Transect", columns=col, aggfunc="sum")
        .stack(0)
)
piv = piv.loc[piv.Sylvatique >= 0]
return {
    tr: piv.loc[tr].rename_axis(columns=None).dropna(how="all", axis=1)
    for tr in df["Transect"].unique()
}

d = stat_pl(df, col) for tr, sub_df in d.items():
     print(f"Transect: {tr}"); print(sub_df, end="\n"*2)
     sub_df.to_csv(f"t{tr}.csv")# uncomment this line if needed

字符串

相关问题