I have a 1M-row df in which one column is always a 5000-character string of A-Z and 0-9.
I parse the long column into 972 columns:
def parse_long_string(df):
    df['a001'] = df['long_string'].str[0:2]
    df['a002'] = df['long_string'].str[2:4]
    df['a003'] = df['long_string'].str[4:13]
    df['a004'] = df['long_string'].str[13:22]
    df['a005'] = df['long_string'].str[22:31]
    df['a006'] = df['long_string'].str[31:40]
    ....
    df['a972'] = df['long_string'].str[4994:]
    return df
When I call the function, I get the following warning: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling
frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
Reading up on "PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert
many times, which has poor performance", the problem appears when creating > 100 columns without specifying a dtype for the new columns; each one automatically becomes a string column.
Is there another way to deal with this, other than suppressing the warning with warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)?
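Besides suppressing it, the warning text itself points to a second option: let the fragmented frame be built and defragment it once at the end with .copy(). A minimal sketch of that approach (only two of the 972 inserts are shown; parse_and_defragment is a hypothetical wrapper name, not from the question):

```python
import warnings
import pandas as pd

def parse_and_defragment(df):
    # Suppress the warning only inside this function, then return a
    # defragmented copy, as the warning message itself suggests.
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', pd.errors.PerformanceWarning)
        df['a001'] = df['long_string'].str[0:2]
        # ... the remaining ~970 slice assignments go here ...
        df['a972'] = df['long_string'].str[4994:]
    # .copy() consolidates the many single-column blocks into a few
    # contiguous ones, so downstream operations are no longer slow.
    return df.copy()
```

This keeps the original slicing code untouched, but still pays the cost of 972 individual inserts before the final copy.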
1 Answer
I don't know how you ended up with a layout like that, but yes, I can trigger the PerformanceWarning with a similar frame/code. So, here is a possible solution to get rid of the warning, using pd.concat:
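The answer's original code block was not preserved in this copy; the following is a sketch of the concat approach it describes, under the assumption that the 972 slice boundaries can be listed as (name, start, stop) tuples (only the first few and the last are shown):

```python
import pandas as pd

def parse_long_string(df):
    # Field boundaries from the question: (column name, start, stop).
    # The real list has 972 entries; this sketch shows a few.
    fields = [
        ('a001', 0, 2),
        ('a002', 2, 4),
        ('a003', 4, 13),
        # ... remaining fields ...
        ('a972', 4994, 5000),
    ]
    # Build all new columns first, then join them in a single concat,
    # so pandas allocates the result once instead of inserting 972
    # columns one at a time (which is what fragments the frame).
    parts = {name: df['long_string'].str[start:stop]
             for name, start, stop in fields}
    return pd.concat([df, pd.DataFrame(parts)], axis=1)
```

Because every column is created in one pd.concat(axis=1) call, no PerformanceWarning is raised and the resulting frame is not fragmented.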
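The output and input code blocks from the answer were lost in extraction. For reference, test input matching the question's description (rows of fixed-length random strings over A-Z and 0-9) can be generated with a sketch like this; make_input is a hypothetical helper name, not the answerer's original:

```python
import numpy as np
import pandas as pd

def make_input(n_rows, length=5000, seed=0):
    # Draw random indices into the 36-character alphabet and join each
    # row of characters into one fixed-length string.
    rng = np.random.default_rng(seed)
    alphabet = np.array(list('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'))
    rows = rng.integers(0, len(alphabet), size=(n_rows, length))
    return pd.DataFrame({'long_string': [''.join(alphabet[r]) for r in rows]})

df = make_input(100)  # use 1_000_000 rows to reproduce the question's scale
```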