Recovering feature names of StandardScaler().fit_transform() with sklearn

Posted by ckocjqey 4 months ago in Python


Adapting a tutorial on Kaggle, I am trying to run the code below on this data (available to download from here):
Code:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from datetime import datetime, date
from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
import xgboost as xgb
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("./data/Aquifer_Petrignano.csv")

df['Date'] = pd.to_datetime(df.Date, format = '%d/%m/%Y')
df = df[df.Rainfall_Bastia_Umbra.notna()].reset_index(drop=True)

df = df.ffill()  # forward-fill gaps; interpolate(method='ffill') is deprecated in recent pandas
df = df[['Date', 'Rainfall_Bastia_Umbra', 'Depth_to_Groundwater_P24', 'Depth_to_Groundwater_P25', 'Temperature_Bastia_Umbra', 'Temperature_Petrignano', 'Volume_C10_Petrignano', 'Hydrometry_Fiume_Chiascio_Petrignano']].resample('7D', on='Date').mean().reset_index(drop=False)

X = df.drop(['Depth_to_Groundwater_P24','Depth_to_Groundwater_P25','Date'], axis=1)
y1 = df.Depth_to_Groundwater_P24
y2 = df.Depth_to_Groundwater_P25

scaler = StandardScaler()
X = scaler.fit_transform(X)

model = xgb.XGBRegressor()
param_search = {'max_depth': range(1, 2, 2),
                'min_child_weight': range(1, 2, 2),
                'n_estimators' : [1000],
                'learning_rate' : [0.1]}

tscv = TimeSeriesSplit(n_splits=2)
gsearch = GridSearchCV(estimator=model, cv=tscv,
                        param_grid=param_search)
gsearch.fit(X, y1)

xgb_grid = xgb.XGBRegressor(**gsearch.best_params_)
xgb_grid.fit(X, y1)

ax = xgb.plot_importance(xgb_grid)
ax.figure.tight_layout()
ax.figure.savefig('test.png')

y_val = y1[-80:]
X_val = X[-80:]

y_pred = xgb_grid.predict(X_val)
print(mean_absolute_error(y_val, y_pred))
print(math.sqrt(mean_squared_error(y_val, y_pred)))

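This plots a feature importance graph in which the original feature names are hidden:

[image: feature importance plot without the original feature names]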
If I comment out these two lines:

scaler = StandardScaler()
X = scaler.fit_transform(X)

I get this output:

[image: feature importance plot labeled with the original feature names]
How can I apply scaler.fit_transform() to X and still get a feature importance plot with the original feature names?

python-3.x

Source: https://stackoverflow.com/questions/71509883/recovering-features-names-of-standardscaler-fit-transform-with-sklearn

2 answers


wa7juj8i #1

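The reason is that StandardScaler returns a numpy.ndarray of the scaled feature values (same shape as X.values, but standardized). An ndarray carries no column names, so you need to convert it back to a pandas.DataFrame with the original column names.
Here is the part of the code that needs to change: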

scaler = StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
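A minimal, self-contained sketch of this fix on a toy DataFrame. The column names and values are made up for illustration and are not taken from the Aquifer data. It also passes index=X.index, which the snippet above omits, so that the scaled frame stays aligned with X when X has a non-default index.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data; column names and values are hypothetical.
X = pd.DataFrame(
    {"rainfall": [0.0, 2.5, 1.0, 4.5], "temperature": [10.0, 12.0, 14.0, 16.0]},
    index=pd.date_range("2020-01-01", periods=4, freq="7D"),
)

scaler = StandardScaler()
scaled = scaler.fit_transform(X)  # plain numpy.ndarray: the column names are gone

# Wrap the array back into a DataFrame with the original labels.
X_scaled = pd.DataFrame(scaled, columns=X.columns, index=X.index)

print(type(scaled).__name__)   # ndarray
print(list(X_scaled.columns))  # ['rainfall', 'temperature']
```

Fitting the model on X_scaled instead of the bare array should then let xgb.plot_importance pick up the column names, as it does for the unscaled DataFrame in the question.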


rryofs0p #2

You can now use the set_output method, available as of scikit-learn 1.2 (see here for details):

scaler = StandardScaler().set_output(transform="pandas")
