How to change variable type when working with pandas-profiling?

Posted by sqougxex 4 months ago in Python


To reproduce the issue (notebook, data, output): github link

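My dataset has a Contract variable/column whose values all look like numbers, but they are actually categorical.
When the data is read with pandas, the info shows that the column is read as int. Since the Contract variable is a category (according to the metadata I received), I manually changed the variable type as follows: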

df['Contract'] = df['Contract'].astype('category')  # the pandas dtype name is 'category'
df.dtypes  # shows the modified dtype now
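The cast itself can be checked in isolation. The following is a minimal sketch on a small made-up frame (the values are hypothetical, not the asker's data). Note that pandas names this dtype 'category'; 'categorical' is not a dtype string pandas accepts.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: contract codes stored as integers.
df = pd.DataFrame({"Contract": [1, 2, 1, 3, 2]})
dtype_before = str(df["Contract"].dtype)  # an integer dtype, as read by pandas

# 'category' is the pandas dtype name; 'categorical' is not a valid dtype string.
df["Contract"] = df["Contract"].astype("category")
dtype_after = str(df["Contract"].dtype)

print(dtype_before, "->", dtype_after)
```

After the cast, `df["Contract"].cat.categories` lists the distinct codes, which is a quick way to confirm that pandas now treats the column as categorical.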

Then I tried to get a report from pandas_profiling. The generated report shows Contract interpreted as a real number, even though I changed its type from int to str/category.

# Tried both, but resulted in same.
ProfileReport(df)
df.profile_report()

Could you explain the correct way to make pandas_profiling interpret the data, i.e. to treat the Contract variable as categorical?

python-3.x

Source: https://stackoverflow.com/questions/65805316/how-to-change-variable-type-when-working-with-pandas-profiling

2 Answers


9w11ddsr #1

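I am answering this question a long time after posting it. After raising an issue and creating a pull request for this on the pandas-profiling GitHub page, I almost forgot about the question. I thank IampShadesDrifter for reminding me to close it by answering.
This behaviour of pandas-profiling is actually expected. pandas-profiling tries to infer the data type that best suits each column, and that is how it was originally written. Since there was no way around this, it drove me to create my first pull request on GitHub.
Now, with the newly added infer_dtypes parameter in ProfileReport/profile_report, we can explicitly ask pandas-profiling not to infer any data types and to use the data types from pandas (df.dtypes) instead.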

from pandas_profiling import ProfileReport

# for the df in the question,

df['Contract'] = df['Contract'].astype('category')

# `Contract` now has dtype `category`, as cast above.
# With `infer_dtypes=False`, `pandas-profiling` does not infer dtypes on its own
# and instead uses the dtypes as understood by pandas.
ProfileReport(df, infer_dtypes=False)  # or
df.profile_report(infer_dtypes=False)

If you find something worth mentioning, feel free to contribute to this answer.


8oomwypt #2

Another approach is to override the column types:

prof = ProfileReport(
    df,
    config_file="config.yaml",
    type_schema={
        "column_1": "categorical",
        "column_2": "categorical",
    }
)
prof.to_file("profile.html")
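When the columns have already been cast in pandas, the `type_schema` dictionary does not have to be typed out by hand. The sketch below is one possible way to derive it from `df.dtypes`; the helper name `schema_from_dtypes` and the sample data are hypothetical, and the "categorical" label is taken from the snippet above rather than from a checked list of supported type names.

```python
import pandas as pd


def schema_from_dtypes(df):
    """Map every column whose pandas dtype is 'category' to the "categorical" label."""
    categorical_columns = df.select_dtypes(include="category").columns
    return {col: "categorical" for col in categorical_columns}


# Hypothetical sample: one integer-coded category and one genuine numeric column.
sample = pd.DataFrame({"Contract": [1, 2, 1], "Charges": [10.5, 20.0, 7.25]})
sample["Contract"] = sample["Contract"].astype("category")

schema = schema_from_dtypes(sample)
print(schema)
```

Assuming the installed version accepts `type_schema` as in the answer above, the result could then be passed as `ProfileReport(sample, type_schema=schema)`.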

