I have a DataFrame df with four columns: id, ts, lat, and lon. If I run df.schema() in debug mode, I get:
0 = {StructField@13126} "StructField(id,LongType,true)"
name = "id"
dataType = {LongType$@12993} "LongType"
nullable = true
metadata = {Metadata@13065} "{"encoding":"UTF-8"}"
1 = {StructField@13127} "StructField(ts,LongType,true)"
name = "timestamp"
dataType = {LongType$@12993} "LongType"
nullable = true
metadata = {Metadata@13069} "{"encoding":"UTF-8"}"
2 = {StructField@13128} "StructField(lat,DoubleType,true)"
name = "position_lat"
dataType = {DoubleType$@13034} "DoubleType"
nullable = true
metadata = {Metadata@13073} "{"encoding":"UTF-8"}"
3 = {StructField@13129} "StructField(lon,DoubleType,true)"
name = "position_lon"
dataType = {DoubleType$@13034} "DoubleType"
nullable = true
metadata = {Metadata@13076} "{"encoding":"UTF-8"}"
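Now I want to get rid of all the metadata: "{"encoding":"ZSTD"}" should be replaced by "" for each column. Note that my actual table has many columns, so the solution needs to be reasonably generic. Thanks in advance!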
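To make the request concrete, here is a minimal sketch of the generic operation being asked for, written in Scala against the Spark SQL API. It is one possible approach under stated assumptions, not a confirmed answer from this thread: the object and method names (StripMetadata, stripMetadata) are hypothetical, and the sketch only clears metadata on top-level fields, not on fields nested inside structs.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{Metadata, StructType}

// Hypothetical helper; the name is chosen for illustration only.
object StripMetadata {
  // Returns a DataFrame with the same rows, column names, types and
  // nullability as `df`, but with every top-level field's metadata
  // replaced by Metadata.empty. Nested struct fields are left untouched.
  def stripMetadata(df: DataFrame): DataFrame = {
    // StructField is a case class, so copy() changes only the metadata.
    val cleanedSchema = StructType(
      df.schema.fields.map(_.copy(metadata = Metadata.empty))
    )
    // Rebuild the DataFrame from the underlying rows with the cleaned schema.
    df.sparkSession.createDataFrame(df.rdd, cleanedSchema)
  }
}
```

It would be called as StripMetadata.stripMetadata(df) and works for any number of columns because it maps over the schema rather than naming columns. Going through df.rdd may cost some query-planning optimizations, so this is a trade-off of simplicity against performance.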
1 answer
rseugnpd #1
You can use encode("xx", "ignore").
Example: