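I have a requirement that certain fields of a JSON file must be of struct type for my transformation to work; otherwise I get the error: Can't extract value from <field>; need struct type but got string
Spark's schema inference does not help here, because the relevant field can be null throughout the entire JSON file, in which case Spark infers it as a string. I don't want to define my own schema, because there is a very large number of fields (>1000) and only a few of them need a forced type.
I tried casting the field after reading the file, but that fails with the error: due to data type mismatch: cannot cast string to struct<>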
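Edit: You can refer to this example of using StructType columns in a PySpark UDF for more details on the transformation I am working on.
A very high-level view of the schema: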
- root:
  - a_time:
    - time_to_resolution_remainingTime
    ....
  - b_time:
    - time_to_resolution_remainingTime
    ....
In my use case, the a_time field is entirely null, which causes Spark to infer it as a string type.
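Since only a few of the many fields are affected, it may help to first list which of the fields that are expected to be structs were actually inferred as strings. The following is a minimal sketch, assuming the schema is available as the dict returned by `df.schema.jsonValue()` and that the fields of interest sit at the top level; the function name `string_typed_fields` and the sample field names are hypothetical.

```python
def string_typed_fields(schema_json, expected_structs):
    """Return the names in expected_structs that the schema lists as plain strings.

    schema_json is a Spark schema in its JSON (dict) form, i.e. a dict with
    "type": "struct" and a "fields" list. Names missing from the schema are
    not reported.
    """
    by_name = {field["name"]: field["type"] for field in schema_json["fields"]}
    return [name for name in expected_structs if by_name.get(name) == "string"]


# Hypothetical inferred schema: a_time was all null, so it came out as string.
inferred = {
    "type": "struct",
    "fields": [
        {"name": "a_time", "type": "string", "nullable": True, "metadata": {}},
        {
            "name": "b_time",
            "type": {"type": "struct", "fields": []},
            "nullable": True,
            "metadata": {},
        },
    ],
}

mistyped = string_typed_fields(inferred, ["a_time", "b_time"])
```

Here `mistyped` would contain only the fields that need a forced type, so the remaining fields can keep whatever Spark inferred.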
...
{
"name":"time_to_resolution",
"nullable":true,
"type":{
"fields":[
{
"metadata":{
},
"name":"_links",
"nullable":true,
"type":{
"fields":[
{
"metadata":{
},
"name":"self",
"nullable":true,
"type":"string"
}
],
"type":"struct"
}
},
{
"metadata":{
},
"name":"completedCycles",
"nullable":true,
"type":{
"containsNull":true,
"elementType":{
"fields":[
{
"metadata":{
},
"name":"breached",
"nullable":true,
"type":"boolean"
},
{
"metadata":{
},
"name":"elapsedTime",
"nullable":true,
"type":{
"fields":[
{
"metadata":{
},
"name":"friendly",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"millis",
"nullable":true,
"type":"long"
}
],
"type":"struct"
}
},
{
"metadata":{
},
"name":"goalDuration",
"nullable":true,
"type":{
"fields":[
{
"metadata":{
},
"name":"friendly",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"millis",
"nullable":true,
"type":"long"
}
],
"type":"struct"
}
},
{
"metadata":{
},
"name":"remainingTime",
"nullable":true,
"type":{
"fields":[
{
"metadata":{
},
"name":"friendly",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"millis",
"nullable":true,
"type":"long"
}
],
"type":"struct"
}
},
{
"metadata":{
},
"name":"startTime",
"nullable":true,
"type":{
"fields":[
{
"metadata":{
},
"name":"epochMillis",
"nullable":true,
"type":"long"
},
{
"metadata":{
},
"name":"friendly",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"iso8601",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"jira",
"nullable":true,
"type":"string"
}
],
"type":"struct"
}
},
{
"metadata":{
},
"name":"stopTime",
"nullable":true,
"type":{
"fields":[
{
"metadata":{
},
"name":"epochMillis",
"nullable":true,
"type":"long"
},
{
"metadata":{
},
"name":"friendly",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"iso8601",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"jira",
"nullable":true,
"type":"string"
}
],
"type":"struct"
}
}
],
"type":"struct"
},
"type":"array"
}
}
Another field that has the same structure but was inferred as a string because all of its values are null:
{
"metadata":{
},
"name":"time_to_resolution_b",
"nullable":true,
"type":"string"
}
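One possible way to avoid writing the full schema by hand is to let Spark infer it, copy the type of a correctly inferred field onto the field that came out as a string, and then read the file again with the patched schema. The sketch below works on the JSON (dict) form of the schema shown above. The helper names `find_field`, `copy_field_type`, and `read_json_with_patched_schema` are hypothetical, and it assumes that both fields are reachable through struct nesting only (not through arrays) and that the source field really has the desired type. Note that this approach reads the input twice: once for inference and once with the fixed schema.

```python
import copy


def find_field(struct_type, path):
    """Return the field dict at a dotted path inside a Spark schema JSON dict."""
    current = struct_type
    field = None
    for name in path.split("."):
        if not isinstance(current, dict) or current.get("type") != "struct":
            raise KeyError(f"{path!r}: parent of {name!r} is not a struct")
        matches = [f for f in current["fields"] if f["name"] == name]
        if not matches:
            raise KeyError(f"{path!r}: no field named {name!r}")
        field = matches[0]
        current = field["type"]
    return field


def copy_field_type(schema_json, source_path, target_path):
    """Return a copy of schema_json in which target_path has source_path's type."""
    patched = copy.deepcopy(schema_json)
    source = find_field(patched, source_path)
    target = find_field(patched, target_path)
    target["type"] = copy.deepcopy(source["type"])
    return patched


def read_json_with_patched_schema(spark, path, source_path, target_path):
    """Infer the schema, patch one field, then re-read with the patched schema."""
    from pyspark.sql.types import StructType  # imported lazily; needs PySpark

    inferred = spark.read.json(path).schema.jsonValue()
    patched = copy_field_type(inferred, source_path, target_path)
    return spark.read.schema(StructType.fromJson(patched)).json(path)
```

With the field names from this question, a call might look like `read_json_with_patched_schema(spark, path, "time_to_resolution", "time_to_resolution_b")`, where `spark` is an existing SparkSession and `path` points to the JSON file. The all-null values then stay null, but the column is typed as a struct.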