Writing a struct type to Hive from PySpark

w8ntj3qf  posted on 2021-06-26  in Hive

I am trying to write a DataFrame to Hive:

df_block_identity.printSchema()

root
 |-- HUB_ID: long (nullable = false)
 |-- ClientId: string (nullable = true)
 |-- publicID: string (nullable = true)
 |-- CreationAppSource: string (nullable = true)
 |-- LastUpdateAppSource: string (nullable = true)
 |-- FirstName: string (nullable = true)
 |-- LastName: string (nullable = true)
 |-- Email: string (nullable = true)
 |-- publicID_address: string (nullable = true)
 |-- CreationAppSource_address: string (nullable = true)
 |-- LastUpdateAppSource_address: string (nullable = true)
 |-- AddressNameDesc: string (nullable = true)
 |-- AddressObjective: string (nullable = true)
 |-- AddressQuality: string (nullable = true)
 |-- City: string (nullable = true)
 |-- Country: string (nullable = true)
 |-- ExtraData: string (nullable = true)
 |-- Region: string (nullable = true)
 |-- Street1: string (nullable = true)
 |-- Street2: string (nullable = true)
 |-- Street3: string (nullable = true)
 |-- Street4: string (nullable = true)
 |-- ZipCode: string (nullable = true)
 |-- IsPrimaryAddress: string (nullable = true)
 |-- ExternalAddressID: string (nullable = true)
 |-- publicID_MOBILE: string (nullable = true)
 |-- CreationAppSource_MOBILE: string (nullable = true)
 |-- LastUpdateAppSource_MOBILE: string (nullable = true)
 |-- MOBILE: string (nullable = true)
 |-- publicID_FIXE: string (nullable = true)
 |-- CreationAppSource_FIXE: string (nullable = true)
 |-- LastUpdateAppSource_FIXE: string (nullable = true)
 |-- FIXE: string (nullable = true)
 |-- service: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- publicID_Services: string (nullable = true)
 |    |    |-- CreationAppSource_Services: string (nullable = true)
 |    |    |-- LastUpdateAppSource_Services: string (nullable = true)
 |    |    |-- ServiceTypeId: string (nullable = true)
 |    |    |-- ServiceId: string (nullable = true)
 |    |    |-- ServiceStatus: boolean (nullable = true)
 |    |    |-- ActivationDate: timestamp (nullable = true)
 |    |    |-- DeactivationDate: timestamp (nullable = true)
 |-- publicID_Title: string (nullable = true)
 |-- CreationAppSource_Title: string (nullable = true)
 |-- LastUpdateAppSource_Title: string (nullable = true)
 |-- Title: string (nullable = true)
 |-- publicID_Civility: string (nullable = true)
 |-- CreationAppSource_Civility: string (nullable = true)
 |-- LastUpdateAppSource_Civility: string (nullable = true)
 |-- Civility: string (nullable = true)
 |-- publicID_Gender: string (nullable = true)
 |-- CreationAppSource_Gender: string (nullable = true)
 |-- LastUpdateAppSource_Gender: string (nullable = true)
 |-- Gender: string (nullable = true)
 |-- publicID_MaritalStatus: string (nullable = true)
 |-- CreationAppSource_MaritalStatus: string (nullable = true)
 |-- LastUpdateAppSource_MaritalStatus: string (nullable = true)
 |-- MaritalStatus: string (nullable = true)
 |-- publicID_BirthDate: string (nullable = true)
 |-- CreationAppSource_BirthDate: string (nullable = true)
 |-- LastUpdateAppSource_BirthDate: string (nullable = true)
 |-- BirthDate: date (nullable = true)
 |-- publicID_CSP: string (nullable = true)
 |-- CreationAppSource_CSP: string (nullable = true)
 |-- LastUpdateAppSource_CSP: string (nullable = true)
 |-- CSP: string (nullable = true)
 |-- publicID_NbChildren: string (nullable = true)
 |-- CreationAppSource_NbChildren: string (nullable = true)
 |-- LastUpdateAppSource_NbChildren: string (nullable = true)
 |-- NbChildren: string (nullable = true)
 |-- publicID_PMR: string (nullable = true)
 |-- CreationAppSource_PMR: string (nullable = true)
 |-- LastUpdateAppSource_PMR: string (nullable = true)
 |-- PMR: string (nullable = true)
 |-- publicID_DegreeDisability: string (nullable = true)
 |-- CreationAppSource_DegreeDisability: string (nullable = true)
 |-- LastUpdateAppSource_DegreeDisability: string (nullable = true)
 |-- DegreeDisability: string (nullable = true)
 |-- publicID_CompanyName: string (nullable = true)
 |-- CreationAppSource_CompanyName: string (nullable = true)
 |-- LastUpdateAppSource_CompanyName: string (nullable = true)
 |-- CompanyName: string (nullable = true)
 |-- publicID_LanguageId: string (nullable = true)
 |-- CreationAppSource_LanguageId: string (nullable = true)
 |-- LastUpdateAppSource_LanguageId: string (nullable = true)
 |-- LanguageId: string (nullable = true)
 |-- publicID_NationalityId: string (nullable = true)
 |-- CreationAppSource_NationalityId: string (nullable = true)
 |-- LastUpdateAppSource_NationalityId: string (nullable = true)
 |-- NationalityId: string (nullable = true)

Sample data following this schema:

AHA d4cd8d01-6a4f-446c-838e-ded98c1e8d53    TOTO    TOTO    NULL    .   xxx@gmail.com   NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    [{"publicID_Services":"d4cd8d01-6a4f-446c-838e-ded98c1e8d53","CreationAppSource_Services":"TOTO","LastUpdateAppSource_Services":"TOTO","ServiceTypeId":"OPTINS","ServiceId":"PARTENAIRES","ServiceStatus":true,"ActivationDate":"2015-09-18 00:00:00","DeactivationDate":"9999-12-31 23:59:59.999"},{"publicID_Services":"d4cd8d01-6a4f-446c-838e-ded98c1e8d53","CreationAppSource_Services":"TOTO","LastUpdateAppSource_Services":"TOTO","ServiceTypeId":"OPTINS","ServiceId":"NEWSLETTER","ServiceStatus":true,"ActivationDate":"2015-09-18 00:00:00","DeactivationDate":"9999-12-31 23:59:59.999"}]  NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL

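I use the command df_block_identity.write.saveAsTable('sb_party_hub_dev.golden', mode='overwrite', format="parquet") and it completes without errors. I can see the table in the Hive metastore.
But when I query it from Hive with select * from sb_party_hub_dev.golden , I get this error:
java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file adl://home/hive/warehouse/sb_party_hub_dev.db/golden/part-r-00000-e3dcac27-021e-43e8-8687-01ae305d5b5d.snappy.parquet
When I drop the service field, which is of array type, the select returns the contents of the table.
What should I change in my PySpark code so that I can write the table to Hive and query it without errors?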
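The isolation step described above (dropping the array column and retrying) could be generalized with a small helper. This is only a sketch: `complex_columns` and `sample_dtypes` are hypothetical names, and the input is assumed to have the shape of a PySpark DataFrame's `dtypes` attribute, a list of (column name, type string) pairs.

```python
# Type-string prefixes Spark SQL uses for nested (complex) column types.
COMPLEX_PREFIXES = ("array<", "map<", "struct<")


def complex_columns(dtypes):
    """Return the names of columns whose type is array, map, or struct.

    `dtypes` is a list of (column_name, type_string) pairs, the shape
    returned by a PySpark DataFrame's `dtypes` attribute.
    """
    return [name for name, type_string in dtypes
            if type_string.startswith(COMPLEX_PREFIXES)]


# Illustrative subset of the schema above, written as (name, type) pairs.
sample_dtypes = [
    ("HUB_ID", "bigint"),
    ("Email", "string"),
    ("service", "array<struct<publicID_Services:string,ServiceStatus:boolean>>"),
    ("BirthDate", "date"),
]

print(complex_columns(sample_dtypes))  # ['service']
```

With a live DataFrame, df_block_identity.drop(*complex_columns(df_block_identity.dtypes)) would give the flat projection that Hive was able to read; passing several names to drop assumes Spark 2.0 or later.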
Edit:
I tried another format: df_block_identity.write.saveAsTable('sb_party_hub_dev.golden', mode='overwrite', format="orc") . With this format I can read the data through Hive. Why does it fail with Parquet?
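One thing that may be worth trying, offered as an assumption rather than a confirmed fix for this error: Spark has a documented setting, spark.sql.parquet.writeLegacyFormat, that makes it write Parquet files in the older layout used by Spark 1.4 and earlier, which Hive's Parquet reader is generally more tolerant of. The sketch below assumes `spark` is a SparkSession (Spark 2.x or later); the helper name `write_legacy_parquet_table` is hypothetical.

```python
LEGACY_FORMAT_KEY = "spark.sql.parquet.writeLegacyFormat"


def write_legacy_parquet_table(spark, df, table="sb_party_hub_dev.golden"):
    """Overwrite `table` as Parquet with Spark's legacy Parquet layout enabled.

    Assumption: `spark` is a SparkSession and `df` is a DataFrame created
    from it. Nothing here is verified against the cluster in the question.
    """
    # Remember the current value so the session is left as it was found.
    previous = spark.conf.get(LEGACY_FORMAT_KEY, "false")
    spark.conf.set(LEGACY_FORMAT_KEY, "true")
    try:
        # Same call as in the question; only the session setting differs.
        df.write.saveAsTable(table, mode="overwrite", format="parquet")
    finally:
        spark.conf.set(LEGACY_FORMAT_KEY, previous)
```

It would be called as write_legacy_parquet_table(spark, df_block_identity). Files written before the setting was changed are not affected, so the table has to be rewritten for the change to show up in Hive.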
