如何修复athena中多个列上的配置单元\u光标\u错误

nwsw7zdq  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(200)

我试图在aws雅典娜中执行以下select语句:

SELECT
    col_1,
    col_2
FROM "my_database"."my_table"
WHERE
        partition_1='20171130'
    AND
        partition_2='Y'
LIMIT 10

我有个错误:

Your query has the following error(s):

HIVE_CURSOR_ERROR: Can not read value at 0 in block 0 in file s3://my-s3-path/my-table/partition_1=20171130/partition_2=Y/part-1111-11111111-1111-1111-1111-111111111111.snappy.parquet

This query ran against the "my_database" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 1111111-1111-1111-1111-111111111111.

但当我只删除一列它的工作!e、 g.选择一列作品:
SELECT col_1 FROM "my_database"."my_table" WHERE partition_1='20171130' AND partition_2='Y' LIMIT 10 SELECT col_2 FROM "my_database"."my_table" WHERE partition_1='20171130' AND partition_2='Y' LIMIT 10 我还发现,我可以向select语句添加多个列,但仅使用一些组合就失败了。但为什么呢?表定义为:

{
  "Table": {
    "StorageDescriptor": {
      "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
      "SortColumns": [],
      "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
      "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        "Parameters": {
          "serialization.format": "1"
        }
      },
      "BucketColumns": [],
      "Parameters": {
        "CrawlerSchemaDeserializerVersion": "1.0",
        "compressionType": "none",
        "UPDATED_BY_CRAWLER": "myCrawler",
        "classification": "parquet",
        "recordCount": "40190451",
        "typeOfData": "file",
        "CrawlerSchemaSerializerVersion": "1.0",
        "objectCount": "18",
        "averageRecordSize": "35",
        "exclusions": "[\"s3://my-s3-path/my_table/_**\"]",
        "sizeKey": "1078884110"
      },
      "Location": "s3://my-s3-path/my-table/",
      "NumberOfBuckets": -1,
      "StoredAsSubDirectories": false,
      "Columns": [
        {
          "Type": "smallint",
          "Name": "col_1"
        },
        {
          "Type": "decimal(18,6)",
          "Name": "col_2"
        }
      ],
      "Compressed": false
    },
    "UpdateTime": 1515503623.0,
    "PartitionKeys": [
      {
        "Type": "string",
        "Name": "partition_1"
      },
      {
        "Type": "string",
        "Name": "partition_2"
      }
    ],
    "Name": "my_table",
    "Parameters": {
      "CrawlerSchemaDeserializerVersion": "1.0",
      "compressionType": "none",
      "UPDATED_BY_CRAWLER": "myCrawler",
      "classification": "parquet",
      "recordCount": "40190451",
      "typeOfData": "file",
      "CrawlerSchemaSerializerVersion": "1.0",
      "objectCount": "18",
      "averageRecordSize": "35",
      "exclusions": "[\"s3://my-s3-path/my_table/_**\"]",
      "sizeKey": "1078884110"
    },
    "LastAccessTime": 1515503623.0,
    "CreatedBy": "arn:aws:sts::111111111111:assumed-role/MyRole/myCrawler",
    "TableType": "EXTERNAL_TABLE",
    "Owner": "owner",
    "CreateTime": 1515503623.0,
    "Retention": 0
  }
}
e37o9pze

e37o9pze1#

通常,此错误是由于模式或文件中的数据不匹配造成的。对于parquet,请确保文件的模式与配置单元模式完全匹配。甚至列的顺序有时也很重要,例如structs。
此外,如果输出文件行不包含列的任何数据,则parquet编写器有时会删除稀疏字段。这在分区的情况下很常见。

相关问题