Convert a standard JSON file to JSON SerDe format using Python and upload it to an AWS S3 bucket for Amazon Athena (Presto, Hive)

Posted by bq8i3lrv on 2021-06-24 in Hive

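I am trying to convert a JSON file to JSON SerDe format and use Python to upload the JSON SerDe file to an AWS S3 bucket, so that Amazon Athena (Presto/Hive) can read the file in the S3 bucket.
According to the AWS product documentation, a typical JSON file is not a valid format; the JSON file needs to be in JSON SerDe format: https://docs.aws.amazon.com/athena/latest/ug/json-serde.html
Locally, I can convert the JSON file to JSON SerDe format with the following code: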

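import json

# Load the original file, which holds a JSON array of records.
with open('xx_original_file.json', 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

# Serialize each record onto its own line.
result = [json.dumps(record) for record in data]
with open('xx_new_file.json', 'w', encoding='utf-8') as obj:
    for i in result:
        obj.write(i + '\n')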
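To make the target layout concrete, here is a minimal sketch of the same conversion done in memory. The helper name `to_json_lines` and the sample records are hypothetical and only stand in for the contents of `xx_original_file.json`; the sketch assumes the original file holds a JSON array of objects.

```python
import json


def to_json_lines(records):
    """Serialize an iterable of records as one JSON document per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Hypothetical sample data standing in for the parsed original file.
sample = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
json_lines = to_json_lines(sample)

# Each record now sits on its own line, with no enclosing array or commas.
print(json_lines, end="")
```

Every line of the result can be parsed on its own with `json.loads`, which is the property the one-record-per-line layout relies on.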

Is there an equivalent way in Python to store a new JSON SerDe file in the S3 bucket? So far, the Python script I have built keeps raising an error:

import json
import os
import boto3

s3 = boto3.client('s3')
bucket = 'my_bucket_name'
key = 'xx_original_file.json'
response = s3.get_object(Bucket=bucket,Key=key)
content = response['Body']
jsonObject = json.loads(content.read())
result = [json.dumps(record) for record in jsonObject]
new_results = []
for i in result:
    new_results.append(i+'\n')
new_key = 'xx_new_file.json'
s3.put_object(Bucket=bucket,Key=new_key,Body=new_results)

Error message: ParamValidationError: Parameter validation failed: Invalid type for parameter Body, value: {json data}, type: <class 'list'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

Hive JSON python amazon-s3 amazon-athena

Source: https://stackoverflow.com/questions/63274172/convert-standard-json-file-to-json-serde-format-using-python-upload-to-aws-s3

1 Answer

cxfofazt1#

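This turned out to be a simple fix: `put_object` does not accept a list as `Body`, so I needed to join the list into a single string and then encode that string to bytes.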

import json
import boto3

s3 = boto3.client('s3')
bucket = 'my_bucket_name'
key = 'xx_original_file.json'

# Download and parse the original JSON array.
response = s3.get_object(Bucket=bucket, Key=key)
content = response['Body']
jsonObject = json.loads(content.read())

# Join the records into one newline-delimited string, then encode it to bytes.
result = "\n".join([json.dumps(record) for record in jsonObject])
body = result.encode('utf-8')

# Upload the converted file.
new_bucket = 'my_bucket_name'
new_key = 'xx_new_file.json'
s3.put_object(Bucket=new_bucket, Key=new_key, Body=body)
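The same steps can be wrapped in a small reusable function. This is only a sketch under stated assumptions: the names `records_to_body` and `convert_object` are hypothetical, and the `s3_client` argument is assumed to be a client such as `boto3.client('s3')` that is created elsewhere and passed in, so the block itself does not import boto3.

```python
import json


def records_to_body(records):
    """Return the records as newline-delimited JSON, encoded to UTF-8 bytes."""
    text = "\n".join(json.dumps(record) for record in records)
    return text.encode("utf-8")


def convert_object(s3_client, bucket, src_key, dst_key):
    """Read a JSON array from src_key and store it under dst_key, one record per line.

    s3_client is assumed to expose get_object and put_object the way a
    boto3 S3 client does.
    """
    response = s3_client.get_object(Bucket=bucket, Key=src_key)
    records = json.loads(response["Body"].read())
    body = records_to_body(records)
    # Body must be bytes, a bytearray, or a file-like object, not a list.
    s3_client.put_object(Bucket=bucket, Key=dst_key, Body=body)
    return body
```

A call might look like `convert_object(boto3.client('s3'), 'my_bucket_name', 'xx_original_file.json', 'xx_new_file.json')`. Passing the client in as an argument also makes it possible to try the function against a stand-in object without touching S3.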

Posted 2021-06-24
