S3 lifecycle error on AWS EMR when reading from AWS Redshift with PySpark

ckocjqey posted on 2021-07-12 in Spark

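I am trying to read a Redshift table into an EMR cluster using PySpark. I am currently running my code in the pyspark shell, but eventually I want to turn it into a script that I can submit with spark-submit. I use four JAR files so that PySpark can connect to Redshift and read data from it.
I start pyspark with: pyspark --jars minimal-json-0.9.5.jar,RedshiftJDBC4-no-awssdk-1.2.41.1065.jar,spark-avro_2.11-3.0.0.jar,spark-redshift_2.10-2.0.1.jar and then run the following code: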

key = "<key>"
secret = "<secret>"

redshift_url = "jdbc:redshift://<cluster>:<port>/<dbname>?user=<username>&password=<password>"
redshift_query = "select * from test"
# The S3 credentials are embedded directly in the temp directory URI
redshift_temp_s3 = "s3a://{}:{}@<bucket-name>/".format(key, secret)

# The parentheses let the method chain span several lines
data = (
    spark.read.format("com.databricks.spark.redshift")
    .option("url", redshift_url)
    .option("query", redshift_query)
    .option("tempdir", redshift_temp_s3)
    .option("forward_spark_s3_credentials", "true")
    .load()
)

Error stack trace:

WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration
com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket is not valid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidBucketName; Request ID: FS6MDX8P2MBG5T0G; S3 Extended Request ID: qH1q9y1C2EWIozr3WH2Qt7ujoBCpwLuJW6W77afE2SKrDiLOnKvhGvPC8mSWxDKmR6Dx0AlyoB4=; Proxy: null), S3 Extended Request ID: qH1q9y1C2EWIozr3WH2Qt7ujoBCpwLuJW6W77afE2SKrDiLOnKvhGvPC8mSWxDKmR6Dx0AlyoB4=
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1828)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1412)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1374)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)

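It then waits a few seconds and shows the correct output. I also see the folder created in the S3 bucket. I have not enabled bucket versioning, but a lifecycle has been created for it. I don't understand why it shows the error first and then the correct output.
Any help would be appreciated.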
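
One thing that may be worth trying, offered as an untested sketch rather than a confirmed fix: the message is logged at WARN level, which is consistent with the read carrying on afterwards, and the InvalidBucketName error could come from the lifecycle check treating the <key>:<secret>@<bucket-name> part of the tempdir URI as the bucket name. Assuming that is the cause, the credentials can be set in the Hadoop configuration through the standard S3A properties fs.s3a.access.key and fs.s3a.secret.key, so that the tempdir URI contains only the bucket. The helper names below (build_reader_options, set_s3a_credentials, read_redshift) are hypothetical, and it is an assumption that this version of spark-redshift picks up the S3A keys from the Hadoop configuration when forward_spark_s3_credentials is enabled.

```python
def build_reader_options(jdbc_url, query, bucket, prefix=""):
    """Return spark-redshift reader options with a credential-free tempdir URI."""
    # Only the bucket and an optional key prefix go into the URI, no key/secret.
    tempdir = "s3a://{}/{}".format(bucket, prefix.lstrip("/"))
    return {
        "url": jdbc_url,
        "query": query,
        "tempdir": tempdir,
        "forward_spark_s3_credentials": "true",
    }


def set_s3a_credentials(spark, access_key, secret_key):
    """Put the S3 credentials into the Hadoop configuration of a SparkSession."""
    # _jsc is a private PySpark attribute, commonly used to reach the Hadoop conf.
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.access.key", access_key)
    hadoop_conf.set("fs.s3a.secret.key", secret_key)


def read_redshift(spark, options):
    """Run the same read as in the question, driven by an options dict."""
    return (
        spark.read.format("com.databricks.spark.redshift")
        .options(**options)
        .load()
    )
```

In a script for spark-submit, the session would come from SparkSession.builder.getOrCreate(), followed by set_s3a_credentials(spark, key, secret) and read_redshift(spark, build_reader_options(redshift_url, redshift_query, "<bucket-name>")). The same --jars list would still need to be passed to spark-submit.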
