Hadoop: Spark job with the magic S3 committer fails on .pendingset files inside the _magic directory of the output path

jhdbpxl9 posted 10 months ago in Hadoop

I am trying to use the S3 magic committer in my Spark job to write data to an S3 bucket, but the job eventually fails while committing the file at the destination. Below is the error:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received. (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: 1689881089086345; S3 Extended Request ID: null; Proxy: null), S3 Extended Request ID: null
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
    at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:421)
    at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:6531)
    at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1861)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1821)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:2432)
    at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$putObject$6(WriteOperationHelper.java:517)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:115)
    ... 15 more
I tried to commit the data using the S3 magic committer.
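For context, the committer was enabled with settings along these lines. This is a minimal Scala sketch rather than the exact job; the bucket/path names are placeholders, and the committer-binding classes come from the standard S3A committer documentation and the spark-hadoop-cloud module:

import org.apache.spark.sql.SparkSession

// Minimal sketch: enable the S3A magic committer for s3a:// output.
// The two spark.sql.* classes below come from the spark-hadoop-cloud
// module, which must be on the classpath.
val spark = SparkSession.builder()
  .appName("magic-committer-job")
  // route s3a:// output through the S3A committer factory
  .config("spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a",
          "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
  // select the magic committer (the *.magic.enabled flag is needed on older Hadoop releases)
  .config("spark.hadoop.fs.s3a.committer.name", "magic")
  .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
  // bind Spark SQL / Parquet writes to the Hadoop PathOutputCommitter
  .config("spark.sql.sources.commitProtocolClass",
          "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
  .config("spark.sql.parquet.output.committer.class",
          "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
  .getOrCreate()

// hypothetical output location on the affected bucket
spark.range(1000).toDF("id")
  .write.mode("overwrite")
  .parquet("s3a://my-bucket/output/table")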

ekqde3dh1#

The computed MD5 checksum of the upload, attached as a header to the PUT of the .pendingset JSON list of all files to be committed for that task, did not match what AWS received (see the sketch at the end of this answer for how that header value is derived).
I think this is some kind of networking problem, or something HTTPS-related.
Now, that request ID is a plain number:

Request ID: 1689881089086345; S3 Extended Request ID: null;

That is not an AWS S3 request ID, which is always a UUID. It means you are using some third-party store.
Talk to the authors of that S3 store. If this is reproducible, it would be interesting to know whose store it is, even though the general position of the s3a codebase is *we don't block third-party implementations, but expect them to sort out their own compatibility problems*.
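One way to narrow this down is to write a single small object through plain S3A, outside any committer, and see whether that PUT also fails with BadDigest. A rough probe sketch, assuming the bucket name is a placeholder and credentials/endpoint are already configured for the store:

import java.net.URI
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Write one tiny object via S3A only; no committer or .pendingset involved.
// If this also fails with BadDigest, the store is rejecting checksummed
// PUTs in general, not just the magic committer's .pendingset uploads.
val conf = new Configuration()
val fs = FileSystem.get(new URI("s3a://my-bucket/"), conf)
val out = fs.create(new Path("s3a://my-bucket/tmp/md5-probe.txt"), true)
try {
  out.write("hello, store".getBytes(StandardCharsets.UTF_8))
} finally {
  out.close()   // close() performs the actual PUT for a small file
}

If the probe succeeds but the committer still fails, the problem is more likely specific to how the store handles the PUT of the small .pendingset JSON object.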
See also: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/troubleshooting_s3a.html#SdkClientException_Unable_to_verify_integrity_of_data_upload
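To make the MD5 point above concrete: the Content-MD5 header carries the Base64-encoded MD5 digest of the request body, and the store recomputes the digest over the bytes it actually received, returning BadDigest when the two differ. A small illustration only, not the Hadoop or AWS SDK code itself:

import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.Base64

// Illustration: how a Content-MD5 header value is derived from a payload.
def contentMd5(body: Array[Byte]): String =
  Base64.getEncoder.encodeToString(MessageDigest.getInstance("MD5").digest(body))

// stand-in for the small .pendingset JSON body being PUT
val payload = """{"files": ["part-00000"]}""".getBytes(StandardCharsets.UTF_8)
println(contentMd5(payload))   // the value the client would attach to the PUT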
