Apache Spark error when offloading data to AWS S3 with Hadoop

Asked by wtzytmuj on 2021-05-29, tagged Hadoop

I'm using Apache Spark and trying to offload data to AWS S3 once it has been processed, with something like `data.write().parquet("s3a://" + bucketName + "/" + location);`. The configuration seems fine:

```
String region = System.getenv("AWS_REGION");
String accessKeyId = System.getenv("AWS_ACCESS_KEY_ID");
String secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY");

spark.sparkContext().hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsRegion", region);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsAccessKeyId", accessKeyId);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", secretAccessKey);
```
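For comparison, the property names the S3A connector itself documents for credentials are `fs.s3a.access.key` and `fs.s3a.secret.key` (the `awsAccessKeyId`-style names follow the older `s3n` convention). A minimal sketch using those keys with the same variables as above; the endpoint line is an assumption about how the region is typically supplied on this Hadoop line:

```
// Sketch only: reuses the spark/region/accessKeyId/secretAccessKey variables from above.
org.apache.hadoop.conf.Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
hadoopConf.set("fs.s3a.access.key", accessKeyId);     // documented S3A credential key
hadoopConf.set("fs.s3a.secret.key", secretAccessKey); // documented S3A credential key
// Assumption: the region is expressed via the endpoint host name rather than a region key.
hadoopConf.set("fs.s3a.endpoint", "s3." + region + ".amazonaws.com");
```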
`%HADOOP_HOME%` points to exactly the same Hadoop version that Spark uses (v2.6.5) and is added to the PATH:

```
C:\>hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar                  run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp               copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  key                  manage keys via the KeyProvider
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
```
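One way to double-check, from inside the application, which Hadoop build Spark is actually running against is to print the classpath version and compare it with the `hadoop version` output from the command line; a small sketch:

```
import org.apache.hadoop.util.VersionInfo;

// Prints the Hadoop version on the Spark driver's classpath; it should read 2.6.5,
// matching what the `hadoop` command on the PATH reports.
System.out.println("Hadoop on classpath: " + VersionInfo.getVersion());
```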

My `Maven` dependencies are also set up.

Answer from f2uvfpb9:

Yes, I was missing a step: copy the binaries from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.4/bin into %HADOOP_HOME%\bin. Even though the versions don't match exactly (v2.6.5 vs. v2.6.4), this still appears to work.
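A related point, in case the environment variable alone is not picked up by the JVM: Hadoop's Windows shell utilities also honor the `hadoop.home.dir` system property, so the winutils location can be set programmatically before the SparkSession is created. A minimal sketch; the path is a placeholder for wherever the binaries were copied:

```
import java.io.File;

// Placeholder path: the directory whose bin\ folder now contains winutils.exe.
String hadoopHome = "C:\\hadoop-2.6.5";

// Fail fast with a clear message if the binary is still missing.
if (!new File(hadoopHome, "bin\\winutils.exe").isFile()) {
    throw new IllegalStateException("winutils.exe not found under " + hadoopHome + "\\bin");
}

// Hadoop reads this system property (or the HADOOP_HOME environment variable);
// it must be set before the first Hadoop/Spark class touches the local filesystem.
System.setProperty("hadoop.home.dir", hadoopHome);
```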
