I am trying to create a Dataset from strings in a JUnit test, as shown below.
SparkSession sparkSession = SparkSession.builder().appName("Job Test").master("local[*]")
.getOrCreate();
String some1_json = readFileAsString("some1.json");
String some2_json = readFileAsString("some2.json");
String id = "some_id";
List<String[]> rowStrs = new ArrayList<>();
rowStrs.add(new String[] {id, some1_json, some2_json});
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkSession.sparkContext());
JavaRDD<Row> rowRDD = javaSparkContext.parallelize(rowStrs).map(RowFactory::create);
StructType schema = new StructType(new StructField[]{
DataTypes.createStructField("id", DataTypes.StringType, false),
DataTypes.createStructField("some1_json", DataTypes.StringType, false),
DataTypes.createStructField("some2_json", DataTypes.StringType, false)});
Dataset<Row> datasetUnderTest = sparkSession.sqlContext().createDataFrame(rowRDD, schema);
datasetUnderTest.show();
But I get the following error:
java.lang.ExceptionInInitializerError
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:103)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.internal.BaseSessionStateBuilder.catalog$lzycompute(BaseSessionStateBuilder.scala:133)
...
....
Caused by: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2284)
...
...
What am I missing? My main method works fine, but this test fails. It looks like something is not being read correctly from the classpath.
1 Answer
Fixed it by excluding the dependency below from all Spark-related dependencies.
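Before deciding what to exclude, it can help to see which jars the relevant Hadoop classes are loaded from on the test classpath. The stack trace shows FileSystem.loadFileSystems failing on a DistributedFileSystem that does not implement getScheme, which often (though this is an assumption about this particular setup) means that jars from different Hadoop versions are mixed on the classpath. The following is only a minimal diagnostic sketch: the class name ClasspathProbe and the helper locationOf are hypothetical, and the fully qualified name org.apache.hadoop.hdfs.DistributedFileSystem is assumed, since the trace only gives the simple class name.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    // Returns where a class would be loaded from, without running its static initializers.
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (JDK classes) have no code source.
            return src == null ? "bootstrap/JDK (no code source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        } catch (LinkageError e) {
            return "failed to link: " + e;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.fs.FileSystem",              // appears in the stack trace
            "org.apache.hadoop.hdfs.DistributedFileSystem"  // assumed fully qualified name
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Running this from the test classpath (for example, by calling locationOf inside the JUnit test) prints the jar each class comes from. If the two classes resolve to jars with different Hadoop versions, that mismatch is a likely candidate for the exclusion.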