scala - why is the spark-hive dependency needed in a submodule even though it is already used in the common module

xam8gpfp · posted 2021-05-27 in Spark

I am setting up a multi-module Maven project for Spark and Hive.
The structure is:

MainProject
    -Common
    -ProjectSpecificTransformations

In the test cases of both the Common and ProjectSpecificTransformations modules, I instantiate a SparkSession with Hive support:

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("testing")
  .config("spark.sql.catalogImplementation", "hive")
  .enableHiveSupport()
  .getOrCreate()
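
For context, a minimal sketch of how such a test case might look (assuming ScalaTest 3.0's FunSuite; the class and test names are illustrative, not taken from the project):

import org.apache.spark.sql.SparkSession
import org.scalatest.FunSuite

// Hypothetical smoke test: touching the catalog forces the
// HiveSessionStateBuilder to be instantiated, which is exactly
// where the failure reported below occurs.
class HiveSessionSmokeTest extends FunSuite {
  test("Hive-enabled SparkSession starts and can reach the catalog") {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("testing")
      .config("spark.sql.catalogImplementation", "hive")
      .enableHiveSupport()
      .getOrCreate()
    try {
      assert(spark.catalog.databaseExists("default"))
    } finally {
      spark.stop()
    }
  }
}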

In my parent pom, I declare the dependencies via dependencyManagement:

<properties>
        <jdk.version>1.8</jdk.version>
        <scala.version>2.11.8</scala.version>
        <scala.core.version>2.11</scala.core.version>
        <spark.core.version>2.2.0</spark.core.version>
</properties> 
<dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>${scala.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_${scala.core.version}</artifactId>
                <version>${spark.core.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-hive_${scala.core.version}</artifactId>
                <version>${spark.core.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_${scala.core.version}</artifactId>
                <version>${spark.core.version}</version>
            </dependency>
            <dependency>
                <groupId>org.scalatest</groupId>
                <artifactId>scalatest-funsuite_${scala.core.version}</artifactId>
                <version>3.0.0-SNAP13</version>
            </dependency>
            <dependency>
                <groupId>org.scalatest</groupId>
                <artifactId>scalatest-funsuite_sjs0.6_${scala.core.version}</artifactId>
                <version>3.0.0-SNAP13</version>
            </dependency>
            <dependency>
                <groupId>org.scalatest</groupId>
                <artifactId>scalatest_${scala.core.version}</artifactId>
                <version>3.0.0</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>org.scalatest</groupId>
                <artifactId>scalatest-junit_${scala.core.version}</artifactId>
                <version>3.0.0-SNAP13</version>
            </dependency>
        </dependencies>
</dependencyManagement>

In my Common module, I add the Spark dependencies as follows:

<dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.core.version}</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.core.version}</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.core.version}</artifactId>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest-funsuite_${scala.core.version}</artifactId>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest-funsuite_sjs0.6_${scala.core.version}</artifactId>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_${scala.core.version}</artifactId>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest-junit_${scala.core.version}</artifactId>
        </dependency>
    </dependencies>

When I build the common jar with Maven (clean install), the test cases pass and the jar is built correctly.
Next I build the ProjectSpecificTransformations module, which uses the Common module as a dependency.
Its pom looks like this:

<dependencies>
        <dependency>
            <groupId>projectGroupID</groupId>
            <artifactId>ProjectArtifactId-common</artifactId>
            <version>1.0.0</version>
        </dependency>
</dependencies>
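
To check whether spark-hive actually reaches this module transitively through the common artifact (my own suggestion, not something from the post), Maven's dependency:tree goal can filter for it. Note that dependencyManagement in the parent only pins versions; the dependency itself must still arrive either directly or transitively:

mvn dependency:tree -Dincludes=org.apache.spark:spark-hive_2.11

If spark-hive shows up under the common dependency in that output, Maven does consider it part of this module's classpath.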

When I run the test cases in IntelliJ, everything works fine.
But when I build through Maven, I get the following error:

Cause: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1053)
  at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
  at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:129)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:126)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:938)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:938)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  ...
  Cause: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
  at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:193)
  at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
  at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1050)
  ...
  Cause: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
  ...
  Cause: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
  at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
  at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  ...
  Cause: java.lang.reflect.InvocationTargetException:
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
  at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
  at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
  at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
  ...
  Cause: javax.jdo.JDOFatalInternalException: Unexpected exception caught.
  at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193)
  at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
  at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
  at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
  at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
  at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
  at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
  at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
  at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
  at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
  ...
  Cause: java.lang.reflect.InvocationTargetException:
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
  at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
  at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
  at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
  ...
  Cause: org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
  at org.datanucleus.NucleusContext.<init>(NucleusContext.java:283)
  at org.datanucleus.NucleusContext.<init>(NucleusContext.java:247)
  at org.datanucleus.NucleusContext.<init>(NucleusContext.java:225)
  at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.<init>(JDOPersistenceManagerFactory.java:416)
  at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:301)
  at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)

But if I add the following dependency to my transformations module's pom, the error goes away:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.core.version}</artifactId>
</dependency>
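
If Hive is only touched from test code, as in this project, a test-scoped declaration would be a lighter-weight variant of the same fix. This is a sketch of an assumption on my part, not something verified against this build:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.core.version}</artifactId>
    <!-- assumption: spark-hive is only exercised by tests, so test scope
         keeps it off the compile and runtime classpath of downstream modules -->
    <scope>test</scope>
</dependency>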

Question:
My transformations project compiles fine without the spark-hive dependency, and I already declare that dependency in the Common module, so why do I have to declare it again in the dependent module?

No answers yet.
