com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT at cross_validation_metrics_summary

Posted by pbwdgjma on 2021-05-29 in Spark

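I am using H2ODRF and H2OGridSearch to build a random forest pipeline model, with hyperparameter optimization done by a random discrete grid search. However, when I set nfolds to any number greater than 1 and call fit(), I get an error. My code looks like this: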

val drf =  new H2ODRF()
    .setFeaturesCols(featuresCols)
    .setLabelCol(labelCol)
    .setColumnsToCategorical(categoricalCols)
    .setSplitRatio(splitRatio)
    .setNfolds(4)

// Hyperparameter grid; values are boxed to AnyRef as setHyperParameters expects
val hyperParams = Map(
    "ntrees" -> Array(10, 50).map(_.asInstanceOf[AnyRef]))

val search = new H2OGridSearch()
    .setHyperParameters(hyperParams)
    .setAlgo(drf)

val model = search.fit(data) // data is a Spark DataFrame

The call to fit() fails with the following stack trace:

com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 608096 path $.cross_validation_metrics_summary[0].data[0][0]
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:224)
  at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
  at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
  at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
  at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:129)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:220)
  at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:41)
  at com.google.gson.internal.bind.ArrayTypeAdapter.read(ArrayTypeAdapter.java:72)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.read(ReflectiveTypeAdapterFactory.java:129)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:220)
  at com.google.gson.Gson.fromJson(Gson.java:887)
  at com.google.gson.Gson.fromJson(Gson.java:852)
  at com.google.gson.Gson.fromJson(Gson.java:801)
  at ai.h2o.sparkling.backend.utils.RestCommunication$class.ai$h2o$sparkling$backend$utils$RestCommunication$$deserialize(RestCommunication.scala:164)
  at ai.h2o.sparkling.backend.utils.RestCommunication$$anonfun$request$1.apply(RestCommunication.scala:147)
  at ai.h2o.sparkling.backend.utils.RestCommunication$$anonfun$request$1.apply(RestCommunication.scala:145)
  at ai.h2o.sparkling.utils.ScalaUtils$.withResource(ScalaUtils.scala:28)
  at ai.h2o.sparkling.backend.utils.RestCommunication$class.request(RestCommunication.scala:145)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.request(H2OGridSearch.scala:46)
  at ai.h2o.sparkling.backend.utils.RestCommunication$class.query(RestCommunication.scala:54)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.query(H2OGridSearch.scala:46)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.getGridModels(H2OGridSearch.scala:129)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.fit(H2OGridSearch.scala:163)
  at ai.h2o.sparkling.ml.algos.H2OGridSearch.fit(H2OGridSearch.scala:46)
  at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:153)
  at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
  at scala.collection.Iterator$class.foreach(Iterator.scala:891)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
  at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
  at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
  at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
  ... 59 elided
Caused by: java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 608096 path $.cross_validation_metrics_summary[0].data[0][0]
  at com.google.gson.stream.JsonReader.beginObject(JsonReader.java:385)
  at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:213)
  ... 90 more

This error seems to be caused by a malformed cross_validation_metrics_summary field, which is only returned when nfolds is greater than 1. Is there a workaround for this?
Edit: I am using the prostate dataset with Spark version 2.4.4, Scala version 2.11.12, and the following Sparkling Water version: ai.h2o:sparkling-water-package_2.11:3.30.0.4-1-2.4.
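Edit: After looking through the Sparkling Water source code, the problem seems to originate in a misconfigured schema, GridSchemaV99. Is there a setting or configuration I should update so that a different schema is looked up?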
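
For reference, the Gson message in the trace can be reproduced in isolation. This is a minimal sketch, not Sparkling Water code: the classes Cell, Table and Response are hypothetical stand-ins for a client-side schema that declares each table cell as an object, while the payload carries plain strings in those positions. It assumes Gson is on the classpath and that the file is compiled as a standalone program rather than pasted into the REPL.

```scala
import com.google.gson.{Gson, JsonSyntaxException}

// Hypothetical schema classes (assumptions for illustration only).
// The schema says every cell of `data` is a JSON object.
class Cell { var value: String = _ }
class Table { var data: Array[Array[Cell]] = _ }
class Response { var cross_validation_metrics_summary: Array[Table] = _ }

object GsonMismatchDemo {
  // Returns the Gson error message if deserialization fails, None otherwise.
  def parseError(json: String): Option[String] =
    try {
      new Gson().fromJson(json, classOf[Response])
      None
    } catch {
      case e: JsonSyntaxException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    // Cells are plain strings here, so they do not match the declared schema.
    val mismatched = """{"cross_validation_metrics_summary":[{"data":[["mean"]]}]}"""
    // Cells are objects here, so they match the declared schema.
    val matching = """{"cross_validation_metrics_summary":[{"data":[[{"value":"mean"}]]}]}"""

    println(parseError(mismatched)) // Some(...) with an "Expected BEGIN_OBJECT but was STRING" message
    println(parseError(matching))   // None
  }
}
```

If this is what happens in the real call, the failure would lie in how the client-side schema describes the REST response rather than in the cross-validation run itself, which is consistent with the path $.cross_validation_metrics_summary[0].data[0][0] reported in the trace.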
