IllegalArgumentException when getting the ROC for logistic regression in Spark

I created a training dataset:

trainingData.show(10);

which contains:

+------------------+-----+
|          features|label|
+------------------+-----+
|[2.0,2.0,1.0,24.0]|    1|
|[2.0,2.0,2.0,26.0]|    1|
|[2.0,2.0,2.0,34.0]|    0|
|[2.0,2.0,1.0,37.0]|    0|
|[1.0,2.0,1.0,57.0]|    0|
|[1.0,1.0,2.0,37.0]|    0|
|[1.0,1.0,2.0,29.0]|    0|
|[2.0,2.0,2.0,23.0]|    0|
|[2.0,3.0,1.0,28.0]|    0|
|[1.0,3.0,2.0,35.0]|    0|
+------------------+-----+
only showing top 10 rows
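
A features/label frame like this is typically assembled with a VectorAssembler. A minimal sketch of that step, in case it matters (the input Dataset raw and the column names f1..f4 are placeholders, since the actual construction code is not shown here):

    import org.apache.spark.ml.feature.VectorAssembler;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // "raw" is a placeholder for a Dataset<Row> holding columns f1..f4 and label.
    VectorAssembler assembler = new VectorAssembler()
            .setInputCols(new String[]{"f1", "f2", "f3", "f4"})
            .setOutputCol("features");

    Dataset<Row> trainingData = assembler
            .transform(raw)
            .select("features", "label");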

I print the coefficients:

    LogisticRegression lr = new LogisticRegression()
            .setMaxIter(10)
            .setRegParam(0.3)
            .setElasticNetParam(0.8);

    // Fit the model
    LogisticRegressionModel lrModel = lr.fit(trainingData);

    // Print the coefficients and intercept for logistic regression
    System.out.println("Coefficients: " + lrModel.coefficients() + " Intercept: " + lrModel.intercept());

This printed:

Coefficients: [0.0,0.0,0.0,0.0] Intercept: -1.258687003849865
0.5284225707357179

The first problem is that all of the coefficients are zero; I suspect the strong, mostly-L1 penalty (regParam = 0.3 with elasticNetParam = 0.8) is shrinking them exactly to zero on this small dataset, leaving only the intercept.
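
If the strong penalty is the cause, weaker settings should produce non-zero coefficients. A sketch with illustrative, untuned values:

    // Illustrative, untuned values: regParam 0.3 with elasticNetParam 0.8 is a
    // strong, mostly-L1 penalty that can shrink every coefficient to exactly
    // zero on a small dataset, leaving only the intercept.
    LogisticRegression lr = new LogisticRegression()
            .setMaxIter(100)
            .setRegParam(0.01)         // much weaker penalty
            .setElasticNetParam(0.0);  // pure L2: no exact zeros from the L1 term

    LogisticRegressionModel lrModel = lr.fit(trainingData);
    System.out.println("Coefficients: " + lrModel.coefficients());

The second problem is the following exception: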

Caused by: java.lang.IllegalArgumentException: null
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source) ~[xbean-asm5-shaded-4.4.jar:4.4]
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source) ~[xbean-asm5-shaded-4.4.jar:4.4]
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source) ~[xbean-asm5-shaded-4.4.jar:4.4]
    at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) ~[scala-library-2.11.8.jar:na]
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103) ~[scala-library-2.11.8.jar:na]
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103) ~[scala-library-2.11.8.jar:na]
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) ~[scala-library-2.11.8.jar:na]
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) ~[scala-library-2.11.8.jar:na]
    at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103) ~[scala-library-2.11.8.jar:na]
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) ~[scala-library-2.11.8.jar:na]
    at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.xbean.asm5.ClassReader.a(Unknown Source) ~[xbean-asm5-shaded-4.4.jar:4.4]
    at org.apache.xbean.asm5.ClassReader.b(Unknown Source) ~[xbean-asm5-shaded-4.4.jar:4.4]
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source) ~[xbean-asm5-shaded-4.4.jar:4.4]
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source) ~[xbean-asm5-shaded-4.4.jar:4.4]
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at scala.collection.immutable.List.foreach(List.scala:381) ~[scala-library-2.11.8.jar:na]
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2292) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2066) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2092) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:939) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.rdd.RDD.collect(RDD.scala:938) ~[spark-core_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.x$4$lzycompute(BinaryClassificationMetrics.scala:192) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.x$4(BinaryClassificationMetrics.scala:146) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.confusions$lzycompute(BinaryClassificationMetrics.scala:148) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.confusions(BinaryClassificationMetrics.scala:148) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.createCurve(BinaryClassificationMetrics.scala:223) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.roc(BinaryClassificationMetrics.scala:86) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.ml.classification.BinaryLogisticRegressionSummary$class.roc(LogisticRegression.scala:1538) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.ml.classification.BinaryLogisticRegressionSummaryImpl.roc$lzycompute(LogisticRegression.scala:1683) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at org.apache.spark.ml.classification.BinaryLogisticRegressionSummaryImpl.roc(LogisticRegression.scala:1683) ~[spark-mllib_2.11-2.3.0.jar:2.3.0]
    at com.colendi.kahin.Correlator.run(Correlator.java:93) ~[classes/:na]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
    at java.base/java.lang.reflect.Method.invoke(Method.java:564) ~[na:na]
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(InitDestroyAnnotationBeanPostProcessor.java:365) ~[spring-beans-5.0.5.RELEASE.jar:5.0.5.RELEASE]
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeInitMethods(InitDestroyAnnotationBeanPostProcessor.java:308) ~[spring-beans-5.0.5.RELEASE.jar:5.0.5.RELEASE]
    at org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeInitialization(InitDestroyAnnotationBeanPostProcessor.java:135) ~[spring-beans-5.0.5.RELEASE.jar:5.0.5.RELEASE]
    ... 17 common frames omitted

The exception is thrown when I try to get roc():

    // Extract the summary from the returned LogisticRegressionModel instance trained in the earlier example
    BinaryLogisticRegressionTrainingSummary trainingSummary = lrModel.binarySummary();

    // Obtain the loss per iteration.
    double[] objectiveHistory = trainingSummary.objectiveHistory();
    for (double lossPerIteration : objectiveHistory) {
        System.out.println(lossPerIteration);
    }

    // Obtain the receiver-operating characteristic as a dataframe and areaUnderROC.
    Dataset<Row> roc = trainingSummary.roc();

How can I solve this?
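
For what it's worth, the java.base/jdk.internal.reflect frames in the trace indicate the application is running on Java 9 or newer, while Spark 2.3.0 supports only Java 8; an IllegalArgumentException: null from the shaded xbean-asm5 ClassReader is the typical symptom of ASM 5 failing to read Java 9+ class files when ClosureCleaner inspects a closure. A minimal runtime check (JavaVersionCheck is a hypothetical class name):

    // Hypothetical sanity check: Spark 2.3.x supports only Java 8, so
    // System.getProperty("java.version") should start with "1.8".
    public class JavaVersionCheck {
        public static void main(String[] args) {
            String version = System.getProperty("java.version");
            System.out.println("java.version = " + version);
            if (!version.startsWith("1.8")) {
                System.err.println("Spark 2.3.0 supports only Java 8; running on "
                        + version + " can break ClosureCleaner (ASM 5).");
            }
        }
    }

If that is the cause, running the application under a Java 8 JDK (or moving to a Spark release that supports a newer runtime) should make roc() work without touching the model code.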
