PySpark topic modeling job crashes; can't interpret the error log
04 November 2019

A few lines of code are below. I would add more, but I suspect the error comes from my environment rather than from the code. I am following this tutorial pretty much line for line, except that I am using different data and a different version of Spark.

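# Imports this snippet relies on (the RDD-based mllib API, since LDA.train is used)
from pyspark.mllib.clustering import LDA
from pyspark.mllib.linalg import Vectors

# `spark`, `result_tfidf` and `vocabArray` are defined earlier in the script (not shown)

def topic_render(topic, vocabArray):
    # topic is a (term_indices, term_weights) pair; look up the first five terms
    terms = topic[0]
    result = []
    for i in range(5):
        term = vocabArray[terms[i]]
        result.append(term)
    return result

lda_model = LDA.train(result_tfidf[['index', 'features']]
                      .rdd.mapValues(Vectors.fromML)
                      .map(list), k=10, maxIterations=100)
topicIndices = spark.sparkContext.parallelize(lda_model.describeTopics(maxTermsPerTopic=5))
# The line above passes
topics_final = topicIndices.map(lambda topic: topic_render(topic, vocabArray)).collect()
# Crashes on the line above; the error log is incomprehensible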
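
Since `describeTopics` in `pyspark.mllib` returns an ordinary Python list on the driver, one way to narrow the problem down might be to render the topics without `parallelize` and `map`, so that no Python worker processes are needed for this step. The sketch below only illustrates that idea under assumptions: `sample_topics` and `sample_vocab` are made-up stand-ins for `lda_model.describeTopics(maxTermsPerTopic=5)` and `vocabArray`, and `render_topics_locally` is a hypothetical helper name, not something from the tutorial.

```python
def topic_render(topic, vocabArray, max_terms=5):
    """Return the vocabulary words for the leading term indices of one topic."""
    terms = topic[0]  # topic is a (term_indices, term_weights) pair
    return [vocabArray[idx] for idx in terms[:max_terms]]


def render_topics_locally(described_topics, vocabArray):
    """Render every topic on the driver with a plain list comprehension."""
    return [topic_render(topic, vocabArray) for topic in described_topics]


# Made-up stand-ins (assumptions, not real model output).
sample_vocab = ["spark", "python", "data", "model", "topic", "error", "log"]
sample_topics = [
    ([0, 2, 3, 4, 1], [0.30, 0.25, 0.20, 0.15, 0.10]),
    ([5, 6, 0, 1, 2], [0.40, 0.30, 0.15, 0.10, 0.05]),
]

topics_final = render_topics_locally(sample_topics, sample_vocab)
print(topics_final)
```

If a local version like this works on the real model while the RDD version still fails, that would suggest the failure lies in launching the Python workers rather than in `topic_render` itself.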

Below are a few lines of the log output (it is really long and mostly just repeats this section). I am having a hard time understanding what is going on. I don't think I need the winutils binary in the Hadoop binary path or the native-hadoop library, because I see those errors every time I do anything in Spark, and they have never caused problems before.

19/11/03 16:21:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/03 16:21:14 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/03 16:21:21 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:22 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:22 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
[Stage 0:>                                                         (0 + 4) / 56]19/11/03 16:21:32 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
        at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
        at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
        at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:348)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:348)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/11/03 16:21:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
[Stage 0:>                                                         (0 + 4) / 56]19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
[Stage 0:>                                                         (0 + 4) / 56]
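
Because the log above mostly repeats itself, a small helper that strips the timestamp and counts identical messages may make it easier to read. This is a minimal sketch and not part of Spark: `summarize_log` is a hypothetical name, the regular expression assumes the `yy/MM/dd HH:mm:ss LEVEL` prefix seen in the lines above, and the sample lines are abridged copies of that output.

```python
import re
from collections import Counter

# Matches a prefix such as "19/11/03 16:21:14 WARN " (an assumption about the format).
LOG_PREFIX = re.compile(r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (WARN|ERROR|INFO) ")


def summarize_log(lines):
    """Count log messages by (level, text), ignoring timestamps."""
    counts = Counter()
    for line in lines:
        # search() rather than match(): a progress bar may precede the prefix
        found = LOG_PREFIX.search(line)
        if found:
            message = line[found.end():].strip()
            counts[(found.group(1), message)] += 1
    return counts


sample = [
    "19/11/03 16:21:14 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.",
    "19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.",
    "[Stage 0:>  (0 + 4) / 56]19/11/03 16:21:32 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path",
    "        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)",
]

summary = summarize_log(sample)
for (level, message), count in summary.most_common():
    print(count, level, message)
```

Stack trace lines carry no timestamp prefix, so this sketch skips them; only the distinct WARN and ERROR messages and how often each occurs are reported.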