Question

The code always worked as expected until the cluster's disk failed yesterday. The IT support team repaired the cluster, but now whenever I rerun the code it hangs and the CPUs go idle.

After a while the job fails with an error. The code and the full log:

# -*- coding: utf-8 -*-
import datetime, random, hashlib, traceback, json, os, sys, logging, time  # , memcache
# from func_conf import *
from pyspark import SparkContext, SparkConf, StorageLevel
from pyspark.sql import Row, SQLContext
# from scipy.stats import entropy
# from news_client import *
# import redis

# Local-mode Spark with a large driver; scratch space lives on /data01/tmp/.
conf = (SparkConf().setAppName("2048roject").setMaster("local[*]")
        .set("spark.driver.maxResultSize", "80g")
        .set("spark.executor.memory", "10g")
        .set("spark.driver.memory", "60g")
        .set("spark.local.dir", "/data01/tmp/"))
sc = SparkContext.getOrCreate(conf)
sqlContext = SQLContext(sc)

# Partition date: delta_d + 1 days before now (UTC), formatted as YYYYMMDD.
delta_d = 30
before_yesterday = datetime.datetime.strftime(
    datetime.datetime.utcnow() - datetime.timedelta(delta_d + 1), '%Y%m%d')
print(before_yesterday)

# Read the Parquet part files for that date and convert the DataFrame to an RDD.
userid_profile = sqlContext.read.parquet(
    's3o://ASH-PROFILE/davinci/data/user_feature_long_term/region=af/date={}/feature_name=nl_key_entities_v2/version=0/nation=ng/language=en/part-*.snappy.parquet'
    .format(before_yesterday)).rdd

# count() forces a full scan of every part file; this is the call that fails.
print(userid_profile.count())

sc.stop()

[Stage 2:==========================================================>(970 + 10) / 980]
19/06/01 03:01:22 ERROR PythonRunner: Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/pyspark.zip/pyspark/worker.py", line 157, in main
    is_sql_udf = read_int(infile)
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/pyspark.zip/pyspark/serializers.py", line 545, in read_int
    raise EOFError
EOFError

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 14
at org.apache.parquet.format.Util.read(Util.java:216)
at org.apache.parquet.format.Util.readPageHeader(Util.java:65)
at org.apache.parquet.hadoop.ParquetFileReader$WorkaroundChunk.readPageHeader(ParquetFileReader.java:668)
at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:546)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:496)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1953)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 14
at parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:806)
at parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:500)
at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
at org.apache.parquet.format.PageHeader.read(PageHeader.java:828)
at org.apache.parquet.format.Util.read(Util.java:213)
... 23 more
19/06/01 03:01:22 ERROR PythonRunner: This may have been caused by a prior exception:
java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 14
at org.apache.parquet.format.Util.read(Util.java:216)
at org.apache.parquet.format.Util.readPageHeader(Util.java:65)
at org.apache.parquet.hadoop.ParquetFileReader$WorkaroundChunk.readPageHeader(ParquetFileReader.java:668)
at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:546)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:496)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1953)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 14
at parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:806)
at parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:500)
at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
at org.apache.parquet.format.PageHeader.read(PageHeader.java:828)
at org.apache.parquet.format.Util.read(Util.java:213)
... 23 more
19/06/01 03:01:22 ERROR Executor: Exception in task 51.0 in stage 2.0 (TID 87)
java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 14
at org.apache.parquet.format.Util.read(Util.java:216)
at org.apache.parquet.format.Util.readPageHeader(Util.java:65)
at org.apache.parquet.hadoop.ParquetFileReader$WorkaroundChunk.readPageHeader(ParquetFileReader.java:668)
at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:546)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:496)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1953)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 14
at parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:806)
at parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:500)
at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
at org.apache.parquet.format.PageHeader.read(PageHeader.java:828)
at org.apache.parquet.format.Util.read(Util.java:213)
... 23 more
19/06/01 03:01:22 ERROR TaskSetManager: Task 51 in stage 2.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "/data01/push_general/ttt.py", line 24, in <module>
    print userid_profile.count()
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1008, in count
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 999, in sum
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 873, in fold
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 776, in collect
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/2.5.6.0-40/spark2/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 51 in stage 2.0 failed 1 times, most recent failure: Lost task 51.0 in stage 2.0 (TID 87, localhost): java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 14
at org.apache.parquet.format.Util.read(Util.java:216)
at org.apache.parquet.format.Util.readPageHeader(Util.java:65)
at org.apache.parquet.hadoop.ParquetFileReader$WorkaroundChunk.readPageHeader(ParquetFileReader.java:668)
at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:546)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:496)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1953)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 14
at parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:806)
at parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:500)
at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
at org.apache.parquet.format.PageHeader.read(PageHeader.java:828)
at org.apache.parquet.format.Util.read(Util.java:213)
... 23 more

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1873)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1886)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1899)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1913)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:912)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.collect(RDD.scala:911)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 14
at org.apache.parquet.format.Util.read(Util.java:216)
at org.apache.parquet.format.Util.readPageHeader(Util.java:65)
at org.apache.parquet.hadoop.ParquetFileReader$WorkaroundChunk.readPageHeader(ParquetFileReader.java:668)
at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:546)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:496)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:504)
at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:328)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1953)
at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
Caused by: parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 14
at parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:806)
at parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:500)
at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
at org.apache.parquet.format.PageHeader.read(PageHeader.java:828)
at org.apache.parquet.format.Util.read(Util.java:213)
... 23 more
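
The log shows a single task (task 51 of 980) failing while decoding a Parquet page header, which would be consistent with one damaged part file rather than a problem with the whole dataset, though the log alone does not prove that. A minimal sketch for narrowing it down, assuming the part files can be listed individually: read each path on its own and record the ones that raise. The helper name find_unreadable and the part_paths list are hypothetical and not part of the original code.

```python
def find_unreadable(paths, read_fn):
    """Call read_fn on each path; return [(path, first line of error)] for failures."""
    failures = []
    for path in paths:
        try:
            read_fn(path)
        except Exception as exc:  # deliberately broad: Py4JJavaError, IOError, ...
            lines = str(exc).splitlines()
            # Fall back to the exception class name when the message is empty.
            failures.append((path, lines[0] if lines else type(exc).__name__))
    return failures


# Hypothetical usage with the sqlContext from the question. part_paths has to be
# built separately (the question does not show how the files are listed).
# .rdd.count() mirrors the question's own call and forces the pages to be decoded.
# bad = find_unreadable(part_paths,
#                       lambda p: sqlContext.read.parquet(p).rdd.count())
```

If exactly one path comes back, regenerating or restoring that file from a backup would be the next thing to try; if many do, the damage is probably wider than a single file.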

PySpark run error: can not read class org.apache.parquet.format.PageHeader
