I sampled an RDD by key; the RDD contains more than two keys:
```python
dfrdd = dfrdd.sampleByKey("_c0", fractions={1: 0.3, 0: 0.3})
```
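For context: `RDD.sampleByKey` is meant for pair RDDs. Its signature is `sampleByKey(withReplacement, fractions, seed=None)`, so every element must be a `(key, value)` tuple. In the call above, the string `"_c0"` is passed as `withReplacement` and does not name a column. The pure-Python sketch below is not Spark code. The rows and the helper `sample_by_key` are hypothetical. It imitates the without-replacement case in three steps:

1. Key each three-field row by its first field, as `rdd.keyBy(lambda r: r["_c0"])` would.
2. Keep each pair with probability `fractions[key]`.
3. Drop the key again, as `.values()` would.

```python
import random
from collections import Counter


def sample_by_key(pairs, fractions, seed=None):
    """Keep each (key, value) pair with probability fractions[key].

    Hypothetical pure-Python stand-in for RDD.sampleByKey(False, fractions, seed):
    every element must unpack into exactly two items.
    """
    rng = random.Random(seed)
    for key, val in pairs:
        if rng.random() < fractions[key]:
            yield key, val


# Hypothetical rows with three fields; the first field plays the role of "_c0".
rows = [(i % 2, f"name{i}", i * 10) for i in range(1000)]

# Like rdd.keyBy(lambda r: r[0]): wrap each row into a (key, row) pair.
keyed = ((row[0], row) for row in rows)

# Sample about 30% of each stratum, then strip the keys (like .values()).
sampled = [row for _, row in sample_by_key(keyed, {1: 0.3, 0: 0.3}, seed=42)]

print(len(sampled), Counter(row[0] for row in sampled))
```

In PySpark, the analogous chain would be something like `dfrdd.keyBy(lambda r: r["_c0"]).sampleByKey(False, {1: 0.3, 0: 0.3}).values().toDF()`. This is untested here. If the data is already a DataFrame, `DataFrame.sampleBy("_c0", fractions={1: 0.3, 0: 0.3})` samples by column value without building pairs.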
Now I want to convert it from an RDD to a DataFrame:
```python
df = dfrdd.toDF()
```
But I get the following error:
```
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 139, 172.30.48.187, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/ibm/spark/python/lib/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/opt/ibm/spark/python/lib/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/ibm/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 393, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/opt/ibm/spark/python/pyspark/rdd.py", line 1354, in takeUpToNumLeft
    yield next(iterator)
  File "/opt/ibm/spark/python/pyspark/rddsampler.py", line 109, in func
    for key, val in iterator:
ValueError: too many values to unpack (expected 2)
```
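The traceback gives two clues:

- The failing line in `rddsampler.py` is `for key, val in iterator:`, so each element must be a two-item pair.
- The frame `takeUpToNumLeft` comes from `RDD.take`. Transformations are lazy, so the broken sampling step only runs when `toDF()` pulls the first rows to infer a schema.

The plain-Python sketch below reproduces that behaviour with a generator standing in for the lazy RDD. The helper `stratified_filter` and the rows are hypothetical, not PySpark internals.

```python
import itertools
import random


def stratified_filter(iterator, fractions, seed=0):
    # Hypothetical mirror of the sampler loop: each element is unpacked into
    # exactly two names, so it must be a (key, value) pair.
    rng = random.Random(seed)
    for key, val in iterator:
        if rng.random() < fractions[key]:
            yield key, val


# Hypothetical three-field rows, like Row objects read from a CSV file.
rows = [(1, "a", 10), (0, "b", 20), (1, "c", 30)]

# Nothing executes yet: generators, like RDD transformations, are lazy.
lazy = stratified_filter(iter(rows), {1: 0.3, 0: 0.3})

# Pulling the first element (as schema inference in toDF does) runs the loop.
error_message = None
try:
    list(itertools.islice(lazy, 1))
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```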