Saving a PySpark DataFrame to a Parquet file
January 13, 2019

I get an exception when I try to save a PySpark DataFrame.

Here is my code, with a toy example:

import pandas as pd
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

# Start a local Spark context and wrap it in a session
sc = SparkContext('local')
spark = SparkSession(sc)

toy_df = '{"userId":{"0":1,"1":1,"10":1,"100":3,"1000":15,"10000":71,"10001":71,"10002":71,"10003":71,"10004":71},"movieId":{"0":31,"1":1029,"10":1371,"100":296,"1000":157,"10000":581,"10001":589,"10002":908,"10003":1171,"10004":1259},"rating":{"0":2.5,"1":3.0,"10":2.5,"100":4.5,"1000":2.0,"10000":4.0,"10001":3.0,"10002":5.0,"10003":5.0,"10004":4.0},"timestamp":{"0":1260748800000,"1":1260748800000,"10":1260748800000,"100":1298851200000,"1000":1052870400000,"10000":974592000000,"10001":974592000000,"10002":974592000000,"10003":974592000000,"10004":974592000000}}'
toy_df = pd.read_json(toy_df)

# Make the pandas dataframe a pyspark dataframe
toy = spark.createDataFrame(toy_df)

# Write the pyspark dataframe to disk
toy.write.save('toy', format='parquet', mode='append')

The error:

Py4JJavaError: An error occurred while calling o152.save.

...
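
For anyone reading the toy data: the JSON string is column-oriented (column name, then row label, then value). The sketch below is not part of the failing code and does not reproduce the error. It uses only the standard library and a shortened copy of the data (the first two row labels) to show the same records as plain rows. The helper name `to_rows` is hypothetical, introduced here only for illustration.

```python
import json

# Shortened copy of the toy JSON from the question (row labels "0" and "1" only).
toy_json = (
    '{"userId":{"0":1,"1":1},'
    '"movieId":{"0":31,"1":1029},'
    '"rating":{"0":2.5,"1":3.0},'
    '"timestamp":{"0":1260748800000,"1":1260748800000}}'
)


def to_rows(column_oriented):
    """Turn {column: {label: value}} into a list of {column: value} rows."""
    columns = list(column_oriented)
    # Row labels are taken from the first column; all columns share them here.
    labels = list(column_oriented[columns[0]])
    return [
        {col: column_oriented[col][label] for col in columns}
        for label in labels
    ]


rows = to_rows(json.loads(toy_json))
for row in rows:
    print(row)
```

Each resulting dictionary holds one rating record with the fields userId, movieId, rating, and timestamp.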