(Apache Spark version 2.3.1 on Databricks)
Hello, I have a JSON dump that looks like this:
[{"standings": {"visitorteam_position": 1, "localteam_position": 1}, "season_id": 892, "pitch": null, "commentaries": null, "id": 10342083, "venue_id": 273277, "formations": {"localteam_formation": null, "visitorteam_formation": null}, "aggregate_id": null, "round_id": null, "visitorteam_id": 18647, "winning_odds_calculated": false, "deleted": false, "coaches": {"localteam_coach_id": 472158, "visitorteam_coach_id": 474616}, "attendance": null, "scores": {"ft_score": null, "visitorteam_score": 0, "et_score": null, "localteam_pen_score": null, "visitorteam_pen_score": null, "localteam_score": 0, "ht_score": null}, "referee_id": 18783, "stage_id": 1728, "weather_report": null, "league_id": 732, "localteam_id": 15251, "time": {"status": "NS", "starting_at": {"date": "2018-07-06", "date_time": "2018-07-06 14:00:00", "timezone": "UTC", "timestamp": 1530885600, "time": "14:00:00"}, "extra_minute": null, "injury_time": null, "second": null, "added_time": null, "minute": null}, "group_id": null}, {"standings": {"visitorteam_position": 1, "localteam_position": 1}, "season_id": 892, "pitch": null, "commentaries": null, "id": 10344350, "venue_id": 8869, "formations": {"localteam_formation": null, "visitorteam_formation": null}, "aggregate_id": null, "round_id": null, "visitorteam_id": 18743, "winning_odds_calculated": false, "deleted": false, "coaches": {"localteam_coach_id": 474720, "visitorteam_coach_id": 474796}, "attendance": null, "scores": {"ft_score": null, "visitorteam_score": 0, "et_score": null, "localteam_pen_score": null, "visitorteam_pen_score": null, "localteam_score": 0, "ht_score": null}, "referee_id": 16781, "stage_id": 1728, "weather_report": null, "league_id": 732, "localteam_id": 18704, "time": {"status": "NS", "starting_at": {"date": "2018-07-06", "date_time": "2018-07-06 18:00:00", "timezone": "UTC", "timestamp": 1530900000, "time": "18:00:00"}, "extra_minute": null, "injury_time": null, "second": null, "added_time": null, "minute": null}, "group_id": 
null}]
I am trying to convert it to a DataFrame directly from a variable instead of loading a JSON file, mainly because I get the JSON data from a GET request to an API.
This is my conversion code:
countries = spark.read.option("multiline", "true").json(json.dumps(ts)).show(false)
It gives me the error below; please point me in the right direction. I have searched, but I only find solutions for Scala. I am looking for the equivalent fix in Python.
IllegalArgumentException: u'java.net.URISyntaxException: Relative path in absolute URI: "[{\"standings\":%20%7B%5C%22visitorteam_position%5C%22:%201,%20%5C%22localteam_position%5C%22:%201%7D,%20%5C%22season_id%5C%22:%20892,%20%5C%22pitch%5C%22:%20null,%20%5C%22commentaries%5C%22:%20null,%20%5C%22id%5C%22:%2010342083,%20%5C%22venue_id%5C%22:%20273277,%20%5C%22formations%5C%22:%20%7B%5C%22localteam_formation%5C%22:%20null,%20%5C%22visitorteam_formation%5C%22:%20null%7D,%20%5C%22aggregate_id%5C%22:%20null,%20%5C%22round_id%5C%22:%20null,%20%5C%22visitorteam_id%5C%22:%2018647,%20%5C%22winning_odds_calculated%5C%22:%20false,%20%5C%22deleted%5C%22:%20false,%20%5C%22coaches%5C%22:%20%7B%5C%22localteam_coach_id%5C%22:%20472158,%20%5C%22visitorteam_coach_id%5C%22:%20474616%7D,%20%5C%22attendance%5C%22:%20null,%20%5C%22scores%5C%22:%20%7B%5C%22ft_score%5C%22:%20null,%20%5C%22visitorteam_score%5C%22:%200,%20%5C%22et_score%5C%22:%20null,%20%5C%22localteam_pen_score%5C%22:%20null,%20%5C%22visitorteam_pen_score%5C%22:%20null,%20%5C%22localteam_score%5C%22:%200,%20%5C%22ht_score%5C%22:%20null%7D,%20%5C%22referee_id%5C%22:%2018783,%20%5C%22stage_id%5C%22:%201728,%20%5C%22weather_report%5C%22:%20null,%20%5C%22league_id%5C%22:%20732,%20%5C%22localteam_id%5C%22:%2015251,%20%5C%22time%5C%22:%20%7B%5C%22status%5C%22:%20%5C%22NS%5C%22,%20%5C%22starting_at%5C%22:%20%7B%5C%22date%5C%22:%20%5C%222018-07-06%5C%22,%20%5C%22date_time%5C%22:%20%5C%222018-07-06%2014:00:00%5C%22,%20%5C%22timezone%5C%22:%20%5C%22UTC%5C%22,%20%5C%22timestamp%5C%22:%201530885600,%20%5C%22time%5C%22:%20%5C%2214:00:00%5C%22%7D,%20%5C%22extra_minute%5C%22:%20null,%20%5C%22injury_time%5C%22:%20null,%20%5C%22second%5C%22:%20null,%20%5C%22added_time%5C%22:%20null,%20%5C%22minute%5C%22:%20null%7D,%20%5C%22group_id%5C%22:%20null%7D,%20%7B%5C%22standings%5C%22:%20%7B%5C%22visitorteam_position%5C%22:%201,%20%5C%22localteam_position%5C%22:%201%7D,%20%5C%22season_id%5C%22:%20892,%20%5C%22pitch%5C%22:%20null,%20%5C%22commentaries%5C%22:%20null,%20%5C%22id%5C%22:%2010344350,%20%5C%22venue_id%5C%22:%208869,%20%5C%22formations%5C%22:%20%7B%5C%22localteam_formation%5C%22:%20null,%20%5C%22visitorteam_formation%5C%22:%20null%7D,%20%5C%22aggregate_id%5C%22:%20null,%20%5C%22round_id%5C%22:%20null,%20%5C%22visitorteam_id%5C%22:%2018743,%20%5C%22winning_odds_calculated%5C%22:%20false,%20%5C%22deleted%5C%22:%20false,%20%5C%22coaches%5C%22:%20%7B%5C%22localteam_coach_id%5C%22:%20474720,%20%5C%22visitorteam_coach_id%5C%22:%20474796%7D,%20%5C%22attendance%5C%22:%20null,%20%5C%22scores%5C%22:%20%7B%5C%22ft_score%5C%22:%20null,%20%5C%22visitorteam_score%5C%22:%200,%20%5C%22et_score%5C%22:%20null,%20%5C%22localteam_pen_score%5C%22:%20null,%20%5C%22visitorteam_pen_score%5C%22:%20null,%20%5C%22localteam_score%5C%22:%200,%20%5C%22ht_score%5C%22:%20null%7D,%20%5C%22referee_id%5C%22:%2016781,%20%5C%22stage_id%5C%22:%201728,%20%5C%22weather_report%5C%22:%20null,%20%5C%22league_id%5C%22:%20732,%20%5C%22localteam_id%5C%22:%2018704,%20%5C%22time%5C%22:%20%7B%5C%22status%5C%22:%20%5C%22NS%5C%22,%20%5C%22starting_at%5C%22:%20%7B%5C%22date%5C%22:%20%5C%222018-07-06%5C%22,%20%5C%22date_time%5C%22:%20%5C%222018-07-06%2018:00:00%5C%22,%20%5C%22timezone%5C%22:%20%5C%22UTC%5C%22,%20%5C%22timestamp%5C%22:%201530900000,%20%5C%22time%5C%22:%20%5C%2218:00:00%5C%22%7D,%20%5C%22extra_minute%5C%22:%20null,%20%5C%22injury_time%5C%22:%20null,%20%5C%22second%5C%22:%20null,%20%5C%22added_time%5C%22:%20null,%20%5C%22minute%5C%22:%20null%7D,%20%5C%22group_id%5C%22:%20null%7D%5D%22'
Output of
print(ts)
Out[45]:
[{u'aggregate_id': None,
u'attendance': None,
u'coaches': {u'localteam_coach_id': 472158, u'visitorteam_coach_id': 474616},
u'commentaries': None,
u'deleted': False,
u'formations': {u'localteam_formation': None,
u'visitorteam_formation': None},
u'group_id': None,
u'id': 10342083,
u'league_id': 732,
u'localteam_id': 15251,
u'pitch': None,
u'referee_id': 18783,
u'round_id': None,
u'scores': {u'et_score': None,
u'ft_score': None,
u'ht_score': None,
u'localteam_pen_score': None,
u'localteam_score': 0,
u'visitorteam_pen_score': None,
u'visitorteam_score': 0},
u'season_id': 892,
u'stage_id': 1728,
u'standings': {u'localteam_position': 1, u'visitorteam_position': 1},
u'time': {u'added_time': None,
u'extra_minute': None,
u'injury_time': None,
u'minute': None,
u'second': None,
u'starting_at': {u'date': u'2018-07-06',
u'date_time': u'2018-07-06 14:00:00',
u'time': u'14:00:00',
u'timestamp': 1530885600,
u'timezone': u'UTC'},
u'status': u'NS'},
u'venue_id': 273277,
u'visitorteam_id': 18647,
u'weather_report': None,
u'winning_odds_calculated': False},
{u'aggregate_id': None,
u'attendance': None,
u'coaches': {u'localteam_coach_id': 474720, u'visitorteam_coach_id': 474796},
u'commentaries': None,
u'deleted': False,
u'formations': {u'localteam_formation': None,
u'visitorteam_formation': None},
u'group_id': None,
u'id': 10344350,
u'league_id': 732,
u'localteam_id': 18704,
u'pitch': None,
u'referee_id': 16781,
u'round_id': None,
u'scores': {u'et_score': None,
u'ft_score': None,
u'ht_score': None,
u'localteam_pen_score': None,
u'localteam_score': 0,
u'visitorteam_pen_score': None,
u'visitorteam_score': 0},
u'season_id': 892,
u'stage_id': 1728,
u'standings': {u'localteam_position': 1, u'visitorteam_position': 1},
u'time': {u'added_time': None,
u'extra_minute': None,
u'injury_time': None,
u'minute': None,
u'second': None,
u'starting_at': {u'date': u'2018-07-06',
u'date_time': u'2018-07-06 18:00:00',
u'time': u'18:00:00',
u'timestamp': 1530900000,
u'timezone': u'UTC'},
u'status': u'NS'},
u'venue_id': 8869,
u'visitorteam_id': 18743,
u'weather_report': None,
u'winning_odds_calculated': False}]
print(json.dumps(ts))
Out[44]: '[{"standings": {"visitorteam_position": 1, "localteam_position": 1}, "season_id": 892, "pitch": null, "commentaries": null, "id": 10342083, "venue_id": 273277, "formations": {"localteam_formation": null, "visitorteam_formation": null}, "aggregate_id": null, "round_id": null, "visitorteam_id": 18647, "winning_odds_calculated": false, "deleted": false, "coaches": {"localteam_coach_id": 472158, "visitorteam_coach_id": 474616}, "attendance": null, "scores": {"ft_score": null, "visitorteam_score": 0, "et_score": null, "localteam_pen_score": null, "visitorteam_pen_score": null, "localteam_score": 0, "ht_score": null}, "referee_id": 18783, "stage_id": 1728, "weather_report": null, "league_id": 732, "localteam_id": 15251, "time": {"status": "NS", "starting_at": {"date": "2018-07-06", "date_time": "2018-07-06 14:00:00", "timezone": "UTC", "timestamp": 1530885600, "time": "14:00:00"}, "extra_minute": null, "injury_time": null, "second": null, "added_time": null, "minute": null}, "group_id": null}, {"standings": {"visitorteam_position": 1, "localteam_position": 1}, "season_id": 892, "pitch": null, "commentaries": null, "id": 10344350, "venue_id": 8869, "formations": {"localteam_formation": null, "visitorteam_formation": null}, "aggregate_id": null, "round_id": null, "visitorteam_id": 18743, "winning_odds_calculated": false, "deleted": false, "coaches": {"localteam_coach_id": 474720, "visitorteam_coach_id": 474796}, "attendance": null, "scores": {"ft_score": null, "visitorteam_score": 0, "et_score": null, "localteam_pen_score": null, "visitorteam_pen_score": null, "localteam_score": 0, "ht_score": null}, "referee_id": 16781, "stage_id": 1728, "weather_report": null, "league_id": 732, "localteam_id": 18704, "time": {"status": "NS", "starting_at": {"date": "2018-07-06", "date_time": "2018-07-06 18:00:00", "timezone": "UTC", "timestamp": 1530900000, "time": "18:00:00"}, "extra_minute": null, "injury_time": null, "second": null, "added_time": null, "minute": null}, 
"group_id": null}]'
Thanks in advance!
P.S. Here is a link on how to do this with Scala: http://spark.apache.org/docs/2.2.0/sql-programming-guide.html#tab_scala_5
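
For comparison with the linked Scala example, here is a minimal sketch of what a Python analogue might look like. It relies on the documented behaviour that `spark.read.json` treats a plain string as a file path (which would explain the `URISyntaxException` above) but also accepts an RDD of JSON strings. The helper names `to_json_lines` and `records_to_dataframe` are hypothetical, `spark` is assumed to be an existing SparkSession as provided on Databricks, and the sample `ts` is a trimmed stand-in for the real API response. This has not been verified on 2.3.1.

```python
import json


def to_json_lines(records):
    """Serialize each record (a dict) to its own JSON string."""
    return [json.dumps(record) for record in records]


def records_to_dataframe(spark, records):
    """Build a DataFrame from in-memory records without writing a file.

    Assumes `spark` is an existing SparkSession (hypothetical helper).
    """
    # An RDD of JSON strings is parsed as data, one record per string;
    # a plain str argument would instead be interpreted as a path.
    rdd = spark.sparkContext.parallelize(to_json_lines(records))
    return spark.read.json(rdd)


# Trimmed stand-in for the parsed API response shown in the question.
ts = [
    {"id": 10342083, "venue_id": 273277, "time": {"status": "NS"}},
    {"id": 10344350, "venue_id": 8869, "time": {"status": "NS"}},
]
json_lines = to_json_lines(ts)

# On a cluster with a live session, something like:
#   records_to_dataframe(spark, ts).show(truncate=False)
```

Serializing one record per string sidesteps the `multiline` option, since each RDD element is already a complete JSON object.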