This is expected behavior: the file linked in the question contains only one record, made up of meta (an object) and data (an array).
Because that single JSON record spans multiple lines, we need to enable the multiLine option.
//keep the DataFrame in a val so the later snippets can reuse it
val df = spark.read.option("multiLine",true).option("mode","PERMISSIVE").json("tmp.json")
df.show()
//sample data
//+--------------------+--------------------+
//| data| meta|
//+--------------------+--------------------+
//|[[row-8eh8_xxkx-u...|[[[[1439474950, t...|
//+--------------------+--------------------+
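To make the effect of multiLine easier to see without the original file, here is a minimal sketch that writes a small pretty-printed document to a temporary file and reads it both ways. The object name MultiLineSketch, the helper names, and the sample JSON are assumptions made up for illustration; only the multiLine option comes from the answer. In the default line-delimited mode each physical line is parsed on its own, which typically surfaces as a _corrupt_record column.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLineSketch {
  // Hypothetical pretty-printed document: one record spread over several lines.
  val prettyJson: String =
    """{
      |  "meta": {"view": {"id": "abcd-1234"}},
      |  "data": [["row-1", "a"], ["row-2", "b"]]
      |}""".stripMargin

  // Writes the sample to a temporary file and returns its path.
  def writeSample(): String = {
    val path = Files.createTempFile("multiline-sketch", ".json")
    Files.write(path, prettyJson.getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Reads the whole file as a single JSON record.
  def readMultiLine(spark: SparkSession, path: String): DataFrame =
    spark.read.option("multiLine", true).json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multiline-sketch")
      .getOrCreate()
    val path = writeSample()

    // Default mode: every physical line is treated as its own JSON record.
    spark.read.json(path).printSchema()

    // multiLine mode: the file is parsed as one document with data and meta.
    readMultiLine(spark, path).printSchema()

    spark.stop()
  }
}
```

printSchema is used instead of show for the default-mode read because it only inspects the inferred schema and does not query the rows.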
//access meta struct columns
df.select("meta.view.*").show()
//+--------------------+-------------+--------------------+--------------------+----------+--------------------+-----------+-------------+--------------------+--------------------+---------------+----------------+---------+--------------+--------------------+--------------------+----------+----------------+--------+--------------------+----------+------------------------+---------------+----------------+----------------+--------------------+------+--------+-------------+-------------+--------------------+-------+--------------------+---------------+---------+----------------+--------+
//| approvals|averageRating| category| columns| createdAt| description|displayType|downloadCount| flags| grants|hideFromCatalog|hideFromDataJson| id|indexUpdatedAt| metadata| name|newBackend|numberOfComments| oid| owner|provenance|publicationAppendEnabled|publicationDate|publicationGroup|publicationStage| query|rights|rowClass|rowsUpdatedAt|rowsUpdatedBy| tableAuthor|tableId| tags|totalTimesRated|viewCount|viewLastModified|viewType|
//+--------------------+-------------+--------------------+--------------------+----------+--------------------+-----------+-------------+--------------------+--------------------+---------------+----------------+---------+--------------+--------------------+--------------------+----------+----------------+--------+--------------------+----------+------------------------+---------------+----------------+----------------+--------------------+------+--------+-------------+-------------+--------------------+-------+--------------------+---------------+---------+----------------+--------+
//|[[1439474950, tru...| 0|Environmental Hea...|[[, meta_data,, :...|1439381433|The Environmental...| table| 26159|[default, restora...|[[[public], false...| false| false|cjae-szjv| 1528204279|[[table, fatrow, ...|Air Quality Measu...| true| 0|12801487|[Tracking, 94g5-7...| official| false| 1439474950| 3957835| published|[[[true, [2171820...|[read]| | 1439402317| 94g5-7as2|[Tracking, 94g5-7...|3960642|[environmental ha...| 0| 3843| 1528203875| tabular|
//+--------------------+-------------+--------------------+--------------------+----------+--------------------+-----------+-------------+--------------------+--------------------+---------------+----------------+---------+--------------+--------------------+--------------------+----------+----------------+--------+--------------------+----------+------------------------+---------------+----------------+----------------+--------------------+------+--------+-------------+-------------+--------------------+-------+--------------------+---------------+---------+----------------+--------+
//to access data array we need to explode
df.selectExpr("explode(data)").show()
//+--------------------+
//| col|
//+--------------------+
//|[row-8eh8_xxkx-u3...|
//|[row-u2v5_78j5-px...|
//|[row-68zj_7qfn-sx...|
//|[row-8b4d~zt5j~da...|
//|[row-5gee.63td_z6...|
//|[row-tzyx.ssxh_pz...|
//|[row-3yj2_u42c_mr...|
//|[row-va7z.p2v8.7p...|
//|[row-r7kk_e3dm-z2...|
//|[row-bnrc~w34s-4a...|
//|[row-ezrk~m5dc_5n...|
//|[row-nyya.dvnz~c6...|
//|[row-dq3i_wt6d_c6...|
//|[row-u6rc-k3mf-cn...|
//|[row-t9c6-4d4b_r6...|
//|[row-vq6r~mxzj-e6...|
//|[row-vxqn-mrpc~5b...|
//|[row-3akn_5nzm~8v...|
//|[row-ugxn~bhax.a2...|
//|[row-ieav.mdz9-m8...|
//+--------------------+
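After explode, each row still holds a whole inner array. A possible next step is to pick positions out of that array by index. The sketch below assumes a tiny made-up document with the same meta/data shape; the object name ExplodeSketch and the column names rowId and value are hypothetical, since the real dataset describes its columns under meta.view.columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeSketch {
  // Hypothetical miniature document shaped like the one in the question:
  // a "meta" object plus a "data" array of arrays.
  val sampleJson: String =
    """{
      |  "meta": {"view": {"id": "abcd-1234", "name": "sample"}},
      |  "data": [["row-1", "a"], ["row-2", "b"]]
      |}""".stripMargin

  // One output row per element of `data`, with the first two positions
  // of each inner array pulled out as columns (names are made up).
  def flatten(df: DataFrame): DataFrame =
    df.select(explode(col("data")).as("row"))
      .select(
        col("row").getItem(0).as("rowId"),
        col("row").getItem(1).as("value")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-sketch")
      .getOrCreate()
    import spark.implicits._ // needed for .toDS

    // Each Dataset element is parsed as one JSON document, so the
    // multi-line string is read as a single record here.
    val df = spark.read.json(Seq(sampleJson).toDS)
    flatten(df).show()

    spark.stop()
  }
}
```

With the real file, the same flatten step would be applied to the df loaded with multiLine, using whichever positions of the inner array are needed.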
Load multiple json records:
//json array with two records (outside spark-shell, toDS needs import spark.implicits._)
spark.read.json(Seq(("""
[{"id":1,"name":"a"},
{"id":2,"name":"b"}]
""")).toDS).show()
//the array holds 2 json objects, so they are loaded as 2 rows
//+---+----+
//| id|name|
//+---+----+
//| 1| a|
//| 2| b|
//+---+----+
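For comparison, records can also arrive one JSON object per line (JSON Lines), which is the layout the default reader expects and needs no multiLine option. A minimal sketch, assuming the same two made-up records; the object name JsonLinesSketch and the load helper are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonLinesSketch {
  // One complete JSON object per element, as in a JSON Lines file.
  val records: Seq[String] = Seq(
    """{"id":1,"name":"a"}""",
    """{"id":2,"name":"b"}"""
  )

  // Parses each element as its own record and returns the DataFrame.
  def load(spark: SparkSession): DataFrame = {
    import spark.implicits._ // needed for .toDS
    spark.read.json(records.toDS)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-lines-sketch")
      .getOrCreate()

    // Two elements in, two rows out, with columns id and name.
    load(spark).show()

    spark.stop()
  }
}
```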