I'm running the following Spark SQL query:
sqlContext.sql("""SELECT classification,
                         docType,
                         keywordList,
                         target_code
                  FROM psyc""").show(15)
I'm getting strange results in one row, where the columns appear shifted out of place (the fifth row from the bottom). I checked the unique values of each column in the CSV file, and those columns do not contain these values.
+--------------------+--------------------+--------------------+--------------------+
| classification| docType| keywordList| target_code|
+--------------------+--------------------+--------------------+--------------------+
|[{'code': '3297',...| Journal Article|"['biochemical ma...|[1940, 5940, 5977...|
|[{'code': '3410',...| Journal Article|['learning portfo...|[10330, 12810, 30...|
|[{'code': '3410',...| Journal Article|['medical educati...|[11448, 16140, 30...|
|[{'code': '2227',...| Journal Article|['medical communi...|[10540, 30340, 30...|
|[{'code': '3410',...| Journal Article|['teaching course...|[30340, 40680, 46...|
|[{'code': '2224',...| Journal Article|['outpatient clin...|[23340, 35970, 40...|
|[{'code': '3410',...| Journal Article|['problem-based l...|[10140, 30340, 30...|
|[{'code': '3410',...| Journal Article|['multifaceted ed...|[30340, 30350, 30...|
|[{'code': '3410',...| Journal Article|['computer-aided ...|[23415, 30340, 30...|
|[{'code': '3410',...| Journal Article|['professional ro...|[12810, 30340, 30...|
| either a confide...| all rights reser...| Journal Article|['recognition mem...|
|[{'code': '2240',...| Journal Article|['subjective prob...|[11230, 34420, 40...|
|[{'code': '2343',...| Journal Article|['item recognitio...|[30570, 43350, 47...|
|[{'code': '2343',...| Journal Article|['list length', '...|[20350, 30570, 39...|
|[{'code': '2340',...| Journal Article|['Euclidean dista...|[30883, 43000, 48...|
+--------------------+--------------------+--------------------+--------------------+
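To look at the broken row on its own, I pull it out with a filter (a minimal sketch; it assumes every well-formed row has docType equal to 'Journal Article', as the output above suggests):

# Isolate rows whose docType is not the expected value.
# Assumption: all well-formed rows have docType == 'Journal Article',
# based on the sample output above.
bad_rows = spark.sql("""
    SELECT *
    FROM psyc
    WHERE docType <> 'Journal Article'
""")
bad_rows.show(truncate=False)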
Here is the rest of the code:
import os

from pyspark.sql import SparkSession
from pyspark.sql import SQLContext

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL") \
    .getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)  # kept for the sqlContext.sql(...) call above

# BASE_DIR is defined elsewhere and points to the directory holding the CSV
fp = os.path.join(BASE_DIR, 'psyc.csv')

df = spark.read.csv(fp, header=True)
df.printSchema()
df.createOrReplaceTempView("psyc")
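One guess on my part: the keywordList value in the first output row starts with a stray double quote, and the broken row begins with text like "either a confide...", so perhaps a field contains an embedded newline or an unescaped quote that breaks the default one-record-per-line parsing. Would reading the file like this be the right approach? (Just a sketch of what I'm considering; multiLine, quote, and escape are standard options of spark.read.csv.)

# Not a confirmed fix: re-read the CSV allowing records that span
# multiple lines, treating a doubled quote as the escape character.
df = spark.read.csv(
    fp,
    header=True,
    multiLine=True,   # allow newlines inside quoted fields
    quote='"',
    escape='"',       # RFC-4180 style: "" inside a quoted field
)
df.createOrReplaceTempView("psyc")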