I'm learning IBM Apache Spark and working with the HMP dataset. I followed the instructions from the tutorial, but the code doesn't work as intended. Here is my code:
# Notebook shell command: fetch the dataset
!git clone https://github.com/wchill/HMP_Dataset

from pyspark.sql.types import StructType, StructField, IntegerType

# Expected schema: three nullable integer columns
schema = StructType([
    StructField("x", IntegerType(), True),
    StructField("y", IntegerType(), True),
    StructField("z", IntegerType(), True)
])

import os
file_list = os.listdir("HMP_Dataset")
# Keep only the entries whose names contain an underscore
file_list_filtered = [file for file in file_list if "_" in file]

from pyspark.sql.functions import lit

df = None  # accumulator for the combined DataFrame
for cat in file_list_filtered:
    data_files = os.listdir("HMP_Dataset/" + cat)
    for data_file in data_files:
        print(data_file)
        temp_df = spark.read.option("header", "false").option("delimeter", " ").csv("HMP_Dataset/" + cat + "/" + data_file, schema=schema)
        # Tag every row with its folder name and source file
        temp_df = temp_df.withColumn("class", lit(cat))
        temp_df = temp_df.withColumn("source", lit(data_file))
        if df is None:
            df = temp_df
        else:
            df = df.union(temp_df)
The x, y, z columns are all null when I call df.show(). Here is the output:
+----+----+----+-----------+--------------------+
|   x|   y|   z|      class|              source|
+----+----+----+-----------+--------------------+
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
|null|null|null|Brush_teeth|Accelerometer-201...|
+----+----+----+-----------+--------------------+
only showing top 20 rows
The x, y, z columns should contain numbers. What am I doing wrong? I used the exact code shown in the tutorial video. I'm running the program in IBM Watson Studio. Link to the tutorial: https://www.coursera.org/learn/advanced-machine-learning-signal-processing/lecture/8cfiW/introduction-to-sparkml
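One thing worth checking is the option name in the read call: it is spelled "delimeter", while Spark's CSV reader recognizes "sep" (with "delimiter" as an alias) and silently ignores option names it does not know. If that is what happens here, the default comma separator would be used, each line would arrive as a single string, and the cast to integer would fail. The sketch below is a simplified pure-Python model of that behavior, not Spark's actual implementation; the helper `parse_line` and the sample line are hypothetical and assume the files hold three space-separated integers per line.

```python
from typing import List, Optional


def parse_line(line: str, sep: str, n_cols: int = 3) -> List[Optional[int]]:
    """Simplified model of a permissive CSV parse into integer columns."""
    tokens = line.strip().split(sep)
    # Pad missing columns with None and drop any extra ones.
    tokens = (tokens + [None] * n_cols)[:n_cols]
    row: List[Optional[int]] = []
    for tok in tokens:
        try:
            row.append(int(tok))
        except (TypeError, ValueError):
            # A value that cannot be read as an integer becomes null.
            row.append(None)
    return row


sample = "22 49 35"  # hypothetical line, assumed to match the file layout

# Separator left at the default comma: the whole line is one token.
default_sep_row = parse_line(sample, sep=",")
# Separator set to a space: three integer tokens.
space_sep_row = parse_line(sample, sep=" ")

print(default_sep_row)  # [None, None, None]
print(space_sep_row)    # [22, 49, 35]
```

In the PySpark code, the corresponding change would be `.option("delimiter", " ")`, or passing `sep=" "` to `csv(...)`.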