I have a DataFrame in PySpark:
df.show()
+---+----+-------+----------+-----+------+
| id|name|testing|avg_result|score|active|
+---+----+-------+----------+-----+------+
|  1| sam|   null|      null| null|  true|
|  2| Ram|      Y|      0.05|   10| false|
|  3| Ian|      N|      0.01|    1| false|
|  4| Jim|      N|       1.2|    3|  true|
+---+----+-------+----------+-----+------+
The schema is:
DataFrame[id: int, name: string, testing: string, avg_result: string, score: string, active: boolean]
I want to convert Y to True, N to False, true to True, and false to False.
When I do the following:
for col in cols:
    df = df.withColumn(col, f.when(f.col(col) == 'N', 'False').when(f.col(col) == 'Y', 'True').
        when(f.col(col) == 'true', True).when(f.col(col) == 'false', False).otherwise(f.col(col)))
I get the error below, and the DataFrame is unchanged:
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CASE WHEN (testing = N) THEN False WHEN (testing = Y) THEN True WHEN (testing = true) THEN true WHEN (testing = false) THEN false ELSE testing' due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;"
+---+----+-------+----------+-----+------+
| id|name|testing|avg_result|score|active|
+---+----+-------+----------+-----+------+
|  1| sam|   null|      null| null|  true|
|  2| Ram|      Y|      0.05|   10| false|
|  3| Ian|      N|      0.01|    1| false|
|  4| Jim|      N|       1.2|    3|  true|
+---+----+-------+----------+-----+------+
When I do it like this:
for col in cols:
    df = df.withColumn(col, f.when(f.col(col) == 'N', 'False').when(f.col(col) == 'Y', 'True').otherwise(f.col(col)))
I get the error below:
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CASE WHEN if ((isnull(active) || isnull(cast(N as double)))) null else CASE cast(cast(N as double) as double) WHEN cast(1 as double) THEN active WHEN cast(0 as double) THEN NOT active ELSE false THEN False WHEN if ((isnull(active) || isnull(cast(Y as double)))) null else CASE cast(cast(Y as double) as double) WHEN cast(1 as double) THEN active WHEN cast(0 as double) THEN NOT active ELSE false THEN True ELSE active' due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;"
But the DataFrame changes to:
+---+----+-------+----------+-----+------+
| id|name|testing|avg_result|score|active|
+---+----+-------+----------+-----+------+
|  1| sam|   null|      null| null|  true|
|  2| Ram|   True|      0.05|   10| false|
|  3| Ian|  False|      0.01|    1| false|
|  4| Jim|  False|       1.2|    3|  true|
+---+----+-------+----------+-----+------+
New attempt:
for col in cols:
    df = df.withColumn(col, f.when(f.col(col) == 'N', 'False').when(f.col(col) == 'Y', 'True').
        when(f.col(col) == 'true', 'True').when(f.col(col) == 'false', 'False').otherwise(f.col(col)))
Error received:
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CASE WHEN if ((isnull(active) || isnull(cast(N as double)))) null else CASE cast(cast(N as double) as double) WHEN cast(1 as double) THEN active WHEN cast(0 as double) THEN NOT active ELSE false THEN False WHEN if ((isnull(active) || isnull(cast(Y as double)))) null else CASE cast(cast(Y as double) as double) WHEN cast(1 as double) THEN active WHEN cast(0 as double) THEN NOT active ELSE false THEN True WHEN if ((isnull(active) || isnull(cast(true as double)))) null else CASE cast(cast(true as double) as double) WHEN cast(1 as double) THEN active WHEN cast(0 as double) THEN NOT active ELSE false THEN True WHEN if ((isnull(active) || isnull(cast(false as double)))) null else CASE cast(cast(false as double) as double) WHEN cast(1 as double) THEN active WHEN cast(0 as double) THEN NOT active ELSE false THEN False ELSE active' due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;"
How can I get a DataFrame like this?
+---+----+-------+----------+-----+------+
| id|name|testing|avg_result|score|active|
+---+----+-------+----------+-----+------+
|  1| sam|   null|      null| null|  True|
|  2| Ram|   True|      0.05|   10| False|
|  3| Ian|  False|      0.01|    1| False|
|  4| Jim|  False|       1.2|    3|  True|
+---+----+-------+----------+-----+------+