CSV file
A column of my DataFrame df contains some rows whose values do not start with a digit, and I want to remove those rows. I tried the code below, but it doesn't work.
import re
df = sqlContext.read.csv("/FileStore/tables/mtmedical_V6-16623.csv", header='true', inferSchema="true")
df.show()
import pyspark.sql.functions as f
w=df.filter(df['_c0'].isdigit()) #error1
w=df.filter(df['_c0'].startswith(('1','2','3','4','5','6','7','8','9'))) #error2
w.show()
Errors:
'Column' object is not callable   # error 1
py4j.Py4JException: Method startsWith([class java.util.ArrayList]) does not exist   # error 2
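Both errors seem to come from mixing Python `str` methods with Spark `Column` methods. For error 1, `Column` has no `isdigit` method, so `df['_c0'].isdigit` appears to be treated as access to a nested field named `isdigit`. That access yields another `Column`, and calling it with `()` raises the TypeError. For error 2, Python's `str.startswith` accepts a tuple of prefixes, but Spark's `Column.startswith` takes a single string (or `Column`). py4j converts the tuple to a `java.util.ArrayList`, and the JVM has no `startsWith` overload for that type. The plain-Python snippet below (no Spark involved) shows the tuple form that does work on ordinary strings, which the second attempt was presumably modeled on:

```python
# Plain Python strings, not Spark Columns: str.startswith accepts a tuple of prefixes.
digits = tuple("123456789")  # same prefixes as in the second attempt (note: "0" is missing)

good = "7".startswith(digits)
bad = '", Bariatrics,31,...'.startswith(digits)

print(good, bad)
```

The same call on a Spark `Column` fails because the tuple is forwarded to the JVM unchanged.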
Here is the table. The row below row 7 has a '_c0' value that does not start with a digit. How can I remove rows like that?
+--------------------+--------------------+--------------------+--------------------+--------------------+-------------------------------------------------------+--------------------+--------------------+
| _c0| description| medical_specialty| age| gender|sample_name (What has been done to patient = Treatment)| transcription| keywords|
+--------------------+--------------------+--------------------+--------------------+--------------------+-------------------------------------------------------+--------------------+--------------------+
| 1| A 23-year-old wh...| Allergy / Immuno...| 23| female| Allergic Rhinitis |SUBJECTIVE:, Thi...|allergy / immunol...|
| 2| Consult for lapa...| Bariatrics| null| male| Laparoscopic Gas...|PAST MEDICAL HIST...|bariatrics, lapar...|
| 3| Consult for lapa...| Bariatrics| 42| male| Laparoscopic Gas...|"HISTORY OF PRESE...| at his highest h...|
| 4| 2-D M-Mode. Dopp...| Cardiovascular /...| null| null| 2-D Echocardiogr...|2-D M-MODE: , ,1....|cardiovascular / ...|
| 5| 2-D Echocardiogram| Cardiovascular /...| null| male| 2-D Echocardiogr...|1. The left vent...|cardiovascular / ...|
| 6| Morbid obesity. ...| Bariatrics| 30| male| Laparoscopic Gas...|PREOPERATIVE DIAG...|bariatrics, gastr...|
| 7| Liposuction of t...| null| null| null| null| null| null|
|", Bariatrics,31,...| 1. Deformity| right breast rec...|2. Excess soft t...| anterior abdomen...| 3. Lipodystrophy...|POSTOPERATIVE DIA...| 1. Deformity|
| 8| 2-D Echocardiogram| Cardiovascular /...| null| male| 2-D Echocardiogr...|2-D ECHOCARDIOGRA...|cardiovascular / ...|
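In PySpark, a common way to express "starts with a digit" is a regular expression via `Column.rlike`, for example `w = df.filter(f.col('_c0').rlike('^[0-9]'))`. This keeps only rows whose `_c0` begins with a digit, and rows where `_c0` is null are dropped too. The stray row looks like a multi-line `transcription` field that was split during parsing. It may therefore be worth trying the reader options `multiLine=True` and `escape='"'` in `read.csv` before filtering, although whether that fully fixes this particular file is an assumption. The sketch below checks the same anchored predicate in plain Python against the `_c0` values visible in the table. `keep_row` and `c0_values` are hypothetical names used only for illustration:

```python
import re

# Anchored pattern: the value must begin with a digit.
# Spark's rlike does find-style (partial) matching, so re.search with "^" mirrors it.
STARTS_WITH_DIGIT = re.compile(r"^[0-9]")


def keep_row(value):
    """Return True if a _c0 value starts with a digit.

    None (SQL null) is rejected, mirroring how Spark's filter drops rows
    whose predicate evaluates to null.
    """
    if value is None:
        return False
    return STARTS_WITH_DIGIT.search(value) is not None


# _c0 values copied from the table above (truncated exactly as df.show() prints them).
c0_values = ["1", "2", "3", "4", "5", "6", "7", '", Bariatrics,31,...', "8"]

kept = [v for v in c0_values if keep_row(v)]
print(kept)
```

Unlike the tuple of prefixes in the second attempt, the character class `[0-9]` also accepts values starting with "0".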