I am trying to run a PySpark script. It reads a file, creates a view from it, and then runs a SQL query on top of that view. The columns are separated by the '|' delimiter. However, the data in one column itself contains several '|' characters, so the value is cut off at the first '|' inside it.
The record in the column:
"From: Azadtalab, Maysam [JOICA Non-J&J] Sent: Monday, September 26, 2016 1:42 PM To: MedInfo Canada [JOICA] Subject: Request for studies Hi MedInfo, Could you please send the following information: Request: - Difference between brand and generic - OROS tech with Graph - Fallu study - Van Stralen study Product: CONCERTA Reporter Type: Pharmacist Salutation: Ms. First Name: Ann Last Name: Bertrand Province: ON Email/Address/Fax: annvbertrand@gmail.com<mailto:annvbertrand@gmail.com> Language: English Fulfillment Mode: Email Employee Name: Maysam Azadtalab Thanks Maysam Azadtalab Healthcare Relationship Specialist - Concerta(r) | Janssen Pharmaceutical Canada 19 Green Belt Dr., Toronto, ON, M3C1L9 Office: +1 (416) 382 5182| +1 (800) 387 8781 ext.5182 E-mail: mazadtal@its.jnj.com<mailto:mazadtal@its.jnj.com> [cid:image001.png@01CE2FB2.759A3E50] Confidentiality Notice: This e-mail transmission may contain confidential or legally privileged information and is intended only for the individual or entity named in the e-mail address. Any disclosure, copying, distribution or reliance upon the contents of this e-mail not otherwise authorized by the sending is strictly prohibited. If you have received this e-mail transmission in error, please immediately reply to the sender, so that proper delivery of the e-mail can be effected, and then please delete the message from your inbox. Thank you."
The record I get in the output file:
"From: Azadtalab, Maysam [JOICA Non-J&J] Sent: Monday, September 26, 2016 1:42 PM To: MedInfo Canada [JOICA] Subject: Request for studies Hi MedInfo, Could you please send the following information: Request: - Difference between brand and generic - OROS tech with Graph - Fallu study - Van Stralen study Product: CONCERTA Reporter Type: Pharmacist Salutation: Ms. First Name: Ann Last Name: Bertrand Province: ON Email/Address/Fax: annvbertrand@gmail.com<mailto:annvbertrand@gmail.com> Language: English Fulfillment Mode: Email Employee Name: Maysam Azadtalab Thanks Maysam Azadtalab Healthcare Relationship Specialist - Concerta(r)
Code for reading the RawLayer file:
df_read_file = (
    sqlContext.read.format('com.databricks.spark.csv')
    .option("mode", "DROPMALFORMED")
    .option("delimiter", '|')
    .options(header='true', quote='\"', inferSchema='false')
    .load(row['Source File Name Lnd'])
)
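To narrow the problem down, here is a minimal sketch of the behaviour I am describing. It uses Python's standard csv module instead of Spark, purely so the example is self-contained; Spark's CSV reader may not behave identically. The two-column sample row is hypothetical and much shorter than the real record. It shows that a '|' inside a quoted field survives only while the quote character is honoured. When the quotes are not treated as quotes, the value is cut at the first embedded '|', which looks like the truncated output above.

```python
import csv
import io

# Hypothetical two-column sample; the real file has more columns and a much longer body.
sample = 'id|body\n1|"Concerta(r) | Janssen Pharmaceutical Canada"\n'


def parse(text, quoting):
    """Parse pipe-delimited text and return all rows as lists of strings."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter='|',
        quotechar='"',
        quoting=quoting,
    )
    return list(reader)


# Quotes honoured: the embedded '|' stays inside the second column.
quoted = parse(sample, csv.QUOTE_MINIMAL)

# Quotes ignored: the parser splits on every '|', including the embedded one.
unquoted = parse(sample, csv.QUOTE_NONE)

print(len(quoted[1]), quoted[1])
print(len(unquoted[1]), unquoted[1])
```

If the real file behaves like the second case, my guess is that the quoting in the source file is not being recognised by the reader, but that is an assumption I have not verified.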
Please help, and let me know if you need any more information.