SELECT url,
regexp_replace(title, '(http|ftp|file|https)://[-a-z0-9+&@#/\%?=~_-|!:,.;/]*|\<.*?\>|(=+)\s*(.*?)\s*(=+)|&\w+;', '') AS text_body
FROM df_table_doc
0 https://demo.com New Arch {Onboarding}..Lets (Onboard) it..
1 https://example.com New Arch (Onboarding)
Adding the pattern \{.*?\}
to strip anything inside {}
fails with this error:
IndexError: tuple index out of range
IndexError Traceback (most recent call last)
<ipython-input-1-20460659c049> in <module>
----> 1 get_ipython().run_cell_magic('spark_sql', '--limit 200', "select url, regexp_replace(title, '(http|ftp|file|https)://[-a-z0-9+&@#/\\%?=~_-|!:,.;/]*|\\<.*?\\>|\\{.*?\\}|(=+)\\s*(.*?)\\s*(=+)|&\\w+;', '') as text_body\n from df_table_doc\n")
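One possible explanation, offered as an assumption because the traceback does not show the magic's source: the `spark_sql` cell magic may pass the cell text through Python's `str.format` before sending it to Spark. In that case `{.*?}` is read as a replacement field with no arguments, which raises `IndexError`. The sketch below reproduces that behavior in plain Python. `render` is a hypothetical stand-in for the magic, and Python's `re` only approximates the Java regex engine that Spark uses.

```python
import re

# Pattern fragment from the question that targets text inside braces.
BRACE_PATTERN = r"\{.*?\}"


def render(template: str) -> str:
    """Hypothetical stand-in for the cell magic.

    Assumption: the magic calls str.format on the cell text.
    """
    return template.format()


# 1. Literal braces are parsed as a replacement field, so formatting fails.
try:
    render(BRACE_PATTERN)
    failed = False
except IndexError:
    failed = True

# 2. Doubling each brace tells str.format to emit it literally.
escaped = BRACE_PATTERN.replace("{", "{{").replace("}", "}}")
rendered = render(escaped)

# 3. Check the rendered pattern against the sample title from the question.
title = "New Arch {Onboarding}..Lets (Onboard) it.."
cleaned = re.sub(rendered, "", title)

print(failed)    # whether the unescaped pattern raised IndexError
print(rendered)  # pattern text after formatting
print(cleaned)   # title with the braced part removed
```

If the assumption holds, writing the pattern as `\{{.*?\}}` inside the cell would be the equivalent fix for the query itself.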