My table has a column that contains values such as
"SANDOSTATINA LAR 10mg 2ml", "2mg SANDOSTATINA LAR", "LONGASTATINA LAR 5MG".
I need to extract only the brand name from each value, i.e.
"SANDOSTATINA LAR" or "LONGASTATINA LAR".
Can anyone suggest how to do this with Spark?
Query:
val df001 = spark.sql("""select account.account_name, account.account_code, account.onekey_code, account.province, product.brand_sku_name,
from_unixtime(unix_timestamp(period, 'yyyy-MM-dd'), 'yyyy-MM') as sales_month, sales.sales_office,
case when sales.sales_office = 'ITPU' then sales.net_sales else null end as net_sales_itpu,
case when sales.sales_office = 'ITOS' then sales.net_sales else null end as net_sales_itos
from
(select brand_sku_name,product_key from ph_com_r_ita_sales_integrator.it_dim_product where brand_sku_name like '%LONGASTATINA LAR%' or brand_sku_name like '%SANDOSTATINA LAR%') product
inner join
(select net_sales,period, account_code as sales_account_code,product_key as sales_product_key, equivalent_units,sales_office,data_source_channel from ph_com_r_ita_sales_integrator.it_fact_sales where (sales_office = 'ITOS' or sales_office = 'ITPU') and data_source = 'Ex Factory') sales
on sales.sales_product_key = product.product_key
inner join
(select account_name,account_code,onekey_code,province from ph_com_r_ita_sales_integrator.it_dim_account) account
on account.account_code = sales.sales_account_code""")
df001.select(col("brand_sku_name")).show()
==============================
SANDOSTATINA LAR 20 MG/2 ML
LONGASTATINA LAR 10 MG/2 ML
SANDOSTATINA LAR 10 MG/2 ML
10 MG/2 ML SANDOSTATINA LAR
==============================
Expected output:
==============
SANDOSTATINA LAR
LONGASTATINA LAR
SANDOSTATINA LAR
SANDOSTATINA LAR
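
One possible way to get this output is Spark's built-in regexp_extract function, which returns the part of a string matched by a regex group. The following is only a minimal sketch, assuming the two brand names are the only ones needed and always appear in upper case as in the sample above. The object name ExtractBrand, the helper withBrandName, the column name brand_name and the small sample DataFrame (a stand-in for df001) are hypothetical and not part of the original query.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_extract}

object ExtractBrand {
  // Alternation of the two brand names; group 1 captures whichever one matches.
  // Assumption: names are upper case, as in the sample data.
  val brandPattern = "(SANDOSTATINA LAR|LONGASTATINA LAR)"

  // Adds a hypothetical "brand_name" column extracted from "brand_sku_name".
  // regexp_extract yields an empty string when the pattern does not match.
  def withBrandName(df: DataFrame): DataFrame =
    df.withColumn("brand_name", regexp_extract(col("brand_sku_name"), brandPattern, 1))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("extract-brand")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for df001: values copied from the output shown in the question.
    val sample = Seq(
      "SANDOSTATINA LAR 20 MG/2 ML",
      "LONGASTATINA LAR 10 MG/2 ML",
      "SANDOSTATINA LAR 10 MG/2 ML",
      "10 MG/2 ML SANDOSTATINA LAR"
    ).toDF("brand_sku_name")

    withBrandName(sample).select("brand_name").show(false)

    spark.stop()
  }
}
```

The same function is also available inside Spark SQL, so the extraction could instead be done directly in the select list of the query, for example regexp_extract(product.brand_sku_name, '(SANDOSTATINA LAR|LONGASTATINA LAR)', 1) as brand_name. Because the pattern is not anchored, it should match the brand name whether it appears at the start or at the end of the value.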