I have a reference list of strings that was converted from a dataframe.
Reference list of strings
brand_list = ['scurfa', 'seagull', 'seagull', 'seiko']
Sample input 1 for description_list
VINTAGE KING SEIKO 44-9990 Gold Medallion,Manual Winding with mod caseback.Serviced 2019.
Sample input 2 for description_list
Power reserve function at 12; push-pull crown at 4
Seiko NE57 auto movement with power reserve
Multilayered dial with SuperLuminova BG-W9
Desired output
SEIKO 44-9990 #extract together with model name
Seiko NE57 #extract together with model name
This is my sample code, but the output is not what I want
import nltk
import numpy as np
import pandas
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# English stop words (the NLTK 'stopwords' corpus must be downloaded first)
stop_words = set(stopwords.words('english'))

def clean(doc):
    # Lowercase the text and split it into tokens
    word_tokens = word_tokenize(doc.lower())
    # Keep each non-stop-word token exactly once, in its original order
    filtered_sentence = [w for w in word_tokens if w not in stop_words]
    return filtered_sentence

# soup_content and model_list are not shown here; they are defined elsewhere
description_list = clean(soup_content.find('blockquote', {"class": "postcontent restore"}).text)

# Look for a known brand among the tokens, then for a known model
if pandas.Series(np.array(description_list)).isin(np.array(brand_list)).any():
    brand_result = [i for i in description_list if i in brand_list]
    print(brand_result[0])
    if pandas.Series(np.array(description_list)).isin(np.array(model_list)).any():
        model_result = [i for i in description_list if i in model_list]
        print(model_result[0])
    else:
        print('Unknown')
else:
    # Neither brand nor model could be determined
    print('Unknown')
    print('Unknown')
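
For reference, here is a minimal sketch of the behaviour I am after, not a finished solution. It works on the raw description text rather than on the lowercased tokens, so the original casing is preserved. It assumes the model name is always the single token that directly follows the brand; the function name extract_brand_model and the use of the re module are my own illustration and are not part of the code above.

```python
import re

brand_list = ['scurfa', 'seagull', 'seagull', 'seiko']


def extract_brand_model(description, brands):
    """Return 'BRAND MODEL' for the first brand found, else 'Unknown'.

    Assumption: the model name is the one whitespace-delimited token
    (letters, digits, underscores, hyphens) right after the brand.
    """
    # Deduplicate the brands and escape them for a regex alternation
    alternation = "|".join(re.escape(b) for b in sorted(set(brands)))
    # Case-insensitive: brand, whitespace, then the following token
    pattern = re.compile(r"\b(" + alternation + r")\s+([\w-]+)", re.IGNORECASE)
    match = pattern.search(description)
    if match is None:
        return "Unknown"
    # Keep the casing exactly as it appears in the description
    return match.group(1) + " " + match.group(2)


sample_1 = ("VINTAGE KING SEIKO 44-9990 Gold Medallion,"
            "Manual Winding with mod caseback.Serviced 2019.")
sample_2 = ("Power reserve function at 12; push-pull crown at 4\n"
            "Seiko NE57 auto movement with power reserve\n"
            "Multilayered dial with SuperLuminova BG-W9")

print(extract_brand_model(sample_1, brand_list))  # SEIKO 44-9990
print(extract_brand_model(sample_2, brand_list))  # Seiko NE57
```

This would not handle multi-word model names, or descriptions where the brand is not immediately followed by the model; those cases would still need a lookup against model_list.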