If I understood your problem correctly, given the following input:
import pandas as pd
stemmed_search = {'Group_1':['solicit', 'requier', 'día'], 'Group_2':['infraestruc', 'construccion', 'gas', 'nigrogen']}
test = ['solicit', 'requier', 'día', 'infraestruc', 'construccion', 'gas', 'nigrogen']
test2 = ['solicit', 'lol', 'lol', 'infraestruc', 'construccion', 'gas', 'nigrogen']
# one row per article, each cell holding that article's list of stems
df = pd.DataFrame({'Stem': [test, test2]})
Stem
0 [solicit, requier, día, infraestruc, construcc...
1 [solicit, lol, lol, infraestruc, construccion,...
Here is the code:
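def compar2(test):
    # For each group: 1 if fewer than half of its stems are missing from
    # this row (i.e. more than half are present), otherwise 0
    test = set(test)
    return [1 if len(set(group) - test) < len(group) * 0.5 else 0
            for group in stemmed_search.values()]
df['Text'] = df.Stem.apply(compar2)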
This gives:
Stem Text
0 [solicit, requier, día, infraestruc, construcc... [1, 1]
1 [solicit, lol, lol, infraestruc, construccion,... [0, 1]
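The condition `len(set(group) - test) < len(group) * 0.5` says that fewer than half of a group's stems are missing from the row, which is the same as saying that strictly more than half of them are present. A minimal sketch that makes the cutoff an explicit parameter, assuming the groups contain no duplicate stems; `group_flags` and `threshold` are hypothetical names, not part of the original code:

```python
def group_flags(words, groups, threshold=0.5):
    """Hypothetical helper: 1/0 per group, 1 when the share of the group's
    stems found in `words` is strictly greater than `threshold`."""
    present = set(words)
    flags = []
    for group in groups.values():
        stems = set(group)
        hits = len(stems & present)  # stems of this group that occur in the row
        flags.append(1 if hits > threshold * len(stems) else 0)
    return flags


stemmed_search = {'Group_1': ['solicit', 'requier', 'día'],
                  'Group_2': ['infraestruc', 'construccion', 'gas', 'nigrogen']}
test2 = ['solicit', 'lol', 'lol', 'infraestruc', 'construccion', 'gas', 'nigrogen']
print(group_flags(test2, stemmed_search))  # [0, 1]
```

It can be used the same way, e.g. `df.Stem.apply(lambda x: group_flags(x, stemmed_search))`; lowering `threshold` makes the match looser.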
EDIT: Another example:
def category_name(test):
    # Names of the groups for which more than half of the stems occur in this row
    return [k for k, group in stemmed_search.items()
            if len(set(group) - set(test)) < len(group) * 0.5]
stemmed_search = {'Food': ['pizza', 'chips', 'cheese', 'tomato', 'apple'],
                  'Animal': ['horse', 'snake', 'dog', 'cat'],
                  'School': ['book', 'pen', 'vocabolary', 'homework', 'student']
                  }
stemmed_articles = [['macheroni', 'car', 'pizza', 'free', 'dog', 'apple', 'chips'],
                    ['dog', 'hungry', 'cat', 'kill', 'snake', 'gas', 'apple'],
                    ['student', 'train', 'car', 'pen', 'homework', 'table', 'book']
                    ]
df = pd.DataFrame({'stemmed_articles': stemmed_articles})
# compar2 and category_name both read the global stemmed_search, so they now use the new groups
df['categories'] = df.stemmed_articles.apply(compar2)
df['categories_name'] = df.stemmed_articles.apply(category_name)
This gives:
stemmed_articles categories categories_name
0 [macheroni, car, pizza, free, dog, apple, chips] [1, 0, 0] [Food]
1 [dog, hungry, cat, kill, snake, gas, apple] [0, 1, 0] [Animal]
2 [student, train, car, pen, homework, table, book] [0, 0, 1] [School]
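If you need to tune the 0.5 cutoff, it can help to look at the actual overlap share per category instead of only the final names. A sketch under the same no-duplicates assumption; `overlap_shares` and `category_names` are hypothetical helpers, not part of the code above:

```python
def overlap_shares(words, groups):
    """Hypothetical helper: fraction of each group's distinct stems found in `words`."""
    present = set(words)
    return {name: (len(set(g) & present) / len(set(g)) if g else 0.0)
            for name, g in groups.items()}


def category_names(words, groups, threshold=0.5):
    """Names of groups whose overlap share is strictly above `threshold`."""
    return [name for name, share in overlap_shares(words, groups).items()
            if share > threshold]


stemmed_search = {'Food': ['pizza', 'chips', 'cheese', 'tomato', 'apple'],
                  'Animal': ['horse', 'snake', 'dog', 'cat'],
                  'School': ['book', 'pen', 'vocabolary', 'homework', 'student']}
stemmed_articles = [['macheroni', 'car', 'pizza', 'free', 'dog', 'apple', 'chips'],
                    ['dog', 'hungry', 'cat', 'kill', 'snake', 'gas', 'apple'],
                    ['student', 'train', 'car', 'pen', 'homework', 'table', 'book']]

for article in stemmed_articles:
    print(overlap_shares(article, stemmed_search), category_names(article, stemmed_search))
```

With the default threshold this returns the same names as `category_name`; the printed shares (e.g. 0.6 for `Food` in the first article) show how close each article is to the cutoff.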