I am trying to do multi-label classification on text data. My data looks like this:
xtrain = array(['solve multi step real life mathematical problems posed positive negative rational numbers ',
'explain use relationship sine cosine complementary anglesstandards hs math content geometry similarity right triangles ',
'derive formula ab sin c area triangle drawing auxiliary line vertex perpendicular opposite sidestandards hs math content geometry similarity right triangles trigonometry apply trigonometry general triangles'])
ytrain = array(['common core state standards mathcommon core state standards math',
'real number systemcommon core state standards math standards hs math content number quantity',
'quantitiescommon core state standards math standards hs math content number quantity'])
labels = [['rational numbers', 'fractions as decimals', 'mathematics'],
['trigonometric and inverse trigonometric functions'],
['trigonometric and inverse trigonometric functions', 'trigonometry']]
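import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Turn the lists of label names into a binary indicator matrix
mlb = MultiLabelBinarizer()
label_fit = mlb.fit_transform(labels)
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
# Fit on the binarized labels (label_fit); y_train was never defined
model = classifier.fit(xtrain, label_fit)
predicted = model.predict(xtest)  # xtest: held-out texts in the same format as xtrain
# Map each predicted indicator row back to label names
classes = mlb.classes_
Class_list = list(classes)
result = [[x for x, y in zip(Class_list, i) if y] for i in predicted]
result_df = pd.DataFrame.from_records(result)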
My output looks like this:
Text                                          Label                       Label         Label
Seeing Structure in Expressions               factoring                   polynomials
Trigonometric Functions
Use polynomial identities to solve problems   difference of two squares   polynomials   specialpolynomials
As with the second row above, the prediction is empty for several records. I also tried RandomForestClassifier, and it still predicts nothing for some of them. Please help.
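For context on the empty rows: with OneVsRestClassifier(LinearSVC()), each label is decided by its own binary classifier, so a record comes back with no labels whenever none of the per-label decision scores is above zero. Below is a minimal sketch of how those scores could be inspected and how an empty row could be filled with the single best-scoring label. The toy texts and labels are made up for illustration, predict_with_fallback is a hypothetical helper (not a scikit-learn function), and whether forcing at least one label is acceptable depends on the task.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy data, invented for illustration only (not the real dataset).
texts = [
    "solve problems with rational numbers and fractions",
    "convert fractions to decimals",
    "sine and cosine of complementary angles",
    "area of a triangle using sine",
    "factor polynomials using difference of two squares",
    "polynomial identities and factoring",
]
labels = [
    ["rational numbers", "fractions"],
    ["fractions"],
    ["trigonometry"],
    ["trigonometry", "geometry"],
    ["polynomials", "factoring"],
    ["polynomials"],
]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# TfidfVectorizer is equivalent to CountVectorizer followed by TfidfTransformer.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
]).fit(texts, y)


def predict_with_fallback(model, mlb, docs):
    """Hypothetical helper: predict labels, never returning an empty set."""
    # One signed score per (document, label); predict() keeps scores > 0.
    scores = model.decision_function(docs)
    pred = (scores > 0).astype(int)
    # Rows where no label crossed the threshold.
    rows = np.flatnonzero(pred.sum(axis=1) == 0)
    # Assumption: for those rows, accept the highest-scoring label anyway.
    pred[rows, scores[rows].argmax(axis=1)] = 1
    return mlb.inverse_transform(pred)


new_docs = ["completely unrelated words", "sine of an angle"]
print(model.decision_function(new_docs).round(2))  # inspect the raw scores
print(predict_with_fallback(model, mlb, new_docs))
```

If many rows only get a label through the fallback, that suggests the per-label classifiers are under-confident, for example because some labels have very few training examples.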