Here is my code for StratifiedKFold with a loop:
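from sklearn.model_selection import StratifiedKFold
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
# train and labels are assumed to be NumPy arrays (with pandas, index via .iloc)
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=2020)
for train_idx, val_idx in kfold.split(train, labels):
    x_train, y_train = train[train_idx], labels[train_idx]
    x_val, y_val = train[val_idx], labels[val_idx]
    # Fit the vocabulary on the training fold only, then transform both folds
    count_vectorizer = CountVectorizer()
    count_vectorizer.fit(x_train)
    X_train_cv = count_vectorizer.transform(x_train)
    X_val_cv = count_vectorizer.transform(x_val)
    # class_weight is left at its default (None), so classes are not reweighted
    cv_classifier = LogisticRegression(solver='lbfgs', C=25, max_iter=500)
    cv_classifier.fit(X_train_cv, y_train)
    y_pred = cv_classifier.predict(X_val_cv)
    f1 = f1_score(y_val, y_pred, average='macro')
    print(f1)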
The results I got:
0.49
0.46
0.48
0.48
0.50
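As a hedged illustration of what `StratifiedKFold` does in the loop above, the sketch below checks that every validation fold keeps the class proportions of the full label set. The labels are made up (an 80/15/5 split over three classes), not the question's data, and the all-zeros `X` is a placeholder because `split()` only needs the number of samples from it.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels (not the question's data): 80 / 15 / 5
labels = np.array([0] * 80 + [1] * 15 + [2] * 5)
X = np.zeros((len(labels), 1))  # placeholder features; only the row count matters here

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=2020)
fold_counts = []
for train_idx, val_idx in kfold.split(X, labels):
    # Count how many samples of each class land in this validation fold
    counts = np.bincount(labels[val_idx], minlength=3)
    fold_counts.append(counts)
    print(counts)
```

Because every class count here is divisible by 5, each validation fold should contain the same per-class counts, i.e. the same class ratio as the whole set.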
And here is the code with cross_val_score:
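from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
cv_classifier = LogisticRegression(solver='lbfgs', C=25, max_iter=500, class_weight='balanced')
# Here the vectorizer is fitted once on the whole training set, before splitting
count_vectorizer = CountVectorizer()
count_vectorizer.fit(train)
train_cv = count_vectorizer.transform(train)
print(cross_val_score(cv_classifier, train_cv, labels, cv=StratifiedKFold(5, shuffle=True), scoring='f1_macro'))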
The results I got:
0.70 0.74 0.70 0.734 0.679
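The `cross_val_score` version passes `class_weight='balanced'`, which the loop version does not set. A minimal sketch of how scikit-learn derives those weights, using hypothetical labels (not the question's data): each class gets `n_samples / (n_classes * count_of_that_class)`, so rarer classes receive larger weights in the loss.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels (not the question's data)
labels = np.array([0] * 80 + [1] * 15 + [2] * 5)
classes = np.unique(labels)

# What scikit-learn computes for class_weight='balanced'
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)

# The same formula written out by hand: n_samples / (n_classes * count per class)
manual = len(labels) / (len(classes) * np.bincount(labels))

print(dict(zip(classes.tolist(), weights.round(3).tolist())))
```

With these labels the rarest class is weighted roughly 16 times more heavily than the most frequent one.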
EDIT: I added a pipeline:
from sklearn.pipeline import make_pipeline
cv_classifier = LogisticRegression(solver='lbfgs', C=25, max_iter=500, class_weight='balanced')
# The pipeline refits CountVectorizer on the training part of each fold
classifier_pipeline = make_pipeline(CountVectorizer(), cv_classifier)
print(cross_val_score(classifier_pipeline, train, labels, cv=StratifiedKFold(5, shuffle=True), scoring='f1_macro'))
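A sketch, assuming a tiny made-up three-class corpus, that compares a manual fold loop with `cross_val_score` on a pipeline. When both use the same hyperparameters (including `class_weight`) and the same `StratifiedKFold` object with a fixed `random_state`, they should produce the same per-fold `f1_macro` scores. The helper `make_model` is hypothetical and exists only to build identical fresh models for both approaches.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus (not the question's data), five documents per class
texts = np.array([
    "good great fine", "great good nice", "nice fine good", "good nice great", "fine great nice",
    "bad awful poor", "awful bad worse", "poor worse bad", "bad poor awful", "worse awful poor",
    "okay average meh", "meh okay fair", "average meh okay", "fair average okay", "okay meh average",
])
labels = np.array([0] * 5 + [1] * 5 + [2] * 5)

def make_model():
    # Hypothetical helper: same hyperparameters as the question, valid 'balanced' spelling
    return make_pipeline(
        CountVectorizer(),
        LogisticRegression(solver="lbfgs", C=25, max_iter=500, class_weight="balanced"),
    )

# A fixed random_state makes both approaches see identical folds
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=2020)

# 1) cross_val_score on the pipeline (vectorizer refitted inside each fold)
cv_scores = cross_val_score(make_model(), texts, labels, cv=kfold, scoring="f1_macro")

# 2) Manual loop doing the same thing: fit on the training fold, score the validation fold
manual_scores = []
for train_idx, val_idx in kfold.split(texts, labels):
    model = make_model()
    model.fit(texts[train_idx], labels[train_idx])
    y_pred = model.predict(texts[val_idx])
    manual_scores.append(f1_score(labels[val_idx], y_pred, average="macro"))

print(cv_scores)
print(np.array(manual_scores))
```

Building both sides from one factory function keeps the comparison fair: any remaining score gap then comes from the data or settings, not from the evaluation code.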