cross_val_score output does not reflect the refit score of GridSearchCV. Please scroll down to see my results.
I performed a nested cross-validation with grid search to first select the optimal hyperparameters for my model, and then evaluated the model using cross_val_score (see the code below). For my grid search, I chose refit_score as the metric used to refit the estimator. For my cross_val_score, I report several scoring metrics.
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Note: param_grid, scorers, X_df and y_df are module-level names defined elsewhere
def grid_search_nested_cv(model, refit_score='precision_score'):
    """
    fits a GridSearchCV classifier using refit_score for optimization
    prints classifier performance metrics
    performs both an inner and outer cross validation
    """
    # To be used within GridSearch for parameter tuning
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=42)
    # To be used in outer CV for model evaluation
    outer_cv = KFold(n_splits=5, shuffle=True, random_state=42)
    # Non_nested parameter search and scoring
    grid_search = GridSearchCV(model, param_grid, scoring=scorers, refit=refit_score,
                               cv=inner_cv, return_train_score=True, n_jobs=-1)
    # Fit on the full data first; best_params_ does not exist before fit()
    grid_search.fit(X_df, y_df)
    print('Best params for {}'.format(refit_score))
    print(grid_search.best_params_)
    # Nested CV: cross_val_score refits a clone of grid_search on each outer training fold
    score_types = ['accuracy', 'f1_weighted', 'precision_weighted', 'recall_weighted']
    for scores_ in score_types:
        performance = cross_val_score(grid_search, X_df, y_df, scoring=scores_, cv=outer_cv, n_jobs=-1)
        print("{}: ".format(scores_) + str(round(100 * performance.mean(), 2)) + "%")
    return grid_search
# MODEL 1: RANDOM FOREST (PRECISION WEIGHTED)
from sklearn.ensemble import RandomForestClassifier

# Initialize Random Forest Classifier model
clf = RandomForestClassifier(n_jobs=-1)
# use a full grid over the parameters of interest
param_grid = {
    'min_samples_split': [...],
    'n_estimators': [...],
    'max_depth': [...],
    'max_features': [...],
}
grid_search_clf = grid_search_nested_cv(clf, refit_score='precision_score')
# MODEL 2: RANDOM FOREST (RECALL WEIGHTED)
# Initialize Random Forest Classifier model
clf = RandomForestClassifier(n_jobs=-1)
# use a full grid over the parameters of interest
param_grid = {
    'min_samples_split': [...],
    'n_estimators': [...],
    'max_depth': [...],
    'max_features': [...],
}
grid_search_clf = grid_search_nested_cv(clf, refit_score='recall_score')
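The function relies on a module-level `scorers` dict that is not shown in this post. A minimal sketch of what it might look like, assuming only that its keys match the `refit_score` values passed in (the exact definitions here are a guess, not the original code):

```python
from sklearn.metrics import accuracy_score, make_scorer, precision_score, recall_score

# Hypothetical reconstruction: the keys are chosen to match the refit_score
# values used in this post; the real definitions are not shown.
scorers = {
    "precision_score": make_scorer(precision_score),
    "recall_score": make_scorer(recall_score),
    "accuracy_score": make_scorer(accuracy_score),
}
```

If `scorers` is built this way, `make_scorer(precision_score)` uses the default binary average, whereas the `'precision_weighted'` string used in the outer loop uses the weighted average, so the inner and outer metrics would not be the same quantity.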
MODEL 1: Precision weighted
Best params for precision_score
{'max_depth': 20, 'max_features': 0.3, 'min_samples_split': 15, 'n_estimators': 10}
Accuracy: 67.24%
F1: 63.58%
Precision: 61.02%
Recall: 70.38%
MODEL 2: RECALL weighted
Best params for recall_score
{'max_depth': 10, 'max_features': 0.5, 'min_samples_split': 18, 'n_estimators': 6}
Accuracy: 66.66%
F1: 63.39%
Precision: 63.25%
Recall: 61.37%
Model 1 (precision weighted) has a higher recall score than Model 2 (recall weighted).
Model 2 has a higher precision score than Model 1.
Why is this the case? Shouldn't it be the other way around?
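One way to dig into this is to look at what each outer fold actually selected. `cross_val_score` refits a clone of `grid_search` on every outer training split, so the `best_params_` printed from the full-data fit are not necessarily the ones used when the outer scores are computed. The sketch below is not my real setup: it uses synthetic data, a tiny grid, hypothetical `scorers`, and `cross_validate` (instead of one `cross_val_score` call per metric) with `return_estimator=True` to expose the per-fold choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, KFold, cross_validate

# Synthetic stand-in for X_df / y_df (assumption: binary classification).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical scorers and a deliberately tiny grid to keep the run short.
scorers = {
    "precision_score": make_scorer(precision_score),
    "recall_score": make_scorer(recall_score),
}
param_grid = {"max_depth": [3, 10], "n_estimators": [5, 20]}

inner_cv = KFold(n_splits=3, shuffle=True, random_state=42)
outer_cv = KFold(n_splits=3, shuffle=True, random_state=0)


def nested_cv_report(refit_score):
    """Run nested CV and print the parameters chosen in each outer fold."""
    grid_search = GridSearchCV(
        RandomForestClassifier(random_state=0),  # fixed seed to reduce noise
        param_grid,
        scoring=scorers,
        refit=refit_score,
        cv=inner_cv,
    )
    results = cross_validate(
        grid_search, X, y,
        scoring=["precision_weighted", "recall_weighted"],
        cv=outer_cv,
        return_estimator=True,  # keep the fitted GridSearchCV of each fold
    )
    for fold, fitted in enumerate(results["estimator"]):
        print(refit_score, "fold", fold, fitted.best_params_)
    return results


results = {name: nested_cv_report(name) for name in scorers}
for name, res in results.items():
    print(
        name,
        "precision_weighted: %.3f" % res["test_precision_weighted"].mean(),
        "recall_weighted: %.3f" % res["test_recall_weighted"].mean(),
    )
```

If the per-fold parameters differ from fold to fold, or the gap between the two models is small compared with the fold-to-fold spread, the ordering in my results may simply be noise from a forest with few trees and no fixed `random_state`.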