Concatenating the features
from scipy.sparse import hstack

# Stack the numeric, one-hot and TF-IDF-weighted W2V blocks column-wise, one CSR matrix per split.
X_train_set4 = hstack((
    X_train_price, X_train_quantity, X_train_projects, X_train_categories_one_hot,
    X_train_subcategories_one_hot, X_train_states_one_hot,
    X_train_teacher_prefix_one_hot, X_train_grade_category_one_hot,
    X_train_tfidf_w2v_essay_vectors, X_train_tfidf_w2v_title_vectors,
)).tocsr()
X_test_set4 = hstack((
    X_test_price, X_test_quantity, X_test_projects, X_test_categories_one_hot,
    X_test_subcategories_one_hot, X_test_states_one_hot,
    X_test_teacher_prefix_one_hot, X_test_grade_category_one_hot,
    X_test_tfidf_w2v_essay_vectors, X_test_tfidf_w2v_title_vectors,
)).tocsr()
X_cv_set4 = hstack((
    X_cv_price, X_cv_quantity, X_cv_projects, X_cv_categories_one_hot,
    X_cv_subcategories_one_hot, X_cv_states_one_hot,
    X_cv_teacher_prefix_one_hot, X_cv_grade_category_one_hot,
    X_cv_tfidf_w2v_essay_vectors, X_cv_tfidf_w2v_title_vectors,
)).tocsr()
Printing the shapes
print("Set-4 Data Matrix")
print(X_train_set4.shape, Y_train.shape)
print(X_test_set4.shape, Y_test.shape)
print(X_cv_set4.shape, Y_cv.shape)
Building the model
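from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier(class_weight='balanced')
parameters = {'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10], 'min_samples_split': [5, 10, 100, 500]}
# return_train_score expects a boolean, not the string 'True'
clf4 = GridSearchCV(dtc, parameters, cv=5, scoring='roc_auc', return_train_score=True)
clf4.fit(X_train_set4, Y_train)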
Error
I am getting a ValueError. I tried checking for NaNs but could not find any in X_train. There are no NaNs in Y_train either.
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
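For reference, here is a minimal sketch of how each feature block could be checked before stacking. The error message mentions float32, so finite values whose magnitude exceeds the float32 range may also trigger it, not only NaN or infinity. The helper name `report_non_finite` and the `demo_blocks` data are hypothetical, made up for illustration; in practice the dictionary would hold the real blocks such as `X_train_price` or `X_train_tfidf_w2v_essay_vectors`.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

def report_non_finite(blocks):
    """Count entries per block that are NaN, infinite, or too large for float32.

    `blocks` maps a label (hypothetical names here) to a dense array or a
    scipy sparse matrix.
    """
    limit = np.finfo(np.float32).max
    report = {}
    for name, block in blocks.items():
        if issparse(block):
            # Only explicitly stored values can be non-finite in a sparse matrix.
            values = block.tocsr().data
        else:
            values = np.asarray(block)
        values = values.astype(np.float64)
        # Flag NaN/inf, plus finite values that overflow when cast to float32.
        bad = ~np.isfinite(values) | (np.abs(values) > limit)
        report[name] = int(bad.sum())
    return report

# Toy stand-ins for the real feature blocks (assumed data, not from the post).
demo_blocks = {
    "price": np.array([[1.0], [np.nan], [3.0]]),
    "one_hot": csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])),
    "w2v": np.array([[0.1, np.inf], [0.2, 0.3], [0.4, 0.5]]),
}
report = report_non_finite(demo_blocks)
print(report)
```

Checking block by block rather than the stacked matrix narrows down which feature introduces the bad values. The same function can be applied to the stacked result by passing `{"set4": X_train_set4}`.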