I'm trying to calculate information values (IV) and I'm running into some problems.
```r
# Calculate Information Values
factor_vars = c(ITEM_CATEGORY_DESCR, ITEM_DESCR, PRODUCT_SUB_LINE_DESCR,
                MAJOR_CATEGORY_DESCR,
                CUSTOMER_NAME, CUST_BRANCH_DESCR, PROGRAM_LEVEL_DESCR, CUST_STATE_KEY,
                CUST_REGION_DESCR, CUST_CITY)  # get all categorical variables
all_iv = data.frame(VARS=factor_vars, IV=numeric(length(factor_vars)),
                    STRENGTH=character(length(factor_vars)), stringsAsFactors = FALSE)
# init output dataframe
for (factor_var in factor_vars){
  all_iv[all_iv$VARS == factor_var, "IV"] <-
    InformationValue::IV(X=new_df[, factor_var], Y=new_df$product_category)
  all_iv[all_iv$VARS == factor_var, "STRENGTH"] <-
    attr(InformationValue::IV(X=new_df[, factor_var],
                              Y=new_df$product_category), "howgood")
}
```
The error message is:
```
Error in `[.data.frame`(new_df, , factor_var) :
  undefined columns selected
In addition: Warning messages:
1: In `[<-.factor`(`*tmp*`, which(Y == valueOfGood), value = 1) :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, which(!(Y == "1")), value = 0) :
  invalid factor level, NA generated
```
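To narrow it down, here is a minimal sketch that reproduces both messages in isolation on a made-up toy data frame. The names `df`, `A`, `Y` and `"NOT_A_COLUMN"` are hypothetical and not from my data. The first message comes from selecting a data frame column by a name that does not exist. The warnings come from assigning a value that is not an existing level of a factor, which is my assumption about what happens to `product_category` inside the IV calculation when it is a non-binary factor:

```r
# Hypothetical toy data, for illustration only
df <- data.frame(A = factor(c("x", "y")), Y = factor(c("good", "bad")))

# Selecting a non-existent column name reproduces "undefined columns selected"
err <- tryCatch(df[, "NOT_A_COLUMN"], error = function(e) conditionMessage(e))
print(err)

# Assigning a value that is not a level of the factor reproduces the warning
y <- df$Y
warn <- tryCatch({ y[y == "good"] <- 1; NULL },
                 warning = function(w) conditionMessage(w))
print(warn)

# By contrast, selecting by a quoted (character) column name works,
# and a 0/1 numeric response avoids the factor-level problem
vars <- c("A")
col_ok <- df[, vars[1]]
y01 <- as.numeric(df$Y == "good")
print(y01)
```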
How can I fix this?