I tried to prepare the input and output data for a feature selection problem, but ran into an issue with some columns whose text does not look like valid Unicode:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-89-78f2cf157d88> in <module>
1 # prepare input data
----> 2 X_train_enc, X_test_enc = prepare_inputs(X_train, X_test)
3 # prepare output
4 y_train_enc, y_test_enc = prepare_targets(y_train, y_test)
<ipython-input-86-e63e5d5fad63> in prepare_inputs(X_train, X_test)
3 oe.fit(X_train)
4 X_train_enc = oe.transform(X_train)
----> 5 X_test_enc = oe.transform(X_test)
6 return X_train_enc, X_test_enc
7
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in transform(self, X)
812
813 """
--> 814 X_int, _ = self._transform(X)
815 return X_int.astype(self.dtype, copy=False)
816
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in _transform(self, X, handle_unknown)
105 msg = ("Found unknown categories {0} in column {1}"
106 " during transform".format(diff, i))
--> 107 raise ValueError(msg)
108 else:
109 # Set the problematic rows to an acceptable value and
ValueError: Found unknown categories ['Fès-Meknès'] in column 4 during transform
Here is an excerpt of the columns:
   Do you agree  Gender    Age        City  Urban/Rural     Output
0           Yes  Female  25-34      Madrid        Urban   Will buy
1            No    Male  18-25  Fès-Meknès        Rural      Won't
2           ...     ...    ...         ...          ...  Undecided
....
The city name should read Fès-Meknès.
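When accented names show up as pairs of odd characters, one common cause is UTF-8 bytes being decoded as Latin-1 somewhere along the way. The sketch below assumes that is what happened here, which the post does not confirm; `fix_mojibake` is a hypothetical helper name used only for illustration.

```python
# Assumption: the text was stored as UTF-8 but decoded as Latin-1 somewhere.
correct = "Fès-Meknès"

# Simulate the damage: UTF-8 bytes misread as Latin-1.
garbled = correct.encode("utf-8").decode("latin-1")
print(garbled)  # FÃ¨s-MeknÃ¨s


def fix_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up; return text unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 or it was not
        # double-decoded in the first place, so leave it alone.
        return text


repaired = fix_mojibake(garbled)
print(repaired == correct)  # True
```

If this kind of repair is needed, applying it to the string columns before encoding (for example with `df[col].map(fix_mojibake)`) would make both spellings collapse into one category. It treats the symptom only; fixing the encoding at the source is preferable when possible.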
Here is the code I wrote to load the data:
import pandas as pd
import psycopg2

def load_dataset():
    connection = psycopg2.connect(user="user",
                                  password="passwd",
                                  host="host",
                                  port="5432",
                                  database="database")
    sql = "select * from capi limit 10;"
    # load the table
    df = pd.read_sql_query(sql, connection)
    # retrieve numpy array
    dataset = df.values
    # split into input (X) and output (y) variables
    cols = df.iloc[:, 5:].columns.array
    filtered_cols = ['TL_Segment']
    cols = [col for col in cols if col not in filtered_cols]
    X = df.loc[:, cols]  # independent columns
    X = X.astype(str)
    y = df['TL_Segment']  # target column
    return X.values, y.values
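One thing that may be worth checking is the client encoding the database driver uses when it decodes text. The sketch below is only a guess at where the garbling could come from: it relies on psycopg2's `connection.encoding` attribute and `connection.set_client_encoding()` method, while the helper name `force_utf8` is hypothetical.

```python
def force_utf8(connection):
    """Ask the PostgreSQL driver to decode text as UTF-8 and report the change.

    `connection` is expected to be a psycopg2 connection, such as the one
    created in load_dataset(). The helper name itself is hypothetical.
    """
    before = connection.encoding            # client encoding currently in use
    connection.set_client_encoding("UTF8")  # switch the session to UTF-8
    return before, connection.encoding
```

It would be called right after `psycopg2.connect(...)` and before `pd.read_sql_query(...)`. If the encoding was already UTF8, the damage more likely happened when the rows were written to the table, and the query side cannot undo it.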
Apart from using the correct encoding, I tried to ignore these rows with a try/except:
def prepare_inputs(X_train, X_test):
    oe = OrdinalEncoder()
    oe.fit(X_train)
    try:
        X_train_enc = oe.transform(X_train)
        try:  # nested so that neither of the two returned values is left empty
            X_test_enc = oe.transform(X_test)
        except ValueError as e:
            print(e)
    except ValueError as e:
        print(e)
    return X_train_enc, X_test_enc
But I still get the following:
Found unknown categories ['Fès-Meknès'] in column 4 during transform
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
<ipython-input-126-78f2cf157d88> in <module>
1 # prepare input data
----> 2 X_train_enc, X_test_enc = prepare_inputs(X_train, X_test)
3 # prepare output
4 y_train_enc, y_test_enc = prepare_targets(y_train, y_test)
<ipython-input-124-2376647ab46e> in prepare_inputs(X_train, X_test)
10 except ValueError as e:
11 print(e)
---> 12 return X_train_enc, X_test_enc
13
UnboundLocalError: local variable 'X_test_enc' referenced before assignment
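Both tracebacks appear to share one cause. The encoder only knows the categories present in `X_train`, so a value that occurs only in `X_test` raises `ValueError`. Catching that error then skips the assignment, so `X_test_enc` does not exist when the function returns. A minimal sketch of one way around this, assuming scikit-learn 0.24 or newer (where `OrdinalEncoder` accepts `handle_unknown="use_encoded_value"`) and using made-up toy data:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder


def prepare_inputs(X_train, X_test):
    # Assumption: scikit-learn >= 0.24. Categories seen only in the test
    # set are encoded as -1 instead of raising ValueError.
    oe = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    oe.fit(X_train)
    X_train_enc = oe.transform(X_train)
    X_test_enc = oe.transform(X_test)
    return X_train_enc, X_test_enc


# Toy data (made up for illustration): the city in the test row never
# appears in the training rows.
X_train = np.array([["Female", "Madrid"], ["Male", "Paris"]], dtype=object)
X_test = np.array([["Male", "Fès-Meknès"]], dtype=object)

X_train_enc, X_test_enc = prepare_inputs(X_train, X_test)
print(X_test_enc)  # the unseen city is encoded as -1
```

On older scikit-learn versions this option is not available. An alternative there is to pass the full list of categories per column through the `categories` parameter, or to fit the encoder on data that contains every value. Whether encoding unknown values as -1 is acceptable depends on the downstream feature selection step.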