ValueError: Duplicate feature column name found for columns
June 19, 2020

I'm following a YouTube tutorial on TensorFlow because I'm a complete noob. I'm trying to get the accuracy value, but I run into an error. For some reason it seems to create multiple columns for age and fare, but I can't figure out why. I'm using TensorFlow 2.2.0 and Python 3.7.7. If you need any other information, just ask, and thanks for taking the time to help me.

Error

Traceback (most recent call last):
  File "C:\Users\will\Documents\#tensorflow crash course.py", line 104, in <module>
    linear_est.train(train_input_fn)  # train
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 349, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1182, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1211, in _train_model_default
    self.config)
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1170, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow_estimator\python\estimator\canned\linear.py", line 943, in _model_fn
    sparse_combiner=sparse_combiner)
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow_estimator\python\estimator\canned\linear.py", line 667, in _linear_model_fn_v2
    features=features)
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow_estimator\python\estimator\canned\linear.py", line 599, in _linear_model_fn_builder_v2
    name='linear/linear_model')
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow\python\feature_column\feature_column_v2.py", line 712, in __init__
    **kwargs)
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow\python\feature_column\feature_column_v2.py", line 491, in __init__
    self._feature_columns = _normalize_feature_columns(feature_columns)
  File "C:\Users\will\miniconda3\lib\site-packages\tensorflow\python\feature_column\feature_column_v2.py", line 2819, in _normalize_feature_columns
    name_to_column[column.name]))
ValueError: Duplicate feature column name found for columns: NumericColumn(key='age', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None) and NumericColumn(key='age', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None). This usually means that these columns refer to same base feature. Either one must be discarded or a duplicated but renamed item must be inserted in features dict.
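The error is raised while the estimator normalizes its feature columns (`_normalize_feature_columns` in the traceback). Its effect can be approximated in plain Python: every feature column has a name (for a numeric column, its key), and a second column whose name was already seen is rejected. This is a minimal sketch of the behavior, not TensorFlow's actual source; `FakeNumericColumn` and `normalize_feature_columns` are hypothetical stand-ins.

```python
from typing import NamedTuple


class FakeNumericColumn(NamedTuple):
    """Hypothetical stand-in for tf.feature_column.numeric_column; only the key matters here."""
    key: str

    @property
    def name(self) -> str:
        # A numeric column is named after its key, so two columns built from 'age' collide.
        return self.key


def normalize_feature_columns(columns):
    """Reject columns whose names repeat, approximating the check in the traceback."""
    name_to_column = {}
    for column in columns:
        if column.name in name_to_column:
            raise ValueError(
                "Duplicate feature column name found for columns: "
                f"{column} and {name_to_column[column.name]}."
            )
        name_to_column[column.name] = column
    return list(columns)


if __name__ == "__main__":
    try:
        normalize_feature_columns([FakeNumericColumn("age"), FakeNumericColumn("age")])
    except ValueError as err:
        print(err)
```

Distinct names pass through unchanged; only a repeated name triggers the `ValueError`.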

Code

from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from IPython.display import clear_output
from six.moves import urllib

import tensorflow.compat.v2.feature_column as fc

import tensorflow as tf 

print(tf.__version__)  # check the version; this tutorial wants 2.0

dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv') #training data from tf website
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv') #evaluation data
#print(dftrain.head())
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')
# line because this video tutorial leaves out details

CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []#timestamp 1:33:00 a lot to unpack here
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique() #gets list of all unique values from given feature column
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

    for feature_name in NUMERIC_COLUMNS:
        feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

feature_columns
#print(dftrain["embark_town"].unique())

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):#1:40:00 something about turning data into an object
  def input_function():  # inner function, this will be returned
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
    if shuffle:
      ds = ds.shuffle(1000)  # randomize order of data
    ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of 32 and repeat process for number of epochs
    return ds  # return a batch of the dataset
  return input_function  # return a function object for use

train_input_fn = make_input_fn(dftrain, y_train)  # here we will call the input_function that was returned to us to get a dataset object we can feed to the model
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
# We create a linear estimator by passing the feature columns we created earlier

linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)  # get model metrics/stats by testing on testing data

clear_output()  # clears console output
print(result['accuracy'])  # the result variable is simply a dict of stats about our model
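The `input_function` above chains `ds.batch(batch_size).repeat(num_epochs)`: batching happens first, then the batched dataset is replayed once per epoch. A plain-Python sketch of that ordering, assuming no shuffling and the default of keeping a short final batch; `batch_then_repeat` is a hypothetical helper, not part of TensorFlow.

```python
def batch_then_repeat(elements, batch_size, num_epochs):
    """Yield batches in the order ds.batch(batch_size).repeat(num_epochs) would (no shuffle)."""
    for _ in range(num_epochs):
        # Batching happens inside each epoch, so a batch never mixes two epochs
        # and the last batch of an epoch can be shorter than batch_size.
        for start in range(0, len(elements), batch_size):
            yield elements[start:start + batch_size]


if __name__ == "__main__":
    for batch in batch_then_repeat(list(range(5)), batch_size=2, num_epochs=2):
        print(batch)
```

With five elements, a batch size of 2 and two epochs, this yields `[0, 1]`, `[2, 3]`, `[4]` twice in a row.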

1 Answer

June 19, 2020

I haven't tested it, but I think there's a bug in your for loops.

Are you sure you meant to use nested for loops?

CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []#timestamp 1:33:00 a lot to unpack here
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique() #gets list of all unique values from given feature column
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

    for feature_name in NUMERIC_COLUMNS:
        feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

In this code, you append every entry of NUMERIC_COLUMNS to feature_columns on each iteration of the outer loop, so 'age' and 'fare' are added seven times each (once per categorical column). That is probably what causes your problem. As I said, I haven't tested it, so try this:

CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

feature_columns = []#timestamp 1:33:00 a lot to unpack here
for feature_name in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature_name].unique() #gets list of all unique values from given feature column
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

This way, each numeric column is added only once.
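The effect of the indentation can be checked without TensorFlow by collecting plain column names instead of feature columns. This sketch reuses the two lists from the question; `build_names` is a hypothetical helper, and strings stand in for the real `tf.feature_column` objects.

```python
from collections import Counter

CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']


def build_names(nested):
    """Collect column names the way the question's loops collect feature columns."""
    names = []
    for feature_name in CATEGORICAL_COLUMNS:
        names.append(feature_name)
        if nested:
            # Original indentation: the numeric loop runs once per categorical column.
            for numeric_name in NUMERIC_COLUMNS:
                names.append(numeric_name)
    if not nested:
        # Fixed indentation: the numeric loop runs exactly once, after the categorical loop.
        for numeric_name in NUMERIC_COLUMNS:
            names.append(numeric_name)
    return names


if __name__ == "__main__":
    print(Counter(build_names(nested=True))['age'])   # 7
    print(Counter(build_names(nested=False))['age'])  # 1
```

With the nested layout the list has 21 entries and 'age' and 'fare' appear seven times each, which matches the duplicate-name error; the flat layout gives nine unique names.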

I hope this helps; otherwise, just leave me a comment.

...