ValueError: Items of feature_columns must be a _FeatureColumn. (TensorFlow 1.13)
17 May 2019

I am running into a ValueError when running Tensorflow-1.13 + Horovod-0.16 + Spark-0.24 + Petastorm-0.17. It is a simple implementation of a model_fn with a few indicator columns, but it throws an error similar to the one in "Items of feature_columns must be a _FeatureColumn. (Tensorflow 1.8)":


[1,1]<stderr>:    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
[1,1]<stderr>:  File "/usr/local/lib/python3.5/site-packages/tensorflow_estimator/python/estimator/", line 1112, in _call_model_fn
[1,1]<stderr>:    model_fn_results = self._model_fn(features=features, **kwargs)
[1,1]<stderr>:  File "/mnt/Optimitron/", line 616, in tf_model_fn
[1,1]<stderr>:  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/feature_column/", line 302, in input_layer
[1,1]<stderr>:    cols_to_output_tensors=cols_to_output_tensors)
[1,1]<stderr>:  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/feature_column/", line 181, in _internal_input_layer
[1,1]<stderr>:    feature_columns = _normalize_feature_columns(feature_columns)
[1,1]<stderr>:  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/feature_column/", line 2263, in _normalize_feature_columns
[1,1]<stderr>:    'Given (type {}): {}.'.format(type(column), column))
[1,1]<stderr>:ValueError: Items of feature_columns must be a _FeatureColumn. Given (type <class 'collections.IndicatorColumn'>): IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(...

The code works fine if I do not launch it through () and instead use a regular tf.Session() or an hvd.init() session. The feature_columns are generated as follows:

def get_tf_features(indexers, whitelist_features=None):
    """
    Args:
        indexers (dict):
        whitelist_features (list[str]): optional list of features we want to include
    """
    feature_columns = []
    if whitelist_features:
        selected_features = set(indexers.keys()).intersection(set(whitelist_features))
    else:
        selected_features = set(indexers.keys())

    for feature in selected_features:
        feature_type = FEATURE_INDEXERS.get(feature, OneHotIndexer).FEATURE_TYPE
        if feature_type in (FeatureType["ONE_HOT"], FeatureType["MULTI_HOT"]):
            col = tf.feature_column.indicator_column(
                tf.feature_column.categorical_column_with_vocabulary_list(
                    key=feature,  # assumed: the key argument is missing from the post
                    vocabulary_list=[k for k in indexers[feature]],
                )
            )
        elif feature_type == FeatureType["MULTI_HOT_SCORED"]:
            col = tf.feature_column.weighted_categorical_column(
                tf.feature_column.categorical_column_with_vocabulary_list(
                    key=feature + INDEX_SUFFIX,
                    vocabulary_list=[k for k in indexers[feature]],
                ),
                feature + SCORE_SUFFIX,
            )
        else:
            raise Exception("whoops")
        feature_columns.append(col)

    return feature_columns

feature_columns = get_tf_features(indexers)

and the model_fn is just a linear classifier:

def tf_model_fn(features, labels, mode, params):
    """
    Args:
        features: This is the first item returned from the input_fn passed to train, evaluate, and predict.
            This should be a single tf.Tensor or dict of same.
        labels: This is the second item returned from the input_fn passed to train, evaluate, and predict.
            This should be a single tf.Tensor or dict of same (for multi-head models).
            If mode is tf.estimator.ModeKeys.PREDICT, labels=None will be passed.
            If the model_fn's signature does not accept mode, the model_fn must still be able to handle labels=None.
        mode (tf.estimator.ModeKeys): Optional. Specifies if this is training, evaluation or prediction. See tf.estimator.ModeKeys.
        params (dict): optional dict of hyperparameters, received from Estimator instantiation
    """

    import horovod.tensorflow as hvd

    # Build the dense model
    net = tf.feature_column.input_layer(features, list(params['feature_columns']))
    for units in params['hidden_units']:
        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)
    logits = tf.layers.dense(net, 1, activation=None)

    # Generate predictions (for PREDICT and EVAL mode)
    probabilities = tf.nn.sigmoid(logits)
    predictions = {
        'probabilities': probabilities,
        'logits': logits,
    }

    # Compute log-loss (for sigmoid activation)
    # y_train.reshape((y_train.shape[0], 1))
    labels_reshaped = tf.reshape(labels, [-1, 1])
    loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels_reshaped, logits=logits)
    # loss = tf.losses.log_loss(labels, probabilities)

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = params.get('optimizer', hvd.DistributedOptimizer(tf.train.FtrlOptimizer(learning_rate=params["learning_rate"])))

        train_op = optimizer.minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
    elif mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss)
    elif mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
    else:
        raise Exception("incorrect mode!")

and the model is trained with , feature_columns, ...)

I know that all the columns are passed in correctly, but judging from the other related question, it seems that something goes wrong in how Spark serializes the columns for TensorFlow?
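
If the columns really are altered while being shipped to the workers, one thing that might be worth trying is to pass only plain Python data and build the columns inside the worker process. Below is a rough sketch under that assumption. make_column_specs and build_feature_columns are hypothetical helpers, not part of any library, and only the one-hot case is covered.

```python
import pickle


def make_column_specs(indexers):
    """Describe each column with plain Python data (hypothetical helper).

    Only dicts, lists and strings are used, so nothing here depends on how
    TensorFlow's column classes survive serialization.
    """
    return [
        {"key": feature, "vocabulary_list": list(indexers[feature])}
        for feature in indexers
    ]


def build_feature_columns(specs):
    """Create the real columns; meant to be called inside the worker."""
    import tensorflow as tf  # imported lazily, only where the columns are built

    return [
        tf.feature_column.indicator_column(
            tf.feature_column.categorical_column_with_vocabulary_list(
                key=spec["key"],
                vocabulary_list=spec["vocabulary_list"],
            )
        )
        for spec in specs
    ]


# Made-up indexer dict, used only to show the shape of the specs.
specs = make_column_specs({"color": {"red": 0, "blue": 1}})

# Plain data should come back unchanged from a serialization round trip.
round_tripped = pickle.loads(pickle.dumps(specs))
```

With this approach the training call would receive specs instead of feature_columns, and tf_model_fn would call build_feature_columns on them before tf.feature_column.input_layer. Whether this avoids the error depends on the serialization hypothesis being right, which the sketch does not verify.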
