Training a Transformer with Tensor2Tensor on my own data
0 votes
/ April 17, 2019

I'm trying to train a Transformer network using Tensor2Tensor. I'm adapting the Cloud Poetry example to fit my own task, kt_problem, in which I map floating-point sequences to floating-point sequences instead of sentences to sentences.

I've adapted the generate_data() and generate_samples() functions following the scattered documentation on using your own data with tensor2tensor (e.g. the data generation README, line 174 of the Problem class, etc.). They are as follows:

  def generate_samples(self, data_dir, tmp_dir, train):
    # both imports are needed here (pandas was missing in my first version)
    import numpy as np
    import pandas as pd
    features = pd.read_csv("data/kt/features.csv", dtype=np.float64)
    targets = pd.read_csv("data/kt/targets.csv", dtype=np.float64)
    for i in range(len(features)):  # range(len(features)-1) would skip the last row
        yield {
            "inputs": list(features.iloc[i]),
            "targets": list(targets.iloc[i])
        }
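To double-check what the generator actually yields, I reproduced its loop on synthetic frames (a minimal sketch with the CSV reads replaced by in-memory DataFrames); every yielded value is a NumPy float64, which subclasses Python float:

```python
import numpy as np
import pandas as pd

def generate_samples(features, targets):
    # same loop as the Problem method, minus the file I/O
    for i in range(len(features)):
        yield {"inputs": list(features.iloc[i]),
               "targets": list(targets.iloc[i])}

features = pd.DataFrame(np.zeros((3, 4), dtype=np.float64))
targets = pd.DataFrame(np.ones((3, 2), dtype=np.float64))

sample = next(generate_samples(features, targets))
# np.float64 is a subclass of Python float, so this prints True
print(all(isinstance(v, float) for v in sample["inputs"] + sample["targets"]))
```

So on the Python side, nothing is an int by the time the samples leave the generator.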




  def generate_data(self, data_dir, tmp_dir, task_id=-1):
    generator_utils.generate_dataset_and_shuffle(
        self.generate_samples(data_dir, tmp_dir, 1),
        self.training_filepaths(data_dir, 4, False),
        self.generate_samples(data_dir, tmp_dir, 0),
        self.dev_filepaths(data_dir, 3, False))

defined in my KTProblem class.
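My current (unverified) suspicion is that the default example_reading_spec of the Problem base class declares "inputs" and "targets" as int64 features, and the data reader then casts them down to int32 (the deprecated to_int32 warning from data_reader.py:37 in the log below seems to point that way). If that's right, I'd need an override along these lines; this is only a sketch based on the example_reading_spec signature in my tensor2tensor checkout, and I haven't confirmed the float feature types are actually supported end-to-end:

```python
  def example_reading_spec(self):
    # declare the serialized features as floats instead of the int64 default
    data_fields = {
        "inputs": tf.VarLenFeature(tf.float32),
        "targets": tf.VarLenFeature(tf.float32),
    }
    data_items_to_decoders = None  # use the default decoders
    return data_fields, data_items_to_decoders
```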

After making this change, I can successfully run

PROBLEM='kt_problem'    #my own problem, for which I've defined a class

%%bash
DATA_DIR=./t2t_data     
TMP_DIR=$DATA_DIR/tmp

t2t-datagen \
  --t2t_usr_dir=./kt/trainer \
  --problem=$PROBLEM \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR

and it generates a bunch of train and dev files. But when I try to train the transformer with this code,

%%bash
DATA_DIR=./t2t_data
OUTDIR=./trained_model

t2t-trainer \
  --data_dir=$DATA_DIR \
  --t2t_usr_dir=./kt/trainer \
  --problem=$PROBLEM \
  --model=transformer \
  --hparams_set=transformer_kt \
  --output_dir=$OUTDIR --job-dir=$OUTDIR --train_steps=10

it raises the following error:

ValueError: x has to be a floating point tensor since it's going to be scaled. Got a <dtype: 'int32'> tensor instead.

As you can see in generate_samples(), the generated data is np.float64, so I'm confident my inputs should not be int32. The stack trace (posted below) is very long; I've gone through each line it lists and checked the dtype of the inputs to see where this int32 input entered the picture, but I can't find it. I want to know (1) why/how/where my inputs become int32 if they start out as floats, but mainly (2) in general, how does one debug code like this? So far my approach has been to put print statements right before each line in the stack trace, but that seems like such a naive way to debug. Would it be better to use VSCode? Or what lesson do I need to learn here for cases where a library (tensor2tensor, in this case) doesn't behave the way I expect, short of learning in detail what every function in the stack trace does?
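For context, the least naive alternative to print statements that I've found so far is to catch the exception and inspect the traceback itself (or run pdb.post_mortem() in an interactive session, which opens a prompt inside the frame that raised). A stdlib-only sketch with stand-in names, not the real tf code:

```python
import sys
import traceback

def dropout_v2(x_dtype):
    # stand-in for the tf op at the bottom of the real trace
    if not x_dtype.startswith("float"):
        raise ValueError("x has to be a floating point tensor since it's "
                         "going to be scaled. Got a <dtype: '%s'> tensor "
                         "instead." % x_dtype)

try:
    dropout_v2("int32")
except ValueError as e:
    # walk the traceback instead of editing prints into every file;
    # interactively, pdb.post_mortem() would drop into the raising frame
    frames = [f.name for f in traceback.extract_tb(sys.exc_info()[2])]
    msg = str(e)
    print(frames)  # ['<module>', 'dropout_v2']
    print(msg)
```

That at least tells me which frame to inspect, but it still doesn't tell me where upstream the dtype went wrong.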

The stack trace:

INFO:tensorflow:Importing user module trainer from path /home/crytting/kt/kt
WARNING:tensorflow:From /home/crytting/kt/tensor2tensor/tensor2tensor/utils/trainer_lib.py:240: RunConfig.__init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
INFO:tensorflow:Configuring DataParallelism to replicate the model.
INFO:tensorflow:schedule=continuous_train_and_eval
INFO:tensorflow:worker_gpu=1
INFO:tensorflow:sync=False
WARNING:tensorflow:Schedule=continuous_train_and_eval. Assuming that training is running on a single machine.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:ps_devices: ['gpu:0']
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f04151caba8>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_train_distribute': None, '_eval_distribute': None, '_device_fn': None, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_protocol': None, '_session_config': gpu_options {
  per_process_gpu_memory_fraction: 0.95
}
allow_soft_placement: true
graph_options {
  optimizer_options {
    global_jit_level: OFF
  }
}
isolate_session_state: true
, '_save_checkpoints_steps': 1000, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': './trained_model', 'use_tpu': False, 't2t_device_info': {'num_async_replicas': 1}, 'data_parallelism': <tensor2tensor.utils.expert_utils.Parallelism object at 0x7f0464512dd8>}
WARNING:tensorflow:Estimator's model_fn (<function T2TModel.make_estimator_model_fn.<locals>.wrapping_model_fn at 0x7f0414891e18>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:ValidationMonitor only works with --schedule=train_and_evaluate
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 1000 or save_checkpoints_secs None.
WARNING:tensorflow:From /home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
INFO:tensorflow:Reading data files from ./t2t_data/kt_problem-train*
INFO:tensorflow:partition: 0 num_data_files: 4
WARNING:tensorflow:From /home/crytting/kt/tensor2tensor/tensor2tensor/utils/data_reader.py:275: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From /home/crytting/kt/tensor2tensor/tensor2tensor/utils/data_reader.py:37: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:Shapes are not fully defined. Assuming batch_size means tokens.
WARNING:tensorflow:From /home/crytting/kt/tensor2tensor/tensor2tensor/utils/data_reader.py:233: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Setting T2TModel mode to 'train'
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Building model body
WARNING:tensorflow:From /home/crytting/kt/tensor2tensor/tensor2tensor/models/transformer.py:156: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
Traceback (most recent call last):
  File "/home/crytting/anaconda3/envs/kt/bin/t2t-trainer", line 33, in <module>
    tf.app.run()
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/crytting/anaconda3/envs/kt/bin/t2t-trainer", line 28, in main
    t2t_trainer.main(argv)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/bin/t2t_trainer.py", line 400, in main
    execute_schedule(exp)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/bin/t2t_trainer.py", line 356, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/utils/trainer_lib.py", line 400, in continuous_train_and_eval
    self._eval_spec)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 611, in run
    return self.run_local()
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 712, in run_local
    saving_listeners=saving_listeners)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1155, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/utils/t2t_model.py", line 1414, in wrapping_model_fn
    use_tpu=use_tpu)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/utils/t2t_model.py", line 1477, in estimator_model_fn
    logits, losses_dict = model(features)  # pylint: disable=not-callable
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow/python/layers/base.py", line 530, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/utils/t2t_model.py", line 323, in call
    sharded_logits, losses = self.model_fn_sharded(sharded_features)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/utils/t2t_model.py", line 400, in model_fn_sharded
    sharded_logits, sharded_losses = dp(self.model_fn, datashard_to_features)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/utils/expert_utils.py", line 231, in __call__
    outputs.append(fns[i](*my_args[i], **my_kwargs[i]))
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/utils/t2t_model.py", line 428, in model_fn
    body_out = self.body(transformed_features)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/models/transformer.py", line 280, in body
    **decode_kwargs
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/models/transformer.py", line 217, in decode
    **kwargs)
  File "/home/crytting/kt/tensor2tensor/tensor2tensor/models/transformer.py", line 156, in transformer_decode
    1.0 - hparams.layer_prepostprocess_dropout)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 2979, in dropout
    return dropout_v2(x, rate, noise_shape=noise_shape, seed=seed, name=name)
  File "/home/crytting/anaconda3/envs/kt/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 3021, in dropout_v2
    " be scaled. Got a %s tensor instead." % x.dtype)
ValueError: x has to be a floating point tensor since it's going to be scaled. Got a <dtype: 'int32'> tensor instead.
...