TensorFlow 2.0 Dataset: tensorflow.python.framework.errors_impl.InternalError: Unable to parse tensor proto
1 vote
/ October 24, 2019

I am trying to implement ESPCN (https://arxiv.org/abs/1609.05158) in TensorFlow 2.0, and I get the following error when I run the code in Google Colab with the hardware accelerator set to TPU:

2019-10-24 06:18:29.040953: E tensorflow/core/framework/dataset.cc:76] The Encode() method is not implemented for DatasetVariantWrapper objects.
Traceback (most recent call last):
  File "train.py", line 64, in <module>
    train_dataset = tpu_strategy.experimental_distribute_dataset(train_dataset)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 674, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/tpu_strategy.py", line 256, in _experimental_distribute_dataset
    split_batch_by=self._num_replicas_in_sync)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 81, in get_distributed_dataset
    input_context=input_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 558, in __init__
    input_context=input_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 520, in __init__
    cloned_dataset, len(input_workers.worker_devices), i)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_ops.py", line 49, in auto_shard_dataset
    return distribute._AutoShardDataset(dataset, num_shards, index)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 56, in __init__
    **self._flat_structure)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_experimental_dataset_ops.py", line 171, in auto_shard_dataset
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:worker/replica:0/task:0/device:CPU:0 in order to run AutoShardDataset: Unable to parse tensor proto
Additional GRPC error information:
{"created":"@1571891825.075392283","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Unable to parse tensor proto","grpc_status":3} [Op:AutoShardDataset]
2019-10-24 04:37:05.421592: E tensorflow/core/distributed_runtime/rpc/eager/grpc_eager_client.cc:72] Remote EagerContext with id 7626715715211053942 does not seem to exist.

The dataset is created with this function:

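from os.path import join

import tensorflow as tf

def get_training_set(upscale_factor):
    # NOTE: upscale_factor is unused; get_image_from_file downsamples by a fixed factor of 2.
    root_dir = download_bsd300()  # project helper, not shown here
    train_dir = join(root_dir, "train/*.jpg")
    names = tf.data.Dataset.list_files(train_dir)
    # Decode and crop each file into a (downsampled, original) pair.
    images = names.map(get_image_from_file)
    return images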

and the get_image_from_file function:

def get_image_from_file(filename, crop_size=256):
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image)
    image = tf.cast(image, tf.float32)
    image_height = tf.shape(image)[0]
    image_width = tf.shape(image)[1]
    offset_height = (image_height-crop_size) // 2
    offset_width = (image_width-crop_size) // 2
    original_image = tf.image.crop_to_bounding_box(image, offset_height, offset_width, crop_size, crop_size)
    downsampled_image = tf.image.resize(original_image, [crop_size // 2, crop_size // 2])
    # convert to 0~1 and change HWC to CHW 
    # (Because the network accepts single channel.
    # The network will reshape NCHW input to (NC)*H*W*1.)
    original_image = tf.transpose(original_image / 255.0, [2, 0, 1])
    downsampled_image = tf.transpose(downsampled_image / 255.0, [2, 0, 1])
    return downsampled_image, original_image
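
As a sanity check on what this function produces, here is a minimal pure-Python sketch of the same crop arithmetic. It is only an illustration: the helper name `crop_shapes` is hypothetical and not part of the project, and the 321x481 input size is an assumption based on the usual BSD300 image dimensions. It uses no TensorFlow, so it says nothing about the TPU error itself.

```python
def crop_shapes(image_height, image_width, channels=3, crop_size=256):
    """Mirror the offset and shape arithmetic of get_image_from_file.

    Hypothetical helper for illustration only; it does not touch TensorFlow.
    """
    # Center-crop offsets, using the same integer division as the original code.
    offset_height = (image_height - crop_size) // 2
    offset_width = (image_width - crop_size) // 2
    if offset_height < 0 or offset_width < 0:
        # The image is smaller than the requested crop window.
        raise ValueError("image is smaller than crop_size")
    # After the HWC -> CHW transpose, the channel axis comes first.
    original_shape = (channels, crop_size, crop_size)
    downsampled_shape = (channels, crop_size // 2, crop_size // 2)
    return (offset_height, offset_width), downsampled_shape, original_shape


# Assumed input: a 321x481 (height x width) RGB image.
offsets, low_res, high_res = crop_shapes(321, 481)
print(offsets, low_res, high_res)
```

Under these assumptions, each dataset element is a pair of float tensors of shape (3, 128, 128) and (3, 256, 256), so the shapes are static and consistent across images of either orientation.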

The dataset works fine without experimental_distribute_dataset, so I think something goes wrong when the dataset is converted for distribution. However, since I am new to TPUs, I find it hard to understand why. Can anyone help me?

I created a GitHub repository and cloned it into Colab to run it.

Thanks in advance, and sorry for my bad English.

...