У меня очень простая многоагентная среда, настроенная для использования с ray.rllib, и я пытаюсь запустить простой базовый тест сценария обучения PPO vs. Random Policy следующим образом:
register_env("my_env", lambda _: MyEnv(num_agents=2))
mock = MyEnv()
obs_space = mock.observation_space
act_space = mock.action_space
tune.run(
"PPO",
stop={"training_iteration": args.num_iters},
config={
"env": "my_env",
"num_gpus":1,
"multiagent": {
"policies": {
"ppo_policy": (None, obs_space, act_space, {}),
"random": (RandomPolicy, obs_space, act_space, {}),
},
"policy_mapping_fn": (
lambda agent_id: {1:"appo_policy", 2:"random"}[agent_id]),
},
},
)
При тестировании я получаю сообщение об ошибке:
Traceback (most recent call last):
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 381, in fetch_result
result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/worker.py", line 1513, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.__init__() (pid=18163, ip=192.168.1.25)
File "python/ray/_raylet.pyx", line 414, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 450, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 452, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 407, in ray._raylet.execute_task.function_executor
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 90, in __init__
Trainer.__init__(self, config, env, logger_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 455, in __init__
super().__init__(config, logger_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/tune/trainable.py", line 174, in __init__
self._setup(copy.deepcopy(self.config))
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 596, in _setup
self._init(self.config, self.env_creator)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 129, in _init
self.optimizer = make_policy_optimizer(self.workers, config)
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/agents/ppo/ppo.py", line 95, in choose_policy_optimizer
shuffle_sequences=config["shuffle_sequences"])
File "/home/me/anaconda3/envs/dorsa/lib/python3.7/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 99, in __init__
"Only TF graph policies are supported with multi-GPU. "
ValueError: Only TF graph policies are supported with multi-GPU. Try setting `simple_optimizer=True` instead.
Я попытался установить simple_optimizer:True
в конфигурации, но это дало мне NotImplementedError
в функции set_weights
в rllib класс политики ...
Я отключил "PPO"
в конфигурации для "PG"
, и все прошло нормально, поэтому вряд ли это связано с тем, как я определил свою среду. Есть идеи как это исправить?