Сельдерей + Django на Elastic Beanstalk, вызывающий ошибку: <class 'xmlrpclib.Fault'>, <Ошибка 6: 'SHUTDOWN_STATE'> - PullRequest
0 голосов
/ 02 мая 2018

У меня установлено приложение Django 2 на AWS Elastic Beanstalk. Я настроил Celery для выполнения асинхронных задач на той же машине. С тех пор как я добавил Celery, каждый раз, когда я заново развертываю свое приложение eb deploy myapp-env, я получаю следующую ошибку:

ERROR: [Instance: i-0bfa590abfb9c4878] Command failed on instance. Return code: 2 Output: (TRUNCATED)...
ERROR: already shutting down
error: <class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib64/python2.7/xmlrpclib.py line: 800
error: <class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib64/python2.7/xmlrpclib.py line: 800. 
Hook /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
INFO: Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
ERROR: Unsuccessful command execution on instance id(s) 'i-0bfa590abfb9c4878'. Aborting the operation.
ERROR: Failed to deploy application.

Чтобы иметь возможность обновить код моего приложения, мне нужно заново создать новую среду, выполнив следующие действия:

$ eb terminate myapp-env # terminate the environment
$ eb create myapp-env # re-create it

Конфигурационный файл Celery, который я использую, следующий:

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      # Create required directories
      sudo mkdir -p /var/log/celery/
      sudo mkdir -p /var/run/celery/

      # Create group called 'celery'
      sudo groupadd -f celery
      # add the user 'celery' if it doesn't exist and add it to the group with same name
      id -u celery &>/dev/null || sudo useradd -g celery celery
      # add permissions to the celery user for r+w to the folders just created
      sudo chown -R celery:celery /var/log/celery/
      sudo chown -R celery:celery /var/run/celery/

      # Get django environment variables
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
      celeryenv=${celeryenv%?}

      # Create CELERY configuration script
      celeryconf="[program:celeryd]
      directory=/opt/python/current/app
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A celery_conf.celery_app:app --loglevel=INFO --logfile=\"/var/log/celery/%%n%%I.log\" --pidfile=\"/var/run/celery/%%n.pid\"

      user=celery
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 60

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998

      environment=$celeryenv"


      # Create CELERY BEAT configuraiton script
      celerybeatconf="[program:celerybeat]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery beat -A celery_conf.celery_app:app --loglevel=INFO --logfile=\"/var/log/celery/celery-beat.log\" --pidfile=\"/var/run/celery/celery-beat.pid\"

      directory=/opt/python/current/app
      user=celery
      numprocs=1
      stdout_logfile=/var/log/celerybeat.log
      stderr_logfile=/var/log/celerybeat.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 60

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=999

      environment=$celeryenv"

      # Create the celery supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf
      echo "$celerybeatconf" | tee /opt/python/etc/celerybeat.conf

      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "celery.conf" /opt/python/etc/supervisord.conf
        then
          echo "[include]" | tee -a /opt/python/etc/supervisord.conf
          echo "files: uwsgi.conf celery.conf celerybeat.conf" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Enable supervisor to listen for HTTP/XML-RPC requests.
      # supervisorctl will use XML-RPC to communicate with supervisord over port 9001.
      # Source: https://askubuntu.com/questions/911994/supervisorctl-3-3-1-http-localhost9001-refused-connection
      if ! grep -Fxq "[inet_http_server]" /opt/python/etc/supervisord.conf
        then
          echo "[inet_http_server]" | tee -a /opt/python/etc/supervisord.conf
          echo "port = 127.0.0.1:9001" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread

      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update

      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
      supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat


commands:
  01_killotherbeats:
    command: "ps auxww | grep 'celery beat' | awk '{print $2}' | sudo xargs kill -9 || true"
    ignoreErrors: true
  02_restartbeat:
    command: "supervisorctl -c /opt/python/etc/supervisord.conf restart celerybeat"
    leader_only: true

/ вар / Журнал / ЕВ-activity.log

[2018-05-02T13:30:06.402Z] INFO  [5456]  - [Application update app-v2_0-226-g83fa-180502_152745@2/AppDeployStage1/AppDeployEnactHook] : Completed activity. Result:
  Successfully execute hooks in directory /opt/elasticbeanstalk/hooks/appdeploy/enact.
[2018-05-02T13:30:06.402Z] INFO  [5456]  - [Application update app-v2_0-226-g83fa-180502_152745@2/AppDeployStage1/AppDeployPostHook] : Starting activity...
[2018-05-02T13:30:06.402Z] INFO  [5456]  - [Application update app-v2_0-226-g83fa-180502_152745@2/AppDeployStage1/AppDeployPostHook/run_supervised_celeryd.sh] : Starting activity...
[2018-05-02T13:30:07.544Z] INFO  [5456]  - [Application update app-v2_0-226-g83fa-180502_152745@2/AppDeployStage1/AppDeployPostHook/run_supervised_celeryd.sh] : Activity execution failed, because: [program:celeryd]
  directory=/opt/python/current/app
  ; Set full path to celery program if using virtualenv
  command=/opt/python/run/venv/bin/celery worker -A celery_conf.celery_app:app --loglevel=INFO --logfile="/var/log/celery/%%n%%I.log" --pidfile="/var/run/celery/%%n.pid"

  user=celery
  numprocs=1
  stdout_logfile=/var/log/celery-worker.log
  stderr_logfile=/var/log/celery-worker.log
  autostart=true
  autorestart=true
  startsecs=10

  ; Need to wait for currently executing tasks to finish at shutdown.
  ; Increase this if you have very long running tasks.
  stopwaitsecs = 60

  ; When resorting to send SIGKILL to the program to terminate it
  ; send SIGKILL to its whole process group instead,
  ; taking care of its children as well.
  killasgroup=true

  ; if rabbitmq is supervised, set its priority higher
  ; so it starts first
  priority=998

  environment=PYTHONPATH="/opt/python/current/app/:",PATH="/opt/python/run/venv/bin/:%(ENV_PATH)s",RDS_PORT="5432",APP_ENV="test",RDS_PASSWORD="XXXXXXXXXX",DJANGO_SETTINGS_MODULE="olem.settings_prod",PYCURL_SSL_LIBRARY="nss",RDS_USERNAME="testuser",RDS_DB_NAME="olemapptestingDB",RDS_HOSTNAME="olemapptesting.cw2kqid7nxi9.us-east-1.rds.amazonaws.com"
  [program:celerybeat]
  ; Set full path to celery program if using virtualenv
  command=/opt/python/run/venv/bin/celery beat -A celery_conf.celery_app:app --loglevel=INFO --logfile="/var/log/celery/celery-beat.log" --pidfile="/var/run/celery/celery-beat.pid"

  directory=/opt/python/current/app
  user=celery
  numprocs=1
  stdout_logfile=/var/log/celerybeat.log
  stderr_logfile=/var/log/celerybeat.log
  autostart=true
  autorestart=true
  startsecs=10

  ; Need to wait for currently executing tasks to finish at shutdown.
  ; Increase this if you have very long running tasks.
  stopwaitsecs = 60

  ; When resorting to send SIGKILL to the program to terminate it
  ; send SIGKILL to its whole process group instead,
  ; taking care of its children as well.
  killasgroup=true

  ; if rabbitmq is supervised, set its priority higher
  ; so it starts first
  priority=999

  environment=PYTHONPATH="/opt/python/current/app/:",PATH="/opt/python/run/venv/bin/:%(ENV_PATH)s",RDS_PORT="5432",APP_ENV="test",RDS_PASSWORD="XXXXXXXXXX",DJANGO_SETTINGS_MODULE="olem.settings_prod",PYCURL_SSL_LIBRARY="nss",RDS_USERNAME="testuser",RDS_DB_NAME="olemapptestingDB",RDS_HOSTNAME="olemapptesting.cw2kqid7nxi9.us-east-1.rds.amazonaws.com"
  [include]
  files: uwsgi.conf celery.conf celerybeat.conf
  ERROR: supervisor shutting down
  ERROR: already shutting down
  error: <class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib64/python2.7/xmlrpclib.py line: 800
  error: <class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib64/python2.7/xmlrpclib.py line: 800 (ElasticBeanstalk::ExternalInvocationError)
caused by: [program:celeryd]
  directory=/opt/python/current/app
  ; Set full path to celery program if using virtualenv
  command=/opt/python/run/venv/bin/celery worker -A celery_conf.celery_app:app --loglevel=INFO --logfile="/var/log/celery/%%n%%I.log" --pidfile="/var/run/celery/%%n.pid"

  user=celery
  numprocs=1
  stdout_logfile=/var/log/celery-worker.log
  stderr_logfile=/var/log/celery-worker.log
  autostart=true
  autorestart=true
  startsecs=10

  ; Need to wait for currently executing tasks to finish at shutdown.
  ; Increase this if you have very long running tasks.
  stopwaitsecs = 60

  ; When resorting to send SIGKILL to the program to terminate it
  ; send SIGKILL to its whole process group instead,
  ; taking care of its children as well.
  killasgroup=true

  ; if rabbitmq is supervised, set its priority higher
  ; so it starts first
  priority=998

  environment=PYTHONPATH="/opt/python/current/app/:",PATH="/opt/python/run/venv/bin/:%(ENV_PATH)s",RDS_PORT="5432",APP_ENV="test",RDS_PASSWORD="XXXXXXXXXX",DJANGO_SETTINGS_MODULE="olem.settings_prod",PYCURL_SSL_LIBRARY="nss",RDS_USERNAME="testuser",RDS_DB_NAME="olemapptestingDB",RDS_HOSTNAME="olemapptesting.cw2kqid7nxi9.us-east-1.rds.amazonaws.com"
  [program:celerybeat]
  ; Set full path to celery program if using virtualenv
  command=/opt/python/run/venv/bin/celery beat -A celery_conf.celery_app:app --loglevel=INFO --logfile="/var/log/celery/celery-beat.log" --pidfile="/var/run/celery/celery-beat.pid"

  directory=/opt/python/current/app
  user=celery
  numprocs=1
  stdout_logfile=/var/log/celerybeat.log
  stderr_logfile=/var/log/celerybeat.log
  autostart=true
  autorestart=true
  startsecs=10

  ; Need to wait for currently executing tasks to finish at shutdown.
  ; Increase this if you have very long running tasks.
  stopwaitsecs = 60

  ; When resorting to send SIGKILL to the program to terminate it
  ; send SIGKILL to its whole process group instead,
  ; taking care of its children as well.
  killasgroup=true

  ; if rabbitmq is supervised, set its priority higher
  ; so it starts first
  priority=999

  environment=PYTHONPATH="/opt/python/current/app/:",PATH="/opt/python/run/venv/bin/:%(ENV_PATH)s",RDS_PORT="5432",APP_ENV="test",RDS_PASSWORD="XXXXXXXXXX",DJANGO_SETTINGS_MODULE="olem.settings_prod",PYCURL_SSL_LIBRARY="nss",RDS_USERNAME="testuser",RDS_DB_NAME="olemapptestingDB",RDS_HOSTNAME="olemapptesting.cw2kqid7nxi9.us-east-1.rds.amazonaws.com"
  [include]
  files: uwsgi.conf celery.conf celerybeat.conf
  ERROR: supervisor shutting down
  ERROR: already shutting down
  error: <class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib64/python2.7/xmlrpclib.py line: 800
  error: <class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'>: file: /usr/lib64/python2.7/xmlrpclib.py line: 800 (Executor::NonZeroExitStatus)


[2018-05-02T13:30:07.544Z] INFO  [5456]  - [Application update app-v2_0-226-g83fa-180502_152745@2/AppDeployStage1/AppDeployPostHook/run_supervised_celeryd.sh] : Activity failed.
[2018-05-02T13:30:07.545Z] INFO  [5456]  - [Application update app-v2_0-226-g83fa-180502_152745@2/AppDeployStage1/AppDeployPostHook] : Activity failed.
[2018-05-02T13:30:07.545Z] INFO  [5456]  - [Application update app-v2_0-226-g83fa-180502_152745@2/AppDeployStage1] : Activity failed.
[2018-05-02T13:30:07.545Z] INFO  [5456]  - [Application update app-v2_0-226-g83fa-180502_152745@2] : Completed activity. Result:
  Application update - Command CMD-AppDeploy failed

Журналы супервизора (/opt/python/log/supervisord.log)

2018-05-02 16:13:07,913 CRIT Supervisor running as root (no user in config file)
2018-05-02 16:13:07,928 INFO RPC interface 'supervisor' initialized
2018-05-02 16:13:07,928 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-05-02 16:13:07,929 INFO supervisord started with pid 2991
2018-05-02 16:13:08,931 INFO spawned: 'httpd' with pid 3076
2018-05-02 16:13:10,017 INFO success: httpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-02 16:15:03,412 INFO stopped: httpd (exit status 0)
2018-05-02 16:15:04,418 INFO spawned: 'httpd' with pid 4568
2018-05-02 16:15:05,554 INFO success: httpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-02 16:15:16,700 INFO waiting for httpd to die
2018-05-02 16:15:17,481 INFO stopped: httpd (exit status 0)
2018-05-02 16:15:17,485 CRIT Supervisor running as root (no user in config file)
2018-05-02 16:15:17,485 WARN Included extra file "/opt/python/etc/uwsgi.conf" during parsing
2018-05-02 16:15:17,485 INFO RPC interface 'supervisor' initialized
2018-05-02 16:15:17,485 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-05-02 16:15:17,485 INFO supervisord started with pid 2991
2018-05-02 16:15:17,873 INFO spawned: 'httpd' with pid 4826
2018-05-02 16:15:17,875 INFO spawned: 'uwsgi' with pid 4827
2018-05-02 16:15:18,509 INFO spawned: 'celeryd' with pid 4953
2018-05-02 16:15:18,511 INFO spawned: 'celerybeat' with pid 4954
2018-05-02 16:15:19,481 INFO success: httpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-02 16:15:19,481 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-02 16:15:19,496 INFO stopped: celeryd (terminated by SIGTERM)
2018-05-02 16:15:19,835 INFO spawned: 'celeryd' with pid 4962
2018-05-02 16:15:28,859 INFO success: celerybeat entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2018-05-02 16:15:29,860 INFO success: celeryd entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2018-05-02 16:15:30,307 INFO stopped: celerybeat (exit status 0)
2018-05-02 16:15:31,312 INFO spawned: 'celerybeat' with pid 4975
2018-05-02 16:15:41,326 INFO success: celerybeat entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)

#################### NEW LOGS AFTER eb deploy ####################

2018-05-02 16:20:51,734 INFO exited: celerybeat (terminated by SIGKILL; not expected)
2018-05-02 16:20:51,958 INFO spawned: 'celerybeat' with pid 5275
2018-05-02 16:21:52,037 WARN killing 'celerybeat' (5275) with SIGKILL
2018-05-02 16:21:52,040 INFO stopped: celerybeat (terminated by SIGKILL)
2018-05-02 16:21:53,046 INFO spawned: 'celerybeat' with pid 5301
2018-05-02 16:22:03,060 INFO success: celerybeat entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2018-05-02 16:22:20,654 INFO stopped: httpd (exit status 0)
2018-05-02 16:22:21,659 INFO spawned: 'httpd' with pid 5687
2018-05-02 16:22:22,716 INFO success: httpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-02 16:22:29,436 INFO waiting for httpd, celerybeat, uwsgi, celeryd to die
2018-05-02 16:22:33,272 INFO waiting for httpd, celerybeat, uwsgi, celeryd to die
2018-05-02 16:22:36,276 INFO waiting for httpd, celerybeat, uwsgi, celeryd to die
2018-05-02 16:22:39,279 INFO waiting for httpd, celerybeat, uwsgi, celeryd to die
2018-05-02 16:22:40,281 WARN killing 'uwsgi' (4827) with SIGKILL
2018-05-02 16:22:40,292 INFO stopped: uwsgi (terminated by SIGKILL)
2018-05-02 16:22:40,477 INFO stopped: celerybeat (exit status 0)
2018-05-02 16:22:40,554 INFO stopped: httpd (exit status 0)
2018-05-02 16:22:42,177 INFO stopped: celeryd (exit status 0)
2018-05-02 16:22:42,184 CRIT Supervisor running as root (no user in config file)
2018-05-02 16:22:42,184 WARN Included extra file "/opt/python/etc/uwsgi.conf" during parsing
2018-05-02 16:22:42,184 WARN Included extra file "/opt/python/etc/celery.conf" during parsing
2018-05-02 16:22:42,184 WARN Included extra file "/opt/python/etc/celerybeat.conf" during parsing
2018-05-02 16:22:42,185 INFO RPC interface 'supervisor' initialized
2018-05-02 16:22:42,185 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2018-05-02 16:22:42,185 INFO RPC interface 'supervisor' initialized
2018-05-02 16:22:42,185 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-05-02 16:22:42,185 INFO supervisord started with pid 2991
2018-05-02 16:22:43,187 INFO spawned: 'celeryd' with pid 5924
2018-05-02 16:22:43,189 INFO spawned: 'celerybeat' with pid 5925
2018-05-02 16:22:43,190 INFO spawned: 'uwsgi' with pid 5926
2018-05-02 16:22:43,191 INFO spawned: 'httpd' with pid 5927
2018-05-02 16:22:44,326 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-02 16:22:44,326 INFO success: httpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-05-02 16:22:53,364 INFO success: celeryd entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2018-05-02 16:22:53,365 INFO success: celerybeat entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)

Как мне это решить?

ОБНОВЛЕНИЕ 1

Глядя на файл журнала супервизора, я думаю, что error: <class 'xmlrpclib.Fault'>, <Fault 6: 'SHUTDOWN_STATE'> происходит из-за сигнала SIGKILL, где он не ожидается. См. Строку 2018-05-02 16:20:51,734 INFO exited: celerybeat (terminated by SIGKILL; not expected)

ОБНОВЛЕНИЕ 2

Файл конфигурации супервизора (/opt/python/etc/supervisord.conf)

[unix_http_server]
file=/opt/python/run/supervisor.sock   ; (the path to the socket file)
;chmod=0700                 ; socket file mode (default 0700)
;chown=nobody:nogroup       ; socket file uid:gid owner

[supervisord]
logfile=/opt/python/log/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=10MB        ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10           ; (num of main logfile rotation backups;default 10)
loglevel=info                ; (log level;default info; others: debug,warn,trace)
pidfile=/opt/python/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
minfds=1024                  ; (min. avail startup file descriptors;default 1024)
minprocs=200                 ; (min. avail process descriptors;default 200)
directory=/opt/python/current/app    ; (default is not to cd during start)
;nocleanup=true              ; (don't clean up tempfiles at start;default false)

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///opt/python/run/supervisor.sock

[program:httpd]
command=/opt/python/bin/httpdlaunch
numprocs=1
directory=/opt/python/current/app
autostart=true
autorestart=unexpected
startsecs=1                   ; number of secs prog must stay running (def. 1)
startretries=3                ; max # of serial start failures (default 3)
exitcodes=0,2                 ; 'expected' exit codes for process (default 0,2)
killasgroup=false             ; SIGKILL the UNIX process group (def false)
redirect_stderr=false
[include]
files = uwsgi.conf
[include]
files: uwsgi.conf celery.conf celerybeat.conf
[inet_http_server]
port = 127.0.0.1:9001
[include]
files: uwsgi.conf celery.conf celerybeat.conf

1 Ответ

0 голосов
/ 03 мая 2018

Так что есть два типа исправлений для этого

  • Просто добавьте sleep 15 секунд, чтобы дать достаточно времени для завершения работы предыдущего экземпляра супервизора
  • Создайте скрипт pre hooks, который проверит, запущен ли супервизор, и если это так, то просто дождется его смерти. Так что-то вроде /opt/elasticbeanstalk/hooks/appdeploy/pre/wait_for_supervised_to_die.sh
Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...