Problem deploying a model with Azure Machine Learning service
0 votes
/ January 31, 2020

I built a classifier model using Azure Machine Learning service. After successfully registering the model, I set up the environment for a container instance by providing a scoring file, an environment file, and a configuration file. Unfortunately, when I deploy my solution, it fails with an error. Here are my deployment service logs for more detail:

Service logs

2020-02-07T06:21:10,612616835+00:00 - rsyslog/run 
2020-02-07T06:21:10,616528746+00:00 - iot-server/run 
2020-02-07T06:21:10,617958751+00:00 - gunicorn/run 
2020-02-07T06:21:10,627065178+00:00 - nginx/run 
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2020-02-07T06:21:11,108893523+00:00 - iot-server/finish 1 0
2020-02-07T06:21:11,116794547+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 19.9.0
Listening at: http://127.0.0.1:31311 (12)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 45
Initializing logger
Starting up app insights client
Starting up request id generator
Starting up app insight hooks
Invoking user's init function
2020-02-07 06:21:15,494 | azureml.core.run | DEBUG | Could not load run context RunEnvironmentException:
    Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run.
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run."
    }
}, switching offline: False
2020-02-07 06:21:15,495 | azureml.core.run | DEBUG | Could not load the run context and allow_offline set to False
2020-02-07 06:21:15,495 | azureml.core.model | DEBUG | Checking root for demo_Model.pkl because candidate dir azureml-models had 1 nodes: azureml-models/demomodel/8/demo_Model.pkl
User's init function failed
Encountered Exception Traceback (most recent call last):
  File "/var/azureml-server/aml_blueprint.py", line 163, in register
    main.init()
  File "/var/azureml-app/main.py", line 88, in init
    driver_module.init()
  File "score.py", line 13, in init
    model_path = Model.get_model_path('demo_Model.pkl')
  File "/opt/miniconda/lib/python3.6/site-packages/azureml/core/model.py", line 697, in get_model_path
    return Model._get_model_path_local(model_name, version)
  File "/opt/miniconda/lib/python3.6/site-packages/azureml/core/model.py", line 718, in _get_model_path_local
    return Model._get_model_path_local_from_root(model_name)
  File "/opt/miniconda/lib/python3.6/site-packages/azureml/core/model.py", line 761, in _get_model_path_local_from_root
    "set logging level to DEBUG.".format(candidate_model_path))
azureml.exceptions._azureml_exception.ModelNotFoundException: ModelNotFoundException:
    Message: Model not found in cache or in root at ./demo_Model.pkl. For more info,set logging level to DEBUG.
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "Model not found in cache or in root at ./demo_Model.pkl. For more info,set logging level to DEBUG."
    }
}

/opt/miniconda/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
Worker exiting (pid: 45)
Shutting down: Master
Reason: Worker failed to boot.
2020-02-07T06:21:15,663509630+00:00 - gunicorn/finish 3 0
2020-02-07T06:21:15,664398433+00:00 - Exit code 3 is not normal. Killing image.

Error

Running......

TimedOut

ERROR - Service deployment polling reached non-successful terminal state, current service state: Unhealthy
More information can be found using '.get_logs()'
Error:
{
  "code": "DeploymentTimedOut",
  "statusCode": 504,
  "message": "The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. From SDK you can run print(service.state) to know the current state of the webservice."
}

ERROR - Service deployment polling reached non-successful terminal state, current service state: Unhealthy
More information can be found using '.get_logs()'
Error:
{
  "code": "DeploymentTimedOut",
  "statusCode": 504,
  "message": "The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. From SDK you can run print(service.state) to know the current state of the webservice."
}

---------------------------------------------------------------------------
WebserviceException                       Traceback (most recent call last)
~/anaconda3_501/lib/python3.6/site-packages/azureml/core/webservice/webservice.py in wait_for_deployment(self, show_output)
    530                                           'Error:\n'
--> 531                                           '{}'.format(self.state, logs_response, error_response), logger=module_logger)
    532             print('{} service creation operation finished, operation "{}"'.format(self._webservice_type,

WebserviceException: WebserviceException:
    Message: Service deployment polling reached non-successful terminal state, current service state: Unhealthy
More information can be found using '.get_logs()'
Error:
{
  "code": "DeploymentTimedOut",
  "statusCode": 504,
  "message": "The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. From SDK you can run print(service.state) to know the current state of the webservice."
}
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nMore information can be found using '.get_logs()'\nError:\n{\n  \"code\": \"DeploymentTimedOut\",\n  \"statusCode\": 504,\n  \"message\": \"The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. From SDK you can run print(service.state) to know the current state of the webservice.\"\n}"
    }
}

During handling of the above exception, another exception occurred:

WebserviceException                       Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3_501/lib/python3.6/site-packages/azureml/core/webservice/webservice.py in wait_for_deployment(self, show_output)
    538                                           'Current state is {}'.format(self.state), logger=module_logger)
    539             else:
--> 540                 raise WebserviceException(e.message, logger=module_logger)
    541 
    542     def _wait_for_operation_to_complete(self, show_output):

WebserviceException: WebserviceException:
    Message: Service deployment polling reached non-successful terminal state, current service state: Unhealthy
More information can be found using '.get_logs()'
Error:
{
  "code": "DeploymentTimedOut",
  "statusCode": 504,
  "message": "The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. From SDK you can run print(service.state) to know the current state of the webservice."
}
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "Service deployment polling reached non-successful terminal state, current service state: Unhealthy\nMore information can be found using '.get_logs()'\nError:\n{\n  \"code\": \"DeploymentTimedOut\",\n  \"statusCode\": 504,\n  \"message\": \"The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. From SDK you can run print(service.state) to know the current state of the webservice.\"\n}"
    }
}

This is what my web service code looks like:

%%time
from azureml.core.webservice import Webservice
from azureml.core.model import Model
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment

myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

service = Model.deploy(workspace=ws,
                       name='myimage',
                       models=[model], 
                       inference_config=inference_config,
                       deployment_config=aciconfig)

service.wait_for_deployment(show_output=True)

Can anyone tell me what this actually means? How can I fix it?

Thanks

Ahmad

1 Answer

0 votes
/ February 10, 2020

Upgrading the scikit-learn version solved this in my environment.
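One way to confirm which scikit-learn version an environment actually picks up is to log it at start-up, for example from the top of the scoring script. The sketch below is my own assumption rather than anything from the Azure ML SDK: `installed_version` is a hypothetical helper that imports a module by name and returns its `__version__` attribute, or `None` if the module cannot be imported.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__ as a string, or None if it cannot be determined."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed in this environment.
        return None
    version = getattr(module, "__version__", None)
    return None if version is None else str(version)


if __name__ == "__main__":
    # scikit-learn is imported under the module name "sklearn".
    print("scikit-learn version:", installed_version("sklearn"))
```

Running the same check locally and inside the deployed container should show whether the two environments resolve to different versions.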

Specify the version in myenv.yml as follows. (In my environment, 0.20.3 was installed initially, and upgrading to 0.22.1 resolved the problem.)

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2

- pip:
  - azureml-defaults
- scikit-learn=0.22.1
channels:
- conda-forge
...
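
Separately from the version pin, the DEBUG line in the question's logs shows the file sitting at `azureml-models/demomodel/8/demo_Model.pkl`, while `score.py` calls `Model.get_model_path('demo_Model.pkl')`. `Model.get_model_path` takes the registered model name, so it may be worth checking whether the name passed there should be `demomodel` rather than the file name. As a rough illustration of how that directory layout reads, here is a small standard-library sketch. `split_model_path` is a hypothetical helper, not part of the SDK, and it assumes the `azureml-models/<name>/<version>/<file>` layout seen in that log line.

```python
from pathlib import PurePosixPath


def split_model_path(path):
    """Split 'azureml-models/<name>/<version>/<file...>' into its parts.

    Returns (registered_name, version, relative_file).
    Raises ValueError if the path does not follow that layout.
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 4 or parts[0] != "azureml-models":
        raise ValueError("unexpected model path layout: {!r}".format(path))
    name, version = parts[1], parts[2]
    if not version.isdigit():
        raise ValueError("version is not an integer: {!r}".format(version))
    relative_file = str(PurePosixPath(*parts[3:]))
    return name, int(version), relative_file


if __name__ == "__main__":
    # Path copied from the DEBUG line in the service logs.
    name, version, filename = split_model_path(
        "azureml-models/demomodel/8/demo_Model.pkl"
    )
    print(name, version, filename)
```

With the path from the log, the first element is `demomodel`, which is the string I would try passing to `Model.get_model_path` if the version change alone does not help.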