Loading a PySpark ALSModel fails when deploying an Azure ML service with java.util.NoSuchElementException: Param blockSize does not exist
1 vote
/ 01 August 2020

I am trying to deploy an ALS model trained with PySpark to an Azure ML service. I provide a score.py file that loads the trained model using ALSModel.load(). Below is the code of my score.py file.

import os
import traceback
from azureml.core.model import Model
from pyspark.ml.recommendation import ALS, ALSModel
from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import DoubleType, StringType
from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession

input_schema = StructType([StructField("UserId", StringType())])
reader = spark.read
reader.schema(input_schema)


def init():
    global model
    # note: "recommendation-model" is the name of the model registered under the workspace;
    # this call should return the path to the model directory on the local disk.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), "recommendation-model")
    # Load the model files back into an ALSModel
    model = ALSModel.load(model_path)
    

def run(data):
    try:
        input_df = reader.json(sc.parallelize([data]))
        # NOTE: `indexer` (a fitted StringIndexerModel mapping UserId to
        # UserId_index) must be defined/loaded elsewhere in this script.
        input_df = indexer.transform(input_df)

        res = model.recommendForUserSubset(input_df[['UserId_index']], 10)

        # you can return any datatype as long as it is JSON-serializable
        return res.collect()[0]['recommendations']
    except Exception as e:
        traceback.print_exc()
        error = str(e)
        return error
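For reference, run() above receives a single JSON document and parses it against input_schema, so a request body for this service could look like the following sketch (the UserId value is hypothetical):

```python
import json

# Hypothetical request body: a single JSON object with the string
# "UserId" field declared in input_schema above.
payload = json.dumps({"UserId": "user-42"})

# run(payload) would parse this into a one-row Spark DataFrame
# before passing it through the indexer and the ALS model.
print(payload)
```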

Below is the error I get when I deploy it as a LocalWebservice using the Model.deploy function of the Azure ML service:

Generating Docker build context.
Package creation Succeeded
Logging into Docker registry viennaglobal.azurecr.io
Logging into Docker registry viennaglobal.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM viennaglobal.azurecr.io/azureml/azureml_43542b56c5ec3e8d0f68e1556558411f
 ---> 5b3bb174ca5f
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> 8e540c0746f7
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjNkN2M1ZjM4LTI1ODEtNGUxNi05NTdhLWEzOTU1OGI1ZjBiMyIsInJlc291cmNlR3JvdXBOYW1lIjoiZGV2LW9tbmljeC10ZnMtYWkiLCJhY2NvdW50TmFtZSI6ImRldi10ZnMtYWktd29ya3NwYWNlIiwid29ya3NwYWNlSWQiOiI1NjkzNGMzNC1iZmYzLTQ3OWUtODRkMy01OGI4YTc3ZTI4ZjEifSwibW9kZWxzIjp7fSwibW9kZWxzSW5mbyI6e319 | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in 502ad8edf91e
 ---> a1bc5e0283d0
Step 4/5 : RUN mv '/var/azureml-app/tmpvxhomyin.py' /var/azureml-app/main.py
 ---> Running in eb4ec1a0b702
 ---> 6a3296fe6420
Step 5/5 : CMD ["runsvdir","/var/runit"]
 ---> Running in 834fd746afef
 ---> 5b9f8be538c0
Successfully built 5b9f8be538c0
Successfully tagged recommend-service:latest
Container (name:musing_borg, id:0f3163692f5119685eee5ed59c8e00aa96cd472f765e7db67653f1a6ce852e83) cannot be killed.
Container has been successfully cleaned up.
Image sha256:0f146f4752878bbbc0e876f4477cc2877ff12a366fca18c986f9a9c2949d028b successfully removed.
Starting Docker container...
Docker container running.
Checking container health...
ERROR - Error: Container has crashed. Did your init method fail?


Container Logs:
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:00,312735664+00:00 - rsyslog/run 
2020-07-30T11:57:00,312768364+00:00 - gunicorn/run 
2020-07-30T11:57:00,313017966+00:00 - iot-server/run 
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
2020-07-30T11:57:00,313969073+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:00,597835804+00:00 - iot-server/finish 1 0
2020-07-30T11:57:00,598826211+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 19.9.0
Listening at: http://127.0.0.1:31311 (10)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 41
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/home/mmlspark/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.microsoft.ml.spark#mmlspark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-e07358bb-d354-4f41-aa4c-f0aa73bb0156;1.0
    confs: [default]
    found com.microsoft.ml.spark#mmlspark_2.11;0.15 in spark-list
    found io.spray#spray-json_2.11;1.3.2 in central
    found com.microsoft.cntk#cntk;2.4 in central
    found org.openpnp#opencv;3.2.0-1 in central
    found com.jcraft#jsch;0.1.54 in central
    found org.apache.httpcomponents#httpclient;4.5.6 in central
    found org.apache.httpcomponents#httpcore;4.4.10 in central
    found commons-logging#commons-logging;1.2 in central
    found commons-codec#commons-codec;1.10 in central
    found com.microsoft.ml.lightgbm#lightgbmlib;2.1.250 in central
:: resolution report :: resolve 318ms :: artifacts dl 11ms
    :: modules in use:
    com.jcraft#jsch;0.1.54 from central in [default]
    com.microsoft.cntk#cntk;2.4 from central in [default]
    com.microsoft.ml.lightgbm#lightgbmlib;2.1.250 from central in [default]
    com.microsoft.ml.spark#mmlspark_2.11;0.15 from spark-list in [default]
    commons-codec#commons-codec;1.10 from central in [default]
    commons-logging#commons-logging;1.2 from central in [default]
    io.spray#spray-json_2.11;1.3.2 from central in [default]
    org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
    org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
    org.openpnp#opencv;3.2.0-1 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   10  |   0   |   0   |   0   ||   10  |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: ERRORS
    unknown resolver repo-1

    unknown resolver repo-1


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent-e07358bb-d354-4f41-aa4c-f0aa73bb0156
    confs: [default]
    0 artifacts copied, 10 already retrieved (0kB/7ms)
2020-07-30 11:57:02 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Initialized PySpark session.
Initializing logger
2020-07-30 11:57:09,464 | root | INFO | Starting up app insights client
Starting up app insights client
2020-07-30 11:57:09,464 | root | INFO | Starting up request id generator
Starting up request id generator
2020-07-30 11:57:09,464 | root | INFO | Starting up app insight hooks
Starting up app insight hooks
2020-07-30 11:57:09,464 | root | INFO | Invoking user's init function
Invoking user's init function
2020-07-30 11:57:19,652 | root | ERROR | User's init function failed
User's init function failed
2020-07-30 11:57:19,656 | root | ERROR | Encountered Exception Traceback (most recent call last):
  File "/var/azureml-server/aml_blueprint.py", line 163, in register
    main.init()
  File "/var/azureml-app/main.py", line 44, in init
    model = ALSModel.load(model_path)
  File "/home/mmlspark/lib/spark/python/pyspark/ml/util.py", line 362, in load
    return cls.read().load(path)
  File "/home/mmlspark/lib/spark/python/pyspark/ml/util.py", line 300, in load
    java_obj = self._jread.load(path)
  File "/home/mmlspark/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/mmlspark/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/mmlspark/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o64.load.
: java.util.NoSuchElementException: Param blockSize does not exist.
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.param.Params$class.getParam(params.scala:728)
    at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:42)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:591)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:589)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata.setParams(ReadWrite.scala:589)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata.getAndSetParams(ReadWrite.scala:572)
    at org.apache.spark.ml.recommendation.ALSModel$ALSModelReader.load(ALS.scala:533)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)



Worker exiting (pid: 41)
Shutting down: Master
Reason: Worker failed to boot.
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:19,833136837+00:00 - gunicorn/finish 3 0
2020-07-30T11:57:19,834216245+00:00 - Exit code 3 is not normal. Killing image.

---------------------------------------------------------------------------
WebserviceException                       Traceback (most recent call last)
<ipython-input-43-d0992ae9d1c9> in <module>
      6 local_service = Model.deploy(workspace, "recommend-service", [register_model], inference_config, deployment_config)
      7 
----> 8 local_service.wait_for_deployment()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/webservice/local.py in decorated(self, *args, **kwargs)
     69                 raise WebserviceException('Cannot call {}() when service is {}.'.format(func.__name__, self.state),
     70                                           logger=module_logger)
---> 71             return func(self, *args, **kwargs)
     72         return decorated
     73     return decorator

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/webservice/local.py in wait_for_deployment(self, show_output)
    601                                    self._container,
    602                                    health_url=self._internal_base_url,
--> 603                                    cleanup_if_failed=False)
    604 
    605             self.state = LocalWebservice.STATE_RUNNING

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in container_health_check(docker_port, container, health_url, cleanup_if_failed)
    745             # The container has started and crashed.
    746             _raise_for_container_failure(container, cleanup_if_failed,
--> 747                                          'Error: Container has crashed. Did your init method fail?')
    748 
    749         # The container hasn't crashed, so try to ping the health endpoint.

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in _raise_for_container_failure(container, cleanup, message)
   1258         cleanup_container(container)
   1259 
-> 1260     raise WebserviceException(message, logger=module_logger)
   1261 
   1262 

WebserviceException: WebserviceException:
    Message: Error: Container has crashed. Did your init method fail?
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "Error: Container has crashed. Did your init method fail?"
    }
}

However, ALSModel.load() works fine when executed in a Jupyter notebook.

1 Answer

1 vote
/ 03 August 2020

A couple of things to check:

  1. Is your model registered in the workspace? AZUREML_MODEL_DIR only works for registered models. See this link for information on registering a model.
  2. Are you specifying the same pyspark version in your InferenceConfig environment as the one you use locally? An error like this can be caused by a version mismatch.
  3. Have you looked at the output of print(service.get_logs())? See our troubleshooting and debugging documentation here for other things you can try.