XGBoost in Sparkling Water throws an error: XGBoost is not available on all nodes

I am trying to run XGBoost from the H2O package on a Spark cluster. I am using H2O on an on-premises cluster on a Red Hat Enterprise Linux server, kernel version: '3.10.0-1062.9.1.el7.x86_64'.

I start the H2O cluster inside a Spark session:

from pyspark.sql import SparkSession

spark = SparkSession.builder\
.appName('APP1')\
.config('spark.executor.memory', '15g')\
.config('spark.executor.cores', '8')\
.config('spark.executor.instances','5')\
.config('spark.yarn.queue', "DS")\
.config('spark.yarn.executor.memoryOverhead', '1096')\
.enableHiveSupport()\
.getOrCreate()

from pysparkling import *
import h2o
h2oConf = H2OConf()
hc = H2OContext.getOrCreate() 


Connecting to H2O server at  ... successful.
H2O cluster uptime: 13 secs
H2O cluster timezone:   UTC
H2O data parsing timezone:  UTC
H2O cluster version:    3.28.1.2
H2O cluster version age:    23 days
H2O cluster name:   sparkling-water-app
H2O cluster total nodes:    5
H2O cluster free memory:    111.1 Gb
H2O cluster total cores:    160
H2O cluster allowed cores:  40
H2O cluster status: locked, healthy
H2O connection url: http
H2O connection proxy:   None
H2O internal security:  False
H2O API Extensions: XGBoost, Algos, Amazon S3, Sparkling Water REST API Extensions, AutoML, Core V3, TargetEncoder, Core V4
Python version: 2.7.13 final

Sparkling Water Context:
 * Sparkling Water Version: 3.28.1.2-1-2.2
 * H2O name: sparkling-water-app
 * cluster size: 5
 * list of used nodes:
  (executorId, host, port)

and I can see XGBoost among the H2O API extensions. However, when I try to run it, I get an error saying that XGBoost is not available on all nodes, or that it is not registered. I import H2OXGBoost from pysparkling.ml, and this is the result:

from pysparkling.ml import H2OXGBoost

# dt_train is a Spark DataFrame with a 'label' column
estimator = H2OXGBoost(labelCol = "label")
model = estimator.fit(dt_train)


Py4JJavaErrorTraceback (most recent call last)
<ipython-input-44-a579ec322aa9> in <module>()
      3 
      4 
----> 5 model = estimator.fit(dt_train)

/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/base.py in fit(self, dataset, params)
     62                 return self.copy(params)._fit(dataset)
     63             else:
---> 64                 return self._fit(dataset)
     65         else:
     66             raise ValueError("Params must be either a param map or a list/tuple of param maps, "

/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/wrapper.py in _fit(self, dataset)
    263 
    264     def _fit(self, dataset):
--> 265         java_model = self._fit_java(dataset)
    266         return self._create_model(java_model)
    267 

/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/wrapper.py in _fit_java(self, dataset)
    260         """
    261         self._transfer_params_to_java()
--> 262         return self._java_obj.fit(dataset._jdf)
    263 
    264     def _fit(self, dataset):

/opt/continuum/anaconda3/envs/python27/lib/python2.7/site-packages/py4j/java_gateway.pyc in __call__(self, *args)
   1158         answer = self.gateway_client.send_command(command)
   1159         return_value = get_return_value(
-> 1160             answer, self.gateway_client, self.target_id, self.name)
   1161 
   1162         for temp_arg in temp_args:

/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
     61     def deco(*a, **kw):
     62         try:
---> 63             return f(*a, **kw)
     64         except py4j.protocol.Py4JJavaError as e:
     65             s = e.java_exception.toString()

/opt/continuum/anaconda3/envs/python27/lib/python2.7/site-packages/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
    318                 raise Py4JJavaError(
    319                     "An error occurred while calling {0}{1}{2}.\n".
--> 320                     format(target_id, ".", name), value)
    321             else:
    322                 raise Py4JError(

Py4JJavaError: An error occurred while calling o6061.fit.
: water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for XGBoost model: XGBoost_model_1586296043974_6.  Details: ERRR on field: XGBoost: XGBoost is not available on all nodes!

    at water.exceptions.H2OModelBuilderIllegalArgumentException.makeFromBuilder(H2OModelBuilderIllegalArgumentException.java:19)
    at hex.tree.xgboost.XGBoost.init(XGBoost.java:159)
    at hex.tree.xgboost.XGBoost$XGBoostDriver.computeImpl(XGBoost.java:315)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:242)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1470)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
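
For reference, this is a minimal way to check from the h2o Python client whether the XGBoost backend is actually loaded (just a diagnostic sketch, assuming the standard h2o API and a connection already established by H2OContext.getOrCreate()):

from h2o.estimators.xgboost import H2OXGBoostEstimator

# available() should return False (and print a warning) when the XGBoost
# extension cannot be used on the connected H2O cluster.
print(H2OXGBoostEstimator.available())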

And when I use it in a grid search, I get this error:

xgboost Grid Build progress: |████████████████████████████████████████████| 100%
Errors/Warnings building gridsearch model

Hyper-parameter: col_sample_rate, 0.6
Hyper-parameter: learn_rate, 0.01
Hyper-parameter: max_depth, 3
Hyper-parameter: sample_rate, 0.6
Hyper-parameter: tweedie_power, 1.75
failure_details: Algorithm 'XGBoost' is not registered. Available algos: [targetencoder,deeplearning,glm,glrm,kmeans,naivebayes,pca,svd,drf,gbm,isolationforest,aggregator,deepwater,word2vec,stackedensemble,coxph,generic,psvm]
failure_stack_traces: java.lang.IllegalStateException: Algorithm 'XGBoost' is not registered. Available algos: [targetencoder,deeplearning,glm,glrm,kmeans,naivebayes,pca,svd,drf,gbm,isolationforest,aggregator,deepwater,word2vec,stackedensemble,coxph,generic,psvm]
    at hex.ModelBuilder.make(ModelBuilder.java:173)
    at hex.ModelBuilder$TrainModelNestedRunnable.run(ModelBuilder.java:426)
    at water.H2O$RunnableWrapperTask.compute2(H2O.java:1380)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1473)
    at water.H2O$RunnableWrapperTask$Icer.compute1(H2O$RunnableWrapperTask$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1469)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
...
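
For context, the grid search is built roughly like this (a simplified sketch: dt_train_h2o stands in for my actual H2OFrame, 'label' for the response column, and the hyper-parameter value lists are illustrative, matching the values shown in the warnings above):

from h2o.estimators.xgboost import H2OXGBoostEstimator
from h2o.grid.grid_search import H2OGridSearch

hyper_params = {
    'col_sample_rate': [0.6, 0.8],
    'learn_rate': [0.01, 0.1],
    'max_depth': [3, 5],
    'sample_rate': [0.6, 0.8],
    'tweedie_power': [1.5, 1.75],
}

# dt_train_h2o is an H2OFrame; every column except 'label' is used as a predictor
predictors = [c for c in dt_train_h2o.columns if c != 'label']

grid = H2OGridSearch(
    model=H2OXGBoostEstimator(distribution='tweedie'),
    hyper_params=hyper_params,
)
grid.train(x=predictors, y='label', training_frame=dt_train_h2o)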