Spark Не удается подключиться к Elasticsearch2.4 через HTTP и балансировщик нагрузки - PullRequest
0 голосов
/ 19 июня 2019

Наш эластичный поиск работает за балансировщиком нагрузки. URL для балансировщика нагрузки https://es.mycomp.com. Я могу разместить на нем документ как от почтальона, так и от curl. Так что брандмауэр открыт для моей коробки разработчиков. Но когда я публикую документы от spark, я получаю:

19/06/19 09:35:49 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
19/06/19 09:35:49 INFO HttpMethodDirector: Retrying request
19/06/19 09:35:49 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
19/06/19 09:35:49 INFO HttpMethodDirector: Retrying request
19/06/19 09:35:50 ERROR NetworkClient: Node [10.127.30.46:433] failed (Connection refused: connect); no other nodes left - aborting...
19/06/19 09:35:50 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:196)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.127.30.46:433]] 
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:414)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:418)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:122)
    at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:564)
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:184)
    ... 10 more

Вот мой код:

    val conf = new SparkConf().setMaster("local[2]").setAppName("ESPost")
      .set("es.index.auto.create", "true")
      .set("es.nodes", "es.mycomp.com")
      .set("es.port", "443")
      .set("es.nodes.client.only", "true")
      .set("es.http.timeout", "5m")
      .set("es.scroll.size", "50")

Мы используемasticsearch2.4, вот мой конфиг:

scalaVersion := "2.11.12"

val sparkVersion = "1.3.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.elasticsearch" %% "elasticsearch-spark" % "2.4.5",
  "org.apache.spark" % "spark-streaming_2.11" % sparkVersion
)
...