Our Elasticsearch cluster runs behind a load balancer. The load balancer URL is https://es.mycomp.com. I can POST a document to it from both Postman and curl, so the firewall is open from my dev box. But when I post documents from Spark, I get:
19/06/19 09:35:49 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
19/06/19 09:35:49 INFO HttpMethodDirector: Retrying request
19/06/19 09:35:49 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection refused: connect
19/06/19 09:35:49 INFO HttpMethodDirector: Retrying request
19/06/19 09:35:50 ERROR NetworkClient: Node [10.127.30.46:433] failed (Connection refused: connect); no other nodes left - aborting...
19/06/19 09:35:50 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:196)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:84)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.127.30.46:433]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:414)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:418)
at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:122)
at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:564)
at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:184)
... 10 more
Here is my code:
val conf = new SparkConf().setMaster("local[2]").setAppName("ESPost")
.set("es.index.auto.create", "true")
.set("es.nodes", "es.mycomp.com")
.set("es.port", "443")
.set("es.nodes.client.only", "true")
.set("es.http.timeout", "5m")
.set("es.scroll.size", "50")
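The error message itself suggests the setting `es.nodes.wan.only`. For reference, here is a variant of the configuration above with that setting enabled; the `es.net.ssl` line is my assumption based on the load balancer URL being https, and I have not verified that this combination works:

```scala
import org.apache.spark.SparkConf

// Variant of the config above, following the error message's hint about
// 'es.nodes.wan.only': the connector then talks only to the address in
// es.nodes and skips node discovery behind the load balancer.
val conf = new SparkConf().setMaster("local[2]").setAppName("ESPost")
  .set("es.index.auto.create", "true")
  .set("es.nodes", "es.mycomp.com")
  .set("es.port", "443")
  .set("es.nodes.wan.only", "true") // disable node discovery; use es.nodes as the sole endpoint
  .set("es.net.ssl", "true")        // assumption: required because the load balancer is https
  .set("es.http.timeout", "5m")
  .set("es.scroll.size", "50")
```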
We are using Elasticsearch 2.4; here is my build configuration:
scalaVersion := "2.11.12"
val sparkVersion = "1.3.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.elasticsearch" %% "elasticsearch-spark" % "2.4.5",
"org.apache.spark" % "spark-streaming_2.11" % sparkVersion
)