Spark - подключение к MySQL через Zeppelin EMR - PullRequest
0 голосов
/ 29 мая 2018

Я пытаюсь подключиться к экземпляру MySQL из ноутбука AWS EMR - Zeppelin.Загружен разъем mysql в это место - /usr/lib/spark/jars/mysql-connector-java-5.0.4-bin.jar.И добавил это как артефакт в интерпретаторе цеппелина.

Инициирование драйвера,

Class.forName("com.mysql.jdbc.Driver")

res77: Класс [_] = класс com.mysql.jdbc.Driver

Использование кода Scala в качествездесь

Пробная версия 1,

val jdbcDF = spark.read.format("jdbc").options(
  Map("url" ->  "jdbc:mysql://our-host.readm.co.nz:3306/testdb?user=testuser&password=****",
  "dbtable" -> "testdb.tablename",
  "driver" -> "com.mysql.jdbc.Driver",
  "partitionColumn" -> "id", "lowerBound" -> "1", "upperBound" -> "41514638", "numPartitions" -> "20"
  )).load()

Получил эту ошибку,

com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:
** BEGIN NESTED EXCEPTION **
java.net.SocketException
MESSAGE: java.net.ConnectException: Connection timed out (Connection timed out)
STACKTRACE:
java.net.SocketException: java.net.ConnectException: Connection timed out (Connection timed out)
    at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:156)
    at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:276)
    at com.mysql.jdbc.Connection.createNewIO(Connection.java:2666)
    at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at <init>(<console>:50)
    at <init>(<console>:55)
    at <init>(<console>:57)
    at <init>(<console>:59)
    at <init>(<console>:61)
    at <init>(<console>:63)
    at <init>(<console>:65)
    at <init>(<console>:67)
    at <init>(<console>:69)
    at <init>(<console>:71)
    at <init>(<console>:73)
    at <init>(<console>:75)
    at <init>(<console>:77)
    at <init>(<console>:79)
    at <init>(<console>:81)
    at <init>(<console>:83)
    at <init>(<console>:85)
    at <init>(<console>:87)
    at <init>(<console>:89)
    at <init>(<console>:91)
    at <init>(<console>:93)
    at .<init>(<console>:97)
    at .<clinit>(<console>)
    at .$print$lzycompute(<console>:7)
    at .$print(<console>:6)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at sun.reflect.GeneratedMethodAccessor369.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1000)
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1205)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1172)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1165)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
** END NESTED EXCEPTION **
Last packet sent to the server was 1 ms ago.
  at com.mysql.jdbc.Connection.createNewIO(Connection.java:2741)
  at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
  at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
  ... 58 elided

Затем пробная версия 2,

val partitionColumn = "id"
val dbUrl = "jdbc:mysql://our-host.readm.co.nz:3306"

val lowerBound = 1
val upperBound = 41514638

val jdbcDF = spark.read.format("jdbc")
    .option("url", dbUrl)
    .option("databaseName", "testdb")
    .option("dbtable", "tablename")
    .option("user", "username")
    .option("password", "*****")
    .option("driver","com.mysql.jdbc.Driver")
    .option("partitionColumn", partitionColumn)
    .option("lowerBound", lowerBound)
    .option("upperBound", upperBound)
    .option("numPartitions", 20)
    .load()

Получил эту ошибку,

java.lang.NullPointerException
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:72)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
  ... 58 elided

Пробная версия 3,

import java.util.Properties

val table = "tablename"
val url = "jdbc:mysql://our-host.readm.co.nz"
val prop = new Properties()
prop.put("driver","com.mysql.jdbc.Driver")
prop.put("user","username")
prop.put("password","****")
prop.put("databaseName","testdb")

val jdbcDF = spark.read.jdbc(url,
        table,
        prop)

Получение этой ошибки,

java.lang.NullPointerException
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:72)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
  at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
  ... 58 elided

Не уверен, что здесь не так, можете ли вы мне помочь?Спасибо.

РЕДАКТИРОВАТЬ: - Это то, что я получаю после попытки ответить на вопрос @ geo

com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:
** BEGIN NESTED EXCEPTION **
java.net.SocketException
MESSAGE: java.net.ConnectException: Connection timed out (Connection timed out)
STACKTRACE:
java.net.SocketException: java.net.ConnectException: Connection timed out (Connection timed out)
    at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:156)
    at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:276)
    at com.mysql.jdbc.Connection.createNewIO(Connection.java:2666)
    at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at <init>(<console>:40)
    at <init>(<console>:45)
    at <init>(<console>:47)
    at <init>(<console>:49)
    at <init>(<console>:51)
    at <init>(<console>:53)
    at <init>(<console>:55)
    at <init>(<console>:57)
    at <init>(<console>:59)
    at <init>(<console>:61)
    at <init>(<console>:63)
    at <init>(<console>:65)
    at <init>(<console>:67)
    at .<init>(<console>:71)
    at .<clinit>(<console>)
    at .$print$lzycompute(<console>:7)
    at .$print(<console>:6)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at sun.reflect.GeneratedMethodAccessor340.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1000)
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1205)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1172)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1165)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
** END NESTED EXCEPTION **
Last packet sent to the server was 7 ms ago.
  at com.mysql.jdbc.Connection.createNewIO(Connection.java:2741)
  at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
  at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
  ... 50 elided

Ответы [ 2 ]

0 голосов
/ 06 июня 2018

Как настроен ваш кластер?правила брандмауэра и т. д.?MySQL работает?В любом случае это больше похоже на проблему с соединением, чем на драйвер JDBC.Поэтому я бы порекомендовал: попробуйте подключиться к MySQL с хоста, на котором работает Zepplin, через bash

0 голосов
/ 05 июня 2018

Предполагая, что интерпретатор JDBC работает без проблем в Zeppelin, это то, что у меня сработало:

// DB connection
val options = Map(
    "url"       -> "jdbc:mysql://localhost:3306/dbname",
    "dbtable"   -> "dbtable",
    "user"      -> "user",
    "password"  -> "password"
)

val data = spark
    .read
    .format("jdbc")
    .options(options)
    .load

Само собой разумеется, что должен быть процесс mysql, работающий на localhost:3306 (или где бы вы ни находились)хостинг БД) и что брандмауэр / iptables позволяет вам входить / выходить из этого процесса.

...