I have a project for which I need to set up Spark and HBase in a local environment. I downloaded spark-2.2.1, hadoop 2.7, and hbase 1.1.8 and configured them as a standalone single-node setup on Ubuntu 14.04.
I can pull data with Spark and push it into HDFS, but not into HBase.
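For context, a plain HDFS round trip from the Spark shell works; the path below is just an illustration, not my actual data:

scala> val rdd = sc.parallelize(Seq("a", "b", "c"))
scala> rdd.saveAsTextFile("hdfs://localhost:9000/tmp/spark-hdfs-test")
scala> sc.textFile("hdfs://localhost:9000/tmp/spark-hdfs-test").count()   // returns 3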
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml:
[root@localhost conf]# cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.namenode.servicerpc-bind-host</name>
<value>0.0.0.0</value>
</property>
</configuration>
spark-env.sh
[root@localhost conf]# cat spark-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_DIR=/app/spark/tmp
# Options read in YARN client mode
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
export SPARK_DRIVER_MEMORY=1G
export SPARK_YARN_APP_NAME=Spark
export SPARK_CLASSPATH=/opt/hbase/lib/*
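(I know SPARK_CLASSPATH is deprecated in Spark 2.x; if that is the problem, I assume the spark-defaults.conf equivalent would be along these lines, with the property names taken from the Spark docs and the path being mine:)

spark.driver.extraClassPath     /opt/hbase/lib/*
spark.executor.extraClassPath   /opt/hbase/lib/*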
hbase-site.xml:
[root@localhost conf]# cat hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>hdfs://localhost:9000/zookeeper</value>
</property>
<property>
<name>hbase.master.dns.interface</name>
<value>default</value>
</property>
<property>
<name>hbase.master.ipc.address</name>
<value>localhost</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<value>default</value>
</property>
<property>
<name>hbase.regionserver.ipc.address</name>
<value>HOSTNAME</value>
</property>
<property>
<name>hbase.zookeeper.dns.interface</name>
<value>default</value>
</property>
</configuration>
spark-defaults.conf:
[root@localhost conf]# cat spark-defaults.conf
spark.master             spark://127.0.0.1:7077
spark.yarn.dist.files    /opt/spark/conf/hbase-site.xml
Error:
Even though the HBase lib (jars) is exported in spark-env.sh, the shell cannot import the HBase libraries (e.g. HBaseConfiguration):
scala> import org.apache.hadoop.hbase.HBaseConfiguration
<console>:23: error: object hbase is not a member of package org.apache.hadoop
import org.apache.hadoop.hbase.HBaseConfiguration
^
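(To rule out the export simply not reaching the shell, the classpath the driver actually sees can be printed from the REPL, filtered for HBase jars:)

scala> System.getProperty("java.class.path").split(java.io.File.pathSeparator).filter(_.contains("hbase")).foreach(println)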
If I load these jars via --driver-class-path:
spark-shell --master local --driver-class-path=/opt/hbase/lib/*
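(the session below assumes the usual imports and the configuration object had already been created in the shell, roughly:)

scala> import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
scala> import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Put}
scala> import org.apache.hadoop.hbase.util.Bytes
scala> val conf = HBaseConfiguration.create()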
scala> conf.set("hbase.zookeeper.quorum","localhost")
scala> conf.set("hbase.zookeeper.property.clientPort", "2181")
scala> val connection: Connection = ConnectionFactory.createConnection(conf)
connection: org.apache.hadoop.hbase.client.Connection = hconnection-0x2a4cb8ae
scala> val tableName = connection.getTable(TableName.valueOf("employee"))
tableName: org.apache.hadoop.hbase.client.Table = employee;hconnection-0x2a4cb8ae
scala> val insertData = new Put(Bytes.toBytes("1"))
insertData: org.apache.hadoop.hbase.client.Put = {"totalColumns":0,"row":"1","families":{}}
scala> insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("Name"), Bytes.toBytes("Jeevan"))
res3: org.apache.hadoop.hbase.client.Put = {"totalColumns":1,"row":"1","families":{"emp personal data ":[{"qualifier":"Name","vlen":6,"tag":[],"timestamp":9223372036854775807}]}}
scala> insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("City"), Bytes.toBytes("San Jose"))
res4: org.apache.hadoop.hbase.client.Put = {"totalColumns":2,"row":"1","families":{"emp personal data ":[{"qualifier":"Name","vlen":6,"tag":[],"timestamp":9223372036854775807},{"qualifier":"City","vlen":8,"tag":[],"timestamp":9223372036854775807}]}}
scala> insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("Company"), Bytes.toBytes("Cisco"))
res5: org.apache.hadoop.hbase.client.Put = {"totalColumns":3,"row":"1","families":{"emp personal data ":[{"qualifier":"Name","vlen":6,"tag":[],"timestamp":9223372036854775807},{"qualifier":"City","vlen":8,"tag":[],"timestamp":9223372036854775807},{"qualifier":"Company","vlen":5,"tag":[],"timestamp":9223372036854775807}]}}
scala> insertData.addColumn(Bytes.toBytes("emp personal data "), Bytes.toBytes("location"), Bytes.toBytes("San Jose"))
res6: org.apache.hadoop.hbase.client.Put = {"totalColumns":4,"row":"1","families":{"emp personal data ":[{"qualifier":"Name","vlen":6,"tag":[],"timestamp":9223372036854775807},{"qualifier":"City","vlen":8,"tag":[],"timestamp":9223372036854775807},{"qualifier":"Company","vlen":5,"tag":[],"timestamp":9223372036854775807},{"qualifier":"location","vlen":8,"tag":[],"timestamp":9223372036854775807}]}}
But I don't see any of the new columns in HBase.
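(In case I'm missing a step: as far as I understand, a Put only reaches the region server once it is submitted to the table; a minimal sketch of the submit-and-close calls, reusing the names from my session above:)

scala> tableName.put(insertData)   // actually sends the Put to HBase
scala> tableName.close()
scala> connection.close()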
Can anyone help, please? Any link to a reference configuration would be great. Do I need to set up a separate ZooKeeper? I appreciate your help.