I am importing data from MySQL into Hive using Sqoop on my local machine; the usage is shown below.
The table customer2 exists in MySQL as shown below:
mysql> select * from customer2;
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
| customer_id | customer_fname | customer_lname | customer_email | customer_password | customer_street | customer_city | customer_state | customer_zipcode |
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
| 100008 | Christine | K | NULL | NULL | HK | HK | HK | 19293 |
| 100009 | Chris | Taylor | NULL | NULL | HK | HK | HK | 1925 |
| 100010 | Mark | Jamiel | NULL | NULL | HK | HK | HK | 19294 |
| 100011 | Tom | Pride | NULL | NULL | HK | HK | HK | 19295 |
| 100012 | Tom | Heather | NULL | NULL | CA | CA | CA | 19295 |
| 100013 | Maxim | Calay | NULL | NULL | CA | CA | CA | 19295 |
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
The same table customer2 exists in Hive, partitioned by customer_city, as shown below:
hive (sumitpawar)> describe customer2;
OK
col_name data_type comment
customer_id int
customer_fname string
customer_lname string
customer_email string
customer_password string
customer_zipcode string
customer_city string
# Partition Information
# col_name data_type comment
customer_city string
Time taken: 0.098 seconds, Fetched: 12 row(s)
hive (sumitpawar)>
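For completeness, a DDL along the following lines would produce this layout (a sketch only; the delimiter and storage format are assumptions, since the actual CREATE TABLE statement is not shown):

CREATE TABLE sumitpawar.customer2 (
  customer_id       INT,
  customer_fname    STRING,
  customer_lname    STRING,
  customer_email    STRING,
  customer_password STRING,
  customer_zipcode  STRING
)
PARTITIONED BY (customer_city STRING)              -- partition column, not stored in the data files
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'   -- assumption: Hive's default Ctrl-A delimiter
STORED AS TEXTFILE;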
I then run the Sqoop command below to import the data into Hive, expecting the rows to end up in the corresponding partitions.
Sqoop command:
sqoop import \
--options-file ./options_file.txt \
--table customer2 \
--columns 'customer_id,customer_fname,customer_lname,customer_email,customer_password,customer_zipcode' \
--hive-database sumitpawar \
--hive-import \
--null-string 'Empty' \
--null-non-string 0 \
-m 2 \
--mapreduce-job-name "JOB :import to hive from mysql" \
--warehouse-dir "/PRACTICALS/SQOP/retail_db/increment_hive_mysql4" \
--hive-partition-key customer_city \
--hive-partition-value 'CA'
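Note that with --hive-import Sqoop first stages the mapper output under the --warehouse-dir given above and then issues a LOAD DATA into the Hive partition (visible in the log further down), so the staged files can be inspected right after the MapReduce phase (a sketch; the part-m-* names are the usual Sqoop defaults and may differ):

# list the staged mapper output under --warehouse-dir (Sqoop appends the table name)
hdfs dfs -ls /PRACTICALS/SQOP/retail_db/increment_hive_mysql4/customer2
# dump the staged records before the Hive load step moves them away
hdfs dfs -cat /PRACTICALS/SQOP/retail_db/increment_hive_mysql4/customer2/part-m-*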
Contents of the options file (options_file.txt):
[cloudera@quickstart SQOOP]$ cat options_file.txt
#############################################
--connect
jdbc:mysql://localhost:3306/retail_db
--username
root
--password-file
/PRACTICALS/SQOOP/password
############################################
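For reference, the password file referenced above was created roughly as follows (a sketch; the password value is a placeholder, and echo -n matters because a trailing newline in the file would make authentication fail):

# write the password without a trailing newline, then push it to HDFS
echo -n "my_mysql_password" > .mysql_password      # placeholder password
hdfs dfs -put .mysql_password /PRACTICALS/SQOOP/password
# restrict it to the owner, as Sqoop expects for password files
hdfs dfs -chmod 400 /PRACTICALS/SQOOP/password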
Execution log of the Sqoop command:
18/12/09 06:59:38 INFO mapreduce.Job: Running job: job_1542647782962_0231
18/12/09 06:59:49 INFO mapreduce.Job: Job job_1542647782962_0231 running in uber mode : false
18/12/09 06:59:49 INFO mapreduce.Job: map 0% reduce 0%
18/12/09 07:00:06 INFO mapreduce.Job: map 50% reduce 0%
18/12/09 07:00:07 INFO mapreduce.Job: map 100% reduce 0%
18/12/09 07:00:07 INFO mapreduce.Job: Job job_1542647782962_0231 completed successfully
18/12/09 07:00:07 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=311480
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=253
HDFS: Number of bytes written=220
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=29025
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=29025
Total vcore-milliseconds taken by all map tasks=29025
Total megabyte-milliseconds taken by all map tasks=29721600
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=253
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=417
CPU time spent (ms)=2390
Physical memory (bytes) snapshot=276865024
Virtual memory (bytes) snapshot=3020857344
Total committed heap usage (bytes)=121765888
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=220
18/12/09 07:00:07 INFO mapreduce.ImportJobBase: Transferred 220 bytes in 33.0333 seconds (6.66 bytes/sec)
18/12/09 07:00:07 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/12/09 07:00:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customer2` AS t LIMIT 1
18/12/09 07:00:07 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.1.0-cdh5.12.0.jar!/hive-log4j.properties
OK
Time taken: 3.016 seconds
Loading data to table sumitpawar.customer2 partition (customer_city=CA)
Partition sumitpawar.customer2{customer_city=CA} stats: [numFiles=2, numRows=0, totalSize=440, rawDataSize=0]
OK
Time taken: 1.056 seconds
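The counters report 6 records transferred, and the partition stats show numFiles=2 (one file per mapper, matching -m 2), so the files did land under the customer_city=CA partition. This can be cross-checked from the Hive side (a sketch):

SHOW PARTITIONS customer2;                                    -- expect: customer_city=CA
DESCRIBE FORMATTED customer2 PARTITION (customer_city='CA');  -- shows the partition's HDFS location and SerDe
SELECT COUNT(*) FROM customer2 WHERE customer_city = 'CA';    -- should match the 6 records Sqoop transferred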
------------------------------------------------------------------------------------------------
However, after querying the table in Hive, I see NULL values in all columns except customer_city.
Output from the Hive table:
hive (sumitpawar)> select * from customer2;
OK
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
| customer_id | customer_fname | customer_lname | customer_email | customer_password | customer_street | customer_city | customer_state | customer_zipcode |
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
| NULL        | NULL           | NULL           | NULL           | NULL              | NULL            | NULL          | NULL           | NULL             |
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
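Every imported column coming back NULL while the partition column is populated looks like the rows are not being split on the delimiter the table expects: when no delimiters are specified, Sqoop writes Ctrl-A (\001) separated text on a --hive-import, so if the table was declared with a different FIELDS TERMINATED BY (or vice versa), every field fails to parse. To compare the raw bytes with the delimiter the table declares, something like the following can be used (a sketch; the partition path assumes the default Hive warehouse location):

# dump the first bytes of one loaded file to see the actual field separator
hdfs dfs -cat /user/hive/warehouse/sumitpawar.db/customer2/customer_city=CA/part-m-00000 | od -c | head -5
# check which field delimiter the Hive table itself declares
hive -e "DESCRIBE FORMATTED sumitpawar.customer2" | grep -i field.delim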
Could someone tell me whether anything is wrong in the above, and how data can be loaded into partitioned Hive tables with Sqoop on a per-partition basis?
Regards, Sumit