SQOOP integration with HIVE: loading data into a partitioned HIVE table
0 votes
/ 12 December 2018

I am importing data from MySQL into HIVE using SQOOP on my local machine; the steps I followed are below.

  1. The table customer2 exists in MySQL, as shown below:

      mysql> select * from customer2;
      +-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
      | customer_id | customer_fname | customer_lname | customer_email | customer_password | customer_street | customer_city | customer_state | customer_zipcode |
      +-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
      |      100008 | Christine      | K              | NULL           | NULL              | HK              | HK            | HK             | 19293            |
      |      100009 | Chris          | Taylor         | NULL           | NULL              | HK              | HK            | HK             | 1925             |
      |      100010 | Mark           | Jamiel         | NULL           | NULL              | HK              | HK            | HK             | 19294            |
      |      100011 | Tom            | Pride          | NULL           | NULL              | HK              | HK            | HK             | 19295            |
      |      100012 | Tom            | Heather        | NULL           | NULL              | CA              | CA            | CA             | 19295            |
      |      100013 | Maxim          | Calay          | NULL           | NULL              | CA              | CA            | CA             | 19295            |
      +-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
    
  2. The same table customer2 exists in HIVE, partitioned by customer_city, as shown below:

    hive (sumitpawar)> describe customer2;
    OK
    col_name                data_type       comment
    customer_id             int
    customer_fname          string
    customer_lname          string
    customer_email          string
    customer_password       string
    customer_zipcode        string
    customer_city           string

    # Partition Information
    # col_name              data_type       comment

    customer_city           string
    Time taken: 0.098 seconds, Fetched: 12 row(s)
    hive (sumitpawar)>
    
  3. Then I use the SQOOP command below to import the data into HIVE, expecting the rows to be moved to the corresponding partitions.

Sqoop command:

sqoop import \
--options-file ./options_file.txt \
--table customer2 \
--columns  'customer_id,customer_fname,customer_lname,customer_email,customer_password,customer_zipcode' \
--hive-database sumitpawar \
--hive-import \
--null-string 'Empty' \
--null-non-string 0 \
-m 2 \
--mapreduce-job-name "JOB :import to hive from mysql"  \
--warehouse-dir "/PRACTICALS/SQOP/retail_db/increment_hive_mysql4" \
--hive-partition-key customer_city \
--hive-partition-value 'CA'
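
Since the target table already exists in HIVE, the files Sqoop writes must use the same field delimiter as the table's ROW FORMAT, otherwise every parsed column comes back NULL. The DESCRIBE output in step 2 does not show the row format, but the DDL behind it presumably looks something like the sketch below (the ROW FORMAT clause is an assumption, not taken from my cluster; SHOW CREATE TABLE prints the real stored definition, delimiters included):

-- Sketch of the DDL implied by the DESCRIBE output in step 2;
-- the ROW FORMAT line is an assumption
CREATE TABLE customer2 (
  customer_id       INT,
  customer_fname    STRING,
  customer_lname    STRING,
  customer_email    STRING,
  customer_password STRING,
  customer_zipcode  STRING
)
PARTITIONED BY (customer_city STRING)             -- partition column sits outside the main column list
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'; -- Hive's default delimiter if none was given

SHOW CREATE TABLE customer2;  -- confirms the actual row format of the existing table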

Contents of the options file options_file.txt:

 [cloudera@quickstart SQOOP]$ cat options_file.txt 
  #############################################
 --connect
 jdbc:mysql://localhost:3306/retail_db

 --username
 root

 --password-file
 /PRACTICALS/SQOOP/password
 ############################################
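
For completeness, the password file referenced by --password-file was created on HDFS along these lines (a sketch; "cloudera" is a placeholder for the real password, and echo -n matters because a trailing newline would otherwise become part of the password):

echo -n "cloudera" | hdfs dfs -put - /PRACTICALS/SQOOP/password
hdfs dfs -chmod 400 /PRACTICALS/SQOOP/password   # Sqoop expects the file to be readable only by its owner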

SQOOP command execution log

    18/12/09 06:59:38 INFO mapreduce.Job: Running job: job_1542647782962_0231
    18/12/09 06:59:49 INFO mapreduce.Job: Job job_1542647782962_0231 running in uber mode : false
    18/12/09 06:59:49 INFO mapreduce.Job:  map 0% reduce 0%
    18/12/09 07:00:06 INFO mapreduce.Job:  map 50% reduce 0%
    18/12/09 07:00:07 INFO mapreduce.Job:  map 100% reduce 0%
    18/12/09 07:00:07 INFO mapreduce.Job: Job job_1542647782962_0231 completed successfully
    18/12/09 07:00:07 INFO mapreduce.Job: Counters: 30
                File System Counters
                        FILE: Number of bytes read=0
                        FILE: Number of bytes written=311480
                        FILE: Number of read operations=0
                        FILE: Number of large read operations=0
                        FILE: Number of write operations=0
                        HDFS: Number of bytes read=253
                        HDFS: Number of bytes written=220
                        HDFS: Number of read operations=8
                        HDFS: Number of large read operations=0
                        HDFS: Number of write operations=4
                  Job Counters 
                        Launched map tasks=2
                        Other local map tasks=2
                        Total time spent by all maps in occupied slots (ms)=29025
                        Total time spent by all reduces in occupied slots (ms)=0
                        Total time spent by all map tasks (ms)=29025
                        Total vcore-milliseconds taken by all map tasks=29025
                        Total megabyte-milliseconds taken by all map tasks=29721600
             Map-Reduce Framework
                        Map input records=6
                        Map output records=6
                        Input split bytes=253
                        Spilled Records=0
                        Failed Shuffles=0
                        Merged Map outputs=0
                        GC time elapsed (ms)=417
                        CPU time spent (ms)=2390
                        Physical memory (bytes) snapshot=276865024
                        Virtual memory (bytes) snapshot=3020857344
                        Total committed heap usage (bytes)=121765888
                  File Input Format Counters 
                        Bytes Read=0
                  File Output Format Counters 
                        Bytes Written=220
    18/12/09 07:00:07 INFO mapreduce.ImportJobBase: Transferred 220 bytes in 33.0333 seconds (6.66 bytes/sec)
    18/12/09 07:00:07 INFO mapreduce.ImportJobBase: Retrieved 6 records.
    18/12/09 07:00:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customer2` AS t LIMIT 1
    18/12/09 07:00:07 INFO hive.HiveImport: Loading uploaded data into Hive

    Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.1.0-cdh5.12.0.jar!/hive-log4j.properties
    OK
    Time taken: 3.016 seconds
    Loading data to table sumitpawar.customer2 partition (customer_city=CA)
    Partition sumitpawar.customer2{customer_city=CA} stats: [numFiles=2, numRows=0, totalSize=440, rawDataSize=0]
    OK
    Time taken: 1.056 seconds
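
The job itself finishes cleanly and moves 6 records, so a useful next step is to inspect the raw bytes that actually landed in the partition directory (a diagnostic sketch; the warehouse path below assumes the default CDH layout for the sumitpawar database):

hdfs dfs -ls /user/hive/warehouse/sumitpawar.db/customer2/customer_city=CA
# cat -v makes control characters visible: Hive's default field delimiter shows up as ^A,
# while commas would mean Sqoop wrote its own default delimiter instead
hdfs dfs -cat /user/hive/warehouse/sumitpawar.db/customer2/customer_city=CA/part-m-00000 | cat -v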

------------------------------------------------------------------------------------------------

However, after querying the table in HIVE, I can see NULL values for all columns except customer_city.

Output from the HIVE table:

hive (sumitpawar)> select * from customer2;
OK
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
| customer_id | customer_fname | customer_lname | customer_email | customer_password | customer_street | customer_city | customer_state | customer_zipcode |
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
|        NULL | NULL           | NULL           | NULL           | NULL              | NULL            | NULL          | NULL           | NULL             |
+-------------+----------------+----------------+----------------+-------------------+-----------------+---------------+----------------+------------------+
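
My suspicion is a delimiter mismatch between what Sqoop wrote and what the table expects. If that is confirmed, the usual suggestion is to make Sqoop's field terminator explicit so it matches the table's ROW FORMAT, and to filter the source rows when loading a single static partition, since --hive-partition-value 'CA' puts every imported row into the CA partition regardless of its real customer_city. A hedged sketch, assuming the table uses Hive's default \001 delimiter:

sqoop import \
--options-file ./options_file.txt \
--table customer2 \
--columns 'customer_id,customer_fname,customer_lname,customer_email,customer_password,customer_zipcode' \
--where "customer_city = 'CA'" \
--hive-database sumitpawar \
--hive-import \
--fields-terminated-by '\001' \
--lines-terminated-by '\n' \
--null-string 'Empty' \
--null-non-string 0 \
-m 2 \
--hive-partition-key customer_city \
--hive-partition-value 'CA'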

Could someone let me know if something is wrong in the above, and how data can be loaded into partitioned HIVE tables with SQOOP on a per-partition basis?

Regards, Sumit
