Apache Nutch updatehostdb job does not work
September 24, 2019

I have Nutch 2.3.1 set up with Hadoop 2.7.9 and HBase 0.98. I have crawled some content from the web. When I run the following command, I get an exception:

bin/nutch updatehostdb -crawlId myTable

Exception

2019-09-24 17:30:39,678 ERROR store.HBaseStore - org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family f does not exist in region a_host,,1569328238519.2e906d55df83a6d7aaa9743fd07bcd4d. in table 'a_host', {NAME => 'il', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}, {NAME => 'mtdt', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}, {NAME => 'ol', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
        at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:5555)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1935)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1915)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3171)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29941)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:110)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:90)
        at java.lang.Thread.run(Thread.java:748)

Update

I compiled Nutch with Gora 0.6.1. The following errors are logged at the start of the job:

2019-09-25 10:06:34,937 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'a_webpage'Assuming they are the same.
2019-09-25 10:06:35,213 ERROR store.HBaseStore - KeyClass in gora-hbase-mapping is not the same as the one in the databean.
2019-09-25 10:06:35,213 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'host'. PersistentClass schema's name: 'a_host'Assuming they are the same.
2019-09-25 10:06:35,372 WARN  store.HBaseStore - Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: 'a_host'Assuming they are the same.
...