Flume does not write Twitter data to the /tmp/xx folder
June 25, 2018

I am loading Twitter data into an HDFS folder using Flume. The flume-ng command runs successfully and prints the following messages:

18/06/24 22:52:33 INFO twitter.TwitterSource: Processed 17,500 docs
18/06/24 22:52:37 INFO twitter.TwitterSource: Processed 17,600 docs
18/06/24 22:52:39 INFO hdfs.BucketWriter: Closing hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp
18/06/24 22:52:39 INFO hdfs.BucketWriter: Renaming hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp to hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675
18/06/24 22:52:39 INFO hdfs.HDFSEventSink: Writer callback called.
18/06/24 22:52:40 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/06/24 22:52:40 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/tmp/pk/FlumeData.1529905960074.tmp
18/06/24 22:52:40 INFO twitter.TwitterSource: Processed 17,700 docs
18/06/24 22:52:44 INFO twitter.TwitterSource: Processed 17,800 docs
18/06/24 22:52:47 INFO twitter.TwitterSource: Processed 17,900 docs
18/06/24 22:52:51 INFO twitter.TwitterSource: Processed 18,000 docs
18/06/24 22:52:51 INFO twitter.TwitterSource: Total docs indexed: 18,000, total skipped docs: 0
18/06/24 22:52:51 INFO twitter.TwitterSource:     29 docs/second
18/06/24 22:52:51 INFO twitter.TwitterSource: Run took 618 seconds and processed:
18/06/24 22:52:51 INFO twitter.TwitterSource:     0.008 MB/sec sent to index
18/06/24 22:52:51 INFO twitter.TwitterSource:     4.859 MB text sent to index
18/06/24 22:52:51 INFO twitter.TwitterSource: There were 0 exceptions ignored: 
18/06/24 22:52:54 INFO twitter.TwitterSource: Processed 18,100 docs
18/06/24 22:52:57 INFO twitter.TwitterSource: Processed 18,200 docs
18/06/24 22:53:00 INFO twitter.TwitterSource: Processed 18,300 docs
18/06/24 22:53:04 INFO twitter.TwitterSource: Processed 18,400 docs
18/06/24 22:53:07 INFO twitter.TwitterSource: Processed 18,500 docs
18/06/24 22:53:10 INFO twitter.TwitterSource: Processed 18,600 docs
18/06/24 22:53:14 INFO twitter.TwitterSource: Processed 18,700 docs
18/06/24 22:53:17 INFO twitter.TwitterSource: Processed 18,800 docs
18/06/24 22:53:21 INFO twitter.TwitterSource: Processed 18,900 docs
18/06/24 22:53:24 INFO twitter.TwitterSource: Processed 19,000 docs
18/06/24 22:53:24 INFO twitter.TwitterSource: Total docs indexed: 19,000, total skipped docs: 0
18/06/24 22:53:24 INFO twitter.TwitterSource:     29 docs/second

But no file shows up in the HDFS output folder, and no exception is thrown either.

Can anyone please help me with this?
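The log does report a file being renamed from its `.tmp` name to a final name, so one way to narrow this down is to pull that final path out of the log and list exactly that path in HDFS. The following is only a rough sketch: the helper names `finalized_paths` and `list_in_hdfs` are made up for illustration, and it assumes the `hdfs` command-line client is on the PATH of the machine where it runs.

```python
import re
import shutil
import subprocess

# Two lines copied from the log above.
LOG = """\
18/06/24 22:52:39 INFO hdfs.BucketWriter: Closing hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp
18/06/24 22:52:39 INFO hdfs.BucketWriter: Renaming hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp to hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675
"""

# Matches the BucketWriter rename message and captures source and destination.
RENAME = re.compile(r"BucketWriter: Renaming (\S+) to (\S+)")


def finalized_paths(log_text):
    """Return the destination path of every 'Renaming X to Y' log line."""
    return [match.group(2) for match in RENAME.finditer(log_text)]


def list_in_hdfs(path):
    """Run `hdfs dfs -ls <path>`; return None if the hdfs CLI is not installed."""
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", path], capture_output=True, text=True
    )
    return result.stdout or result.stderr


if __name__ == "__main__":
    for path in finalized_paths(LOG):
        print(path)
        print(list_in_hdfs(path))
```

Listing the full `hdfs://localhost:8020/...` URI rather than a bare `/tmp/pk` avoids any ambiguity between the local `/tmp` directory and the one in HDFS.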

Below is the conf file:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Use Cloudera Twitter Source;
# place your consumerKey and accessToken details here
# Describing/Configuring the source
#TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey=xxx
TwitterAgent.sources.Twitter.consumerSecret=xxx
TwitterAgent.sources.Twitter.accessToken=xxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxx
TwitterAgent.sources.Twitter.maxBatchSize = 1000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 1000
TwitterAgent.sources.Twitter.keywords=harry kane
# Use a channel which buffers events in memory
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=100
TwitterAgent.channels.MemChannel.transactionCapacity=100

# Describing/Configuring the sink 
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:8020/tmp/pk
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=100
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=1000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600

# Bind the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
...
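As a sanity check on the roll settings, the two `FlumeData.<number>` suffixes from the log can be compared with `hdfs.rollInterval`. This is a small sketch that assumes the numeric suffix is a creation time in epoch milliseconds; that is an assumption based on how the numbers look, not something the log states.

```python
# File-name suffixes copied from the log above.
# Treating them as epoch milliseconds is an assumption.
FIRST_FILE_MS = 1529905355675
SECOND_FILE_MS = 1529905960074

# hdfs.rollInterval from the configuration above, in seconds.
ROLL_INTERVAL_S = 600


def gap_seconds(first_ms, second_ms):
    """Time between two file-name suffixes, converted from ms to seconds."""
    return (second_ms - first_ms) / 1000.0


gap = gap_seconds(FIRST_FILE_MS, SECOND_FILE_MS)
print(f"gap between files: {gap:.3f} s, rollInterval: {ROLL_INTERVAL_S} s")
```

Under that assumption the gap is a little over 600 seconds, which would be consistent with the sink closing one file per `rollInterval` and then opening the next one in `/tmp/pk`.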