Error while streaming data from Twitter using Apache Flume
June 13, 2019

I am trying to stream Twitter feeds into HDFS and then query them with Hive. But the first part, streaming the data and loading it into HDFS, does not work and gives a ParserException.

  1. I downloaded apache-flume-1.9.0-bin. All of its contents are in /usr/lib/.

  2. I moved into the conf/ directory, copied flume-env.sh.template to flume-env.sh, and set JAVA_HOME to my Java path, /usr/lib/jvm/java-1.8.0-openjdk-amd64.

  3. I created twitter.conf as follows:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel

TwitterAgent.sources.Twitter.consumerKey = BOIDeKDOluUI6D2cN0GkdaM3Z
TwitterAgent.sources.Twitter.consumerSecret = hVDpBQbFOWk6sYGSLAhRrmVUT9mN3LCGONosx4nIqiiGeGfoMZ
TwitterAgent.sources.Twitter.accessToken = 1137972756828499968-xGItReLqUsQy0aIcP4aCHVC3HGuGtY
TwitterAgent.sources.Twitter.accessTokenSecret = rAb1IVakREloLFpPhGSBTp8tlfTutZnPWxridanQ2gdW1

TwitterAgent.sources.Twitter.keywords = hadoop, bigdata, cricket, worlcup

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600

TwitterAgent.channels.Memchannel.type = memory
TwitterAgent.channels.Memchannel.capacity = 10000
TwitterAgent.channels.Memchannel.transactionCapacity = 1000

TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sinks.HDFS.channel=MemChannel

I ran the following command: bin/flume-ng agent --conf conf --conf-file twitter.conf --name TwitterAgent -Dflume.root.logger=INFO,console

I get the following error:

Info: Sourcing environment configuration script /usr/lib/apache-flume-1.9.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/vaishali/hadoop-2.7.3/bin/hadoop) for HDFS access
Info: Including Hive libraries found via (/home/vaishali/apache-hive-2.1.0-bin) for Hive access
+ exec /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx20m -cp '/usr/lib/apache-flume-1.9.0-bin/conf:/usr/lib/apache-flume-1.9.0-bin/lib/*:/home/vaishali/hadoop-2.7.3/etc/hadoop:/home/vaishali/hadoop-2.7.3/share/hadoop/common/lib/*:/home/vaishali/hadoop-2.7.3/share/hadoop/common/*:/home/vaishali/hadoop-2.7.3/share/hadoop/hdfs:/home/vaishali/hadoop-2.7.3/share/hadoop/hdfs/lib/*:/home/vaishali/hadoop-2.7.3/share/hadoop/hdfs/*:/home/vaishali/hadoop-2.7.3/share/hadoop/yarn/lib/*:/home/vaishali/hadoop-2.7.3/share/hadoop/yarn/*:/home/vaishali/hadoop-2.7.3/share/hadoop/mapreduce/lib/*:/home/vaishali/hadoop-2.7.3/share/hadoop/mapreduce/*:/home/vaishali/hadoop-2.7.3/contrib/capacity-scheduler/*.jar:/home/vaishali/apache-hive-2.1.0-bin/lib/*' -Djava.library.path=:/home/vaishali/hadoop-2.7.3/lib/native org.apache.flume.node.Application -n TwitterAgent -f conf/twitter.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/apache-flume-1.9.0-bin/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/vaishali/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/vaishali/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-06-15 15:45:41,606 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL keystore path specified.
2019-06-15 15:45:41,614 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL keystore password specified.
2019-06-15 15:45:41,615 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL keystore type specified.
2019-06-15 15:45:41,615 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL truststore path specified.
2019-06-15 15:45:41,620 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL truststore password specified.
2019-06-15 15:45:41,620 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL truststore type specified.
2019-06-15 15:45:41,620 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL include protocols specified.
2019-06-15 15:45:41,620 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL exclude protocols specified.
2019-06-15 15:45:41,620 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL include cipher suites specified.
2019-06-15 15:45:41,620 (main) [DEBUG - org.apache.flume.util.SSLUtil.initSysPropFromEnvVar(SSLUtil.java:95)] No global SSL exclude cipher suites specified.
2019-06-15 15:45:42,199 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2019-06-15 15:45:42,202 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:79)] Configuration provider started
2019-06-15 15:45:42,206 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:conf/twitter.conf for changes
2019-06-15 15:45:42,207 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:138)] Reloading configuration file:conf/twitter.conf
2019-06-15 15:45:42,244 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,245 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1207)] Created context for HDFS: hdfs.rollInterval
2019-06-15 15:45:42,249 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Twitter
2019-06-15 15:45:42,250 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1207)] Created context for Twitter: accessToken
2019-06-15 15:45:42,250 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Twitter
2019-06-15 15:45:42,250 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,250 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,251 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Memchannel
2019-06-15 15:45:42,251 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1207)] Created context for Memchannel: type
2019-06-15 15:45:42,251 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Memchannel
2019-06-15 15:45:42,251 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Memchannel
2019-06-15 15:45:42,251 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,252 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,252 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,252 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1117)] Added sinks: HDFS Agent: TwitterAgent
2019-06-15 15:45:42,252 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Twitter
2019-06-15 15:45:42,253 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Twitter
2019-06-15 15:45:42,253 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,253 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,253 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Twitter
2019-06-15 15:45:42,253 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Twitter
2019-06-15 15:45:42,253 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:Twitter
2019-06-15 15:45:42,254 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:HDFS
2019-06-15 15:45:42,254 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:350)] Starting validation of configuration for agent: TwitterAgent
2019-06-15 15:45:42,255 (conf-file-poller-0) [INFO - org.apache.flume.conf.LogPrivacyUtil.<clinit>(LogPrivacyUtil.java:51)] Logging of configuration details is disabled. To see configuration details in the log run the agent with -Dorg.apache.flume.log.printconfig=true JVM argument. Please note that this is not recommended in production systems as it may leak private information to the logfile.
2019-06-15 15:45:42,255 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateConfigFilterSet(FlumeConfiguration.java:623)] Agent configuration for 'TwitterAgent' has no configfilters.
2019-06-15 15:45:42,287 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:373)] Agent configuration for 'TwitterAgent' does not contain any valid channels. Marking it as invalid.
2019-06-15 15:45:42,287 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:154)] Agent configuration invalid for agent 'TwitterAgent'. It will be removed.
2019-06-15 15:45:42,288 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:158)] Channels:MemChannel

2019-06-15 15:45:42,288 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:159)] Sinks HDFS

2019-06-15 15:45:42,288 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:160)] Sources Twitter

2019-06-15 15:45:42,288 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:163)] Post-validation flume configuration contains configuration for agents: []
2019-06-15 15:45:42,288 (conf-file-poller-0) [WARN - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:139)] No configuration found for this host:TwitterAgent
2019-06-15 15:45:42,340 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:162)] Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }
2019-06-15 15:46:12,351 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:conf/twitter.conf for changes
2019-06-15 15:46:42,352 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:conf/twitter.conf for changes
2019-06-15 15:47:12,353 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:conf/twitter.conf for changes
2019-06-15 15:47:42,354 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:conf/twitter.conf for changes
2019-06-15 15:48:12,354 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:conf/twitter.conf for changes
2019-06-15 15:48:42,355 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:conf/twitter.conf for changes
2019-06-15 15:49:12,356 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:131)] Checking file:conf/twitter.conf for changes

Why does it keep checking for changes over and over?

And why does it say there is no configuration for TwitterAgent?
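
The warning "does not contain any valid channels" in the log suggests that the channel declared in TwitterAgent.channels never received a usable definition. One thing worth checking is whether the channel name is spelled identically everywhere: twitter.conf declares MemChannel but sets the type, capacity and transactionCapacity under Memchannel. Below is a minimal sketch of such a check, assuming component names are compared case-sensitively. The function check_channel_names and the shortened sample string are hypothetical helpers written for illustration; they are not part of Flume.

```python
def check_channel_names(conf_text, agent):
    """Compare the channel names an agent declares with the names
    that are actually configured or referenced in the properties text."""
    declared = set()    # names listed in <agent>.channels
    configured = set()  # names used in <agent>.channels.<name>.<property>
    referenced = set()  # names used by sources and sinks
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        parts = key.split(".")
        if parts[0] != agent:
            continue
        if parts[1:] == ["channels"]:
            declared.update(value.split())
        elif len(parts) >= 4 and parts[1] == "channels":
            configured.add(parts[2])
        elif len(parts) == 4 and parts[1] == "sources" and parts[3] == "channels":
            referenced.update(value.split())
        elif len(parts) == 4 and parts[1] == "sinks" and parts[3] == "channel":
            referenced.add(value)
    return {
        "declared_but_not_configured": sorted(declared - configured),
        "configured_but_not_declared": sorted(configured - declared),
        "referenced_but_not_declared": sorted(referenced - declared),
    }


# Shortened excerpt of the twitter.conf shown in the question.
sample = """
TwitterAgent.channels = MemChannel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.channels.Memchannel.type = memory
"""

report = check_channel_names(sample, "TwitterAgent")
for problem, names in report.items():
    print(problem, names)
```

If the first two lists are non-empty and differ only in letter case, as they do for this excerpt, making the spelling consistent in the TwitterAgent.channels.* lines is a reasonable first thing to try.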

...