Flume is not writing Twitter data to the /tmp/xx folder

tv6aics1 posted on 2021-05-29 in Hadoop

I am using Flume to load Twitter data into an HDFS location. The flume-ng command runs successfully and shows the following messages:

18/06/24 22:52:33 INFO twitter.TwitterSource: Processed 17,500 docs
18/06/24 22:52:37 INFO twitter.TwitterSource: Processed 17,600 docs
18/06/24 22:52:39 INFO hdfs.BucketWriter: Closing hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp
18/06/24 22:52:39 INFO hdfs.BucketWriter: Renaming hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp to hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675
18/06/24 22:52:39 INFO hdfs.HDFSEventSink: Writer callback called.
18/06/24 22:52:40 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/06/24 22:52:40 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/tmp/pk/FlumeData.1529905960074.tmp
18/06/24 22:52:40 INFO twitter.TwitterSource: Processed 17,700 docs
18/06/24 22:52:44 INFO twitter.TwitterSource: Processed 17,800 docs
18/06/24 22:52:47 INFO twitter.TwitterSource: Processed 17,900 docs
18/06/24 22:52:51 INFO twitter.TwitterSource: Processed 18,000 docs
18/06/24 22:52:51 INFO twitter.TwitterSource: Total docs indexed: 18,000, total skipped docs: 0
18/06/24 22:52:51 INFO twitter.TwitterSource:     29 docs/second
18/06/24 22:52:51 INFO twitter.TwitterSource: Run took 618 seconds and processed:
18/06/24 22:52:51 INFO twitter.TwitterSource:     0.008 MB/sec sent to index
18/06/24 22:52:51 INFO twitter.TwitterSource:     4.859 MB text sent to index
18/06/24 22:52:51 INFO twitter.TwitterSource: There were 0 exceptions ignored: 
18/06/24 22:52:54 INFO twitter.TwitterSource: Processed 18,100 docs
18/06/24 22:52:57 INFO twitter.TwitterSource: Processed 18,200 docs
18/06/24 22:53:00 INFO twitter.TwitterSource: Processed 18,300 docs
18/06/24 22:53:04 INFO twitter.TwitterSource: Processed 18,400 docs
18/06/24 22:53:07 INFO twitter.TwitterSource: Processed 18,500 docs
18/06/24 22:53:10 INFO twitter.TwitterSource: Processed 18,600 docs
18/06/24 22:53:14 INFO twitter.TwitterSource: Processed 18,700 docs
18/06/24 22:53:17 INFO twitter.TwitterSource: Processed 18,800 docs
18/06/24 22:53:21 INFO twitter.TwitterSource: Processed 18,900 docs
18/06/24 22:53:24 INFO twitter.TwitterSource: Processed 19,000 docs
18/06/24 22:53:24 INFO twitter.TwitterSource: Total docs indexed: 19,000, total skipped docs: 0
18/06/24 22:53:24 INFO twitter.TwitterSource:     29 docs/second
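
The hdfs.BucketWriter lines in this log record where the HDFS sink actually wrote: each "Renaming ... .tmp to ..." entry marks a file that was closed and given its final name. As a minimal sketch (the class `FlumeLogPaths` and its method are hypothetical, and it assumes the log line format shown here), those finalized paths can be pulled out of the log text with a regular expression:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Flume): extracts final HDFS file paths from
// log lines of the form "... hdfs.BucketWriter: Renaming <tmp path> to <final path>".
public class FlumeLogPaths {
    private static final Pattern RENAME =
        Pattern.compile("hdfs\\.BucketWriter: Renaming (\\S+) to (\\S+)");

    // Returns the renamed (finalized) paths in the order they appear in the log.
    public static List<String> finalizedPaths(List<String> logLines) {
        List<String> paths = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = RENAME.matcher(line);
            if (m.find()) {
                paths.add(m.group(2));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // Three BucketWriter lines copied from the log in the question.
        List<String> log = List.of(
            "18/06/24 22:52:39 INFO hdfs.BucketWriter: Closing hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp",
            "18/06/24 22:52:39 INFO hdfs.BucketWriter: Renaming hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp to hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675",
            "18/06/24 22:52:40 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/tmp/pk/FlumeData.1529905960074.tmp");
        for (String p : finalizedPaths(log)) {
            System.out.println(p);
        }
    }
}
```

Run on the lines embedded in `main`, it prints `hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675`. These are HDFS URIs on the NameNode at `localhost:8020`, so such files are visible through an HDFS client (for example `hdfs dfs -ls /tmp/pk`), not through the local filesystem.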

However, no files are generated in the output HDFS folder, and no exception is thrown.
Can anyone help me?
Below is the conf file:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Use Cloudera Twitter Source

# place your consumerKey and accessToken details here

# Describing/Configuring the source

# TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey=xxx
TwitterAgent.sources.Twitter.consumerSecret=xxx
TwitterAgent.sources.Twitter.accessToken=xxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxx
TwitterAgent.sources.Twitter.maxBatchSize = 1000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 1000
TwitterAgent.sources.Twitter.keywords=harry kane

# Use a channel which buffers events in memory

TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=100
TwitterAgent.channels.MemChannel.transactionCapacity=100

# Describing/Configuring the sink

TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:8020/tmp/pk
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=100
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=1000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600

# Bind the source and sink to the channel

TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
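
In Flume, a memory channel's transactionCapacity may not exceed its capacity, and the HDFS sink takes up to hdfs.batchSize events inside a single channel transaction, so that batch size should not exceed transactionCapacity. A minimal sketch of checking these relationships (plus the sink-to-channel binding) in a conf file like the one above, using only `java.util.Properties`; the class `FlumeConfCheck` is hypothetical and is not Flume's own configuration validator:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sanity check for a Flume agent properties file (not Flume's validator).
// It checks two capacity rules and which channel the sink is bound to.
public class FlumeConfCheck {

    // Reads an integer property, failing loudly if it is missing.
    static int intProp(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return Integer.parseInt(v.trim());
    }

    // Returns human-readable problems; an empty list means all checks passed.
    public static List<String> check(Properties p, String agent, String channel, String sink) {
        List<String> problems = new ArrayList<>();
        String ch = agent + ".channels." + channel + ".";
        String sk = agent + ".sinks." + sink + ".";
        int capacity = intProp(p, ch + "capacity");
        int txCapacity = intProp(p, ch + "transactionCapacity");
        int sinkBatch = intProp(p, sk + "hdfs.batchSize");
        if (txCapacity > capacity) {
            problems.add("transactionCapacity " + txCapacity + " > capacity " + capacity);
        }
        if (sinkBatch > txCapacity) {
            problems.add("hdfs.batchSize " + sinkBatch + " > transactionCapacity " + txCapacity);
        }
        String bound = p.getProperty(sk + "channel", "").trim();
        if (!bound.equals(channel)) {
            problems.add("sink " + sink + " is bound to '" + bound + "', expected " + channel);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Excerpt of the relevant lines from the conf file in the question.
        String conf = String.join("\n",
            "TwitterAgent.channels.MemChannel.type=memory",
            "TwitterAgent.channels.MemChannel.capacity=100",
            "TwitterAgent.channels.MemChannel.transactionCapacity=100",
            "TwitterAgent.sinks.HDFS.channel=MemChannel",
            "TwitterAgent.sinks.HDFS.hdfs.batchSize=100");
        Properties p = new Properties();
        p.load(new StringReader(conf));
        List<String> problems = check(p, "TwitterAgent", "MemChannel", "HDFS");
        System.out.println(problems.isEmpty() ? "OK" : String.join("\n", problems));
    }
}
```

With the values in this conf (capacity 100, transactionCapacity 100, hdfs.batchSize 100, sink bound to MemChannel) all checks pass and the program prints `OK`. `Properties` was chosen because Flume agent files use the same key=value syntax, so the whole file could also be loaded with a `FileReader` instead of the embedded excerpt.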
