I am using the Cloudera VM. Here is my file structure:
[cloudera@quickstart pydoop]$ hdfs dfs -ls -R /input
drwxr-xr-x - cloudera supergroup 0 2015-10-02 15:00 /input/test1
-rw-r--r-- 1 cloudera supergroup 62 2015-10-02 15:00 /input/test1/file1.txt
drwxr-xr-x - cloudera supergroup 0 2015-10-02 14:59 /input/test2
-rw-r--r-- 1 cloudera supergroup 1428841 2015-10-02 14:59 /input/test2/5000-8.txt
-rw-r--r-- 1 cloudera supergroup 674570 2015-10-02 14:59 /input/test2/pg20417.txt
-rw-r--r-- 1 cloudera supergroup 1573151 2015-10-02 14:59 /input/test2/pg4300.txt
Here is the command I use to run the wordcount example:
python /home/cloudera/MapReduceCode/mrjob/wordcount1.py -r hadoop hdfs://input/test1/file1.txt
It crashes with the following output. It looks like the file cannot be found.
[cloudera@quickstart hadoop]$ python /home/cloudera/MapReduceCode/mrjob/wordcount1.py -r hadoop hdfs://input/test1/file1.txt
no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
Traceback (most recent call last):
File "/home/cloudera/MapReduceCode/mrjob/wordcount1.py", line 13, in <module>
MRWordCount.run()
File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 461, in run
mr_job.execute()
File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 479, in execute
super(MRJob, self).execute()
File "/usr/local/lib/python2.7/site-packages/mrjob/launch.py", line 153, in execute
self.run_job()
File "/usr/local/lib/python2.7/site-packages/mrjob/launch.py", line 216, in run_job
runner.run()
File "/usr/local/lib/python2.7/site-packages/mrjob/runner.py", line 470, in run
self._run()
File "/usr/local/lib/python2.7/site-packages/mrjob/hadoop.py", line 233, in _run
self._check_input_exists()
File "/usr/local/lib/python2.7/site-packages/mrjob/hadoop.py", line 247, in _check_input_exists
if not self.path_exists(path):
File "/usr/local/lib/python2.7/site-packages/mrjob/fs/composite.py", line 78, in path_exists
return self._do_action('path_exists', path_glob)
File "/usr/local/lib/python2.7/site-packages/mrjob/fs/composite.py", line 54, in _do_action
return getattr(fs, action)(path, *args,**kwargs)
File "/usr/local/lib/python2.7/site-packages/mrjob/fs/hadoop.py", line 212, in path_exists
ok_stderr=[_HADOOP_LS_NO_SUCH_FILE])
File "/usr/local/lib/python2.7/site-packages/mrjob/fs/hadoop.py", line 86, in invoke_hadoop
proc = Popen(args, stdout=PIPE, stderr=PIPE)
File "/usr/local/lib/python2.7/subprocess.py", line 709, in __init__
errread, errwrite)
File "/usr/local/lib/python2.7/subprocess.py", line 1326, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
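The last frames of the traceback are inside subprocess.Popen, which suggests that the program being launched (the hadoop executable) could not be found, rather than the HDFS input file. A minimal sketch, written for Python 3 and using a deliberately hypothetical program name, shows that Popen reports a missing executable with the same errno:

```python
import errno
import subprocess


def launch_errno(args):
    """Return the errno raised when the program cannot be started,
    or None if the program was started successfully."""
    try:
        proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        # Raised before the child runs, e.g. when args[0] is not on PATH.
        return exc.errno
    proc.communicate()
    return None


if __name__ == "__main__":
    # "hypothetical-missing-hadoop-binary" is a made-up name used only
    # to trigger the error; it is not a real Hadoop command.
    code = launch_errno(["hypothetical-missing-hadoop-binary", "fs", "-ls", "/input"])
    print(code == errno.ENOENT)
```

errno.ENOENT is error number 2, "No such file or directory", matching the last line of the traceback.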
1 Answer
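Follow these steps to get it working on the Cloudera Quickstart VM.
1. Make sure HADOOP_HOME is set:
export HADOOP_HOME=/usr/lib/hadoop
2. Create a symlink to hadoop-streaming.jar:
sudo ln -s /usr/lib/hadoop-mapreduce/hadoop-streaming.jar /usr/lib/hadoop
3. Use hdfs:/// instead of hdfs://. With only two slashes, "input" is read as the NameNode host rather than as the first directory of the path:
python /home/cloudera/MapReduceCode/mrjob/wordcount1.py -r hadoop hdfs:///input/test1/file1.txt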
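To see why the number of slashes matters, the two URIs can be split with Python's standard URL parser. This is only an illustrative sketch for Python 3; the helper name describe_hdfs_uri is made up for this example and is not part of mrjob or Hadoop.

```python
from urllib.parse import urlparse


def describe_hdfs_uri(uri):
    """Split a URI into scheme, authority (host[:port]) and path."""
    parts = urlparse(uri)
    return {"scheme": parts.scheme, "authority": parts.netloc, "path": parts.path}


if __name__ == "__main__":
    # Two slashes: the first segment after "//" is taken as the authority.
    print(describe_hdfs_uri("hdfs://input/test1/file1.txt"))
    # Three slashes: empty authority, so the whole remainder is the path.
    print(describe_hdfs_uri("hdfs:///input/test1/file1.txt"))
```

With two slashes the authority is "input" and the path is "/test1/file1.txt". With three slashes the authority is empty and the path is "/input/test1/file1.txt", which is the location shown in the hdfs dfs -ls listing.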
Here is the complete mrjob output from my Cloudera Quickstart VM.
Note: the locations of wordcount1.py and file1.txt differ from yours, but that does not matter.