Error when running mrjob on a Hadoop cluster

igsr9ssn posted on 2021-06-04 in Hadoop

I am trying to run a Python job with mrjob on a Hadoop cluster. My wrapper script looks like this:


#!/bin/bash

. /etc/profile
module load use.own
module load python/python2.7
module load python/mrjob

python path_to_python-script/mr_word_freq_count.py path_to_input_file/input.txt -r hadoop > path_to_output_file/output.txt  # note: the output file already exists before I submit the job

After I submit this script to the cluster with qsub myscript.sh, I get two files: an output file and an error file.
The error file contains the following:

no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
Traceback (most recent call last):
  File "homefolder/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module>
    MRWordFreqCount.run()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 206, in run_job
    with self.make_runner() as runner:
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 541, in make_runner
    return super(MRJob, self).make_runner()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 164, in make_runner
    return HadoopJobRunner(**self.hadoop_job_runner_kwargs())
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 179, in __init__
    super(HadoopJobRunner, self).__init__(**kwargs)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/runner.py", line 352, in __init__
    self._opts = self.OPTION_STORE_CLASS(self.alias, opts, conf_paths)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 132, in __init__
    'you must set $HADOOP_HOME, or pass in hadoop_home explicitly')
Exception: you must set $HADOOP_HOME, or pass in hadoop_home explicitly

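First question: how do I find $HADOOP_HOME? When I echo $HADOOP_HOME, nothing is printed, which means it is not set. If I have to set it myself, what path should it point to? Should it be the path to the Hadoop name node in the cluster?
Second question: what does the "no configs found" message mean? Is it related to $HADOOP_HOME not being set, or does mrjob expect some other config file to be passed in explicitly?
Any help would be much appreciated.
Thanks in advance!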

kkih6yb8 #1

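First, $HADOOP_HOME should be set to the path of the local Hadoop installation on the machine. Almost all Hadoop applications assume that $HADOOP_HOME/bin/hadoop is the hadoop executable. So if Hadoop is installed in the system default path, you should export HADOOP_HOME=/usr/ ; otherwise you should export HADOOP_HOME=/path/to/hadoop

Second, you can give mrjob a specific configuration; if you do not, mrjob falls back on auto-configuration, which is what the "no configs found" message reports. In most cases, setting HADOOP_HOME and relying on auto-configuration is fine. Advanced users can refer to http://pythonhosted.org/mrjob/guides/configs-basics.html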
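
Since the answer says Hadoop tools expect $HADOOP_HOME/bin/hadoop to be the executable, a candidate value for HADOOP_HOME can be derived by locating hadoop on PATH and going up two directories. The following is only a rough sketch under stated assumptions: find_on_path and guess_hadoop_home are hypothetical helper names (not part of mrjob or Hadoop), and it assumes the hadoop command is on PATH on the node where the job is launched and is not a symlink into a different directory.

```python
import os


def find_on_path(name, path=None):
    """Return the first executable file called `name` found on PATH, or None.

    Hypothetical helper; written with os only so it also runs on Python 2.7.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def guess_hadoop_home(hadoop_executable):
    """Map <home>/bin/hadoop to <home>, following the answer's convention.

    Symlinks are not resolved here; that is a simplifying assumption.
    """
    bin_dir = os.path.dirname(os.path.abspath(hadoop_executable))
    return os.path.dirname(bin_dir)


if __name__ == "__main__":
    hadoop = find_on_path("hadoop")
    if hadoop is None:
        print("hadoop not found on PATH; ask the cluster admin for the install path")
    else:
        print("candidate HADOOP_HOME: " + guess_hadoop_home(hadoop))
```

If a candidate is printed, it can be exported in the wrapper script before the python line, in the same form the answer uses (export HADOOP_HOME=/path/to/hadoop).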
