Error when integrating NLTK with Hadoop

8zzbczxx · posted 2021-05-30 in Hadoop

I am trying to integrate NLTK with Hadoop Streaming. Basically, I want to POS-tag the words in my input. I followed the approach described here: http://blog.cloudera.com/blog/2008/11/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/
However, when I run the MapReduce job I still get the following error:

14/12/09 11:45:53 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201412091132_0004_m_000000
14/12/09 11:45:53 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
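
Following that guide, I zip the nltk and yaml library directories into a single archive (renamed to .mod so Hadoop does not unpack it) and ship it to the task nodes with -file, together with the mapper and reducer scripts. Roughly like this (the streaming jar path, HDFS paths, and script names below are placeholders rather than my exact command):

# package the library directories that the mapper loads via zipimport
zip -r nltkandyaml.zip nltk yaml
mv nltkandyaml.zip nltkandyaml.mod

# submit the streaming job, shipping the scripts and the zipped libraries
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -input /user/me/input \
    -output /user/me/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py \
    -file nltkandyaml.mod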

My mapper program is:


#!/usr/bin/env python

import sys
import zipimport

# load yaml and nltk from the zipped archive shipped to the task node with -file
importer = zipimport.zipimporter('nltkandyaml.mod')
yaml = importer.load_module('yaml')
nltk = importer.load_module('nltk')

# input comes from STDIN (standard input)
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        # pos_tag expects a list of tokens; passing a bare string would tag each character
        tagged = nltk.pos_tag([word])  # returns [(word, tag)]
        tag = tagged[0][1]
        # emit "word/TAG" as the key with a count of 1, so the word-count
        # reducer can sum occurrences of each tagged word
        print '%s/%s\t%s' % (word, tag, 1)

I am using the same reducer as in the standard word count example. I am new to Hadoop, so any help is appreciated.
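
For reference, the word-count reducer I mean follows the usual streaming pattern: read sorted "key\tcount" lines from stdin and sum the count for each key. A minimal sketch (my copy may differ slightly in details):

#!/usr/bin/env python
import sys

current_key = None
current_count = 0

# input arrives from the mapper sorted by key, one "key\tcount" line per record
for line in sys.stdin:
    key, count = line.strip().split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue  # skip malformed lines
    if key == current_key:
        current_count += count
    else:
        if current_key is not None:
            print '%s\t%s' % (current_key, current_count)
        current_key = key
        current_count = count

# flush the last key
if current_key is not None:
    print '%s\t%s' % (current_key, current_count)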

No answers yet.
