我在虚拟机中创建了一个单节点hdfs( hadoop.master
,ip地址: 192.168.12.52
). 文件 etc/hadoop/core-site.xml
名称节点具有以下配置:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master.hadoop:9000/</value>
</property>
</configuration>
我想从本地物理桌面上的hdfs读取一个文件。为此,这是我的代码,我保存在一个名为 hdfs_read.py
:
from hdfs import InsecureClient
client = InsecureClient('http://192.168.12.52:9000')
with client.read('/opt/hadoop/LICENSE.txt') as reader:
features = reader.read()
print(features)
现在,当我运行它时,我得到以下超时错误:
$ python3 hdfs_read.py
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
(self.host, self.port), self.timeout,**extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 91, in create_connection
raise err
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 81, in create_connection
sock.connect(sa)
OSError: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url,**httplib_request_kw)
File "/usr/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f2d88cef2b0>: Failed to establish a new connection: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='192.168.12.52', port=9000): Max retries exceeded with url: /webhdfs/v1/home/edhuser/testdata.txt?user.name=embs&offset=0&op=OPEN (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2d88cef2b0>: Failed to establish a new connection: [Errno 113] No route to host',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "hdfs_read_local.py", line 3, in <module>
with client.read('/home/edhuser/testdata.txt') as reader:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 678, in read
buffersize=buffer_size,
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 118, in api_handler
raise err
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 107, in api_handler
**self.kwargs
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 207, in _request
**kwargs
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
resp = self.send(prep,**send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
r = adapter.send(request,**kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='192.168.12.52', port=9000): Max retries exceeded with url: /webhdfs/v1/home/edhuser/testdata.txt?user.name=embs&offset=0&op=OPEN (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2d88cef2b0>: Failed to establish a new connection: [Errno 113] No route to host',))
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
(self.host, self.port), self.timeout,**extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 91, in create_connection
raise err
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 81, in create_connection
sock.connect(sa)
OSError: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url,**httplib_request_kw)
File "/usr/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f2d88cef2b0>: Failed to establish a new connection: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='192.168.12.52', port=9000): Max retries exceeded with url: /webhdfs/v1/home/edhuser/testdata.txt?user.name=embs&offset=0&op=OPEN (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2d88cef2b0>: Failed to establish a new connection: [Errno 113] No route to host',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "hdfs_read.py", line 3, in <module>
with client.read('/home/edhuser/testdata.txt') as reader:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 678, in read
buffersize=buffer_size,
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 118, in api_handler
raise err
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 107, in api_handler
**self.kwargs
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 207, in _request
**kwargs
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
resp = self.send(prep,**send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
r = adapter.send(request,**kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='192.168.12.52', port=9000): Max retries exceeded with url: /webhdfs/v1/home/edhuser/testdata.txt?user.name=embs&offset=0&op=OPEN (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2d88cef2b0>: Failed to establish a new connection: [Errno 113] No route to host',))
如何修复此连接问题?我用错端口了吗?我认为namenode使用的端口是在 core-site.xml
,我在上面已经说明了 9000
为了港口。无论如何,我已经尝试了所有默认端口 50070
, 8020
, 8048
在hadoop安装文档中提到了各种用途,我还是得到了同样的错误。而不是 client = InsecureClient('http://192.168.12.52:9000')
,我应该用吗 client = InsecureClient('hdfs://192.168.12.52:9000')
,或者 client = InsecureClient('file:///192.168.12.52:9000')
或者类似的?我在不同的时间在其他地方见过这些。
我可以顺便访问web中的hdfs,如下面的屏幕截图所示:
另外,即使连接成功,我想我可能没有给出正确的文件路径( /opt/hadoop/README.txt
). 我给出了这个文件路径,因为这是我在hadoop安装目录中搜索文件和目录列表时看到的 /opt/hadoop
:
$ ls /opt/hadoop/
bin lib read_from_hdfs.py write_to_hdfs_2.py
connect_to_hdfs.py libexec README.txt write_to_hdfs3.py
etc LICENSE.txt sbin write_to_hdfs.py
hdfs_read_write.py logs share
include NOTICE.txt test_storage
但是我知道hdfs是独立的,也许我通过 hdfs dfs -get /test_storage/ ./
之前,这就是为什么它显示这些文件。但是当我搜索namenode路径中的文件时,它会返回一些难以辨认的文件:
$ls /opt/volume/namenode/current/
edits_0000000000000000001-0000000000000000002
edits_0000000000000000003-0000000000000000010
edits_0000000000000000011-0000000000000000012
edits_0000000000000000013-0000000000000000015
edits_0000000000000000016-0000000000000000023
edits_0000000000000000024-0000000000000000025
edits_0000000000000000026-0000000000000000032
edits_0000000000000000033-0000000000000000033
edits_0000000000000000034-0000000000000000035
edits_0000000000000000036-0000000000000000037
edits_0000000000000000038-0000000000000000039
edits_0000000000000000040-0000000000000000041
edits_0000000000000000042-0000000000000000043
edits_0000000000000000044-0000000000000000045
edits_0000000000000000046-0000000000000000047
edits_0000000000000000048-0000000000000000049
edits_0000000000000000050-0000000000000000051
edits_0000000000000000052-0000000000000000053
edits_0000000000000000054-0000000000000000055
edits_0000000000000000056-0000000000000000057
edits_0000000000000000058-0000000000000000059
edits_0000000000000000060-0000000000000000061
edits_0000000000000000062-0000000000000000063
edits_0000000000000000064-0000000000000000065
edits_0000000000000000066-0000000000000000067
edits_0000000000000000068-0000000000000000070
edits_0000000000000000071-0000000000000000072
edits_0000000000000000073-0000000000000000074
edits_0000000000000000075-0000000000000000076
edits_0000000000000000077-0000000000000000078
edits_inprogress_0000000000000000079
fsimage_0000000000000000076
fsimage_0000000000000000076.md5
fsimage_0000000000000000078
fsimage_0000000000000000078.md5
seen_txid
VERSION
那么,如果我指定的文件路径读取错误,正确的文件路径是什么?
编辑:将端口更改为50070(即。, client = InsecureClient('http://192.168.12.52:50070')
),我得到以下错误:
$ python3 hdfs_read_local.py
Traceback (most recent call last):
File "hdfs_read.py", line 3, in <module>
with client.read('/opt/hadoop/LICENSE.txt') as reader:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 678, in read
buffersize=buffer_size,
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 112, in api_handler
raise err
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 107, in api_handler
**self.kwargs
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 210, in _request
_on_error(response)
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 50, in _on_error
raise HdfsError(message, exception=exception)
hdfs.util.HdfsError: File /opt/hadoop/LICENSE.txt not found.
edit2:修改文件路径时 /opt/hadoop/LICENSE.txt
至 /test_storage/LICENSE.txt
,这似乎是正确的hdfs路径,并且运行python脚本,我得到以下错误:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 137, in _new_conn
(self.host, self.port), self.timeout,**extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 91, in create_connection
raise err
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 81, in create_connection
sock.connect(sa)
OSError: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 560, in urlopen
body=body, headers=headers)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
conn.request(method, url,**httplib_request_kw)
File "/usr/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 964, in send
self.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 162, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 146, in _new_conn
self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x7f2e87867400>: Failed to establish a new connection: [Errno 113] No route to host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 376, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 610, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 273, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='pr2.embs', port=50075): Max retries exceeded with url: /webhdfs/v1/test_storage/LICENSE.txt?op=OPEN&user.name=embs&namenoderpcaddress=192.168.12.52:9000&offset=0 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2e87867400>: Failed to establish a new connection: [Errno 113] No route to host',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "hdfs_read_local.py", line 3, in <module>
with client.read('/test_storage/LICENSE.txt') as reader:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 678, in read
buffersize=buffer_size,
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 118, in api_handler
raise err
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 107, in api_handler
**self.kwargs
File "/home/embs/.local/lib/python3.6/site-packages/hdfs/client.py", line 207, in _request
**kwargs
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 468, in request
resp = self.send(prep,**send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 597, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 597, in <listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 195, in resolve_redirects
**adapter_kwargs
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 576, in send
r = adapter.send(request,**kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 437, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='pr2.embs', port=50075): Max retries exceeded with url: /webhdfs/v1/test_storage/LICENSE.txt?op=OPEN&user.name=embs&namenoderpcaddress=192.168.12.52:9000&offset=0 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2e87867400>: Failed to establish a new connection: [Errno 113] No route to host',))
4条答案
按热度按时间wsewodh21#
嗨,我也面临类似的问题。看来波特是对的。在我的情况下,我可以得到目录列表,但不能写任何数据。问题是在我的vpn,其中阻止了一些端口,读写使用不同的一个。
ac1kyiln2#
如这里所述,这个python库使用的是webhdfs。如果要测试主机和文件路径是否正确,可以使用以下命令
curl -i 'http://192.168.12.52:50070/webhdfs/v1/<PATH>?op=LISTSTATUS'
. 这将在hdfs中列出一个目录。如果正确的话,可以在python中使用相同的“config”。f5emj3cl3#
网络配置可能有问题。暂时试试这个调整过的代码:
了解ip地址0.0.0.0
kqhtkvqz4#
http://192.168.12.52:9000
9000是rpc端口。50070是默认的http webhdfs端口。你可能会
No route to host
如果webhdfs被禁用,或者datanode由于关闭而没有公开端口50075(datanode http地址),或者您更改了该属性client.read('/opt/hadoop/LICENSE.txt')
您正在伪分布式模式下运行hdfs,但您正在尝试读取本地文件。/opt
默认情况下,hdfs中不存在,并且您只运行了一个本地ls
... 你应该改用hadoop fs -ls /opt
查看您试图打开的路径中确实存在哪些文件但是当我搜索namenode路径中的文件时,它会返回一些难以辨认的文件:
您的文件未存储在namenode中。。。他们的元数据是
您的文件存储在datanode数据目录中,但作为块,而不是人类可读的内容
可以运行此命令以获取所有块及其位置的列表