I have a class that receives batches of data from a data source and writes the serialized content of that data to a file (always the same file). To do this, the first thing I do when creating an instance is check whether the file exists and create it if it does not. Creating the file this way seems to work fine, but the problem appears when I try to append the serialized objects to the file in the onOperationsBatchSynchronization method.
Here is the code of the class described above:
public class HDFSSpaceSynchronizationEndpoint extends SpaceSynchronizationEndpoint {

    private final static Logger LOG = LoggerFactory.getLogger(HDFSSpaceSynchronizationEndpoint.class);

    private final String uriToFileToWrite;
    private final HDFSFileUtil hdfsFileUtil;

    public HDFSSpaceSynchronizationEndpoint(HDFSFileUtil hdfsFileUtil) {
        Validate.notNull(hdfsFileUtil);
        this.hdfsFileUtil = hdfsFileUtil;
        uriToFileToWrite = hdfsFileUtil.getUriToHdfs() + "/object-container";
        createFileIfNeeded();
    }

    private void createFileIfNeeded() {
        final String methodName = "createFileIfNeeded";
        synchronized (this) {
            try {
                if (!hdfsFileUtil.fileExistsInCluster(uriToFileToWrite)) {
                    hdfsFileUtil.createFileInCluster(uriToFileToWrite);
                }
            } catch (IOException e) {
                LOG.error(methodName, "", "Error creating the file in the cluster: {}", e);
            }
        }
    }

    @Override
    public void onOperationsBatchSynchronization(OperationsBatchData batchData) {
        final String methodName = "onOperationsBatchSynchronization";
        LOG.error(methodName, "", "Batch operation received: {}", batchData.getSourceDetails().getName());
        DataSyncOperation[] operations = batchData.getBatchDataItems();
        synchronized (this) {
            for (DataSyncOperation operation : operations) {
                try {
                    hdfsFileUtil.writeObjectToAFile((Serializable) operation.getDataAsObject(), uriToFileToWrite);
                } catch (IOException e) {
                    LOG.error(methodName, "", "Error writing the object to a file in the cluster: {}", e);
                }
            }
        }
    }
}
And here is the code of the class responsible for interacting with HDFS:
public class HDFSFileUtilImpl implements HDFSFileUtil {

    private final static Logger LOG = LoggerFactory.getLogger(HDFSFileUtilImpl.class);
    private final static boolean DELETE_RECURSIVELY = true;

    private final String uriToHdfs;
    private final FileSystem fileSystem;

    public HDFSFileUtilImpl(HDFSConfiguration config, String uriToHdfs, String user) {
        Validate.notNull(config);
        Validate.notEmpty(uriToHdfs);
        Validate.notEmpty(user);
        this.uriToHdfs = uriToHdfs;
        try {
            fileSystem = FileSystem.get(new URI(uriToHdfs), config.getConfiguration(), user);
        } catch (IOException | URISyntaxException | InterruptedException e) {
            LOG.error("constructor", "", "HDFSFileUtilImpl constructor failed: {}", e);
            throw new IllegalStateException(e);
        }
    }

    @Override
    public String getUriToHdfs() {
        return uriToHdfs;
    }

    @Override
    public void writeObjectToAFile(Serializable obj, String fileUri) throws IOException {
        Validate.notNull(obj);
        Validate.notEmpty(fileUri);
        FSDataOutputStream out;
        if (!fileExistsInCluster(fileUri)) {
            throw new IllegalArgumentException("File with URI: " + fileUri + " does not exist in the cluster");
        }
        out = fileSystem.append(new Path(fileUri));
        byte[] objByteArray = getBytesFromObject(obj);
        out.write(objByteArray);
        out.close();
    }

    private byte[] getBytesFromObject(Object obj) throws IOException {
        byte[] retByteArray = null;
        // try/catch used only to be able to use "try with resources" feature
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutput out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
            retByteArray = bos.toByteArray();
        } catch (IOException e) {
            throw new IOException(e);
        }
        return retByteArray;
    }

    @Override
    public void createFileInCluster(String uriOfFile) throws IOException {
        Validate.notEmpty(uriOfFile);
        fileSystem.create(new Path(uriOfFile));
    }

    @Override
    public boolean fileExistsInCluster(String uri) throws IOException {
        Validate.notEmpty(uri);
        boolean result = false;
        result = fileSystem.exists(new Path(uri));
        return result;
    }

    ...
}
The data source establishes three connections with my component, so onOperationsBatchSynchronization is invoked concurrently. That is why the synchronized blocks are there, but even with them I get the following exception in the logs:
10:09:23.727 ERROR - onOperationsBatchSynchronization
org.apache.hadoop.ipc.RemoteException: Failed to create file [/object-container] for [DFSClient_NONMAPREDUCE_1587728611_73] for client [127.0.0.1], because this file is already being created by [DFSClient_NONMAPREDUCE_1972611521_106] on [127.0.0.1]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2636)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2462)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2700)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2663)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:559)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:388)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
So what is the problem here? I have some unit tests (more like integration tests, since they rely on a running Hadoop setup), and all the methods of HDFSFileUtilImpl shown above work correctly and produce the expected results.
Edit: I just tried writing separate files in the cluster instead of appending to the same one, and it works fine, so I would rule out any permission problem.
1 Answer
Finally got rid of the error. Apparently you have to close the FSDataOutputStream that is returned when you call create on the FileSystem. With that change, this is how the createFileInCluster method of HDFSFileUtilImpl is now implemented:
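A minimal sketch of such an implementation, assuming the only change is closing the FSDataOutputStream returned by create():

@Override
public void createFileInCluster(String uriOfFile) throws IOException {
    Validate.notEmpty(uriOfFile);
    // create() returns an FSDataOutputStream that holds the HDFS lease on the
    // new file; closing it releases the lease so that later append() calls do
    // not fail with "this file is already being created by ...".
    FSDataOutputStream out = fileSystem.create(new Path(uriOfFile));
    out.close();
}

Equivalently, the create() call can be wrapped in a try-with-resources block, since FSDataOutputStream implements Closeable.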