3. Operating HDFS with the Java API

This section shows how to operate HDFS through the Java API. Before you start, you need HDFS installed and configured; see:

https://www.cnblogs.com/lay2017/p/9919905.html

Dependencies

Add the following Maven dependencies:

<dependency>

    <groupId>org.apache.hadoop</groupId>

    <artifactId>hadoop-common</artifactId>

    <version>2.8.0</version>

</dependency>

<dependency>

    <groupId>org.apache.hadoop</groupId>

    <artifactId>hadoop-hdfs</artifactId>

    <version>2.8.0</version>

</dependency>

Configuration changes

core-site.xml

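A Java client always goes through the NameNode to locate the DataNodes, so the NameNode must be configured with an address that is reachable from outside the machine; localhost no longer works.

Change the address in core-site.xml to the virtual machine's IP. The VM is used like a server here, so its network should be set to bridged mode. That gives it its own IP address, which the host machine can reach.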
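
Before touching any Hadoop code, it can help to confirm from the host machine that the NameNode port is actually reachable. The following is a minimal sketch using only the JDK. The class and method names are made up for illustration, and the default host and port are taken from the sample code later in this post (192.168.1.12:9000); replace them with your own values.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class NameNodeReachability {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hostnames.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: the VM address and RPC port used in this post.
        String host = args.length > 0 ? args[0] : "192.168.1.12";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

If this prints false, the likely causes are a NameNode still bound to localhost, a VM network that is not in bridged mode, or a firewall on the VM.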

Note: if you would rather use a hostname than an IP, much like a domain name, for example:

<property>

    <name>fs.defaultFS</name>

    <value>hdfs://master:9000</value>

</property>

set the VM's hostname to master and map that hostname to the VM's IP, then add the same hostname-to-IP mapping on the host machine. See:

https://www.cnblogs.com/lay2017/p/9953371.html
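
To check the hostname mapping from the host machine, a small JDK-only sketch like the one below can extract the host from an fs.defaultFS value and try to resolve it. The class and method names are hypothetical, and hdfs://master:9000 is simply the example value used in this post. Note that java.net.URI only finds a host when the value carries the hdfs:// scheme, which is one reason to prefer the full URI form.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

final class DefaultFsHostCheck {

    // Extracts the host part of an fs.defaultFS value; null if the value has no "scheme://host" part.
    static String hostOf(String defaultFs) {
        return URI.create(defaultFs).getHost();
    }

    // Resolves a hostname (or IP literal) to an IP address string.
    static String resolve(String host) throws UnknownHostException {
        return InetAddress.getByName(host).getHostAddress();
    }

    public static void main(String[] args) {
        // Assumed default: the example value from this post.
        String defaultFs = args.length > 0 ? args[0] : "hdfs://master:9000";
        String host = hostOf(defaultFs);
        if (host == null) {
            System.out.println("No host found in '" + defaultFs + "'; is the hdfs:// scheme missing?");
            return;
        }
        try {
            System.out.println(host + " -> " + resolve(host));
        } catch (UnknownHostException e) {
            System.out.println("Cannot resolve '" + host + "'; check the hosts file on this machine");
        }
    }
}
```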

hdfs-site.xml

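In addition, HDFS checks permissions on file operations by default. For convenience, you can turn the check off in hdfs-site.xml (in Hadoop 2.x, dfs.permissions is a deprecated alias of dfs.permissions.enabled):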

<!-- permissions  -->

<property>

    <name>dfs.permissions</name>

    <value>false</value>

</property>

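Sample code

The Java sample follows. The public API is provided mainly by the abstract class FileSystem; its Javadoc is at: http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html

The Javadoc covers more of the API. This sample demonstrates the common operations: upload, download, delete, create directories, list files, and list files together with directories.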

package cn.lay.demo.hdfs;

import org.apache.commons.io.IOUtils;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.*;

import java.io.FileOutputStream;

import java.io.IOException;

/**

 * @Description HDFS Java API sample

 * @Author lay

 * @Date 2018/11/8 0:04

 */

public class HdfsJavaApiDemo {

    // NameNode address

    private static final String NAME_NODE = "hdfs://192.168.1.12:9000";

    private static Configuration configuration;

    private static FileSystem fileSystem;

    // local file to upload

    private static final String LOCAL_FILE = "C:\\Users\\admin\\Desktop\\helloHdfs.txt";

    // remote file on HDFS

    private static final String REMOTE_FILE = "/helloHdfs.txt";

    // local target for the download

    private static final String DOWNLOAD_FILE = "C:\\Users\\admin\\Desktop\\download.txt";

    // remote directory on HDFS

    private static final String REMOTE_DIR = "/newDir/newChildDir";

    static {

        configuration = new Configuration();

        configuration.set("fs.defaultFS", NAME_NODE);

        try {

            fileSystem = FileSystem.get(configuration);

        } catch (IOException e) {

            e.printStackTrace();

        }

    }

    public static void main(String[] args) {

        try {

            // upload();

            // download();

            // remove();

            // mkdirs();

            // listFiles();

            listStatus();

        } catch (IOException e) {

            e.printStackTrace();

        }

    }

    /**

     * Upload a file

     * @throws IOException

     */

    public static void upload() throws IOException {

        fileSystem.copyFromLocalFile(new Path(LOCAL_FILE), new Path(REMOTE_FILE));

    }

    /**

     * Download a file

     * @throws IOException

     */

    public static void download() throws IOException {

        // fileSystem.copyToLocalFile(new Path(REMOTE_FILE), new Path(DOWNLOAD_FILE));

        // try-with-resources closes both streams even if the copy fails

        try (FSDataInputStream fsDataInputStream = fileSystem.open(new Path(REMOTE_FILE));

             FileOutputStream fileOutputStream = new FileOutputStream(DOWNLOAD_FILE)) {

            IOUtils.copy(fsDataInputStream, fileOutputStream);

        }

    }

    /**

     * Delete a file or directory

     * @throws IOException

     */

    public static void remove() throws IOException {

        // delete recursively

        boolean recursive = true;

        fileSystem.delete(new Path(REMOTE_FILE), recursive);

    }

    /**

     * Create directories

     * @throws IOException

     */

    public static void mkdirs() throws IOException {

        fileSystem.mkdirs(new Path(REMOTE_DIR));

    }

    /**

     * List all files under the root, recursively

     * @throws IOException

     */

    public static void listFiles() throws IOException {

        RemoteIterator<LocatedFileStatus> fileStatusList = fileSystem.listFiles(new Path("/"), true);

        while (fileStatusList.hasNext()) {

            LocatedFileStatus fileStatus = fileStatusList.next();

            String path = fileStatus.getPath().toString();

            System.out.println("path:" + path);

        }

    }

    /**

     * List the directories and files directly under the root

     * @throws IOException

     */

    public static void listStatus() throws IOException {

        FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/"));

        for (FileStatus f : fileStatuses) {

            System.out.println("path:" + f.getPath());

        }

    }

}

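Note: the download uses plain IO streams instead of copyToLocalFile, because that method depends on a local Hadoop environment being configured on the client machine. Without it you will see an error like this: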

java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset

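and the downloaded file is always 0 bytes. If you run into this, list the directories and files on the HDFS server to check whether the remote file itself is intact: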

hadoop fs -ls /
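
The stream-based download can also be written with the JDK's Files.copy, which returns the number of bytes written and so makes the 0-byte symptom easy to spot. The sketch below is self-contained: a ByteArrayInputStream stands in for the stream that fileSystem.open(...) would return, and the class and method names are invented for illustration. Path here is java.nio.file.Path, not Hadoop's org.apache.hadoop.fs.Path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamDownloadSketch {

    // Copies the whole stream to the target file, closes the stream,
    // and returns how many bytes were written.
    static long saveTo(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the remote stream; with Hadoop this would be fileSystem.open(...).
        InputStream remote = new ByteArrayInputStream("hello hdfs".getBytes(StandardCharsets.UTF_8));
        Path target = Files.createTempFile("download", ".txt");
        long bytes = saveTo(remote, target);
        System.out.println("wrote " + bytes + " bytes to " + target);
    }
}
```

A return value of 0 for a file that is not empty on the server points to a problem on the read side rather than in the local write.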

Other APIs

HDFS can also be accessed over HTTP; see: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

The HDFS example in the official documentation is based on C: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/LibHdfs.html

You can also read the Hadoop Javadoc, though it is less convenient because you need to know its packages and classes: http://hadoop.apache.org/docs/current/api/
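
As a rough illustration of the HTTP route, WebHDFS requests follow the layout http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=<OPERATION>. The sketch below only builds such URLs. The class and helper names are hypothetical, port 50070 is assumed to be the NameNode HTTP port (the Hadoop 2.x default), the host and file path are reused from the sample code in this post, and paths are assumed to need no URL encoding.

```java
final class WebHdfsUrlSketch {

    // Builds a WebHDFS URL for an absolute HDFS path and an operation name.
    static String url(String host, int httpPort, String hdfsPath, String op) {
        return "http://" + host + ":" + httpPort + "/webhdfs/v1" + hdfsPath + "?op=" + op;
    }

    public static void main(String[] args) {
        // List the root directory.
        System.out.println(url("192.168.1.12", 50070, "/", "LISTSTATUS"));
        // prints http://192.168.1.12:50070/webhdfs/v1/?op=LISTSTATUS

        // Read a file.
        System.out.println(url("192.168.1.12", 50070, "/helloHdfs.txt", "OPEN"));
        // prints http://192.168.1.12:50070/webhdfs/v1/helloHdfs.txt?op=OPEN
    }
}
```

The resulting URLs can be tried with a browser or curl. An OPEN request is normally redirected to a DataNode, so the client has to follow redirects and the DataNode address must be reachable from the host machine as well.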