Original article: http://www.cnblogs.com/archimedes/p/hadoop-filesystem-io.html. Please cite the source URL when reposting.

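Hadoop borrows the concept of the Linux virtual file system and introduces its own abstract file system. On top of this abstraction it provides a large number of concrete file system implementations, which cover the varied data-access needs of applications built on Hadoop.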

The Hadoop file system API

Hadoop provides an abstract file system, and HDFS is just one concrete implementation of it. The abstraction is defined by the abstract class org.apache.hadoop.fs.FileSystem.

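The methods of the Hadoop abstract file system fall into two groups:

1. Methods for managing files and directories

2. Methods for reading and writing file data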

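Operations of the Hadoop abstract file system

| Hadoop FileSystem | Java | Linux | Description |
|---|---|---|---|
| URL.openStream, FileSystem.open, FileSystem.create, FileSystem.append | URL.openStream | open | Open a file |
| FSDataInputStream.read | InputStream.read | read | Read data from a file |
| FSDataOutputStream.write | OutputStream.write | write | Write data to a file |
| FSDataInputStream.close, FSDataOutputStream.close | InputStream.close, OutputStream.close | close | Close a file |
| FSDataInputStream.seek | RandomAccessFile.seek | lseek | Change the read/write position in a file |
| FileSystem.getFileStatus, FileSystem.get* | File.get* | stat | Get the attributes of a file or directory |
| FileSystem.set* | File.set* | chmod, etc. | Change the attributes of a file |
| FileSystem.createNewFile | File.createNewFile | creat | Create a file |
| FileSystem.delete | File.delete | remove | Delete a file from the file system |
| FileSystem.rename | File.renameTo | rename | Rename a file or directory |
| FileSystem.mkdirs | File.mkdir | mkdir | Create a subdirectory under the given directory |
| FileSystem.delete | File.delete | rmdir | Delete an empty subdirectory from a directory |
| FileSystem.listStatus | File.list | readdir | List the entries of a directory |
| FileSystem.getWorkingDirectory | | getcwd/getwd | Return the current working directory |
| FileSystem.setWorkingDirectory | | chdir | Change the current working directory |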
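
To make the Java column of the table concrete, here is a minimal sketch that uses only the JDK (java.io), not Hadoop. The class name JavaFileOpsDemo, the helper writeThenReadAt and the file names are made up for this illustration. The Hadoop calls in the first column follow the same open, read/write, seek, close pattern, but on FileSystem and its stream classes.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: plain java.io calls from the "Java" column of the table.
class JavaFileOpsDemo {

    /** Writes text to a file, then reads back len bytes starting at offset. */
    static String writeThenReadAt(File file, String text, long offset, int len)
            throws IOException {
        // OutputStream.write / OutputStream.close (Linux: write, close)
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(text.getBytes(StandardCharsets.US_ASCII));
        }
        // RandomAccessFile.seek, then read (Linux: lseek, read)
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte[] buf = new byte[len];
            raf.readFully(buf);
            return new String(buf, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                            "fs-demo-" + System.nanoTime());
        System.out.println("mkdir:  " + dir.mkdir());                 // File.mkdir
        File a = new File(dir, "a.txt");
        System.out.println("read:   " + writeThenReadAt(a, "hello hadoop", 6, 6));
        System.out.println("length: " + a.length());                  // File.get*-style query
        File b = new File(dir, "b.txt");
        System.out.println("rename: " + a.renameTo(b));               // File.renameTo
        System.out.println("list:   " + Arrays.toString(dir.list())); // File.list
        System.out.println("delete: " + b.delete());                  // File.delete (file)
        System.out.println("rmdir:  " + dir.delete());                // File.delete (empty dir)
    }
}
```

Note that java.io.File has no notion of a per-process working directory that can be changed, which matches the two empty cells in the Java column.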

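With FileSystem.getFileStatus(), the Hadoop abstract file system can fetch all attributes of a file or directory in a single call. These attributes are stored in the FileStatus class: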

  public class FileStatus implements Writable, Comparable {

    private Path path;                 // file path
    private long length;               // file length
    private boolean isdir;             // whether this is a directory
    private short block_replication;   // replication factor (HDFS-specific)
    private long blocksize;            // block size (HDFS-specific)
    private long modification_time;    // last modification time
    private long access_time;          // last access time
    private FsPermission permission;   // permission information
    private String owner;              // file owner
    private String group;              // group
    ……
  }

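FileStatus implements the Writable interface, so it can be serialized and sent over the network. Reading all attributes of a file at once and returning them to the client in a single object also reduces the number of network round trips in a distributed system.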
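
The serialization idea can be sketched with the JDK alone. SimpleStatus below is a hypothetical, cut-down stand-in for FileStatus with only four fields; it is not the Hadoop class. It uses DataOutput.writeUTF where Hadoop uses Text.writeString, which is an assumption made to keep the example free of Hadoop dependencies. The point is the symmetric write/readFields pair: fields are written in a fixed order and read back in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical, simplified stand-in for FileStatus (not the Hadoop class).
class SimpleStatus {
    String path = "";
    long length;
    boolean isDir;
    short replication;

    SimpleStatus() {}

    SimpleStatus(String path, long length, boolean isDir, short replication) {
        this.path = path;
        this.length = length;
        this.isDir = isDir;
        this.replication = replication;
    }

    /** Serializes every field in a fixed order (the Writable-style write). */
    void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeLong(length);
        out.writeBoolean(isDir);
        out.writeShort(replication);
    }

    /** Restores the fields in exactly the order write() emitted them. */
    void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        length = in.readLong();
        isDir = in.readBoolean();
        replication = in.readShort();
    }

    /** Serializes s to a byte array and deserializes it into a fresh object. */
    static SimpleStatus roundTrip(SimpleStatus s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        s.write(new DataOutputStream(bytes));
        SimpleStatus copy = new SimpleStatus();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        SimpleStatus copy = roundTrip(
                new SimpleStatus("/user/demo/a.txt", 1024L, false, (short) 3));
        System.out.println(copy.path + " " + copy.length + " "
                + copy.isDir + " " + copy.replication);
    }
}
```

Because the whole record travels as one byte sequence, a client needs a single request to learn every attribute, rather than one request per attribute.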

The complete source code of the FileStatus class is as follows:

  /**
   * Licensed to the Apache Software Foundation (ASF) under one
   * or more contributor license agreements. See the NOTICE file
   * distributed with this work for additional information
   * regarding copyright ownership. The ASF licenses this file
   * to you under the Apache License, Version 2.0 (the
   * "License"); you may not use this file except in compliance
   * with the License. You may obtain a copy of the License at
   *
   * http://www.apache.org/licenses/LICENSE-2.0
   *
   * Unless required by applicable law or agreed to in writing, software
   * distributed under the License is distributed on an "AS IS" BASIS,
   * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   * See the License for the specific language governing permissions and
   * limitations under the License.
   */
  package org.apache.hadoop.fs;

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;

  import org.apache.hadoop.fs.permission.FsPermission;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.Writable;

  /** Interface that represents the client side information for a file.
   */
  public class FileStatus implements Writable, Comparable {

    private Path path;
    private long length;
    private boolean isdir;
    private short block_replication;
    private long blocksize;
    private long modification_time;
    private long access_time;
    private FsPermission permission;
    private String owner;
    private String group;

    public FileStatus() { this(0, false, 0, 0, 0, 0, null, null, null, null); }

    //We should deprecate this soon?
    public FileStatus(long length, boolean isdir, int block_replication,
                      long blocksize, long modification_time, Path path) {

      this(length, isdir, block_replication, blocksize, modification_time,
           0, null, null, null, path);
    }

    public FileStatus(long length, boolean isdir, int block_replication,
                      long blocksize, long modification_time, long access_time,
                      FsPermission permission, String owner, String group,
                      Path path) {
      this.length = length;
      this.isdir = isdir;
      this.block_replication = (short)block_replication;
      this.blocksize = blocksize;
      this.modification_time = modification_time;
      this.access_time = access_time;
      this.permission = (permission == null) ?
                        FsPermission.getDefault() : permission;
      this.owner = (owner == null) ? "" : owner;
      this.group = (group == null) ? "" : group;
      this.path = path;
    }

    /*
     * @return the length of this file, in blocks
     */
    public long getLen() {
      return length;
    }

    /**
     * Is this a directory?
     * @return true if this is a directory
     */
    public boolean isDir() {
      return isdir;
    }

    /**
     * Get the block size of the file.
     * @return the number of bytes
     */
    public long getBlockSize() {
      return blocksize;
    }

    /**
     * Get the replication factor of a file.
     * @return the replication factor of a file.
     */
    public short getReplication() {
      return block_replication;
    }

    /**
     * Get the modification time of the file.
     * @return the modification time of file in milliseconds since January 1, 1970 UTC.
     */
    public long getModificationTime() {
      return modification_time;
    }

    /**
     * Get the access time of the file.
     * @return the access time of file in milliseconds since January 1, 1970 UTC.
     */
    public long getAccessTime() {
      return access_time;
    }

    /**
     * Get FsPermission associated with the file.
     * @return permssion. If a filesystem does not have a notion of permissions
     * or if permissions could not be determined, then default
     * permissions equivalent of "rwxrwxrwx" is returned.
     */
    public FsPermission getPermission() {
      return permission;
    }

    /**
     * Get the owner of the file.
     * @return owner of the file. The string could be empty if there is no
     * notion of owner of a file in a filesystem or if it could not
     * be determined (rare).
     */
    public String getOwner() {
      return owner;
    }

    /**
     * Get the group associated with the file.
     * @return group for the file. The string could be empty if there is no
     * notion of group of a file in a filesystem or if it could not
     * be determined (rare).
     */
    public String getGroup() {
      return group;
    }

    public Path getPath() {
      return path;
    }

    /* These are provided so that these values could be loaded lazily
     * by a filesystem (e.g. local file system).
     */

    /**
     * Sets permission.
     * @param permission if permission is null, default value is set
     */
    protected void setPermission(FsPermission permission) {
      this.permission = (permission == null) ?
                        FsPermission.getDefault() : permission;
    }

    /**
     * Sets owner.
     * @param owner if it is null, default value is set
     */
    protected void setOwner(String owner) {
      this.owner = (owner == null) ? "" : owner;
    }

    /**
     * Sets group.
     * @param group if it is null, default value is set
     */
    protected void setGroup(String group) {
      this.group = (group == null) ? "" : group;
    }

    //////////////////////////////////////////////////
    // Writable
    //////////////////////////////////////////////////
    public void write(DataOutput out) throws IOException {
      Text.writeString(out, getPath().toString());
      out.writeLong(length);
      out.writeBoolean(isdir);
      out.writeShort(block_replication);
      out.writeLong(blocksize);
      out.writeLong(modification_time);
      out.writeLong(access_time);
      permission.write(out);
      Text.writeString(out, owner);
      Text.writeString(out, group);
    }

    public void readFields(DataInput in) throws IOException {
      String strPath = Text.readString(in);
      this.path = new Path(strPath);
      this.length = in.readLong();
      this.isdir = in.readBoolean();
      this.block_replication = in.readShort();
      blocksize = in.readLong();
      modification_time = in.readLong();
      access_time = in.readLong();
      permission.readFields(in);
      owner = Text.readString(in);
      group = Text.readString(in);
    }

    /**
     * Compare this object to another object
     *
     * @param o the object to be compared.
     * @return a negative integer, zero, or a positive integer as this object
     * is less than, equal to, or greater than the specified object.
     *
     * @throws ClassCastException if the specified object's is not of
     * type FileStatus
     */
    public int compareTo(Object o) {
      FileStatus other = (FileStatus)o;
      return this.getPath().compareTo(other.getPath());
    }

    /** Compare if this object is equal to another object
     * @param o the object to be compared.
     * @return true if two file status has the same path name; false if not.
     */
    public boolean equals(Object o) {
      if (o == null) {
        return false;
      }
      if (this == o) {
        return true;
      }
      if (!(o instanceof FileStatus)) {
        return false;
      }
      FileStatus other = (FileStatus)o;
      return this.getPath().equals(other.getPath());
    }

    /**
     * Returns a hash code value for the object, which is defined as
     * the hash code of the path name.
     *
     * @return a hash code value for the path name.
     */
    public int hashCode() {
      return getPath().hashCode();
    }
  }


Some methods in FileSystem have no counterpart in the Java file API: setReplication(), getReplication(), and getContentSummary(). They are defined as follows:

  public boolean setReplication(Path src, short replication)
    throws IOException {
    return true;
  }
  public short getReplication(Path src) throws IOException {
    return getFileStatus(src).getReplication();
  }
  public ContentSummary getContentSummary(Path f) throws IOException {
    FileStatus status = getFileStatus(f);
    if (!status.isDir()) {
      // f is a file
      return new ContentSummary(status.getLen(), 1, 0);
    }
    // f is a directory
    long[] summary = {0, 0, 1};
    for(FileStatus s : listStatus(f)) {
      ContentSummary c = s.isDir() ? getContentSummary(s.getPath()) :
                                     new ContentSummary(s.getLen(), 1, 0);
      summary[0] += c.getLength();
      summary[1] += c.getFileCount();
      summary[2] += c.getDirectoryCount();
    }
    return new ContentSummary(summary[0], summary[1], summary[2]);
  }
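
The recursion in getContentSummary() can be tried out on a local directory with plain java.io.File. The sketch below is an analogue, not Hadoop code: ContentSummaryDemo and summarize are names invented here, and the result is a bare long[] of {total bytes, file count, directory count} instead of a ContentSummary object. As in the listing above, a directory starts its own count at {0, 0, 1} and adds the summary of each child.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Local-disk analogue of the getContentSummary() recursion (not Hadoop code).
class ContentSummaryDemo {

    /** Returns {total bytes, file count, directory count}, counting f itself. */
    static long[] summarize(File f) {
        if (!f.isDirectory()) {
            // f is a file: its length, one file, no directories
            return new long[] { f.length(), 1, 0 };
        }
        // f is a directory: it contributes one to the directory count
        long[] summary = { 0, 0, 1 };
        File[] children = f.listFiles();   // null if the directory cannot be read
        if (children != null) {
            for (File child : children) {
                long[] c = summarize(child);
                summary[0] += c[0];
                summary[1] += c[1];
                summary[2] += c[2];
            }
        }
        return summary;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("summary-demo");
        Files.write(root.resolve("a.bin"), new byte[5]);
        Path sub = Files.createDirectory(root.resolve("sub"));
        Files.write(sub.resolve("b.bin"), new byte[3]);
        System.out.println(Arrays.toString(summarize(root.toFile())));
    }
}
```

On HDFS each listStatus() call is a request to the NameNode, so the generic recursive version can be expensive on deep trees; a concrete file system is free to override it with something cheaper.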

Which operations does a concrete Hadoop file system have to implement? The abstract methods of org.apache.hadoop.fs.FileSystem are listed below:

  // Get the URI of this file system
  public abstract URI getUri();

  // Open a file for reading and return an input stream
  public abstract FSDataInputStream open(Path f, int bufferSize) throws IOException;

  // Create a file and return an output stream
  public abstract FSDataOutputStream create(Path f,
      FsPermission permission,
      boolean overwrite,
      int bufferSize,
      short replication,
      long blockSize,
      Progressable progress) throws IOException;

  // Append data to an existing file
  public abstract FSDataOutputStream append(Path f, int bufferSize,
      Progressable progress) throws IOException;

  // Rename a file or directory
  public abstract boolean rename(Path src, Path dst) throws IOException;

  // Delete a file
  public abstract boolean delete(Path f) throws IOException;
  public abstract boolean delete(Path f, boolean recursive) throws IOException;

  // If the Path is a directory, list all of its entries and their attributes
  // If the Path is a file, return the attributes of that file
  public abstract FileStatus[] listStatus(Path f) throws IOException;

  // Set the current working directory
  public abstract void setWorkingDirectory(Path new_dir);

  // Get the current working directory
  public abstract Path getWorkingDirectory();

  // Create a directory with the given permission
  public abstract boolean mkdirs(Path f, FsPermission permission
      ) throws IOException;

  // Get the attributes of a file or directory
  public abstract FileStatus getFileStatus(Path f) throws IOException;

A concrete file system must implement at least the abstract methods listed above.
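
The division of labour between abstract and concrete methods can be shown with a toy model. Everything in the sketch below is hypothetical: MiniFileSystem and InMemoryFileSystem are invented for this illustration, use String paths and byte arrays instead of Path and streams, and are far smaller than the real FileSystem class. The pattern is the same, though: the base class declares a few abstract primitives and builds convenience methods such as createNewFile on top of them, so a new implementation only has to supply the primitives.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical miniature of the abstract-file-system pattern (not Hadoop's API).
abstract class MiniFileSystem {
    // Primitives that every concrete file system must supply.
    abstract byte[] read(String path) throws IOException;
    abstract void create(String path, byte[] data, boolean overwrite) throws IOException;
    abstract boolean delete(String path) throws IOException;
    abstract boolean exists(String path) throws IOException;

    // Convenience overload written once in the base class: overwrite by default.
    void create(String path, byte[] data) throws IOException {
        create(path, data, true);
    }

    // Built from the primitives: create an empty file only if it is absent.
    boolean createNewFile(String path) throws IOException {
        if (exists(path)) {
            return false;
        }
        create(path, new byte[0], false);
        return true;
    }

    public static void main(String[] args) throws IOException {
        MiniFileSystem fs = new InMemoryFileSystem();
        System.out.println(fs.createNewFile("/demo"));  // first call creates the file
        System.out.println(fs.createNewFile("/demo"));  // second call finds it present
        fs.create("/demo", new byte[] { 1, 2, 3 });
        System.out.println(fs.read("/demo").length);
    }
}

// One concrete implementation: files live in a sorted in-memory map.
class InMemoryFileSystem extends MiniFileSystem {
    private final Map<String, byte[]> files = new TreeMap<>();

    @Override
    byte[] read(String path) throws IOException {
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return data.clone();
    }

    @Override
    void create(String path, byte[] data, boolean overwrite) throws IOException {
        if (!overwrite && files.containsKey(path)) {
            throw new IOException("File already exists: " + path);
        }
        files.put(path, data.clone());
    }

    @Override
    boolean delete(String path) {
        return files.remove(path) != null;
    }

    @Override
    boolean exists(String path) {
        return files.containsKey(path);
    }
}
```

Code that is written against MiniFileSystem works unchanged with any other subclass, which is the same reason Hadoop applications are written against FileSystem rather than against a specific implementation.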

The complete source code of Hadoop's FileSystem class is as follows:

  /**
   * Licensed to the Apache Software Foundation (ASF) under one
   * or more contributor license agreements. See the NOTICE file
   * distributed with this work for additional information
   * regarding copyright ownership. The ASF licenses this file
   * to you under the Apache License, Version 2.0 (the
   * "License"); you may not use this file except in compliance
   * with the License. You may obtain a copy of the License at
   *
   * http://www.apache.org/licenses/LICENSE-2.0
   *
   * Unless required by applicable law or agreed to in writing, software
   * distributed under the License is distributed on an "AS IS" BASIS,
   * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   * See the License for the specific language governing permissions and
   * limitations under the License.
   */
  package org.apache.hadoop.fs;

  import java.io.Closeable;
  import java.io.FileNotFoundException;
  import java.io.IOException;
  import java.net.URI;
  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.Collection;
  import java.util.HashMap;
  import java.util.IdentityHashMap;
  import java.util.Iterator;
  import java.util.List;
  import java.util.Map;
  import java.util.Set;
  import java.util.TreeSet;
  import java.util.concurrent.atomic.AtomicLong;
  import java.util.regex.Pattern;

  import javax.security.auth.login.LoginException;

  import org.apache.commons.logging.*;

  import org.apache.hadoop.conf.*;
  import org.apache.hadoop.net.NetUtils;
  import org.apache.hadoop.util.*;
  import org.apache.hadoop.fs.permission.FsPermission;
  import org.apache.hadoop.io.MultipleIOException;
  import org.apache.hadoop.security.UserGroupInformation;

  /****************************************************************
   * An abstract base class for a fairly generic filesystem. It
   * may be implemented as a distributed filesystem, or as a "local"
   * one that reflects the locally-connected disk. The local version
   * exists for small Hadoop instances and for testing.
   *
   * <p>
   *
   * All user code that may potentially use the Hadoop Distributed
   * File System should be written to use a FileSystem object. The
   * Hadoop DFS is a multi-machine system that appears as a single
   * disk. It's useful because of its fault tolerance and potentially
   * very large capacity.
   *
   * <p>
   * The local implementation is {@link LocalFileSystem} and distributed
   * implementation is DistributedFileSystem.
   *****************************************************************/
  public abstract class FileSystem extends Configured implements Closeable {
    private static final String FS_DEFAULT_NAME_KEY = "fs.default.name";

    public static final Log LOG = LogFactory.getLog(FileSystem.class);

    /** FileSystem cache */
    private static final Cache CACHE = new Cache();

    /** The key this instance is stored under in the cache. */
    private Cache.Key key;

    /** Recording statistics per a FileSystem class */
    private static final Map<Class<? extends FileSystem>, Statistics>
      statisticsTable =
        new IdentityHashMap<Class<? extends FileSystem>, Statistics>();

    /**
     * The statistics for this file system.
     */
    protected Statistics statistics;

    /**
     * A cache of files that should be deleted when filsystem is closed
     * or the JVM is exited.
     */
    private Set<Path> deleteOnExit = new TreeSet<Path>();

    /** Returns the configured filesystem implementation.*/
    public static FileSystem get(Configuration conf) throws IOException {
      return get(getDefaultUri(conf), conf);
    }

    /** Get the default filesystem URI from a configuration.
     * @param conf the configuration to access
     * @return the uri of the default filesystem
     */
    public static URI getDefaultUri(Configuration conf) {
      return URI.create(fixName(conf.get(FS_DEFAULT_NAME_KEY, "file:///")));
    }

    /** Set the default filesystem URI in a configuration.
     * @param conf the configuration to alter
     * @param uri the new default filesystem uri
     */
    public static void setDefaultUri(Configuration conf, URI uri) {
      conf.set(FS_DEFAULT_NAME_KEY, uri.toString());
    }

    /** Set the default filesystem URI in a configuration.
     * @param conf the configuration to alter
     * @param uri the new default filesystem uri
     */
    public static void setDefaultUri(Configuration conf, String uri) {
      setDefaultUri(conf, URI.create(fixName(uri)));
    }

    /** Called after a new FileSystem instance is constructed.
     * @param name a uri whose authority section names the host, port, etc.
     * for this FileSystem
     * @param conf the configuration
     */
    public void initialize(URI name, Configuration conf) throws IOException {
      statistics = getStatistics(name.getScheme(), getClass());
    }

    /** Returns a URI whose scheme and authority identify this FileSystem.*/
    public abstract URI getUri();

    /** @deprecated call #getUri() instead.*/
    public String getName() { return getUri().toString(); }

    /** @deprecated call #get(URI,Configuration) instead. */
    public static FileSystem getNamed(String name, Configuration conf)
      throws IOException {
      return get(URI.create(fixName(name)), conf);
    }

    /** Update old-format filesystem names, for back-compatibility. This should
     * eventually be replaced with a checkName() method that throws an exception
     * for old-format names. */
    private static String fixName(String name) {
      // convert old-format name to new-format name
      if (name.equals("local")) { // "local" is now "file:///".
        LOG.warn("\"local\" is a deprecated filesystem name."
                 +" Use \"file:///\" instead.");
        name = "file:///";
      } else if (name.indexOf('/')==-1) { // unqualified is "hdfs://"
        LOG.warn("\""+name+"\" is a deprecated filesystem name."
                 +" Use \"hdfs://"+name+"/\" instead.");
        name = "hdfs://"+name;
      }
      return name;
    }

    /**
     * Get the local file syste
     * @param conf the configuration to configure the file system with
     * @return a LocalFileSystem
     */
    public static LocalFileSystem getLocal(Configuration conf)
      throws IOException {
      return (LocalFileSystem)get(LocalFileSystem.NAME, conf);
    }

    /** Returns the FileSystem for this URI's scheme and authority. The scheme
     * of the URI determines a configuration property name,
     * <tt>fs.<i>scheme</i>.class</tt> whose value names the FileSystem class.
     * The entire URI is passed to the FileSystem instance's initialize method.
     */
    public static FileSystem get(URI uri, Configuration conf) throws IOException {
      String scheme = uri.getScheme();
      String authority = uri.getAuthority();

      if (scheme == null) { // no scheme: use default FS
        return get(conf);
      }

      if (authority == null) { // no authority
        URI defaultUri = getDefaultUri(conf);
        if (scheme.equals(defaultUri.getScheme()) // if scheme matches default
            && defaultUri.getAuthority() != null) { // & default has authority
          return get(defaultUri, conf); // return default
        }
      }

      String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
      if (conf.getBoolean(disableCacheName, false)) {
        return createFileSystem(uri, conf);
      }

      return CACHE.get(uri, conf);
    }

    private static class ClientFinalizer extends Thread {
      public synchronized void run() {
        try {
          FileSystem.closeAll();
        } catch (IOException e) {
          LOG.info("FileSystem.closeAll() threw an exception:\n" + e);
        }
      }
    }
    private static final ClientFinalizer clientFinalizer = new ClientFinalizer();

    /**
     * Close all cached filesystems. Be sure those filesystems are not
     * used anymore.
     *
     * @throws IOException
     */
    public static void closeAll() throws IOException {
      CACHE.closeAll();
    }

    /** Make sure that a path specifies a FileSystem. */
    public Path makeQualified(Path path) {
      checkPath(path);
      return path.makeQualified(this);
    }

    /** create a file with the provided permission
     * The permission of the file is set to be the provided permission as in
     * setPermission, not permission&~umask
     *
     * It is implemented using two RPCs. It is understood that it is inefficient,
     * but the implementation is thread-safe. The other option is to change the
     * value of umask in configuration to be 0, but it is not thread-safe.
     *
     * @param fs file system handle
     * @param file the name of the file to be created
     * @param permission the permission of the file
     * @return an output stream
     * @throws IOException
     */
    public static FSDataOutputStream create(FileSystem fs,
        Path file, FsPermission permission) throws IOException {
      // create the file with default permission
      FSDataOutputStream out = fs.create(file);
      // set its permission to the supplied one
      fs.setPermission(file, permission);
      return out;
    }

    /** create a directory with the provided permission
     * The permission of the directory is set to be the provided permission as in
     * setPermission, not permission&~umask
     *
     * @see #create(FileSystem, Path, FsPermission)
     *
     * @param fs file system handle
     * @param dir the name of the directory to be created
     * @param permission the permission of the directory
     * @return true if the directory creation succeeds; false otherwise
     * @throws IOException
     */
    public static boolean mkdirs(FileSystem fs, Path dir, FsPermission permission)
      throws IOException {
      // create the directory using the default permission
      boolean result = fs.mkdirs(dir);
      // set its permission to be the supplied one
      fs.setPermission(dir, permission);
      return result;
    }

    ///////////////////////////////////////////////////////////////
    // FileSystem
    ///////////////////////////////////////////////////////////////

    protected FileSystem() {
      super(null);
    }

    /** Check that a Path belongs to this FileSystem. */
    protected void checkPath(Path path) {
      URI uri = path.toUri();
      if (uri.getScheme() == null) // fs is relative
        return;
      String thisScheme = this.getUri().getScheme();
      String thatScheme = uri.getScheme();
      String thisAuthority = this.getUri().getAuthority();
      String thatAuthority = uri.getAuthority();
      //authority and scheme are not case sensitive
      if (thisScheme.equalsIgnoreCase(thatScheme)) {// schemes match
        if (thisAuthority == thatAuthority || // & authorities match
            (thisAuthority != null &&
             thisAuthority.equalsIgnoreCase(thatAuthority)))
          return;

        if (thatAuthority == null && // path's authority is null
            thisAuthority != null) { // fs has an authority
          URI defaultUri = getDefaultUri(getConf()); // & is the conf default
          if (thisScheme.equalsIgnoreCase(defaultUri.getScheme()) &&
              thisAuthority.equalsIgnoreCase(defaultUri.getAuthority()))
            return;
          try { // or the default fs's uri
            defaultUri = get(getConf()).getUri();
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
          if (thisScheme.equalsIgnoreCase(defaultUri.getScheme()) &&
              thisAuthority.equalsIgnoreCase(defaultUri.getAuthority()))
            return;
        }
      }
      throw new IllegalArgumentException("Wrong FS: "+path+
                                         ", expected: "+this.getUri());
    }

    /**
     * Return an array containing hostnames, offset and size of
     * portions of the given file. For a nonexistent
     * file or regions, null will be returned.
     *
     * This call is most helpful with DFS, where it returns
     * hostnames of machines that contain the given file.
     *
     * The FileSystem will simply return an elt containing 'localhost'.
     */
    public BlockLocation[] getFileBlockLocations(FileStatus file,
        long start, long len) throws IOException {
      if (file == null) {
        return null;
      }

      if ( (start<0) || (len < 0) ) {
        throw new IllegalArgumentException("Invalid start or len parameter");
      }

      if (file.getLen() < start) {
        return new BlockLocation[0];

      }
      String[] name = { "localhost:50010" };
      String[] host = { "localhost" };
      return new BlockLocation[] { new BlockLocation(name, host, 0, file.getLen()) };
    }

    /**
     * Opens an FSDataInputStream at the indicated Path.
     * @param f the file name to open
     * @param bufferSize the size of the buffer to be used.
     */
    public abstract FSDataInputStream open(Path f, int bufferSize)
      throws IOException;

    /**
     * Opens an FSDataInputStream at the indicated Path.
     * @param f the file to open
     */
    public FSDataInputStream open(Path f) throws IOException {
      return open(f, getConf().getInt("io.file.buffer.size", 4096));
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path.
     * Files are overwritten by default.
     */
    public FSDataOutputStream create(Path f) throws IOException {
      return create(f, true);
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path.
     */
    public FSDataOutputStream create(Path f, boolean overwrite)
      throws IOException {
      return create(f, overwrite,
                    getConf().getInt("io.file.buffer.size", 4096),
                    getDefaultReplication(),
                    getDefaultBlockSize());
    }

    /**
     * Create an FSDataOutputStream at the indicated Path with write-progress
     * reporting.
     * Files are overwritten by default.
     */
    public FSDataOutputStream create(Path f, Progressable progress) throws IOException {
      return create(f, true,
                    getConf().getInt("io.file.buffer.size", 4096),
                    getDefaultReplication(),
                    getDefaultBlockSize(), progress);
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path.
     * Files are overwritten by default.
     */
    public FSDataOutputStream create(Path f, short replication)
      throws IOException {
      return create(f, true,
                    getConf().getInt("io.file.buffer.size", 4096),
                    replication,
                    getDefaultBlockSize());
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path with write-progress
     * reporting.
     * Files are overwritten by default.
     */
    public FSDataOutputStream create(Path f, short replication, Progressable progress)
      throws IOException {
      return create(f, true,
                    getConf().getInt("io.file.buffer.size", 4096),
                    replication,
                    getDefaultBlockSize(), progress);
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path.
     * @param f the file name to open
     * @param overwrite if a file with this name already exists, then if true,
     * the file will be overwritten, and if false an error will be thrown.
     * @param bufferSize the size of the buffer to be used.
     */
    public FSDataOutputStream create(Path f,
                                     boolean overwrite,
                                     int bufferSize
                                     ) throws IOException {
      return create(f, overwrite, bufferSize,
                    getDefaultReplication(),
                    getDefaultBlockSize());
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path with write-progress
     * reporting.
     * @param f the file name to open
     * @param overwrite if a file with this name already exists, then if true,
     * the file will be overwritten, and if false an error will be thrown.
     * @param bufferSize the size of the buffer to be used.
     */
    public FSDataOutputStream create(Path f,
                                     boolean overwrite,
                                     int bufferSize,
                                     Progressable progress
                                     ) throws IOException {
      return create(f, overwrite, bufferSize,
                    getDefaultReplication(),
                    getDefaultBlockSize(), progress);
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path.
     * @param f the file name to open
     * @param overwrite if a file with this name already exists, then if true,
     * the file will be overwritten, and if false an error will be thrown.
     * @param bufferSize the size of the buffer to be used.
     * @param replication required block replication for the file.
     */
    public FSDataOutputStream create(Path f,
                                     boolean overwrite,
                                     int bufferSize,
                                     short replication,
                                     long blockSize
                                     ) throws IOException {
      return create(f, overwrite, bufferSize, replication, blockSize, null);
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path with write-progress
     * reporting.
     * @param f the file name to open
     * @param overwrite if a file with this name already exists, then if true,
     * the file will be overwritten, and if false an error will be thrown.
     * @param bufferSize the size of the buffer to be used.
     * @param replication required block replication for the file.
     */
    public FSDataOutputStream create(Path f,
                                     boolean overwrite,
                                     int bufferSize,
                                     short replication,
                                     long blockSize,
                                     Progressable progress
                                     ) throws IOException {
      return this.create(f, FsPermission.getDefault(),
                         overwrite, bufferSize, replication, blockSize, progress);
    }

    /**
     * Opens an FSDataOutputStream at the indicated Path with write-progress
     * reporting.
     * @param f the file name to open
     * @param permission
     * @param overwrite if a file with this name already exists, then if true,
     * the file will be overwritten, and if false an error will be thrown.
     * @param bufferSize the size of the buffer to be used.
     * @param replication required block replication for the file.
     * @param blockSize
     * @param progress
     * @throws IOException
     * @see #setPermission(Path, FsPermission)
     */
    public abstract FSDataOutputStream create(Path f,
        FsPermission permission,
        boolean overwrite,
        int bufferSize,
        short replication,
        long blockSize,
        Progressable progress) throws IOException;

    /**
     * Creates the given Path as a brand-new zero-length file. If
     * create fails, or if it already existed, return false.
     */
    public boolean createNewFile(Path f) throws IOException {
      if (exists(f)) {
        return false;
      } else {
        create(f, false, getConf().getInt("io.file.buffer.size", 4096)).close();
        return true;
      }
    }

    /**
     * Append to an existing file (optional operation).
     * Same as append(f, getConf().getInt("io.file.buffer.size", 4096), null)
     * @param f the existing file to be appended.
     * @throws IOException
     */
    public FSDataOutputStream append(Path f) throws IOException {
      return append(f, getConf().getInt("io.file.buffer.size", 4096), null);
    }
    /**
     * Append to an existing file (optional operation).
     * Same as append(f, bufferSize, null).
     * @param f the existing file to be appended.
     * @param bufferSize the size of the buffer to be used.
     * @throws IOException
     */
    public FSDataOutputStream append(Path f, int bufferSize) throws IOException {
      return append(f, bufferSize, null);
    }

    /**
     * Append to an existing file (optional operation).
     * @param f the existing file to be appended.
     * @param bufferSize the size of the buffer to be used.
     * @param progress for reporting progress if it is not null.
     * @throws IOException
     */
    public abstract FSDataOutputStream append(Path f, int bufferSize,
        Progressable progress) throws IOException;

    /**
     * Get replication.
     *
     * @deprecated Use getFileStatus() instead
  555. * @param src file name
  556. * @return file replication
  557. * @throws IOException
  558. */
  559. @Deprecated
  560. public short getReplication(Path src) throws IOException {
  561. return getFileStatus(src).getReplication();
  562. }
  563.  
  564. /**
  565. * Set replication for an existing file.
  566. *
  567. * @param src file name
  568. * @param replication new replication
  569. * @throws IOException
  570. * @return true if successful;
  571. * false if file does not exist or is a directory
  572. */
  573. public boolean setReplication(Path src, short replication)
  574. throws IOException {
  575. return true;
  576. }
  577.  
  578. /**
  579. * Renames Path src to Path dst. Can take place on local fs
  580. * or remote DFS.
  581. */
  582. public abstract boolean rename(Path src, Path dst) throws IOException;
  583.  
  584. /** Delete a file. */
  585. /** @deprecated Use delete(Path, boolean) instead */ @Deprecated
  586. public abstract boolean delete(Path f) throws IOException;
  587.  
  588. /** Delete a file.
  589. *
  590. * @param f the path to delete.
  591. * @param recursive if path is a directory and recursive is set to
  592. * true, the directory is deleted; otherwise an exception is thrown. In
  593. * the case of a file, recursive can be set to either true or false.
  594. * @return true if delete is successful else false.
  595. * @throws IOException
  596. */
  597. public abstract boolean delete(Path f, boolean recursive) throws IOException;
  598.  
  599. /**
  600. * Mark a path to be deleted when FileSystem is closed.
  601. * When the JVM shuts down,
  602. * all FileSystem objects will be closed automatically.
  603. * Then,
  604. * the marked path will be deleted as a result of closing the FileSystem.
  605. *
  606. * The path has to exist in the file system.
  607. *
  608. * @param f the path to delete.
  609. * @return true if deleteOnExit is successful, otherwise false.
  610. * @throws IOException
  611. */
  612. public boolean deleteOnExit(Path f) throws IOException {
  613. if (!exists(f)) {
  614. return false;
  615. }
  616. synchronized (deleteOnExit) {
  617. deleteOnExit.add(f);
  618. }
  619. return true;
  620. }
  621.  
  622. /**
  623. * Delete all files that were marked as delete-on-exit. This recursively
  624. * deletes all files in the specified paths.
  625. */
  626. protected void processDeleteOnExit() {
  627. synchronized (deleteOnExit) {
  628. for (Iterator<Path> iter = deleteOnExit.iterator(); iter.hasNext();) {
  629. Path path = iter.next();
  630. try {
  631. delete(path, true);
  632. }
  633. catch (IOException e) {
  634. LOG.info("Ignoring failure to deleteOnExit for path " + path);
  635. }
  636. iter.remove();
  637. }
  638. }
  639. }
  640.  
  641. /** Check if exists.
  642. * @param f source file
  643. */
  644. public boolean exists(Path f) throws IOException {
  645. try {
  646. return getFileStatus(f) != null;
  647. } catch (FileNotFoundException e) {
  648. return false;
  649. }
  650. }
  651.  
  652. /** True iff the named path is a directory. */
  653. /** @deprecated Use getFileStatus() instead */ @Deprecated
  654. public boolean isDirectory(Path f) throws IOException {
  655. try {
  656. return getFileStatus(f).isDir();
  657. } catch (FileNotFoundException e) {
  658. return false; // f does not exist
  659. }
  660. }
  661.  
  662. /** True iff the named path is a regular file. */
  663. public boolean isFile(Path f) throws IOException {
  664. try {
  665. return !getFileStatus(f).isDir();
  666. } catch (FileNotFoundException e) {
  667. return false; // f does not exist
  668. }
  669. }
  670.  
  671. /** The number of bytes in a file. */
  672. /** @deprecated Use getFileStatus() instead */ @Deprecated
  673. public long getLength(Path f) throws IOException {
  674. return getFileStatus(f).getLen();
  675. }
  676.  
  677. /** Return the {@link ContentSummary} of a given {@link Path}. */
  678. public ContentSummary getContentSummary(Path f) throws IOException {
  679. FileStatus status = getFileStatus(f);
  680. if (!status.isDir()) {
  681. // f is a file
  682. return new ContentSummary(status.getLen(), 1, 0);
  683. }
  684. // f is a directory
  685. long[] summary = {0, 0, 1};
  686. for(FileStatus s : listStatus(f)) {
  687. ContentSummary c = s.isDir() ? getContentSummary(s.getPath()) :
  688. new ContentSummary(s.getLen(), 1, 0);
  689. summary[0] += c.getLength();
  690. summary[1] += c.getFileCount();
  691. summary[2] += c.getDirectoryCount();
  692. }
  693. return new ContentSummary(summary[0], summary[1], summary[2]);
  694. }
  695.  
  696. final private static PathFilter DEFAULT_FILTER = new PathFilter() {
  697. public boolean accept(Path file) {
  698. return true;
  699. }
  700. };
  701.  
  702. /**
  703. * List the statuses of the files/directories in the given path if the path is
  704. * a directory.
  705. *
  706. * @param f
  707. * given path
  708. * @return the statuses of the files/directories in the given path
  709. * @throws IOException
  710. */
  711. public abstract FileStatus[] listStatus(Path f) throws IOException;
  712.  
  713. /*
  714. * Filter files/directories in the given path using the user-supplied path
  715. * filter. Results are added to the given array <code>results</code>.
  716. */
  717. private void listStatus(ArrayList<FileStatus> results, Path f,
  718. PathFilter filter) throws IOException {
  719. FileStatus listing[] = listStatus(f);
  720. if (listing != null) {
  721. for (int i = 0; i < listing.length; i++) {
  722. if (filter.accept(listing[i].getPath())) {
  723. results.add(listing[i]);
  724. }
  725. }
  726. }
  727. }
  728.  
  729. /**
  730. * Filter files/directories in the given path using the user-supplied path
  731. * filter.
  732. *
  733. * @param f
  734. * a path name
  735. * @param filter
  736. * the user-supplied path filter
  737. * @return an array of FileStatus objects for the files under the given path
  738. * after applying the filter
  739. * @throws IOException
  740. * if any problem is encountered while fetching the status
  741. */
  742. public FileStatus[] listStatus(Path f, PathFilter filter) throws IOException {
  743. ArrayList<FileStatus> results = new ArrayList<FileStatus>();
  744. listStatus(results, f, filter);
  745. return results.toArray(new FileStatus[results.size()]);
  746. }
  747.  
  748. /**
  749. * Filter files/directories in the given list of paths using default
  750. * path filter.
  751. *
  752. * @param files
  753. * a list of paths
  754. * @return a list of statuses for the files under the given paths after
  755. * applying the default path filter
  756. * @exception IOException
  757. */
  758. public FileStatus[] listStatus(Path[] files)
  759. throws IOException {
  760. return listStatus(files, DEFAULT_FILTER);
  761. }
  762.  
  763. /**
  764. * Filter files/directories in the given list of paths using user-supplied
  765. * path filter.
  766. *
  767. * @param files
  768. * a list of paths
  769. * @param filter
  770. * the user-supplied path filter
  771. * @return a list of statuses for the files under the given paths after
  772. * applying the filter
  773. * @exception IOException
  774. */
  775. public FileStatus[] listStatus(Path[] files, PathFilter filter)
  776. throws IOException {
  777. ArrayList<FileStatus> results = new ArrayList<FileStatus>();
  778. for (int i = 0; i < files.length; i++) {
  779. listStatus(results, files[i], filter);
  780. }
  781. return results.toArray(new FileStatus[results.size()]);
  782. }
  783.  
  784. /**
  785. * <p>Return all the files that match filePattern and are not checksum
  786. * files. Results are sorted by their names.
  787. *
  788. * <p>
  789. * A filename pattern is composed of <i>regular</i> characters and
  790. * <i>special pattern matching</i> characters, which are:
  791. *
  792. * <dl>
  793. * <dd>
  794. * <dl>
  795. * <p>
  796. * <dt> <tt> ? </tt>
  797. * <dd> Matches any single character.
  798. *
  799. * <p>
  800. * <dt> <tt> * </tt>
  801. * <dd> Matches zero or more characters.
  802. *
  803. * <p>
  804. * <dt> <tt> [<i>abc</i>] </tt>
  805. * <dd> Matches a single character from character set
  806. * <tt>{<i>a,b,c</i>}</tt>.
  807. *
  808. * <p>
  809. * <dt> <tt> [<i>a</i>-<i>b</i>] </tt>
  810. * <dd> Matches a single character from the character range
  811. * <tt>{<i>a...b</i>}</tt>. Note that character <tt><i>a</i></tt> must be
  812. * lexicographically less than or equal to character <tt><i>b</i></tt>.
  813. *
  814. * <p>
  815. * <dt> <tt> [^<i>a</i>] </tt>
  816. * <dd> Matches a single character that is not from character set or range
  817. * <tt>{<i>a</i>}</tt>. Note that the <tt>^</tt> character must occur
  818. * immediately to the right of the opening bracket.
  819. *
  820. * <p>
  821. * <dt> <tt> \<i>c</i> </tt>
  822. * <dd> Removes (escapes) any special meaning of character <i>c</i>.
  823. *
  824. * <p>
  825. * <dt> <tt> {ab,cd} </tt>
  826. * <dd> Matches a string from the string set <tt>{<i>ab, cd</i>} </tt>
  827. *
  828. * <p>
  829. * <dt> <tt> {ab,c{de,fh}} </tt>
  830. * <dd> Matches a string from the string set <tt>{<i>ab, cde, cfh</i>}</tt>
  831. *
  832. * </dl>
  833. * </dd>
  834. * </dl>
  835. *
  836. * @param pathPattern a glob pattern specifying a path pattern
  837. *
  838. * @return an array of paths that match the path pattern
  839. * @throws IOException
  840. */
  841. public FileStatus[] globStatus(Path pathPattern) throws IOException {
  842. return globStatus(pathPattern, DEFAULT_FILTER);
  843. }
  844.  
  845. /**
  846. * Return an array of FileStatus objects whose path names match pathPattern
  847. * and are accepted by the user-supplied path filter. Results are sorted by
  848. * their path names.
  849. * Return null if pathPattern has no glob and the path does not exist.
  850. * Return an empty array if pathPattern has a glob and no path matches it.
  851. *
  852. * @param pathPattern
  853. * a glob pattern specifying the path pattern
  854. * @param filter
  855. * a user-supplied path filter
  856. * @return an array of FileStatus objects
  857. * @throws IOException if any I/O error occurs when fetching file status
  858. */
  859. public FileStatus[] globStatus(Path pathPattern, PathFilter filter)
  860. throws IOException {
  861. String filename = pathPattern.toUri().getPath();
  862. List<String> filePatterns = GlobExpander.expand(filename);
  863. if (filePatterns.size() == 1) {
  864. return globStatusInternal(pathPattern, filter);
  865. } else {
  866. List<FileStatus> results = new ArrayList<FileStatus>();
  867. for (String filePattern : filePatterns) {
  868. FileStatus[] files = globStatusInternal(new Path(filePattern), filter);
  869. for (FileStatus file : files) {
  870. results.add(file);
  871. }
  872. }
  873. return results.toArray(new FileStatus[results.size()]);
  874. }
  875. }
  876.  
  877. private FileStatus[] globStatusInternal(Path pathPattern, PathFilter filter)
  878. throws IOException {
  879. Path[] parents = new Path[1];
  880. int level = 0;
  881. String filename = pathPattern.toUri().getPath();
  882.  
  883. // path has no components
  884. if ("".equals(filename) || Path.SEPARATOR.equals(filename)) {
  885. return getFileStatus(new Path[]{pathPattern});
  886. }
  887.  
  888. // path has at least one component
  889. String[] components = filename.split(Path.SEPARATOR);
  890. // get the first component
  891. if (pathPattern.isAbsolute()) {
  892. parents[0] = new Path(Path.SEPARATOR);
  893. level = 1;
  894. } else {
  895. parents[0] = new Path(Path.CUR_DIR);
  896. }
  897.  
  898. // glob the paths that match the parent path, i.e., [0, components.length-1]
  899. boolean[] hasGlob = new boolean[]{false};
  900. Path[] parentPaths = globPathsLevel(parents, components, level, hasGlob);
  901. FileStatus[] results;
  902. if (parentPaths == null || parentPaths.length == 0) {
  903. results = null;
  904. } else {
  905. // Now work on the last component of the path
  906. GlobFilter fp = new GlobFilter(components[components.length - 1], filter);
  907. if (fp.hasPattern()) { // last component has a pattern
  908. // list parent directories and then glob the results
  909. results = listStatus(parentPaths, fp);
  910. hasGlob[0] = true;
  911. } else { // last component does not have a pattern
  912. // get all the path names
  913. ArrayList<Path> filteredPaths = new ArrayList<Path>(parentPaths.length);
  914. for (int i = 0; i < parentPaths.length; i++) {
  915. parentPaths[i] = new Path(parentPaths[i],
  916. components[components.length - 1]);
  917. if (fp.accept(parentPaths[i])) {
  918. filteredPaths.add(parentPaths[i]);
  919. }
  920. }
  921. // get all their statuses
  922. results = getFileStatus(
  923. filteredPaths.toArray(new Path[filteredPaths.size()]));
  924. }
  925. }
  926.  
  927. // Decide if the pathPattern contains a glob or not
  928. if (results == null) {
  929. if (hasGlob[0]) {
  930. results = new FileStatus[0];
  931. }
  932. } else {
  933. if (results.length == 0 ) {
  934. if (!hasGlob[0]) {
  935. results = null;
  936. }
  937. } else {
  938. Arrays.sort(results);
  939. }
  940. }
  941. return results;
  942. }
  943.  
  944. /*
  945. * For a path of N components, return a list of paths that match the
  946. * components [<code>level</code>, <code>N-1</code>].
  947. */
  948. private Path[] globPathsLevel(Path[] parents, String[] filePattern,
  949. int level, boolean[] hasGlob) throws IOException {
  950. if (level == filePattern.length - 1)
  951. return parents;
  952. if (parents == null || parents.length == 0) {
  953. return null;
  954. }
  955. GlobFilter fp = new GlobFilter(filePattern[level]);
  956. if (fp.hasPattern()) {
  957. parents = FileUtil.stat2Paths(listStatus(parents, fp));
  958. hasGlob[0] = true;
  959. } else {
  960. for (int i = 0; i < parents.length; i++) {
  961. parents[i] = new Path(parents[i], filePattern[level]);
  962. }
  963. }
  964. return globPathsLevel(parents, filePattern, level + 1, hasGlob);
  965. }
  966.  
  967. /* A class that could decide if a string matches the glob or not */
  968. private static class GlobFilter implements PathFilter {
  969. private PathFilter userFilter = DEFAULT_FILTER;
  970. private Pattern regex;
  971. private boolean hasPattern = false;
  972.  
  973. /** Default pattern character: Escape any special meaning. */
  974. private static final char PAT_ESCAPE = '\\';
  975. /** Default pattern character: Any single character. */
  976. private static final char PAT_ANY = '.';
  977. /** Default pattern character: Character set close. */
  978. private static final char PAT_SET_CLOSE = ']';
  979.  
  980. GlobFilter() {
  981. }
  982.  
  983. GlobFilter(String filePattern) throws IOException {
  984. setRegex(filePattern);
  985. }
  986.  
  987. GlobFilter(String filePattern, PathFilter filter) throws IOException {
  988. userFilter = filter;
  989. setRegex(filePattern);
  990. }
  991.  
  992. private boolean isJavaRegexSpecialChar(char pChar) {
  993. return pChar == '.' || pChar == '$' || pChar == '(' || pChar == ')' ||
  994. pChar == '|' || pChar == '+';
  995. }
  996. void setRegex(String filePattern) throws IOException {
  997. int len;
  998. int setOpen;
  999. int curlyOpen;
  1000. boolean setRange;
  1001.  
  1002. StringBuilder fileRegex = new StringBuilder();
  1003.  
  1004. // Validate the pattern
  1005. len = filePattern.length();
  1006. if (len == 0)
  1007. return;
  1008.  
  1009. setOpen = 0;
  1010. setRange = false;
  1011. curlyOpen = 0;
  1012.  
  1013. for (int i = 0; i < len; i++) {
  1014. char pCh;
  1015.  
  1016. // Examine a single pattern character
  1017. pCh = filePattern.charAt(i);
  1018. if (pCh == PAT_ESCAPE) {
  1019. fileRegex.append(pCh);
  1020. i++;
  1021. if (i >= len)
  1022. error("An escaped character does not present", filePattern, i);
  1023. pCh = filePattern.charAt(i);
  1024. } else if (isJavaRegexSpecialChar(pCh)) {
  1025. fileRegex.append(PAT_ESCAPE);
  1026. } else if (pCh == '*') {
  1027. fileRegex.append(PAT_ANY);
  1028. hasPattern = true;
  1029. } else if (pCh == '?') {
  1030. pCh = PAT_ANY;
  1031. hasPattern = true;
  1032. } else if (pCh == '{') {
  1033. fileRegex.append('(');
  1034. pCh = '(';
  1035. curlyOpen++;
  1036. hasPattern = true;
  1037. } else if (pCh == ',' && curlyOpen > 0) {
  1038. fileRegex.append(")|");
  1039. pCh = '(';
  1040. } else if (pCh == '}' && curlyOpen > 0) {
  1041. // End of a group
  1042. curlyOpen--;
  1043. fileRegex.append(")");
  1044. pCh = ')';
  1045. } else if (pCh == '[' && setOpen == 0) {
  1046. setOpen++;
  1047. hasPattern = true;
  1048. } else if (pCh == '^' && setOpen > 0) {
  1049. } else if (pCh == '-' && setOpen > 0) {
  1050. // Character set range
  1051. setRange = true;
  1052. } else if (pCh == PAT_SET_CLOSE && setRange) {
  1053. // Incomplete character set range
  1054. error("Incomplete character set range", filePattern, i);
  1055. } else if (pCh == PAT_SET_CLOSE && setOpen > 0) {
  1056. // End of a character set
  1057. if (setOpen < 2)
  1058. error("Unexpected end of set", filePattern, i);
  1059. setOpen = 0;
  1060. } else if (setOpen > 0) {
  1061. // Normal character, or the end of a character set range
  1062. setOpen++;
  1063. setRange = false;
  1064. }
  1065. fileRegex.append(pCh);
  1066. }
  1067.  
  1068. // Check for a well-formed pattern
  1069. if (setOpen > 0 || setRange || curlyOpen > 0) {
  1070. // Incomplete character set or character range
  1071. error("Expecting set closure character or end of range, or }",
  1072. filePattern, len);
  1073. }
  1074. regex = Pattern.compile(fileRegex.toString());
  1075. }
  1076.  
  1077. boolean hasPattern() {
  1078. return hasPattern;
  1079. }
  1080.  
  1081. public boolean accept(Path path) {
  1082. return regex.matcher(path.getName()).matches() && userFilter.accept(path);
  1083. }
  1084.  
  1085. private void error(String s, String pattern, int pos) throws IOException {
  1086. throw new IOException("Illegal file pattern: "
  1087. +s+ " for glob "+ pattern + " at " + pos);
  1088. }
  1089. }
  1090.  
  1091. /** Return the current user's home directory in this filesystem.
  1092. * The default implementation returns "/user/$USER/".
  1093. */
  1094. public Path getHomeDirectory() {
  1095. return new Path("/user/"+System.getProperty("user.name"))
  1096. .makeQualified(this);
  1097. }
  1098.  
  1099. /**
  1100. * Set the current working directory for the given file system. All relative
  1101. * paths will be resolved relative to it.
  1102. *
  1103. * @param new_dir
  1104. */
  1105. public abstract void setWorkingDirectory(Path new_dir);
  1106.  
  1107. /**
  1108. * Get the current working directory for the given file system
  1109. * @return the directory pathname
  1110. */
  1111. public abstract Path getWorkingDirectory();
  1112.  
  1113. /**
  1114. * Call {@link #mkdirs(Path, FsPermission)} with default permission.
  1115. */
  1116. public boolean mkdirs(Path f) throws IOException {
  1117. return mkdirs(f, FsPermission.getDefault());
  1118. }
  1119.  
  1120. /**
  1121. * Make the given file and all non-existent parents into
  1122. * directories. Has the semantics of Unix 'mkdir -p'.
  1123. * Existence of the directory hierarchy is not an error.
  1124. */
  1125. public abstract boolean mkdirs(Path f, FsPermission permission
  1126. ) throws IOException;
  1127.  
  1128. /**
  1129. * The src file is on the local disk. Add it to FS at
  1130. * the given dst name and the source is kept intact afterwards
  1131. */
  1132. public void copyFromLocalFile(Path src, Path dst)
  1133. throws IOException {
  1134. copyFromLocalFile(false, src, dst);
  1135. }
  1136.  
  1137. /**
  1138. * The src files are on the local disk. Add them to FS at
  1139. * the given dst name, removing the source afterwards.
  1140. */
  1141. public void moveFromLocalFile(Path[] srcs, Path dst)
  1142. throws IOException {
  1143. copyFromLocalFile(true, true, srcs, dst);
  1144. }
  1145.  
  1146. /**
  1147. * The src file is on the local disk. Add it to FS at
  1148. * the given dst name, removing the source afterwards.
  1149. */
  1150. public void moveFromLocalFile(Path src, Path dst)
  1151. throws IOException {
  1152. copyFromLocalFile(true, src, dst);
  1153. }
  1154.  
  1155. /**
  1156. * The src file is on the local disk. Add it to FS at
  1157. * the given dst name.
  1158. * delSrc indicates if the source should be removed
  1159. */
  1160. public void copyFromLocalFile(boolean delSrc, Path src, Path dst)
  1161. throws IOException {
  1162. copyFromLocalFile(delSrc, true, src, dst);
  1163. }
  1164.  
  1165. /**
  1166. * The src files are on the local disk. Add them to FS at
  1167. * the given dst name.
  1168. * delSrc indicates if the source should be removed
  1169. */
  1170. public void copyFromLocalFile(boolean delSrc, boolean overwrite,
  1171. Path[] srcs, Path dst)
  1172. throws IOException {
  1173. Configuration conf = getConf();
  1174. FileUtil.copy(getLocal(conf), srcs, this, dst, delSrc, overwrite, conf);
  1175. }
  1176.  
  1177. /**
  1178. * The src file is on the local disk. Add it to FS at
  1179. * the given dst name.
  1180. * delSrc indicates if the source should be removed
  1181. */
  1182. public void copyFromLocalFile(boolean delSrc, boolean overwrite,
  1183. Path src, Path dst)
  1184. throws IOException {
  1185. Configuration conf = getConf();
  1186. FileUtil.copy(getLocal(conf), src, this, dst, delSrc, overwrite, conf);
  1187. }
  1188.  
  1189. /**
  1190. * The src file is under FS, and the dst is on the local disk.
  1191. * Copy it from FS control to the local dst name.
  1192. */
  1193. public void copyToLocalFile(Path src, Path dst) throws IOException {
  1194. copyToLocalFile(false, src, dst);
  1195. }
  1196.  
  1197. /**
  1198. * The src file is under FS, and the dst is on the local disk.
  1199. * Copy it from FS control to the local dst name.
  1200. * Remove the source afterwards
  1201. */
  1202. public void moveToLocalFile(Path src, Path dst) throws IOException {
  1203. copyToLocalFile(true, src, dst);
  1204. }
  1205.  
  1206. /**
  1207. * The src file is under FS, and the dst is on the local disk.
  1208. * Copy it from FS control to the local dst name.
  1209. * delSrc indicates if the src will be removed or not.
  1210. */
  1211. public void copyToLocalFile(boolean delSrc, Path src, Path dst)
  1212. throws IOException {
  1213. FileUtil.copy(this, src, getLocal(getConf()), dst, delSrc, getConf());
  1214. }
  1215.  
  1216. /**
  1217. * Returns a local File that the user can write output to. The caller
  1218. * provides both the eventual FS target name and the local working
  1219. * file. If the FS is local, we write directly into the target. If
  1220. * the FS is remote, we write into the tmp local area.
  1221. */
  1222. public Path startLocalOutput(Path fsOutputFile, Path tmpLocalFile)
  1223. throws IOException {
  1224. return tmpLocalFile;
  1225. }
  1226.  
  1227. /**
  1228. * Called when we're all done writing to the target. A local FS will
  1229. * do nothing, because we've written to exactly the right place. A remote
  1230. * FS will copy the contents of tmpLocalFile to the correct target at
  1231. * fsOutputFile.
  1232. */
  1233. public void completeLocalOutput(Path fsOutputFile, Path tmpLocalFile)
  1234. throws IOException {
  1235. moveFromLocalFile(tmpLocalFile, fsOutputFile);
  1236. }
  1237.  
  1238. /**
  1239. * No more filesystem operations are needed. Will
  1240. * release any held locks.
  1241. */
  1242. public void close() throws IOException {
  1243. // delete all files that were marked as delete-on-exit.
  1244. processDeleteOnExit();
  1245. CACHE.remove(this.key, this);
  1246. }
  1247.  
  1248. /** Return the total size of all files in the filesystem.*/
  1249. public long getUsed() throws IOException{
  1250. long used = 0;
  1251. FileStatus[] files = listStatus(new Path("/"));
  1252. for(FileStatus file:files){
  1253. used += file.getLen();
  1254. }
  1255. return used;
  1256. }
  1257.  
  1258. /**
  1259. * Get the block size for a particular file.
  1260. * @param f the filename
  1261. * @return the number of bytes in a block
  1262. */
  1263. /** @deprecated Use getFileStatus() instead */ @Deprecated
  1264. public long getBlockSize(Path f) throws IOException {
  1265. return getFileStatus(f).getBlockSize();
  1266. }
  1267.  
  1268. /** Return the number of bytes that large input files should optimally
  1269. * be split into to minimize I/O time. */
  1270. public long getDefaultBlockSize() {
  1271. // default to 32MB: large enough to minimize the impact of seeks
  1272. return getConf().getLong("fs.local.block.size", 32 * 1024 * 1024);
  1273. }
  1274.  
  1275. /**
  1276. * Get the default replication.
  1277. */
  1278. public short getDefaultReplication() { return 1; }
  1279.  
  1280. /**
  1281. * Return a file status object that represents the path.
  1282. * @param f The path we want information from
  1283. * @return a FileStatus object
  1284. * @throws FileNotFoundException when the path does not exist;
  1285. * IOException see specific implementation
  1286. */
  1287. public abstract FileStatus getFileStatus(Path f) throws IOException;
  1288.  
  1289. /**
  1290. * Get the checksum of a file.
  1291. *
  1292. * @param f The file path
  1293. * @return The file checksum. The default return value is null,
  1294. * which indicates that no checksum algorithm is implemented
  1295. * in the corresponding FileSystem.
  1296. */
  1297. public FileChecksum getFileChecksum(Path f) throws IOException {
  1298. return null;
  1299. }
  1300.  
  1301. /**
  1302. * Set the verify checksum flag. This is only applicable if the
  1303. * corresponding FileSystem supports checksum. By default doesn't do anything.
  1304. * @param verifyChecksum
  1305. */
  1306. public void setVerifyChecksum(boolean verifyChecksum) {
  1307. //doesn't do anything
  1308. }
  1309.  
  1310. /**
  1311. * Return a list of file status objects that corresponds to the list of paths
  1312. * excluding those non-existent paths.
  1313. *
  1314. * @param paths
  1315. * the list of paths we want information from
  1316. * @return a list of FileStatus objects
  1317. * @throws IOException
  1318. * see specific implementation
  1319. */
  1320. private FileStatus[] getFileStatus(Path[] paths) throws IOException {
  1321. if (paths == null) {
  1322. return null;
  1323. }
  1324. ArrayList<FileStatus> results = new ArrayList<FileStatus>(paths.length);
  1325. for (int i = 0; i < paths.length; i++) {
  1326. try {
  1327. results.add(getFileStatus(paths[i]));
  1328. } catch (FileNotFoundException e) { // do nothing
  1329. }
  1330. }
  1331. return results.toArray(new FileStatus[results.size()]);
  1332. }
  1333.  
  1334. /**
  1335. * Set permission of a path.
  1336. * @param p
  1337. * @param permission
  1338. */
  1339. public void setPermission(Path p, FsPermission permission
  1340. ) throws IOException {
  1341. }
  1342.  
  1343. /**
  1344. * Set owner of a path (i.e. a file or a directory).
  1345. * The parameters username and groupname cannot both be null.
  1346. * @param p The path
  1347. * @param username If it is null, the original username remains unchanged.
  1348. * @param groupname If it is null, the original groupname remains unchanged.
  1349. */
  1350. public void setOwner(Path p, String username, String groupname
  1351. ) throws IOException {
  1352. }
  1353.  
  1354. /**
  1355. * Set the modification and access times of a file
   * @param p The path
   * @param mtime Set the modification time of this file.
   *              The number of milliseconds since Jan 1, 1970.
   *              A value of -1 means that this call should not set modification time.
   * @param atime Set the access time of this file.
   *              The number of milliseconds since Jan 1, 1970.
   *              A value of -1 means that this call should not set access time.
   */
  public void setTimes(Path p, long mtime, long atime
                       ) throws IOException {
  }

  private static FileSystem createFileSystem(URI uri, Configuration conf
      ) throws IOException {
    Class<?> clazz = conf.getClass("fs." + uri.getScheme() + ".impl", null);
    if (clazz == null) {
      throw new IOException("No FileSystem for scheme: " + uri.getScheme());
    }
    FileSystem fs = (FileSystem)ReflectionUtils.newInstance(clazz, conf);
    fs.initialize(uri, conf);
    return fs;
  }

  /** Caching FileSystem objects */
  static class Cache {
    private final Map<Key, FileSystem> map = new HashMap<Key, FileSystem>();

    synchronized FileSystem get(URI uri, Configuration conf) throws IOException{
      Key key = new Key(uri, conf);
      FileSystem fs = map.get(key);
      if (fs == null) {
        fs = createFileSystem(uri, conf);
        if (map.isEmpty() && !clientFinalizer.isAlive()) {
          Runtime.getRuntime().addShutdownHook(clientFinalizer);
        }
        fs.key = key;
        map.put(key, fs);
      }
      return fs;
    }

    synchronized void remove(Key key, FileSystem fs) {
      if (map.containsKey(key) && fs == map.get(key)) {
        map.remove(key);
        if (map.isEmpty() && !clientFinalizer.isAlive()) {
          if (!Runtime.getRuntime().removeShutdownHook(clientFinalizer)) {
            LOG.info("Could not cancel cleanup thread, though no " +
                     "FileSystems are open");
          }
        }
      }
    }

    synchronized void closeAll() throws IOException {
      List<IOException> exceptions = new ArrayList<IOException>();
      for(; !map.isEmpty(); ) {
        Map.Entry<Key, FileSystem> e = map.entrySet().iterator().next();
        final Key key = e.getKey();
        final FileSystem fs = e.getValue();

        //remove from cache
        remove(key, fs);

        if (fs != null) {
          try {
            fs.close();
          }
          catch(IOException ioe) {
            exceptions.add(ioe);
          }
        }
      }

      if (!exceptions.isEmpty()) {
        throw MultipleIOException.createIOException(exceptions);
      }
    }

    /** FileSystem.Cache.Key */
    static class Key {
      final String scheme;
      final String authority;
      final String username;

      Key(URI uri, Configuration conf) throws IOException {
        scheme = uri.getScheme()==null?"":uri.getScheme().toLowerCase();
        authority = uri.getAuthority()==null?"":uri.getAuthority().toLowerCase();
        UserGroupInformation ugi = UserGroupInformation.readFrom(conf);
        if (ugi == null) {
          try {
            ugi = UserGroupInformation.login(conf);
          } catch(LoginException e) {
            LOG.warn("uri=" + uri, e);
          }
        }
        username = ugi == null? null: ugi.getUserName();
      }

      /** {@inheritDoc} */
      public int hashCode() {
        return (scheme + authority + username).hashCode();
      }

      static boolean isEqual(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
      }

      /** {@inheritDoc} */
      public boolean equals(Object obj) {
        if (obj == this) {
          return true;
        }
        if (obj != null && obj instanceof Key) {
          Key that = (Key)obj;
          return isEqual(this.scheme, that.scheme)
                 && isEqual(this.authority, that.authority)
                 && isEqual(this.username, that.username);
        }
        return false;
      }

      /** {@inheritDoc} */
      public String toString() {
        return username + "@" + scheme + "://" + authority;
      }
    }
  }

  public static final class Statistics {
    private final String scheme;
    private AtomicLong bytesRead = new AtomicLong();
    private AtomicLong bytesWritten = new AtomicLong();

    public Statistics(String scheme) {
      this.scheme = scheme;
    }

    /**
     * Increment the bytes read in the statistics
     * @param newBytes the additional bytes read
     */
    public void incrementBytesRead(long newBytes) {
      bytesRead.getAndAdd(newBytes);
    }

    /**
     * Increment the bytes written in the statistics
     * @param newBytes the additional bytes written
     */
    public void incrementBytesWritten(long newBytes) {
      bytesWritten.getAndAdd(newBytes);
    }

    /**
     * Get the total number of bytes read
     * @return the number of bytes
     */
    public long getBytesRead() {
      return bytesRead.get();
    }

    /**
     * Get the total number of bytes written
     * @return the number of bytes
     */
    public long getBytesWritten() {
      return bytesWritten.get();
    }

    public String toString() {
      return bytesRead + " bytes read and " + bytesWritten +
             " bytes written";
    }

    /**
     * Reset the counts of bytes to 0.
     */
    public void reset() {
      bytesWritten.set(0);
      bytesRead.set(0);
    }

    /**
     * Get the uri scheme associated with this statistics object.
     * @return the schema associated with this set of statistics
     */
    public String getScheme() {
      return scheme;
    }
  }

  /**
   * Get the Map of Statistics object indexed by URI Scheme.
   * @return a Map having a key as URI scheme and value as Statistics object
   * @deprecated use {@link #getAllStatistics} instead
   */
  public static synchronized Map<String, Statistics> getStatistics() {
    Map<String, Statistics> result = new HashMap<String, Statistics>();
    for(Statistics stat: statisticsTable.values()) {
      result.put(stat.getScheme(), stat);
    }
    return result;
  }

  /**
   * Return the FileSystem classes that have Statistics
   */
  public static synchronized List<Statistics> getAllStatistics() {
    return new ArrayList<Statistics>(statisticsTable.values());
  }

  /**
   * Get the statistics for a particular file system
   * @param cls the class to lookup
   * @return a statistics object
   */
  public static synchronized
  Statistics getStatistics(String scheme, Class<? extends FileSystem> cls) {
    Statistics result = statisticsTable.get(cls);
    if (result == null) {
      result = new Statistics(scheme);
      statisticsTable.put(cls, result);
    }
    return result;
  }

  public static synchronized void clearStatistics() {
    for(Statistics stat: statisticsTable.values()) {
      stat.reset();
    }
  }

  public static synchronized
  void printStatistics() throws IOException {
    for (Map.Entry<Class<? extends FileSystem>, Statistics> pair:
         statisticsTable.entrySet()) {
      System.out.println(" FileSystem " + pair.getKey().getName() +
                         ": " + pair.getValue());
    }
  }
}

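Hadoop input/output streams

Like Java, the Hadoop abstract file system reads and writes files through streams. The stream classes for reading and writing file data are FSDataInputStream and FSDataOutputStream, respectively.

1. FSDataInputStream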

public class FSDataInputStream extends DataInputStream
implements Seekable, PositionedReadable {
……
}

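As shown, FSDataInputStream extends DataInputStream and implements the Seekable and PositionedReadable interfaces.

The Seekable interface provides methods for random access within a (file) stream. It plays a role similar to getFilePointer() and seek() in RandomAccessFile, letting the caller move the read offset to an arbitrary position in the file.

The Seekable interface, with comments, is shown below: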

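/** Interface for streams that support seeking. */
public interface Seekable {
  /**
   * Set the current offset to the given position; the next read starts from there.
   */
  void seek(long pos) throws IOException;

  /** Return the current offset. */
  long getPos() throws IOException;

  /** Switch to a different replica of the data. */
  boolean seekToNewSource(long targetPos) throws IOException;
}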
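
To make the seek()/getPos() contract concrete, here is a minimal sketch over an in-memory byte array. It does not use Hadoop: the Seekable interface below is a local stand-in that mirrors the one shown above, and SeekDemo and SeekableByteArrayStream are hypothetical names introduced only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Demo entry point (hypothetical name, not part of Hadoop).
class SeekDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        SeekableByteArrayStream in = new SeekableByteArrayStream(data);

        in.seek(6);                            // jump to the 'w' of "world"
        System.out.println((char) in.read());  // prints w
        System.out.println(in.getPos());       // prints 7
    }
}

// Local stand-in mirroring the Seekable interface shown above (not the Hadoop type).
interface Seekable {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
    boolean seekToNewSource(long targetPos) throws IOException;
}

// Hypothetical in-memory stream; it reuses the protected pos/count fields
// that ByteArrayInputStream exposes to subclasses.
class SeekableByteArrayStream extends ByteArrayInputStream implements Seekable {
    SeekableByteArrayStream(byte[] data) {
        super(data);
    }

    @Override
    public synchronized void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > count) {
            throw new IOException("Cannot seek to " + newPos);
        }
        pos = (int) newPos;                    // the next read() starts here
    }

    @Override
    public synchronized long getPos() {
        return pos;
    }

    @Override
    public boolean seekToNewSource(long targetPos) {
        return false;                          // an in-memory array has only one copy
    }
}
```

A real file system stream would reposition a file handle or reconnect to another replica instead of assigning a field, but the contract seen by the caller is the same.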

The complete source of the FSDataInputStream class follows:

/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs;

import java.io.*;

/** Utility that wraps a {@link FSInputStream} in a {@link DataInputStream}
 * and buffers input through a {@link BufferedInputStream}. */
public class FSDataInputStream extends DataInputStream
    implements Seekable, PositionedReadable {

  public FSDataInputStream(InputStream in)
    throws IOException {
    super(in);
    if( !(in instanceof Seekable) || !(in instanceof PositionedReadable) ) {
      throw new IllegalArgumentException(
          "In is not an instance of Seekable or PositionedReadable");
    }
  }

  public synchronized void seek(long desired) throws IOException {
    ((Seekable)in).seek(desired);
  }

  public long getPos() throws IOException {
    return ((Seekable)in).getPos();
  }

  public int read(long position, byte[] buffer, int offset, int length)
    throws IOException {
    return ((PositionedReadable)in).read(position, buffer, offset, length);
  }

  public void readFully(long position, byte[] buffer, int offset, int length)
    throws IOException {
    ((PositionedReadable)in).readFully(position, buffer, offset, length);
  }

  public void readFully(long position, byte[] buffer)
    throws IOException {
    ((PositionedReadable)in).readFully(position, buffer, 0, buffer.length);
  }

  public boolean seekToNewSource(long targetPos) throws IOException {
    return ((Seekable)in).seekToNewSource(targetPos);
  }
}

The other interface that FSDataInputStream implements is PositionedReadable, which provides methods for reading data starting at a given position in the stream:

// Interface for positioned reads from a stream
public interface PositionedReadable {
  // Read up to length bytes starting at position into buffer, beginning at buffer[offset].
  // Note: this method does not change the stream's current position, and it is thread-safe.
  public int read(long position, byte[] buffer, int offset, int length)
    throws IOException;

  // Read exactly length bytes starting at position into buffer, beginning at buffer[offset].
  public void readFully(long position, byte[] buffer, int offset, int length)
    throws IOException;

  public void readFully(long position, byte[] buffer) throws IOException;
}

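None of the three read methods in PositionedReadable changes the stream's current position, and all of them are thread-safe.

2. FSInputStream

The org.apache.hadoop.fs package also contains the abstract class FSInputStream. The Seekable methods are declared abstract in this class, while the PositionedReadable methods are implemented on top of them.

In FSInputStream, the read() method of PositionedReadable is implemented with the seek() method of Seekable: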

// Implements PositionedReadable.read()
public int read(long position, byte[] buffer, int offset, int length) throws IOException {
  /*
   * PositionedReadable.read() must be thread-safe, so synchronized (this) is used
   * to keep other code synchronized on this stream from running during the call,
   * and to make sure no other thread changes the current read position reported
   * by Seekable.getPos().
   */
  synchronized (this) {
    long oldPos = getPos();                 // save the current read position via Seekable.getPos()
    int nread = -1;
    try {
      seek(position);                       // move to the requested position via Seekable.seek()
      nread = read(buffer, offset, length); // read the data via InputStream.read()
    } finally {
      seek(oldPos);                         // restore the position held before the read via Seekable.seek()
    }
    return nread;
  }
}
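
The save / seek / read / restore pattern above can be tried without a Hadoop cluster. The following is a small sketch under stated assumptions: PositionedReadDemo and PositionedByteArrayStream are hypothetical names, and an in-memory ByteArrayInputStream stands in for a real file system stream. It shows that a positioned read leaves the stream's own position untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Demo entry point (hypothetical name, not part of Hadoop).
class PositionedReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        PositionedByteArrayStream in = new PositionedByteArrayStream(data);

        in.read();                          // ordinary read: stream position becomes 1
        byte[] buf = new byte[3];
        int n = in.read(7L, buf, 0, 3);     // positioned read starting at offset 7

        System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII)); // prints 789
        System.out.println(in.getPos());                                      // prints 1
    }
}

// Hypothetical stand-in for an FSInputStream subclass, backed by a byte array.
class PositionedByteArrayStream extends ByteArrayInputStream {
    PositionedByteArrayStream(byte[] data) {
        super(data);
    }

    public synchronized void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > count) {
            throw new IOException("Cannot seek to " + newPos);
        }
        pos = (int) newPos;
    }

    public synchronized long getPos() {
        return pos;
    }

    // Same pattern as the FSInputStream method shown above.
    public int read(long position, byte[] buffer, int offset, int length)
            throws IOException {
        synchronized (this) {
            long oldPos = getPos();                    // remember where the stream was
            int nread = -1;
            try {
                seek(position);                        // jump to the requested offset
                nread = read(buffer, offset, length);  // plain InputStream-style read
            } finally {
                seek(oldPos);                          // always put the position back
            }
            return nread;
        }
    }
}
```

The finally block is what guarantees the restore even when the read throws, which is why callers can mix positioned and sequential reads on one stream.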

The complete FSInputStream source follows:

package org.apache.hadoop.fs;

import java.io.*;

/****************************************************************
 * FSInputStream is a generic old InputStream with a little bit
 * of RAF-style seek ability.
 *
 *****************************************************************/
public abstract class FSInputStream extends InputStream
    implements Seekable, PositionedReadable {
  /**
   * Seek to the given offset from the start of the file.
   * The next read() will be from that location. Can't
   * seek past the end of the file.
   */
  public abstract void seek(long pos) throws IOException;

  /**
   * Return the current offset from the start of the file
   */
  public abstract long getPos() throws IOException;

  /**
   * Seeks a different copy of the data. Returns true if
   * found a new source, false otherwise.
   */
  public abstract boolean seekToNewSource(long targetPos) throws IOException;

  public int read(long position, byte[] buffer, int offset, int length)
    throws IOException {
    synchronized (this) {
      long oldPos = getPos();
      int nread = -1;
      try {
        seek(position);
        nread = read(buffer, offset, length);
      } finally {
        seek(oldPos);
      }
      return nread;
    }
  }

  public void readFully(long position, byte[] buffer, int offset, int length)
    throws IOException {
    int nread = 0;
    while (nread < length) {
      int nbytes = read(position+nread, buffer, offset+nread, length-nread);
      if (nbytes < 0) {
        throw new EOFException("End of file reached before reading fully.");
      }
      nread += nbytes;
    }
  }

  public void readFully(long position, byte[] buffer)
    throws IOException {
    readFully(position, buffer, 0, buffer.length);
  }
}

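Note: Hadoop has no corresponding FSOutputStream class.

3. FSDataOutputStream

FSDataOutputStream is used for writing data. Like FSDataInputStream, it extends a java.io class, in this case DataOutputStream, and so provides methods such as writeInt() and writeChar(). It is simpler, however: it does not implement Seekable. In other words, the Hadoop file system does not support random writes, so a user cannot reposition the write offset within a file and overwrite existing content. The user can still obtain the stream's current write position through getPos(). To support getPos(), FSDataOutputStream defines an inner class, PositionCache, which extends FilterOutputStream and overrides write() to track the current write position.

PositionCache is a typical filter stream: it adds getPos() on top of the basic stream functionality, and it uses FileSystem.Statistics to record how many bytes have been written to the file system.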

public class FSDataOutputStream extends DataOutputStream implements Syncable {
  private OutputStream wrappedStream;

  private static class PositionCache extends FilterOutputStream {
    private FileSystem.Statistics statistics;
    long position; // current write position of the stream

    public PositionCache(OutputStream out,
                         FileSystem.Statistics stats,
                         long pos) throws IOException {
      super(out);
      statistics = stats;
      position = pos;
    }

    public void write(int b) throws IOException {
      out.write(b);
      position++; // update the current position
      if (statistics != null) {
        statistics.incrementBytesWritten(1); // update the file system statistics
      }
    }

    public void write(byte b[], int off, int len) throws IOException {
      out.write(b, off, len);
      position += len; // update position
      if (statistics != null) {
        statistics.incrementBytesWritten(len);
      }
    }

    public long getPos() throws IOException {
      return position; // return the current write position of the stream
    }

    public void close() throws IOException {
      out.close();
    }
  }

  @Deprecated
  public FSDataOutputStream(OutputStream out) throws IOException {
    this(out, null);
  }

  public FSDataOutputStream(OutputStream out, FileSystem.Statistics stats)
    throws IOException {
    this(out, stats, 0);
  }

  public FSDataOutputStream(OutputStream out, FileSystem.Statistics stats,
                            long startPosition) throws IOException {
    super(new PositionCache(out, stats, startPosition)); // create the PositionCache and pass it to the superclass constructor
    wrappedStream = out;
  }

  public long getPos() throws IOException {
    return ((PositionCache)out).getPos();
  }

  public void close() throws IOException {
    out.close(); // This invokes PositionCache.close()
  }

  // Returns the underlying output stream. This is used by unit tests.
  public OutputStream getWrappedStream() {
    return wrappedStream;
  }

  /** {@inheritDoc} */
  public void sync() throws IOException {
    if (wrappedStream instanceof Syncable) {
      ((Syncable)wrappedStream).sync();
    }
  }
}
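
The position-tracking idea behind PositionCache can be reproduced with plain java.io classes. The sketch below is an illustration only: PositionTrackingStream and PositionTrackingDemo are hypothetical names, and the FileSystem.Statistics bookkeeping is left out so the example has no Hadoop dependency.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Demo entry point (hypothetical name, not part of Hadoop).
class PositionTrackingDemo {
    public static void main(String[] args) throws IOException {
        PositionTrackingStream tracker =
            new PositionTrackingStream(new ByteArrayOutputStream(), 0);
        DataOutputStream out = new DataOutputStream(tracker);

        out.writeInt(42);     // an int is written as 4 bytes
        out.writeChar('x');   // a char is written as 2 bytes
        out.flush();

        System.out.println(tracker.getPos()); // prints 6
    }
}

// Hypothetical filter stream that counts bytes the way PositionCache does.
class PositionTrackingStream extends FilterOutputStream {
    private long position;    // current write position

    PositionTrackingStream(OutputStream out, long startPosition) {
        super(out);
        position = startPosition;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        position++;           // one byte written
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        position += len;      // len bytes written
    }

    long getPos() {
        return position;
    }
}
```

Because every write funnels through one of the two overridden methods, the counter stays correct no matter which DataOutputStream helper the caller uses. A non-zero startPosition covers the append case, where writing begins partway into an existing file.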

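FSDataOutputStream implements the Syncable interface, which declares a single method, sync(). Its purpose is similar to that of the Linux sync() system call: it synchronizes the data buffered in the stream to the underlying device.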

/** This interface declare the sync() operation. */
public interface Syncable {
/**
* Synchronize all buffer with the underlying devices.
* @throws IOException
*/
public void sync() throws IOException;
}
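
As a rough illustration of how such an interface can be used, the sketch below syncs a stream only when it supports the operation, much like FSDataOutputStream.sync() checks its wrapped stream. Everything here is an assumption made for the example: the Syncable interface is a local stand-in for the one above, SyncableFileStream, SyncHelper and SyncDemo are hypothetical names, and a local file synced through FileDescriptor.sync() takes the place of a Hadoop file system stream.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Demo entry point (hypothetical name, not part of Hadoop).
class SyncDemo {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sync-demo", ".bin");
        tmp.deleteOnExit();
        try (SyncableFileStream out = new SyncableFileStream(tmp)) {
            out.write(new byte[] {1, 2, 3});
            System.out.println(SyncHelper.syncIfSupported(out)); // prints true
        }
        System.out.println(tmp.length());                        // prints 3
    }
}

// Local stand-in mirroring the Syncable interface shown above (not the Hadoop type).
interface Syncable {
    void sync() throws IOException;
}

// Hypothetical local-file stream that can force its data to the device.
class SyncableFileStream extends FileOutputStream implements Syncable {
    SyncableFileStream(File file) throws IOException {
        super(file);
    }

    @Override
    public void sync() throws IOException {
        flush();
        getFD().sync();   // ask the operating system to write its buffers to the device
    }
}

// Same instanceof check that FSDataOutputStream.sync() performs on its wrapped stream.
class SyncHelper {
    static boolean syncIfSupported(OutputStream stream) throws IOException {
        if (stream instanceof Syncable) {
            ((Syncable) stream).sync();
            return true;
        }
        return false;     // streams without sync support are left alone
    }
}
```

The instanceof check keeps sync() optional: wrapping a stream that cannot sync is not an error, the call simply does nothing.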
