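Directory is an abstract class that determines where an index is stored. Each concrete subclass provides a specific storage location: FSDirectory stores the index on disk at a given path, and RAMDirectory stores it in memory.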

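Many articles online claim that RAMDirectory outperforms FSDirectory, because one lives in memory and the other on external storage. However, some people have run experiments whose results contradict that claim. For details, see the original post: "Confusion about the performance of Lucene's RAMDirectory and FSDirectory" (关于lucene的RAMDirectory和FSDirectory的性能问题的困惑).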

In fact, in the Lucene 8.7.0 source the RAMDirectory class already carries a @Deprecated annotation, along with this comment:

/**
 * A memory-resident {@link Directory} implementation.  Locking
 * implementation is by default the {@link SingleInstanceLockFactory}.
 * 
 * <p><b>Warning:</b> This class is not intended to work with huge
 * indexes. Everything beyond several hundred megabytes will waste
 * resources (GC cycles), because it uses an internal buffer size
 * of 1024 bytes, producing millions of {@code byte[1024]} arrays.
 * This class is optimized for small memory-resident indexes.
 * It also has bad concurrency on multithreaded environments.
 * 
 * <p>It is recommended to materialize large indexes on disk and use
 * {@link MMapDirectory}, which is a high-performance directory
 * implementation working directly on the file system cache of the
 * operating system, so copying data to Java heap space is not useful.
 * 
 * @deprecated This class uses inefficient synchronization and is discouraged
 * in favor of {@link MMapDirectory}. It will be removed in future versions 
 * of Lucene.
 */
@Deprecated
public class RAMDirectory extends BaseDirectory implements Accountable {
...
}

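The comment spells out this class's efficiency problems and states that it will be removed in a future version. So do not assume from experience alone that an in-memory index is faster than one on disk.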
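To get a feel for the "millions of byte[1024] arrays" warning, here is a minimal arithmetic sketch. It only counts the 1024-byte payload buffers mentioned in the Javadoc and ignores object headers and other bookkeeping; the class and method names are hypothetical and are not part of Lucene.

```java
// Rough estimate of how many 1024-byte buffers an in-memory index would need.
// RamBufferEstimate and buffersNeeded are illustrative names, not Lucene APIs.
final class RamBufferEstimate {
    // Buffer size quoted in the RAMDirectory Javadoc.
    static final int BUFFER_SIZE = 1024;

    // Ceiling division: a partially filled last buffer still costs a whole array.
    static long buffersNeeded(long indexBytes) {
        return (indexBytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
    }

    public static void main(String[] args) {
        long halfGiB = 512L * 1024 * 1024;      // hypothetical index size
        long twoGiB = 2L * 1024 * 1024 * 1024;  // hypothetical index size
        System.out.println("512 MiB -> " + buffersNeeded(halfGiB) + " arrays");
        System.out.println("2 GiB   -> " + buffersNeeded(twoGiB) + " arrays");
    }
}
```

Every one of those arrays is a separate heap object that the garbage collector has to track, which is the cost the warning points at.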

Now let's look at the FSDirectory class:

/**
 * Base class for Directory implementations that store index
 * files in the file system.  
 * <a name="subclasses"></a>
 * There are currently three core
 * subclasses:
 *
 * <ul>
 *
 *  <li>{@link SimpleFSDirectory} is a straightforward
 *       implementation using Files.newByteChannel.
 *       However, it has poor concurrent performance
 *       (multiple threads will bottleneck) as it
 *       synchronizes when multiple threads read from the
 *       same file.
 *
 *  <li>{@link NIOFSDirectory} uses java.nio's
 *       FileChannel's positional io when reading to avoid
 *       synchronization when reading from the same file.
 *       Unfortunately, due to a Windows-only <a
 *       href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734">Sun
 *       JRE bug</a> this is a poor choice for Windows, but
 *       on all other platforms this is the preferred
 *       choice. Applications using {@link Thread#interrupt()} or
 *       {@link Future#cancel(boolean)} should use
 *       {@code RAFDirectory} instead. See {@link NIOFSDirectory} java doc
 *       for details.
 *        
 *  <li>{@link MMapDirectory} uses memory-mapped IO when
 *       reading. This is a good choice if you have plenty
 *       of virtual memory relative to your index size, eg
 *       if you are running on a 64 bit JRE, or you are
 *       running on a 32 bit JRE but your index sizes are
 *       small enough to fit into the virtual memory space.
 *       Java has currently the limitation of not being able to
 *       unmap files from user code. The files are unmapped, when GC
 *       releases the byte buffers. Due to
 *       <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038">
 *       this bug</a> in Sun's JRE, MMapDirectory's {@link IndexInput#close}
 *       is unable to close the underlying OS file handle. Only when
 *       GC finally collects the underlying objects, which could be
 *       quite some time later, will the file handle be closed.
 *       This will consume additional transient disk usage: on Windows,
 *       attempts to delete or overwrite the files will result in an
 *       exception; on other platforms, which typically have a &quot;delete on
 *       last close&quot; semantics, while such operations will succeed, the bytes
 *       are still consuming space on disk.  For many applications this
 *       limitation is not a problem (e.g. if you have plenty of disk space,
 *       and you don't rely on overwriting files on Windows) but it's still
 *       an important limitation to be aware of. This class supplies a
 *       (possibly dangerous) workaround mentioned in the bug report,
 *       which may fail on non-Sun JVMs.
 * </ul>
 *
 */
public abstract class FSDirectory extends BaseDirectory {
...
}

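This class is also abstract, but it ships with several implementations, and the comment already spells out the strengths and weaknesses of each. In brief:

  1. SimpleFSDirectory: poor concurrent performance, because it synchronizes when multiple threads read from the same file.
  2. NIOFSDirectory: a poor choice on Windows because of a JRE bug, but the preferred choice on all other platforms.
  3. MMapDirectory: uses memory-mapped IO and performs well when there is plenty of virtual memory relative to the index size (a 64-bit JRE, or a 32-bit JRE with a small index). Because of a JRE bug, the file handle is only released once GC collects the underlying buffers, so deleted or overwritten files can keep occupying disk space for a while. It suits setups with address space and disk space to spare.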
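The three access styles named in the Javadoc can be tried with plain java.nio, without Lucene on the classpath. The sketch below is not how Lucene implements these classes; it only shows the JDK calls the comment refers to (Files.newByteChannel, positional FileChannel reads, and FileChannel.map). The class name, method names, and sample text are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: ReadStyles is a hypothetical helper, not a Lucene class.
final class ReadStyles {
    // Style 1: a channel with a single shared cursor. Seeking and reading are two
    // steps, so threads sharing one channel would have to synchronize.
    static String readSequential(Path file, long offset, int length) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            ch.position(offset);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    // Style 2: positional read. The offset is passed with each call and the
    // channel's own cursor is left untouched, so no shared state is modified.
    static String readPositional(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n == -1) break;
                pos += n;
            }
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    // Style 3: memory-mapped read. Closing the channel does not unmap the region;
    // the mapping lives until the buffer is garbage collected.
    static String readMapped(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] bytes = new byte[length];
            map.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-styles", ".txt");
        file.toFile().deleteOnExit(); // a mapped file may not be deletable right away on Windows
        Files.write(file, "hello lucene directory".getBytes(StandardCharsets.UTF_8));
        // All three calls read the same 6 bytes starting at offset 6.
        System.out.println(readSequential(file, 6, 6));
        System.out.println(readPositional(file, 6, 6));
        System.out.println(readMapped(file, 6, 6));
    }
}
```

The temporary file is removed with deleteOnExit rather than an immediate delete, because a still-mapped file is exactly the case the MMapDirectory comment warns about on Windows.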

Next, let's see which implementation we actually get when we create an instance with FSDirectory.open():

  public static FSDirectory open(Path path, LockFactory lockFactory) throws IOException {
    if (Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED) {
      return new MMapDirectory(path, lockFactory);
    } else if (Constants.WINDOWS) {
      return new SimpleFSDirectory(path, lockFactory);
    } else {
      return new NIOFSDirectory(path, lockFactory);
    }
  }

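As the code shows, MMapDirectory is preferred when the JRE is 64-bit and unmapping is supported (both conditions must hold). Otherwise, SimpleFSDirectory is used on Windows, and NIOFSDirectory, which is a poor choice on Windows, is used everywhere else. These conditions line up exactly with the comment on the FSDirectory class.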
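The branch order in open() can be mirrored as a plain function to check each combination by hand. This is only a sketch: the three booleans stand in for Constants.JRE_IS_64BIT, MMapDirectory.UNMAP_SUPPORTED, and Constants.WINDOWS, and the class name, method name, and returned strings are hypothetical.

```java
// Mirrors the if/else chain of FSDirectory.open() without depending on Lucene.
// DirectoryChoice and choose are illustrative names, not Lucene APIs.
final class DirectoryChoice {
    static String choose(boolean jreIs64Bit, boolean unmapSupported, boolean isWindows) {
        if (jreIs64Bit && unmapSupported) {
            return "MMapDirectory";      // both conditions are required
        } else if (isWindows) {
            return "SimpleFSDirectory";  // fallback on Windows
        } else {
            return "NIOFSDirectory";     // fallback everywhere else
        }
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        // Print the full truth table for the three inputs.
        for (boolean is64 : flags) {
            for (boolean unmap : flags) {
                for (boolean win : flags) {
                    System.out.println("64bit=" + is64 + " unmap=" + unmap
                            + " windows=" + win + " -> " + choose(is64, unmap, win));
                }
            }
        }
    }
}
```

Note that a 64-bit JRE without unmap support falls through to the second and third branches, which is why the condition has to be read as "and" rather than "or".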

Q.E.D.


A Java programmer who is good at front-end development