Saturday, April 12, 2014

Directory Scalability of File Systems

Introduction

Recently I've been hearing from several people that having too many files in a directory is bad for performance and that you need to shard the files across multiple directories instead. If you're familiar with file system internals, you'll notice this claim doesn't sound right at all.

Most mainstream production-quality file systems keep a B+ tree (or a variant) index per directory. B+ trees are very scalable data structures that rebalance themselves when a node's fill rate drops below a certain threshold. And since each node is a large contiguous block containing hundreds of entries, they work very well with sector-based persistent storage.
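
Back-of-the-envelope, that fanout is what keeps directory indexes shallow. A minimal sketch, assuming 4 KiB index nodes and roughly 32 bytes per entry (made-up numbers, not any real on-disk format):

#import <Foundation/Foundation.h>
#include <math.h>

int main(void)
{
  @autoreleasepool {
    // Assumed, illustrative numbers: 4 KiB nodes, ~32 bytes per directory entry.
    double entriesPerNode = 4096.0 / 32.0;                 // ~128-way fanout
    for (double files = 1e3; files <= 1e6; files *= 10) {
      // B+ tree height grows with log_fanout(n), so it stays tiny.
      double height = ceil(log(files) / log(entriesPerNode));
      NSLog(@"%.0f files -> tree height of roughly %.0f levels", files, height);
    }
  }
  return 0;
}

Even at a million entries the index is only about three levels deep, so a lookup only touches a handful of nodes.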

Similarly, most database servers use B+ trees for their indexes, and they handle billions of records per table without trouble.

When you combine these, it should be intuitive to expect file systems to handle numerous files in a directory without too much trouble.

Edit: It turns out HFS+ doesn't have a per-directory B+ tree index after all. It has one global index, called the catalog file, and directory traversal happens through this index. Since every directory goes through the same global index, sharding files across several directories for performance reasons is pretty much irrelevant.

Experiment

This information should convince most software engineers, but partly out of curiosity and partly to convince the remaining people, I did a small experiment to see how a file system directory scales with respect to the number of files.

Experiment Environment

For a while now I've been working with Apple development tools, so the experiment setup consists of Apple technologies.


Operating System: Mac OS X 10.9.2
Computer: MacBook Pro Retina, 3 GHz Intel Core i7, 8 GB 1600 MHz DDR3, APPLE SSD SM256E Media
File System: HFS+
Programming Language: Objective-C

Experiment Setup

In the experiment I set up different directories and populated them with 1K, 10K, 100K, and 1M very small files (4 bytes each) respectively. I measured the time to populate each directory, the time to enumerate all files in it, and the time to look up 1K files in it.

I'm sure someone will be troubled by the small file sizes. I kept the files as small as possible because file size is irrelevant to directory metadata; file data and directory metadata are kept completely separate by the file system. By keeping the files small I kept the file data write overhead to a minimum, so most of the time taken to populate the directories would be spent on updating directory metadata rather than writing the files themselves.

Experiment Results

All measurements are in seconds of wall clock time. For the enumeration and lookup timings I rebooted the computer to make sure the file system caches were purged. Indeed, without the cache purge I observed that both lookups and enumerations are much faster.



                                               1K Files    10K Files    100K Files    1M Files
Time to create the files                       0.22s       2.07s        21.48s        252.59s
Time to look up 1K files                       0.039s      0.14s        0.30s         0.34s
Time to enumerate all files in the directory   0.032s      0.15s        1.21s         19.98s


Conclusion

As you can see from the results, the HFS+ file system handled up to 1 million files in a directory extremely well. Total file creation and enumeration times increased roughly linearly with the number of files. Looking up 1 thousand files took pretty much the same time for all directories.

The total time to enumerate 1 million files diverges slightly from the trend of the other directories, but this is probably about memory locality: my absurdly simple experiment code reads all file names into a single array, and an array with 1 million elements is big enough to put a strain on that.

I did this experiment on HFS+, but I'm fairly sure all mainstream file systems like NTFS, ext3, ext4, etc. would produce similar results.

Appendix A

Why Do People Think File Systems Can't Handle Too Many Files in a Directory

I feel like this is a residue in software engineers' collective memory. In the early 90s file systems were much more primitive compared to today's. Most didn't even have journaling, so you could lose file system metadata when the system shut down abruptly. But starting from the mid 90s, most major operating system platforms gained robust file systems that also scaled much better than their predecessors.

Appendix B

Why Does 'ls -l' Take Much Longer Than Plain 'ls'

ls -l returns a bunch of file attributes along with the file names, whereas ls just returns the file names. File attributes are part of the file data, so they don't really concern the directory and aren't kept with the directory metadata. In effect, doing ls -l requires looking up the files one by one and reading their attributes. This is why it's wildly slower than plain ls.
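
To make the difference concrete, here's a minimal sketch using the same NSFileManager API as the experiment code (the directory path is a placeholder and error handling is omitted): listing the names is a single directory read, while the attributes cost one extra lookup per file.

#import <Foundation/Foundation.h>

int main(void)
{
  @autoreleasepool {
    NSString *dir = @"your path here";   // placeholder, as in the experiment code
    NSFileManager *fm = [NSFileManager defaultManager];

    // Plain 'ls': one pass over the directory metadata returns all the names.
    NSArray *names = [fm contentsOfDirectoryAtPath:dir error:NULL];

    // 'ls -l': an extra attribute lookup for every file on top of that.
    for (NSString *name in names) {
      NSString *full = [dir stringByAppendingPathComponent:name];
      NSDictionary *attrs = [fm attributesOfItemAtPath:full error:NULL];
      (void)attrs;   // size, modification date, permissions, ...
    }
  }
  return 0;
}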

Appendix C

Should I Rule Out Directory Sharding Forever

For 99.999% of software engineers, yes, at least for performance reasons. If you're building a file server that will host millions of files for thousands of very active users, then a big maybe. Here's why: file systems keep in-memory objects representing the entities on disk, like files and directories, which are called FCBs (file control blocks) in the Windows world. An FCB is created per file on disk, not per open handle on that file. So the FCB for a very large directory can become a hot spot under heavy load: per-FCB resources like locks will be under contention and can potentially hurt performance.

I'm personally skeptical even about this, because per-FCB resources mostly matter for file open time. And that's only one part of the I/O performance picture, alongside read, write, flush, ioctl/sysctl, etc. times.
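
If you ever do end up sharding, one common scheme is to hash the file name into a fixed number of bucket directories. A minimal sketch, with made-up choices (256 buckets, NSString's built-in hash, which isn't guaranteed to be stable across OS versions, so a real implementation should use its own hash):

#import <Foundation/Foundation.h>

// Hypothetical helper: maps a file name to one of 256 bucket directories under root.
NSString *shardedPath(NSString *root, NSString *fileName)
{
  NSUInteger bucket = [fileName hash] % 256;   // assumed bucket count
  NSString *dir = [root stringByAppendingPathComponent:
                   [NSString stringWithFormat:@"%02lx", (unsigned long)bucket]];
  [[NSFileManager defaultManager] createDirectoryAtPath:dir
                            withIntermediateDirectories:YES
                                               attributes:nil
                                                    error:NULL];
  return [dir stringByAppendingPathComponent:fileName];
}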

Appendix D

Experiment Code

Sorry for the crudity of the code. For the different directory setups I manually modified parts of the code, so you can't really use it out of the box.

#import <Foundation/Foundation.h>

#import <QuartzCore/QuartzCore.h>   // for CACurrentMediaTime()


NSString *path = @"your path here";

// Checks the existence of the first 1K files and reports the wall clock time taken.
void lookup1KFiles(void)
{
  int found = 0;
  double start = CACurrentMediaTime();
  
  for (int i = 0; i < 1000; i++) {
    
    if ([[NSFileManager defaultManager] fileExistsAtPath:[NSString stringWithFormat:@"%@/%d.txt", path, i]
                                             isDirectory:NULL]) {
      found++;
    }
  }
  
  double end = CACurrentMediaTime();
  
  NSLog(@"lookup time %lf", end - start);
  
  NSLog( @"Lookup found %d", found);
  
}

// Creates the tiny test files and reports the wall clock time taken.
// The loop bound below was changed by hand (1K, 10K, 100K, 1M) between runs.
void populate(void)
{
  NSData *data = [NSData dataWithBytes:"abcd" length:4];
  
  double start = CACurrentMediaTime();
  
  //NSString *uuid = nil;
  
  for (int i=0; i < 1000; i++) {
    
    if (i % 10000 == 0) {
      //uuid = [[NSUUID UUID] UUIDString];
      NSLog(@"%d", i);
    }
    
    [data writeToFile:[NSString stringWithFormat:@"%@/%d.txt", path, i]
           atomically:NO];
    
    
  }
  
  double end = CACurrentMediaTime();
  
  NSLog(@"populate time %lf", end - start);
}

// Enumerates all files in the directory, both by path and by URL, and reports
// the combined wall clock time taken.
void enumerate(void)
{
  NSDirectoryEnumerationOptions options = NSDirectoryEnumerationSkipsSubdirectoryDescendants |
  NSDirectoryEnumerationSkipsPackageDescendants |
  NSDirectoryEnumerationSkipsHiddenFiles;
  
  NSURL *url = [NSURL fileURLWithPath:path];
  
  
  double start = CACurrentMediaTime();
  
  __unused NSArray *array = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:path error:nil];
  
  [[NSFileManager defaultManager] contentsOfDirectoryAtURL:url
                                includingPropertiesForKeys:[[NSArray alloc] initWithObjects:NSURLIsRegularFileKey, nil]
                                                   options:options
                                                     error:nil];
  
  double end = CACurrentMediaTime();
  
  NSLog(@"enumerate time %lf", end - start);
  NSLog(@"count %d", (int)[array count]);
  
  return;
}
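
// Hypothetical driver: runs the three measurements in order.
// Set 'path' above to an existing directory before building.
int main(void)
{
  @autoreleasepool {
    populate();
    lookup1KFiles();
    enumerate();
  }
  return 0;
}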
