tag:blogger.com,1999:blog-6942931718640649372024-03-12T17:42:57.720-07:00System Software DiscussionsAdhoc discussions about operating systems, file systems, memory management, distributed systems etc.Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.comBlogger10125tag:blogger.com,1999:blog-694293171864064937.post-83870068434483247952017-02-13T21:51:00.001-08:002017-02-13T21:51:54.301-08:00Notes on C++ Exception Handling and Stack Unwinding<br />
<h2>
Introduction</h2>
<div>
If you take a look at the LLVM <a href="http://llvm.org/docs/ExceptionHandling.html">documentation</a>, you might be surprised to see that different processor architectures have their own exception handling ABI, and there's an entire section on how things work on Windows. <b>Why would processors care about C++ exceptions, and why would things work differently on Windows - what is it to the OS what a C++ program does within the user space (opposed to the kernel space)?</b></div>
<div>
<br /></div>
<div>
There are interesting answers to these questions, but first let's take a closer look at the C++ exceptions.</div>
<div>
<br /></div>
<h2>
Exceptions Demystified</h2>
<div>
C++ exceptions give the look and feel of magic. Based on throw and catch statements the control flow seems to jump over functions, doing proper cleanup and knowing when to stop. When you look under the <a href="https://monoinfinito.wordpress.com/series/exception-handling-in-c/">hood</a> it is no surprise that it feels like magic - especially the catching of an exception is really complicated. Breaking things down and making them more concrete helps us better understand things so let's try to break the life of an exception into more concrete albeit high level steps.</div>
<div>
<br /></div>
<div>
Everything starts by compiler's transformation of the keywords <i>throw</i> and <i>catch</i>. Throw is replaced with a pair of <i><span style="font-family: Courier New, Courier, monospace;">__cxa_allocate_exception</span></i> and <i><span style="font-family: Courier New, Courier, monospace;">__cxa_throw</span></i> calls, catch is replaced with a pair of <i><span style="font-family: Courier New, Courier, monospace;">cxa_begin_catch</span></i> and __<i><span style="font-family: Courier New, Courier, monospace;">cxa_end_catch</span></i> calls. These are functions in the C++ runtime. Now at least we know that those keywords are just some function calls under the hood. Now let's look at the high level flow of events when an exception is thrown during program execution:</div>
<div>
<br /></div>
<div>
<ol>
<li><i><span style="font-family: Courier New, Courier, monospace;">__cxa_allocate_exception</span></i> allocates the exception to be thrown.</li>
<li><i><span style="font-family: Courier New, Courier, monospace;">__cxa_throw</span></i> starts the chain of events involved in throwing and handling the exception.</li>
<ol>
<li>This call never returns. Either the exception is caught by some call frame on the stack and execution continues from that point, or no one catches it and <i><span style="font-family: Courier New, Courier, monospace;">std::terminate</span></i> is called</li>
</ol>
<li><i><span style="font-family: Courier New, Courier, monospace;">__Unwind_RaiseException</span></i> (comes with C++ runtime, part of the Unwind library) walks back the function call frames on the stack one by one.</li>
<ol>
<li>To be able to walk back the stack, <i><span style="font-family: Courier New, Courier, monospace;">__Unwind_RaiseException</span></i> uses the CFI (Call Frame Information) that's found in the .eh_frame section of the object/executable file (e.g ELF).</li>
</ol>
<li>As <i><span style="font-family: Courier New, Courier, monospace;">__Unwind_RaiseException</span> </i>walks back the stack, for each frame it calls something called the personality function (<i><span style="font-family: Courier New, Courier, monospace;">__gxx_personality_v0</span></i> for GCC) for that frame. The personality function is a language specific function which</li>
<ol>
<li>Checks whether the current frame will be able to catch the function.</li>
<li>Does the proper cleanup as the current frame is unwound, like calling destructors for objects going out of scope.</li>
</ol>
<li>When the personality function finds a frame that can catch the exception, control is transferred to the corresponding <i>landing pad</i>, from where execution continues.</li>
<li>If no call frame on the stack can catch the exception then <i><span style="font-family: Courier New, Courier, monospace;">std::terminate</span></i> is called.</li>
</ol>
</div>
<h2>
Unwinding the Stack - C++ ABI</h2>
<div>
In the previous section it might have caught your eye that we need some sort of metadata in the object file (e.g ELF) to be able to find the previous stack frames. If you have looked at how the x86 function call stack works this probably sounds weird to you, why can't I just follow the EBP register (which keeps the frame pointer) to find the previous stack frames?<div class="separator" style="clear: both; text-align: center;">
</div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://2.bp.blogspot.com/-gNd5RTGpQ1Q/WKKKMbnpRJI/AAAAAAAAALg/nNAZeqHW9dEO1RlzdgyJRc5f7dgmUbdHACLcB/s1600/stack-convention.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://2.bp.blogspot.com/-gNd5RTGpQ1Q/WKKKMbnpRJI/AAAAAAAAALg/nNAZeqHW9dEO1RlzdgyJRc5f7dgmUbdHACLcB/s320/stack-convention.png" width="320" /></a></div>
<div>
<br /></div>
<div>
Yes, on x86 processors you don't need any metadata to be able to walk back the stack because the frame pointers (EBP) are saved on the stack and practically for a linked list that you can traverse. However x86 is an old architecture at this point and <b>newer architectures like x86-64, Itanium, and ARM don't necessarily save the frame pointer on the stack - they may not even have a dedicated register for the frame pointer.</b> [1] Compiler generates code in a way that it can always refer to local variables with varying offsets to the stack pointer (ESP) [2].</div>
<div>
<br /></div>
<div>
This is where the C++ ABI and eh_frame/CFI (Call Frame Information) come in. C++ ABI for specific platforms (e.g [3]) specify what kind of metadata needs to be generated by the compiler to enable stack unwinding. This metadata is used by</div>
<div>
<ol>
<li>Exception handling by the runtime</li>
<li>Stack back trace calls in the program itself</li>
<li>Debuggers</li>
</ol>
<b>It's not surprising this ABI is usually specified per processor architecture - it directly relates to processor registers. </b></div>
<div>
<b><br /></b></div>
<div>
You can stop your compiler from generating this metadata by using the <span style="font-family: Courier New, Courier, monospace;"><i>-fno-unwind-tables</i></span> flag, in which case I'd imagine exceptions won't work and you can't see the stack trace in a debugger.</div>
<div>
<br /></div>
Stack unwinding forms the language independent part of the exception handling. There's also a language specific part, which we'll look at next.<br />
<br />
<h2>
Personality Function and Call Frame Cleanup</h2>
<div>
So the unwinding process is able to walk back the stack without knowing anything about the programming language or the compiler - thanks to the ABI - which is great. But someone needs to handle two things:</div>
<div>
<ol>
<li>Understanding whether current call frame (during the unwind process) can catch this exception</li>
<li>Cleaning up the current frame, e.g calling the destructors for the proper objects on the stack</li>
</ol>
<div>
These are very language specific, in fact compiler specific actions. This is why compiler generates something called personality function, which is simply one function per compiler, <i><span style="font-family: Courier New, Courier, monospace;">__gxx_personality_v0</span></i> for GCC. This function is called by the unwinder [4] for each frame on the call stack and does the two things mentioned. To be able to do those, the personality function uses the LSDA (language specific data area), which is metadata generated for each function and placed at the end of the function's code. LSDA contains information to determine what exceptions the function can catch [5] and if it doesn't catch the exception how to clean up this frame (e.g destructors).</div>
</div>
<div>
<br /></div>
<div>
The cleanup metadata that is in the LSDA is called exception table, GCC calls it <i><span style="font-family: Courier New, Courier, monospace;">gcc_except_table</span></i>. The <span style="font-family: Courier New, Courier, monospace;"><i>-fno-exeptions</i></span> compiler flag tells the compiler not to generate this metadata. An exception thrown by such a program would result in a call to <span style="font-family: Courier New, Courier, monospace;"><i>std::terminate</i></span>.</div>
<div>
<br /></div>
<h2>
Nothrow</h2>
<div>
After reading the last section it should be clear why a C++ program terminates when an exception escapes a <span style="font-family: Courier New, Courier, monospace;"><i>nothrow</i></span> function. For such a function the compiler wouldn't generate the exception table which would make the cleanup of the call frames for that function impossible during unwinding. The runtime has no option but to terminate.</div>
<div>
<br /></div>
<h2>
One Pass vs. Two Pass Exception Handling</h2>
<div>
The high level exception handling flow given in the second section is missing one aspect. Some runtimes decide not to do call frame cleanup (e.g destructors) until they make sure someone will handle the exception. They take a first pass at the stack unwinding, trying to locate a frame that will catch the exception. If none found, runtime directly calls <span style="font-family: Courier New, Courier, monospace;"><i>std::terminate</i></span> without bothering with frame cleanups. If a frame catches the exception then the runtime does another pass on the stack this time actually calling the personalization function for frame cleanup. Apparently GCC works this way.</div>
<div>
<br /></div>
<h2>
Exceptions in Windows</h2>
<div>
This is interesting, if you spent your life in the UNIX universe you must be thinking what the hell does an OS to do with exceptions. You're right, UNIX systems don't have the notion of exceptions. They know about signals, and exceptions are an entirely application concept.</div>
<div>
<br /></div>
<div>
Windows is different - it has the concept of exceptions and it manages them closely. Architects of Windows made the interesting choice to <b>unify hardware exceptions (e.g page fault) with software exceptions</b>. The OS defines what an exception looks like, how the exception handling frames should be setup etc. and this is called <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms679270(v=vs.85).aspx">Structured Exception Handling</a> [6]. This has some cool implications, you can catch hardware exceptions like page fault or divide by zero in your C++ catch block. You just need to compile with the SEH flag enabled. This is not possible on UNIX systems, hardware exceptions are delivered as UNIX signals.</div>
<div>
<br /></div>
<div>
On Windows there's a system API that's language agnostic to throw exceptions - <span style="font-family: Courier New, Courier, monospace;"><i>RaiseException</i></span>. <b>Thanks to the extensive OS support Microsoft's C compiler also supports exceptions, through the <span style="font-family: Courier New, Courier, monospace;"><i>__throw</i></span> and <span style="font-family: Courier New, Courier, monospace;"><i>__catch</i></span> keywords.</b></div>
<div>
<br /></div>
<h2>
First Chance / Second Chance Exceptions</h2>
<div>
If you have debugged a Windows application that throws and catches exceptions you might have noticed something peculiar in the debugger. It'll print statements about 'first chance exception's while executing. This is basically an OS provided debug port feature, the OS provides debuggers insights into the exceptions that are happening in the program while the program is running and the debugger is not broken into. First chance exception means an exception has been thrown but stack unwind hasn't happened yet so the program code might still catch the exception. Second chance exception means the stack unwind has finished but no call frame has caught the exception so the program is about to crash [7].</div>
<div>
<br /></div>
<div>
As far as I know this is a pretty unique Windows feature and it is possible because the OS is involved in the exception throwing/handling process.</div>
<h2>
<br /></h2>
<h2>
Notes</h2>
<div>
[1] Some ARM compilers seem to reserve a register as the frame pointer, but this is more of a configuration/implementation detail for that given compiler - unlike the universal use of the EBP register on x86. So runtimes and tools still need to rely on eh_frame and CFI to be able to work for all compilers on the same ABI.</div>
<div>
[2] "In common with the IA-64 runtime architecture, the ARM Exception ABI specifies separate, per-function unwinding
tables indexed by program counter."</div>
<div>
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf</div>
<div>
[3] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf</div>
<div>
[4] Unwinding runtime code knows where to find the personality function thanks to the CFI (Call Frame Information)</div>
<div>
[5] Obviously this depends on where the program counter (PC) was when that frame made the call to another function</div>
<div>
[6] Apparently this was later replaced by Vector Exception Handling but still the OS is a key player in exception throwing and handling.</div>
<div>
[7] https://support.microsoft.com/en-us/help/105675/first-and-second-chance-exception-handling</div>
<div>
<br /></div>
<div>
<br /></div>
<h2>
References</h2>
<div>
<br />
<br />
<br />
<ul>
<li><a href="http://llvm.org/docs/ExceptionHandling.html" style="font-family: Times, "Times New Roman", serif;">http://llvm.org/docs/ExceptionHandling.html</a></li>
</ul>
<ul>
<li><a href="https://monoinfinito.wordpress.com/series/exception-handling-in-c/" style="font-family: Times, "Times New Roman", serif;">https://monoinfinito.wordpress.com/series/exception-handling-in-c/</a></li>
</ul>
<ul>
<li><a href="http://blog.reverberate.org/2013/05/deep-wizardry-stack-unwinding.html" style="font-family: Times, "Times New Roman", serif;">http://blog.reverberate.org/2013/05/deep-wizardry-stack-unwinding.html</a></li>
</ul>
<ul>
<li><a href="https://gnu.wildebeest.org/blog/mjw/2007/08/23/stack-unwinding/" style="font-family: Times, "Times New Roman", serif;">https://gnu.wildebeest.org/blog/mjw/2007/08/23/stack-unwinding/</a></li>
</ul>
<ul>
<li><a href="http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf" style="font-family: Times, "Times New Roman", serif;">http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf</a></li>
</ul>
<ul>
<li><a href="http://www.logix.cz/michal/devel/gas-cfi/" style="font-family: Times, "Times New Roman", serif;">http://www.logix.cz/michal/devel/gas-cfi/</a></li>
</ul>
<ul>
<li><a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ms679270(v=vs.85).aspx" style="font-family: Times, "Times New Roman", serif;">https://msdn.microsoft.com/en-us/library/windows/desktop/ms679270(v=vs.85).aspx</a></li>
</ul>
<br />
<br />
</div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com2tag:blogger.com,1999:blog-694293171864064937.post-63332218217656361982014-06-26T18:05:00.002-07:002014-06-26T18:11:29.779-07:00What Constitutes an OS Kernel?<br />
<h2>
<span style="font-size: x-large;">
Introduction</span></h2>
<div>
The question in the title might seem very trivial to most. But you'd be surprised if you knew the extent of kernel being a misunderstood concept. I have heard following sentences several times in my life and to a person familiar with OS kernels they are really absurd.</div>
<div>
<br /></div>
<div>
<ul>
<li>They changed the kernel in this version</li>
<li>They used the same kernel in this version</li>
<li>Until Windows 8 they have been using the same kernel since Windows 98</li>
<ul>
<li>This is wrong in so many levels, such that I have to specifically talk about this later in this post.</li>
</ul>
</ul>
<div>
I'll stick to Windows NT as an example here but mainstream OSes like Linux and Mac OS X are similar in this perspective.</div>
</div>
<h2>
<span style="font-size: x-large;">
Kernel Mode</span></h2>
<div>
Before diving deep into discussion we should define the 'kernel mode' concept. Processors have different privilege levels. At the most privileged level anything is possible. At lesser privilege levels certain actions are restricted. Although most modern processors support several (e.g 4) privilege levels (rings) mainstream operating systems like Windows and Linux only make user 2 of them for practical reasons. These two privilege levels are called user mode and kernel mode.</div>
<div>
<br /></div>
<div>
User mode is what all user programs run in. In this mode processor can't execute certain instructions, can't access certain registers etc. This is to keep user mode programs from damaging and destabilizing the entire system. Kernel mode is the opposite and everything is allowed.</div>
<h2>
<span style="font-size: x-large;">
Potential Definitions of the 'Kernel'</span></h2>
<div>
Most user mode programmers think of the kernel as an atomic indivisible piece of software that magically does everything user mode programs need, from file system management to hardware bus communication. Funny enough if you stick to the stricter definitions of a kernel these two jobs fall out of the kernel boundary.</div>
<div>
<br /></div>
<div>
First of all you should understand that there isn't even a single definition of the kernel. When people say kernel they might be referring to many significantly different things. Please take a moment to take a look at the architecture diagram below for Windows NT. As you can see the boxes below the 'kernel' line are numerous and complicated in their interactions.</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.microsoft.com/resources/documentation/windowsnt/4/workstation/reskit/en-us/images/sysarchc.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://www.microsoft.com/resources/documentation/windowsnt/4/workstation/reskit/en-us/images/sysarchc.gif" width="295" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Now let's start with potential definitions of the kernel from most narrow to most inclusive.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<h3 style="clear: both; text-align: left;">
<span style="font-size: large;">
Core Kernel</span></h3>
<div>
'Core kernel' would refer to pieces of software that run in kernel mode, written by the OS vendor (e.g Microsoft, Apple) and contains only pieces that are minimal in the services they provide. These services are mainly:</div>
<div>
<ul>
<li>Synchronization primitives</li>
<li>Thread dispatcher, or simply the dispatcher</li>
<li>Interrupt dispatcher/handler</li>
</ul>
<div>
This definition of the kernel somewhat incorrectly corresponds to the microkernel in the above diagram. kernel components that seem very core and low level as virtual memory manager are left out of this definition. This may seem really weird but from an architectural standpoint it is correct as this core part of the kernel doesn't depend on the rest of the kernel in what they do.</div>
</div>
<div>
<br /></div>
<h3>
<span style="font-size: large;">
Core Kernel + Executive Components</span></h3>
<div>
You can view this definition of the kernel as the 'static kernel'. By static I'm referring to the fact that components that are part of this definition aren't dynamically loaded, they are always present and they can't be replaced. For example kernel wouldn't dynamically load a virtual memory manager or unload it. Same for I/O manager. These 'static' parts of the kernel are also written by the OS vendor company. A sample of these static kernel components are:</div>
<div>
<ul>
<li>Virtual memory manager</li>
<li>I/O manager</li>
<li>Security subsystem</li>
<li>Cache manager</li>
</ul>
<div>
Note that crucial parts of the system such as file systems or TCP driver (yes it's a driver in Windows) aren't included in this definition as they are dynamic and can be loaded/unloaded.</div>
</div>
<div>
<br /></div>
<h3>
<span style="font-size: large;">
Core Kernel + Executive + Drivers Shipped by the Vendor</span></h3>
<div>
This definition of the kernel extends the previous one by including the drivers that are developed by the vendor. I'm not saying that come off the box because some drivers that come off the box are still 3rd party software. These drivers are considered crucial parts of the OS yet they're architecturally dynamic and different than the rest of the kernel. I think a great example of these drivers is the NTFS driver. You can't imagine a Windows kernel without it yet technically it's an outer part of the kernel and it's dynamic by nature.</div>
<div>
<br /></div>
<h3>
<span style="font-size: large;">
Anything That Runs in Kernel Mode Including 3rd Party Drivers</span></h3>
<div>
This is also a very legitimate definition of the kernel. Running in the kernel mode gives unlimited access to the processor and the hardware and presents similar challenges to both vendor shipped code and 3rd party code. These are challenges and concepts that simply don't exist in the user mode. A few examples are:</div>
<div>
<ul>
<li>More complicated synchronization primitives (like fast mutexes and spinlocks. you might think you used spinlocks in user mode but they're substantially different)</li>
<li>Responding to interrupts</li>
<li>Dealing with different processor interrupt request levels</li>
<li>Being able to use non-paged memory opposed to all memory being paged. This is not just a matter of ability but is sometimes a requirement</li>
</ul>
<div>
Most importantly any crash in kernel mode drivers is as fatal as a crash in the core kernel.</div>
</div>
<div>
<br /></div>
<h2>
<span style="font-size: x-large;">
Wrap-up</span></h2>
<div>
So going back to where we started, when you hear someone talking about kernel changing or not changing now it should be clear why that discussion doesn't make much sense. First of all, what do you mean by kernel?</div>
<div>
<br /></div>
<div>
With every OS release there are definitely some changes to what can be defined as kernel at any level. </div>
<div>
<br /></div>
<div>
<ul>
<li>OS vendors ship new drivers all the time (like the reFS file system driver that shipped with Windows Server 2012)</li>
<li>They extend existing drivers all the time (e.q TCP driver adding support for IPv6)</li>
<li>There are new APIs added to executive level components such as I/O manager, object manager</li>
<li>Even core kernels change with every release, sometimes in big ways sometimes in small ways. For example Microsoft made the dispatcher lock more granular in Windows7 which resulted in a better multi processor support with less lock contention</li>
</ul>
<div>
My impression is when people see that their existing drivers don't work in a new release of the OS they think the kernel has changed. Most of the time this is simply because of a breaking change in a kernel mode API, most likely in the I/O manager as kernel drivers interact with it extensively. This tells very little about the over all changes made in the kernel mode components.</div>
</div>
<div>
<br /></div>
<h2>
<span style="font-size: x-large;">
Appendix - Windows 98 to Windows 8</span></h2>
<div>
This one is really funny. Windows 98 and Windows 8 don't even belong to the same family of OSes. </div>
<div>
<br /></div>
<div>
Until Windows XP Microsoft built two different families of OSes, Windows and Windows NT. Windows was a descendant of DOS and it was an inferior OS in many ways. However Windows NT was an OS designed by David Cutler later in Microsoft and it was a much modern and robust architecture compared to Windows. These two OSes coexisted until 2000. Following Windows 95 and Windows 98, the Windows line had its final release in Windows ME. Following the Windows NT 4.0, Windows 2000, Windows XP line Windows NT is still alive and the latest release has been Windows 8.1 in 2013.</div>
<div>
<br /></div>
<div>
So saying Windows had the same kernel since Windows 98 until Windows 8 doesn't make any sense.</div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0tag:blogger.com,1999:blog-694293171864064937.post-35694043816860096152014-05-31T18:44:00.008-07:002014-05-31T20:18:32.415-07:00Volatile, Memory Barriers, and Load/Store Reordering<h2>
<span style="font-size: x-large;">
Introduction</span></h2>
<div>
Volatile is a memory access keyword in several languages which is much more complicated and confusing in practice then what it appears to be. There are pieces of information about volatile and related concepts everywhere but it's hard to build a complete understanding. Let's first understand the underlying concepts.</div>
<div>
<br /></div>
<h2>
<span style="font-size: x-large;">
Some Memory Concepts</span></h2>
<h3>
Load/Store</h3>
<div>
This is a more formal terminology for memory read/write. Load means brining something from main memory (ram) to processor registers, and store is the same thing in other direction.<br />
<br /></div>
<h3>
Load/Store Omission by the Compiler</h3>
<div>
Every memory read/write in a program source code doesn't have to result in a processor instruction that does an actual load/store. Compilers can decide to cache values instead of actually loading them from memory when they think it doesn't change the results of the program. This can result in buggy programs in multi threaded programs and in programs with 'unusual memory'.<br />
<br /></div>
<h3>
'Unusual Memory'</h3>
<div>
Unusual memory refers to a memory location that can change in value without compiler having anyway to know it. Hardware mapped memory is such a case. If a sensor hardware is updating a memory location on its own, compiler can't know about this and incorrectly omit stores from that memory location assuming the value will never change.<br />
<br /></div>
<div>
<h3>
Load/Store Reordering by the Compiler</h3>
</div>
<div>
Compilers can reorder loads and stores in a program for optimization reasons, if the reordering won't change program results. This logic can breakdown in multithreaded programs and cause buggy behavior in the program.<br />
<br /></div>
<h3>
Out of Order Execution and Cache Coherency</h3>
<div>
Similar to compilers reordering loads/stores, CPUs can do the same via 'out of order execution'. This means that a CPU might execute instructions in a slightly different order then they appear in computer memory which includes instructions getting reordered and interleaved execution of two or more instructions. Interleaved execution means instructions aren't strictly ordered in their execution i.e one instruction might begin its execution before the 'already in progress' instruction is 'finished'.</div>
<div>
<br /></div>
<div>
Also, when a CPU stores a value in a memory location, all CPUs may not see the same stores in the same order because of cache coherency issues i.e each processor having its own cache of the main memory.</div>
<div>
<br /></div>
<div>
These issues create buggy behavior in multithreaded programs. To prevent this programmers can use memory barriers, which are special instructions telling the CPU not to do out of order execution across the barrier instruction. Memory barriers also ensure cache consistency which means we won't get hurt by cache coherency problems.</div>
<div>
<br /></div>
<h3>
Atomic Load/Store</h3>
<div>
Loading OR storing a primitive value from main memory may or may not be atomic. Don't confuse this with 'read, modify, write' atomicity which can only be achieved either by proper locking or special 'interlocked' instructions. On certain hardware platforms loading an integer may not be atomic i.e some bytes of the integer may be modified while we're reading the integer which means we'll end up with a value of the integer that was never the value of it any point in time. Pointer aligned word sized (32bits) load/store operations on X86 architecture are atomic.</div>
<div>
<br />
<h2>
<span style="font-size: x-large;">
Volatile in C/C++ vs. Java and C#</span></h2>
</div>
<div>
Volatile has different meanings in different programming languages. Let's see these differences.</div>
<div>
<br /></div>
<h3>
C/C++</h3>
<div>
Volatile provides:</div>
<div>
<ul>
<li>When a volatile variable is loaded/stored, it will actually be loaded/stored and the load/store won't be cached or omitted. Code reading volatile variable will always see the latest value.</li>
<li>Loads/Stores of volatile variables won't be reordered i.e they'll appear in the same order they appear in the source code.</li>
</ul>
<div>
Volatile doesn't provide:</div>
</div>
<div>
<ul>
<li>Atomic load/store of the volatile variable. For example if the volatile variable is bigger than word size or it's not pointer aligned on an X86 h/w load/store operations on the variable may not be atomic. Again this is different than read, modify, write atomicity which is an altogether different problem.</li>
<li>Guarantee of not reordering non volatile load/store operations around the volatile loads/stores. Compiler can place a non volatile store operation after a volatile store operation even if non volatile store happens before in the source code. This is actually called the 'happens before' relationship, and volatile keyword doesn't provide that in C/C++.</li>
<li>Memory barriers around volatile loads/stores. CPU can still reorder instructions around load/store of volatile variables and there can still be CPU cache inconsistencies.</li>
</ul>
<div>
Because of these issues volatile is only recommended for memory mapped I/O (i.e unusual memory) in C/C++. If our only concern is seeing the latest value set by a h/w device or some other process that we're sharing memory with, volatile does the job.<br />
<br /></div>
<h3>
Java/C#</h3>
</div>
<div>
Volatile provides:</div>
<div>
<ul>
<li>Guarantee that load/store operations won't be cached/omitted. Code reading volatile variable will always see the latest value.</li>
<li>Guarantee that load/store operations of volatile variables won't be reordered, including the non volatile loads/stores around them. This guarantee came to Java starting with Java 5 only. We'll see why this is important in the case study.</li>
<li>Atomic load/store of the volatile variable i.e loads/stores to a volatile variable won't be interleaved.</li>
<li>Memory barriers. CPU won't do out of order execution of volatile loads/stores (and other loads/stores around them) and it won't cause cache inconsistencies.</li>
</ul>
<div>
Volatile doesn't provide:</div>
</div>
<div>
<ul>
<li>A solution to all your multi threading problems. There are countless scenarios where you'd need proper locking via mutexes etc. or interlocked API that operate on data atomically.</li>
</ul>
</div>
<h2>
<span style="font-size: x-large;">
A Case Study</span></h2>
<div>
Let's look at a small piece of pseudocode which will provide us with the basis for discussion.</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace;">MyType my_global;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">volatile bool f_my_global_ready = false;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">void thread1()</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">{</span><br />
<span style="font-family: 'Courier New', Courier, monospace;"> my_global = setup_my_object();</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> f_</span>my_global_ready = true;</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">}</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">void thread2()</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">{</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>while (!f_my_global_ready);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;"><span class="Apple-tab-span" style="white-space: pre;"> </span>use_my_global_object();</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">}</span></div>
</div>
<div>
<br /></div>
<div>
Code is fairly simple. One thread initializes a global object and sets the flag indicating that it's ready, the other busy waits (polls) until the flag is set and then uses the object prepared by the other thread.<br />
<br />
I should mention that while loads/stores of <span style="font-family: Courier New, Courier, monospace;">f_my_global_ready</span> are volatile load/store operations, initialization and consumption of <span style="font-family: Courier New, Courier, monospace;">my_global</span> consists of non volatile loads/stores. This will be important in terms of reordering guarantees.</div>
<div>
<br /></div>
<div>
Now let's think about how using volatile keyword in different languages would result for the above piece of code.</div>
<div>
<br />
<h2>
<span style="font-size: x-large;">
Analysis of the Case Study</span></h2>
</div>
<h3>
C/C++</h3>
<div>
The code specified in case study won't work as expected in C/C++ and there are several reasons.</div>
<div>
<br /></div>
<div>
If the compiler optimizes the read of <span style="font-family: Courier New, Courier, monospace;">f_my_global_ready</span> in <span style="font-family: Courier New, Courier, monospace;">Thread2</span>, we may never see the flag getting set and the while loop will keep going forever. To prevent compiler doing this optimization we can declare f_my_global_ready as volatile.</div>
<div>
<br /></div>
<div>
However, even if we make the flag volatile there are other issues making the code buggy. First there's the atomicity of the load. In this specific example we're loading a bool which is a single byte so I doubt this will be a problem on any h/w platform. But if we were working with a multi byte primitive such as 32 bit integer we could be vulnerable to race conditions, depending on the h/w platform and depending on how the integer is aligned in memory. Volatile doesn't fix these issues in C/C++.</div>
<div>
<br /></div>
<div>
Second, compiler won't optimize or reorder loads/stores to the volatile flag but it can reorder other 'non volatile' load and stores around it. This means that part of the instructions setting up <span style="font-family: Courier New, Courier, monospace;">my_global</span> might get placed by the compiler after the setting of the flag. This will result in Thread2 breaking the while loop and using <span style="font-family: Courier New, Courier, monospace;">my_global</span> before <span style="font-family: Courier New, Courier, monospace;">my_global</span> was fully initialized. This would be a serious bug.</div>
<div>
<br /></div>
<div>
Third, similar to the compiler reordering, the CPU might do out of order execution which again will result in <span style="font-family: Courier New, Courier, monospace;">Thread2</span> accessing a not fully initialized <span style="font-family: Courier New, Courier, monospace;">my_global</span>. Cache coherency issues can also result in the same problem as other CPUs might see loads/stores in a different order.<br />
<br />
<h3>
Java/C#</h3>
</div>
<div>
For Java (5 and later) and C# code in the case study will work just fine. Thanks to the guarantees given by volatile in these languages everything works as expected. Let's go over these one more time:</div>
<div>
<br /></div>
<div>
<ul>
<li>After the flag is set, <span style="font-family: Courier New, Courier, monospace;">Thread2</span> will see this new value the next time it polls the flag because it's declared as volatile and compiler won't cache the value for the flag.</li>
<li>Load/store of the variable is atomic (again, not read, modify, and write) so we're not concerned with reading an interleaved value accidentally.</li>
<li>Initialization of <span style="font-family: Courier New, Courier, monospace;">my_global</span> will be done strictly before the<span style="font-family: Courier New, Courier, monospace;"> f_my_global_ready</span> flag is set because compiler doesn't reorder non volatile load/store operations across volatile load/store operations. The volatile load/store acts as a strict barrier for all other loads/stores.</li>
<li>CPU won't do out of order execution or have cache inconsistency for the volatile loads/stores thanks to the compiler inserted memory barrier before the volatile loads/stores.</li>
</ul>
<div>
<br /></div>
<h3>
Solution to the C/C++ Case</h3>
</div>
<div>
Volatile doesn't help with the code we've been working on in C/C++, so what other options do we have?</div>
<div>
<br /></div>
<div>
<ul>
<li>If we're using C++ 11, we can use the std::atomic<T> types which act in the same way as volatile in C#/Java.</li>
<li>We can use the Interlocked* API on Windows or similar API present on other platforms.</li>
<li>Use proper locking techniques like spinlocks, mutexes, semaphores etc.</li>
</ul>
<h2>
<span style="font-size: x-large;">Further Reading</span> </h2>
</div>
<div>
<div>
<br />
<ul>
<li><a href="http://en.wikipedia.org/wiki/Volatile_variable">http://en.wikipedia.org/wiki/Volatile_variable</a></li>
<li><a href="http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484">http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484</a></li>
<li><a href="http://en.wikipedia.org/wiki/Busy_waiting">http://en.wikipedia.org/wiki/Busy_waiting</a></li>
<li><a href="http://en.wikipedia.org/wiki/Cache_coherence">http://en.wikipedia.org/wiki/Cache_coherence</a></li>
<li><a href="http://en.wikipedia.org/wiki/Memory_barrier">http://en.wikipedia.org/wiki/Memory_barrier</a></li>
<li><a href="https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt">https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt</a></li>
<li><a href="http://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html">http://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html</a></li>
<li><a href="http://www.ibm.com/developerworks/library/j-jtp06197/">http://www.ibm.com/developerworks/library/j-jtp06197/</a></li>
</ul>
</div>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0tag:blogger.com,1999:blog-694293171864064937.post-62329300659296144512014-05-16T19:49:00.002-07:002014-05-16T19:52:47.314-07:00How Do File Recovery Tools Work?<h3>
Introduction</h3>
<div>
If you are a software person you must have wondered how file recovery tools are able to find and recover deleted files on disk. Sure, it's easy to guess that sectors used by the file are potentially untouched and in that case the tool recovers the file but the problem is each sector on disk is simply and array of bytes. How does the recovery tool know what file it used to belong to when it looks at a sector? Here's how.</div>
<div>
<br /></div>
<h3>
The 'Master' File</h3>
<div>
Any serious file system has a 'master' file which keeps a list of all files that exist on a volume. This is called the master file table in NTFS and the catalog file in HFS+. This is a special file and file system knows how to locate this file in a predefined way when it mounts the volume. For example HFS+ keeps the location of the catalog file in volume header which itself is at a fixed location.</div>
<div>
<br /></div>
<div>
You can think of the master file as a giant table which has an entry for each file on the volume. These entries keep certain important information about the file, one of which is the disk sectors occupied by the file. In fact when you read from a file this is how the file system knows from which disk sectors to read.</div>
<div>
<br /></div>
<h3>
Adding/Removing a File</h3>
<div>
It's pretty easy to see that master file can't be a full table where every entry is occupied by a current file record and when a new file is added we just add a new entry at the end. This is not impossible, but it's infeasible. Keeping such a compact table would be very costly and it'd kill performance.</div>
<div>
<br /></div>
<div>
When a file is deleted from the volume, the file system does as little as possible work on the master file to have high performance. Most of the times it simply marks the corresponding entry in the master file as 'deleted' so in the future it can know file pointed by that entry was removed.</div>
<div>
<br /></div>
<div>
When a new file is added it has to have a corresponding entry in the master file. To do this file system can reuse the entry for a previously deleted file, or it can allocate new space on disk for a brand new entry. Obviously file system has to do a lot of house keeping to be able to do these operations efficiently.</div>
<div>
<br /></div>
<h3>
Finding Deleted Files</h3>
<div>
Conceptually finding the deleted files is pretty straightforward. You just need to walk over all entries in the master file and see which entries are marked as deleted. Obviously a problem is if an entry is reused after the file referred by it is deleted, it's not possible to recover that file anymore as we simply don't know about that file. So all file entries in the master file that are marked as deleted are potentially recoverable file.</div>
<div>
<br /></div>
<h3>
Recovering the Data For a File</h3>
Now that we have a list of deleted files, the problem is how to find sectors that were making up the deleted files. We'll use the entry in the master file for this. A key point is when a file system marks the file entry as deleted, that's pretty much all it does. Sector information for that file will still be there. This is great but we're far from done.<br />
<br />
Let's say a file occupied two sectors on disk and we know which sectors they are. Now the question is how do we know whether these sectors aren't being used by another file now. If that's the case we won't be able to recover the file. To determine this we need to use the 'sector map' which is a data structure on disk used by the file system to keep track of used/unused sectors. If both sectors of the file are showing up as unused then we may be able to recover the file.<br />
<br />
<h3>
Complications Around Recovery</h3>
We still can't be certain because if we're unlucky those sectors might have been used by some other file at some point, and then got freed. In this case we'd think the sectors still contain data from the deleted file we're looking for but in fact they would contain data from another deleted file. There isn't a bullet proof way to know whether the data we recovered is truly intact. One way to extract further information would be to see if deleted files have any colliding sectors. If there are N collisions for a specific sector, at least N - 1 of those files have lost their data in that sector. There are still no guarantees around this because master file entries of some files using that sector might have been overwritten by entries of newer files, which means we have no idea those files ever existed.<br />
<br />
Also I should note that we may only recover certain sectors of a file but not the others. If some sectors of the deleted file are in use by other files, it's certain that we lost at least part of the file's data. If all are unused currently we have some hope to recover the file in its entirety but we can't be sure for the reasons I explained above. If the file has some internal verification information (like MD5 hash of the content) we can determine if the data was really recovered entirely or not.<br />
<div>
<br /></div>
<div>
<br /></div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0tag:blogger.com,1999:blog-694293171864064937.post-44783783222005988622014-04-30T18:36:00.001-07:002014-05-04T18:32:39.883-07:00Paged Memory vs. Non Paged Memory<h3>
Introduction</h3>
<div>
Most computer science students learn that memory is paged in and out of main memory to hard disk depending on need, during their freshman year. They learn that the OS does it for them and it is the only way. This is a mostly accurate view of the computing universe but there are exceptions. Since most software engineers will only write user mode programs (opposed to kernel mode programs) in their entire professional life this is accurate information. But when you think about kernel mode code things get more complicated.</div>
<div>
<br /></div>
<h3>
A Short Introduction to Kernel Mode</h3>
<div>
Typical computer software is a user mode program. It's a binary file that lives on the hard disk, loaded to memory by the OS, given a virtual memory space of its own and bunch of crucial computing resources like CPU time. Word processors, web browsers, web servers, software development tools are all examples of user mode programs.</div>
<div>
<br /></div>
<div>
Then there is the kernel space, which is actually much more heterogeneous than the average software engineer imagines. Core OS components, file systems, device drivers, functional drivers (drivers that don't manage physical hardware) all run in the kernel space. Code running in kernel space is more privileged in terms of what it can do on the processor compared to user mode code. One of these several differences is working with non-paged memory.</div>
<div>
<br /></div>
<h3>
Why Does Non-Paged Memory Even Exist - An Intuition</h3>
<div>
Imagine the memory manager component of the OS. It is the component that is responsible for paging in and out memory pages. When a memory page is paged out it is written to part of something called the 'page file' which is a file on disk obviously. The memory manager needs to maintain some metadata to be able to locate page data in this file right? What if this metadata itself is paged out and memory manager has to page in that data to be able to do page ins? You must see where I'm going, it's a contradictory situation.</div>
<div>
<br /></div>
<div>
So even theoretically it's not possible to make all memory pageable. Some memory pages need to be resident at all times.</div>
<div>
<br /></div>
<h3>
Reasons for Using Non-Paged Memory</h3>
<h4>
Performance</h4>
<div>
If a driver or a core OS component is doing something performance sensitive, obviously using pageable memory to read/write data can be problematic. Waiting for a memory page to be brought in by the memory manager can take forever which would kill a performance sensitive driver/OS component.</div>
<h4>
Avoiding Never Ending Cycles</h4>
<div>
Imagine a disk driver that is trying to read a sector from the disk. Obviously it needs some memory to write the data read from device (actual architecture is more complicated with DMA etc. but skipping that for simplicity). What if that memory page is paged out? We need to go back to disk to bring that page in. In worst case scenario this could go on forever. If the disk driver writes the data back to a non-paged memory page than that page will be guaranteed to be resident.</div>
<div>
<br /></div>
<h4>
Code Running At High Interrupt Masking Levels</h4>
At times processor will be running in a high interrupt masking level. This happens either when an interrupt handling routine is called by the processor as a result of an interrupt or when a piece of kernel code explicitly raises the interrupt level of the processor. Whatever the reason might be if a code is running in high interrupt level and it accesses a pageable memory page, that's simply a bug and sooner or later it will cause a crash of the entire OS. The reason this is so disastrous is the fact that memory manager itself takes action as a result of bunch of interrupts and if the current interrupt masking level of the processor is high enough, those interrupts will be masked meaning that they won't be served until the code running at the high interrupt level is done. But that code won't be done until it can bring the memory page it needs back into memory. Well you see the problem. That piece of code is accessing an inaccessible piece of memory and that's a fatal issue.<br />
<br />
So any code that's running at a high enough interrupt level (DPC_LEVEL on Windows) needs to access only non-paged memory. Any other behavior is a fatal bug that'll surely surface.<br />
<br />
<h3>
One Exception to the User Mode Case</h3>
<div>
I previously said that user mode programs don't get to work with non-paged memory but that's not entirely true, they sometimes do without knowing it. Drivers running in kernel mode can 'pin' some memory pages in a process' address space which makes the memory page non-paged for a temporary duration (until driver unpins the page). In these cases user mode programs would be accessing non-paged memory for a while.</div>
<div>
<br /></div>
<h3>
Why Even Bother With Paged Memory?</h3>
<div>
It sounds like using paged memory is nothing but trouble. If all memory was non-paged none of the complications mentioned above would be there. So why are we insisting on having pageable memory?</div>
<div>
<br /></div>
<div>
Well obviously if all memory was non-paged than we could never go over the physical available memory of a computer which is a very big constraint. If we had mostly non-paged memory and some paged memory then code working with pageable memory would incur a huge performance penalty because the rate of page faults would go through the roof.</div>
<div>
<br /></div>
<div>
So operating systems try to limit the amount of non-paged memory to a minimum to avoid these problems. Non-paged memory is a scarce resource and kernel mode code should use it very wisely.</div>
<br />Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0tag:blogger.com,1999:blog-694293171864064937.post-84563271577072393142014-04-29T16:23:00.000-07:002014-04-29T16:23:01.277-07:00Storing Image Data in Transactional Databases<h3>
Introduction</h3>
<div>
You might have noticed that a lot of websites serve their images as links to regular files, instead of storing them as a field in database rows. Generally it's not recommended to store image files directly in the database (e.g Mysql) but people don't say why. I'll talk about two reasons for not doing so.</div>
<div>
<br /></div>
<h3>
Serving Media Through Content Distribution Networks</h3>
<div>
For websites with users all around the world and high traffic it makes a lot of sense to store media content at content distribution networks (CDN). Media content is generally immutable once it's created and they consume a lot of bandwidth to download. By having the content stored as plain files on a CDN the website makes use of CDN's bandwidth and global points of presence (POP). Having several POPs around the globe also makes roundtrip time to reach the data much smaller which results in more prompt downloads.</div>
<div>
<br /></div>
<div>
Keeping media files separate form the database fits better to working with CDN strategy. You can actually still keep the primary copy of the data in your database and put the corresponding file to a CDN but to me that's a weird design. Also this brings us to our next point.</div>
<div>
<br /></div>
<h3>
Cost of Having Transaction Properties</h3>
<div>
Almost all production quality database engines provide the so called ACID transaction properties. A(atomicity) and D(durability) of transactions require a design that employs 'transaction logging'. There can be numerous ways to implement transaction logging but two popular ways are 'write ahead logging' and 'undo/redo logging'. I won't go into details of these techniques but in a nutshell they involve writing the database mutations to transaction logs AND to the actual database image itself. Depending on the logging technique used this means image data will be written to disk 2 or 3 times at least. A single image can occupy space as big as thousands of regular database rows would occupy (imagine tables that have just a few integer and string columns in their definitions) so on a busy database server incurring this extra write(s) is a non trivial performance penalty. For most large scale websites this is unacceptable.</div>
<div>
<br /></div>
<h3>
Conclusion</h3>
<div>
I've picked image files as a case study because it's a fairly common case but this applies to all BLOB (binary large object) types. Purpose of this article isn't to say 'thou shall not store BLOBs in databases'. I'm simply saying know the implications if you want to do so.</div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0tag:blogger.com,1999:blog-694293171864064937.post-80670181007157544182014-04-17T21:41:00.000-07:002014-04-17T21:41:40.263-07:00When and How Files Get Deleted by a File System<h3>
Introduction</h3>
<div>
We had an interesting discussion with a colleague today. We were discussing what would happen if you rename a file to an existing file if there are already open file descriptors (FD, or handle if you're coming from Windows world) against the existing file. Here's the pseudo code for the scenario:</div>
<div>
<ol>
<li>Open an FD to file a/b/c</li>
<li>Read 5 bytes from the FD</li>
<li>Rename an existing file a/b/d to a/b/c</li>
<li>Using the already open FD read another 5 bytes.</li>
</ol>
<div>
<br /></div>
</div>
<div>
So what would be the content of the 5 bytes we read in step 4? Would it be content of the original a/b/c file or would it be the content of a/b/d which was later renamed as a/b/c? I thought we would read the content from the file that was originally a/b/d. I was dead wrong. My colleague wrote an experiment code and we were clearly reading data from old file. This didn't make any sense to me. I was really bothered by two questions:</div>
<div>
<ul>
<li>How can we be seeing two different file content for the same file at the same time?</li>
<li>Since we can still read from original file, sectors occupied by that file are still accessible. But who's accounting for those sectors now?</li>
</ul>
<h3>
Understanding Rename System Call</h3>
<div>
If you look at the man page for rename system call you'll see that if the destination file name already exists, it'll be unlinked first then the source file will be renamed. So this clears out the first question. Obviously we're not seeing two different data for the same file. Original file is deleted and some other file takes its place.</div>
<div>
<br /></div>
<div>
But still it's weird that we're reading from a deleted file and god knows what happened to those poor disk sectors occupied by that file.</div>
<div>
<br /></div>
<h3>
Global File List vs. Directories</h3>
<div>
Most modern file systems keep a global list of files per volume. This is called master file table in NTFS and catalog file in HFS+. A file on a volume is mainly represented by its presence in this table.</div>
<div>
<br /></div>
<div>
When we think of directories we're generally mistaken by the idea that directories 'own' files. Quite the contrary. Directories simply 'link' to files that are accounted for in the master file table. One file in the master file table can be linked by one or more directories. So in this sense Unix terminology of link/unlink is extremely good fitting.</div>
<div>
<br /></div>
<h3>
What Happens When a File is Deleted?</h3>
<div>
When a file is to be deleted file system marks the corresponding entry in the master file table as deleted. It also marks the sectors previously occupied by the file as free, which means they can be allocated by other files from that point on.</div>
<div>
<br /></div>
<h3>
When is a File Deleted?</h3>
<div>
If you look at the man page of unlink system call it very clearly lists conditions for a file to be deleted:</div>
<div>
<ul>
<li>No directory has a link against the file</li>
<li>No process has an open file descriptor against the file</li>
</ul>
<div>
Now this explains the behavior we saw in the experiment described in introduction. We were unlinking the file from the directory but the open FD against is was keeping it from deletion. So the data we read was coming from a perfectly alive file. And sectors of the file are accounted by the file definition in the master file table. Life is good again. When the final FD is closed file will be really deleted this time.</div>
</div>
<div>
<br /></div>
<h3>
Final Missing Piece</h3>
<div>
Then yet another thing started bothering me. Sure it's nice that file stays around until closing of the last FD. But what happens if the system crashes while the FD is still open. When the system reboots we'll end up with a zombie file in the master file table and there's no way to reach it anymore because it doesn't exist in the namespace. </div>
<div>
<br /></div>
<div>
My gut feeling was when the last link to file is removed, this information is persisted somewhere so even if the system crashes until the file is really deleted, it will know about this situation the next time volume is mounted. Sadly I couldn't find concrete information regarding this for NTFS or HFS+, but one EXT3 related paper clearly mentions it:</div>
<div>
<br /></div>
<div>
"In order to handle these deferred delete/truncate requests in a crash-safe manner, the inodes to be unlinked/truncated are added into the ext3 orphan list. This is an already existing mechanism by which ext3 handles file unlink/truncates that might be interrupted by a crash. A persistent singly-linked list of inode numbers is linked from the superblock and, if this list is not empty at filesystem mount time, the ext3 code will first walk the list and delete/truncate all of the files on it before the mount is completed."</div>
<div>
<br /></div>
</div>
<div>
I'm sure other file systems employ a similar mechanism to keep track of such files.</div>
<div>
<br /></div>
<h3>
Credits</h3>
<div>
At this point I should thank my colleague, Slobodan Predolac, whom I had the discussion with and who wrote the experiment code and pointed me to related man pages.</div>
<div>
<br /></div>
<h3>
Appendix - Experiment Source Code</h3>
<div>
<div style="background-color: white; color: #333333; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 12.727272033691406px; line-height: 16.309091567993164px; white-space: pre-wrap;">
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h></div>
<div style="background-color: white; color: #333333; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 12.727272033691406px; line-height: 16.309091567993164px; margin-top: 10px; white-space: pre-wrap;">
#import <Foundation/Foundation.h></div>
<div style="background-color: white; color: #333333; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 12.727272033691406px; line-height: 16.309091567993164px; margin-top: 10px; white-space: pre-wrap;">
int main(int argc, const char * argv[])
{</div>
<div style="background-color: white; color: #333333; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 12.727272033691406px; line-height: 16.309091567993164px; margin-top: 10px; white-space: pre-wrap;">
@autoreleasepool {
NSString *tmpfname = [NSTemporaryDirectory() stringByAppendingPathComponent:@"MyFile"];
NSString *str = @"1234567890ABCDEFG";
[str writeToFile:tmpfname atomically:YES encoding:NSUTF8StringEncoding error:nil];
int fd = open([tmpfname UTF8String], O_RDONLY);
if (-1 == fd) {
perror("Oh sh");
exit(1);
}
char buf[64] = {0};
if (-1 == read(fd, buf, 5)) {
perror("Oh sh");
exit(1);
} else {
NSLog(@"First 5 = '%s'", buf);
}
NSString *str1 = @"qwertyqwertyqwerty";</div>
<div style="background-color: white; color: #333333; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 12.727272033691406px; line-height: 16.309091567993164px; margin-top: 10px; white-space: pre-wrap;">
BOOL succ = [str1 writeToFile:tmpfname atomically:YES encoding:NSUTF8StringEncoding error:nil];
if (!succ) {
NSLog(@"Write failed");
exit(1);
} else {
int fd1 = open([tmpfname UTF8String], O_RDONLY);
}
char buf1[64] = {0};
if (-1 == read(fd, buf1, 5)) {
perror("Oh sh");
exit(1);
} else {
NSLog(@"Next 5 = '%s'", buf1);
}</div>
<div style="background-color: white; color: #333333; font-family: Helvetica, Arial, 'lucida grande', tahoma, verdana, arial, sans-serif; font-size: 12.727272033691406px; line-height: 16.309091567993164px; margin-top: 10px; white-space: pre-wrap;">
close(fd);
}
return 0;
}</div>
</div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0tag:blogger.com,1999:blog-694293171864064937.post-54975657128155773072014-04-16T20:15:00.002-07:002014-04-19T15:33:25.695-07:00File System Reliability: NTFS, journaling, CHKDSK etc.Reliability of file systems and disks can be a very confusing topic with a lot of nuances. I'd like to address at least some of these topics here and hopefully make them a little easier to understand.<br />
<br />
<h3>
Journaling File System</h3>
<div>
In the older days file systems were very vulnerable to abrupt system shutdowns, whether that's a power failure, hardware failure, or a software bug causing the system to crash. When such a failure happened if the file system was in the middle of updating its metadata things could get really ugly. You could end up with disappearing files/directories, orphaned files/directories (files that belong to no directory), files/directories that appear in more than one directory, and many more like this. You could lose even an entire file system or a big chunk of it.</div>
<div>
<br /></div>
<div>
Modern file systems like NTFS addressed this by introducing journaling, which made updates to file system metadata transactional. By implementing transactional logging, file systems could guarantee that metadata updates were atomic, meaning they would either happen in entirety or not happen at all. This made file systems very resilient compared to their predecessors.</div>
<div>
<br /></div>
<h3>
Journaling vs. File Content Corruption</h3>
<div>
In a journaling file system, file system's METADATA is updated atomically. File system metadata and file content are different things. The list of files in a directory is part of metadata and updated via journaling. Content of your word file is user data and it doesn't have the same guarantees. So content of your word file can be left in an undefined state if the system crashes, even with journaling.</div>
<div>
<br /></div>
<h3>
Having Transactional Guarantees for File Content</h3>
<div>
For that guarantee you either need to use a database system that supports transactions, or use the TxF API (transactional file system) on Windows to modify file contents within a transaction. TxF didn't gain any popularity in the developer community though. I guess people just used databases when they needed transactions. And Microsoft removed this support in its latest server file system ReFS.</div>
<div>
<br /></div>
<h3>
CHKDSK (Check Disk)</h3>
<div>
Chkdsk is a tool in the Windows platform that verifies the health of a file system and make repairs if necessary. Chkdsk works with both FAT and NTFS file systems. The fact that NTFS is a journaling file system confuses a lot of people regarding chkdsk. After all, why do you need such a tool if file system protects itself from metadata inconsistencies via journaling right?</div>
<div>
<br /></div>
<div>
Well it's a little complicated than that. First of all chkdsk addresses different problems for FAT and NTFS. FAT can end up with inconsistent metadata whenever there was an abrupt shutdown. NTFS is resilient to that, but bad sectors complicate life here. So because of bad sectors even NTFS needs chkdsk to make things right when it's needed.<br />
<br />
Software and hardware both work on preventing data loss caused by bad sectors as explained in <a href="http://systemtbe.blogspot.com/2014/04/bad-sector-problems-and-how-theyre.html">here</a>. However there can still be unrecoverable bad sector errors. If this happens to a sector that used to contain file system metadata, file system needs to be repaired by bringing it to a consistent state. This might still result in metadata/data loss though.<br />
<br />
I have seen on the internet people mentioning CHKDSK is responsible for working with journal records to bring file system to a consistent state, and that's its main job on NTFS volumes. This is incorrect. Scanning journal records during volume mount is routine NTFS activity and has nothing to do with CHKDSK.</div>
<div>
<br /></div>
<div>
</div>
<div>
<br /></div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0tag:blogger.com,1999:blog-694293171864064937.post-50449933450440821562014-04-16T20:09:00.001-07:002014-04-16T20:09:13.205-07:00Bad Sector Problems and How They're Addressed'Bad sector' is a frequently heard term by tech people but most people don't understand it deeply. Handling of bad sectors involve cooperation of storage hardware, volume drivers, and file system drivers. Hence understanding how such problems are handled and mitigated is pretty complicated. Let's address the subtleties of the topic one by one.<br />
<br />
<h3>
Sector Remapping by the Disk Hardware Itself</h3>
Most modern disks have some error correction code (ECC) per each sector. So if the disk h/w notices that data read from a sector doesn't add up to its checksum, it corrects the sector data using error correction code for that sector. Disk remaps this sector to a spare sector as soon as possible to prevent permanent data loss. Even disk drives that have this functionality can run out of spare sectors and may not be able to do remapping anymore.<br />
<br />
Another case is even if the sector has gone bad beyond error correction, sector still can be remapped the next time it's written to. This is because the old data was going to be overwritten by the new write and old data in the sector becomes irrelevant.<br />
<br />
<h3>
Sector Remapping by the Volume Driver Software</h3>
<div>
SCSI disks provide an interface to explicitly map a bad sector to a spare sector. Imagine a sector has gone bad beyond the point of error correction (or maybe disk sectors didn't have ECC in the first place). In this case disk hardware can't remap the sector without data loss. But if there's redundancy in place (like a mirrored RAID system) volume driver can get the redundant sector data, remap the sector to a spare one on the disk drive, and re-write the remapped sector with the redundant data. This way sector is saved without involving the file system driver at all.</div>
<div>
<br /></div>
<div>
If the sector data can be saved because there's redundancy but the bad sector can't be remapped because there aren't any remaining spare sectors, then there's still a solution but it requires cooperation from the file system.</div>
<div>
<br /></div>
<h3>
Sector Remapping by the File System</h3>
<div>
First I should say that file systems don't really do sector remapping. All they do is marking a sector as bad so they won't use it in the future. Depending on whether the current operation is a read or write and whether sector data had some redundancy to it, flow of events will be different.</div>
<div>
<br /></div>
<div>
<ul>
<li>Reading the sector</li>
<ul>
<li>Sector has gone bad but volume driver has recovered the data via redundancy (e.g mirrored RAID)</li>
<ul>
<li>In this case file system marks the sector as unusable, finds a free sector to replace it and writes the sector data to this new sector. Note that this is not a physical remapping, file system is just ignoring this bad sector from now on. It's like a black hole in the sector bitmap.</li>
</ul>
<li>Sector has gone bad and data is lost.</li>
<ul>
<li>Any reads from the related file's related part will return an I/O error to the caller.</li>
<li>If the failed sector was part of file system's metadata, then a repair action on the file system is necessary but still file system will lose some metadata in all likelihood.</li>
</ul>
</ul>
<li>Writing to the sector</li>
<ul>
<li>File system will mark the bad sector as unusable in its sector map, and use a free sector in its place and write that sector with the new write data.</li>
</ul>
</ul>
</div>
<br />
<h3>
A Case Study</h3>
<div>
Recently a colleague of mine had checksum errors with his git repository. As the issue was being discussed some people suggested his hard drive is probably failing and he's having bad sectors. As you'd see in the above notes when there's a bad sector file systems don't simply return garbage data from the sector. If the sector data can be saved by something in the I/O stack (FS driver, volume driver, disk driver, disk H/W etc.) it will be. Otherwise caller of the file read operation will get an error code saying data couldn't be read. Of course this may not be the case for all file systems in the world but any production quality file system has to behave this way.</div>
<div>
<br /></div>
<div>
So any checksum problems like my colleague had would most likely be the result of application level bugs rather than file system returning gibberish data for bad sectors. </div>
<div>
<br /></div>
<div>
<br /></div>
<h3>
Note</h3>
<div>
Most information here is based on NTFS and Windows' out of box volume driver. But it should still be mostly applicable to most modern production quality software.</div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0tag:blogger.com,1999:blog-694293171864064937.post-49431818770986952412014-04-12T18:07:00.000-07:002014-04-19T19:31:32.360-07:00Directory Scalability of File Systems<h2>
Introduction</h2>
<div>
Recently I've been hearing from several people that having too many files in a directory is bad for performance and you need to shard the files in multiple directories instead. If you're familiar with file system internals you'll notice this claim doesn't sound right at all. </div>
<div>
<br /></div>
<div>
Most mainstream production quality file systems keep a B+ tree (or a variant) index per directory. B+ trees are very scalable data structures that self balance when full rate goes below a certain threshold for each node. And since each node is a large consecutive block containing hundreds of entries they work with sector based persistent storage very well.</div>
<div>
<br /></div>
<div>
Similarly most database servers make use of B+ trees for their index keeping and they do handle billions of records per each table without trouble.</div>
<div>
<br /></div>
<div>
When you combine these it should be intuitive to expect from file systems handle numerous files in a directory without too much trouble.<br />
<br />
<b>Edit:</b> Turns out HFS+ doesn't have per directory B+ Tree index after all. It has one global index which is called the catalog file and directory traversal happens through this index. This means sharding files to several directories for performance reasons is pretty much irrelevant.</div>
<div>
<br /></div>
<h2>
Experiment</h2>
<div>
This information should convince most software engineers but part out of curiosity part to convince the remaining people I did a small experiment to see how a file system directory scales with respect to number of files.</div>
<div>
<br /></div>
<h3>
Experiment Environment</h3>
<div>
For a while now I've been working with Apple development tools so the experiment setup is consisting of Apple technologies.<br />
<br />
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Revision>0</o:Revision>
<o:TotalTime>0</o:TotalTime>
<o:Pages>1</o:Pages>
<o:Words>27</o:Words>
<o:Characters>154</o:Characters>
<o:Company>fb</o:Company>
<o:Lines>1</o:Lines>
<o:Paragraphs>1</o:Paragraphs>
<o:CharactersWithSpaces>180</o:CharactersWithSpaces>
<o:Version>14.0</o:Version>
</o:DocumentProperties>
<o:OfficeDocumentSettings>
<o:AllowPNG/>
</o:OfficeDocumentSettings>
</xml><![endif]-->
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:TrackMoves/>
<w:TrackFormatting/>
<w:PunctuationKerning/>
<w:ValidateAgainstSchemas/>
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:DoNotPromoteQF/>
<w:LidThemeOther>EN-US</w:LidThemeOther>
<w:LidThemeAsian>JA</w:LidThemeAsian>
<w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
<w:Compatibility>
<w:BreakWrappedTables/>
<w:SnapToGridInCell/>
<w:WrapTextWithPunct/>
<w:UseAsianBreakRules/>
<w:DontGrowAutofit/>
<w:SplitPgBreakAndParaMark/>
<w:EnableOpenTypeKerning/>
<w:DontFlipMirrorIndents/>
<w:OverrideTableStyleHps/>
<w:UseFELayout/>
</w:Compatibility>
<m:mathPr>
<m:mathFont m:val="Cambria Math"/>
<m:brkBin m:val="before"/>
<m:brkBinSub m:val="--"/>
<m:smallFrac m:val="off"/>
<m:dispDef/>
<m:lMargin m:val="0"/>
<m:rMargin m:val="0"/>
<m:defJc m:val="centerGroup"/>
<m:wrapIndent m:val="1440"/>
<m:intLim m:val="subSup"/>
<m:naryLim m:val="undOvr"/>
</m:mathPr></w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" Priority="39" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" Name="toc 9"/>
<w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/>
<w:LsdException Locked="false" Priority="10" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Title"/>
<w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/>
<w:LsdException Locked="false" Priority="11" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
<w:LsdException Locked="false" Priority="22" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" Priority="59" SemiHidden="false"
UnhideWhenUsed="false" Name="Table Grid"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 1"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 1"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 1"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
<!--[if gte mso 10]>
<style>
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin-top:0in;
mso-para-margin-right:0in;
mso-para-margin-bottom:10.0pt;
mso-para-margin-left:0in;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:Cambria;
mso-ascii-font-family:Cambria;
mso-ascii-theme-font:minor-latin;
mso-hansi-font-family:Cambria;
mso-hansi-theme-font:minor-latin;
mso-fareast-language:JA;}
table.MsoTableGrid
{mso-style-name:"Table Grid";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-priority:59;
mso-style-unhide:no;
border:solid windowtext 1.0pt;
mso-border-alt:solid windowtext .5pt;
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-border-insideh:.5pt solid windowtext;
mso-border-insidev:.5pt solid windowtext;
mso-para-margin:0in;
mso-para-margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:Cambria;
mso-ascii-font-family:Cambria;
mso-ascii-theme-font:minor-latin;
mso-hansi-font-family:Cambria;
mso-hansi-theme-font:minor-latin;
mso-fareast-language:JA;}
</style>
<![endif]-->
<!--StartFragment-->
<!--EndFragment--><br />
<table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: none; mso-border-alt: solid windowtext .5pt; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 1184;">
<tbody>
<tr style="mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td style="border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
Operating
System<o:p></o:p></div>
</td>
<td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
MAC OS X
10.9.2<o:p></o:p></div>
</td>
</tr>
<tr style="mso-yfti-irow: 1;">
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
Computer<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
Macbook
Pro Retina, <span style="font-family: "Lucida Grande"; font-size: 11.0pt;">3 GHz
Intel Core i7, 8 GB 1600 MHz DDR3, APPLE SSD SM256E Media</span><o:p></o:p></div>
</td>
</tr>
<tr style="mso-yfti-irow: 2;">
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
File
System<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
HFS+<o:p></o:p></div>
</td>
</tr>
<tr style="mso-yfti-irow: 3; mso-yfti-lastrow: yes;">
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
Programming
Language<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 221.4pt;" valign="top" width="221"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
Objective
C<o:p></o:p></div>
</td>
</tr>
</tbody></table>
<br />
<h3>
Experiment Setup</h3>
<div>
In the experiment I setup different directories and populated them with 1K, 10K, 100K, 1M very small files (4 bytes each) respectively. I measured time to populate the directories, time to enumerate all files in the directories, and time to lookup 1K files in each directories.</div>
<div>
<br /></div>
<div>
I'm sure someone will be troubled by the small file sizes. I kept the file sizes as small as possible because file size is irrelevant to directory metadata. File data and directory metadata are kept totally separate by the file system. By keeping the file size small I kept the file data write overhead to a minimum and most of the time taken to populate the directories would be spent on updating directory metadata rather than updating the files themselves.</div>
<div>
<br /></div>
<h3>
Experiment Results</h3>
<div>
All measurements are in seconds and in wall clock time. For enumeration and lookup timings I rebooted the computer to make sure file system caches get purged. Indeed without cache purge I've observed that both lookups and enumerations are much faster.</div>
<div>
<br /></div>
<div>
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Revision>0</o:Revision>
<o:TotalTime>0</o:TotalTime>
<o:Pages>1</o:Pages>
<o:Words>32</o:Words>
<o:Characters>185</o:Characters>
<o:Company>fb</o:Company>
<o:Lines>1</o:Lines>
<o:Paragraphs>1</o:Paragraphs>
<o:CharactersWithSpaces>216</o:CharactersWithSpaces>
<o:Version>14.0</o:Version>
</o:DocumentProperties>
<o:OfficeDocumentSettings>
<o:AllowPNG/>
</o:OfficeDocumentSettings>
</xml><![endif]-->
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Normal</w:View>
<w:Zoom>0</w:Zoom>
<w:TrackMoves/>
<w:TrackFormatting/>
<w:PunctuationKerning/>
<w:ValidateAgainstSchemas/>
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:DoNotPromoteQF/>
<w:LidThemeOther>EN-US</w:LidThemeOther>
<w:LidThemeAsian>JA</w:LidThemeAsian>
<w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
<w:Compatibility>
<w:BreakWrappedTables/>
<w:SnapToGridInCell/>
<w:WrapTextWithPunct/>
<w:UseAsianBreakRules/>
<w:DontGrowAutofit/>
<w:SplitPgBreakAndParaMark/>
<w:EnableOpenTypeKerning/>
<w:DontFlipMirrorIndents/>
<w:OverrideTableStyleHps/>
<w:UseFELayout/>
</w:Compatibility>
<m:mathPr>
<m:mathFont m:val="Cambria Math"/>
<m:brkBin m:val="before"/>
<m:brkBinSub m:val="--"/>
<m:smallFrac m:val="off"/>
<m:dispDef/>
<m:lMargin m:val="0"/>
<m:rMargin m:val="0"/>
<m:defJc m:val="centerGroup"/>
<m:wrapIndent m:val="1440"/>
<m:intLim m:val="subSup"/>
<m:naryLim m:val="undOvr"/>
</m:mathPr></w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="276">
<w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" Priority="39" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" Name="toc 9"/>
<w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/>
<w:LsdException Locked="false" Priority="10" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Title"/>
<w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/>
<w:LsdException Locked="false" Priority="11" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
<w:LsdException Locked="false" Priority="22" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" Priority="59" SemiHidden="false"
UnhideWhenUsed="false" Name="Table Grid"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 1"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 1"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 1"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
<!--[if gte mso 10]>
<style>
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin-top:0in;
mso-para-margin-right:0in;
mso-para-margin-bottom:10.0pt;
mso-para-margin-left:0in;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:Cambria;
mso-ascii-font-family:Cambria;
mso-ascii-theme-font:minor-latin;
mso-hansi-font-family:Cambria;
mso-hansi-theme-font:minor-latin;
mso-fareast-language:JA;}
table.MsoTableGrid
{mso-style-name:"Table Grid";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-priority:59;
mso-style-unhide:no;
border:solid windowtext 1.0pt;
mso-border-alt:solid windowtext .5pt;
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-border-insideh:.5pt solid windowtext;
mso-border-insidev:.5pt solid windowtext;
mso-para-margin:0in;
mso-para-margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:12.0pt;
font-family:Cambria;
mso-ascii-font-family:Cambria;
mso-ascii-theme-font:minor-latin;
mso-hansi-font-family:Cambria;
mso-hansi-theme-font:minor-latin;
mso-fareast-language:JA;}
</style>
<![endif]-->
<!--StartFragment-->
<br />
<table border="1" cellpadding="0" cellspacing="0" class="MsoTableGrid" style="border-collapse: collapse; border: none; mso-border-alt: solid windowtext .5pt; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-yfti-tbllook: 1184;">
<tbody>
<tr>
<td style="border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 82.3pt;" valign="top" width="82"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
<br /></div>
</td>
<td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 92.35pt;" valign="top" width="92"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
1K Files<o:p></o:p></div>
</td>
<td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 89.05pt;" valign="top" width="89"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
10K Files<o:p></o:p></div>
</td>
<td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 90.05pt;" valign="top" width="90"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
100K Files<o:p></o:p></div>
</td>
<td style="border-left: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 89.05pt;" valign="top" width="89"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
1M Files<o:p></o:p></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 82.3pt;" valign="top" width="82"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
Time to
create the files<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 92.35pt;" valign="top" width="92"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
0.22s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 89.05pt;" valign="top" width="89"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
2.07s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 90.05pt;" valign="top" width="90"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
21.48s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 89.05pt;" valign="top" width="89"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
252.59s<o:p></o:p></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 82.3pt;" valign="top" width="82"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
Time to
lookup 1K Files<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 92.35pt;" valign="top" width="92"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
0.039s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 89.05pt;" valign="top" width="89"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
0.14s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 90.05pt;" valign="top" width="90"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
0.30s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 89.05pt;" valign="top" width="89"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
0.34s<o:p></o:p></div>
</td>
</tr>
<tr>
<td style="border-top: none; border: solid windowtext 1.0pt; mso-border-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 82.3pt;" valign="top" width="82"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
Time to
enumerate all files in the directory<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 92.35pt;" valign="top" width="92"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
0.032s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 89.05pt;" valign="top" width="89"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
0.15s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 90.05pt;" valign="top" width="90"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
1.21s<o:p></o:p></div>
</td>
<td style="border-bottom: solid windowtext 1.0pt; border-left: none; border-right: solid windowtext 1.0pt; border-top: none; mso-border-alt: solid windowtext .5pt; mso-border-left-alt: solid windowtext .5pt; mso-border-top-alt: solid windowtext .5pt; padding: 0in 5.4pt 0in 5.4pt; width: 89.05pt;" valign="top" width="89"><div class="MsoNormal" style="margin-bottom: .0001pt; margin-bottom: 0in;">
19.98s<o:p></o:p></div>
</td>
</tr>
</tbody></table>
<!--EndFragment--></div>
<div>
<br /></div>
<div>
<br /></div>
<h2>
Conclusion</h2>
<div>
As you can see from the results HFS+ file system handled up to 1 million files extremely well. Total file creation and enumeration times increased roughly linear to the number of files. Looking up 1 thousand files took pretty much the same time for all directories.</div>
<div>
<br /></div>
<div>
Total time to enumerate 1 million files seems to take a slight divergence from other directories but this is probably about memory locality. My absurdly simple experiment code reads all file names in a single array and an array with 1 million elements is big enough to put a strain on that.</div>
<div>
<br /></div>
<div>
I did this experiment on HFS+ but I'm fairly sure all mainstream file systems like NTFS, EXT3, EXT4 etc. would produce similar results.</div>
<div>
<br /></div>
<h2>
Appendix A</h2>
<h3>
Why Do People Think File Systems Can't Handle Too Many Files in a Directory</h3>
<div>
I feel like this is a residue in collective software engineers' memory. In the early 90s file systems used to be much primitive compared to today. Most didn't even have journaling so you could lose file system metadata when system shutdown abruptly. But starting from mid 90s most major operating system platforms started having robust file systems that also scaled much better compared to their predecessors.</div>
<div>
<br /></div>
<h2>
Appendix B</h2>
<h3>
Why Does 'ls -l' Take Much Longer Than Plain 'ls'</h3>
ls -l returns bunch of file attributes among the file names where ls just returns the file names. File attributes are part of file data so they don't really concern the directory and it's not kept with directory metadata. So in effect doing ls -l requires looking up the files one by one and reading their attributes. This is why it's wildly slower then plain ls. <br />
<br />
<h2>
Appendix C</h2>
<h3>
Should I Rule Out Directory Sharding Forever</h3>
<div>
For 99.999% of software engineers, yes, at least for performance reasons. If you're building a file server that'll host millions of files from thousands of very active users, then a big maybe. Here's why: file systems keep in memory objects representing the entities on disk like files and directories which is called FCB (file control block) in Windows world. FCB is an object that's created per file on disk not for each open(handle) of that file. So the FCB for the very large directory can become a hot point under heavy load. Per FCB resources like locks will be under contention and can potentially hurt performance.</div>
<div>
<br /></div>
<div>
I'm personally skeptic even with this because per FCB resources will matter for file open time mostly. And this is only part of the I/O performance picture where other parts are read, write, flush, sysctl(ioctl) etc. times.<br />
<br />
<h2>
Appendix D</h2>
</div>
<h3>
Experiment Code</h3>
<div>
Sorry for the crudity of the code. For the different directory setups I manually modified parts of the code so you can't really use it out of box.</div>
<div>
<br /></div>
</div>
<div>
<div class="p1">
<span class="s1">#import </span><Foundation/Foundation.h></div>
<div class="p2">
<br /></div>
<div class="p1">
<span class="s1">#include </span><QuartzCore/QuartzCore.h></div>
<div class="p2">
<br /></div>
<div class="p2">
<br /></div>
<div class="p1">
<span class="s2">NSString</span><span class="s3"> *path = </span>@"your path here"<span class="s3">;</span></div>
<div class="p2">
<br /></div>
<div class="p3">
<span class="s4">void</span> lookup1KFiles(<span class="s4">void</span>)</div>
<div class="p3">
{</div>
<div class="p3">
<span class="s4">int</span> found = <span class="s5">0</span>;</div>
<div class="p4">
<span class="s3"> </span><span class="s4">double</span><span class="s3"> start = </span>CACurrentMediaTime<span class="s3">();</span></div>
<div class="p2">
</div>
<div class="p3">
<span class="s4">for</span> (<span class="s4">int</span> i = <span class="s5">0</span>; i < <span class="s5">1000</span>; i++) {</div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> </span><span class="s4">if</span><span class="s3"> ([[</span><span class="s2">NSFileManager</span><span class="s3"> </span>defaultManager<span class="s3">] </span>fileExistsAtPath<span class="s3">:[</span><span class="s2">NSString</span><span class="s3"> </span>stringWithFormat<span class="s3">:</span><span class="s6">@"%@/%d.txt"</span><span class="s3">, </span><span class="s7">path</span><span class="s3">, i]</span></div>
<div class="p3">
<span class="s8">isDirectory</span>:<span class="s4">NO</span>]) {</div>
<div class="p3">
found++;</div>
<div class="p3">
}</div>
<div class="p3">
}</div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> </span><span class="s4">double</span><span class="s3"> end = </span>CACurrentMediaTime<span class="s3">();</span></div>
<div class="p2">
</div>
<div class="p3">
<span class="s8">NSLog</span>(<span class="s6">@"lookup time %lf"</span>, end - start);</div>
<div class="p2">
</div>
<div class="p1">
<span class="s3"> </span><span class="s8">NSLog</span><span class="s3">( </span>@"Lookup found %d"<span class="s3">, found);</span></div>
<div class="p2">
</div>
<div class="p3">
}</div>
<div class="p2">
<br /></div>
<div class="p3">
<span class="s4">void</span> populate(<span class="s4">void</span>)</div>
<div class="p3">
{</div>
<div class="p4">
<span class="s3"> </span><span class="s2">NSData</span><span class="s3"> *data = [</span><span class="s2">NSData</span><span class="s3"> </span>dataWithBytes<span class="s3">:</span><span class="s6">"abcd"</span><span class="s3"> </span>length<span class="s3">:</span><span class="s5">4</span><span class="s3">];</span></div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> </span><span class="s4">double</span><span class="s3"> start = </span>CACurrentMediaTime<span class="s3">();</span></div>
<div class="p2">
</div>
<div class="p5">
<span class="s3"> </span>//NSString *uuid = nil;</div>
<div class="p2">
</div>
<div class="p3">
<span class="s4">for</span> (<span class="s4">int</span> i=<span class="s5">0</span>; i < <span class="s5">1000</span>; i++) {</div>
<div class="p2">
</div>
<div class="p3">
<span class="s4">if</span> (i % <span class="s5">10000</span> == <span class="s5">0</span>) {</div>
<div class="p5">
<span class="s3"> </span>//uuid = [[NSUUID UUID] UUIDString];</div>
<div class="p3">
<span class="s8">NSLog</span>(<span class="s6">@"%d"</span>, i);</div>
<div class="p3">
}</div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> [data </span>writeToFile<span class="s3">:[</span><span class="s2">NSString</span><span class="s3"> </span>stringWithFormat<span class="s3">:</span><span class="s6">@"%@/%d.txt"</span><span class="s3">, </span><span class="s7">path</span><span class="s3">, i]</span></div>
<div class="p3">
<span class="s8">atomically</span>:<span class="s4">NO</span>];</div>
<div class="p2">
</div>
<div class="p2">
</div>
<div class="p3">
}</div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> </span><span class="s4">double</span><span class="s3"> end = </span>CACurrentMediaTime<span class="s3">();</span></div>
<div class="p2">
</div>
<div class="p1">
<span class="s3"> </span><span class="s8">NSLog</span><span class="s3">(</span>@"populate time %lf"<span class="s3">, end - start);</span></div>
<div class="p3">
}</div>
<div class="p2">
<br /></div>
<div class="p3">
<span class="s4">void</span> enumerate(<span class="s4">void</span>)</div>
<div class="p3">
{</div>
<div class="p2">
</div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> </span><span class="s2">NSDirectoryEnumerationOptions</span><span class="s3"> options = </span>NSDirectoryEnumerationSkipsSubdirectoryDescendants<span class="s3"> |</span></div>
<div class="p4">
<span class="s3"> </span>NSDirectoryEnumerationSkipsPackageDescendants<span class="s3"> |</span></div>
<div class="p4">
<span class="s3"> </span>NSDirectoryEnumerationSkipsHiddenFiles<span class="s3">;</span></div>
<div class="p2">
</div>
<div class="p3">
<span class="s2">NSURL</span> *url = [<span class="s2">NSURL</span> <span class="s8">fileURLWithPath</span>:<span class="s7">path</span>];</div>
<div class="p2">
</div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> </span><span class="s4">double</span><span class="s3"> start = </span>CACurrentMediaTime<span class="s3">();</span></div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> </span><span class="s4">__unused</span><span class="s3"> </span><span class="s2">NSArray</span><span class="s3"> *array = [[</span><span class="s2">NSFileManager</span><span class="s3"> </span>defaultManager<span class="s3">] </span>contentsOfDirectoryAtPath<span class="s3">:</span><span class="s7">path</span><span class="s3"> </span>error<span class="s3">:</span><span class="s4">nil</span><span class="s3">];</span></div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> [[</span><span class="s2">NSFileManager</span><span class="s3"> </span>defaultManager<span class="s3">] </span>contentsOfDirectoryAtURL<span class="s3">:url</span></div>
<div class="p4">
<span class="s3"> </span>includingPropertiesForKeys<span class="s3">:[[</span><span class="s2">NSArray</span><span class="s3"> </span>alloc<span class="s3">] </span>initWithObjects<span class="s3">:</span><span class="s2">NSURLIsRegularFileKey</span><span class="s3">, </span><span class="s4">nil</span><span class="s3">]</span></div>
<div class="p3">
<span class="s8">options</span>:options</div>
<div class="p3">
<span class="s8">error</span>:<span class="s4">nil</span>];</div>
<div class="p2">
</div>
<div class="p4">
<span class="s3"> </span><span class="s4">double</span><span class="s3"> end = </span>CACurrentMediaTime<span class="s3">();</span></div>
<div class="p2">
</div>
<div class="p1">
<span class="s3"> </span><span class="s8">NSLog</span><span class="s3">(</span>@"enumerate time %lf"<span class="s3">, end - start);</span></div>
<div class="p3">
<span class="s8">NSLog</span>(<span class="s6">@"count %d"</span>, (<span class="s4">int</span>)[array <span class="s8">count</span>]);</div>
<div class="p2">
</div>
<div class="p6">
<span class="s3"> </span>return<span class="s3">;</span></div>
<div class="p3">
}</div>
</div>
Taha Bekir Erenhttp://www.blogger.com/profile/13762644176648451720noreply@blogger.com0