Saturday, May 31, 2014

Volatile, Memory Barriers, and Load/Store Reordering

Introduction

Volatile is a memory access keyword in several languages that is much more complicated and confusing in practice than it appears to be. There are pieces of information about volatile and related concepts scattered everywhere, but it's hard to build a complete understanding from them. Let's first understand the underlying concepts.

Some Memory Concepts

Load/Store

This is a more formal terminology for memory read/write. Load means bringing something from main memory (RAM) into processor registers, and store is the same thing in the other direction.

Load/Store Omission by the Compiler

Not every memory read/write in a program's source code has to result in a processor instruction that does an actual load/store. Compilers can decide to cache values in registers instead of actually loading them from memory when they think it doesn't change the results of the program. This can cause bugs in multithreaded programs and in programs dealing with 'unusual memory'.
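As a rough sketch of what this looks like in practice (the flag name is hypothetical), consider a loop that polls a plain, non-volatile flag set by another thread. Since nothing inside the loop modifies the flag, an optimizing compiler is free to load it once into a register and never read memory again:

// A plain (non-volatile) flag, presumably set by another thread.
bool done = false;

void wait_for_done()
{
  // Nothing in this loop writes to 'done', so the compiler may hoist the
  // load out of the loop and spin on a register copy forever, even after
  // the other thread sets the flag in memory.
  while (!done)
  {
  }
}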

'Unusual Memory'

Unusual memory refers to a memory location whose value can change without the compiler having any way to know about it. Hardware-mapped memory is such a case. If a sensor device is updating a memory location on its own, the compiler can't know about this and may incorrectly omit loads of that memory location, assuming the value will never change.

Load/Store Reordering by the Compiler

Compilers can reorder loads and stores in a program for optimization purposes, as long as the reordering doesn't change the program's results from the compiler's (single-threaded) point of view. This logic can break down in multithreaded programs and cause buggy behavior.

Out of Order Execution and Cache Coherency

Similar to compilers reordering loads/stores, CPUs can do the same via 'out of order execution'. This means a CPU might execute instructions in a slightly different order than they appear in memory, which includes instructions getting reordered and interleaved execution of two or more instructions. Interleaved execution means instructions aren't strictly ordered in their execution, i.e. one instruction might begin executing before an 'already in progress' instruction has finished.

Also, when a CPU stores a value to a memory location, other CPUs may not see stores in the same order because of cache coherency issues, i.e. each processor having its own cache of main memory.

These issues create buggy behavior in multithreaded programs. To prevent this, programmers can use memory barriers, which are special instructions telling the CPU not to reorder execution across the barrier instruction. Memory barriers also ensure cache consistency, which means we won't get hurt by cache coherency problems.
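To make this concrete, here's a minimal C++11 sketch (not something the original discussion relies on) of a producer/consumer pair using explicit barriers via std::atomic_thread_fence; the actual barrier instruction the CPU executes (mfence, dmb, etc.) depends on the platform:

#include <atomic>

int data = 0;
std::atomic<bool> ready(false);

void producer()
{
  data = 42;
  // Release fence: earlier stores can't be moved past this point and must
  // become visible to other cores before the store to 'ready' below.
  std::atomic_thread_fence(std::memory_order_release);
  ready.store(true, std::memory_order_relaxed);
}

void consumer()
{
  while (!ready.load(std::memory_order_relaxed))
  {
  }
  // Acquire fence: later loads can't be moved before this point, so once we
  // observe 'ready' as true, 'data' is guaranteed to read 42 here.
  std::atomic_thread_fence(std::memory_order_acquire);
}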

Atomic Load/Store

Loading OR storing a primitive value in main memory may or may not be atomic. Don't confuse this with 'read, modify, write' atomicity, which can only be achieved either by proper locking or by special 'interlocked' instructions. On certain hardware platforms loading an integer may not be atomic, i.e. some bytes of the integer may be modified while we're reading it, which means we can end up with a value the integer never held at any point in time. Properly aligned, word-sized (32-bit) load/store operations on the x86 architecture are atomic.
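If you want to check whether plain loads/stores of a given type are lock-free (and therefore done with ordinary atomic instructions) on your target, one portable option is to ask std::atomic. This is a C++11 sketch, separate from the volatile discussion:

#include <atomic>
#include <cstdint>
#include <iostream>

int main()
{
  std::atomic<std::uint32_t> a32;
  std::atomic<std::uint64_t> a64;

  // is_lock_free() reports whether the implementation can load/store this
  // type atomically without falling back to an internal lock. On a typical
  // 64-bit x86 build both print 1; on some 32-bit targets the 64-bit case
  // may not be lock-free.
  std::cout << "32-bit lock-free: " << a32.is_lock_free() << "\n";
  std::cout << "64-bit lock-free: " << a64.is_lock_free() << "\n";
}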

Volatile in C/C++ vs. Java and C#

Volatile has different meanings in different programming languages. Let's see these differences.

C/C++

Volatile provides:
  • When a volatile variable is loaded/stored, it will actually be loaded/stored; the load/store won't be cached or omitted. Code reading a volatile variable will always issue a real load instead of reusing a value cached in a register.
  • Loads/stores of volatile variables won't be reordered relative to each other, i.e. they'll occur in the same order they appear in the source code.
Volatile doesn't provide:
  • Atomic load/store of the volatile variable. For example, if the volatile variable is bigger than the word size, or isn't properly aligned on x86 hardware, load/store operations on it may not be atomic. Again, this is different from read, modify, write atomicity, which is an altogether different problem.
  • A guarantee that non-volatile load/store operations won't be reordered around the volatile loads/stores. The compiler can place a non-volatile store after a volatile store even if the non-volatile store comes first in the source code. This ordering guarantee is part of what's called the 'happens before' relationship, and the volatile keyword doesn't provide it in C/C++.
  • Memory barriers around volatile loads/stores. The CPU can still reorder instructions around loads/stores of volatile variables, and there can still be CPU cache inconsistencies.
Because of these issues, volatile is only recommended for memory-mapped I/O (i.e. unusual memory) in C/C++. If our only concern is seeing the latest value set by a hardware device or by some other process we're sharing memory with, volatile does the job.
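As a sketch of that memory-mapped I/O scenario (the register address and bit layout here are made up), volatile is exactly what forces the compiler to perform a real load from the hardware register on every iteration:

#include <cstdint>

// Hypothetical memory-mapped status register of some sensor.
volatile std::uint32_t* const SENSOR_STATUS =
    reinterpret_cast<volatile std::uint32_t*>(0x40001000);

std::uint32_t wait_for_sample()
{
  // Because the pointed-to object is volatile, every iteration issues an
  // actual load; the compiler can't assume the register keeps its old value.
  while ((*SENSOR_STATUS & 0x1u) == 0)
  {
  }
  return *SENSOR_STATUS;
}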

Java/C#

Volatile provides:
  • A guarantee that load/store operations won't be cached/omitted. Code reading a volatile variable will always see the latest value.
  • A guarantee that load/store operations of volatile variables won't be reordered, including the non-volatile loads/stores around them. This guarantee was only added to Java in Java 5. We'll see why this is important in the case study.
  • Atomic load/store of the volatile variable, i.e. loads/stores of a volatile variable won't be interleaved.
  • Memory barriers. The CPU won't do out of order execution of volatile loads/stores (or of other loads/stores around them), and it won't cause cache inconsistencies.
Volatile doesn't provide:
  • A solution to all your multithreading problems. There are countless scenarios where you'd need proper locking via mutexes etc., or interlocked APIs that operate on data atomically.

A Case Study

Let's look at a small piece of pseudocode which will provide us with the basis for discussion.

MyType my_global;
volatile bool f_my_global_ready = false;

void thread1()
{
  my_global = setup_my_object();
  f_my_global_ready = true;
}

void thread2()
{
  while (!f_my_global_ready);
  use_my_global_object();
}

The code is fairly simple. One thread initializes a global object and sets a flag indicating that it's ready; the other busy waits (polls) until the flag is set and then uses the object prepared by the first thread.

I should mention that while loads/stores of f_my_global_ready are volatile load/store operations, the initialization and consumption of my_global consist of non-volatile loads/stores. This will be important in terms of reordering guarantees.

Now let's think about how the volatile keyword in different languages affects the behavior of the above code.

Analysis of the Case Study

C/C++

The code in the case study won't work as expected in C/C++, for several reasons.

If f_my_global_ready weren't volatile and the compiler cached the read of it in Thread2, we might never see the flag getting set and the while loop would keep spinning forever. Declaring f_my_global_ready as volatile, as the case study does, prevents the compiler from doing this optimization.

However, even with the flag made volatile there are other issues making the code buggy. First there's the atomicity of the load. In this specific example we're loading a bool, which is a single byte, so I doubt this will be a problem on any hardware platform. But if we were working with a multi-byte primitive such as a 32-bit integer, we could be vulnerable to race conditions, depending on the hardware platform and on how the integer is aligned in memory. Volatile doesn't fix these issues in C/C++.

Second, the compiler won't optimize away or reorder loads/stores of the volatile flag, but it can reorder other 'non-volatile' loads and stores around it. This means part of the instructions setting up my_global might be placed by the compiler after the setting of the flag. Thread2 would then break out of the while loop and use my_global before it was fully initialized. This would be a serious bug.

Third, similar to compiler reordering, the CPU might execute instructions out of order, which again can result in Thread2 accessing a not yet fully initialized my_global. Cache coherency issues can lead to the same problem, as other CPUs might observe the loads/stores in a different order.

Java/C#

For Java (5 and later) and C#, the code in the case study will work just fine. Thanks to the guarantees given by volatile in these languages, everything works as expected. Let's go over these one more time:

  • After the flag is set, Thread2 will see the new value the next time it polls the flag, because the flag is declared volatile and the compiler won't cache its value.
  • Load/store of the flag is atomic (again, not read, modify, and write), so we're not concerned with accidentally reading an interleaved value.
  • Initialization of my_global will complete strictly before the f_my_global_ready flag is set, because the compiler doesn't reorder non-volatile load/store operations across volatile load/store operations. The volatile load/store acts as a strict barrier for all other loads/stores.
  • The CPU won't do out of order execution or have cache inconsistencies for the volatile loads/stores, thanks to the compiler-inserted memory barriers around them.

Solution to the C/C++ Case

Volatile doesn't help with the code we've been working on in C/C++, so what other options do we have?

  • If we're using C++11, we can use the std::atomic<T> types, which behave much like volatile does in C#/Java (a sketch follows this list).
  • We can use the Interlocked* API on Windows or similar API present on other platforms.
  • Use proper locking techniques like spinlocks, mutexes, semaphores etc.
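
Here's roughly what the first option looks like applied to the case study (a sketch; MyType, setup_my_object, and use_my_global_object are the same placeholders as before):

#include <atomic>

MyType my_global;
std::atomic<bool> f_my_global_ready(false);

void thread1()
{
  my_global = setup_my_object();
  // Default (sequentially consistent) store: the compiler and the CPU can't
  // move the initialization of my_global past this point, and the new flag
  // value is made visible to other cores.
  f_my_global_ready.store(true);
}

void thread2()
{
  // Default (sequentially consistent) load: once it returns true, the fully
  // initialized my_global is guaranteed to be visible to this thread.
  while (!f_my_global_ready.load());
  use_my_global_object();
}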

Friday, May 16, 2014

How Do File Recovery Tools Work?

Introduction

If you are a software person you must have wondered how file recovery tools are able to find and recover deleted files on disk. Sure, it's easy to guess that the sectors used by the file are potentially untouched and that in that case the tool recovers the file, but the problem is that each sector on disk is simply an array of bytes. How does the recovery tool know which file a sector used to belong to when it looks at it? Here's how.

The 'Master' File

Any serious file system has a 'master' file which keeps a list of all files that exist on a volume. This is called the master file table in NTFS and the catalog file in HFS+. It's a special file, and the file system knows how to locate it in a predefined way when it mounts the volume. For example, HFS+ keeps the location of the catalog file in the volume header, which itself is at a fixed location.

You can think of the master file as a giant table with an entry for each file on the volume. These entries keep certain important information about the file, one piece of which is the list of disk sectors occupied by the file. In fact, when you read from a file this is how the file system knows which disk sectors to read from.
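As a very rough sketch of such an entry (the field names and layout are made up here, not NTFS's or HFS+'s actual on-disk format), it could look something like this:

#include <cstdint>
#include <string>
#include <vector>

// One contiguous run of sectors belonging to a file.
struct Extent
{
  std::uint64_t first_sector;   // where the run starts on disk
  std::uint64_t sector_count;   // how many consecutive sectors it covers
};

// Hypothetical, simplified master file entry; real file systems pack a lot
// more metadata (timestamps, permissions, attributes) into each record.
struct MasterFileEntry
{
  bool in_use;                  // cleared when the file is deleted
  std::string name;
  std::uint64_t size_bytes;
  std::vector<Extent> extents;  // the disk sectors occupied by the file
};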

Adding/Removing a File

It's pretty easy to see that the master file can't be a fully compacted table where every entry is occupied by a current file record and a new file simply gets a new entry appended at the end. This is not impossible, but it's infeasible: keeping such a compact table would be very costly and would kill performance.

When a file is deleted from the volume, the file system does as little work as possible on the master file to keep performance high. Most of the time it simply marks the corresponding entry in the master file as 'deleted' so that in the future it knows the file pointed to by that entry was removed.

When a new file is added it has to have a corresponding entry in the master file. To do this, the file system can reuse the entry of a previously deleted file, or it can allocate new space on disk for a brand new entry. Obviously the file system has to do a lot of housekeeping to be able to do these operations efficiently.

Finding Deleted Files

Conceptually, finding the deleted files is pretty straightforward. You just need to walk over all entries in the master file and see which ones are marked as deleted. An obvious problem is that if an entry has been reused since the file it referred to was deleted, it's not possible to recover that file anymore because we simply don't know about it. So all entries in the master file that are marked as deleted point to potentially recoverable files.
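In terms of the MasterFileEntry sketch from earlier, the scan is little more than a filter over the table (again, the field names are hypothetical):

#include <vector>

// Candidate files for recovery are simply the entries whose 'in use' flag
// has been cleared but whose record hasn't yet been reused by a newer file.
std::vector<MasterFileEntry> find_deleted_entries(
    const std::vector<MasterFileEntry>& table)
{
  std::vector<MasterFileEntry> deleted;
  for (const MasterFileEntry& entry : table)
  {
    if (!entry.in_use)
      deleted.push_back(entry);
  }
  return deleted;
}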

Recovering the Data For a File

Now that we have a list of deleted files, the problem is how to find the sectors that made up each deleted file. We'll use the file's entry in the master file for this. A key point is that when a file system marks a file entry as deleted, that's pretty much all it does: the sector information for that file will still be there. This is great, but we're far from done.

Let's say a file occupied two sectors on disk and we know which sectors they are. The question now is how do we know whether these sectors are currently being used by another file. If they are, we won't be able to recover the file. To determine this we need to use the 'sector map', a data structure on disk that the file system uses to keep track of used/unused sectors. If both sectors of the file show up as unused, then we may be able to recover the file.
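Here's a sketch of that check, treating the sector map as a simple bitmap with one bit per sector (a common representation, though real on-disk formats differ):

#include <cstdint>
#include <vector>

// Returns true if the given sector is currently allocated to some live file.
bool sector_in_use(const std::vector<std::uint8_t>& sector_map, std::uint64_t sector)
{
  return (sector_map[sector / 8] >> (sector % 8)) & 1u;
}

// A deleted file's data is only potentially recoverable if every sector it
// used to occupy is still marked unused.
bool potentially_recoverable(const std::vector<std::uint8_t>& sector_map,
                             const std::vector<std::uint64_t>& file_sectors)
{
  for (std::uint64_t s : file_sectors)
  {
    if (sector_in_use(sector_map, s))
      return false;   // the sector has been reused by a live file
  }
  return true;
}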

Complications Around Recovery

We still can't be certain, because if we're unlucky those sectors might have been used by some other file at some point and then got freed again. In that case we'd think the sectors still contain data from the deleted file we're looking for, but in fact they would contain data from another deleted file. There isn't a bulletproof way to know whether the data we recovered is truly intact. One way to extract further information is to check whether the deleted files have any colliding sectors: if N deleted files claim a specific sector, at least N - 1 of them have lost their data in that sector. Even then there are no guarantees, because the master file entries of some files using that sector might have been overwritten by entries of newer files, which means we have no idea those files ever existed.

Also, I should note that we may be able to recover only some sectors of a file but not others. If some sectors of the deleted file are in use by other files, it's certain that we've lost at least part of the file's data. If all of them are currently unused, we have some hope of recovering the file in its entirety, but we can't be sure for the reasons explained above. If the file contains some internal verification information (like an MD5 hash of its content), we can determine whether the data was really recovered in its entirety or not.