Friday, May 16, 2014

How Do File Recovery Tools Work?

Introduction

If you are a software person you must have wondered how file recovery tools are able to find and recover deleted files on disk. Sure, it's easy to guess that sectors used by the file are potentially untouched and in that case the tool recovers the file but the problem is each sector on disk is simply and array of bytes. How does the recovery tool know what file it used to belong to when it looks at a sector? Here's how.

The 'Master' File

Any serious file system has a 'master' file which keeps a list of all files that exist on a volume. This is called the master file table in NTFS and the catalog file in HFS+. This is a special file and file system knows how to locate this file in a predefined way when it mounts the volume. For example HFS+ keeps the location of the catalog file in volume header which itself is at a fixed location.

You can think of the master file as a giant table which has an entry for each file on the volume. These entries keep certain important information about the file, one of which is the disk sectors occupied by the file. In fact when you read from a file this is how the file system knows from which disk sectors to read.

Adding/Removing a File

It's pretty easy to see that master file can't be a full table where every entry is occupied by a current file record and when a new file is added we just add a new entry at the end. This is not impossible, but it's infeasible. Keeping such a compact table would be very costly and it'd kill performance.

When a file is deleted from the volume, the file system does as little as possible work on the master file to have high performance. Most of the times it simply marks the corresponding entry in the master file as 'deleted' so in the future it can know file pointed by that entry was removed.

When a new file is added it has to have a corresponding entry in the master file. To do this file system can reuse the entry for a previously deleted file, or it can allocate new space on disk for a brand new entry. Obviously file system has to do a lot of house keeping to be able to do these operations efficiently.

Finding Deleted Files

Conceptually finding the deleted files is pretty straightforward. You just need to walk over all entries in the master file and see which entries are marked as deleted. Obviously a problem is if an entry is reused after the file referred by it is deleted, it's not possible to recover that file anymore as we simply don't know about that file. So all file entries in the master file that are marked as deleted are potentially recoverable file.

Recovering the Data For a File

Now that we have a list of deleted files, the problem is how to find sectors that were making up the deleted files. We'll use the entry in the master file for this. A key point is when a file system marks the file entry as deleted, that's pretty much all it does. Sector information for that file will still be there. This is great but we're far from done.

Let's say a file occupied two sectors on disk and we know which sectors they are. Now the question is how do we know whether these sectors aren't being used by another file now. If that's the case we won't be able to recover the file. To determine this we need to use the 'sector map' which is a data structure on disk used by the file system to keep track of used/unused sectors. If both sectors of the file are showing up as unused then we may be able to recover the file.

Complications Around Recovery

We still can't be certain because if we're unlucky those sectors might have been used by some other file at some point, and then got freed. In this case we'd think the sectors still contain data from the deleted file we're looking for but in fact they would contain data from another deleted file. There isn't a bullet proof way to know whether the data we recovered is truly intact. One way to extract further information would be to see if deleted files have any colliding sectors. If there are N collisions for a specific sector, at least N - 1 of those files have lost their data in that sector. There are still no guarantees around this because master file entries of some files using that sector might have been overwritten by entries of newer files, which means we have no idea those files ever existed.

Also I should note that we may only recover certain sectors of a file but not the others. If some sectors of the deleted file are in use by other files, it's certain that we lost at least part of the file's data. If all are unused currently we have some hope to recover the file in its entirety but we can't be sure for the reasons I explained above. If the file has some internal verification information (like MD5 hash of the content) we can determine if the data was really recovered entirely or not.


No comments:

Post a Comment