Tuesday, April 29, 2014

Storing Image Data in Transactional Databases

Introduction

You might have noticed that a lot of websites serve their images as links to regular files, instead of storing them as a field in database rows. Generally it's not recommended to store image files directly in the database (e.g Mysql) but people don't say why. I'll talk about two reasons for not doing so.

Serving Media Through Content Distribution Networks

For websites with users all around the world and high traffic it makes a lot of sense to store media content at content distribution networks (CDN). Media content is generally immutable once it's created and they consume a lot of bandwidth to download. By having the content stored as plain files on a CDN the website makes use of CDN's bandwidth and global points of presence (POP). Having several POPs around the globe also makes roundtrip time to reach the data much smaller which results in more prompt downloads.

Keeping media files separate form the database fits better to working with CDN strategy. You can actually still keep the primary copy of the data in your database and put the corresponding file to a CDN but to me that's a weird design. Also this brings us to our next point.

Cost of Having Transaction Properties

Almost all production quality database engines provide the so called ACID transaction properties. A(atomicity) and D(durability) of transactions require a design that employs 'transaction logging'. There can be numerous ways to implement transaction logging but two popular ways are 'write ahead logging' and 'undo/redo logging'. I won't go into details of these techniques but in a nutshell they involve writing the database mutations to transaction logs AND to the actual database image itself. Depending on the logging technique used this means image data will be written to disk 2 or 3 times at least. A single image can occupy space as big as thousands of regular database rows would occupy (imagine tables that have just a few integer and string columns in their definitions) so on a busy database server incurring this extra write(s) is a non trivial performance penalty. For most large scale websites this is unacceptable.

Conclusion

I've picked image files as a case study because it's a fairly common case but this applies to all BLOB (binary large object) types. Purpose of this article isn't to say 'thou shall not store BLOBs in databases'. I'm simply saying know the implications if you want to do so.

No comments:

Post a Comment