Thursday, June 26, 2014

What Constitutes an OS Kernel?


Introduction

The question in the title might seem trivial to most, but you'd be surprised how widely the kernel is misunderstood as a concept. I have heard the following sentences many times in my life, and to a person familiar with OS kernels they sound really absurd.

  • They changed the kernel in this version
  • They used the same kernel in this version
  • Until Windows 8 they have been using the same kernel since Windows 98
    • This is wrong on so many levels that I have to talk about it specifically later in this post.
I'll stick to Windows NT as an example here, but mainstream OSes like Linux and Mac OS X are similar in this respect.

Kernel Mode

Before diving deep into the discussion we should define the 'kernel mode' concept. Processors have different privilege levels. At the most privileged level anything is possible. At lesser privilege levels certain actions are restricted. Although most modern processors support several (e.g. 4) privilege levels (rings), mainstream operating systems like Windows and Linux only make use of 2 of them for practical reasons. These two privilege levels are called user mode and kernel mode.

User mode is what all user programs run in. In this mode the processor can't execute certain instructions, can't access certain registers, etc. This keeps user mode programs from damaging and destabilizing the entire system. Kernel mode is the opposite: everything is allowed.
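The split between the two modes can be sketched as a toy model. This is only an illustration of the rule described above, not how a real processor is programmed: the class `ToyCpu`, the exception `PrivilegeFault` and the instruction names are all made up for this example.

```python
class PrivilegeFault(Exception):
    """Raised when code attempts an operation its mode does not permit."""


# Hypothetical instruction names, chosen only for illustration.
PRIVILEGED = {"disable_interrupts", "write_control_register"}


class ToyCpu:
    """A pretend processor that is either in 'user' or 'kernel' mode."""

    def __init__(self, mode):
        if mode not in ("user", "kernel"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def execute(self, instruction):
        # Kernel mode may run anything; user mode is refused privileged work.
        if instruction in PRIVILEGED and self.mode != "kernel":
            raise PrivilegeFault(f"{instruction} is not allowed in user mode")
        return f"{instruction} executed in {self.mode} mode"
```

With this model, `ToyCpu("user").execute("add")` succeeds, `ToyCpu("user").execute("disable_interrupts")` raises `PrivilegeFault`, and `ToyCpu("kernel")` can execute both.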

Potential Definitions of the 'Kernel'

Most user mode programmers think of the kernel as an atomic, indivisible piece of software that magically does everything user mode programs need, from file system management to hardware bus communication. Funnily enough, if you stick to the stricter definitions of a kernel, both of these jobs fall outside the kernel boundary.

First of all you should understand that there isn't a single definition of the kernel. When people say kernel they might be referring to many significantly different things. Please take a moment to look at the Windows NT architecture diagram below. As you can see, the boxes below the 'kernel' line are numerous and complicated in their interactions.


Now let's start with potential definitions of the kernel from most narrow to most inclusive.

Core Kernel

'Core kernel' refers to software that runs in kernel mode, is written by the OS vendor (e.g. Microsoft, Apple) and provides only a minimal set of services. These services are mainly:
  • Synchronization primitives
  • Thread dispatcher, or simply the dispatcher
  • Interrupt dispatcher/handler
This definition of the kernel corresponds, though not exactly, to the microkernel in the above diagram. Kernel components that seem very core and low level, such as the virtual memory manager, are left out of this definition. This may seem really weird, but from an architectural standpoint it is correct, as this core part of the kernel doesn't depend on the rest of the kernel to do its job.

Core Kernel + Executive Components

You can view this definition of the kernel as the 'static kernel'. By static I'm referring to the fact that the components in this definition aren't dynamically loaded: they are always present and they can't be replaced. For example, the kernel wouldn't dynamically load or unload a virtual memory manager. The same goes for the I/O manager. These 'static' parts of the kernel are also written by the OS vendor. Some examples of these static kernel components are:
  • Virtual memory manager
  • I/O manager
  • Security subsystem
  • Cache manager
Note that crucial parts of the system such as file systems or the TCP driver (yes, it's a driver in Windows) aren't included in this definition, as they are dynamic and can be loaded/unloaded.

Core Kernel + Executive + Drivers Shipped by the Vendor

This definition of the kernel extends the previous one by including the drivers that are developed by the vendor. I'm deliberately not saying "drivers that come in the box", because some drivers that come in the box are still 3rd party software. These drivers are considered crucial parts of the OS, yet they're architecturally dynamic and different from the rest of the kernel. I think a great example of these drivers is the NTFS driver. You can't imagine a Windows kernel without it, yet technically it's an outer part of the kernel and it's dynamic by nature.

Anything That Runs in Kernel Mode Including 3rd Party Drivers

This is also a very legitimate definition of the kernel. Running in kernel mode gives unlimited access to the processor and the hardware, and presents similar challenges to both vendor shipped code and 3rd party code. These are challenges and concepts that simply don't exist in user mode. A few examples are:
  • More complicated synchronization primitives (like fast mutexes and spinlocks; you might think you have used spinlocks in user mode, but the kernel mode ones are substantially different)
  • Responding to interrupts
  • Dealing with different processor interrupt request levels
  • Being able to use non-paged memory, as opposed to all memory being pageable. This is not just a matter of ability; it is sometimes a requirement
Most importantly, any crash in a kernel mode driver is as fatal as a crash in the core kernel.
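To see why non-paged memory is sometimes a requirement rather than a choice, here is a toy model of the rule as I understand it on Windows: once code runs at DISPATCH_LEVEL or above, a page fault can no longer be serviced, so touching memory that has been paged out brings the system down. The three IRQL constants use the names and values Windows defines; `Buffer`, `touch` and `FatalSystemError` are hypothetical names invented for this sketch, and nothing here is real driver code.

```python
from dataclasses import dataclass

# IRQL names and values as defined by Windows; the rest is a toy model.
PASSIVE_LEVEL = 0
APC_LEVEL = 1
DISPATCH_LEVEL = 2


class FatalSystemError(Exception):
    """Stands in for a system-wide crash (a bug check on Windows)."""


@dataclass
class Buffer:
    name: str
    pageable: bool         # allocated from paged memory?
    resident: bool = True  # currently in RAM, or paged out to disk?

    def page_out(self):
        # Only pageable memory can ever leave RAM.
        if self.pageable:
            self.resident = False


def touch(buffer, irql):
    """Model a memory access made while running at the given IRQL."""
    if buffer.resident:
        return "ok"
    # The buffer is paged out, so this access causes a page fault.
    if irql >= DISPATCH_LEVEL:
        # The fault cannot be serviced at this level: the system goes down.
        raise FatalSystemError(f"page fault on {buffer.name} at IRQL {irql}")
    buffer.resident = True  # the memory manager pages it back in
    return "paged in"
```

In this model a non-paged buffer is safe to touch at any level, while a paged buffer is only safe below DISPATCH_LEVEL. That is the sense in which code running at raised interrupt request levels has to keep its data in non-paged memory.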

Wrap-up

So, going back to where we started: when you hear someone talking about the kernel changing or not changing, it should now be clear why that discussion doesn't make much sense. First of all, what do you mean by kernel?

With every OS release there are definitely some changes to the kernel, whichever of these definitions you pick.

  • OS vendors ship new drivers all the time (like the ReFS file system driver that shipped with Windows Server 2012)
  • They extend existing drivers all the time (e.g. the TCP driver adding support for IPv6)
  • New APIs are added to executive level components such as the I/O manager and the object manager
  • Even core kernels change with every release, sometimes in big ways, sometimes in small ways. For example, Microsoft made the dispatcher lock more granular in Windows 7, which resulted in better multiprocessor support with less lock contention
My impression is that when people see that their existing drivers don't work in a new release of the OS, they think the kernel has changed. Most of the time this is simply because of a breaking change in a kernel mode API, most likely in the I/O manager, as kernel drivers interact with it extensively. This says very little about the overall changes made in the kernel mode components.
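The point that "did the kernel change?" depends on the definition can be made concrete with a small sketch. It models the four definitions from this post as nested sets of the components mentioned along the way, and reports, for a given set of changed components, which definitions of the kernel were touched. The grouping follows the post; `third_party_driver` is a placeholder rather than a real driver, and the function names are my own.

```python
# Components named in this post, grouped by the layer they belong to.
# "third_party_driver" is a placeholder, not a real driver.
LAYERS = [
    ("core kernel", {"synchronization primitives", "thread dispatcher",
                     "interrupt dispatcher"}),
    ("executive", {"virtual memory manager", "I/O manager",
                   "security subsystem", "cache manager", "object manager"}),
    ("vendor drivers", {"NTFS driver", "TCP driver", "ReFS driver"}),
    ("3rd party drivers", {"third_party_driver"}),
]


def kernel_definitions():
    """Yield (definition name, component set), narrowest definition first."""
    included = set()
    names = []
    for layer, components in LAYERS:
        names.append(layer)
        included = included | components  # each definition contains the previous one
        yield " + ".join(names), included


def kernel_changed(changed_components):
    """For each definition, report whether any changed component falls inside it."""
    changed = set(changed_components)
    return {name: bool(changed & components)
            for name, components in kernel_definitions()}
```

Calling `kernel_changed({"ReFS driver"})` answers no for the two narrowest definitions and yes for the two widest, while `kernel_changed({"thread dispatcher"})` answers yes for all four. The same release gets a different answer depending on what you mean by kernel.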

Appendix - Windows 98 to Windows 8

This one is really funny. Windows 98 and Windows 8 don't even belong to the same family of OSes. 

Until Windows XP, Microsoft built two different families of OSes: Windows and Windows NT. Windows was a descendant of DOS and it was an inferior OS in many ways. Windows NT, however, was an OS designed later by David Cutler at Microsoft, and it had a much more modern and robust architecture compared to Windows. These two OSes coexisted until 2000. Following Windows 95 and Windows 98, the Windows line had its final release in Windows ME. Following the Windows NT 4.0, Windows 2000, Windows XP line, Windows NT is still alive, and the latest release has been Windows 8.1 in 2013.

So saying Windows has had the same kernel from Windows 98 until Windows 8 doesn't make any sense.