Abusing nested paging for non-virtualized performance
Post date: Sep 18, 2012 10:53:43 PM
It occurs to me, having just had a quick scan of the literature, that some of the hardware virtualization extensions introduced over the past several years could also be used to improve the performance of non-virtualized operating systems.
The feature that's caught my eye today is nested paging: essentially an additional layer of hardware indirection between a virtual address and the physical memory it ultimately refers to (guest-virtual to guest-physical, then guest-physical to host-physical). While this doesn't offer the fine-grained control I've been lusting after for many years, what it does provide is a wonderfully high-performance, page-level mechanism for the following:
thread-local storage
function pointer tables
The first is in many ways the more appealing proposition, as it maps much more naturally onto the semantics of nested paging; however, it's the second that would likely offer the greater performance benefit, not because a function pointer lookup is inherently slower than a TLS access, but because it's so much more common. It does, however, require a lot more work to implement and is likely to be more wasteful of memory, but that's almost always the tradeoff you make for performance.
Obviously this functionality would in many ways be better implemented in the operating system. However, a common address space is generally the defining concept of a thread, and as such there is significant resistance to allowing per-thread page mappings, both because of the technical changes required and because of a mental reluctance to turn the process/thread binary into a spectrum, perhaps from a wish to keep the execution model simple or from an attachment to existing models.