Posted by Mateusz Jurczyk of Google Project Zero
Patch diffing is a common technique of comparing two binary builds of the same code – a known-vulnerable one and one containing a security fix. It is often used to determine the technical details behind ambiguously-worded bulletins, and to establish the root causes, attack vectors and potential variants of the vulnerabilities in question. The approach has attracted plenty of research  and tooling development  over the years, and has been shown to be useful for identifying so-called 1-day bugs, which can be exploited against users who are slow to adopt latest security patches. Overall, the risk of post-patch vulnerability exploitation is inevitable for software which can be freely reverse-engineered, and is thus accepted as a natural part of the ecosystem.
In a similar vein, binary diffing can be utilized to discover discrepancies between two or more versions of a single product, if they share the same core code and coexist on the market, but are serviced independently by the vendor. One example of such software is the Windows operating system, which currently has three versions under active support – Windows 7, 8 and 10 . While Windows 7 still has a nearly 50% share on the desktop market at the time of this writing , Microsoft is known for introducing a number of structural security improvements and sometimes even ordinary bugfixes only to the most recent Windows platform. This creates a false sense of security for users of the older systems, and leaves them vulnerable to software flaws which can be detected merely by spotting subtle changes in the corresponding code in different versions of Windows.
In this blog post, we will show how a very simple form of binary diffing was effectively used to find instances of 0-day uninitialized kernel memory disclosure to user-mode programs. Bugs of this kind can be a useful link in local privilege escalation exploit chains (e.g. to bypass kernel ASLR), or just plainly expose sensitive data stored in the kernel address space. If you're not familiar with the bug class, we recommend checking the slides of the Bochspwn Reloaded talk given at the REcon and Black Hat USA conferences this year as a prior reading .
Chasing memset calls
Most kernel information disclosures are caused by leaving parts of large memory regions uninitialized before copying them to user-mode; be they structures, unions, arrays or some combination of these constructs. This typically means that the kernel provides a ring-3 program with more output data than there is relevant information, for a number of possible reasons: compiler-inserted padding holes, unused structure/union fields, large fixed-sized arrays used for variable-length content etc. In the end, these bugs are rarely fixed by switching to smaller buffers – more often than not, the original behavior is preserved, with the addition of one extra memset function call which pre-initializes the output memory area so it doesn't contain any leftover stack/heap data. This makes such patches very easy to recognize during reverse engineering.
When filing issue #1267 in the Project Zero bug tracker (Windows Kernel pool memory disclosure in win32k!NtGdiGetGlyphOutline, found by Bochspwn) and performing some cursory analysis, I realized that the bug was only present in Windows 7 and 8, while it had been internally fixed by Microsoft in Windows 10. The figure below shows the obvious difference between the vulnerable and fixed forms of the code, as decompiled by the Hex-Rays plugin and diffed by Diaphora:
Figure 1. A crucial difference in the implementation of win32k!NtGdiGetGlyphOutline in Windows 7 and 10
Considering how evident the patch was in Windows 10 (a completely new memset call in a top-level syscall handler), I suspected there could be other similar issues lurking in the older kernels that have been silently fixed by Microsoft in the more recent ones. To verify this, I decided to compare the number of memset calls in all top-level syscall handlers (i.e. functions starting with the Nt prefix, implemented by both the core kernel and graphical subsystem) between Windows 7 and 10, and later between Windows 8.1 and 10. Since in principle this was a very simple analysis, an adequately simple approach could be used to get sufficient results, which is why I decided to perform the diffing against code listings generated by the IDA Pro disassembler.
When doing so, I quickly found out that each memory zeroing operation found in the kernel is compiled in one of three ways: with a direct call to the memset function, its inlined form implemented with the rep stosd x86 instruction, or an unfolded series of mov x86 instructions:
Figure 2. A direct memset function call to reset memory in nt!NtCreateJobObject (Windows 7)
Figure 3. Inlined memset code used to reset memory in nt!NtRequestPort (Windows 7)
Figure 4. A series of mov instructions used to reset memory in win32k!NtUserRealInternalGetMessage (Windows 8.1)
The two most common cases (memset calls and rep stosd) are both decompiled to regular invocations of memset() by the Hex-Rays decompiler:
Figures 5 and 6. A regular memset call is indistinguishable from an inlined rep movsd construct in the Hex-Rays view
Unfortunately, a sequence of mov's with a zeroed-out register as the source operand is not recognized by Hex-Rays as a memset yet, but the number of such occurrences is relatively low, and hence can be neglected until we manually deal with any resulting false-positives later in the process. In the end, we decided to perform the diffing using decompiled .c files instead of regular assembly, just to make our life a bit easier.
A complete list of steps we followed to arrive at the final outcome is shown below. We repeated them twice, first for Windows 7/10 and then for Windows 8.1/10:
1、 Decompiled ntkrnlpa.exe and win32k.sys from Windows 7 and 8.1 to their .c counterparts with Hex-Rays, and did the same with ntoskrnl.exe, tm.sys, win32kbase.sys and win32kfull.sys from Windows 10.
2、Extracted a list of kernel functions containing memset references (taking their quantity into account too), and sorted them alphabetically.
3、Performed a regular textual diff against the two lists, and chose the functions which had more memset references on Windows 10.
4、Filtered the output of the previous step against the list of functions present in the older kernels (7 or 8.1, again pulled from IDA Pro), to make sure that we didn't include routines which were only introduced in the latest system.
In numbers, we ended up with the following results:
Table 1. Number of old functions with new memset usage in Windows 10, relative to previous system editions
Quite intuitively, the Windows 7/10 comparison yielded more differences than the Windows 8.1/10 one, as the system progressively evolved from one version to the next. It's also interesting to see that the graphical subsystem had fewer changes detected in general, but more than the core kernel specifically in the syscall handlers. Once we knew the candidates, we manually investigated each of them in detail, discovering two new vulnerabilities in the win32k!NtGdiGetFontResourceInfoInternalW and win32k!NtGdiEngCreatePalette system services. Both of them were addressed in the September Patch Tuesday, and since they have some unique characteristics, we will discuss each of them in the subsequent sections.
The inconsistent memset which gave away the existence of the bug is as follows:
Figure 8. A new memset added in win32k!NtGdiGetFontResourceInfoInternalW in Windows 10
This was a stack-based kernel memory disclosure of about 0x5c (92) bytes. The structure of the function follows a common optimization scheme used in Windows, where a local buffer located on the stack is used for short syscall outputs, and the pool allocator is only invoked for larger ones. The relevant snippet of pseudocode is shown below:
Figure 9. Optimized memory usage found in the syscall handler
It's interesting to note that even in the vulnerable form of the routine, memory disclosure was only possible when the first (stack) branch was taken, and thus only for requested buffer sizes of up to 0x5c bytes. That's because the dynamic PALLOCMEM pool allocator does zero out the requested memory before returning it to the caller:
Figure 10. PALLOCMEM always resets allocated memory
Furthermore, the issue is also a great example of how another peculiar behavior in interacting with user-mode may contribute to the introduction of a security flaw (see slides 32-33 of the Bochspwn Reloaded deck). The code pattern at fault is as follows:
1、 Allocate a temporary output buffer based on a user-specified size (dubbed a4 in this case), as discussed above.
2、Have the requested information written to the kernel buffer by calling an internal win32k!GetFontResourceInfoInternalW function.
3、Write the contents of the entire temporary buffer back to ring-3, regardless of how much data was actually filled out by win32k!GetFontResourceInfoInternalW.
Here, the vulnerable win32k!NtGdiGetFontResourceInfoInternalW handler actually "knows" the length of meaningful data (it is even passed back to the user-mode caller through the 5th syscall parameter), but it still decides to copy the full amount of memory requested by the client, even though it is completely unnecessary for the correct functioning of the syscall:
Figure 11. There are v10 output bytes, but the function copies the full a4 buffer size.
The combination of a lack of buffer pre-initialization and allowing the copying of redundant bytes is what makes this an exploitable security bug. In the proof-of-concept program, we used an undocumented information class 5, which only writes to the first four bytes of the output buffer, leaving the remaining 88 uninitialized and ready to be disclosed to the attacker.
In this case, the vulnerability was fixed in Windows 8 by introducing the following memset into the syscall handler, while still leaving Windows 7 exposed:
Figure 12. A new memset added in win32k!NtGdiEngCreatePalette in Windows 8
The system call in question is responsible for creating a kernel GDI palette object consisting of N 4-byte color entries, for a user-controlled N. Again, a memory usage optimization is employed by the implementation – if N is less or equal to 256 (1024 bytes in total), these items are read from user-mode to a kernel stack buffer using win32k!bSafeReadBits; otherwise, they are just locked in ring-3 memory by calling win32k!bSecureBits. As you can guess, the memory region with the extra memset applied to it is the local buffer used to temporarily store a list of user-defined RGB colors, and it is later passed to win32k!EngCreatePalette to actually create the palette object. The question is, how do we have the buffer remain uninitialized but still passed for the creation of a non-empty palette? The answer lies in the implementation of the win32k!bSafeReadBits routine:
Figure 13. Function body of win32k!bSafeReadBits
As you can see in the decompiled listing above, the function completes successfully without performing any actual work, if either the source or destination pointer is NULL. Here, the source address comes directly from the syscall's 3rd argument, which doesn't undergo any prior sanitization. This means that we can make the syscall think it has successfully captured an array of up to 256 elements from user-mode, while in reality the stack buffer isn't written to at all. This is achieved with the following system call invocation in our proof-of-concept program:
HPALETTE hpal = (HPALETTE)SystemCall32(__NR_NtGdiEngCreatePalette, PAL_INDEXED, 256, NULL, 0.0f, 0.0f, 0.0f);
Once the syscall returns, we receive a handle to the palette which internally stores the leaked stack memory. In order to read it back to our program, one more call to the GetPaletteEntries API is needed. To reiterate the severity of the bug, its exploitation allows an attacker to disclose an entire 1 kB of uninitialized kernel stack memory, which is a very powerful primitive to have in one's arsenal.
In addition to the memory disclosure itself, other interesting quirks can be observed in the nearby code area. If you look closely at the code of win32k!NtGdiEngCreatePalette in Windows 8.1 and 10, you will spot an interesting disparity between them: the stack array is fully reset in both cases, but it's achieved in different ways. On Windows 8.1, the function "manually” sets the first DWORD to 0 and then calls memset() on the remaining 0x3FC bytes, while Windows 10 just plainly memsets the whole 0x400-byte area. The reason for this is quite unclear, and even though the end result is the same, the discrepancy provokes the idea that not just the existence of memset calls can be compared across Windows versions, but also possibly the size operands of those calls.
Figure 14. Different code constructs used to zero out a 256-item array on Windows 8.1 and 10
On a last related note, the win32k!NtGdiEngCreatePalette syscall may be also quite useful for stack spraying purposes during kernel exploitation, as it allows programs to easily write 1024 controlled bytes to a continuous area of the stack. While the buffer size is smaller than what e.g. nt!NtMapUserPhysicalPages has to offer, the buffer itself ends at a higher offset relative to the stack frame of the top-level syscall handler, which can make an important difference in certain scenarios.
The aim of this blog post was to illustrate that security-relevant differences in concurrently supported branches of a single product may be used by malicious actors to pinpoint significant weaknesses or just regular bugs in the more dated versions of said software. Not only does it leave some customers exposed to attacks, but it also visibly reveals what the attack vectors are, which works directly against user security. This is especially true for bug classes with obvious fixes, such as kernel memory disclosure and the added memset calls. The "binary diffing" process discussed in this post was in fact pseudocode-level diffing that didn't require much low-level expertise or knowledge of the operating system internals. It could have been easily used by non-advanced attackers to identify the three mentioned vulnerabilities (CVE-2017-8680, CVE-2017-8684, CVE-2017-8685) with very little effort. We hope that these were some of the very few instances of such "low hanging fruit" being accessible to researchers through diffing, and we encourage software vendors to make sure of it by applying security improvements consistently across all supported versions of their software.