Guest Windows debugging and crashdumping under QEMU/KVM: dump-guest-memory, vmcoreinfo and virtio-win
In the previous introductory article we discussed capturing and debugging crash dumps of guest Windows with built-in tools. This post addresses the case where Windows can’t produce a Complete Memory Dump by itself during a BSoD, or where a live dump is needed. We will discuss QEMU’s dump-guest-memory -w command, which captures a guest Windows dump in the WinDbg-readable DMP format and saves it on the host side.
Preparation
This method is applicable to all Windows versions from Windows Server 2012 up to at least Windows Server 2022 (in other words, every version supported by Microsoft as of 2023), on both 32-bit and 64-bit platforms. It requires two things:
- vmcoreinfo device on the QEMU side
- FwCfg driver on the guest side
vmcoreinfo
If QEMU is run from the command line, add -device vmcoreinfo to the argument list. With libvirt, add <vmcoreinfo state='on'/> to the XML config inside the <features></features> element.
virtio-win
Install the FwCfg driver from virtio-win in the guest Windows. Note: there is also an older driver that doesn’t support 32-bit platforms but does support deprecated Windows kernel versions starting from 6.1 (Windows 7 and Windows Server 2008 R2).
Capturing crash or live dump
When the system is still alive, or when a BSoD has occurred and the system has not rebooted (this can be achieved by using the PVPanic device or by unchecking the “Automatically restart” option), the following command can be executed in the QEMU monitor:
dump-guest-memory -w memory.dmp
The Complete Memory Dump will be written to the memory.dmp file, which can then be opened in WinDbg like any other dump.
Typical issues
These are the messages typically reported by dump-guest-memory -w:
- win-dump: invalid vmcoreinfo note size: most likely, the guest driver is not running yet (the Windows kernel is at too early a stage of initialization) or is not present at all, so the dump can’t be captured
- win-dump: number of QEMU CPUs is bigger than NumberProcessors (%u) in guest Windows: most likely, a desktop version of Windows is running and limits the number of CPUs, but the dump can still be captured
Internals
This section is dedicated to internals of the method discussed above.
Anatomy of the Complete Memory Dump
The structure of the Complete Memory Dump consists of several large parts:
- A one-page (4 KiB) header on a 32-bit system and 2 pages (8 KiB) on a 64-bit system.
- Snapshots (so-called “runs”) of contiguous regions of physical memory (Run #0 - Run #N).
Simplified scheme of the Complete Memory Dump
The number of runs, their start addresses, and their lengths are stored in the header. The dumpchk.exe utility, which comes with the WinDbg debugger, can show information about the physical memory stored in the dump file. Usually there is more than one region because of PCI holes: some of the physical address space is used for communication with peripherals and is not suitable for data storage. Therefore, not all of the physical address space is stored in the dump file. On a 64-bit platform, each physical memory region is described in the dump header by a _PHYSICAL_MEMORY_RUN64 structure, which stores the start and the length of the region:
struct _PHYSICAL_MEMORY_RUN64 {
    ULONG64 BasePage;   /* number of the first page of the region */
    ULONG64 PageCount;  /* length of the region in pages */
};
The list of physical memory regions is displayed by the dumpchk.exe utility.
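To make the run layout concrete, here is a minimal sketch in C that converts a run into a byte range of guest physical memory. It assumes 4 KiB pages. The struct name PhysicalMemoryRun64 (a fixed-width mirror of _PHYSICAL_MEMORY_RUN64) and the helpers run_start and run_size are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12 /* assumption: 4 KiB pages */

/* Fixed-width mirror of _PHYSICAL_MEMORY_RUN64 (illustrative name). */
struct PhysicalMemoryRun64 {
    uint64_t BasePage;  /* number of the first page of the run */
    uint64_t PageCount; /* number of pages in the run */
};

/* Physical address of the first byte covered by the run. */
uint64_t run_start(const struct PhysicalMemoryRun64 *run)
{
    return run->BasePage << PAGE_SHIFT_4K;
}

/* Size of the run in bytes. */
uint64_t run_size(const struct PhysicalMemoryRun64 *run)
{
    return run->PageCount << PAGE_SHIFT_4K;
}
```

For example, a run with BasePage 0x100 and PageCount 0x9f would cover 0x9f000 bytes starting at physical address 0x100000.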
In addition to information about physical memory regions, the dump header contains other fields required for debugger operation, including:
- BugcheckData - error code and 4 parameters that describe the reason for the crash
- RequiredDumpSpace - total dump size in bytes
- DirectoryTableBase - the physical address of the root of the virtual-to-physical address translation, used by the debugger
- PsLoadedModuleList - the virtual address of the list of loaded executable modules
- PfnDatabase - the virtual address of the page frame number database
- MinorVersion, MajorVersion - two fields that together determine the version of the Windows kernel
- KdDebuggerDataBlock - the virtual address of a Windows kernel structure that stores information required by the debugger (see below)
KdDebuggerDataBlock
KdDebuggerDataBlock contains the addresses of other important kernel data structures and offsets within them, for example:
- KernBase - the virtual address of the Windows kernel image loaded into memory (ntoskrnl.exe), which WinDbg requires to download PDB symbols
- KiProcessorBlock - a pointer to an array of pointers to PRCBs (processor control blocks), where Windows stores the data on each processor in use
- OffsetPrcbContext - the offset of the structure inside the PRCB where Windows saves the register context on a crash
- OwnerTag - the KdDebuggerDataBlock signature, the ASCII characters "KDBG"
The following fields contained in KdDebuggerDataBlock have their analogs in the dump header:
- KiBugcheckData
- MmPfnDatabase
- PsLoadedModuleList
The layout of KdDebuggerDataBlock is described by the _KDDEBUGGER_DATA64 structure in wdbgexts.h, which comes with WinDbg. In new versions of Windows, new fields are only added to the end of this structure and existing fields remain at the same offsets, so these offsets can be relied on regardless of the version of Windows.
It is easy to see that header fields such as RequiredDumpSpace can be calculated and filled in from data available to QEMU, but fields like KdDebuggerDataBlock and PsLoadedModuleList are known only to the guest kernel.
Unfortunately, practice shows that in modern versions of Windows KdDebuggerDataBlock may be encrypted by the system at boot time, remain inaccessible during normal operation, and be decrypted only during a crash. So we need a way to decrypt it, but let’s come back to this problem a bit later.
KeInitializeCrashDumpHeader
KeInitializeCrashDumpHeader is a partially documented function that a driver calls to get the dump header. According to the documentation, the function returns a header that remains correct for the lifetime of the system, with the following limitations:
- If the amount of physical memory is changed, the header has to be retrieved again.
- The header received in this way does not contain data about the exception that occurred (BugcheckData).
Also, according to the documentation, starting with Windows 8 the DirectoryTableBase address in the resulting header always corresponds to the system context, whereas in earlier versions it corresponds to the context of the current process. That can be a user context, which may then prevent the debugger from accessing system structures through their virtual addresses.
Furthermore, analysis of a number of such dumps revealed some more inconsistencies:
- RequiredDumpSpace is not filled in
- The Context structure is empty
- The PfnDatabase field has a wrong value
The RequiredDumpSpace value must be filled in properly, otherwise WinDbg will not work properly. Filling in the Context structure in the header does not affect the result, but the similar structures in guest memory do: WinDbg requires them to display register values and the call stack. Some header values are duplicated in KdDebuggerDataBlock, so it can be used to restore them. The ways of restoring the header fields are gathered in the following table:
Header field | How to fix |
---|---|
BugcheckData | Take the structure pointed to by KiBugcheckData from KdDebuggerDataBlock in the case of a BSoD, or set BugcheckCode to 0x161 in the case of a live dump |
PfnDatabase | Take MmPfnDatabase from KdDebuggerDataBlock |
RequiredDumpSpace | Calculate from physical memory info |
Context | No fix needed |
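The "calculate from physical memory info" step for RequiredDumpSpace can be sketched as follows. This is only an illustration under stated assumptions: pages are 4 KiB, the struct PhysicalMemoryRun64 and the function required_dump_space are hypothetical names, and whether the header pages are counted is left to the caller through the header_size argument (pass 0 to get the plain sum of the region sizes).

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u /* assumption: 4 KiB pages */

/* Fixed-width mirror of _PHYSICAL_MEMORY_RUN64 (illustrative name). */
struct PhysicalMemoryRun64 {
    uint64_t BasePage;
    uint64_t PageCount;
};

/* Sum the byte sizes of all runs, starting from the given header size. */
uint64_t required_dump_space(uint64_t header_size,
                             const struct PhysicalMemoryRun64 *runs,
                             size_t nruns)
{
    uint64_t total = header_size;

    for (size_t i = 0; i < nruns; i++) {
        total += runs[i].PageCount * PAGE_SIZE_4K;
    }
    return total;
}
```

With two runs of 2 and 3 pages and an 8 KiB header (the 64-bit header size mentioned earlier), the function returns 8192 + 5 * 4096 bytes.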
fw_cfg and vmcoreinfo
After the header is received, it must be passed to the host. The vmcoreinfo device, an add-on to the fw_cfg device, is used to do this.
fw_cfg is a virtual device provided by QEMU that helps guest software communicate with the host. From the guest’s point of view, it is a device with I/O ports 0x510-0x51B. It provides access to an array of entries, each of which is simply a block of arbitrary data with an associated string key.
vmcoreinfo is a virtual device from QEMU accessed via a fw_cfg entry named "etc/vmcoreinfo". The device was originally designed to transfer some Linux kernel data when creating guest Linux dumps.
To read from any fw_cfg entry, the driver does the following:
- Sends the number of the desired entry to port 0x510
- Reads the data from port 0x511 byte by byte
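The two read steps can be sketched in C as below. Real port access depends on the environment (a kernel driver would use its platform's port I/O primitives), so this sketch hides it behind a hypothetical port_io structure and exercises the logic against a small in-memory fake device. All names here (port_io, fw_cfg_read, fake_fw_cfg and its callbacks) are assumptions made for illustration, not the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

#define FW_CFG_PORT_SEL  0x510 /* entry selector, written as a 16-bit value */
#define FW_CFG_PORT_DATA 0x511 /* data register, read one byte at a time */

/* Hypothetical port I/O hooks standing in for real port access. */
struct port_io {
    void (*out16)(void *ctx, uint16_t port, uint16_t value);
    uint8_t (*in8)(void *ctx, uint16_t port);
    void *ctx;
};

/* Select an entry by number, then read its contents byte by byte. */
void fw_cfg_read(const struct port_io *io, uint16_t entry,
                 void *buf, size_t len)
{
    uint8_t *p = buf;

    io->out16(io->ctx, FW_CFG_PORT_SEL, entry);
    for (size_t i = 0; i < len; i++) {
        p[i] = io->in8(io->ctx, FW_CFG_PORT_DATA);
    }
}

/* In-memory fake device serving a single entry; for demonstration only. */
struct fake_fw_cfg {
    uint16_t entry_no;   /* the only entry this fake knows about */
    const uint8_t *data; /* contents of that entry */
    size_t len;
    uint16_t selected;   /* currently selected entry */
    size_t offset;       /* read position inside the selected entry */
};

void fake_out16(void *ctx, uint16_t port, uint16_t value)
{
    struct fake_fw_cfg *d = ctx;

    if (port == FW_CFG_PORT_SEL) {
        d->selected = value; /* selecting an entry rewinds the read offset */
        d->offset = 0;
    }
}

uint8_t fake_in8(void *ctx, uint16_t port)
{
    struct fake_fw_cfg *d = ctx;

    if (port != FW_CFG_PORT_DATA || d->selected != d->entry_no ||
        d->offset >= d->len) {
        return 0; /* unknown entry or read past the end */
    }
    return d->data[d->offset++];
}
```

Keeping the port access behind callbacks is a design choice made only so that the sketch can run outside a guest kernel.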
To write data to vmcoreinfo, we first need to find the entry by its name ("etc/vmcoreinfo") and check that data transfer from the guest to the host is possible:
- Read 4 bytes from entry number 0x19 to get the total number of entries
- Read entries until the desired name is found
- Read 8 bytes from 0x514-0x51B and check them against "QEMU CFG"; a match means that fw_cfg supports the DMA-like writing interface
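The name lookup can be sketched as a parser over the contents of entry 0x19 once they have been read into a buffer. The record layout used here (a big-endian 32-bit count, followed by 64-byte records holding a big-endian 32-bit size, a big-endian 16-bit entry number, 2 reserved bytes and a 56-byte name) follows the fw_cfg file directory format as I understand it from the QEMU specification; treat it as an assumption to verify. The function name fw_cfg_find_file is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FW_CFG_FILE_DIR    0x19 /* entry number of the file directory */
#define FW_CFG_REC_SIZE    64   /* assumed size of one directory record */
#define FW_CFG_NAME_OFFSET 8    /* assumed offset of the name in a record */
#define FW_CFG_NAME_LEN    56   /* assumed maximum name length */

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Search a directory blob for `name`. Returns the entry number, or -1 if
 * the name is not listed or the blob is truncated. When size_out is not
 * NULL, it receives the size of the found entry.
 */
int fw_cfg_find_file(const uint8_t *dir, size_t dir_len,
                     const char *name, uint32_t *size_out)
{
    if (dir_len < 4) {
        return -1;
    }

    uint32_t count = be32(dir);
    size_t off = 4;

    for (uint32_t i = 0; i < count; i++, off += FW_CFG_REC_SIZE) {
        if (dir_len - off < FW_CFG_REC_SIZE) {
            return -1; /* record would run past the end of the blob */
        }
        const uint8_t *rec = dir + off;
        if (strncmp((const char *)rec + FW_CFG_NAME_OFFSET, name,
                    FW_CFG_NAME_LEN) == 0) {
            if (size_out) {
                *size_out = be32(rec);
            }
            return be16(rec + 4);
        }
    }
    return -1;
}
```

The returned number is what would then be sent to the selector port to address "etc/vmcoreinfo".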
In the case of vmcoreinfo, a packed structure of the following type is passed:
struct FWCfgVMCoreInfo {
    uint16_t host_format;  /* formats host supports */
    uint16_t guest_format; /* format guest supplies */
    uint32_t size;         /* size of vmcoreinfo region */
    uint64_t paddr;        /* physical address of vmcoreinfo region */
} QEMU_PACKED;
Data transfer to the host is done through a DMA-like interface:
- Write to 0x514-0x51B the physical address of an FWCfgDmaAccess structure containing the physical address of the data to be written, its size, the entry number and the control bits
- Check the control bits: if they are all equal to 0, the data has been successfully transferred
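As an illustration of the descriptor used in the steps above, the sketch below fills an FWCfgDmaAccess-like structure for a write and checks the result. The field order (control, length, address), the big-endian encoding, the placement of the entry number in the upper 16 bits of control, and the bit values for "select" and "write" are taken from my reading of the QEMU fw_cfg specification and should be treated as assumptions. The names fw_cfg_dma_access, fw_cfg_dma_prepare_write and fw_cfg_dma_succeeded are hypothetical.

```c
#include <stdint.h>

#define FW_CFG_DMA_CTL_SELECT 0x08u /* assumed: select the entry in bits 31..16 */
#define FW_CFG_DMA_CTL_WRITE  0x10u /* assumed: transfer from guest to host */

/* Descriptor as laid out in guest memory; fields are kept as raw bytes
 * because they are assumed to be big-endian regardless of the guest CPU. */
struct fw_cfg_dma_access {
    uint8_t control[4];
    uint8_t length[4];
    uint8_t address[8];
};

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/* Describe a write of `len` bytes located at `data_paddr` into `entry`. */
void fw_cfg_dma_prepare_write(struct fw_cfg_dma_access *d, uint16_t entry,
                              uint64_t data_paddr, uint32_t len)
{
    uint32_t control = ((uint32_t)entry << 16) |
                       FW_CFG_DMA_CTL_SELECT | FW_CFG_DMA_CTL_WRITE;

    put_be32(d->control, control);
    put_be32(d->length, len);
    put_be64(d->address, data_paddr);
}

/* The transfer succeeded once the device has cleared all control bits. */
int fw_cfg_dma_succeeded(const struct fw_cfg_dma_access *d)
{
    return (d->control[0] | d->control[1] |
            d->control[2] | d->control[3]) == 0;
}
```

A driver would place such a descriptor in physically addressable memory, write its physical address to the 0x514-0x51B register, and then test the control field.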
After this, QEMU has access to the data at the paddr address. QEMU interprets the data as an ELF Note section, which consists of a name (in this case "VMCOREINFO") and content, here the dump header. If the structure of the section is correct, it becomes available to the QEMU dumping subsystem.
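The name-plus-content layout can be illustrated with a small parser that locates the content of such a note. It assumes the generic ELF note layout (three little-endian 32-bit fields for name size, content size and type, followed by the name and the content, each padded to a 4-byte boundary) and a name size that counts the terminating NUL. The function vmcoreinfo_note_desc is a hypothetical helper, not QEMU's own validation code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_HDR_SIZE 12 /* namesz + descsz + type, 4 bytes each */

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static size_t align4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Return a pointer to the content of a note named "VMCOREINFO" and store
 * its size in *desc_len, or return NULL if the note is malformed.
 */
const uint8_t *vmcoreinfo_note_desc(const uint8_t *note, size_t note_len,
                                    uint32_t *desc_len)
{
    static const char name[] = "VMCOREINFO"; /* 11 bytes including NUL */

    if (note_len < NOTE_HDR_SIZE) {
        return NULL;
    }

    uint32_t namesz = le32(note);
    uint32_t descsz = le32(note + 4);

    if (namesz != sizeof(name)) {
        return NULL;
    }

    size_t desc_off = NOTE_HDR_SIZE + align4(namesz); /* 12 + 12 = 24 */

    if (note_len < desc_off || note_len - desc_off < descsz) {
        return NULL; /* content would run past the end of the buffer */
    }
    if (memcmp(note + NOTE_HDR_SIZE, name, sizeof(name)) != 0) {
        return NULL;
    }

    *desc_len = descsz;
    return note + desc_off;
}
```

Under these assumptions the content, and therefore the dump header, starts 24 bytes into the note.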
All the driver logic described above is implemented here.
Register context
When analyzing a dump, register values are important in and of themselves; moreover, they are necessary for the correct reconstruction of the call stack.
The wdm.h header file, which is delivered as part of the WDK, contains the definition of a structure called CONTEXT. Comparing this definition with the register values from QEMU and with the Context field of a dump saved by Windows shows that the Context field contains an instance of this structure. But since modern systems have more than one processor, the register contexts of all processors can’t fit into the dump header (there is only space for one CONTEXT instance).
The address space of the Windows kernel contains per-CPU PRCB structures with ContextFrame fields. Each of these fields stores the address of the context frame of the corresponding CPU. If the context frames are filled with zeroes, WinDbg cannot recover the context. In addition, the context structure contains a flags field, one bit of which indicates that the context is 64-bit, so when the structure is zeroed, the debugger displays a message that only the 32-bit context is available. Once the context structures are filled in, these messages no longer appear. So WinDbg definitely takes the contents of the registers from these structures.
Thus, in order for WinDbg to retrieve the actual register values, they must be stored in the corresponding structures inside the memory dump. This procedure is implemented in the patch_and_save_context function in dump/win_dump.c.
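The pointer chase that leads to a per-CPU context frame can be sketched as follows. This is not the QEMU code: guest virtual memory is replaced by a hypothetical flat buffer (struct guest_mem), pointers are assumed to be 64-bit little-endian, and the values of KiProcessorBlock and OffsetPrcbContext are passed in as parameters, as they would be after reading them from KdDebuggerDataBlock. The names guest_read_u64 and find_context_va are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for guest virtual memory: one flat mapped region. */
struct guest_mem {
    uint64_t base;        /* virtual address of bytes[0] */
    const uint8_t *bytes;
    size_t size;
};

/* Read a little-endian 64-bit value at virtual address `va`. */
int guest_read_u64(const struct guest_mem *m, uint64_t va, uint64_t *out)
{
    if (m->size < 8 || va < m->base || va - m->base > m->size - 8) {
        return -1; /* address is outside the mapped region */
    }

    const uint8_t *p = m->bytes + (va - m->base);
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    *out = v;
    return 0;
}

/*
 * Follow KiProcessorBlock[cpu] to the PRCB, then read the context frame
 * address stored at OffsetPrcbContext inside that PRCB.
 */
int find_context_va(const struct guest_mem *m, uint64_t ki_processor_block,
                    uint16_t offset_prcb_context, unsigned cpu,
                    uint64_t *context_va)
{
    uint64_t prcb;

    if (guest_read_u64(m, ki_processor_block + (uint64_t)cpu * 8,
                       &prcb) != 0) {
        return -1;
    }
    return guest_read_u64(m, prcb + offset_prcb_context, context_va);
}
```

The resulting address is where a filled-in CONTEXT for that CPU would have to be written so that the debugger can find it.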
KiBugcheckData
BugcheckData is automatically saved at BSoD, and this data can simply be copied to the header. But it turns out that when creating a live system dump, it is not enough to write BugcheckData to the header. In order for the debugger to use this data, it must also be saved in KdDebuggerDataBlock->KiBugcheckData. This logic is implemented in the patch_bugcheck_data routine in dump/win_dump.c.
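A minimal sketch of this decision is shown below. It is not the QEMU routine: the struct bugcheck_data layout (one code plus four parameters, all 64-bit) and the function resolve_bugcheck are hypothetical, and the sketch assumes that a zero code in the guest's KiBugcheckData means that no BSoD has happened, so the dump is a live one.

```c
#include <stdint.h>
#include <string.h>

#define LIVE_SYSTEM_DUMP 0x161

/* Illustrative layout: error code followed by 4 parameters. */
struct bugcheck_data {
    uint64_t code;
    uint64_t params[4];
};

/*
 * `guest_copy` holds what was read from KdDebuggerDataBlock->KiBugcheckData.
 * On return, `header` holds the values for the dump header. The return
 * value is 1 when guest_copy was modified and must be written back to
 * guest memory (live dump), and 0 when it was left untouched (BSoD).
 */
int resolve_bugcheck(struct bugcheck_data *guest_copy,
                     struct bugcheck_data *header)
{
    /* Assumption: the code stays zero until a bugcheck actually happens. */
    int live = (guest_copy->code == 0);

    if (live) {
        memset(guest_copy, 0, sizeof(*guest_copy));
        guest_copy->code = LIVE_SYSTEM_DUMP;
    }
    *header = *guest_copy;
    return live;
}
```

Returning a flag instead of writing to guest memory directly keeps the sketch independent of how guest memory is accessed.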
Small Memory Dump
Along with the Complete Memory Dump, there is the Small Memory Dump. For example, Windows creates a dump in this format after a BSoD with default settings. Such a dump contains the following data:
- BugcheckData structure
- EPROCESS structure with information about the faulty process
- ETHREAD structure with information about the faulty thread
- PRCB structure for the CPU on which the error occurred
- List of loaded modules
- KdDebuggerDataBlock structure
Unlike in the Complete Memory Dump, KdDebuggerDataBlock is stored in the Small Memory Dump at the offset written in its header, so address translation is not required. Thus, if we have a small dump, we can use it to create a full memory dump.
KeCapturePersistentThreadState
KeCapturePersistentThreadState is an undocumented Windows kernel function. The driver can use it to retrieve a Small Memory Dump. KdDebuggerDataBlock inside such a dump will be decrypted. The guest driver can therefore pass to the hypervisor not only the address of the original KdDebuggerDataBlock (which can be encrypted), but also the address of its decrypted version, obtained with KeCapturePersistentThreadState and stored in the driver’s memory.
In order for the hypervisor to take advantage of this feature, it must check the signature of the KdDebuggerDataBlock whose address lies in its usual place in the header. If the signature does not match, it must use the KdDebuggerDataBlock whose address is passed by the driver through one of the unused header fields, such as BugcheckParameter1, since this field has a null value anyway and must be filled in by the hypervisor. This is what QEMU does in check_kdbg in dump/win_dump.c.
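The signature check with fallback can be sketched as follows. This is an illustration rather than the body of check_kdbg: the two candidate blocks are assumed to have already been read from guest memory into buffers, the OwnerTag offset of 0x10 is an assumption based on the 64-bit layout in wdbgexts.h (the tag follows a 16-byte list entry), and the names kdbg_signature_ok and choose_kdbg are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KDBG_OWNER_TAG_OFFSET 0x10 /* assumed offset of OwnerTag (64-bit) */

/* Check whether a block read from guest memory carries the "KDBG" tag. */
int kdbg_signature_ok(const uint8_t *block, size_t len)
{
    return len >= KDBG_OWNER_TAG_OFFSET + 4 &&
           memcmp(block + KDBG_OWNER_TAG_OFFSET, "KDBG", 4) == 0;
}

/*
 * Prefer the block at the address found in the header; fall back to the
 * address supplied by the driver (for example via BugcheckParameter1).
 * Returns the chosen guest address, or 0 if neither block is valid.
 */
uint64_t choose_kdbg(uint64_t hdr_addr, const uint8_t *hdr_block,
                     size_t hdr_len, uint64_t drv_addr,
                     const uint8_t *drv_block, size_t drv_len)
{
    if (kdbg_signature_ok(hdr_block, hdr_len)) {
        return hdr_addr;
    }
    if (kdbg_signature_ok(drv_block, drv_len)) {
        return drv_addr;
    }
    return 0;
}
```

An encrypted block fails the tag comparison, which is what makes the signature a usable indicator of which copy to trust.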
Connection between dump elements at the stage of their loading from the guest memory
Overall host-side algorithm
- Pause the VM
- Synchronize state with KVM
- Take the header from the vmcoreinfo device
- Calculate the RequiredDumpSpace field value as the sum of the sizes of the contiguous physical memory regions described in the header
- Use the DirectoryTableBase field value from the header as the CR3 register value when further accessing the guest OS virtual address space from QEMU
- Take the KdDebuggerDataBlock structure address from the header
- Substitute the PfnDatabase field value with the KdDebuggerDataBlock->MmPfnDatabase value
- Fill the BugcheckData structure (error code and parameters)
  - In the case of a BSoD, take the content from KdDebuggerDataBlock->KiBugcheckData
  - In the case of a live system dump, write the 0x161 (LIVE_SYSTEM_DUMP) code and zero error parameters to KdDebuggerDataBlock->KiBugcheckData
- Write the register context to KdDebuggerDataBlock->KiProcessorBlock[i]->ContextFrame for each i in the range of CPU numbers, based on the registers from QEMU
- Write the header to the file
- Write the regions of guest physical memory to the file
- Unpause the VM
Conclusion
We have discussed the usage and internals of the dump-guest-memory -w command, a useful tool for anyone looking to debug Windows guests in a QEMU/KVM environment. The next post will be devoted to creating a dump with literally no action on the guest side.