Misc

QEMU release planning

Planning - QEMU

QEMU thread model / type

QEMU has several different types of threads:

  • vCPU threads that execute guest code and perform device emulation synchronously with respect to the vCPU.
  • The main loop that runs the event loops (yes, there is more than one!) used by many QEMU components.
  • IOThreads that run event loops for device emulation concurrently with vCPUs and "out-of-band" QMP monitor commands.

Note that the main loop thread and IOThreads are different; see the article below for an explanation.

Stefan Hajnoczi: QEMU Internals: Event loops

Singly-linked list in QEMU

It is implemented in file include/qemu/queue.h.

A singly-linked list is headed by a single forward pointer. The elements are singly linked for minimum space and pointer manipulation overhead, at the expense of O(n) removal of arbitrary elements. O(n) for such an operation on a linked list is expected.

New elements can be added to the list after an existing element or at the head of the list. Elements being removed from the head of the list should use the explicit macro for this purpose for optimum efficiency.

A singly-linked list may only be traversed in the forward direction.

Singly-linked lists are ideal for applications with large datasets and few or no removals, or for implementing a LIFO queue (last-in-first-out, i.e., a stack).
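
As a rough illustration (a sketch that assumes it is compiled inside the QEMU tree, where qemu/osdep.h and qemu/queue.h are available), the singly-linked list is used through the QSLIST_* macros:

#include "qemu/osdep.h"
#include "qemu/queue.h"

typedef struct Item {
    int value;
    QSLIST_ENTRY(Item) next;   /* embedded forward pointer */
} Item;

/* The list head holds a single forward pointer. */
static QSLIST_HEAD(, Item) items = QSLIST_HEAD_INITIALIZER(items);

static void demo(void)
{
    Item *a = g_new0(Item, 1);

    QSLIST_INSERT_HEAD(&items, a, next);   /* O(1) push, LIFO order */

    Item *it;
    QSLIST_FOREACH(it, &items, next) {     /* forward traversal only */
        /* use it->value */
    }

    QSLIST_REMOVE_HEAD(&items, next);      /* O(1) pop from the head */
}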

HMP, QMP and QAPI

The QEMU Machine Protocol (QMP) is a JSON-based protocol which allows applications to control a QEMU instance.

The HMP (Human Monitor Interface) is the simple interactive monitor of QEMU, designed primarily for debugging and simple human use. Higher-level tools should connect to QMP instead, which offers a stable JSON-based interface that is easy to parse reliably.
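
For example, a minimal QMP session over the monitor socket looks roughly like this ("->" is client to QEMU, "<-" is QEMU to client; the greeting fields vary by version and are abridged here):

<- {"QMP": {"version": {...}, "capabilities": []}}
-> {"execute": "qmp_capabilities"}
<- {"return": {}}
-> {"execute": "query-status"}
<- {"return": {"status": "running", "singlestep": false, "running": true}}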

QAPI: introduces a number of changes to the internal implementation of QMP to simplify maintainability and enable features such as asynchronous command completion and rich error reporting, while minimizing protocol-visible changes as much as possible. It also provides a C library interface to QMP that can be used to write thorough in-tree unit tests.

Features/QAPI - QEMU

QEMU's implementation of common data structures

QEMU has its own implementations of

  • singly-linked list;
  • (doubly-linked) list;
  • simple queue;
  • tail queue.

They are all defined in include/qemu/queue.h, and the APIs are macro-based (QSLIST_*, QLIST_*, QSIMPLEQ_*, QTAILQ_*).
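
As another sketch (again assuming the QEMU tree, with qemu/queue.h available), the tail queue is the most feature-rich of the four: it supports O(1) insertion at both ends and O(1) removal of arbitrary elements:

#include "qemu/osdep.h"
#include "qemu/queue.h"

typedef struct Req {
    int id;
    QTAILQ_ENTRY(Req) entry;   /* embedded prev/next links */
} Req;

static QTAILQ_HEAD(, Req) req_queue = QTAILQ_HEAD_INITIALIZER(req_queue);

static void demo(void)
{
    Req *r = g_new0(Req, 1);

    QTAILQ_INSERT_TAIL(&req_queue, r, entry);   /* O(1) append */

    Req *it;
    QTAILQ_FOREACH(it, &req_queue, entry) {
        /* use it->id */
    }

    QTAILQ_REMOVE(&req_queue, r, entry);        /* O(1) removal of any element */
    g_free(r);
}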

Function name with or without prefix qemu_

Wrapped versions of standard library or GLib functions use a qemu_ prefix to alert readers that they are seeing a wrapped version, for example qemu_strtol or qemu_mutex_lock. Other utility functions that are widely called from across the codebase should not have any prefix, for example pstrcpy or bit manipulation functions such as find_first_bit.

Struct naming convention

Struct names are in CamelCase, but the prefix is not consistent across the codebase: you will see QemuObject-style names (e.g., QemuMutex), QEMUObject-style names (e.g., QEMUTimer), and QObject-style names (the QObject/QDict family).

hw/ Folder?

It contains the emulation code for all supported hardware devices.

Relationship with qdev

qdev was rebuilt on top of QOM (around 2011), so the likely order is: qdev came first, then QOM was introduced, and qdev was then reimplemented on top of QOM.

Their difference can be found in

Functions

cpu_x86_cpuid

Input:

  • env (the processed CPU state derived from the CPU model; the user can specify a CPU model on the command line, and its feature values are expanded and filtered into env->features);
  • index (the CPUID leaf, i.e., the EAX input);
  • count (the CPUID subleaf, i.e., the ECX input).

Output: the CPUID register word values (EAX/EBX/ECX/EDX) exposed to the guest.
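
To make "leaf" (EAX) and "subleaf" (ECX) concrete, here is a small host-side sketch (plain user-space C, not QEMU code) that issues the raw CPUID instruction via GCC's <cpuid.h>; cpu_x86_cpuid() computes the equivalent per-leaf values for the guest, largely from env->features, instead of blindly executing CPUID:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0x0: highest supported basic leaf + vendor string. */
    __cpuid_count(0x0, 0, eax, ebx, ecx, edx);
    printf("max basic leaf: 0x%x\n", eax);

    /* Leaf 0x7, subleaf 0: structured extended feature flags. */
    __cpuid_count(0x7, 0, eax, ebx, ecx, edx);
    printf("leaf 0x7 subleaf 0: ebx=0x%08x ecx=0x%08x\n", ebx, ecx);

    return 0;
}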

Data Structures

include/qemu/typedefs.h collects typedefs for most of the core data structures in QEMU, so it is a good place to start looking.

QEMUOption

typedef struct QEMUOption {
    const char *name; // valid option name without the leading dash, e.g., "machine"
    int flags; // flags, including whether the option takes a value: e.g., "-cpu" needs a value but "-enable-kvm" does not
    int index; // position in the qemu_options array, see also build/qemu-options.def
    uint32_t arch_mask; // bitmask of target architectures the option is valid for (e.g., QEMU_ARCH_ALL)
} QEMUOption;
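
As a loose sketch of how this table is used (this is not the actual vl.c code, just an illustration of the lookup idea), option parsing walks the qemu_options array and matches the name without the leading dash:

#include <string.h>

/* Hypothetical helper, for illustration only. */
static const QEMUOption *find_option(const QEMUOption *opts, const char *name)
{
    for (const QEMUOption *p = opts; p->name != NULL; p++) {
        if (strcmp(p->name, name) == 0) {
            return p;   /* flags then tells the caller whether a value must follow */
        }
    }
    return NULL;        /* unknown option */
}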

QDict

QDict is QEMU's string-keyed dictionary type built on top of QObject; it is used widely for command-line option handling and for QMP command arguments.

CPUX86State

In target/i386/cpu.h.

VMState

Most device data can be described using the VMSTATE macros (mostly defined in include/migration/vmstate.h).
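
A minimal sketch of the usual pattern (assuming a hypothetical device state MyDevState with a single migratable counter field):

#include "qemu/osdep.h"
#include "migration/vmstate.h"

typedef struct MyDevState {
    /* ... parent object and runtime-only fields ... */
    uint32_t counter;
} MyDevState;

static const VMStateDescription vmstate_mydev = {
    .name = "mydev",
    .version_id = 1,
    .minimum_version_id = 1,
    .fields = (const VMStateField[]) {
        VMSTATE_UINT32(counter, MyDevState),   /* saved/loaded automatically */
        VMSTATE_END_OF_LIST()
    }
};

The description is then hooked up to the device, typically by pointing DeviceClass::vmsd at it (or via vmstate_register() for non-qdev users).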

X86CPUDefinition (aka. CPU Model)

target/i386/cpu.c

A global array of this struct, builtin_x86_defs[], is used to define the built-in CPU models such as SapphireRapids (SPR).
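
Each entry roughly looks like the following (field names as in target/i386/cpu.c; the values here are illustrative, not the real SapphireRapids definition):

{
    .name = "SapphireRapids",
    .level = 0x20,                        /* max basic CPUID leaf */
    .vendor = CPUID_VENDOR_INTEL,
    .family = 6,
    .model = 143,
    .stepping = 4,
    .features[FEAT_1_EDX] = /* ... feature bits ... */ 0,
    .features[FEAT_7_0_EBX] = /* ... */ 0,
    .xlevel = 0x80000008,                 /* max extended CPUID leaf */
    .model_id = "Intel Xeon Processor (SapphireRapids)",
},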

Globals

qemu_options

A global variable of type QEMUOption[], statically initialized with all the available options, such as:

  • h
  • help
  • machine
  • m
  • cpu
  • device
  • drive

Processes

How does QEMU create a KVM vcpu?

x86_cpu_realizefn
	qemu_init_vcpu
		create_vcpu_thread (kvm_start_vcpu_thread)
			kvm_vcpu_thread_fn
				kvm_init_vcpu
					kvm_arch_init_vcpu
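
Underneath this call chain, the work eventually boils down to a few KVM ioctls on /dev/kvm. A bare-bones host-side sketch (not QEMU code, error handling omitted) of that kernel interface:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int create_vcpu_fd(void)
{
    int kvm_fd  = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    int vm_fd   = ioctl(kvm_fd, KVM_CREATE_VM, 0);            /* one VM fd per guest */
    int vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0 /* id */);  /* one fd per vCPU */

    /* QEMU then mmap()s the vcpu's shared kvm_run area, and the vCPU
     * thread loops on ioctl(vcpu_fd, KVM_RUN, 0). */
    return vcpu_fd;
}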

Event Loop

QEMU has two kinds of event loops. The main loop is the foundation on which QEMU runs: initially the whole of QEMU had only this one event loop, and all events, including timers, event notifications (pipes), and backend I/O events, were collected (polled) and dispatched by it; the main loop exiting also means the QEMU process exits. Later, to reduce jitter in guest disk I/O, a dedicated IOThread was added for disk I/O, and all of those I/O events are handled in that IOThread.

qemu 的事件处理 (QEMU event handling) | Runsisi's Blog

Why using the event loop: The application can appear to do multiple things at once without multithreading because it switches between handling different event sources.

The most important event sources in QEMU are:

  • File descriptors such as sockets and character devices.
  • Event notifiers (implemented as eventfds on Linux).
  • Timers for delayed function execution (see the timer sketch after this list).
  • Bottom-halves (BHs) for invoking a function in another thread or deferring a function call to avoid reentrancy.
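
A minimal timer sketch (assuming it is compiled inside the QEMU tree; my_timer_cb and arm_my_timer are made-up names); the callback runs in the event loop that owns the clock, not in a separate thread:

#include "qemu/osdep.h"
#include "qemu/timer.h"

static QEMUTimer *my_timer;

static void my_timer_cb(void *opaque)
{
    /* Called from the event loop; re-arm 100 ms into the future. */
    timer_mod(my_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 100);
}

static void arm_my_timer(void)
{
    my_timer = timer_new_ms(QEMU_CLOCK_REALTIME, my_timer_cb, NULL);
    timer_mod(my_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 100);
}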

QEMU uses two types of event loop implementations:

  • its native AioContext event loop, and
  • glib's GMainContext.

Thread type    glib event loops    AioContext event loops
Main loop      1                   2
IOThreads      1                   1

How to combine AioContext and GMainContext?

QEMU components can use any of these event loop APIs and the main loop combines them all into a single event loop function os_host_main_loop_wait() that calls qemu_poll_ns() to wait for event sources. This makes it possible to combine glib-based code with code using the native QEMU AioContext APIs.

Can we use the event loop and coroutines together in a single QEMU thread?

If an event callback in the event loop contains a blocking sequence like this:

/* 3-step process using coroutines */
void coroutine_fn say_hello(void)
{
    const char *name;
    // wait for the user to type in a name...
    co_send("Hi, what's your name? ");
    name = co_read_line();
    co_send("Hello, %s\n", name);
}

then the whole event loop will be blocked.

Coroutines make it possible to write sequential code that is actually executed across multiple iterations of the event loop. This is useful for code that needs to perform blocking I/O and would quickly become messy if split into a chain of callback functions.

If the coroutine needs to wait for an event such as I/O completion or user input, it calls qemu_coroutine_yield().
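
A rough sketch of the lifecycle (QEMU-tree code; my_co_fn and start_my_coroutine are made-up names):

#include "qemu/osdep.h"
#include "qemu/coroutine.h"

static void coroutine_fn my_co_fn(void *opaque)
{
    /* Runs until the first yield point. */
    qemu_coroutine_yield();
    /* Execution resumes here once something re-enters the coroutine,
     * e.g. an I/O completion callback calling aio_co_wake(). */
}

static void start_my_coroutine(void)
{
    Coroutine *co = qemu_coroutine_create(my_co_fn, NULL);
    qemu_coroutine_enter(co);   /* returns when my_co_fn yields or terminates */
}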

If an event's callback is a coroutine, how do we make sure that the coroutine gets re-entered and keeps running after it yields?

ram_load_precopy
    if ((i & 32767) == 0 && qemu_in_coroutine()) {
        /* Schedule this coroutine to be re-entered (as a BH) on the next
         * event loop iteration, then give up the thread. */
        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
        qemu_coroutine_yield();
    }
    i++;

Note that we do not just yield with qemu_coroutine_yield(); before yielding we call aio_co_schedule(), which schedules this coroutine (as a BH) so that it is re-entered on the next event loop iteration, so we do not have to manually enter it again.

Stefan Hajnoczi: Coroutines in QEMU: The basics

In what ways does AioContext supersede GMainContext?

  • AioContext event sources can have a polling function that detects events without syscalls. This allows the event loop to avoid blocking syscalls that might lead the kernel scheduler to yield the thread.
  • O(1) time complexity with respect to the number of monitored file descriptors, i.e., an epoll-like mechanism replaces poll().
  • Nanosecond timers. glib's event loop only has millisecond timers, which is not sufficient for emulating hardware timers.

Stefan Hajnoczi: QEMU Internals: Event loops

main_loop_wait() QEMU

QEMU's main event loop is main_loop_wait(). This function runs only a single iteration at a time, which is why it is usually called inside a while loop.

main
    qemu_main
        qemu_default_main
            qemu_main_loop
                while (!main_loop_should_exit(&status))
                    main_loop_wait(false);
                        os_host_main_loop_wait

os_host_main_loop_wait() QEMU

static int os_host_main_loop_wait(int64_t timeout)
{
    GMainContext *context = g_main_context_default();
    // Acquire ownership of the default GMainContext so that this thread
    // can poll and dispatch the event sources attached to it.
    g_main_context_acquire(context);
    glib_pollfds_fill(&timeout);
    //...
    ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, timeout);
    //...
    glib_pollfds_poll();
    // Release the context so that other threads can use it.
    g_main_context_release(context);
}

Announce in QEMU

struct AnnounceParameters QEMU

struct AnnounceParameters {
    int64_t initial;      // initial delay (ms) before the first announce
    int64_t max;          // maximum delay (ms) between announces
    int64_t rounds;       // number of self-announcement attempts
    int64_t step;         // delay increase (ms) after each attempt
    bool has_interfaces;
    strList *interfaces;  // optional list of interfaces to announce on
    char *id;             // ID of this announce timer
};

include/net/announce.h

struct AnnounceTimer QEMU

struct AnnounceTimer {
    QEMUTimer *tm;            // the underlying timer
    AnnounceParameters params;
    QEMUClockType type;       // which QEMU clock the timer runs on
    int round;                // rounds left to go
};

net/announce.c

qemu_announce_self() QEMU

void qemu_announce_self(AnnounceTimer *timer, AnnounceParameters *params)
{
    // Arm the timer with the given parameters and the per-round callback.
    qemu_announce_timer_reset(timer, params, QEMU_CLOCK_REALTIME, qemu_announce_self_once, timer);
    if (params->rounds) {
        // Do the first announcement immediately.
        qemu_announce_self_once(timer);
    } else {
        // No rounds requested: tear the timer down.
        qemu_announce_timer_del(timer, true);
    }
}