KVM Address Space

Why x86 KVM has 2 address spaces?

// include/linux/kvm_host.h
#define KVM_ADDRESS_SPACE_NUM	1
// arch/x86/include/asm/kvm_host.h
# define KVM_ADDRESS_SPACE_NUM 2

How to resolve conflicts when applying patches

git am --reject --whitespace=fix

这样会生成一个 reg 文件。

Userspace and memory page

用户态程序好像并不需要有分页的概念,用户态程序的视角里地址空间是连续的,并且粒度是字节而不是页。

Get self's thread id (SPID, TID)

pid_t tid = gettid();
info_report("tid: %d", tid);

How to see a process's thread

ps -T -p <pid>

Query enabled capabilities in HMP during live migration (QEMU)

In QMP:

info migrate_capabilities

Exit reason / vcpu->run->exit_reason / exit type (KVM)

有两种 Exit reasons:

  • 第一种是从 non-root mode exit 到 root mode (KVM) 的 exit reason(SDM-defined, APPENDIX C VMX BASIC EXIT REASONS)。
  • 第二种是从 KVM exit 到 Userspace (QEMU) 的 exit reason。

第一种 reason 的定义都是以 EXIT_REASON_* 开头的。第二种 reason 的定义都是以 KVM_EXIT_* 开头的,且存放在 vcpu->run->exit_reason 里面返回给 QEMU。

See thread's CPU utilization

top -H

Current folder size / 查看当前文件夹大小

du -sh

查看当前文件下各个文件的大小:

du -h –max-depth=1 *

Bulk stage QEMU

From ChatGPT:

QEMU live migration consists of several stages, one of which is the "bulk stage." In the context of QEMU and live migration, the bulk stage is when the majority of the virtual machine's memory and state data is transferred from the source host to the destination host. This is a critical phase in live migration, as it involves copying the VM's active memory pages and ensuring that the destination host has an up-to-date replica of the VM's state.

During the bulk stage, QEMU tries to minimize downtime and ensure that the VM remains operational on the source host while transferring as much data as possible to the destination host. Once the bulk stage is complete and the destination host has caught up with the source host's state, the migration enters the "incremental stage," where any remaining changes are synchronized before the final cutover to the destination host.

The bulk stage is a crucial step in ensuring a smooth and seamless live migration of virtual machines between hosts in virtualized environments.

QEMU debug thread "debug-threads"

This causes the naming of individual QEMU threads to be helpful; e.g. "CPU/KVM 0" or "migration". This allows libvirt to identify the purpose of each individual QEMU thread (vCPU number, iothread, etc.).

QEMU trace event / tracing

A file trace-events must exist in the directory that contains the source code file you are going to trace. 比如:

bsd-user/trace-events
migration/trace-events
backends/trace-events
hw/tpm/trace-events
hw/pci/trace-events
hw/ssi/trace-events
hw/xen/trace-events
//...
io/trace-events
ui/trace-events
trace-events

这个文件长这个样子:

# See docs/devel/tracing.rst for syntax documentation.

# savevm.c
qemu_loadvm_state_section(unsigned int section_type) "%d"
qemu_loadvm_state_section_command(int ret) "%d"
# ...
loadvm_handle_cmd_packaged(unsigned int length) "%u"

如果要使用,在启动 QEMU cmdline 的时候需要指定 trace point:

--trace "kvm_*" --trace "virtio_*"

QEMU trace backend

Before answering what is a "trace-backend" , let me ask you a question: Where do you want the traces be printed? Stdout, a files or anywhere? trace-backend answers this question, it specifies where the trace go.

./configure --enable-trace-backends=simple,dtrace
[Tracing in QEMU. Do you know how to print information in… by Michael Zhao Medium](https://michael2012z.medium.com/tracing-in-qemu-8df4e4beaf1b)

To see all the backends:

Tracing — QEMU documentation

"simple" Trace backend

生成一个二进制的 log 文件。如果要分析这个二进制文件,需要借助第三方的工具:

./scripts/simpletrace.py build/trace/trace-events-all trace-12345

可以在 QEMU HMP monitor 里打开/关闭/flush/ trace file 或者设置 trace file 的名字。

trace-file on|off|flush|set <path>

After exiting the virtual machine, a new file was generated in my working directory: trace-<pid>.

Transparent Huge Page (THP)

Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages. 一定要明白的是 THP 是一个软件上的概念。

为什么叫做 Transparent?

Huge pages can be difficult to manage manually, and often require significant changes to code in order to be used effectively. As such, Red Hat Enterprise Linux 6 also implemented the use of transparent huge pages (THP). THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages.

THP hides much of the complexity in using huge pages from system administrators and developers.

THP Are named as such because the allocation and management of these large pages are transparent to the applications running on the system.

  • Application unaware: Applications don't need to explicitly request or be aware of THP allocation. The operating system kernel handles the process of identifying suitable memory regions and converting them into THPs.
  • Memory management abstraction: THP provides a layer of abstraction between the application's virtual memory addresses and the physical memory layout. Applications continue to use their regular virtual memory addresses, while the kernel manages the mapping to the larger THP physical pages in the background.(也就是说 VA 都是连续的,但是 back VA 的那些才是 huge page)。
  • No code changes required: Applications don't need code modifications to benefit from THP. The kernel automatically manages the mapping and translation between virtual and physical addresses.

THP 需要硬件对于大页的支持吗?

不需要。如果开始用了大页,那么很明显,一个 non-4K 级的 PSE 可能是 leaf(一个大页),也可能是 non-leaf(下面管着很多小页), 那怎么区分一个页表项是否为 leaf 呢?

THP and hardware page size

一定要确认的是 THP 是一个软件上的概念。

While THP itself is a software concept, its effectiveness can be influenced by hardware page sizes. Ideally, the THP size should align with the hardware's supported large page size for optimal performance. This alignment ensures efficient address translation using hardware mechanisms designed for handling large pages.

硬件页大小(hardware page size)指的可能就是比如说 PTE 的设计将 offset 设计为了几位。在 x86 中我们将 offset 设计为了 12 bit,这就有了 4K 的页。 一个支持 2M 页的 hardware 同时也支持 4K 页。这种支持是通过

THP and memory folios

THP (Transparent Huge Pages):

  • Focuses on reducing memory overhead by using larger page sizes for memory allocations.
  • By using larger pages (e.g., 2MB or 1GB instead of the usual 4KB pages), THP reduces the number of page table entries required to manage memory, leading to potentially better performance and lower memory management overhead.

Memory Folios:

  • Focuses on optimizing memory management for guest virtual machines (VMs) running on a Linux host.
  • Introduced in newer Linux versions, memory folios are essentially contiguous regions of physical memory allocated to a VM.
  • They improve efficiency by reducing page table fragmentation within the guest VM and simplifying the translation between guest virtual addresses and host physical addresses.

HugeTLB

需要硬件支持多个不同的页大小才行。For example, x86 CPUs normally support 4K and 2M (1G if architecturally supported) page sizes.

First the Linux kernel needs to be built with the CONFIG_HUGETLBFS (present under “File systems”) and CONFIG_HUGETLB_PAGE (selected automatically when CONFIG_HUGETLBFS is selected) configuration options.

The 'hugetlb" term is also (and mostly) used synonymously with a HugePage, 所以说其实 HugeTLB 就是标准大页,后面 RH 为了增加大页分配的灵活性,引入了 THP,更加灵活也更加复杂。

Hugetlbfs

Applications can allocate memory from hugetlbfs and benefit from potential performance improvements.

主要是把 huge page 这个 feature 暴露给 application 来用。application 可以在这个 fs 下面创建文件,其实是 backed by memory 里的 huge page 的。

The method to use hugetlbfs basically boils down to:

  • use mmap with MAP_HUGETLB flag;
  • or, map a file from the mounted hugetlb filesystem, if it exists.

复合页, HugeTLB, THP 三者区别 / 大页

此三者基本都有一个特点,就是都属于多 page 组合而成,所以他们都能称之为复合页。

Linux 下的大页分为两种类型:标准大页(Huge Pages)和透明大页(Transparent Huge Pages)。Huge Pages 有时候也翻译成大页/标准大页/传统大页。

THP 是 RHEL 6 开始引入的一个功能。

这两者的区别在于大页的分配机制,标准大页管理是预分配的方式,而透明大页管理则是动态分配的方式。

Redhat 的文档,谈了 Huge page 以及 Transparent huge page:5.2. Huge Pages and Transparent Huge Pages Red Hat Enterprise Linux 6 | Red Hat Customer Portal

传统大页很难手动管理,而且通常需要对代码进行重大更改才能有效地使用。因此,红帽实现引入了透明大页面 (THP)。THP 是一个抽象层,可以自动创建、管理和使用传统大页的大多数方面。

The HugeTLB term is also (and mostly) used synonymously with a HugePage.

THP is a much more complex mechanism than HugeTLB. 更复杂更灵活。

hugetlb: This is an entry in the TLB that points to a HugePage (a large/big page larger than regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The 'hugetlb" term is also (and mostly) used synonymously with a HugePage (See Note 261889.1). In this document the term "HugePage" is going to be used but keep in mind that mostly "hugetlb" refers to the same concept.

Linux传统Huge Pages与Transparent Huge Pages再次学习总结 - 潇湘隐者 - 博客园