2022-10 Monthly Archive

Misc ideas

BIOS firmware was stored in a ROM chip on the PC motherboard.

Intel AVX-512 enables twice the number of floating point operations per second (FLOPS) per clock cycle compared to its predecessor, Intel AVX2.

What is INT8

What Is int8 Quantization and Why Is It Popular for Deep Neural Networks? - MATLAB & Simulink

What is BF16 (BFloat16)?

Essentially all AI training is done with 32-bit floating point. But doing AI inference with 32-bit floating point is expensive, power-hungry and slow.

BFloat16 offers essentially the same prediction accuracy as 32-bit floating point while greatly reducing power and improving throughput with no investment of time or $.

Advantages Of BFloat16 For AI Inference

What is cache aliasing?

Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached.

VIVT, VIPT, PIPT

VIVT: may have cache aliasing problem because 2 virtual address maybe mapped into 1 physical address.

Why Intel use VIPT:

Virtual indexing means you can start reading the set from the cache before looking up the translation in the TLB.
Physical tagging allows you to avoid aliasing.

caching - Why does Intel use a VIPT cache and not VIVT or PIPT? - Stack Overflow

Cache basics

Cache line and cache block are the same, is the base unit like a cell. Note that this is not the same thing as a “row” of cache.

Cache set is a "row" in the cache, which contains multiple cache lines.

tag：感觉有点类似 hash 里的值，唯一表示这些数据，这样即使两个值 hash 是一样的，还是可以通过值来辨认它们。

Index: Use the set index to determine which cache set the address should reside in. Use the tag to compare cache lines in a set to find if the data is in the cache.

An address is composed of (From the most significant to the lowest significant):

tag: $t = l-s-b$ bits;
set index: $s=log_2{SetsNumber}$ bits;
block offset; $b=log_2{BlockSize}$ bits.

A tag is just partial address because in the same place in the cache, we can ensure that $s$ and $b$ are the same.

cache-handout.dvi

Msr bitmap

In Primary Processor-Based VM-Execution Controls, bit 28 is Use MSR bitmaps.

If the MSR bitmaps are not used, all executions of the RDMSR and WRMSR instructions cause VM exits.

If the bitmaps are used, an execution of RDMSR or WRMSR causes a VM exit if the value of RCX is in neither of the ranges covered by the bitmaps or if the appropriate bit in the MSR bitmaps (corresponding to the instruction and the RCX value) is 1.

Why system management mode in x86 processors?

ring −2.

All normal execution, including the operating system, is suspended. An alternate software system which usually resides in the computer's firmware, or a hardware-assisted debugger, is then executed with high privileges.

It is intended for use only by system firmware (BIOS or UEFI), not by applications software or general-purpose systems software.

The SMM can only be entered through SMI (System Management Interrupt).

Elf vs. bin

bin is the final way that the memory looks before the CPU starts executing it. ELF is a cut-up/compressed version of that, which the CPU/MCU thus can't run directly.

vmlinux is an ELF file not a binary. kernel build may create zImage, that is a binary file.

Qemu cores/threads relationship with host cores/threads

In the case of QEMU, this is actually entirely **decoupled **from what it exposes to the guest system.

By default, it provides exactly one core, and exposes nothing about the host system's hardware topology.

With the use of the -smp option to provide arbitrary topologies. With just a number, it simulates a single CPU with that many cores, but for many platforms, you can specify exact values for the number of cores per package, number of threads per core, and number of sockets to expose.

So, in theory you could expose the 8 cores with 2 TPC (threads-per-core) of a Ryzen 7 to a QEMU guest as

16 independent cores, or
a single core with 16 threads, or
4 cores each with 4 threads, or
even as 16 separate CPU packages with 1 core each.
You can even give the guest more virtual cores than there are physical cores on the system, in which case those core swill be multiplexed across however many physical CPU's you have (this is really useful for testing purposes).

Also, from QEMU doc:

-smp <NUMBER> - Specify the number of cores the guest is permitted to use. The number can be higher than the available cores on the host system. Use -smp $(nproc) to use all currently available cores.

virtual machine - Is it possible to treat a thread as a core in QEMU? - Super User

Cpuid.htt

0 not supported, 1 supported.

cpuid -l 1 -1 | grep hyper-threading

Cycles used

Operation	Cycles
VM-exit	~4000

How to rebase or checkout a tag?

git checkout tags/<tag_name>
git rebase tags/<tag_name>

What is git tag, How to create tags & How to checkout git remote tag - Stack Overflow

Fast path

A path with shorter instruction path length through a program compared to the normal path. For a fast path to be effective it must handle the most commonly occurring tasks more efficiently than the normal path, leaving the latter to handle uncommon cases, corner cases, error handling, and other anomalies.

Fast path - Wikipedia

Line wrapping for qemu and kernel

	QEMU	Kernel
Code	80	80
Commit Title	76	the `summary` must be no more than 70-75 characters
Commit Body	76	75/76

QEMU

Please do not use lines that are longer than 76 characters in your commit message (so that the text still shows up nicely with “git show” in a 80-columns terminal window).

Lines should be 80 characters; try not to make them longer.

Kernel

The limit on the length of lines is 80 columns and this is a strongly preferred limit.

[PATCH V2] checkpatch/SubmittingPatches: Suggest line wrapping commit messages at 75 columns - Joe Perches

What is dangling commit?

A danging commit is any commit that is not reachable from some branch or tag.

Difference between cpuid on 64bit and 32bit machine

the instruction is operate the same for both 64b and non-64b

but even though that it's behave the same, keep in mind that CPUID instruction clears the high 32 bits of the RAX/RBX/RCX/RDX registers in all modes, so if you check the highest 32 bits in the registers mention above you will read 0x0.

32bit 64bit - What's the difference in CPUID work on 32-bit and 64-bit machines? - Stack Overflow

Rsync copy from remote to local

rsync -chavzP --stats sdp@10.45.76.123:~/lei.img ~/

ssh - Copying files using rsync from remote server to local machine - Stack Overflow

The popek/goldberg theorem

Formal Requirements for Virtualizable Third-Generation Architectures.

Assumption: conventional thirdgeneration architecture:

The computer has one processor with two execution levels: supervisor mode and user mode.
…

Live migration

Live migration using -cpu host (host passthrough) is not recommended:

Live migration is unsafe when this mode is used as libvirt / QEMU cannot guarantee a stable CPU is exposed to the guest across hosts. This is the recommended CPU to use, provided live migration is not required.

QEMU / KVM CPU model configuration — QEMU 7.1.50 documentation

vm2's command line add -incoming tcp:0:6666,

vm1's monitor: migrate tcp:127.0.0.1:6666.

Notice: The 2 image should be the same image.

The VM image is accessible on both source and destination hosts (located on a shared storage, e.g. using nfs).

QEMU KVM Libvirt: Live Migration - popsuper1982 - 博客园

How to switch to qemu monitor

Ctrl-Alt-2 to switch to QEMU monitor;

Ctrl-Alt-1 to switch back.

Image vs .tar

They can both be uncompressed to some files, what's the difference between them?

Image contains the total filesystem, but tar may just contain some folder, so it is filesystem agnostic;
This advantage is achieved by introducing a specific layer of dependency: the tar utility. its contents can be analyzed and manipulated externally by this utility too.

backup - What is the difference between a tar of a complete filesystem and an image? - Unix & Linux Stack Exchange

When mount a partition, should i know the file system first?

In general you don't need to specify what type of file system you are mounting. Linux can detect the type from the file system metadata.

linux - Mounting a hard drive without having the corresponding file system - Stack Overflow

Tmux send-keys list

tmux(1) - OpenBSD manual pages

Parameter 'type' expects a netdev backend type

The newer QEMU has dropped the slirp submodule, which is used for the user network stack. It seems Ubuntu Server 22.04 will not install it by default, so you need to:

sudo -E apt install libslirp-dev
./configure
make

On CentOS 8, it is:

sudo yum install libslirp-devel

Word

The majority of the registers in a processor are usually word sized and the largest datum that can be transferred to and from the working memory in a single operation is a word in many (not all) architectures.

The largest possible address size, used to designate a location in memory, is typically a hardware word.

Why could cpu load multiple bytes (a word) at once?

You can specify addresses at the byte level, does not mean you have to have an 8 bit data bus. Most (possibly all) modern x86 processors use a 64 bit data bus and every time they read from memory, they read 64 bits. If you only requested 8 bits, the excess is simply discarded.

The CPU always reads at its word size.

Purpose of memory alignment - Stack Overflow

Why CPU shouldn't read an unalignment word?

It has everything to do with how the underlying low-level memory access hardware works…

As for why it is so… well, that's just how modern computer memory hardware works. The data has to be aligned. If it is not aligned, the access either is less efficient or does not work at all.

A very simplified model of modern memory would be a grid of cells (rows and columns), each cell storing a word of data. A programmable robotic arm can put a word into a specific cell and retrieve a word from a specific cell. One at a time. If your data is spread across several cells, you have no other choice but to make several consecutive trips with that robotic arm.

You can see:

https://stackoverflow.com/a/3655952/18644471

assembly - How does a CPU load multiple bytes at once if memory is byte addressed? - Software Engineering Stack Exchange

Boot guard

如何以及是谁来验证 BIOS 的完整性。一个显而易见的方法是对 BIOS 进行签名，用主板生产厂商的私钥进行签名，如果签名校验失败，就拒绝执行 BIOS 代码，这就是 Boot Guard 的大致工作原理。

Boot Guard。一旦开启就不能关闭，这也是为了安全计。

由本文也可见，ACM 是在 reset vector 前就执行。

什么是 Boot Guard？电脑启动中的信任链条解析 - 老狼的文章 - 知乎 https://zhuanlan.zhihu.com/p/116740555

Difference between driver and firmware

驱动是 OS 的一部分，跑在 CPU 上；Firmware 是硬件的一部分，跑在硬件板载的嵌入式芯片上。两者之间通过某些协议进行沟通，譬如对于硬盘驱动和 firmware 之间就是 SCSI 之类的协议。

驱动与固件的区别是什么？ - 知乎 https://www.zhihu.com/question/22175660/answer/20548281

但是为什么不把 fimware 做的很完美，做的不需要驱动支持呢？因为有不同的操作系统。

不同操作系统的驱动是不能兼容的，原因就是驱动是为操作系统服务的。

硬件厂商一方面为了自己的硬件能被软件更简单的使用，就需要写 firmware，而另一方面为了兼容各种操作系统，又不能把 firmware 写的太死，必须预留足够的余地让软件自由发挥 —— 软件的自由发挥就是驱动。

驱动与固件的区别是什么？ - 时国怀的回答 - 知乎 https://www.zhihu.com/question/22175660/answer/20547502

Reset vector

Default location a CPU will go to find the first instruction it will execute after a reset.

The reset vector for the 8086 processor is at physical address FFFF0h (16 bytes below 1 MB). The value of the CS register at reset is FFFFh and the value of the IP register at reset is 0000h to form the segmented address FFFFh:0000h, which maps to physical address FFFF0h.

The address is in a section of non-volatile memory initialized to contain instructions to start the operation of the CPU, as the first step in the process of booting the system containing the CPU.

Intel processor tracing (intel pt)

Henschel_Intel-PT_2017.pdf

Some cpuids don't report a 1-bit on/off, but a number (0x1d/0x1e)

0x1D:

EAX: Bits 15-00: Palette 1 total_tile_bytes. Value = 8192. Bits 31-16: Palette 1 bytes_per_tile. Value = 1024.
EBX: Bits 15-00: Palette 1 bytes_per_row. Value = 64. Bits 31-16: Palette 1 max_names (number of tile registers). Value = 8.
ECX: Bits 15-00: Palette 1 max_rows. Value = 16. Bits 31-16: Reserved = 0.
EDX: Bits 31-00: Reserved = 0.

Datum

Plural of data.

An useful web tool to show compiled c source code in assembly

Compiler Explorer

How to understand little/big endian in a human-friendly way?

The key to remember is "byte", little/big endian has no relationship with the bit order.

Big Endian Byte Order: The most significant byte (the "big end") of the data is placed at the byte with the lowest address. The rest of the data is placed in order in the next three bytes in memory.

Little Endian Byte Order: The least significant byte (the "little end") of the data is placed at the byte with the lowest address. The rest of the data is placed in order in the next three bytes in memory.

Endianness - Wikipedia

In some program, you may see:

A1 34 12 00 00 mov a,%eax 
A3 00 00 00 00 mov %eax,b

That's because the higher address the downer and righter.

Image

An image file is a copy of the data on a block device, in the form of a file (on another filesystem). Image files can have any extension; .img is common.

A .iso file is usually an image file of a block device containing an ISO9660 filesystem. It contains an exact representation of the data stored on a CD. Analogously, you could have a .img file (call it .ext3 if you prefer) that is an image file of a block device containing an ext3 filesystem.

The reason you cannot write to ISO is not because it is a mounted regular file instead of a device special file, but because the file system driver^ does not support it. If the image contained another filesystem like FAT32 or EXT2 instead of ISO9660, you would be able to read-write.

Is there a file system named qcow2?

No. Although there is an image file end with .qcow2, that doesn't mean there is a qcow2 file system in it.

It is a format for storing and managing disk images that can contain different types of file systems, such as ext4, NTFS, or FAT32, depending on the operating system installed on the virtual machine.

libguestfs, library for accessing and modifying VM disk images

What does mounting an image mean?

For an example, we can download an image from Ubuntu as ubuntu-16.04.1-server-amd64.iso.

create a mount point: mkdir -p /mnt/cdrom.

The file system of an ISO image is ISO9660, so we can use the following command to mount it:

sudo mount -o loop -t iso9660 ~/ubuntu-16.04.1-server-amd64.iso /mnt/cdrom

It is read-only, because when mount success, it will say:

mount: /mnt/cdrom: WARNING: source write-protected, mounted read-only.