2024-08 Monthly Archive

`malloc()` 是如何进行内存管理的 / How does `malloc()` implemented / How does kernel manage userspace memory

Linux 给了我们两种类型的系统调用来申请动态内存，分别是 brk() 和 mmap()，malloc() 仅仅是在这二者之上做了一些其他的事情而已。

The heap starts with one big free region, but after the program has been running for a while it will probably be divided into some regions that are being used by the program, and some that are not，这些信息（那部分在用哪部分没有在用）应该是记录在 Linux kernel 的 VMA 里的。

所以寻找从哪里开始分配内存是内核要做的事情（也就是 mmap 这个 syscall），不是 malloc() 这个用户空间的程序来决定的。malloc() 只是把 kernel 的决定（也就是最终分配到的内存的起始地址）返回了而已。

同样，free() 把虚拟地址传给了 kernel，kernel track 了每一次 malloc() 和 free() 的调用，并且根据调用的情况 track 了哪里是 free 的哪里是在用的，以便于下一次 malloc() 地址的决定（一般来说 kernel 不会去做 compact，原因很简单，一是开销比较大，二是 userspace 对应的指针地址也要更改，这种侵入式的实现方式对用户程序不友好）

可以看下这个 slide： Handout-6.pdf

堆 heap

在计算机科学中，堆有两种不同的含义：

一种排序算法，实现优先队列的一种方式。
进程内存空间的堆区，或者有时候也叫做数据段，用来动态分配内存用的。

都叫这个名字只是巧了而已，因为 heap 在英语里就是一堆东西的意思，所以无论是英文世界还是中文世界，都很容易搞混这两个东西，但是其实只是用了同一个名字而已。

The data structure used to provide general memory allocation such as this is called a heap. It’s not the same type of heap you learned about (or will learn about) in a data structures course (those heaps were used to implement a priority queue). Our heap is simply a data structure for keeping track of memory.

Handout-6.pdf

Firmware

根据定义：Firmware is a generic name for all the software that is embedded on non-volatile memory. BIOS is stored in ROM, so it is firmware，这么来说存储在 disk 上的 binary 也应该是 firmware，但是其实并不是 firmware，这是为什么呢？

我觉得应该是 CPU（或者 firmware 所在的处理器比如 GPU）能够直接执行 ROM 上的代码，但是 binary 需要 OS 的支持比如需要 Loader 什么的。

热升级

kexec 就是一种能够使能内核热升级（live update）的方案。

字节跳动提出KVM内核热升级方案，效率提升5.25倍_语言 & 开发_Fam Zheng_InfoQ精选文章

`SIGSEGV` / Segmentation fault / 数组越界

首先，它的确是硬件可能报出来的一种异常（但应该是一些比较老的硬件了）。同时它也是一个信号，如果 process 有自己的 handler 的话也可以 handle，否则的话操作系统默认的 handler 会将其 kill 掉。

假如有一个用户程序，我们要访问一个指针，指向一个还没有被分配的内存地址，那么当 code 访问这个指针的时候，实际上硬件报出来的并不是 segment fault，其实是 page fault，只不过 OS 帮我们转成了 SIGSEGV 报告给了 user 程序。

在编程语言中，数组越界并不一定会被发现。

如果越界访问的区域正好是申请的另一片区域（虽然不是这个数组的区域），这种情况会有错误值，不一定会被发现；
如果越界访问的区域是还没有向操作系统申请的一片区域，这种情况硬件会返回 page fault，OS 可以转成 SIGSEGV 信号。

当然，这和编程语言的设计也有关系，像 C 语言这种没有数组越界检查的语言可能会有数组越界发现不了的情况，但是像 Rust 这种会编译一些附加检查代码在 Rust 程序中的，相当于运行时会进行检查数组的访问有没有越界（毕竟在编译期数组长度是知道的，只需要在运行时将这个值和 index 值对比一下即可判断）。

VTune 的使用

VTune 的主要工作模式就是监控一个正在运行的进程。你可以指定这个进程的 PID 或者进程的名字（exe）来使用。（注意，VTune 不仅仅只有 Windows 版本的，其也有 Linux 版本的）。

在 profile 阶段，我们可以选择不同的侧重点，也就是右边 how 的那一部分。主要有以下几部分：

Performance Snapshot：性能的大概测量，用来帮助下一步的分析；
Hotspots：用来找到花时间最多的函数，可以看到每行代码的运行时间；
Anomaly Detection：找到性能上的异常点；
Threading…
HPC Performance Characterization…
Microarchitecture Exploration：可以用来分析是 Frontend Bound 还是 Backend Bound 的。
Memory Access…
还有一些其他的 accelerator 比如 GPU/NPU 相关的。

在分析阶段，我们有很多的 Tab 可以看：

Flame Graph：可以看火焰图。

如果想要看一个进程里面调用的所有 DLL 占用的时间，可以选择 hotspot 的统计方式，在结果那看 bottom-up，然后 Grouping 哪里选 Module / Code Location / Call Stack。对应的 command line 是：

vtune --collect hotspots target-process="geekbench_avx2.exe" 

ETL 格式

Windows 才有的文件。是 Windows 操作系统内核生成的事件日志的日志文件。

CMake, Meson, Ninja

首先要区分 Build System 以及 Build System Generator：

Build System Generator：就是生成类似 Makefile 文件的元文件，比如对应 CMake 就是 CMakeLists.txt；Meson 也包含在内。
Build System：执行 Makefile 来进行 build 的工具。比如 Ninja。

Meson 看起来比 CMake 更好用：

[C++ build systems: our transition from CMake to Meson | by Niek J. Bouman

Medium](https://niekbouman.medium.com/c-build-systems-our-transition-from-cmake-to-meson-2c043e93822f)

这里也有一些说 Meson 的缺点的：

I really don't think Meson and CMake should be mentioned in the same breath here... | Hacker News

QMK, VIA 键盘客制化

QMK 是一系列键盘客制化工具的集合，包含更改键位，编译成 binary，将 binary 刷到键盘上的工具等等。

VIA 是一个对用户更加友好的软件，是一个可视化的软件变成界面，支持从改键到烧录一条龙的服务。

一步步做个最适合自己的键盘固件：QMK 固件制作全解少数派会员 π+Prime

指令级并行（ILP）、数据级并行（DLP）、线程级并行（TLP）、请求级并行（RLP）

指令级并行是指在一个处理器内部，利用流水线、超标量、乱序执行等技术，使得多条指令可以同时或部分重叠地执行，从而提高指令的执行速度。

数据级并行是指在一个处理器或多个处理器之间，利用单指令多数据（SIMD）或多指令多数据（MIMD）等技术，使得一条或多条指令可以同时对多个数据进行相同或不同的操作，从而提高数据的处理能力。

线程级并行是指在一个处理器或多个处理器之间，利用多线程、多核、多处理器等技术，使得多个线程或进程可以并发或并行地执行，从而提高程序的吞吐量。

If these requests are independent of each other, they can be processed in parallel. Requests are independent of they do not use the same resources. Synchronization of a shared state is one of the biggest obstacles in parallel request processing. 缺点是需要系统或网络支持分布式或集群式的架构。

Windows API / `windows.h`

windows.h is a Windows-specific header file for the C and C++ programming languages which contains declarations for

all of the functions in the Windows API,
all the common macros used by Windows programmers, and
all the data types used by the various functions and subsystems.

Kernel32.dll 和 User32.dll 的区别

这三个都属于 Windows API，属于里面不同的模块。

user32.dll：是 Windows 用户界面相关应用程序接口，用于包括 Windows 处理，基本用户界面等特性，如创建窗口和发送消息
gdi32.dll：gdi32.dll 是 Windows GDI 图形用户界面相关程序，包含的函数用来绘制图像和显示文字
kernel32.dll：控制着系统的内存管理、数据的输入输出操作和中断处理

微软就是靠这三个模块起家的，Windows SDK 只利用这三个模块就能构建基本的 Windows 程序。Now the windows API is now far FAR larger than the 3 subsystems Windows 3.x offered (kernel, user and gdi).

注意不要把 kernel32.dll 和内核弄混，内核是 ntoskrnl.exe。ntoskrnl.exe also known as the kernel image, contains the kernel and executive layers of the Microsoft Windows NT kernel。

kernel32.dll exposes to applications most of the Win32 base APIs, such as memory management, input/output (I/O) operations, process and thread creation, and synchronization functions.

On system boot up, kernel32.dll is loaded into a protected memory so that it is not corrupted by other system or user processes. It runs as a background process and carries out important functions like memory management, input/output operations and interrupts.

注意，kernel32.dll 是跑在 user mode 而不是 kernel mode 下的。kernel32.dll 有点像 glibc 的功能？对 syscall 进行了一层封装。

Why is kernel32.dll running in user mode and not kernel mode, like its name implies? - The Old New Thing

`AppInit_DLLs`, `LoadAppInit_DLLs`, `RequireSignedAppInit_DLLs`

我们可以通过设置：

AppInit_DLLs String -> path to dll
LoadAppInit_DLLs Dword -> 1
RequireSignedAppInit_DLLs Dword -> 0

的方式来让打开一个 binary 的时候先启动我们自己的 DLL 来截获。

AppInit_DLLs 注入：User32.dll 被加载到进程时，会加载 HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows 中 Appinit_Dlls 的值，修改其值可以使其加载恶意的脚本。

DLL 被加载到进程后会自动运行 DllMain() 函数，用户可以把想执行的代码放到 DllMain() 函数，每当加载 DLL 时，添加的代码就会自然而然得到执行。利用该特性可修复程序 Bug，或向程序添加新功能。

How to use VSCode to run Visual Studio projects?

插件：VS Code .csproj

Manage C# projects in Visual Studio Code

Pip proxy on windows

在系统的环境变量中添加一项 PIP_PROXY，指向 http://child-prc.sh.intel.com:913

Disable VBS on Windows

最关键的一步是：

bcdedit /set hypervisorlaunchtype off

上面命令会导致 WSL 不可用，所以 Disable VBS 和 WSL 是冲突的。

永久关闭vbs提升性能，同时保留WSL/WSA的方法 - 哔哩哔哩

下面方案 reboot 后会失效：如何科学地关闭VBS并与Windows Subsystem全家桶共存 - 哔哩哔哩

gpedit.msc 打不开

如何打开计算机本地组策略编辑器_本地组策略编辑器怎么打开-CSDN博客

How to setup RDP for Windows

Windows 家庭版没有办法使用 RDP，所以我们需要使用下面的项目。

https://github.com/sebaxakerhtc/rdpwrap

Failed to launch AutoHotKey script

AutoHotKey 检测到我们的脚本是 v1 的，其实是 v2 的，我们需要把脚本的打开程序设置为 AutoHotkey 64-bit 而不是 AutoHotKey Launcher。

VSCode remote develop on a remote Windows server

使用 Remote SSH 连到 remote Windows server。

我们需要在 remote Windows server 执行命令来进行编译和测试。

下载的可执行文件被自动删除

需要关闭实时保护。

关闭 Windows 安全中心中的Defender 防病毒保护 - Microsoft 支持

注意，每次重启的时候都会自动打开实时保护，我们需要采取下面的方式来永久的关闭。

ionuttbara/windows-defender-remover: A tool which is uses to remove Windows Defender in Windows 8.x, Windows 10 (every version) and Windows 11.

RDP 调整远端 Windows 分辨率缩放

Linux kernel interfaces

可以分为两种：

给 usersapce 提供的 API（主要就是 syscall，即使是 libc 其实也是对于 syscall 的封装）以及，
给 kernel 其他模块提供的 API。

Linux kernel interfaces - Wikipedia

malloc() 是如何进行内存管理的 / How does malloc() implemented / How does kernel manage userspace memory