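Terminology:

  • AMD-SP (PSP): What is the difference between the AMD Secure Processor (AMD-SP) and the Platform Security Processor (PSP)? None. PSP is the older name and AMD-SP is the current one; they refer to the same component.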

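In the SEV architecture, every confidential VM is associated with a key, and key management is handled by the PSP. The SEV driver implements communication between the CPU and the PSP on top of the vendor-provided SEV API. Once the hypervisor (KVM) integrates the SEV driver, it manages the lifecycle of confidential VMs by calling the SEV API.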

SEV allows pages to be swapped to disk.

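The AMD hardware associates a key with each confidential VM and uses it to encrypt that VM's memory; this is internal PSP logic. How, then, does hypervisor software specify which key a VM uses? AMD uses the ASID field of the VMCB as the key ID of the confidential VM: when VMRUN is executed to run the VM, the hardware reads the ASID field from the VMCB and uses its value as the key ID.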
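The ASID-as-key-ID idea can be sketched as a toy lookup. This is only an illustration under stated assumptions: `VMCB`, `ToyKeyTable`, `install_key`, and `key_for_vmrun` are hypothetical names, not part of any AMD or KVM interface, and the real VMCB has many more fields.

```python
import os
from dataclasses import dataclass


@dataclass
class VMCB:
    """Toy VMCB holding only the field discussed here."""
    asid: int


class ToyKeyTable:
    """Hypothetical stand-in for the per-VM keys held by the PSP."""

    def __init__(self):
        self._keys = {}  # ASID -> memory encryption key

    def install_key(self, asid):
        # In this model the key is generated internally and never handed
        # to hypervisor code; the hypervisor only names it by ASID.
        self._keys[asid] = os.urandom(16)

    def key_for_vmrun(self, vmcb):
        # On VMRUN, the ASID field read from the VMCB selects the key.
        return self._keys[vmcb.asid]


keys = ToyKeyTable()
keys.install_key(asid=1)
keys.install_key(asid=2)
vm_a, vm_b = VMCB(asid=1), VMCB(asid=2)
```

The hypervisor side of this sketch only ever writes an integer into the VMCB; the key material stays behind the lookup.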

  • ✓:Mitigated
  • ★:Optionally mitigated
  • ⊗:Not mitigated

Potential Threats (compared across SEV, SEV-ES, SEV-SNP)

Confidentiality
  • VM Memory. Example attack: Hypervisor reads private VM memory
  • VM Register State. Example attack: Read VM register state after VMEXIT
  • DMA Protection. Example attack: Device attempts to read VM memory

Integrity
  • Replay Protection. Example attack: Replace VM memory with an old copy
  • Data Corruption. Example attack: Replace VM memory with junk data
  • Memory Aliasing. Example attack: Map two guest pages to same DRAM page
  • Memory Re-Mapping. Example attack: Switch DRAM page mapped to a guest page

Availability
  • Denial of Service on Hypervisor. Example attack: Malicious guest refuses to yield/exit
  • Denial of Service on Guest. Example attack: Malicious hypervisor refuses to run guest

Physical Access Attacks
  • Offline DRAM analysis. Example attack: Cold boot
  • Active DRAM corruption. Example attack: Manipulate DDR bus while VM is running

Misc.
  • TCB Rollback. Example attack: Revert AMD-SP firmware to old version
  • Malicious Interrupt/Exception Injection. Example attack: Inject interrupt while RFLAGS.IF=0
  • Indirect Branch Predictor Poisoning. Example attack: Poison BTB from hypervisor
  • Secure Hardware Debug Registers. Example attack: Change breakpoints during debug
  • Trusted CPUID Information. Example attack: Hypervisor lies about platform capabilities
  • Architectural Side Channels. Example attack: PRIME+PROBE to track VM accesses
  • Page-level Side Channels. Example attack: Track VM access patterns through page tables
  • Performance Counter Tracking. Example attack: Fingerprint VM apps by performance data

SEV, SEV-ES (Encrypted State), SEV-SNP (Secure Nested Paging)

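  • SEV mainly adds memory encryption.
  • SEV-ES adds encryption of the register state.
  • SEV-SNP adds memory integrity protection.

Security capabilities increase with each step; today SEV-SNP is generally the variant that gets used.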

SME (Secure Memory Encryption), SME-MK

SME is AMD's counterpart to Intel's TME, and SME-MK is the counterpart to MKTME; the underlying principle is the same.

Guest Owner

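A traditional virtualization architecture has two roles, the hypervisor and the guest. The hypervisor controls all privileges of every guest and can inspect and modify a VM's code and data. Concretely:

  • The hypervisor can access the contents of the HPA memory backing a VM's GPAs.
  • The hypervisor can access the contents of the CPU registers used by a VM.
  • The hypervisor can tamper with the image file a VM uses.
  • The hypervisor can modify the command-line parameters used to launch a VM as it sees fit.

The SEV architecture adds a new role, the Guest Owner, and moves the privileges listed above from the hypervisor to it. The hypervisor's CPU compute capability is still used, but its management authority is reduced. Under SEV, the hypervisor can be viewed as a VM management agent that provides compute virtualization.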

Open question: is the Guest Owner the same thing as the TDX Module?

Secure processor (PSP - Platform Security Processor, AMD-SP)

This is an additional processor introduced alongside the main CPU, and it uses the ARM architecture.

The PSP is an ARM Cortex A5 with some private SRAM, a hardware crypto engine (a rather impressive one, if I may add), and on chip boot rom.

SEV-ES

SEV-ES also lets the guest decide which registers are exposed to the hypervisor and which are not.

TDX has a similar capability.

SEV-SNP

The basic principle of SEV-SNP integrity is that if a VM is able to read a private (encrypted) page of memory, it must always read the value it last wrote.

SEV-SNP cannot perform I/O directly into private memory; shared memory is still needed as an intermediate buffer.

Under SEV-SNP, all other CPU software components and PCI devices are treated as fully untrusted. This includes the BIOS on the host system, the hypervisor, device drivers, other VMs, etc.

SEV and SEV-ES use the threat model of a “benign but vulnerable” hypervisor. In this threat model, the hypervisor is not believed to be 100% secure, but it is trusted to act with benign intent. That is, while the hypervisor is not actively trying to compromise the SEV VMs underneath it, it could itself have exploitable vulnerabilities. SEV-SNP, by contrast, assumes that the hypervisor itself may be malicious.

SNP ensures that a guest cannot mount a denial-of-service (DoS) attack on the VMM, but it cannot prevent the VMM from denying service to a guest: the VMM can simply choose never to run that guest.

Reverse map table

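It addresses the following three attacks:

  1. Replay attack: overwrite a memory page with an earlier copy of its contents. This relies on the ability to modify the memory page.
  2. Corruption attack: overwrite a memory page with a corrupted value. This also relies on the ability to modify the memory page.
  3. Memory aliasing: two guest pages are mapped to the same physical page.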

The RMP is a single data structure shared across the system that contains one entry for every 4k page of DRAM that may be used by VMs. The goal of the RMP is simple: it tracks the owner for each page of memory. Pages of memory can be

  • owned by the hypervisor,
  • owned by a specific VM,
  • or owned by the AMD-SP

The RMP is indexed by system physical address and is checked at the end of CPU and IOMMU table-walks. For example, in native (non-VM) mode, VAs are translated into PAs using the standard x86 page tables. After that translation, the final PA is used to index the RMP. The RMP entry is read out and checked.

  • If the RMP entry indicates that the page is a hypervisor-owned page, then the checks pass and a new TLB entry is created.
  • If the RMP entry indicates that the page is not a hypervisor-owned page though, the table-walk faults (#PF) and the access is denied.
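
The native-mode check described above can be sketched as follows. This is a simplified model, not the hardware algorithm: `RmpEntry`, `PageFault`, and `hypervisor_translate` are hypothetical names, the page table is a flat dictionary, and the owner is a plain string.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # the RMP has one entry per 4 KiB page
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Models the #PF raised when the RMP check fails."""


@dataclass
class RmpEntry:
    owner: str  # "hypervisor", "amd-sp", or a guest label such as "vm1"


def hypervisor_translate(va, page_table, rmp):
    """Translate a VA in native mode, then apply the RMP check."""
    frame = page_table[va >> PAGE_SHIFT]        # standard page-table walk
    pa = (frame << PAGE_SHIFT) | (va & OFFSET_MASK)
    entry = rmp[pa >> PAGE_SHIFT]               # RMP is indexed by physical address
    if entry.owner != "hypervisor":
        raise PageFault(f"page {frame:#x} is owned by {entry.owner}")
    return pa                                   # check passed; a TLB entry would be created


page_table = {0x10: 0x200, 0x11: 0x201}
rmp = {0x200: RmpEntry("hypervisor"), 0x201: RmpEntry("vm1")}

allowed_pa = hypervisor_translate(0x10123, page_table, rmp)
try:
    hypervisor_translate(0x11000, page_table, rmp)
    denied = False
except PageFault:
    denied = True
```

The point of the sketch is ordering: the RMP lookup happens after the ordinary translation and uses the final physical address, so it does not depend on how the hypervisor set up its own page tables.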

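The figure below shows the RMP check during the translation of an HVA and of a GVA. The RMP stores the GPA, which is where the name "reverse" comes from: it holds an HPA -> GPA mapping. This design exists to prevent the third attack above; the first two do not need the GPA, only a record of who owns the page: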

Page validation

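Attack addressed:

  • Mapping one guest page to two or more physical pages (page remapping attack). In this threat, the guest might see an inconsistent view of memory where only a subset of data it wrote appears in memory, because each write may land on a different page. The nested page table maps a GPA to a single HPA at any given moment, but the VMM can change that mapping at any time. With page validation, the guest can tell that it is under attack when the mapping changes and can take countermeasures.

A Validated bit is added to each RMP entry, so that the state of a physical page changes as follows (RMPUPDATE and PVALIDATE are instructions):

When a new RMP entry is created for a guest, the CPU automatically clears the Validated bit to 0. A page in this state cannot be used by the hypervisor, and the guest cannot use it as a private page either, because it has not been validated yet. The page becomes usable only after the guest executes PVALIDATE (which only the guest can execute), setting Validated to 1.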

  • First, the hypervisor assigns the page to the guest using the new RMPUPDATE instruction. This transitions the page into the Guest-Invalid state.
  • Second, the guest validates the page using the new PVALIDATE instruction to transition the page to the Guest-Valid state, from where it can be used.

The key point is that the guest should never PVALIDATE the same GPA twice, which rules out mapping one GPA to different physical pages. This can be accomplished simply by the guest VM validating all of its memory at boot and refusing to ever validate additional memory; how this is enforced is up to the guest implementation.

An example of the attack being prevented: GPA A is initially mapped to SPA X. The guest does a PVALIDATE to validate this translation, which causes the Validated bit to be set in the RMP entry corresponding to SPA X. If the hypervisor then maliciously attempts to remap A to a different SPA Y, it will start by creating an RMP entry for SPA Y attempting to map the same GPA A using the RMPUPDATE instruction. The hypervisor then maliciously modifies the nested page table (NPT) to re-map GPA A to Y. When the guest next accesses GPA A (now backed by SPA Y), however, it will get a #VC (VMM Communication) exception. This exception occurs because the Validated bit in the RMP entry corresponding to SPA Y is clear (when RMPUPDATE was executed to assign a new page to the guest, it cleared the Validated bit). As the guest knows it had already validated GPA A, it knows it should not be receiving a validation error and therefore it is under attack and the hypervisor is not behaving correctly. In response, the guest can terminate or take other steps to protect itself.
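
The remapping scenario can be replayed in a small simulation. This is a toy model under stated assumptions: `Platform`, `RmpEntry`, and `VmmCommunicationException` are hypothetical names, `rmpupdate` and `pvalidate` only mimic the effect of the instructions on the Validated bit, and all other RMP fields and checks are omitted. The GPA comparison in `guest_access` is also what would reject an aliasing attempt.

```python
from dataclasses import dataclass


class VmmCommunicationException(Exception):
    """Models the #VC the guest receives on a failed RMP check."""


@dataclass
class RmpEntry:
    assigned: bool = False   # False means hypervisor-owned
    gpa: int = -1            # guest physical page this SPA is mapped at
    validated: bool = False


class Platform:
    def __init__(self, num_pages):
        self.rmp = [RmpEntry() for _ in range(num_pages)]  # indexed by SPA page
        self.npt = {}                                       # GPA page -> SPA page

    def rmpupdate(self, spa, gpa):
        # Hypervisor assigns a page to the guest; Validated starts cleared.
        self.rmp[spa] = RmpEntry(assigned=True, gpa=gpa, validated=False)

    def pvalidate(self, gpa):
        # Guest-only: mark the page currently backing `gpa` as validated.
        self.rmp[self.npt[gpa]].validated = True

    def guest_access(self, gpa):
        spa = self.npt[gpa]
        entry = self.rmp[spa]
        if not (entry.assigned and entry.gpa == gpa and entry.validated):
            raise VmmCommunicationException(f"RMP check failed for GPA {gpa:#x}")
        return spa


platform = Platform(num_pages=4)
A, X, Y = 0x5, 1, 2

platform.rmpupdate(X, A)      # hypervisor assigns SPA X at GPA A
platform.npt[A] = X
platform.pvalidate(A)         # guest validates A exactly once
first_spa = platform.guest_access(A)

platform.rmpupdate(Y, A)      # malicious remap: new RMP entry for SPA Y
platform.npt[A] = Y
try:
    platform.guest_access(A)
    attack_detected = False
except VmmCommunicationException:
    attack_detected = True    # Validated is clear for Y, so the guest notices
```

Because the guest in this model never calls `pvalidate` a second time for A, the hypervisor has no way to get the new page into the validated state.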

Page states

The RMP records the state of every physical page.

  • Pages in the Hypervisor state can be read/written by the hypervisor, or by SEV-SNP VMs accessing the memory with C=0 (shared pages).
  • Pages in the Guest-Valid state by contrast can be read/written by SEV-SNP VMs but cannot be written by the hypervisor.

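There are three basic page states (Hypervisor, Guest-Valid, and Guest-Invalid), but eight states in total. All state transitions are driven by one of three mechanisms: RMPUPDATE, PVALIDATE, or the VM management API in the AMD-SP (invoked by the hypervisor). The five immutable states are: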

  • PRE-GUEST:
  • PRE-SWAP:
  • FIRMWARE:
  • METADATA:
  • CONTEXT:

For security reasons, any pages the AMD-SP will manipulate must be placed into special states, called Immutable states, prior to issuing the necessary API call. Immutable pages cannot be written by the VMM or the guest, and their RMP entries can only be modified by the AMD-SP. Open question: who is responsible for moving a page into an immutable state?

For example, ‘Metadata’ pages are a type of Immutable pages. These pages are only writeable by the AMD-SP and are used to hold metadata entries associated with guest pages that have been swapped to disk. Because of the SEV-SNP integrity guarantees, any pages that are swapped to disk must have their integrity confirmed before they can be swapped back into memory. When a page is swapped to disk, the AMD-SP creates a metadata entry containing an authentication tag (from AES-GCM), as well as data from the page’s RMP entry, such as the GPA where it was located. As the Metadata page itself is not writeable by the hypervisor, the integrity of this information is guaranteed. When a page is swapped back into memory, the AMD-SP verifies the contents were unchanged and ensures the page enters the guest address space at the same location as it was before. Metadata pages themselves can also be swapped to disk in a similar fashion, allowing for the entire guest to be saved to disk if desired.
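
The integrity check on swap-in can be illustrated with a small model. Note the assumptions: the text says the tag comes from AES-GCM, but the sketch below substitutes HMAC-SHA256 from the Python standard library purely so that it runs without third-party packages, and it covers only integrity, not encryption of the page. `ToySecureProcessor`, `MetadataEntry`, `swap_out`, and `swap_in` are hypothetical names, not the AMD-SP API.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEntry:
    gpa: int     # where the page lived in the guest address space
    tag: bytes   # authentication tag over the GPA and page contents


class ToySecureProcessor:
    """Hypothetical model of the swap path; HMAC stands in for AES-GCM."""

    def __init__(self):
        self._key = os.urandom(32)  # never leaves the secure processor

    def _tag(self, gpa, data):
        msg = gpa.to_bytes(8, "little") + data
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def swap_out(self, gpa, data):
        # The entry would live in a Metadata page the hypervisor cannot write.
        return MetadataEntry(gpa=gpa, tag=self._tag(gpa, data))

    def swap_in(self, entry, gpa, data):
        # Accept the page only if contents and location are unchanged.
        same_place = gpa == entry.gpa
        same_data = hmac.compare_digest(entry.tag, self._tag(gpa, data))
        return same_place and same_data


sp = ToySecureProcessor()
page = b"guest data".ljust(4096, b"\0")
meta = sp.swap_out(0x5000, page)

ok = sp.swap_in(meta, 0x5000, page)
tampered = sp.swap_in(meta, 0x5000, b"X" + page[1:])
moved = sp.swap_in(meta, 0x6000, page)
```

Binding the GPA into the tag is what lets the model reject a page that comes back unchanged but at a different guest address.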

VMPLs (Virtual Machine Privilege Levels)

VMPLs are an optional feature that allows a guest VM to divide its address space into four levels. VMPL0 is the highest privilege and VMPL3 is the least privileged.

When this feature is enabled, every vCPU of a VM is assigned a VMPL. When a vCPU instance is created, its VMPL is recorded in the newly created VMSA and stays fixed for the lifetime of the vCPU.

Each page has separate access rights for each VMPL, recorded in the RMP. Each RMP entry contains a set of permission masks, one mask for each implemented VMPL. When an access occurs, the VMPL of the accessing vCPU is used as an index to select the corresponding mask in the page's RMP entry, and that mask determines whether the access is allowed.

When a page is first validated by a guest, VMPL0 is granted full permissions to the page and all other VMPLs are granted no permissions. The guest can choose to modify any VMPL permissions for any page via the new RMPADJUST instruction.

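The RMPADJUST instruction allows a given VMPL to modify permissions for a less privileged VMPL. For example, software running on a VMPL0 vCPU can execute RMPADJUST to change the permissions that the less privileged levels (VMPL1 to VMPL3) have on any page.

VMPLs are orthogonal to the x86 privilege rings.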
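
The per-VMPL permission masks can be sketched like this. It is a minimal model assuming three generic permission bits (the real masks are more detailed), and `on_first_validate`, `rmpadjust`, and `access_allowed` are hypothetical helpers that only mirror the rules stated above.

```python
from dataclasses import dataclass, field

READ, WRITE, EXEC = 1, 2, 4
FULL = READ | WRITE | EXEC
NUM_VMPLS = 4


@dataclass
class RmpEntry:
    # One permission mask per VMPL; index 0 is VMPL0 (most privileged).
    perms: list = field(default_factory=lambda: [0] * NUM_VMPLS)


def on_first_validate(entry):
    # VMPL0 gets full access; every other level starts with none.
    entry.perms = [FULL] + [0] * (NUM_VMPLS - 1)


def rmpadjust(entry, caller_vmpl, target_vmpl, new_perms):
    # Only a numerically higher (less privileged) VMPL may be adjusted.
    if target_vmpl <= caller_vmpl:
        raise PermissionError("target VMPL is not less privileged than caller")
    entry.perms[target_vmpl] = new_perms


def access_allowed(entry, vcpu_vmpl, wanted):
    # The accessing vCPU's VMPL selects which mask applies.
    return (entry.perms[vcpu_vmpl] & wanted) == wanted


page = RmpEntry()
on_first_validate(page)
rmpadjust(page, caller_vmpl=0, target_vmpl=2, new_perms=READ)

try:
    rmpadjust(page, caller_vmpl=2, target_vmpl=1, new_perms=FULL)
    escalated = True
except PermissionError:
    escalated = False
```

In this model a VMPL2 vCPU can read the page but not write it, VMPL3 has no access at all, and VMPL2 cannot raise the permissions of VMPL1.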

CPUID filtering

SEV-SNP supports an optional capability to filter CPUID results through the AMD-SP. The AMD-SP will verify that the CPUID results that the hypervisor is reporting are no greater than the capabilities of the platform and that security sensitive information, such as the x86 Extended Save Area size, is correct.
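
The two checks named above (no capability beyond the platform, and correct security-sensitive sizes) can be expressed as a small predicate. This is a sketch only: `cpuid_report_ok` is a hypothetical function, the feature names are illustrative strings, and the real filtering operates on CPUID leaves rather than on sets.

```python
def cpuid_report_ok(reported_features, platform_features,
                    reported_xsave_size, actual_xsave_size):
    """Return True if the hypervisor's CPUID report passes both checks."""
    # The hypervisor may hide features, but must not invent them.
    no_extra = set(reported_features) <= set(platform_features)
    # Security-sensitive values must match the platform exactly.
    sizes_match = reported_xsave_size == actual_xsave_size
    return no_extra and sizes_match


platform_features = {"aes", "avx2", "sse4_2"}

honest = cpuid_report_ok({"aes", "sse4_2"}, platform_features, 832, 832)
inflated = cpuid_report_ok({"aes", "avx512f"}, platform_features, 832, 832)
wrong_size = cpuid_report_ok({"aes"}, platform_features, 512, 832)
```

Reporting fewer features than the platform has is accepted in this model, since that only restricts the guest; claiming extra features or a wrong save-area size is rejected.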

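Industry concerns about SEV security

Attack types

Integrity attacks

  • Corruption: the attacker directly overwrites a value in memory with a random value.
  • Replay attack: the attacker records an earlier value and later writes that recorded value back into memory.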
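
Both attacks work against memory that is encrypted but not integrity-protected, which a toy model makes concrete. The cipher below is an assumption made for illustration: an XOR stream derived with SHAKE-256 from a key and page number, not the AES-based memory encryption SEV actually uses. `toy_crypt` and the `dram` dictionary are hypothetical.

```python
import hashlib


def toy_crypt(key, page_no, data):
    """XOR with a key- and address-derived stream (encrypts and decrypts)."""
    seed = key + page_no.to_bytes(8, "little")
    stream = hashlib.shake_256(seed).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


key = b"per-VM key known only to hardware"
dram = {}  # physical page number -> ciphertext; writable by the attacker

dram[7] = toy_crypt(key, 7, b"balance=100")   # guest writes a value
snapshot = dram[7]                            # attacker records the ciphertext
dram[7] = toy_crypt(key, 7, b"balance=000")   # guest updates the value

dram[7] = snapshot                            # replay: restore the old ciphertext
replayed = toy_crypt(key, 7, dram[7])         # guest silently reads a stale value

dram[7] = bytes(len(snapshot))                # corruption: overwrite with junk
corrupted = toy_crypt(key, 7, dram[7])        # guest reads garbage
```

The attacker never learns the key or the plaintext in this model, yet the guest still ends up reading data it did not last write, which is the gap that integrity protection is meant to close.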