APIC
Every logical core has a LAPIC.
The APIC can be accessed through memory-mapped registers. APIC registers are memory-mapped to a 4-KByte region of the processor’s physical address space with an initial starting address of FEE00000H
. In MP system configurations, the APIC registers are initially mapped to the same 4-KByte region of the physical address space. Software has the option of changing initial mapping to a different 4-KByte region for all the local APICs or of mapping the APIC registers for each local APIC to its own 4-KByte region.
https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
IO APIC 可以一个系统里面有多个,但是 KVM 中只实现了一个,可以根据自己的需要进行添加。
x86 CPU 中一共有三种访问 APIC 的方式:
- 通过 CR8 来访问 TPR 寄存器(仅 IA32-e 即 64 位模式)
- 通过 MMIO 来访问 APIC 的寄存器,这是 xAPIC 模式提供的访问方式
- 通过 MSR 来访问 APIC 的寄存器,这是 x2APIC 模式提供的访问方式
LAPIC 给它所 attach 的那个处理器发中断,处理器不一定会照单全收,会有 Acceptance 的问题。如果处理器不收,会把这个中断退回 LAPIC 并 retire。
What's the difference between APIC, the XAPIC, and the X2APIC?
The primary difference between the APIC and xAPIC architectures is that with the xAPIC architecture, the local APICs and the I/O APIC communicate through the system bus (not the APIC bus anymore). 当然,除此之外也加了一些其他的 feature。The basic operating mode of the xAPIC is xAPIC mode.
The x2APIC architecture is an extension of the xAPIC architecture, primarily to increase processor addressability. The x2APIC architecture provides backward compatibility to the xAPIC architecture and forward extendability for future Intel platform innovations. These extensions and modifications are supported by a new mode of execution (x2APIC mode). On processors supporting x2APIC architecture, the local APIC supports operation both in xAPIC mode and in x2APIC mode. Uses MSR programming interface to access APIC registers in x2APIC mode instead of memory-mapped interfaces. Memory-mapped interface is supported when operating in xAPIC mode. In x2APIC mode, system software uses RDMSR
and WRMSR
to access the APIC registers.
The MSR address range 800H through 8FFH is architecturally reserved and dedicated for accessing APIC registers in x2APIC mode.
PRT, RTE
IO APIC 中存在着一个 PRT 表,每个 PRT 的表项成为 RTE,RTE 是每个引脚一个,IOAPIC 引脚收到中断消息后,根据 RTE 得到目标 LAPIC,并格式化出一个中断消息发送给 LAPIC。
LAPIC / Handling Local Interrupts
Architectures define 256 vector numbers, ranging from 0 through 255. Local and I/O APICs support 240 of these vectors (in the range of 16 to 255) as valid interrupts.
Presence of the Local APIC is indicated by CPUID.
Software interacts with the LAPIC by reading and writing its registers. 所有的 LAPIC 寄存器可以在
- SDM Figure 11-4. Local APIC Structure 或者
- Table 11-1. Local APIC Register Address Map
里看到。The local APIC registers listed in it are not MSRs. The only MSR associated with the programming of the local APIC is the IA32_APIC_BASE
.
The local APIC's registers are memory-mapped in physical page FEE00xxx
, This address is the same for each local APIC that exists in a configuration, meaning you are only able to directly access the registers of the local APIC of the core that your code is currently executing on. (From CPU point of view)
The APIC registers are memory mapped and can be read and written to using the MOV
instruction.
Local APICs can receive interrupts from the following sources:
- 与这个 LAPIC 直接通过引脚连接的设备
- 设备先发送到 I/O APIC,再由 I/O APIC 发送 interrupt 过来
- IPI:从一个 LAPIC 发到另一个 LAPIC
- APIC timer generated interrupts
- Performance monitoring counter interrupts / Thermal Sensor interrupts / APIC internal error interrupts
APIC registers are memory-mapped to a 4-KByte region(一个 page 的大小)of the processor’s physical address space with an initial starting address of FEE00000H.
In MP system configurations, the APIC registers for Intel 64 or IA-32 processors on the system bus are initially mapped to the same 4-KByte region of the physical address space. Software has the option of changing initial mapping to a different 4-KByte region for all the local APICs or of mapping the APIC registers for each local APIC to its own 4-KByte region.
The only MSR associated with the programming of the local APIC is the IA32_APIC_BASE
MSR. 这个 MSR 好像没有 mmap 到上面说的 4-KByte region 里面。
When to deliver the interrupt and when to accept it?
Local APIC deliver interrupts to processor core.
Processor core accepts the interrupt from the local APIC.
When a local interrupt is sent to the processor core, it is subject to the acceptance criteria specified in the interrupt acceptance flow chart in Figure 11-17. If the interrupt is accepted, it is logged into the IRR register and handled by the processor according to its priority. If the interrupt is not accepted, it is sent back to the local APIC and retried.
IRR, IMR, ISR in Local APIC
这三个本来是 PIC(8259A)^ 里面的概念。APIC 里有 IRR、ISR,但是没有 IMR。这两个寄存器是只读的。
长度都是 256bit,The 256 bits in these registers represent the 256 possible vectors
IRR bit set 表示一个 pending 的 interrupt 还没有 deliver 给 CPU。
When the local APIC accepts an interrupt, it sets the bit in the IRR that corresponds the vector of the accepted interrupt. When the processor core is ready to handle the next interrupt, the local APIC clears the highest priority IRR bit that is set and sets the corresponding ISR bit. The vector for the highest priority bit set in the ISR is then dispatched to the processor core for servicing.
While the processor is servicing the highest priority interrupt, the local APIC can send additional fixed interrupts by setting bits in the IRR. When the interrupt service routine issues a write to the EOI register, the local APIC responds by clearing the highest priority ISR bit that is set. It then repeats the process of clearing the highest priority bit in the IRR and setting the corresponding bit in the ISR. The processor core then begins executing the service routing for the highest priority bit set in the ISR. 一个示例:
- 中断到达 APIC,APIC 置上 IRR;
- 对于最高优先级的 bit,APIC clear 其 IRR,set 其 ISR,CPU 的 routine 开始处理 ISR 里对应 bit 的中断;
- CPU 发送 EOI,表示处理完成;处理的这段时间 APIC 可以接受新的中断,设置新的 IRR bits;
- APIC clear 其 ISR bits,然后从其 IRR 那里 poll 新的 bits 让 CPU 处理。
如果对于一个 vector 有多个 interrupt 同时产生怎么办? 我们的 IRR 对于每一个 vector 就一个 bit,那这样岂不是中断会漏掉?The local APIC can set the bit for the vector both in the IRR and the ISR. This means that the IRR and ISR can queue two interrupts for each interrupt vector: one in the IRR and one in the ISR. Any additional interrupts issued for the same interrupt vector are collapsed into the single bit in the IRR.
中断抢占/中断嵌套:If the local APIC receives an interrupt with an interrupt-priority class higher than that of the interrupt currently in service, and interrupts are enabled in the processor core, the local APIC dispatches the higher priority interrupt to the processor immediately (without waiting for a write to the EOI register). The currently executing interrupt handler is then interrupted so the higher-priority interrupt can be handled. When the handling of the higher-priority interrupt has been completed, the servicing of the interrupted interrupt is resumed.
Trigger Mode Register / TMR
The trigger mode register (TMR) indicates the trigger mode of the interrupt. Upon acceptance of an interrupt into the IRR, the corresponding TMR bit is cleared for edge-triggered interrupts and set for level-triggered interrupts. If a TMR bit is set when an EOI cycle for its corresponding interrupt vector is generated, an EOI message is sent to all I/O APICs.
EOI / End-of-Interrupt Register
The interrupt handler software in OS must include a write to the EOI register. This write must occur at the end of the handler routine. This action indicates that the servicing of the current interrupt is complete and the local APIC can issue the next interrupt from the ISR.
Upon receiving an EOI, the APIC clears the highest priority bit in the ISR and dispatches the next highest priority interrupt to the processor.
软件直接写 IOAPIC 的 EOI 而不是 LAPIC 的。System software may prefer to direct EOIs to specific I/O APICs rather than having the local APIC send end-ofinterrupt messages to all I/O APICs.
LVT (Local Vector Table) registers
Local APICs can receive interrupts from the following sources, several of them are local interrupt sources.
The local vector table (LVT) allows software to specify the manner the local interrupts are delivered to the processor core.
LVT 不是一个寄存器,它是包含了下面所有的寄存器:
- LVT CMCI Register
- LVT Timer Register
- LVT Thermal Monitor Register
- LVT Performance Counter Register
- LVT LINT0 Register
- LVT LINT1 Register
- LVT Error Register
请把这些 register 和 ICR 区分开,这些是提前配置好之后按照一定条件触发的,而 ICR 是用户主动去写来发送 IPI 的,本质上不一样。
这些寄存器指明了在特定的情况下如何进行 interrupt delivery,比如 LVT Timer Register 就描述了当一个 APIC timer 中断发生时应该如何对其进行 deliver,这些都是软件可以进行配置的。这些寄存器每一位代表的格式都是类似的,可以参考 Figure 11-8. Local Vector Table (LVT) 来看这些寄存器每一位的意思。
Interrupt Vector (bit0-7)
See Interrupt, Task, and Processor Priority^.
Thus, each interrupt vector comprises two parts, with the high 4 bits indicating its interrupt-priority class and the low 4 bits indicating its ranking within the interrupt-priority class.
LVT Delivery Mode (bit8-bit10, 3bit)
指明了发送给处理器的中断的类型。除了第一种类型(Fixed),其他情况 vector field 的信息 (bit0-7) 都会被忽略。
- 000 (Fixed): Delivers the interrupt specified in the vector field (bit0-7).
- 010 (SMI): Delivers an SMI interrupt to the core, the vector field (bit0-7) should be set to 00H.
- 100 (NMI): Delivers an NMI interrupt to the processor. The vector information (bit0-7) is ignored.
- 101 (INIT): Delivers an INIT request to the core, the vector field (bit0-7) should be set to 00H.
- …
LVT Delivery Status (Read Only) (bit 12)
Indicates the interrupt delivery status:
- 0 (Idle): 表示中断源的中断已经 deliver 了并且被 CPU accepted 了。
- 1 (Send Pending): 表示 deliver 了,但是还没有 accepted。
LVT Mast Bit (bit 16)
表示是否接收这个 source 的 interrupt。比如当 local APIC 处理 performance-monitoring counters interrupt 的时候,它会自动置上 LVT performance counter register 的 mask bit。
Error handling in local interrupts / ESR
The local APIC records errors detected during interrupt handling in the error status register (ESR).
The ESR is a write/read register. Before attempt to read from the ESR, software should first write to it. This write clears any previously logged errors and updates the ESR with any errors detected since the last write to the ESR. This write also rearms the APIC error interrupt triggering mechanism.
不要把它和 LVT Error Register 搞混了。
Enabling or Disabling the Local APIC
Chapter 10 Advanced Programmable Interrupt Controller (APIC)
10.4 Local APIC
SDM 10.4.3 Enabling or Disabling the Local APIC
APIC Timer
APIC timer 包含四个 register,这些 register 也是在 local APIC 里面的:
- Divide configuration register
- Initial-count register(这个寄存器是可写的)
- Current-count register(这个寄存器是只读的)
- LVT timer register(这个寄存器只有 3 个 bit 是 meaningful 的,其他 bit 都是 reserved 的,这三个 bit 来决定要除的值从 1 到 $2^7$)。
APIC timer modes / one-shot mode / periodic mode / TSC-deadline mode
APIC timer 有三种触发模式。这三种触发模式都是通过配置 LVT timer register 来设置的。
- One-shot 模式:initial-count register 被设置时开始计时,同时这个值会被拷贝到 current-count register 中,计时过程中 current-count register 的值会减小,变为 0 时计时结束并产生一个 timer interrupt,当被再次设置时重新开始计时;
- Periodic 模式:和 One-shot 一样,只不过在变为 0 后产生一个中断,同时 current-count register 会被自动置为 initial-count register 的值并且重新开始计时;
-
TSC-deadline 模式:允许在一个绝对的时间产生中断,写
IA32_TSC_DEADLINE
来决定在什么时候产生中断。当 TSC 的值大于此 MSR 的值时,产生中断。与 One-shot 类似,中断仅产生一次。
A write of 0 to the initial-count register effectively stops the local APIC timer, in both one-shot and periodic mode.
IA32_TSC_DEADLINE
See SDM: 10.5.4.1 TSC-Deadline Mode
Per-logical processor MSR that specifies the time at which a timer interrupt should occur. Writing a value into it arms the timer. An interrupt is generated when the logical processor’s tsc equals or exceeds the target value in it.
When the interrupt is generated, it disarms itself and clears it. Thus, each write to the it generates at most one timer interrupt. In TSC-deadline mode, writing 0 to it disarms the local-APIC timer. Transitioning between TSC-deadline mode and other timer modes also disarms the timer.
APIC timer rate/frequency
APIC timer 和 TSC 一样,有一个类似 invtsc 的东西:
- If CPUID.06H:EAX.ARAT[bit 2] = 1, the processor’s APIC timer runs at a constant rate regardless of P-state transitions and it continues to run at the same rate in deep C-states.
- If CPUID.06H:EAX.ARAT[bit 2] = 0 or if CPUID 06H is not supported, the APIC timer may temporarily stop while the processor is in deep C-states or during transitions caused by Enhanced Intel SpeedStep® Technology.
APIC timer frequency: The APIC timer frequency will be the processor’s bus clock or core crystal clock frequency (when TSC/core crystal clock ratio is enumerated in CPUID leaf 0x15) divided by the value specified in the divide configuration register.
Sending IPI using local APIC / Interrupt Command Register (ICR)
The primary local APIC facility for issuing IPIs is the interrupt command register (ICR).
SDM 10.6.1 Interrupt Command Register (ICR)
10.6 ISSUING INTERPROCESSOR INTERRUPTS
CHAPTER 10 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
it is a 64-bit local APIC register.
To send an IPI, software must set up the ICR to indicate the type of IPI message to be sent and the destination processors(说明可以一次给多个 processor 发送 IPI)。
Fields' R/W permission of ICR
All fields of the ICR are read-write by software except the delivery status field, which is read-only.
How to send the IPI?
Writing to the low doubleword of the ICR causes the IPI to be sent.
ICR Delivery Mode / IPI Message Type
可以用来发送 IPI。和其他 LVT^ 的 delivery mode 一样,也是这几种模式。
Specifies the type of IPI to be sent. This field is also know as the IPI message type field.
- Fixed: Delivers the interrupt specified in the vector field to the target processor or processors.
- Lowest Priority: Same as fixed mode, except that the interrupt is delivered to the processor executing at the lowest priority among the set of processors specified in the destination field. 也就是说只给 CPL 最低的 processor 发?
- SMI: …
- NMI: Delivers an NMI interrupt to the target processor or processors.
- INIT:
- INIT Level De-assert … …
ICR Destination Mode
2 modes: physical (0) or logical (1).
ICR Trigger Mode
只有当 Delivery Mode 是 INIT Level De-assert 才有效(注意不是 INIT)。
edge (0) or level (1).
It is ignored for all other delivery modes.
LDR (Local destination register) / DFR (Destination format register) / APR (Arbitration Priority Register)
这些都是发送 IPI 相关的寄存器。
Interrupt vector priority
一言以蔽之,谁的号越大,谁的优先级就越高。
SDM: 11.8.3
Each interrupt delivered to the processor through the local APIC has a priority based on its vector number. The local APIC uses this priority to determine when to service the interrupt relative to the other activities of the processor, including the servicing of other interrupts.
Each interrupt vector is an 8-bit value. The interrupt-priority class is the value of bits 7:4 of the interrupt vector. The lowest interrupt-priority class is 1 and the highest is 15. software should configure interrupt vectors to use interrupt-priority classes in the range 2–15.
The relative priority of interrupts within an interrupt-priority class is determined by the value of bits 3:0 of the vector number.
Thus, each interrupt vector comprises two parts, with the high 4 bits indicating its interrupt-priority class and the low 4 bits indicating its ranking within the interrupt-priority class.
Task priorities / TPR (Task-Priority Register)
也是包含在 APIC 的 4K 页面里众多 register 中的一个。
The task-priority class is the value of bits 7:4 (16 classes) of the task-priority register (TPR), which can be written by software (TPR is a read/write register);
The task priority allows software to set a priority threshold for interrupting the processor. This mechanism enables the OS to temporarily block low priority interrupts from disturbing high-priority work that the processor is doing.
The ability to block such interrupts using task priority results from the way that the TPR controls the value of the processor-priority register (PPR)。这句话表示,TPR 会影响 PPR 的值,PPR 才会影响最后 block 哪些 interrupt,所以 TPR 相当于是间接影响的。
这里也谈到 TPR 就是间接影响的。
比如在处理一个中断的时候,设置 TPR 为 8,这样小于等于 8 的中断就没有办法来抢占我们现在这个中断的处理函数了。
Processor priorities / PPR (Processor-Priority Register)
也是包含在 APIC 的 4K 页面里众多 register 中的一个。
The PPR is a read-only register. The processor-priority class represents the current priority at which the processor is executing.
The processor-priority class is a value in the range 0–15 that is maintained in bits 7:4 of the PPR.
The value of the PPR is based on the value of TPR and the value ISRV; ISRV is the vector number of the highest priority bit that is set in the ISR or 00H if no bit is set in the ISR. 其实就是取最大值。
PPR block 说明了一个中断能否被 preempt 取决于两个值:
- 如果新的 interrupt 的优先级低于正在处理的 interrupt,那么不能 preempt
- 如果新的 interrupt 优先级更高,但是仍然低于 TPR,那么也不能 preempt。
这里的 preempt 指的是中断是否会 deliver 给处理器。
Why?
The processor-priority class determines the priority threshold for interrupting the processor. The processor will deliver only those interrupts that have an interrupt-priority class higher than the processor-priority class in the PPR:
- If the processor-priority class is 0, the PPR does not inhibit the delivery any interrupt;
- if it is 15, the processor inhibits the delivery of all interrupts. (The processor-priority mechanism does not affect the delivery of interrupts with the NMI, SMI, INIT, ExtINT, INIT-deassert, and start-up delivery modes.)
The processor does not use the processor-priority sub-class to determine which interrupts to deliver and which to inhibit. (The processor uses the processor-priority sub-class only to satisfy reads of the PPR.)
Spurious Interrupt / 假中断 / spurious-interrupt vector register / SVR
只需知道有的操作组合会导致一个假中断发生。
Dispensing the spurious-interrupt vector does not affect the ISR, so the handler for this vector should return without an EOI.
对于假中断,也有对应的中断向量号,并且这个号是可以被软件进行设置的。
In-kernel APIC and Userspace APIC
I/O APIC
The IO APIC is incorporated within the chipset.
There is typically one I/O APIC for each peripheral bus in the system.
Basically, PCI devices produce interrupts. The IOAPIC receives the interrupt signal and sends it to a destination determined by its settings.