For more, you can refer to Intel ISE: INTEL® AMX INSTRUCTION SET REFERENCE, A-Z

two components:

  • A set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image,
  • and an accelerator able to operate on tiles, the first implementation is called TMUL (tile matrix multiply unit).

TILECFG: The palette value (palette_id) and metadata are held internally in a tile related control register. TILECFG is programmed using the LDTILECFG instruction.

TILECFG is programmed using the LDTILECFG instruction. The selected palette defines the available storage and general configuration while the rest of the memory data specifies the number of rows and column bytes for each tile.

TILECFG

It is a register.

Is each tile has a TILECFG register?

Exiting a tile region is done with the TILERELEASE instruction. It takes no parameters and invalidates all tiles (indicating that the data no longer needs any saving or restoring). Essentially, it is an optimization of LDTILECFG with an implicit palette of 0.

AMX Palette

我的理解是,有几个固定的值可以让我们来选?

What's the relationship between TILE, TILECFG and palette?

AMX Instructions

Advanced Matrix Extension - x86 - WikiChip

AMX introduces 12 new instructions:

Configuration:

  • LDTILECFG - Load tile configuration, loads the tile configuration from the 64-byte memory location specified.
  • STTILECFG - Store tile configuration, stores the tile configuration in the 64-byte memory location specified.

Data:

  • TILELOADD/TILELOADDT1 - Load tile
  • TILESTORED - Store tile
  • TILERELEASE - Release tile, returns TILECFG and TILEDATA to the INIT state
  • TILEZERO - Zero tile, zeroes the destination tile

Operation:

  • TDPBF16PS - Perform a dot-product of BF16 tiles and accumulate the result. Packed Single Accumulation.
  • TDPB[XX]D - Perform a dot-product of Int8 tiles and accumulate the result. Dword Accumulation.
    • Where XX can be: SU = Signed/Unsigned, US = Unsigned/Signed, SS = Signed/Signed, and UU = Unsigned/Unsigned pairs.

Examples:

TDPBUSD tmm1, tmm2, tmm3: Matrix multiply unsigned byte elements from tmm2 by signed byte elements from tmm3 and accumulate the dword elements in tmm1.

AMX related to XSAVE

Intel AMX is XSAVE supported, meaning that it defines processor registers that can be saved and restored using instructions of the XSAVE feature set. Intel AMX is also XSAVE enabled, meaning that it must be enabled by the XSAVE feature set before it can be used.

Intel AMX is associated with two state components, XTILECFG and XTILEDATA.

Intel AMX defines bits 18:17 for its state components:

  • State component 17 is used for the 64-byte TILECFG register (XTILECFG state).
  • State component 18 is used for the 8192 bytes of tile data (XTILEDATA state).