Matmul Micro-kernels F32/F16 <- (QSI8D32) LHS x (QAI4C32) RHS (!311)

Anitha Raj requested to merge int4_asym into main Feb 21, 2025

Micro-kernels that multiply a dynamically quantized symmetric signed 8-bit integer LHS matrix with per-block quantization (QSI8D32) by a quantized asymmetric signed 4-bit integer RHS matrix with per-block quantization (QAI4C32), and accumulate the result into a single-precision (F32) or half-precision (F16) output:

Matrix multiplication (MxN) Micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F32 output, optimized for FEAT_I8MM.
Matrix multiplication (1xN) Micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F32 output, optimized for FEAT_DotProd.
Matrix multiplication (MxN) Micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F16 output, optimized for FEAT_I8MM.
Matrix multiplication (1xN) Micro-kernels of QSI8D32 LHS and QAI4C32 RHS with F16 output, optimized for FEAT_DotProd.
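For readers unfamiliar with the quantization scheme, the arithmetic these micro-kernels perform can be sketched as a plain reference computation. This is a minimal illustration, not the code in this merge request: the function names, the block length of 32, the signed range [-8, 7] for the 4-bit values, the zero-point convention, and the choice to widen each RHS block's range to include zero are all assumptions made for the example. It does not reproduce the KleidiAI packing layout or API.

```python
BLOCK = 32  # assumed block length along K (the "32" in QSI8D32 / QAI4C32)


def quantize_lhs_block(xs):
    """Symmetric int8 (hypothetical helper): x ~= scale * q, q in [-127, 127]."""
    amax = max(abs(x) for x in xs)
    scale = amax / 127.0 if amax > 0 else 1.0
    return [round(x / scale) for x in xs], scale


def quantize_rhs_block(ws):
    """Asymmetric int4 (hypothetical helper): w ~= scale * (q - zero_point)."""
    # Assumption: widen the range so that 0.0 is always representable.
    lo, hi = min(min(ws), 0.0), max(max(ws), 0.0)
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    zero_point = round(-8 - lo / scale)  # maps lo to -8
    q = [max(-8, min(7, round(w / scale) + zero_point)) for w in ws]
    return q, scale, zero_point


def matmul_ref(lhs, rhs):
    """lhs: M x K floats, rhs: K x N floats, K a multiple of BLOCK."""
    m, k, n = len(lhs), len(rhs), len(rhs[0])
    assert k % BLOCK == 0 and all(len(row) == k for row in lhs)
    out = [[0.0] * n for _ in range(m)]
    for b in range(0, k, BLOCK):
        # Quantize one K-block of every LHS row and every RHS column.
        lq = [quantize_lhs_block(row[b:b + BLOCK]) for row in lhs]
        rq = [quantize_rhs_block([rhs[kk][j] for kk in range(b, b + BLOCK)])
              for j in range(n)]
        for i, (ql, sl) in enumerate(lq):
            sum_l = sum(ql)
            for j, (qr, sr, zp) in enumerate(rq):
                # Integer dot product, then the zero-point correction:
                # sum(ql * (qr - zp)) = sum(ql * qr) - zp * sum(ql)
                acc = sum(a * c for a, c in zip(ql, qr)) - zp * sum_l
                # Apply both block scales and accumulate in floating point.
                out[i][j] += sl * sr * acc
    return out
```

The integer dot product in the inner loop is the part that hardware features such as FEAT_I8MM and FEAT_DotProd accelerate; the zero-point correction and the per-block scaling are applied to the integer accumulator afterwards.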

Signed-off-by: Anitha Raj <anitha.raj@arm.com>
