Fix Int4 per-channel SME GEMM kernel failing with n > 64
In kai_matmul_clamp_f32_qai8dxp1vlx8_qsi4cxp4vlx8_1vlx4vl_sme2_mopa:
- Fix the offset calculation
- Fix pointer increments in the matmul
Add new shapes to the unit tests to cover n > 64
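For reviewers, a rough sketch of why shapes with n <= 64 could hide this class of bug. It assumes the kernel walks the packed RHS in blocks of n_step columns, and that n_step is 64 on the tested hardware. The helper names, the block layout, and the 12 bytes of per-column metadata are hypothetical and are not taken from the KleidiAI sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not a KleidiAI function: round a up to a multiple of b.
constexpr std::size_t round_up(std::size_t a, std::size_t b) {
    return (a + b - 1) / b * b;
}

// Assumed layout: one packed RHS block holds n_step columns; each column
// stores k int4 values (two per byte) plus extra_bytes of metadata.
constexpr std::size_t packed_block_stride(std::size_t k, std::size_t n_step,
                                          std::size_t extra_bytes) {
    return n_step * (round_up(k, 2) / 2 + extra_bytes);
}

// Byte offset of the block that starts at column n_idx
// (n_idx is expected to be a multiple of n_step).
constexpr std::size_t packed_rhs_offset(std::size_t n_idx, std::size_t k,
                                        std::size_t n_step,
                                        std::size_t extra_bytes) {
    return (n_idx / n_step) * packed_block_stride(k, n_step, extra_bytes);
}

// With n <= n_step only block 0 is ever addressed, so the offset is 0
// whatever the stride is, and a wrong stride goes unnoticed.
static_assert(packed_rhs_offset(0, 32, 64, 12) == 0, "first block");
// With n > n_step the second block depends on the stride being right.
static_assert(packed_rhs_offset(64, 32, 64, 12) == 1792, "second block");
```

Under these assumptions, any test shape with n <= 64 only exercises offset 0, which is why shapes that span more than one block are needed to catch offset and pointer-increment errors.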
Resolves: #KLEIDIAI-405, #COMPMID-7918
Signed-off-by: Anitha Raj <anitha.raj@arm.com>