Enable and, add, sub, mul, absdiff in OpenCV HAL

Experiments have shown that, although there is little scope for optimizing such simple operations, the KleidiCV implementations are marginally faster than OpenCV's own.
Don't enable the 32-bit operations in HAL: OpenCV requires that these do not saturate, whereas the KleidiCV implementations do saturate.
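
The 32-bit mismatch can be sketched as follows for add. This is only an illustration: add_wrapping and add_saturating are hypothetical helper names, not KleidiCV or OpenCV functions, and it assumes that "do not saturate" means the result wraps around on overflow.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; not part of KleidiCV or OpenCV.

// Non-saturating add: on overflow the result wraps around.
// (Assumption: this is the behaviour expected for 32-bit integers.)
inline std::int32_t add_wrapping(std::int32_t a, std::int32_t b) {
    // Add as unsigned to avoid signed-overflow undefined behaviour.
    // The conversion back is modular in C++20 and on mainstream compilers.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
}

// Saturating add: on overflow the result is clamped to the int32 range.
inline std::int32_t add_saturating(std::int32_t a, std::int32_t b) {
    // Widen to 64 bits so the exact sum is always representable.
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    if (wide < lo) return static_cast<std::int32_t>(lo);
    if (wide > hi) return static_cast<std::int32_t>(hi);
    return static_cast<std::int32_t>(wide);
}
```

The two helpers agree whenever the sum fits in 32 bits and differ only on overflow, which is why a saturating implementation cannot stand in for the 32-bit case.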