
Adjust Separable Filter 2D for performance

Igor Podgainoi requested to merge fix-sepfilter2d-perf into main

The following optimizations were carried out:

  • Using kernel vectors instead of immediate values in vector intrinsics
  • Using combined widen/multiply and widen/multiply-accumulate intrinsics (now possible due to the previous change)
  • Preferring the "high" versions of these intrinsics (for NEON)
  • Using a wider intermediate type (uint16_t), which avoids extra narrowing in between the vertical and horizontal code paths
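The arithmetic behind these changes can be modelled in a few lines. The sketch below is a scalar Python model, not the MR's code: plain lists stand in for 16-lane u8 registers. The helpers `dup`, `mull` and `mlal` are hypothetical names that mirror the widening Arm intrinsics `vmull_u8`/`vmlal_u8` and their AArch64 `_high_` variants. Which intrinsics the MR actually uses is an assumption, since the description does not name them. Also assumed: a 3-tap (1, 2, 1) kernel in both directions, a rounding shift of 4, and output for the valid region only (no border handling). Broadcasting each coefficient into a kernel vector makes the combined widen/multiply(-accumulate) form usable. The "high" variants read the upper eight lanes of a 128-bit register directly, with no separate extraction of the upper half. The u16 intermediate is fed to the horizontal pass without being narrowed back to u8.

```python
"""Scalar model of a u8 separable 3-tap filter with a uint16 intermediate.

Illustrative only: lists stand in for 16-lane NEON registers, and the
helper names mirror (but are not) the Arm intrinsics.
"""

LANES = 16    # one 128-bit register holds 16 x u8 lanes
U16 = 0xFFFF  # u16 lanes wrap modulo 2**16


def dup(value, lanes):
    # Broadcast a kernel coefficient into a "kernel vector" (cf. vdupq_n_u8).
    return [value] * lanes


def mull(a, b):
    # Widening multiply u8 x u8 -> u16; 255 * 255 = 65025 always fits.
    return [x * y for x, y in zip(a, b)]


def mlal(acc, a, b):
    # Widening multiply-accumulate into u16 lanes, wrapping (not saturating).
    return [(s + x * y) & U16 for s, x, y in zip(acc, a, b)]


def vertical_pass(r0, r1, r2, kernel):
    """Combine three u8 rows into one u16 row, low and high halves separately."""
    k = [dup(c, LANES) for c in kernel]
    out = []
    # The second (upper) half corresponds to the *_high_* intrinsics.
    for half in (slice(0, LANES // 2), slice(LANES // 2, LANES)):
        acc = mull(r0[half], k[0][half])
        acc = mlal(acc, r1[half], k[1][half])
        acc = mlal(acc, r2[half], k[2][half])
        out.extend(acc)
    return out


def horizontal_pass(row_u16, kernel, shift):
    """Filter the u16 intermediate directly, with no narrowing to u8 before it."""
    rounding = 1 << (shift - 1)
    out = []
    for i in range(len(row_u16) - len(kernel) + 1):
        s = sum(c * row_u16[i + j] for j, c in enumerate(kernel))
        out.append(min((s + rounding) >> shift, 255))  # rounding, saturating narrow
    return out


rows = [[(37 * r + 11 * c) % 256 for c in range(LANES)] for r in range(3)]
intermediate = vertical_pass(*rows, kernel=(1, 2, 1))
filtered = horizontal_pass(intermediate, kernel=(1, 2, 1), shift=4)
```

With the (1, 2, 1) kernel, the vertical sums stay at or below 255 × 4 = 1020. In this model the horizontal sums stay at or below 1020 × 4 = 4080, so uint16_t has ample headroom. For a different kernel, the same bound (255 × Σk_vertical for the intermediate) shows whether a u16 intermediate is safe.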
Edited by Igor Podgainoi
