Fix allocation size in SeparableFilterWorkspace
Some extra space is allocated to account for SVE interleaving stores, but the amount of space needed depends on the element size of the intermediate buffer. With this change, 3 extra elements are allocated instead of just 3 extra bytes. (This applies to single-channel input; for multi-channel input, more data was allocated.)
So far this was not a problem: in the worst case a 32-bit intermediate type is used with svst4, which needs 12 bytes of extra space. However, the allocation size is also extended by kAlignment-1, which equals 15, so in total 18 extra bytes were allocated for single-channel input.
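
The arithmetic above can be checked with a small compile-time sketch. This is not the actual SeparableFilterWorkspace code: the helper names (kExtraElements, old_extra_bytes, new_extra_bytes, required_extra_bytes) are hypothetical, and kAlignment = 16 is inferred from "kAlignment-1 equals 15".

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: inferred from the statement that kAlignment-1 equals 15.
constexpr std::size_t kAlignment = 16;
// Hypothetical name for the 3 extra slots mentioned in the message.
constexpr std::size_t kExtraElements = 3;

// Extra bytes needed for the interleaving store, by intermediate type.
template <typename T>
constexpr std::size_t required_extra_bytes() {
  return kExtraElements * sizeof(T);
}

// Before the change: 3 extra bytes plus the alignment extension.
constexpr std::size_t old_extra_bytes() {
  return kExtraElements + (kAlignment - 1);
}

// After the change: 3 extra elements plus the alignment extension.
template <typename T>
constexpr std::size_t new_extra_bytes() {
  return kExtraElements * sizeof(T) + (kAlignment - 1);
}

// Worst case from the message: 32-bit intermediate type, 12 bytes needed.
static_assert(required_extra_bytes<std::uint32_t>() == 12);
// Old single-channel total: 3 + 15 = 18 bytes, which covered the 12.
static_assert(old_extra_bytes() == 18);
static_assert(old_extra_bytes() >= required_extra_bytes<std::uint32_t>());
// New total no longer depends on the alignment extension to be sufficient.
static_assert(new_extra_bytes<std::uint32_t>() - (kAlignment - 1) ==
              required_extra_bytes<std::uint32_t>());
```

With the new formula the extra space is sized by the element type itself, so it stays sufficient even if the alignment extension changes.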