Speed up benchmarks
Much of the benchmark time was spent initializing buffers; this change reduces that time significantly. Additionally, to make benchmark results more consistent, the first iteration is now excluded from measurements.
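Excluding the first iteration is a standard warm-up step: the first run often pays one-off costs (cache misses, lazy initialization, page faults) that later runs do not. The sketch below shows the idea with a hypothetical `measure` helper. It is an illustration under that assumption, not the benchmark harness touched by this change.

```python
import time
from typing import Callable, List


def measure(fn: Callable[[], None], iterations: int) -> List[float]:
    """Run fn (iterations + 1) times and return per-iteration wall-clock
    times in seconds, discarding the first (warm-up) run."""
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    times: List[float] = []
    for i in range(iterations + 1):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i > 0:  # skip the warm-up iteration
            times.append(elapsed)
    return times


samples = measure(lambda: sum(range(10_000)), iterations=5)
print(len(samples))  # 5
```

The function is called one extra time, so the number of recorded samples still matches the requested iteration count.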
While refactoring, errors were found in many of the stride arguments; these have been fixed.