fix(sl): Fix alignment mask when userBufferOffset is true
The alignment mask built from TENSOR_BUFFER_FORCE_ALIGNMENT was being zero-extended, so aligning values wider than 32 bits zeroed out their high bits. This forced the reserved size to always be under 2 GB, which meant that large flatbuffers under-reserved space and then re-allocated dynamically while tensors were appended. That dramatically slowed down serialization. Graphs under 2 GB were unaffected, even when using buffer offsets.
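For reviewers, the failure mode can be reproduced in isolation. The following is a minimal sketch, not the library's actual code: kForceAlignment, align_up_narrow_mask and align_up_wide_mask are hypothetical names, and both the alignment value of 8 and the 32-bit unsigned type of the constant are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the alignment constant. The value 8 and the
// 32-bit unsigned type are assumptions for this sketch.
constexpr uint32_t kForceAlignment = 8u;

// Problematic pattern: ~(kForceAlignment - 1) is evaluated in 32 bits
// (0xFFFFFFF8) and then zero-extended to 64 bits for the AND, so bits
// 32..63 of the result are always cleared.
constexpr uint64_t align_up_narrow_mask(uint64_t size)
{
    return (size + kForceAlignment - 1) & ~(kForceAlignment - 1);
}

// Corrected pattern: widen the alignment to 64 bits before inverting, so
// the mask is 0xFFFFFFFFFFFFFFF8 and the high bits survive.
constexpr uint64_t align_up_wide_mask(uint64_t size)
{
    return (size + uint64_t{kForceAlignment} - 1) & ~(uint64_t{kForceAlignment} - 1);
}

// Small sizes behave identically with either mask.
static_assert(align_up_narrow_mask(13) == 16, "small sizes align correctly");
static_assert(align_up_wide_mask(13) == 16, "small sizes align correctly");

// A size of 5 GiB + 3 bytes loses its high bits with the narrow mask.
static_assert(align_up_narrow_mask((5ull << 30) + 3) == 0x40000008ull,
              "narrow mask truncates to 32 bits");
static_assert(align_up_wide_mask((5ull << 30) + 3) == 0x140000008ull,
              "wide mask keeps the full 64-bit size");
```

In this sketch, a 5 GiB + 3 byte request comes back as 1073741832 bytes with the narrow mask and as 5368709128 bytes with the widened mask, which illustrates how a truncated reservation leads to repeated growth during tensor appending.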
Signed-off-by: Ryan O'Shea <ryan.oshea3@arm.com>