Override asm-generic implementations. We basically gain on 2 fronts
* checks for alignment no longer needed as we are only doing "unit"
sized copies.
(Careful observer could argue that While the kernel buffers are aligned,
the user buffer in theory might not be - however in that case the
user space is already broken when it tries to deref a hword/word
straddling word boundary - so we are not making it any worse).
* __copy_{to,from}_user( ) returns bytes that couldn't be copied,
whereas get_user() returns 0 for success or -EFAULT (not size). Thus
the code to do leftover bytes calculation can be avoided as well.