Because the search and moving is combined in the first case and the
search is just extra work in the second case.
Comparing integers is cheap, compared to moving integers around. Account
for divisions, loop overhead, taken conditional jumps in every loop
iteration vs. non-taken cond jumps, etc ...
PS. Indeed, in the linear search version, the inner loop is typical to
leaq -1(%rcx), %rsi
movl 4(%rdi,%rsi,4), %eax
cmpl %eax, %r9d
movq %rcx, %r8
movq %rsi, %rcx
subl $1, %edx
movl %eax, 4(%rdi,%r8,4)
cmpl $-1, %edx
movq $-1, %rcx
jge .L3 is executed only once and one can
reasonably expect that branch to be predicted non-taken and not have a
detrimental effect on the pipeline.
As for the inner loop in the other version, I don't want to look at it
PS. BTW, the linear search has also somewhat better locality, whereas
the binary search jumps all over the place.