Which is faster?

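memcpy(b, a, sizeof(WORD) * n) or the following loop:

WORD *a;   /* source */
WORD *b;   /* destination */

for (int i = 0; i < n; i++)
    *b++ = *a++;   /* same direction as memcpy(b, a, ...): a is copied into b */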

It turns out that Intel microprocessors have a string-move instruction, rep movs[bwd] (repz is another name for the same prefix byte, but rep is the documented form for movs), which, given the source, destination and count in esi, edi and ecx respectively, copies the whole block with a single instruction. The suffixes b, w and d stand for byte, word and doubleword. How will the execution performance vary between the b, w and d forms, given that the count in ecx is scaled to match the element size? Given that the data bus is 32 bits wide (at least on IA-32 architectures), it should make sense to move doublewords whenever there are at least 4 bytes to copy, with any leftover bytes moved individually.
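To make the doubleword idea concrete, here is a minimal portable C sketch of the split that a rep movsd followed by a rep movsb would perform. It is only an illustration, not how any particular memcpy is implemented; the name copy_dwords_then_bytes is made up for this example, and the source and destination are assumed not to overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a library function): copies n bytes from src to
   dst, 4 bytes per step while at least 4 remain (the role rep movsd would
   play), then the 0 to 3 leftover bytes one at a time (the role of
   rep movsb). The two regions must not overlap. */
static void copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    size_t dwords = n / 4;   /* the count ecx would hold for rep movsd */
    size_t tail   = n % 4;   /* the count ecx would hold for rep movsb */

    while (dwords--) {
        uint32_t tmp;
        /* Fixed-size memcpy through a temporary sidesteps alignment and
           aliasing problems; compilers typically turn each call into a
           plain 32-bit load or store. */
        memcpy(&tmp, s, sizeof tmp);
        memcpy(d, &tmp, sizeof tmp);
        s += 4;
        d += 4;
    }
    while (tail--)
        *d++ = *s++;
}
```

For an 11-byte buffer this does two 4-byte moves followed by three single-byte moves, instead of eleven single-byte moves.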

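Update: Even setting the bus width aside, it looks to me like the bookkeeping overhead (advancing both pointers, then decrementing and testing the counter on every iteration) will be lower if the memory copy is implemented as a single instruction. I got this thought while reading this.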
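The only reliable way to settle the question on a given machine is to measure it. Below is a rough timing sketch, assuming the standard clock() function is precise enough for the buffer sizes used; loop_copy, lib_copy and time_copy are hypothetical names invented for this example. Bear in mind that an optimizing compiler may recognise the loop and replace it with a call to memcpy, in which case both timings measure the same code.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: element-by-element copy, like the loop in the post. */
static void loop_copy(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: the same copy through the library routine. */
static void lib_copy(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}

/* Runs `copy` `reps` times over n ints and returns the elapsed CPU time
   in seconds, as reported by clock(). */
static double time_copy(void (*copy)(int *, const int *, size_t),
                        int *dst, const int *src, size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        copy(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(loop_copy, b, a, n, reps) and time_copy(lib_copy, b, a, n, reps) with the same buffers gives two numbers to compare; repeating the runs and trying several values of n helps smooth out noise from caching and the scheduler.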

