Annotation of OpenXM_contrib/gmp/mpn/x86/copyi.asm, Revision 1.1.1.2
1.1 maekawa 1: dnl x86 mpn_copyi -- copy limb vector, incrementing.
2:
1.1.1.2 ! ohara 3: dnl Copyright 1999, 2000, 2001, 2002 Free Software Foundation, Inc.
1.1 maekawa 4: dnl
5: dnl This file is part of the GNU MP Library.
6: dnl
7: dnl The GNU MP Library is free software; you can redistribute it and/or
8: dnl modify it under the terms of the GNU Lesser General Public License as
9: dnl published by the Free Software Foundation; either version 2.1 of the
10: dnl License, or (at your option) any later version.
11: dnl
12: dnl The GNU MP Library is distributed in the hope that it will be useful,
13: dnl but WITHOUT ANY WARRANTY; without even the implied warranty of
14: dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15: dnl Lesser General Public License for more details.
16: dnl
17: dnl You should have received a copy of the GNU Lesser General Public
18: dnl License along with the GNU MP Library; see the file COPYING.LIB. If
19: dnl not, write to the Free Software Foundation, Inc., 59 Temple Place -
20: dnl Suite 330, Boston, MA 02111-1307, USA.
21:
22: include(`../config.m4')
23:
24:
1.1.1.2 ! ohara 25: C      cycles/limb  startup (approx)
! 26: C P5:     1.0         35
! 27: C P6:     0.75        45
! 28: C K6:     1.0         30
! 29: C K7:     1.3         65
! 30: C P4:     1.0        120
! 31: C
! 32: C (Startup time includes some function call overheads.)
! 33:
! 34:
1.1 maekawa 35: C void mpn_copyi (mp_ptr dst, mp_srcptr src, mp_size_t size);
36: C
37: C Copy the size limbs at src to dst, working from low to high addresses.
38: C
39: C The code here is very generic and can be expected to perform reasonably
40: C across the whole x86 family.
41: C
1.1.1.2 ! ohara 42: C P6 - An MMX based copy was tried, but was found to be slower than a rep
! 43: C movs in all cases. The fastest MMX found was 0.8 cycles/limb (when
! 44: C fully aligned). A rep movs seems to have a startup time of about 15
! 45: C cycles, but doing something special for small sizes could lead to a
! 46: C branch misprediction that would destroy any saving. For now a plain
! 47: C rep movs seems ok.
1.1 maekawa 48: C
1.1.1.2 ! ohara 49: C K62 - We used to have a big chunk of code doing an MMX copy at 0.56 c/l if
! 50: C aligned or a 1.0 rep movs if not. But that seemed excessive since
! 51: C it only got an advantage half the time, and even then only showed it
! 52: C above 50 limbs or so.
1.1 maekawa 53:
54: defframe(PARAM_SIZE,12)
55: defframe(PARAM_SRC, 8)
56: defframe(PARAM_DST, 4)
57: deflit(`FRAME',0)
58:
1.1.1.2 ! ohara 59: TEXT
1.1 maekawa 60: ALIGN(32)
61:
62: C eax saved esi
63: C ebx
64: C ecx counter
65: C edx saved edi
66: C esi src
67: C edi dst
68: C ebp
69:
70: PROLOGUE(mpn_copyi)
71:
72: movl PARAM_SIZE, %ecx
73: movl %esi, %eax
74:
75: movl PARAM_SRC, %esi
76: movl %edi, %edx
77:
78: movl PARAM_DST, %edi
79:
1.1.1.2 ! ohara 80: cld C better safe than sorry, see mpn/x86/README
1.1 maekawa 81:
82: rep
83: movsl
84:
85: movl %eax, %esi
86: movl %edx, %edi
87:
88: ret
89:
90: EPILOGUE()
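
For readers less familiar with rep movsl, the routine above behaves roughly like the C loop below. This is only a sketch for comparison, not part of the GMP sources: the names ref_limb_t, ref_size_t and ref_copyi are hypothetical stand-ins for mp_limb_t, mp_size_t and mpn_copyi, and a 32-bit limb is assumed to match the 4-byte movsl transfer.

```c
#include <assert.h>

/* Hypothetical stand-ins for GMP's limb and size types (assumption:
   32-bit limbs, matching the 4-byte units moved by movsl). */
typedef unsigned int ref_limb_t;
typedef long ref_size_t;

/* Copy size limbs from src to dst, lowest address first.  This mirrors
   "cld; rep movsl" with ecx = size, esi = src, edi = dst. */
static void ref_copyi(ref_limb_t *dst, const ref_limb_t *src, ref_size_t size)
{
    ref_size_t i;

    assert(size >= 0);          /* a zero size simply copies nothing */
    for (i = 0; i < size; i++)
        dst[i] = src[i];        /* ascending order, like the cleared direction flag */
}
```

Because the copy runs from low to high addresses, overlapping operands are handled correctly when dst lies at or below src, which is the usual reason to choose an incrementing copy over a decrementing one.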