Annotation of OpenXM_contrib/gmp/mpn/x86/p6/diveby3.asm, Revision 1.1.1.2
1.1 maekawa 1: dnl Intel P6 mpn_divexact_by3 -- mpn division by 3, expecting no remainder.
2:
1.1.1.2 ! ohara 3: dnl Copyright 2000, 2002 Free Software Foundation, Inc.
1.1 maekawa 4: dnl
5: dnl This file is part of the GNU MP Library.
6: dnl
7: dnl The GNU MP Library is free software; you can redistribute it and/or
8: dnl modify it under the terms of the GNU Lesser General Public License as
9: dnl published by the Free Software Foundation; either version 2.1 of the
10: dnl License, or (at your option) any later version.
11: dnl
12: dnl The GNU MP Library is distributed in the hope that it will be useful,
13: dnl but WITHOUT ANY WARRANTY; without even the implied warranty of
14: dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15: dnl Lesser General Public License for more details.
16: dnl
17: dnl You should have received a copy of the GNU Lesser General Public
18: dnl License along with the GNU MP Library; see the file COPYING.LIB. If
19: dnl not, write to the Free Software Foundation, Inc., 59 Temple Place -
20: dnl Suite 330, Boston, MA 02111-1307, USA.
21:
1.1.1.2 ! ohara 22: include(`../config.m4')
1.1 maekawa 23:
24:
1.1.1.2 ! ohara 25: C P6: 8.5 cycles/limb
! 26:
! 27:
! 28: C The P5 code runs well on P6, in fact better than anything else found so
! 29: C far. An imul is 4 cycles, meaning the two cmp/sbbl pairs on the dependent
! 30: C path are taking 4.5 cycles.
! 31: C
! 32: C The destination cache line prefetching is unnecessary on P6, but removing
! 33: C it is a 2 cycle slowdown (approx), so it must be inducing something good
! 34: C in the out of order execution.
1.1 maekawa 35:
36: MULFUNC_PROLOGUE(mpn_divexact_by3c)
37: include_mpn(`x86/pentium/diveby3.asm')