Annotation of OpenXM_contrib/gmp/mpn/sparc32/README, Revision 1.1.1.2
Copyright 1996, 2001 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.





This directory contains mpn functions for various SPARC chips.  Code that
runs only on version 8 SPARC implementations is in the v8 subdirectory.

RELEVANT OPTIMIZATION ISSUES

Load and Store timing

On most early SPARC implementations, the ST instruction takes multiple
cycles, while an STD takes just a single cycle more than an ST.  For the CPUs
in SPARCstation I and II, the times are 3 and 4 cycles, respectively.
Therefore, combining two ST instructions into an STD when possible is a
significant optimization.
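
The arithmetic behind this can be sketched in C.  The cycle counts are the
SPARCstation I and II figures quoted above; the function names are
hypothetical, and the model counts only the store instructions themselves.

```c
/* Store cycle counts quoted above for the SPARCstation I and II CPUs. */
enum { ST_CYCLES = 3, STD_CYCLES = 4 };

/* Cycles spent on stores when every word is written with its own ST. */
unsigned long st_only_cycles(unsigned long words)
{
    return words * ST_CYCLES;
}

/* Cycles spent on stores when adjacent words are paired into STD and a
   leftover odd word falls back to ST.  Assumes every pair can actually
   be combined, which the text above only claims "when possible". */
unsigned long std_paired_cycles(unsigned long words)
{
    return (words / 2) * STD_CYCLES + (words % 2) * ST_CYCLES;
}
```

For 8 words this model gives 24 cycles with ST only and 16 cycles with STD
pairs.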

Later SPARC implementations have single-cycle ST.

For SuperSPARC, we can perform just one memory instruction per cycle, even
though up to two integer instructions can be executed in its pipeline.  For
programs that perform so many memory operations that there are not enough
non-memory operations to issue in parallel with all memory operations, using
LDD and STD when possible helps.
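
As a rough illustration, the C sketch below counts memory instructions for a
loop that touches a fixed number of operand streams per limb.  The helper
name and the choice of three streams in the usage note (two source operands
and one destination, as in an addition loop) are assumptions made for
illustration, not taken from the code in this directory.  With one memory
instruction per cycle, the count is also a lower bound on the cycle count;
the model ignores alignment and everything else in the pipeline.

```c
/* Number of memory instructions needed to move `limbs` words on each of
   `streams` operand streams when one instruction transfers
   `words_per_insn` words (1 for LD/ST, 2 for LDD/STD).  An odd trailing
   word still costs one instruction, hence the rounding up.
   `words_per_insn` must be nonzero. */
unsigned long mem_insn_count(unsigned long limbs, unsigned long streams,
                             unsigned long words_per_insn)
{
    unsigned long per_stream = (limbs + words_per_insn - 1) / words_per_insn;
    return streams * per_stream;
}
```

For 100 limbs and three streams this gives 300 memory instructions with
LD/ST and 150 with LDD/STD.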

UltraSPARC-1/2 has very slow integer multiplication.  In the v9 subdirectory,
we therefore use floating-point multiplication.
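
One way to get an exact integer product out of floating-point multiplies is
sketched below in C.  It splits one operand into 16-bit halves so that each
partial product stays below 2^48 and is therefore exact in an IEEE double.
This is an illustration only: the function name is hypothetical, and how the
v9 code actually splits its operands is not described here.

```c
#include <stdint.h>

/* Exact 32x32 -> 64 bit product using only double-precision multiplies.
   Assumes `double` is IEEE 754 binary64 (53-bit significand). */
uint64_t mul32_via_double(uint32_t a, uint32_t b)
{
    double af = (double) a;                  /* a < 2^32, exact */
    double lo = af * (double) (b & 0xffff);  /* < 2^48, exact */
    double hi = af * (double) (b >> 16);     /* < 2^48, exact */

    /* Recombine the partial products with integer operations.  The sum
       equals a * b, which is below 2^64, so it cannot overflow. */
    return ((uint64_t) hi << 16) + (uint64_t) lo;
}
```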

STATUS

1. On a SuperSPARC, mpn_lshift and mpn_rshift run at 3 cycles/limb, or 2.5
   cycles/limb asymptotically.  We could optimize speed for special counts
   by using ADDXCC.

2. On a SuperSPARC, mpn_add_n and mpn_sub_n run at 2.5 cycles/limb, or 2
   cycles/limb asymptotically.

3. mpn_mul_1 runs at what is believed to be optimal speed.

4. On SuperSPARC, mpn_addmul_1 and mpn_submul_1 could both be improved by a
   cycle by avoiding one of the add instructions.  See a29k/addmul_1.

The speed of the code for other SPARC implementations is uncertain.
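
For readers unfamiliar with the functions in the list above, the per-limb
work of mpn_add_n and mpn_addmul_1 can be sketched in portable C.  These are
reference loops with hypothetical names (ref_add_n, ref_addmul_1), assuming
32-bit limbs; they are not the assembly code in this directory and make no
attempt to match its speed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumed 32-bit limb, as on sparc32 */

/* rp[0..n-1] = s1[0..n-1] + s2[0..n-1]; returns the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *s1, const limb_t *s2, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t) s1[i] + s2[i] + carry;
        rp[i] = (limb_t) sum;
        carry = (limb_t) (sum >> 32);
    }
    return carry;
}

/* rp[0..n-1] += s1[0..n-1] * v; returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *s1, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* At most (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so no overflow. */
        uint64_t t = (uint64_t) s1[i] * v + rp[i] + carry;
        rp[i] = (limb_t) t;
        carry = (limb_t) (t >> 32);
    }
    return carry;
}
```

Each iteration handles one limb, which is why the figures above are quoted
in cycles/limb.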