Annotation of OpenXM_contrib/gmp/mpn/pa64/README, Revision 1.1.1.2
Copyright 1999, 2001, 2002 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.

This directory contains mpn functions for 64-bit PA-RISC 2.0.

PIPELINE SUMMARY

The PA8x00 processors have an orthogonal 4-way out-of-order pipeline.  Each
cycle two ALU operations and two MEM operations can issue, but just one of the
MEM operations may be a store.  The two ALU operations can be almost any
combination of non-memory operations.  Unlike every other processor, integer
and fp operations are completely equal here; they both count as just ALU
operations.
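The issue rules above give a simple lower bound on the cycles a loop body
needs.  The following is a minimal sketch of that bound in C, not code from
this directory; the function name min_issue_cycles and its parameters are
made up for illustration.  It ignores latencies, loop overhead and the
pipeline hiccups, so real code can only be slower than what it reports.

```c
/* Round-up integer division. */
static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Hypothetical helper: lower bound on issue cycles for a block of
   instructions, using only the rules stated in the text:
     - at most 2 ALU operations per cycle,
     - at most 2 MEM operations per cycle,
     - at most 1 of the MEM operations may be a store.  */
unsigned min_issue_cycles(unsigned alu_ops, unsigned loads, unsigned stores)
{
    unsigned bound = ceil_div(alu_ops, 2);          /* ALU issue limit   */
    unsigned mem_bound = ceil_div(loads + stores, 2); /* MEM issue limit */

    if (mem_bound > bound)
        bound = mem_bound;
    if (stores > bound)                             /* one store/cycle   */
        bound = stores;
    return bound;
}
```

As an example, assume an addition loop unrolled to handle two limbs per
iteration with 4 loads, 2 stores and 2 ALU operations (an assumed
instruction mix, not a count taken from the actual code).  Then
min_issue_cycles(2, 4, 2) gives 3 cycles, or 1.5 cycles/limb, and the MEM
operations are the limiting resource.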

Unfortunately, some operations cause hiccups in the pipeline.  Combining
carry-consuming operations like ADD,DC with operations that do not set carry
like ADD,L causes long delays.  Skip operations also seem to cause hiccups.
If several ADD,DC are issued consecutively, or if a plain carry-generating ADD
feeds an ADD,DC, stalling does not occur.  We can effectively issue two ADD,DC
operations/cycle.
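For readers unfamiliar with carry chains, here is a portable C sketch of what
a sequence of one carry-generating ADD followed by ADD,DC operations computes
across adjacent limbs.  It is only a reference model: the names limb_t and
ref_add_n are invented for this example, and the assembly code keeps the
carry in the processor's carry bit rather than in a variable.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed 64-bit limb, as on PA-RISC 2.0 */

/* Hypothetical reference routine: rp[0..n-1] = up[] + vp[], least
   significant limb first.  Returns the carry out of the top limb.  */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t carry = 0;                  /* carry into the current limb */

    for (size_t i = 0; i < n; i++) {
        limb_t s  = up[i] + vp[i];     /* wraps modulo 2^64            */
        limb_t c1 = s < up[i];         /* carry from adding the limbs  */
        limb_t r  = s + carry;
        limb_t c2 = r < s;             /* carry from adding carry-in   */

        rp[i] = r;
        carry = c1 | c2;               /* at most one of them is set   */
    }
    return carry;
}
```

Every limb consumes the carry produced by the previous one, which is why the
loop maps onto consecutive ADD,DC operations and why mixing in operations
that do not set carry matters for the hardware.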

Latency scheduling is not as important as making sure to have a mix of ALU and
MEM operations, but for full pipeline utilization, it is still a good idea to
do some amount of latency scheduling.

As on all other processors, read-after-write (RAW) memory scheduling is
critically important.  Since integer multiplication takes place in the
floating-point unit, the GMP code needs to handle this problem frequently.
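To show why multiplication keeps running into this, the sketch below builds a
64x64-bit product from four 32x32-bit partial products in portable C.  This
assumes, as we understand the architecture, that the floating-point
multiplier produces a 64-bit product from two 32-bit operands and that values
move between the integer and floating-point register files through memory, so
each operand and each partial product involves a store followed by a load.
The type u128_t and the function umul_64x64 are names made up for this
example; the code is not the routine used in this directory.

```c
#include <stdint.h>

/* Hypothetical two-limb result type. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_t;

/* Full 64x64 -> 128 bit unsigned product from four 32x32 -> 64 bit
   products, the building block assumed to be available in the FPU.  */
u128_t umul_64x64(uint64_t u, uint64_t v)
{
    const uint64_t mask = 0xffffffffu;
    uint64_t ul = u & mask, uh = u >> 32;
    uint64_t vl = v & mask, vh = v >> 32;

    uint64_t p00 = ul * vl;            /* weight 2^0  */
    uint64_t p01 = ul * vh;            /* weight 2^32 */
    uint64_t p10 = uh * vl;            /* weight 2^32 */
    uint64_t p11 = uh * vh;            /* weight 2^64 */

    /* Sum of the three contributions to bits 32..63; it is below 2^34,
       so it cannot overflow a 64-bit variable.  */
    uint64_t mid = (p00 >> 32) + (p01 & mask) + (p10 & mask);

    u128_t r;
    r.lo = (mid << 32) | (p00 & mask);
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

In this model each of the four partial products is a value that would have to
travel back to the integer side before the additions can start, which is
where careful spacing of stores and dependent loads pays off.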

STATUS

* mpn_lshift and mpn_rshift run at 1.5 cycles/limb on PA8000 and at 1.0
  cycles/limb on PA8500.  With latency scheduling, the numbers could be
  improved to 1.0 cycles/limb for all PA8x00 chips.

* mpn_add_n and mpn_sub_n run at 2.0 cycles/limb on PA8000 and at about 1.9
  cycles/limb on PA8500.  With latency scheduling, this could be improved to
  1.5 cycles/limb.

* mpn_addmul_1 runs at 6.25 cycles/limb.  The current code uses ADD,DC for
  adjacent limbs, and relies heavily on reordering.

* Both mpn_mul_1 and mpn_submul_1 run at around 11 cycles/limb.  There is
  obviously room for improving these along the lines of mpn_addmul_1.
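
As a reminder of what the multiply routines in this list compute, here is a
short reference model of the mpn_addmul_1 operation in C: it multiplies an
n-limb operand by a single limb, adds the product into the destination, and
returns the limb carried out of the top.  The name ref_addmul_1 is invented
for this sketch, and the code assumes a compiler that provides the
unsigned __int128 extension (GCC and Clang on 64-bit targets do); it says
nothing about how the assembly version is scheduled.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed 64-bit limb */

/* Hypothetical reference routine: rp[0..n-1] += up[0..n-1] * v,
   least significant limb first.  Returns the carry-out limb.  */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* up[i]*v + rp[i] + carry is at most 2^128 - 1, so the sum
           always fits in 128 bits.  */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;

        rp[i] = (limb_t)t;             /* low limb of the sum      */
        carry = (limb_t)(t >> 64);     /* high limb feeds next one */
    }
    return carry;
}
```

The per-limb carry is the dependency that the assembly code resolves with
ADD,DC on adjacent limbs.  mpn_mul_1 has the same shape without adding in
the old destination limb, and mpn_submul_1 subtracts the product instead.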