Annotation of OpenXM_contrib/gmp/mpn/pa32/README, Revision 1.1
1.1 ! ohara 1: Copyright 1996, 1999, 2001 Free Software Foundation, Inc.
! 2:
! 3: This file is part of the GNU MP Library.
! 4:
! 5: The GNU MP Library is free software; you can redistribute it and/or modify
! 6: it under the terms of the GNU Lesser General Public License as published by
! 7: the Free Software Foundation; either version 2.1 of the License, or (at your
! 8: option) any later version.
! 9:
! 10: The GNU MP Library is distributed in the hope that it will be useful, but
! 11: WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
! 12: or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
! 13: License for more details.
! 14:
! 15: You should have received a copy of the GNU Lesser General Public License
! 16: along with the GNU MP Library; see the file COPYING.LIB. If not, write to
! 17: the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
! 18: 02111-1307, USA.
! 19:
! 20:
! 21:
! 22:
! 23:
! 24:
! 25: This directory contains mpn functions for various HP PA-RISC chips. Code
! 26: that runs faster on the PA7100 and later implementations is in the pa7100
! 27: directory.
! 28:
! 29: RELEVANT OPTIMIZATION ISSUES
! 30:
! 31: Load and Store timing
! 32:
! 33: On the PA7000, no memory instruction can issue in the two cycles after a
! 34: store. On the PA7100, this is reduced to one cycle.
! 35:
! 36: The PA7100 has a lockup-free cache, so it helps to schedule a load and the
! 37: instruction that depends on it as far apart as possible.
! 38:
! 39: STATUS
! 40:
! 41: 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
! 42: instructions below (but some software pipelining is needed to avoid the
! 43: xmpyu-fstds delay):
! 44:
! 45: fldds s1_ptr
! 46:
! 47: xmpyu
! 48: fstds N(%r30)
! 49: xmpyu
! 50: fstds N(%r30)
! 51:
! 52: ldws N(%r30)
! 53: ldws N(%r30)
! 54: ldws N(%r30)
! 55: ldws N(%r30)
! 56:
! 57: addc
! 58: stws res_ptr
! 59: addc
! 60: stws res_ptr
! 61:
! 62: addib Loop
! 63:
! 64: 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
! 65: (asymptotically) on the PA7100, using the instructions below. With proper
! 66: software pipelining and the unrolling level below, the speed becomes 8
! 67: cycles/limb.
! 68:
! 69: fldds s1_ptr
! 70: fldds s1_ptr
! 71:
! 72: xmpyu
! 73: fstds N(%r30)
! 74: xmpyu
! 75: fstds N(%r30)
! 76: xmpyu
! 77: fstds N(%r30)
! 78: xmpyu
! 79: fstds N(%r30)
! 80:
! 81: ldws N(%r30)
! 82: ldws N(%r30)
! 83: ldws N(%r30)
! 84: ldws N(%r30)
! 85: ldws N(%r30)
! 86: ldws N(%r30)
! 87: ldws N(%r30)
! 88: ldws N(%r30)
! 89: addc
! 90: addc
! 91: addc
! 92: addc
! 93: addc %r0,%r0,cy-limb
! 94:
! 95: ldws res_ptr
! 96: ldws res_ptr
! 97: ldws res_ptr
! 98: ldws res_ptr
! 99: add
! 100: stws res_ptr
! 101: addc
! 102: stws res_ptr
! 103: addc
! 104: stws res_ptr
! 105: addc
! 106: stws res_ptr
! 107:
! 108: addib
! 109:
! 110: 3. For the PA8000 we have to stick to 32-bit limbs until compiler support
! 111: emerges. But we want to use 64-bit operations whenever possible, in
! 112: particular for loads and stores. It is possible to handle mpn_add_n
! 113: efficiently by rotating (when s1/s2 are aligned) or by masking and bit
! 114: field inserting (when they are not). The speed should double compared to
! 115: the code used today.
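The idea in item 3 of doing 64-bit work on 32-bit limbs can be sketched in
portable C. This is only an illustration of the arithmetic, not the PA8000
code: the names limb_t and add_n_pairs are hypothetical, and the sketch
builds each 64-bit operand from two limbs by shifting instead of modelling
the 64-bit loads, rotates, or bit field inserts mentioned above. It shows
that adding limbs in pairs gives the same result and carry as adding them
one at a time.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical stand-in for a 32-bit mp_limb_t */

/* rp[0..n-1] = s1[0..n-1] + s2[0..n-1], least significant limb first.
   Two 32-bit limbs are combined into one 64-bit value per addition.
   Returns the carry out of the most significant limb (0 or 1). */
static limb_t add_n_pairs(limb_t *rp, const limb_t *s1, const limb_t *s2,
                          size_t n)
{
    uint64_t cy = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        /* Pack two limbs, lower-indexed limb in the low half. */
        uint64_t a = (uint64_t)s1[i] | ((uint64_t)s1[i + 1] << 32);
        uint64_t b = (uint64_t)s2[i] | ((uint64_t)s2[i + 1] << 32);

        uint64_t t = a + cy;        /* add incoming carry */
        uint64_t c1 = t < cy;       /* 1 if that addition wrapped */
        uint64_t r = t + b;
        uint64_t c2 = r < b;        /* 1 if this addition wrapped */

        rp[i] = (limb_t)r;
        rp[i + 1] = (limb_t)(r >> 32);
        cy = c1 | c2;               /* at most one of them can be set */
    }

    if (i < n) {                    /* odd limb left over */
        uint64_t r = (uint64_t)s1[i] + s2[i] + cy;
        rp[i] = (limb_t)r;
        cy = r >> 32;
    }
    return (limb_t)cy;
}
```

Calling add_n_pairs(r, a, b, n) with r, a and b pointing at n limbs each
fills r and returns the carry; r may be the same array as a or b, since each
iteration reads its operands before writing the result.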
! 116:
! 117:
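For readers unfamiliar with the operations that items 1 and 2 schedule, the
following C sketch spells out what the loops compute with 32-bit limbs. It
is a plain reference under stated assumptions, not the PA-RISC code: the
names limb_t, ref_mul_1 and ref_addmul_1 are hypothetical, and a 64-bit C
multiply stands in for xmpyu (a 32x32 to 64-bit unsigned multiply). The
addc chains in the instruction lists correspond to the carry handling here.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical stand-in for a 32-bit mp_limb_t */

/* rp[0..n-1] = sp[0..n-1] * v, least significant limb first.
   Returns the most significant (carry) limb of the product. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *sp, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32 -> 64 product plus carry; the sum always fits in 64 bits. */
        uint64_t p = (uint64_t)sp[i] * v + cy;
        rp[i] = (limb_t)p;           /* low half is the result limb */
        cy = (limb_t)(p >> 32);      /* high half carries into the next limb */
    }
    return cy;
}

/* rp[0..n-1] += sp[0..n-1] * v. Returns the carry limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *sp, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* Product plus carry plus the existing result limb is at most
           2^64 - 1, so it still fits in 64 bits. */
        uint64_t p = (uint64_t)sp[i] * v + cy + rp[i];
        rp[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }
    return cy;
}
```

The extra load of rp[i] and the extra addition in ref_addmul_1 are the
reason the instruction list for item 2 carries a second group of ldws and
add/addc instructions that item 1 does not need.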
! 118:
! 119:
! 120: LABEL SYNTAX
! 121:
! 122: The HP-UX assembler takes labels starting in column 0 with no colon,
! 123:
! 124: L$loop ldws,mb -4(0,%r25),%r22
! 125:
! 126: Gas on hppa GNU/Linux however requires a colon,
! 127:
! 128: L$loop: ldws,mb -4(0,%r25),%r22
! 129:
! 130: Fortunately both accept a ".label" pseudo-op,
! 131:
! 132: .label L$loop
! 133: ldws,mb -4(0,%r25),%r22
! 134:
! 135:
! 136:
! 137:
! 138:
! 139: ----------------
! 140: Local variables:
! 141: mode: text
! 142: fill-column: 76
! 143: End: