Annotation of OpenXM_contrib/gmp/mpn/powerpc64/README, Revision 1.1.1.2
1.1.1.2 ! ohara 1: Copyright 1999, 2000, 2001 Free Software Foundation, Inc.
! 2:
! 3: This file is part of the GNU MP Library.
! 4:
! 5: The GNU MP Library is free software; you can redistribute it and/or modify
! 6: it under the terms of the GNU Lesser General Public License as published by
! 7: the Free Software Foundation; either version 2.1 of the License, or (at your
! 8: option) any later version.
! 9:
! 10: The GNU MP Library is distributed in the hope that it will be useful, but
! 11: WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
! 12: or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
! 13: License for more details.
! 14:
! 15: You should have received a copy of the GNU Lesser General Public License
! 16: along with the GNU MP Library; see the file COPYING.LIB. If not, write to
! 17: the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
! 18: 02111-1307, USA.
! 19:
! 20:
! 21:
! 22:
! 23:
1.1 maekawa 24: PPC630 (aka Power3) pipeline information:
25:
26: Decoding is 4-way and issue is 8-way with some out-of-order capability.
1.1.1.2 ! ohara 27: Branches are handled separately, and are not part of the 4-way decode limit.
! 28:
! 29: Functional units:
1.1 maekawa 30: LS1 - ld/st unit 1
31: LS2 - ld/st unit 2
1.1.1.2 ! ohara 32: FXU1 - integer unit 1, handles any simple integer instruction
! 33: FXU2 - integer unit 2, handles any simple integer instruction
1.1 maekawa 34: FXU3 - integer unit 3, handles integer multiply and divide
35: FPU1 - floating-point unit 1
36: FPU2 - floating-point unit 2
37:
38: Memory: Any two memory operations can issue per cycle, but the memory
1.1.1.2 ! ohara 39: subsystem can sustain just one store per cycle. No need for data
! 40: prefetch; the hardware has very sophisticated prefetch logic.
1.1 maekawa 41: Simple integer: 2 operations per cycle (such as add, rl*)
42: Integer multiply: 1 operation every 9th cycle worst case; exact timing depends
43: on the most significant bit position of the 2nd operand (10
44: bits per cycle). The multiply unit is not pipelined; only one
45: multiply operation can be in progress at a time.
46: Integer divide: ?
47: Floating-point: Any 2 plain arithmetic instructions per cycle (such as fmul, fadd, fmadd)
48: Latency = 4 cycles.
49: Floating-point divide:
50: ?
51: Floating-point square root:
52: ?
53:
54: Best possible times for the main loops, in cycles per limb:
55: shift: 1.5 cycles, limited by integer unit contention.
56: With 63 special loops, one for each shift count, we could
57: reduce the needed integer instructions to 2, which would
58: reduce the best possible time to 1 cycle.
59: add/sub: 1.5 cycles, limited by ld/st unit contention.
60: mul: 18 cycles (average) unless floating-point operations are used,
61: but that would only help for multiplies of perhaps 10 or more
62: limbs.
63: addmul/submul: Same situation as for mul.
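
The shift figure above can be read off a plain C version of the loop. This is
only an illustrative sketch, not the library's code: the function name
lshift_limbs is hypothetical, limbs are assumed to be 64 bits, and cnt is
assumed to be in the range 1..63. Each limb costs three simple integer
operations (sld, srd, or), and with two simple integer units that gives the
1.5 cycles per limb quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: shift the n-limb number {up, n} left by cnt bits
   (1 <= cnt <= 63, n >= 1), store the result at {rp, n}, and return the
   bits shifted out of the top limb.  Runs from the high limb downwards. */
uint64_t lshift_limbs(uint64_t *rp, const uint64_t *up, size_t n, unsigned cnt)
{
    unsigned tnc = 64 - cnt;            /* complementary shift count */
    uint64_t high = up[n - 1];
    uint64_t out = high >> tnc;         /* bits falling off the top */

    for (size_t i = n - 1; i > 0; i--) {
        uint64_t low = up[i - 1];
        /* three simple integer ops per limb: sld, srd, or */
        rp[i] = (high << cnt) | (low >> tnc);
        high = low;
    }
    rp[0] = high << cnt;
    return out;
}
```

A loop specialized for one fixed shift count would not need the general
three-operation sequence, which is the saving the note about 63 special
loops refers to.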
1.1.1.2 ! ohara 64:
! 65:
! 66: IDEAS
! 67:
! 68: *mul_1: Handling one limb using mulld/mulhdu and two limbs using
! 69: floating-point operations should give a performance of about 20 cycles
! 70: for 3 limbs, or 7 cycles/limb.
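
For reference, the all-integer path that this idea tries to speed up can be
sketched in C as follows. The name mul_1_limbs is hypothetical, and the code
assumes a compiler with the unsigned __int128 extension (GCC and Clang provide
it). Every limb needs both halves of a 64x64-bit product, that is one mulld
and one mulhdu on the single non-pipelined multiply unit, which is consistent
with the 18-cycle figure given above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: {rp, n} = {up, n} * v, returning the high limb.
   Assumes the unsigned __int128 compiler extension. */
uint64_t mul_1_limbs(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    uint64_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned __int128 p = (unsigned __int128)up[i] * v;
        uint64_t lo = (uint64_t)p;          /* low half:  mulld  */
        uint64_t hi = (uint64_t)(p >> 64);  /* high half: mulhdu */

        lo += carry;
        hi += (lo < carry);   /* cannot overflow: hi <= 2^64 - 2 */
        rp[i] = lo;
        carry = hi;
    }
    return carry;
}
```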
! 71:
! 72: We should probably split the single-limb operand into 32-bit chunks, and
! 73: the multi-limb operand into 16-bit chunks, so that the 48-bit partial
! 74: products can be accumulated exactly in the 53-bit fp mantissa.
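
The chunking idea can be sketched in portable C for a single 64x64-bit
product. This is a minimal illustration under assumptions, not the planned
assembly: the names mul_64x64_fp and acc128 are hypothetical, and double is
assumed to be IEEE 754 binary64. A 16-bit chunk times a 32-bit chunk is below
2^48, so a column holding two such products stays below 2^49 and is exact in
a 53-bit mantissa. The casts back to integer play the role of fctidz, and the
final accumulation corresponds to the shift and add/adde steps.

```c
#include <stdint.h>

/* Add (c << shift) into the 128-bit accumulator hi:lo.  Here shift is a
   multiple of 16 up to 80 and c < 2^49, so no bits are lost. */
static void acc128(uint64_t *hi, uint64_t *lo, uint64_t c, unsigned shift)
{
    uint64_t add_lo, add_hi;

    if (shift == 0)      { add_lo = c;          add_hi = 0; }
    else if (shift < 64) { add_lo = c << shift; add_hi = c >> (64 - shift); }
    else                 { add_lo = 0;          add_hi = c << (shift - 64); }

    *lo += add_lo;
    *hi += add_hi + (*lo < add_lo);   /* carry out of the low word */
}

/* Hypothetical sketch: hi:lo = u * v using only fp multiplies and adds
   on exact small integers. */
void mul_64x64_fp(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
{
    double uc[4], vc[2], col[6];

    for (int i = 0; i < 4; i++)                 /* 16-bit chunks of u */
        uc[i] = (double)((u >> (16 * i)) & 0xFFFF);
    vc[0] = (double)(v & 0xFFFFFFFFu);          /* 32-bit chunks of v */
    vc[1] = (double)(v >> 32);

    /* col[k] collects the products with weight 2^(16*k) */
    col[0] = uc[0] * vc[0];
    col[1] = uc[1] * vc[0];
    col[2] = uc[2] * vc[0] + uc[0] * vc[1];
    col[3] = uc[3] * vc[0] + uc[1] * vc[1];
    col[4] = uc[2] * vc[1];
    col[5] = uc[3] * vc[1];

    *hi = 0;
    *lo = 0;
    for (int k = 0; k < 6; k++)
        acc128(hi, lo, (uint64_t)col[k], 16u * (unsigned)k);
}
```

Because every intermediate value is an integer below 2^53, the result is
exact whether or not the compiler contracts the multiply-add pairs into
fused operations.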
! 75:
! 76: The problem is getting 32-bit or 16-bit words into the fp registers. Only
! 77: 64-bit fp memops copy bits without fiddling with them. We might
! 78: therefore need to load to integer registers with zero extension, store
! 79: as 64 bits into temp space, and then load to fp regs. Alternatively,
! 80: load directly to fp space and add well-chosen constants to get
! 81: cancellation. (The other part is then given by a subsequent subtraction.)
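
One well-known trick in the same spirit as the constant-cancellation idea is
shown below. It may not be the exact variant intended here. The function name
u32_to_double_via_bits is hypothetical, and double is assumed to be IEEE 754
binary64. The 32-bit word is placed in the low mantissa bits of a double
whose exponent field encodes 2^52, so the bit pattern represents 2^52 + x,
and subtracting 2^52 leaves x as an exact fp value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: turn a 32-bit word into a double through a 64-bit
   bit copy (the analogue of an lfd) followed by one fp subtraction. */
double u32_to_double_via_bits(uint32_t x)
{
    /* 0x433 in the exponent field encodes 2^52; x fills the low mantissa
       bits, so the pattern represents 2^52 + x exactly. */
    uint64_t bits = UINT64_C(0x4330000000000000) | (uint64_t)x;
    double d;

    memcpy(&d, &bits, sizeof d);        /* 64-bit copy, bits untouched */
    return d - 4503599627370496.0;      /* subtract 2^52, leaving x */
}
```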
! 82:
! 83: Possible code mix for load-via-intregs variant:
! 84:
! 85: lwz,std,lfd
! 86: fmadd,fmadd,fmul,fmul
! 87: fctidz,stfd,ld,fctidz,stfd,ld
! 88: add,adde
! 89: lwz,std,lfd
! 90: fmadd,fmadd,fmul,fmul
! 91: fctidz,stfd,ld,fctidz,stfd,ld
! 92: add,adde
! 93: srd,sld,add,adde,add,adde