OpenXM_contrib/gmp/mpn/powerpc64/README - diff

Diff for /OpenXM_contrib/gmp/mpn/powerpc64/Attic/README between version 1.1.1.1 and 1.1.1.2

-version 1.1.1.1, 2000/09/09 14:12:38
+version 1.1.1.2, 2003/08/25 16:06:24
 Line 1
+ Copyright 1999, 2000, 2001 Free Software Foundation, Inc.
+ This file is part of the GNU MP Library.
+ The GNU MP Library is free software; you can redistribute it and/or modify
+ it under the terms of the GNU Lesser General Public License as published by
+ the Free Software Foundation; either version 2.1 of the License, or (at your
+ option) any later version.
+ The GNU MP Library is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+ License for more details.
+ You should have received a copy of the GNU Lesser General Public License
+ along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
+ the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+ 02111-1307, USA.
  PPC630 (aka Power3) pipeline information:
  Decoding is 4-way and issue is 8-way with some out-of-order capability.
+ Branches are handled separately, and are not part of the 4-way issue limit.
+ Functional units:
  LS1  - ld/st unit 1
  LS2  - ld/st unit 2
- FXU1 - integer unit 1, handles any simple integer instructions
+ FXU1 - integer unit 1, handles any simple integer instruction
- FXU2 - integer unit 2, handles any simple integer instructions
+ FXU2 - integer unit 2, handles any simple integer instruction
  FXU3 - integer unit 3, handles integer multiply and divide
  FPU1 - floating-point unit 1
  FPU2 - floating-point unit 2
  Memory:           Any two memory operations can issue, but memory subsystem
-                   can sustain just one store per cycle.
+                   can sustain just one store per cycle.  No need for data
+                   prefetch; the hardware has very sophisticated prefetch logic.
  Simple integer:   2 operations (such as add, rl*)
  Integer multiply: 1 operation every 9th cycle worst case; exact timing depends
                    on 2nd operand most significant bit position (10 bits per
-Line 34  mul:       18 cycles (average) unless floating-point o
+Line 61  mul:       18 cycles (average) unless floating-point o
                but that would only help for multiplies of perhaps 10 and more
                limbs.
  addmul/submul:Same situation as for mul.
+ IDEAS
+ *mul_1: Handling one limb using mulld/mulhdu and two limbs using
+ floating-point operations should give a performance of about 20 cycles
+ for 3 limbs, or 7 cycles/limb.
+ We should probably split the single-limb operand into 32-bit chunks, and
+ the multi-limb operand into 16-bit chunks, allowing us to accumulate
+ well in fp registers.
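
A minimal C sketch of why this chunking accumulates well, not the README's actual code: a 16-bit chunk times a 32-bit chunk is below 2^48, so it is exact in a double's 53-bit significand, and a few such products can be summed in floating point before converting back to integers. The names `fp_mul_64x64` and `add_shifted` are hypothetical, and IEEE 754 double precision is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Add p * 2^shift into the 128-bit accumulator (*hi, *lo).
   Assumes p < 2^49 and shift <= 80, so no bits are lost from *hi. */
static void add_shifted(uint64_t *hi, uint64_t *lo, uint64_t p, unsigned shift)
{
    uint64_t lo_part, hi_part, sum;

    assert(shift <= 80);
    if (shift == 0) {
        lo_part = p;
        hi_part = 0;
    } else if (shift < 64) {
        lo_part = p << shift;
        hi_part = p >> (64 - shift);
    } else {
        lo_part = 0;
        hi_part = p << (shift - 64);
    }

    sum = *lo + lo_part;
    *hi += hi_part + (sum < lo_part);  /* carry out of the low word */
    *lo = sum;
}

/* Hypothetical illustration: 64x64 -> 128-bit multiply where every partial
   product is computed and accumulated in double precision.
   u is split into four 16-bit chunks, v into two 32-bit chunks. */
void fp_mul_64x64(uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
{
    double v0 = (double)(uint32_t)(v & 0xffffffffu);
    double v1 = (double)(uint32_t)(v >> 32);
    double col[6] = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };
    unsigned i, k;

    for (i = 0; i < 4; i++) {
        double ui = (double)(uint32_t)((u >> (16 * i)) & 0xffffu);
        /* Each product is < 2^48; a column holds at most two of them,
           so every column sum is < 2^49 and therefore exact. */
        col[i]     += ui * v0;   /* weight 2^(16*i)      */
        col[i + 2] += ui * v1;   /* weight 2^(16*i + 32) */
    }

    *hi = 0;
    *lo = 0;
    for (k = 0; k < 6; k++)
        add_shifted(hi, lo, (uint64_t)col[k], 16 * k);  /* integer carry propagation */
}
```

Calling `fp_mul_64x64(3, 5, &hi, &lo)` leaves `hi == 0` and `lo == 15`. The final shift-and-add loop plays the role of the integer fix-up step; a real mul_1 would also carry the high word into the next limb.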
+ Problem is to get 32-bit or 16-bit words to the fp registers.  Only
+ 64-bit fp memops copy bits without fiddling with them.  We might
+ therefore need to load to integer registers with zero extension, store
+ as 64 bits into temp space, and then load to fp regs.  Alternatively,
+ load directly to fp space and add well-chosen constants to get
+ cancellation.  (The other part is then given by a subsequent subtraction.)
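
One well-known trick of this kind is sketched below in C. Whether it is exactly what the note has in mind is an assumption. The 32-bit word is placed in the low bits of a double whose upper bits encode 2^52, and subtracting 2^52 then leaves the integer value as an exact double. The name `u32_to_double_biased` is hypothetical; IEEE 754 binary64 and matching integer/floating-point byte order are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a 32-bit unsigned word to double without an
   integer-to-float conversion instruction, using a bias constant. */
double u32_to_double_biased(uint32_t x)
{
    /* 0x4330000000000000 is the binary64 bit pattern of 2^52.  OR-ing x into
       the low 32 significand bits yields the double 2^52 + x. */
    uint64_t bits = UINT64_C(0x4330000000000000) | (uint64_t)x;
    double d;

    assert(sizeof d == sizeof bits);   /* assumption: 64-bit double */
    memcpy(&d, &bits, sizeof d);       /* raw 64-bit copy, like a 64-bit fp load */

    return d - 4503599627370496.0;     /* subtract 2^52; the bias cancels exactly */
}
```

For example, `u32_to_double_biased(65536)` returns `65536.0`. The subtraction is exact because both operands and the result are integers below 2^53.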
+ Possible code mix for load-via-intregs variant:
+ lwz,std,lfd
+ fmadd,fmadd,fmul,fmul
+ fctidz,stfd,ld,fctidz,stfd,ld
+ add,adde
+ lwz,std,lfd
+ fmadd,fmadd,fmul,fmul
+ fctidz,stfd,ld,fctidz,stfd,ld
+ add,adde
+ srd,sld,add,adde,add,adde
