Copyright 1999, 2000, 2001 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.


PPC630 (aka Power3) pipeline information:

Decoding is 4-way and issue is 8-way with some out-of-order capability.
Branches are handled separately and do not count against the 4-way decode
limit.

Functional units:
LS1  - ld/st unit 1
LS2  - ld/st unit 2
FXU1 - integer unit 1, handles any simple integer instruction
FXU2 - integer unit 2, handles any simple integer instruction
FXU3 - integer unit 3, handles integer multiply and divide
FPU1 - floating-point unit 1
FPU2 - floating-point unit 2

Memory:          Any two memory operations can issue per cycle, but the
                 memory subsystem can sustain just one store per cycle.  No
                 need for data prefetch; the hardware has very sophisticated
                 prefetch logic.
Simple integer:  2 operations per cycle (such as add, rl*).
Integer multiply: 1 operation every 9th cycle worst case; the exact timing
                 depends on the position of the most significant bit of the
                 2nd operand (10 bits per cycle).  The multiply unit is not
                 pipelined; only one multiply operation can be in progress
                 at a time.
Integer divide:  ?
Floating-point:  Any 2 plain arithmetic instructions per cycle (such as fmul,
                 fadd, fmadd).  Latency = 4.
Floating-point divide:
                 ?
Floating-point square root:
                 ?

Best possible times for the main loops:
shift:         1.5 cycles/limb, limited by integer unit contention.
               With 63 special loops, one for each shift count, we could
               reduce the needed integer instructions to 2 per limb, which
               would reduce the best possible time to 1 cycle/limb.
add/sub:       1.5 cycles/limb, limited by ld/st unit contention.
mul:           18 cycles/limb (average) unless floating-point operations
               are used, but that would only help for multiplies of perhaps
               10 or more limbs.
addmul/submul: Same situation as for mul.


IDEAS

*mul_1: Handling one limb using mulld/mulhdu and two limbs using
floating-point operations should give a performance of about 20 cycles
for 3 limbs, or about 7 cycles/limb.

We should probably split the single-limb operand into 32-bit chunks and
the multi-limb operand into 16-bit chunks, allowing us to accumulate
well in fp registers: each 32x16-bit partial product fits in 48 bits,
leaving 5 bits of headroom in the 53-bit double-precision mantissa.

The problem is getting 32-bit or 16-bit words into the fp registers.
Only 64-bit fp memory operations copy bits without modifying them.  We
might therefore need to load into integer registers with zero extension,
store as 64 bits into temp space, and then load into fp registers.
Alternatively, load directly into fp space and add well-chosen constants
to get cancellation.  (The other part is then obtained by a subsequent
subtraction.)

Possible code mix for load-via-intregs variant:

lwz,std,lfd
fmadd,fmadd,fmul,fmul
fctidz,stfd,ld,fctidz,stfd,ld
add,adde
lwz,std,lfd
fmadd,fmadd,fmul,fmul
fctidz,stfd,ld,fctidz,stfd,ld
add,adde
srd,sld,add,adde,add,adde