Annotation of OpenXM_contrib/gmp/mpn/pa64/README, Revision 1.1.1.2

1.1.1.2 ! ohara       1: Copyright 1999, 2001, 2002 Free Software Foundation, Inc.
        !             2:
        !             3: This file is part of the GNU MP Library.
        !             4:
        !             5: The GNU MP Library is free software; you can redistribute it and/or modify
        !             6: it under the terms of the GNU Lesser General Public License as published by
        !             7: the Free Software Foundation; either version 2.1 of the License, or (at your
        !             8: option) any later version.
        !             9:
        !            10: The GNU MP Library is distributed in the hope that it will be useful, but
        !            11: WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
        !            12: or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
        !            13: License for more details.
        !            14:
        !            15: You should have received a copy of the GNU Lesser General Public License
        !            16: along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
        !            17: the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
        !            18: 02111-1307, USA.
        !            19:
1.1       maekawa    20:
                     21:
                     22:
1.1.1.2 ! ohara      23: This directory contains mpn functions for 64-bit PA-RISC 2.0.
        !            24:
        !            25: PIPELINE SUMMARY
        !            26:
        !            27: The PA8x00 processors have an orthogonal 4-way out-of-order pipeline.  Each
        !            28: cycle two ALU operations and two MEM operations can issue, but just one of the
        !            29: MEM operations may be a store.  The two ALU operations can be almost any
        !            30: combination of non-memory operations.  Unlike on most other processors,
        !            31: integer and fp operations are completely equal here; they both count as just
        !            32: ALU operations.
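
The issue rules above imply a simple lower bound on cycles/limb for any loop.
The sketch below is only a back-of-the-envelope model, not part of GMP: the
struct, the function name and the per-limb instruction mixes in the note that
follows are assumptions made for illustration, and latency and pipeline stalls
are ignored.

```c
#include <assert.h>

/* Hypothetical per-limb instruction mix of a loop body.
   Loop overhead is assumed to be amortized away by unrolling. */
struct op_mix {
    double alu;    /* non-memory operations (integer or fp alike) */
    double loads;  /* memory loads */
    double stores; /* memory stores */
};

static double max2(double a, double b) { return a > b ? a : b; }

/* Lower bound on cycles/limb from the issue rules alone:
   two ALU ops per cycle, two MEM ops per cycle, at most one store
   per cycle.  Latency and carry-related stalls are not modelled. */
double issue_bound(struct op_mix m)
{
    assert(m.alu >= 0 && m.loads >= 0 && m.stores >= 0);
    double alu_bound = m.alu / 2.0;
    double mem_bound = (m.loads + m.stores) / 2.0;
    double store_bound = m.stores;
    return max2(alu_bound, max2(mem_bound, store_bound));
}
```

Assuming an addition loop needs one ALU operation, two loads and one store per
limb, issue_bound gives 1.5 cycles/limb; assuming a shift loop needs one ALU
operation, one load and one store per limb, it gives 1.0 cycles/limb.  Both
loops are then bounded by MEM issue rather than by ALU issue.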
        !            33:
        !            34: Unfortunately, some operations cause hiccups in the pipeline.  Combining
        !            35: carry-consuming operations like ADD,DC with operations that do not set carry,
        !            36: like ADD,L, causes long delays.  Skip operations also seem to cause hiccups.
        !            37: If several ADD,DC are issued consecutively, or if plain carry-generating ADDs
        !            38: feed ADD,DC, stalling does not occur.  We can effectively issue two ADD,DC
        !            39: operations/cycle.
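
For readers unfamiliar with carry chains, the dependency that ADD and ADD,DC
express in hardware can be written out in portable C.  This is a minimal
sketch, not the code in this directory; limb_t and add_n_sketch are
hypothetical names, with limb_t standing in for a 64-bit limb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* assumed stand-in for a 64-bit limb */

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1]; returns the carry out (0 or 1).
   Each iteration consumes the carry of the previous one, which is the
   role ADD,DC plays in assembly. */
limb_t add_n_sketch(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    assert(n >= 1);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + vp[i];  /* may wrap around */
        limb_t c1 = s < up[i];     /* carry out of the limb addition */
        limb_t r = s + carry;      /* add the incoming carry */
        limb_t c2 = r < s;         /* carry out of the carry addition */
        rp[i] = r;
        carry = c1 | c2;           /* both cannot be set at once */
    }
    return carry;
}
```

In assembly the explicit comparisons disappear, since the carry travels in the
processor's carry bit from one add to the next.  That is why the chain must
not be interrupted by operations that handle carry differently.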
        !            40:
        !            41: Latency scheduling is not as important as making sure to have a mix of ALU and
        !            42: MEM operations, but for full pipeline utilization, it is still a good idea to
        !            43: do some amount of latency scheduling.
        !            44:
        !            45: As on all other processors, RAW (read-after-write) memory scheduling is
        !            46: critically important.  Since integer multiplication takes place in the
        !            47: floating-point unit, the GMP code needs to handle this problem frequently.
1.1       maekawa    48:
                     49: STATUS
                     50:
1.1.1.2 ! ohara      51: * mpn_lshift and mpn_rshift run at 1.5 cycles/limb on PA8000 and at 1.0
        !            52:   cycles/limb on PA8500.  With latency scheduling, the numbers could be
        !            53:   improved to 1.0 cycles/limb for all PA8x00 chips.
        !            54:
        !            55: * mpn_add_n and mpn_sub_n run at 2.0 cycles/limb on PA8000 and at about 1.9
        !            56:   cycles/limb on PA8500.  With latency scheduling, this could be improved to
        !            57:   1.5 cycles/limb.
        !            58:
        !            59: * mpn_addmul_1 runs at 6.25 cycles/limb.  The current code uses ADD,DC for
        !            60:   adjacent limbs, and relies heavily on reordering.
1.1       maekawa    61:
1.1.1.2 ! ohara      62: * Both mpn_mul_1 and mpn_submul_1 run at around 11 cycles/limb.  There is
        !            63:   obviously room for improving these along the lines of mpn_addmul_1.
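
As a reference for what the multiply routines listed above compute, here is a
portable C sketch of an addmul_1-style loop.  It is not the code in this
directory.  It assumes the multiply primitive yields a 64-bit product from
32-bit operands, so each 64x64 limb product is assembled from four partial
products; limb_t, umul_64x64 and addmul_1_sketch are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* assumed stand-in for a 64-bit limb */

/* 64x64 -> 128 bit product built from four 32x32 -> 64 bit products. */
static void umul_64x64(limb_t a, limb_t b, limb_t *hi, limb_t *lo)
{
    uint64_t a0 = a & 0xffffffffu, a1 = a >> 32;
    uint64_t b0 = b & 0xffffffffu, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1;
    uint64_t p10 = a1 * b0, p11 = a1 * b1;
    /* Sum of the three 32-bit pieces landing in bits 32..63; fits in 34 bits. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xffffffffu) + (p10 & 0xffffffffu);
    *lo = (mid << 32) | (p00 & 0xffffffffu);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
limb_t addmul_1_sketch(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n >= 1);
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t hi, lo;
        umul_64x64(up[i], v, &hi, &lo);
        lo += carry;           /* add the carry limb from the previous step */
        hi += lo < carry;      /* cannot overflow: hi is at most 2^64 - 2 */
        limb_t r = rp[i] + lo; /* accumulate into the destination */
        hi += r < lo;
        rp[i] = r;
        carry = hi;
    }
    return carry;
}
```

Each limb needs several multiplies plus a chain of carry-propagating adds, and
the partial products have to reach the integer adds somehow, which is where
the RAW memory scheduling mentioned in the pipeline summary comes into play.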
