Annotation of OpenXM_contrib/gmp/mpn/pa32/README, Revision 1.1.1.1

1.1       ohara       1: Copyright 1996, 1999, 2001 Free Software Foundation, Inc.
                      2:
                      3: This file is part of the GNU MP Library.
                      4:
                      5: The GNU MP Library is free software; you can redistribute it and/or modify
                      6: it under the terms of the GNU Lesser General Public License as published by
                      7: the Free Software Foundation; either version 2.1 of the License, or (at your
                      8: option) any later version.
                      9:
                     10: The GNU MP Library is distributed in the hope that it will be useful, but
                     11: WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
                     12: or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
                     13: License for more details.
                     14:
                     15: You should have received a copy of the GNU Lesser General Public License
                     16: along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
                     17: the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
                     18: 02111-1307, USA.
                     19:
                     20:
                     21:
                     22:
                     23:
                     24:
                      25: This directory contains mpn functions for various HP PA-RISC chips.  Code
                      26: that runs faster on the PA7100 and later implementations is in the pa7100
                      27: directory.
                     28:
                     29: RELEVANT OPTIMIZATION ISSUES
                     30:
                     31:   Load and Store timing
                     32:
                      33: On the PA7000, no memory instruction can issue in the two cycles after a
                      34: store.  On the PA7100, this is reduced to one cycle.
                     35:
                      36: The PA7100 has a lockup-free cache, so it helps to schedule a load and the
                      37: instruction that depends on it as far apart as possible.
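
The scheduling idea can be sketched in C as a loop that issues the load for
the next iteration before it consumes the value loaded in the previous one.
This is only an illustration: the function name sum_pipelined is
hypothetical, and a compiler is free to reorder the loads, so hand-written
assembly is what actually controls the distance between a load and its use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum n words, keeping each load one iteration
   ahead of the add that depends on it. */
uint32_t sum_pipelined(const uint32_t *p, size_t n)
{
    uint32_t acc = 0;
    uint32_t cur;

    assert(n >= 1);
    cur = p[0];                     /* prologue: first load issued early */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = p[i];       /* load for the next iteration */
        acc += cur;                 /* use a value loaded one iteration ago */
        cur = next;
    }
    acc += cur;                     /* epilogue: consume the last load */
    return acc;
}
```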
                     38:
                     39: STATUS
                     40:
                      41: 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
                      42:    instructions below (some software pipelining is needed to avoid the
                      43:    xmpyu-fstds delay):
                     44:
                     45:        fldds   s1_ptr
                     46:
                     47:        xmpyu
                     48:        fstds   N(%r30)
                     49:        xmpyu
                     50:        fstds   N(%r30)
                     51:
                     52:        ldws    N(%r30)
                     53:        ldws    N(%r30)
                     54:        ldws    N(%r30)
                     55:        ldws    N(%r30)
                     56:
                     57:        addc
                     58:        stws    res_ptr
                     59:        addc
                     60:        stws    res_ptr
                     61:
                     62:        addib   Loop
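
For reference, the operation that the loop above implements can be written
in portable C as follows.  This is a minimal sketch, assuming 32-bit limbs
stored least significant first; the names limb_t and ref_mul_1 are
hypothetical and are not the library's own identifiers.  The 64-bit product
plays the role of xmpyu, and the carry handling plays the role of the addc
chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical name for a 32-bit limb */

/* Multiply the n-limb number at s1 by s2_limb, store the low n limbs
   at res, and return the most significant (carry-out) limb. */
limb_t ref_mul_1(limb_t *res, const limb_t *s1, size_t n, limb_t s2_limb)
{
    limb_t cy = 0;

    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        /* 32x32 -> 64 bit product, the work done by xmpyu */
        uint64_t prod = (uint64_t)s1[i] * s2_limb;
        /* (2^32-1)^2 + (2^32-1) < 2^64, so this cannot overflow */
        prod += cy;
        res[i] = (limb_t)prod;          /* low half goes to the result */
        cy = (limb_t)(prod >> 32);      /* high half is the next carry */
    }
    return cy;
}
```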
                     63:
                     64: 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
                     65:    (asymptotically) on the PA7100, using the instructions below.  With proper
                     66:    sw pipelining and the unrolling level below, the speed becomes 8
                     67:    cycles/limb.
                     68:
                     69:        fldds   s1_ptr
                     70:        fldds   s1_ptr
                     71:
                     72:        xmpyu
                     73:        fstds   N(%r30)
                     74:        xmpyu
                     75:        fstds   N(%r30)
                     76:        xmpyu
                     77:        fstds   N(%r30)
                     78:        xmpyu
                     79:        fstds   N(%r30)
                     80:
                     81:        ldws    N(%r30)
                     82:        ldws    N(%r30)
                     83:        ldws    N(%r30)
                     84:        ldws    N(%r30)
                     85:        ldws    N(%r30)
                     86:        ldws    N(%r30)
                     87:        ldws    N(%r30)
                     88:        ldws    N(%r30)
                     89:        addc
                     90:        addc
                     91:        addc
                     92:        addc
                     93:        addc    %r0,%r0,cy-limb
                     94:
                     95:        ldws    res_ptr
                     96:        ldws    res_ptr
                     97:        ldws    res_ptr
                     98:        ldws    res_ptr
                     99:        add
                    100:        stws    res_ptr
                    101:        addc
                    102:        stws    res_ptr
                    103:        addc
                    104:        stws    res_ptr
                    105:        addc
                    106:        stws    res_ptr
                    107:
                    108:        addib
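
The same kind of reference model for mpn_addmul_1 is given below.  It is a
sketch under the same assumptions (32-bit limbs, least significant first),
with the hypothetical names limb_t and ref_addmul_1.  The extra add of
res[i] corresponds to the second group of ldws/add/addc/stws instructions
in the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical name for a 32-bit limb */

/* Compute res += s1 * s2_limb over n limbs and return the carry-out limb. */
limb_t ref_addmul_1(limb_t *res, const limb_t *s1, size_t n, limb_t s2_limb)
{
    limb_t cy = 0;

    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so the sum fits in 64 bits */
        uint64_t acc = (uint64_t)s1[i] * s2_limb + res[i] + cy;
        res[i] = (limb_t)acc;           /* low half updates the result */
        cy = (limb_t)(acc >> 32);       /* high half carries into next limb */
    }
    return cy;
}
```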
                    109:
                     110: 3. For the PA8000 we have to stick to using 32-bit limbs until compiler
                     111:    support emerges.  But we want to use 64-bit operations whenever possible,
                     112:    in particular for loads and stores.  It is possible to handle mpn_add_n
                     113:    efficiently by rotating (when s1/s2 are aligned) or by masking + bit-field
                     114:    inserting (when they are not).  The speed should double compared to the
                     115:    code used today.
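
The idea of adding two 32-bit limbs per 64-bit operation can be modelled in
portable C as below.  This is a sketch only: the names limb_t and
ref_add_n_pairs are hypothetical, and the limb pairs are assembled with
shifts rather than with real 64-bit loads, so the rotate and mask/insert
steps that the assembly would need for aligned and unaligned operands do
not appear here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical name for a 32-bit limb */

/* res = s1 + s2 over n limbs, two limbs per 64-bit add; returns carry. */
limb_t ref_add_n_pairs(limb_t *res, const limb_t *s1, const limb_t *s2,
                       size_t n)
{
    limb_t cy = 0;
    size_t i = 0;

    assert(n >= 1);
    for (; i + 2 <= n; i += 2) {
        /* Pack two limbs into one 64-bit value, low limb in the low half. */
        uint64_t a = (uint64_t)s1[i] | ((uint64_t)s1[i + 1] << 32);
        uint64_t b = (uint64_t)s2[i] | ((uint64_t)s2[i + 1] << 32);
        uint64_t sum = a + cy;
        limb_t c1 = sum < a;            /* wrapped when adding carry-in */
        sum += b;
        limb_t c2 = sum < b;            /* wrapped when adding b */
        res[i] = (limb_t)sum;
        res[i + 1] = (limb_t)(sum >> 32);
        cy = c1 | c2;                   /* at most one of them is set */
    }
    if (i < n) {                        /* odd limb count: one limb left */
        uint64_t sum = (uint64_t)s1[i] + s2[i] + cy;
        res[i] = (limb_t)sum;
        cy = (limb_t)(sum >> 32);
    }
    return cy;
}
```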
                    116:
                    117:
                    118:
                    119:
                    120: LABEL SYNTAX
                    121:
                    122: The HP-UX assembler takes labels starting in column 0 with no colon,
                    123:
                    124:        L$loop  ldws,mb -4(0,%r25),%r22
                    125:
                     126: Gas on hppa GNU/Linux, however, requires a colon,
                    127:
                    128:        L$loop: ldws,mb -4(0,%r25),%r22
                    129:
                    130: Fortunately both accept a ".label" pseudo-op,
                    131:
                    132:                .label  L$loop
                    133:                ldws,mb -4(0,%r25),%r22
                    134:
                    135:
                    136:
                    137:
                    138:
                    139: ----------------
                    140: Local variables:
                    141: mode: text
                    142: fill-column: 76
                    143: End: