[BACK]Return to README CVS log [TXT][DIR] Up to [local] / OpenXM_contrib / gmp / mpn / hppa

Annotation of OpenXM_contrib/gmp/mpn/hppa/README, Revision 1.1.1.2

1.1       maekawa     1: This directory contains mpn functions for various HP PA-RISC chips.  Code
                      2: that runs faster on the PA7100 and later implementations, is in the pa7100
                      3: directory.
                      4:
                      5: RELEVANT OPTIMIZATION ISSUES
                      6:
                      7:   Load and Store timing
                      8:
                      9: On the PA7000 no memory instructions can issue the two cycles after a store.
                     10: For the PA7100, this is reduced to one cycle.
                     11:
                     12: The PA7100 has a lookup-free cache, so it helps to schedule loads and the
                     13: dependent instruction really far from each other.
                     14:
                     15: STATUS
                     16:
                     17: 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
1.1.1.2 ! maekawa    18:    instructions below (but some sw pipelining is needed to avoid the
1.1       maekawa    19:    xmpyu-fstds delay):
                     20:
                     21:        fldds   s1_ptr
                     22:
                     23:        xmpyu
                     24:        fstds   N(%r30)
                     25:        xmpyu
                     26:        fstds   N(%r30)
                     27:
                     28:        ldws    N(%r30)
                     29:        ldws    N(%r30)
                     30:        ldws    N(%r30)
                     31:        ldws    N(%r30)
                     32:
                     33:        addc
                     34:        stws    res_ptr
                     35:        addc
                     36:        stws    res_ptr
                     37:
                     38:        addib   Loop
                     39:
                     40: 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
                     41:    (asymptotically) on the PA7100, using the instructions below.  With proper
                     42:    sw pipelining and the unrolling level below, the speed becomes 8
                     43:    cycles/limb.
                     44:
                     45:        fldds   s1_ptr
                     46:        fldds   s1_ptr
                     47:
                     48:        xmpyu
                     49:        fstds   N(%r30)
                     50:        xmpyu
                     51:        fstds   N(%r30)
                     52:        xmpyu
                     53:        fstds   N(%r30)
                     54:        xmpyu
                     55:        fstds   N(%r30)
                     56:
                     57:        ldws    N(%r30)
                     58:        ldws    N(%r30)
                     59:        ldws    N(%r30)
                     60:        ldws    N(%r30)
                     61:        ldws    N(%r30)
                     62:        ldws    N(%r30)
                     63:        ldws    N(%r30)
                     64:        ldws    N(%r30)
                     65:        addc
                     66:        addc
                     67:        addc
                     68:        addc
                     69:        addc    %r0,%r0,cy-limb
                     70:
                     71:        ldws    res_ptr
                     72:        ldws    res_ptr
                     73:        ldws    res_ptr
                     74:        ldws    res_ptr
                     75:        add
                     76:        stws    res_ptr
                     77:        addc
                     78:        stws    res_ptr
                     79:        addc
                     80:        stws    res_ptr
                     81:        addc
                     82:        stws    res_ptr
                     83:
                     84:        addib
1.1.1.2 ! maekawa    85:
        !            86: 3. For the PA8000 we have to stick to using 32-bit limbs before compiler
        !            87:    support emerges.  But we want to use 64-bit operations whenever possible,
        !            88:    in particular for loads and stores.  It is possible to handle mpn_add_n
        !            89:    efficiently by rotating (when s1/s2 are aligned), masking+bit field
        !            90:    inserting when (they are not).  The speed should double compared to the
        !            91:    code used today.

FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>