
Annotation of OpenXM_contrib/gmp/mpn/x86/pentium/README, Revision 1.1.1.2


                   INTEL PENTIUM P5 MPN SUBROUTINES


This directory contains mpn functions optimized for Intel Pentium (P5, P54)
processors.  The mmx subdirectory has code for Pentium with MMX (P55).


STATUS

                                cycles/limb

       mpn_add_n/sub_n            2.375

       mpn_copyi/copyd            1.0

       mpn_divrem_1              44.0
       mpn_mod_1                 44.0
       mpn_divexact_by3          15.0

       mpn_l/rshift               5.375 normal (6.0 on P54)
                                  1.875 special case for shift by 1 bit

       mpn_mul_1                 13.0
       mpn_add/submul_1          14.0

       mpn_mul_basecase          14.2 cycles/crossproduct (approx)

       mpn_sqr_basecase           8 cycles/crossproduct (approx)
                                   or 15.5 cycles/triangleproduct (approx)

Pentium MMX gets the following improvements:

       mpn_l/rshift               1.75

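As a rough illustration of how the basecase figures scale with operand size,
the following C sketch turns them into cycle estimates.  The function names
are hypothetical, not part of GMP, and the squaring model is one plausible
reading of the two mpn_sqr_basecase figures: squaring n limbs needs only the
n*(n-1)/2 distinct off-diagonal products, so 15.5 cycles per triangle product
is about 15.5/2 = 7.75, roughly 8, cycles per crossproduct of the full n x n
square.

```c
/* Hypothetical helpers (not GMP functions): rough P5 cycle estimates
   built from the approximate per-product figures in the STATUS table. */

/* An un x vn schoolbook multiply performs un*vn limb crossproducts. */
double est_mul_basecase_cycles(long un, long vn)
{
    return 14.2 * (double) un * (double) vn;
}

/* Squaring n limbs needs only the n*(n-1)/2 distinct off-diagonal
   products (the "triangle"); each is doubled afterwards.  The n
   diagonal squares are ignored in this rough model. */
long sqr_triangle_products(long n)
{
    return n * (n - 1) / 2;
}

double est_sqr_basecase_cycles(long n)
{
    return 15.5 * (double) sqr_triangle_products(n);
}

/* Express the squaring estimate per cell of the full n x n square,
   for comparison with the "8 cycles/crossproduct" figure. */
double est_sqr_per_crossproduct(long n)
{
    return est_sqr_basecase_cycles(n) / ((double) n * (double) n);
}
```

For n = 20 this gives about 5680 cycles for the 20x20 multiply, and 190
triangle products, about 2945 cycles, for the square, i.e. about 7.36 cycles
per full-square crossproduct; the ratio approaches 7.75 as n grows.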

1. mpn_lshift and mpn_rshift run at about 6 cycles/limb on P5 and P54, but the
documentation indicates that they should take only 43/8 = 5.375 cycles/limb,
or 5 cycles/limb asymptotically.  The P55 runs them at the expected speed.

2. mpn_add_n and mpn_sub_n asymptotically run at 2 cycles/limb.  Due to loop
overhead and other delays (cache refill?), they run at or near 2.5 cycles/limb.

3. mpn_mul_1, mpn_addmul_1 and mpn_submul_1 all run 1 cycle/limb faster than
the documentation predicts.  Intel documentation gives 10 cycles for a mul
instruction, but it measures as 9, and the routines using it run accordingly.
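To make the notes concrete, here is a portable C sketch of what mpn_add_n,
mpn_lshift and mpn_mul_1 compute, assuming 32-bit limbs as on the Pentium.
The ref_* names and the limb_t type are hypothetical; this is not the
optimized assembly in this directory, and the cycle counts above describe
that assembly, not this code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable reference versions (not GMP's code) of the
   routines discussed in the notes, assuming 32-bit limbs as on P5. */
typedef uint32_t limb_t;

/* rp[] = up[] + vp[], n limbs; returns the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;
        cy = (s < cy);            /* carry from adding the old carry */
        limb_t r = s + vp[i];
        cy += (r < s);            /* carry from adding vp[i] */
        rp[i] = r;
    }
    return cy;
}

/* rp[] = up[] << cnt for n >= 1 and 1 <= cnt <= 31; returns the bits
   shifted out of the top limb.  Works from the top limb down, so rp
   may overlap up as long as rp >= up. */
limb_t ref_lshift(limb_t *rp, const limb_t *up, size_t n, unsigned cnt)
{
    unsigned tnc = 32 - cnt;
    limb_t high = up[n - 1];
    limb_t retval = high >> tnc;
    for (size_t i = n - 1; i > 0; i--) {
        limb_t low = up[i - 1];
        rp[i] = (high << cnt) | (low >> tnc);
        high = low;
    }
    rp[0] = high << cnt;
    return retval;
}

/* rp[] = up[] * v; returns the high limb of the (n+1)-limb product. */
limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* One 32x32->64 multiply per limb; cannot overflow 64 bits. */
        uint64_t p = (uint64_t) up[i] * v + cy;
        rp[i] = (limb_t) p;
        cy = (limb_t) (p >> 32);
    }
    return cy;
}
```

Each iteration of ref_mul_1 does one full 32x32->64 multiply, which is why
the mul instruction's timing discussed in note 3 largely sets the per-limb
cost of mpn_mul_1 and its addmul/submul relatives.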



RELEVANT OPTIMIZATION ISSUES

1. Pentium doesn't allocate cache lines on write misses, unlike most other
modern processors.  Since the functions in the mpn class do array writes, we
have to handle allocating the destination cache lines ourselves, by reading a
word from each destination line in the loops, to achieve the best performance.

2. Pairing of memory operations requires that the two issued operations refer
to different cache banks.  The simplest way to ensure this is to read/write
two words from the same object, since their relative position, and hence
their banks, is then known.  If the two operations access different objects,
they might or might not hit the same cache bank, depending on the objects'
addresses.
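Both issues can be illustrated in C.  The sketch below assumes the P5 data
cache layout usually described for this chip (32-byte lines, 8 banks
interleaved on 4-byte boundaries); LINE_LIMBS, cache_bank, may_pair and
copy_with_touch are hypothetical names, not GMP functions.  The real code
does this in assembly; in C the compiler and the hardware decide what the
loads actually do, so this only shows the idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed P5 data cache layout (illustration only): 32-byte lines,
   8 banks interleaved on 4-byte boundaries. */
#define LINE_BYTES 32
#define LINE_LIMBS (LINE_BYTES / sizeof(uint32_t))   /* 8 limbs per line */

/* Bank index of an address: bits 2..4 under the assumed layout. */
unsigned cache_bank(uintptr_t addr)
{
    return (unsigned) ((addr >> 2) & 7);
}

/* Bank rule only (other pairing rules ignored): two memory operations
   can pair only if they fall in different banks. */
int may_pair(uintptr_t a, uintptr_t b)
{
    return cache_bank(a) != cache_bank(b);
}

/* Copy n limbs.  At the start of each 8-limb block (one cache line if
   dst is 32-byte aligned), read a destination word first so that the
   line is brought into the cache, since a write miss alone would not
   allocate it.  The volatile access keeps the compiler from dropping
   the otherwise unused load. */
void copy_with_touch(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i % LINE_LIMBS == 0)
            (void) *(volatile uint32_t *) &dst[i];
        dst[i] = src[i];
    }
}
```

Under this layout two adjacent limbs of one array always land in different
banks, which is why issuing paired accesses to the same object is the easy
way to avoid conflicts; two unrelated objects may or may not collide,
depending on their addresses.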



REFERENCES

"Intel Architecture Optimization Manual", 1997, order number 242816.  This
is mostly about P5; the parts about P6 aren't relevant.  Available on-line:

        http://download.intel.com/design/PentiumII/manuals/242816.htm



----------------
Local variables:
mode: text
fill-column: 76
End:
