This directory contains mpn functions optimized for Intel Pentium
processors.

RELEVANT OPTIMIZATION ISSUES

1. Unlike most other modern processors, the Pentium does not allocate a cache
line on a write miss.  Since the mpn functions write their results to arrays,
the loops have to bring each destination cache line into the cache themselves,
by reading a word from it, to achieve the best performance.
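
A minimal C sketch of the idea, assuming 32-bit limbs and a 32-byte cache
line.  copy_touch_dst is a hypothetical helper, not an mpn function, and the
real loops are written in assembly.  Before the first store into each
destination cache line it does a dummy read from that line, so the read miss
(which does allocate) fills the line and the stores that follow hit.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;        /* 32-bit limb, as on the Pentium */
#define CACHE_LINE_BYTES 32     /* assumed L1 data cache line size */

/* Hypothetical helper: copy n limbs from up[] to rp[].  Before the first
   write to each destination cache line, read one word from that line so
   that a read miss brings the line into the cache; the writes then hit. */
void copy_touch_dst(limb_t *rp, const limb_t *up, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i == 0 || (uintptr_t)(rp + i) % CACHE_LINE_BYTES == 0) {
            /* volatile access: the compiler must keep this dummy load */
            limb_t touched = *(volatile const limb_t *)(rp + i);
            (void)touched;
        }
        rp[i] = up[i];
    }
}
```

The dummy load sits right before the store here for clarity; a tuned loop
would presumably issue it some iterations earlier so that the miss latency
overlaps with other work.  On processors that allocate on write misses the
extra load buys nothing.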

2. Two memory operations can pair (issue together in the U and V pipes) only
if they refer to different cache banks.  The simplest way to ensure this is to
read or write two adjacent words of the same object.  If the two operations
access different objects, they may or may not fall in the same cache bank.
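
To make the bank rule concrete, here is a small C model.  It assumes the
commonly described Pentium data-cache layout of 8 banks, each 4 bytes wide and
interleaved, so address bits 2 to 4 select the bank.  cache_bank and
bank_conflict are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed data-cache layout: 8 interleaved banks, each 4 bytes wide,
   so the bank is selected by address bits 2..4. */
#define NUM_BANKS  8
#define BANK_BYTES 4

unsigned cache_bank(uintptr_t addr)
{
    return (unsigned)((addr / BANK_BYTES) % NUM_BANKS);
}

/* Under this simplified model: true if two memory operations at these
   addresses hit the same bank and therefore could not pair. */
bool bank_conflict(const void *a, const void *b)
{
    return cache_bank((uintptr_t)a) == cache_bank((uintptr_t)b);
}
```

Under this model two adjacent 32-bit words always land in different banks,
while words exactly 32 bytes apart always collide; for unrelated objects the
result depends on where each one happens to be placed.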

STATUS

1. mpn_lshift and mpn_rshift run at about 6 cycles/limb, but the Pentium
documentation indicates that they should take only 43/8 = 5.375 cycles/limb,
or 5 cycles/limb asymptotically.
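
For reference, the per-limb work being timed looks like this portable C
version of a left shift, assuming 32-bit limbs.  ref_lshift is a hypothetical
name; it follows the documented mpn_lshift contract (1 <= cnt < limb size,
bits shifted out returned in the low cnt bits) but is not GMP's code.  Each
result limb needs two shifts and an OR of two neighboring source limbs;
mpn_rshift is the mirror image.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;
#define LIMB_BITS 32

/* Shift {up, n} left by cnt bits (n >= 1, 1 <= cnt < LIMB_BITS) and store
   the result in {rp, n}.  Returns the bits shifted out of the top limb in
   the low cnt bits.  Working from the most significant limb down keeps
   overlapping operands with rp >= up safe. */
limb_t ref_lshift(limb_t *rp, const limb_t *up, size_t n, unsigned cnt)
{
    unsigned tnc = LIMB_BITS - cnt;
    limb_t high = up[n - 1];
    limb_t retval = high >> tnc;

    for (size_t i = n - 1; i > 0; i--) {
        limb_t low = up[i - 1];
        /* each result limb combines two source limbs: 2 shifts + 1 OR */
        rp[i] = (high << cnt) | (low >> tnc);
        high = low;
    }
    rp[0] = high << cnt;
    return retval;
}
```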

2. mpn_add_n and mpn_sub_n asymptotically run at 2 cycles/limb.  Because of
loop overhead and other delays (cache refill?), they run at or near 2.5
cycles/limb in practice.
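
A portable C sketch of what mpn_add_n computes, assuming 32-bit limbs.
ref_add_n is a hypothetical name and not GMP's implementation.  In x86
assembly the carry travels in the carry flag (adc), whereas C has to recover
it with comparisons.  mpn_sub_n is the same loop with borrows.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;

/* {rp, n} = {up, n} + {vp, n}; returns the carry out (0 or 1).
   Both inputs are read before rp[i] is written, so rp may equal up or vp. */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t u = up[i];
        limb_t s = u + vp[i];   /* may wrap around */
        limb_t c1 = s < u;      /* carry out of u + v */
        limb_t r = s + carry;
        limb_t c2 = r < s;      /* carry out of adding the incoming carry */
        rp[i] = r;
        carry = c1 | c2;        /* at most one of c1, c2 can be set */
    }
    return carry;
}
```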

3. mpn_mul_1, mpn_addmul_1, and mpn_submul_1 all run 1 cycle faster than they
should...
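
Likewise, a C sketch of the mpn_addmul_1 operation, again assuming 32-bit
limbs and using the hypothetical name ref_addmul_1: it adds {up, n} * v to
{rp, n} and returns the high limb that is left over.  mpn_mul_1 is the same
loop without the rp[i] addend, and mpn_submul_1 subtracts the product instead.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;

/* {rp, n} += {up, n} * v; returns the most significant limb of the
   product plus the carry out of the addition. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32->64 product plus two 32-bit addends cannot overflow:
           (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1 */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;          /* low half goes to the result */
        carry = (limb_t)(t >> 32);  /* high half carries into the next limb */
    }
    return carry;
}
```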
