Annotation of OpenXM_contrib/gmp/mpn/x86/pentium/README, Revision 1.1
This directory contains mpn functions optimized for Intel Pentium
processors.

RELEVANT OPTIMIZATION ISSUES

1. The Pentium does not allocate cache lines on write misses, unlike most other
modern processors. Since the functions in the mpn class write their results to
arrays, the loops must bring the destination cache lines into the cache
themselves, by reading a word from each destination line before writing to it,
to achieve the best performance.
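
As an illustration only, here is a portable C sketch of the touch-before-write
idea. It assumes 32-bit limbs and a 32-byte cache line, and copy_touching_dst is
a hypothetical name; the routines in this directory are written in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: 32-bit limbs and a 32-byte L1 data cache line. */
typedef uint32_t limb_t;
#define LINE_BYTES 32u

/* Hypothetical helper: copy n limbs from src to dst.  Before the first write
   into each destination cache line, read one word from that line.  On a cache
   that does not allocate lines on write misses, the read miss brings the line
   into the cache, so the writes that follow hit it. */
void copy_touching_dst(limb_t *dst, const limb_t *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        /* Touch at the start of the array and at every line boundary. */
        if (i == 0 || (uintptr_t)(dst + i) % LINE_BYTES == 0) {
            /* The volatile access keeps the compiler from deleting the load. */
            limb_t touched = *(volatile const limb_t *)(dst + i);
            (void)touched;
        }
        dst[i] = src[i];
    }
}
```

On a processor that does allocate on writes, the extra load per line is merely
redundant, which is why the trick can be kept in Pentium-specific code only.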

2. Two memory operations can only pair if they access different cache banks.
The simplest way to ensure this is to read or write two adjacent words of the
same object. If the two operations are on different objects, they may or may
not hit the same cache bank, depending on where those objects happen to lie in
memory.
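
The bank rule can be modelled in a few lines of C. The model is an assumption
made for illustration: 8 banks, each 4 bytes wide, with address bits 2 to 4
selecting the bank. bank_of and no_bank_conflict are hypothetical helpers, and
the check covers only the bank condition, not the Pentium's other pairing rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed model of the Pentium L1 data cache banking: 8 banks, each 4 bytes
   wide, so address bits 2..4 select the bank. */
#define BANK_WIDTH 4u
#define BANK_COUNT 8u

/* Hypothetical helper: bank used by a 4-byte-aligned 32-bit access. */
unsigned bank_of(uintptr_t addr)
{
    assert(addr % BANK_WIDTH == 0);  /* the model only covers aligned words */
    return (unsigned)((addr / BANK_WIDTH) % BANK_COUNT);
}

/* Hypothetical helper: true if two aligned accesses use different banks,
   which is necessary (but not sufficient) for them to pair. */
bool no_bank_conflict(uintptr_t a, uintptr_t b)
{
    return bank_of(a) != bank_of(b);
}
```

Under this model, consecutive aligned words always differ in address bits 2 to
4, so two adjacent words of one object never share a bank, whereas two words 32
bytes apart always do.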

STATUS

1. mpn_lshift and mpn_rshift run at about 6 cycles/limb, but the Pentium
documentation indicates that they should take only 43/8 = 5.375 cycles/limb,
or 5 cycles/limb asymptotically.

2. mpn_add_n and mpn_sub_n run asymptotically at 2 cycles/limb. Due to loop
overhead and other delays (cache refill?), in practice they run at or near 2.5
cycles/limb.
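
One way to read the "asymptotic" figures is a simple linear cost model, used
here only as an illustration: assume a call on n limbs takes t(n) = a n + c
cycles, where a is the steady-state cost per limb and c is a fixed per-call
cost.

```latex
t(n) = a\,n + c
\qquad\Longrightarrow\qquad
\frac{t(n)}{n} = a + \frac{c}{n} \;\longrightarrow\; a \quad (n \to \infty)
```

With a = 2 for mpn_add_n and mpn_sub_n, the observed 2.5 cycles/limb means the
c/n term, together with any per-limb stalls not counted in a (such as the
suspected cache refills), adds roughly half a cycle per limb.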

3. mpn_mul_1, mpn_addmul_1, and mpn_submul_1 all run 1 cycle faster than they
should...