Annotation of OpenXM_contrib/gmp/mpn/x86/pentium/README, Revision 1.1.1.2

INTEL PENTIUM P5 MPN SUBROUTINES


This directory contains mpn functions optimized for Intel Pentium (P5, P54)
processors.  The mmx subdirectory has code for Pentium with MMX (P55).


STATUS

                          cycles/limb

    mpn_add_n/sub_n          2.375

    mpn_copyi/copyd          1.0

    mpn_divrem_1            44.0
    mpn_mod_1               44.0
    mpn_divexact_by3        15.0

    mpn_l/rshift             5.375  normal (6.0 on P54)
                             1.875  special shift by 1 bit

    mpn_mul_1               13.0
    mpn_add/submul_1        14.0

    mpn_mul_basecase        14.2    cycles/crossproduct (approx)

    mpn_sqr_basecase         8      cycles/crossproduct (approx)
                          or 15.5   cycles/triangleproduct (approx)

Pentium MMX gets the following improvements

    mpn_l/rshift             1.75


1. mpn_lshift and mpn_rshift run at about 6 cycles/limb on P5 and P54, but
   the documentation indicates that they should take only 43/8 = 5.375
   cycles/limb, or 5 cycles/limb asymptotically.  The P55 runs them at the
   expected speed.

2. mpn_add_n and mpn_sub_n run at 2 cycles/limb asymptotically.  Due to
   loop overhead and other delays (cache refill?), they actually run at or
   near 2.5 cycles/limb.

3. mpn_mul_1, mpn_addmul_1 and mpn_submul_1 all run 1 cycle/limb faster
   than the documentation predicts.  Intel documentation says a mul
   instruction takes 10 cycles, but it measures 9, and the routines using
   it run as if it takes 9.


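For orientation, the per-limb work behind the mpn_addmul_1 figure can be
sketched in portable C.  This is only an illustration of the operation,
not the assembly code in this directory: the name ref_addmul_1 and the
limb_t typedef are hypothetical, and 32-bit limbs are assumed.  Each loop
iteration performs exactly one multiply, which is why the latency of the
mul instruction shows up directly in the cycles/limb count.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs, as on x86 */

/* Hypothetical reference routine: rp[i] += up[i] * v for 0 <= i < n.
   Returns the carry-out limb.  One multiply is done per limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 32x32 -> 64 bit product plus two 32-bit addends cannot
           overflow 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;            /* low half goes to the result  */
        carry = (limb_t)(t >> 32);    /* high half carries to next limb */
    }
    return carry;
}
```

A caller would pass a destination of n limbs and add the returned limb
into the next higher position of the result.
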

RELEVANT OPTIMIZATION ISSUES

1. Pentium doesn't allocate cache lines on writes, unlike most other modern
   processors.  Since the functions in the mpn class do array writes, we
   have to allocate the destination cache lines ourselves, by reading a
   word from each one in the loops, to achieve the best performance.

2. Pairing of memory operations requires that the two issued operations
   refer to different cache banks.  The simplest way to ensure this is to
   read/write two words from the same object.  If we make operations on
   different objects, they might or might not be to the same cache bank.

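The two issues above can be illustrated with a small C sketch.  It is not
taken from the code in this directory, and it rests on assumptions that
this README does not state: 32-byte data cache lines, and a data cache
split into 8 banks interleaved on 4-byte boundaries (so the bank is given
by address bits 2..4).  The function names are hypothetical.  The first
function reads one word of each destination line before storing to it; the
other two model which bank an address falls in.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;            /* assumption: 32-bit limbs */

enum {
    LINE_BYTES = 32,                /* assumed data cache line size   */
    BANK_BYTES = 4,                 /* assumed width of one bank      */
    NUM_BANKS  = 8                  /* assumed number of cache banks  */
};

/* Issue 1: copy n limbs, reading one word of each destination block
   first so that the line is brought into the cache before the stores.
   Blocks are counted from dst, so they match real cache lines only if
   dst is line-aligned. */
void copy_with_dst_touch(limb_t *dst, const limb_t *src, size_t n)
{
    const size_t per_line = LINE_BYTES / sizeof(limb_t);
    for (size_t i = 0; i < n; i += per_line) {
        /* volatile read so the compiler keeps the otherwise unused load */
        (void)*(volatile const limb_t *)(dst + i);
        size_t end = (n - i < per_line) ? n : i + per_line;
        for (size_t j = i; j < end; j++)
            dst[j] = src[j];
    }
}

/* Issue 2: bank index of an address under the assumed interleaving. */
unsigned cache_bank(uintptr_t addr)
{
    return (unsigned)((addr / BANK_BYTES) % NUM_BANKS);
}

/* Nonzero if the two addresses fall in different banks. */
int banks_differ(const void *a, const void *b)
{
    return cache_bank((uintptr_t)a) != cache_bank((uintptr_t)b);
}
```

Under this model two adjacent words of one array always land in different
banks, while words taken from two unrelated arrays may or may not.
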


REFERENCES

"Intel Architecture Optimization Manual", 1997, order number 242816.  This
is mostly about P5; the parts about P6 aren't relevant.  Available on-line:

        http://download.intel.com/design/PentiumII/manuals/242816.htm



----------------
Local variables:
mode: text
fill-column: 76
End: