
Diff for /OpenXM_contrib/gmp/mpn/x86/pentium/Attic/README between version 1.1.1.1 and 1.1.1.2

version 1.1.1.1, 2000/01/10 15:35:26
version 1.1.1.2, 2000/09/09 14:12:44
Line 1 (v.1.1.1.1) / Line 1 (v.1.1.1.2)

Removed from v.1.1.1.1:
 This directory contains mpn functions optimized for Intel Pentium
 processors.

Added in v.1.1.1.2:
   
                      INTEL PENTIUM P5 MPN SUBROUTINES
   
   
   This directory contains mpn functions optimized for Intel Pentium (P5,P54)
   processors.  The mmx subdirectory has code for Pentium with MMX (P55).
   
   
   STATUS
   
                                   cycles/limb
   
           mpn_add_n/sub_n            2.375
   
           mpn_copyi/copyd            1.0
   
           mpn_divrem_1              44.0
           mpn_mod_1                 44.0
           mpn_divexact_by3          15.0
   
           mpn_l/rshift               5.375 normal (6.0 on P54)
                                      1.875 special shift by 1 bit
   
           mpn_mul_1                 13.0
           mpn_add/submul_1          14.0
   
           mpn_mul_basecase          14.2 cycles/crossproduct (approx)
   
           mpn_sqr_basecase           8 cycles/crossproduct (approx)
                                      or 15.5 cycles/triangleproduct (approx)
   
   Pentium MMX gets the following improvements
   
           mpn_l/rshift               1.75
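
The per-crossproduct figures in the table can be turned into rough cycle
estimates for a whole basecase operation.  The sketch below is only an
illustration, not part of GMP: the function names are hypothetical, and it
assumes that an n x m limb multiply performs n*m crossproducts and that the
squaring figure is counted over n*n crossproducts.

```python
# Approximate figures copied from the STATUS table above.
MUL_BASECASE = 14.2  # cycles per crossproduct (approx)
SQR_BASECASE = 8.0   # cycles per crossproduct (approx)


def mul_basecase_cycles(n, m):
    """Rough cycle estimate for an n x m limb multiply (assumes n*m crossproducts)."""
    return MUL_BASECASE * n * m


def sqr_basecase_cycles(n):
    """Rough cycle estimate for squaring n limbs (assumes n*n crossproducts)."""
    return SQR_BASECASE * n * n


if __name__ == "__main__":
    n = 10
    print("mul %d x %d limbs: about %.0f cycles" % (n, n, mul_basecase_cycles(n, n)))
    print("sqr %d limbs:      about %.0f cycles" % (n, sqr_basecase_cycles(n)))
```

Under these assumptions squaring costs a little over half of a general
multiply of the same size, which is the point of having a separate
mpn_sqr_basecase.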
   
   
   1. mpn_lshift and mpn_rshift run at about 6 cycles/limb on P5 and P54, but the
   documentation indicates that they should take only 43/8 = 5.375 cycles/limb,
   or 5 cycles/limb asymptotically.  The P55 runs them at the expected speed.
   
   2. mpn_add_n and mpn_sub_n run at asymptotically 2 cycles/limb.  Due to loop
   overhead and other delays (cache refill?), they run at or near 2.5 cycles/limb.
   
   3. mpn_mul_1, mpn_addmul_1, mpn_submul_1 all run 1 cycle faster than they
   should.  Intel documentation says a mul instruction is 10 cycles, but it
   measures 9 and the routines using it run with it as 9.
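
The arithmetic behind notes 1 to 3 can be checked with a small sketch.  The
split of 43 cycles into 8 limbs at 5 cycles each plus 3 cycles of loop
overhead is an assumption made here for illustration; the notes only give the
totals.  The helper name cycles_per_limb is hypothetical as well.

```python
def cycles_per_limb(per_limb, overhead, unroll):
    """Average cycles per limb for a loop unrolled `unroll` times.

    Hypothetical model: each limb costs `per_limb` cycles and each pass
    through the loop adds a fixed `overhead`.
    """
    return (per_limb * unroll + overhead) / unroll


# Note 1: 43 cycles for 8 limbs, as quoted.
shift_quoted = 43 / 8

# Assumed split: 5 cycles/limb asymptotically, 3 cycles of overhead, 8 limbs.
shift_model = cycles_per_limb(5, 3, 8)

# Note 2 with the same assumed overhead: 2 cycles/limb asymptotically.
add_model = cycles_per_limb(2, 3, 8)

# Note 3: mul measures 9 cycles against a documented 10, so an estimate
# based on the documentation would be 1 cycle/limb above the measured 13.0.
mul_1_measured = 13.0
mul_1_documented = mul_1_measured + (10 - 9)

if __name__ == "__main__":
    print(shift_quoted, shift_model, add_model, mul_1_documented)
```

With these assumed inputs the model reproduces the 5.375 and 2.375 entries of
the table, and the per-limb cost approaches the asymptotic figure as the
overhead is spread over more limbs.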
   
   
   
Unchanged in both versions:
 RELEVANT OPTIMIZATION ISSUES

 1. Pentium doesn't allocate cache lines on writes, unlike most other modern

Line 13 (v.1.1.1.1) / Line 59 (v.1.1.1.2)  to different cache banks.  The simplest way to insure
 two words from the same object.  If we make operations on different objects,
 they might or might not be to the same cache bank.
   
Removed from v.1.1.1.1:
 STATUS

 1. mpn_lshift and mpn_rshift run at about 6 cycles/limb, but the Pentium
 documentation indicates that they should take only 43/8 = 5.375 cycles/limb,
 or 5 cycles/limb asymptotically.

 2. mpn_add_n and mpn_sub_n run at asymptotically 2 cycles/limb.  Due to loop
 overhead and other delays (cache refill?), they run at or near 2.5 cycles/limb.

 3. mpn_mul_1, mpn_addmul_1, mpn_submul_1 all run 1 cycle faster than they
 should...

Added in v.1.1.1.2:
   REFERENCES

   "Intel Architecture Optimization Manual", 1997, order number 242816.  This
   is mostly about P5, the parts about P6 aren't relevant.  Available on-line:
   
           http://download.intel.com/design/PentiumII/manuals/242816.htm
   
   
   
   ----------------
   Local variables:
   mode: text
   fill-column: 76
   End:

