
Diff for /OpenXM_contrib/gmp/mpn/x86/k6/Attic/README between version 1.1.1.1 and 1.1.1.2

--- version 1.1.1.1, 2000/09/09 14:12:42
+++ version 1.1.1.2, 2003/08/25 16:06:27
Line 1 in v1.1.1.1, line 1 in v1.1.1.2
+Copyright 2000, 2001 Free Software Foundation, Inc.
+
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as published by
+the Free Software Foundation; either version 2.1 of the License, or (at your
+option) any later version.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+License for more details.
+
+You should have received a copy of the GNU Lesser General Public License
+along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
+the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+02111-1307, USA.
+
+
+
+
                          AMD K6 MPN SUBROUTINES


Line 6 in v1.1.1.1, line 27 in v1.1.1.2
 This directory contains code optimized for AMD K6 CPUs, meaning K6, K6-2 and
 K6-3.

-The mmx and k62mmx subdirectories have routines using MMX instructions.  All
-K6s have MMX, the separate directories are just so that ./configure can omit
-them if the assembler doesn't support MMX.
+The mmx subdirectory has MMX code suiting plain K6, the k62mmx subdirectory
+has MMX code suiting K6-2 and K6-3.  All chips in the K6 family have MMX,
+the separate directories are just so that ./configure can omit them if the
+assembler doesn't support MMX.



Line 28 in v1.1.1.1, line 50 in v1.1.1.2: Times for the loops, with all code and data in L1 cache
         mpn_sqr_basecase           4.7  cycles/crossproduct (approx)
                                    or 9.2 cycles/triangleproduct (approx)

+        mpn_l/rshift               3.0
+
         mpn_divrem_1              20.0
         mpn_mod_1                 20.0
         mpn_divexact_by3          11.0

-        mpn_l/rshift               3.0
+        mpn_copyi                  1.0
+        mpn_copyd                  1.0

-        mpn_copyi/copyd            1.0
-
-        mpn_com_n                  1.5-1.85  \
-        mpn_and/andn/ior/xor_n     1.5-1.75  | varying with
-        mpn_iorn/xnor_n            2.0-2.25  | data alignment
-        mpn_nand/nior_n            2.0-2.25  /
-
-        mpn_popcount              12.5
-        mpn_hamdist               13.0


 K6-2 and K6-3 have dual-issue MMX and get the following improvements.

         mpn_l/rshift               1.75

-        mpn_copyi/copyd            0.56 or 1.0  \
-                                                |
-        mpn_com_n                  1.0-1.2      | varying with
-        mpn_and/andn/ior/xor_n     1.2-1.5      | data alignment
-        mpn_iorn/xnor_n            1.5-2.0      |
-        mpn_nand/nior_n            1.75-2.0     /
-
-        mpn_popcount               9.0
-        mpn_hamdist               11.5


 Prefetching of sources hasn't yet given any joy.  With the 3DNow "prefetch"
 instruction, code seems to run slower, and with just "mov" loads it doesn't
 seem faster.  Results so far are inconsistent.  The K6 does a hardware
Line 74 in v1.1.1.1, line 79 in v1.1.1.2: NOTES
 All K6 family chips have MMX, but only K6-2 and K6-3 have 3DNow.

 Plain K6 executes MMX instructions only in the X pipe, but K6-2 and K6-3 can
-execute them in both X and Y (and together).
+execute them in both X and Y (and in both together).

 Branch misprediction penalty is 1 to 4 cycles (Optimization Manual
 chapter 6 table 12).
Line 163 in v1.1.1.1, line 168 in v1.1.1.2: Addressing modes
   happens with forms like "0F opcode mod/rm" with mod/rm=00-xxx-100 since
   with mod=00 the sib determines whether there's a displacement.

-  This affects all MMX and 3DNow instructions, and others with an 0F prefix
+  This affects all MMX and 3DNow instructions, and others with an 0F prefix,
   like movzbl.  The modes affected are anything with an index and no
   displacement, or an index but no base, and this includes (%esp) which is
   really (,%esp,1).
Line 188 in v1.1.1.1, line 193 in v1.1.1.2: Various
 - femms     3 cycles
 - jecxz     2 cycles taken, 13 not taken (optimization manual says 7 not taken)
 - divl      20 cycles back-to-back
-- imull     2 decode, 2 execute
+- imull     2 decode, 3 execute
 - mull      2 decode, 3 execute (optimization manual decoding sample)
 - prefetch  2 cycles
 - rcll/rcrl implicit by one bit: 2 cycles
