version 1.1, 2000/09/09 14:12:42
version 1.1.1.2, 2003/08/25 16:06:27

Copyright 2000, 2001 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB. If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.



AMD K6 MPN SUBROUTINES


This directory contains code optimized for AMD K6 CPUs, meaning K6, K6-2 and
K6-3.

The mmx subdirectory has MMX code suiting plain K6, the k62mmx subdirectory
has MMX code suiting K6-2 and K6-3. All chips in the K6 family have MMX;
the separate directories are just so that ./configure can omit them if the
assembler doesn't support MMX.


Times for the loops, with all code and data in L1 cache

        mpn_sqr_basecase           4.7  cycles/crossproduct (approx)
                                or 9.2  cycles/triangleproduct (approx)

        mpn_l/rshift               3.0

        mpn_divrem_1              20.0
        mpn_mod_1                 20.0
        mpn_divexact_by3          11.0

        mpn_copyi                  1.0
        mpn_copyd                  1.0

        mpn_com_n                  1.5-1.85  \
        mpn_and/andn/ior/xor_n     1.5-1.75  | varying with
        mpn_iorn/xnor_n            2.0-2.25  | data alignment
        mpn_nand/nior_n            2.0-2.25  /

        mpn_popcount              12.5
        mpn_hamdist               13.0
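
As a rough illustration of how the figures above might be used, the sketch
below turns a per-limb cost into a cycle and time estimate for one call.
It assumes the unlabelled figures are cycles per limb, that call overhead is
negligible, and that everything stays in L1 cache. The function names and
the 400 MHz clock are hypothetical, chosen only for the example.

```python
# Per-limb costs copied from the table above.  Treating the unlabelled
# figures as cycles per limb is an assumption of this sketch.
CYCLES_PER_LIMB = {
    "mpn_lshift": 3.0,
    "mpn_divrem_1": 20.0,
    "mpn_mod_1": 20.0,
    "mpn_divexact_by3": 11.0,
    "mpn_copyi": 1.0,
}


def estimate_cycles(routine, limbs):
    """Rough cycle count for one call on `limbs` limbs, ignoring overhead."""
    return CYCLES_PER_LIMB[routine] * limbs


def estimate_microseconds(routine, limbs, clock_mhz):
    """Convert the cycle estimate to microseconds at the given clock rate."""
    return estimate_cycles(routine, limbs) / clock_mhz


if __name__ == "__main__":
    # A 100-limb mpn_divrem_1 on a hypothetical 400 MHz part.
    print(estimate_cycles("mpn_divrem_1", 100))
    print(estimate_microseconds("mpn_divrem_1", 100, 400))
```

Such an estimate is only a lower bound on real running time, since the table
covers the inner loops alone.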
|

K6-2 and K6-3 have dual-issue MMX and get the following improvements.

        mpn_l/rshift               1.75

        mpn_copyi/copyd            0.56 or 1.0  \
                                                |
        mpn_com_n                  1.0-1.2      | varying with
        mpn_and/andn/ior/xor_n     1.2-1.5      | data alignment
        mpn_iorn/xnor_n            1.5-2.0      |
        mpn_nand/nior_n            1.75-2.0     /

        mpn_popcount               9.0
        mpn_hamdist               11.5
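
To put the dual-issue MMX improvements in proportion, the following sketch
divides the plain K6 figure by the K6-2/K6-3 figure for the routines that
have a single value in both tables. It assumes both tables use the same
unit; the dictionary and function names are made up for the example.

```python
# Figures copied from the two tables above (plain K6 first, then K6-2/K6-3).
# Assumption: both tables are in the same unit, so the ratio is a speedup.
PLAIN_K6 = {"mpn_l/rshift": 3.0, "mpn_popcount": 12.5, "mpn_hamdist": 13.0}
K6_2_K6_3 = {"mpn_l/rshift": 1.75, "mpn_popcount": 9.0, "mpn_hamdist": 11.5}


def speedup(routine):
    """Ratio of plain K6 cost to K6-2/K6-3 cost for one routine."""
    return PLAIN_K6[routine] / K6_2_K6_3[routine]


if __name__ == "__main__":
    for name in PLAIN_K6:
        print(f"{name:14s} {speedup(name):.2f}x")
```

On these numbers the shifts gain the most, at roughly 1.7 times, while
mpn_hamdist gains only a little over 1.1 times.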
|

Prefetching of sources hasn't yet given any joy. With the 3DNow "prefetch"
instruction, code seems to run slower, and with just "mov" loads it doesn't
seem faster. Results so far are inconsistent. The K6 does a hardware

All K6 family chips have MMX, but only K6-2 and K6-3 have 3DNow.

Plain K6 executes MMX instructions only in the X pipe, but K6-2 and K6-3 can
execute them in both X and Y (and in both together).

Branch misprediction penalty is 1 to 4 cycles (Optimization Manual
chapter 6 table 12).

Addressing modes

happens with forms like "0F opcode mod/rm" with mod/rm=00-xxx-100 since
with mod=00 the sib determines whether there's a displacement.

This affects all MMX and 3DNow instructions, and others with an 0F prefix,
like movzbl. The modes affected are anything with an index and no
displacement, or an index but no base, and this includes (%esp) which is
really (,%esp,1).
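
The mod/rm pattern described above can be checked mechanically. The sketch
below splits a ModRM byte into its fields and flags the mod=00, rm=100 form,
which is the standard x86 encoding for "SIB byte follows, no displacement
implied by mod". The function names are hypothetical, and the sketch only
classifies encodings; it says nothing about what the K6 does with them.

```python
def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def is_mod00_sib_form(modrm):
    """True for mod/rm = 00-xxx-100, the form the text above singles out."""
    mod, _reg, rm = split_modrm(modrm)
    return mod == 0b00 and rm == 0b100


def sib_implies_disp32(sib):
    """With mod=00, a SIB base field of 101 means no base and a disp32."""
    return (sib & 0b111) == 0b101


if __name__ == "__main__":
    # movq (%esp),%mm0 assembles to 0F 6F 04 24: ModRM 0x04, SIB 0x24.
    print(is_mod00_sib_form(0x04), sib_implies_disp32(0x24))
    # movq (%eax),%mm0 assembles to 0F 6F 00: no SIB byte at all.
    print(is_mod00_sib_form(0x00))
    # movq 0(%esp),%mm0 with an explicit disp8 is 0F 6F 44 24 00: mod=01.
    print(is_mod00_sib_form(0x44))
```

Note that writing an explicit zero displacement changes mod from 00 to 01, so
the same operand no longer matches the flagged pattern.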

- femms 3 cycles
- jecxz 2 cycles taken, 13 not taken (optimization manual says 7 not taken)
- divl 20 cycles back-to-back
- imull 2 decode, 3 execute
- mull 2 decode, 3 execute (optimization manual decoding sample)
- prefetch 2 cycles
- rcll/rcrl implicit by one bit: 2 cycles