OpenXM_contrib/gmp/mpn/x86/k6/README - diff

Diff for /OpenXM_contrib/gmp/mpn/x86/k6/Attic/README between version 1.1.1.1 and 1.1.1.2

version 1.1.1.1, 2000/09/09 14:12:42

version 1.1.1.2, 2003/08/25 16:06:27

Line 1

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify

it under the terms of the GNU Lesser General Public License as published by

the Free Software Foundation; either version 2.1 of the License, or (at your

option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but

WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY

or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public

License for more details.

You should have received a copy of the GNU Lesser General Public License

along with the GNU MP Library; see the file COPYING.LIB. If not, write to

the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA

02111-1307, USA.

AMD K6 MPN SUBROUTINES

Line 6

Line 27

This directory contains code optimized for AMD K6 CPUs, meaning K6, K6-2 and

K6-3.

version 1.1.1.1:

The mmx and k62mmx subdirectories have routines using MMX instructions. All
K6s have MMX, the separate directories are just so that ./configure can omit
them if the assembler doesn't support MMX.

version 1.1.1.2:

The mmx subdirectory has MMX code suiting plain K6, the k62mmx subdirectory
has MMX code suiting K6-2 and K6-3. All chips in the K6 family have MMX,
the separate directories are just so that ./configure can omit them if the
assembler doesn't support MMX.

Line 28 Times for the loops, with all code and data in L1 cach

Line 50 Times for the loops, with all code and data in L1 cach

mpn_sqr_basecase 4.7 cycles/crossproduct (approx)

or 9.2 cycles/triangleproduct (approx)

mpn_l/rshift 3.0

mpn_divrem_1 20.0

mpn_mod_1 20.0

mpn_divexact_by3 11.0

mpn_l/rshift 3.0

mpn_copyi 1.0

mpn_copyd 1.0

mpn_copyi/copyd 1.0

mpn_com_n 1.5-1.85 \

mpn_and/andn/ior/xor_n 1.5-1.75 | varying with

mpn_iorn/xnor_n 2.0-2.25 | data alignment

mpn_nand/nior_n 2.0-2.25 /

mpn_popcount 12.5

mpn_hamdist 13.0

K6-2 and K6-3 have dual-issue MMX and get the following improvements.

mpn_l/rshift 1.75

mpn_copyi/copyd 0.56 or 1.0 \

mpn_com_n 1.0-1.2 | varying with

mpn_and/andn/ior/xor_n 1.2-1.5 | data alignment

mpn_iorn/xnor_n 1.5-2.0 |

mpn_nand/nior_n 1.75-2.0 /

mpn_popcount 9.0

mpn_hamdist 11.5
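
As a rough illustration of how the two tables compare, the sketch below turns a few of the figures into an estimated cycle count and a plain K6 versus K6-2/K6-3 ratio. It assumes the figures are cycles per limb with everything in L1 cache, and it ignores call and loop setup overhead. The names k6_timing, estimate_cycles and speedup are hypothetical and are not part of GMP.

```c
#include <stddef.h>

/* Loop timings copied from the tables above.
   Assumption: the unit is cycles per limb. */
struct k6_timing {
    const char *routine;
    double k6;     /* plain K6 */
    double k6_2;   /* K6-2 and K6-3 */
};

static const struct k6_timing k6_timings[] = {
    { "mpn_l/rshift",  3.0,  1.75 },
    { "mpn_popcount", 12.5,  9.0  },
    { "mpn_hamdist",  13.0, 11.5  },
};

/* Estimated cycles for an n-limb operand, ignoring call and
   loop setup overhead (hypothetical helper, not a GMP function). */
static double estimate_cycles(double cycles_per_limb, size_t n)
{
    return cycles_per_limb * (double)n;
}

/* Ratio of the plain K6 time to the K6-2/K6-3 time for one entry. */
static double speedup(const struct k6_timing *t)
{
    return t->k6 / t->k6_2;
}
```

Under those assumptions a 100-limb mpn_popcount comes out at about 1250 cycles on plain K6 and about 900 on K6-2 or K6-3. Real timings will differ once operands fall out of L1 cache.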

Prefetching of sources hasn't yet given any joy. With the 3DNow "prefetch"

instruction, code seems to run slower, and with just "mov" loads it doesn't

seem faster. Results so far are inconsistent. The K6 does a hardware

Line 74 NOTES

Line 79 NOTES

All K6 family chips have MMX, but only K6-2 and K6-3 have 3DNow.

Plain K6 executes MMX instructions only in the X pipe, but K6-2 and K6-3 can

version 1.1.1.1: execute them in both X and Y (and together).

version 1.1.1.2: execute them in both X and Y (and in both together).

Branch misprediction penalty is 1 to 4 cycles (Optimization Manual

chapter 6 table 12).

Line 163 Addressing modes

Line 168 Addressing modes

happens with forms like "0F opcode mod/rm" with mod/rm=00-xxx-100 since

with mod=00 the sib determines whether there's a displacement.

version 1.1.1.1: This affects all MMX and 3DNow instructions, and others with an 0F prefix

version 1.1.1.2: This affects all MMX and 3DNow instructions, and others with an 0F prefix,

like movzbl. The modes affected are anything with an index and no

displacement, or an index but no base, and this includes (%esp) which is

really (,%esp,1).
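
The mod/rm pattern named above can be recognised with a few bit tests. The sketch below is only an illustration of the standard x86 mod/rm and SIB field layout. The function names are hypothetical and nothing here is taken from the GMP sources.

```c
#include <stdint.h>

/* Fields of a mod/rm byte: mod (bits 7-6), reg (5-3), r/m (2-0). */
static unsigned modrm_mod(uint8_t modrm) { return modrm >> 6; }
static unsigned modrm_rm(uint8_t modrm)  { return modrm & 7; }

/* Nonzero when the byte has the form 00-xxx-100: mod=00 and r/m=100,
   so a SIB byte follows and decides whether there's a displacement. */
static int modrm_is_00_xxx_100(uint8_t modrm)
{
    return modrm_mod(modrm) == 0 && modrm_rm(modrm) == 4;
}

/* With mod=00, a SIB base field of 101 means "no base register,
   32-bit displacement follows", i.e. the index-but-no-base case. */
static int sib_is_no_base_mod00(uint8_t sib)
{
    return (sib & 7) == 5;
}
```

For example, movq (%esp),%mm0 assembles to 0F 6F 04 24, whose mod/rm byte 04 matches the pattern, while movq (%eax),%mm0 (0F 6F 00) and movq 4(%esp),%mm0 (0F 6F 44 24 04) do not.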

Line 188 Various

Line 193 Various

- femms 3 cycles

- jecxz 2 cycles taken, 13 not taken (optimization manual says 7 not taken)

- divl 20 cycles back-to-back

- imull 2 decode, 2 execute (version 1.1.1.1)

- imull 2 decode, 3 execute (version 1.1.1.2)

- mull 2 decode, 3 execute (optimization manual decoding sample)

- prefetch 2 cycles

- rcll/rcrl implicit by one bit: 2 cycles
