File: [local] / OpenXM_contrib / gmp / Attic / SPEED

Revision 1.1.1.1 (vendor branch), Mon Jan 10 15:35:21 2000 UTC by maekawa
Branch: GMP
CVS Tags: VERSION_2_0_2, RELEASE_20000124, RELEASE_1_1_2
Changes since 1.1: +0 -0 lines

Import gmp 2.0.2

==============================================================================
Cycle counts and throughput for low-level routines in GNU MP as currently
implemented.
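
For orientation, here is a minimal Python sketch of what one of the timed
routines, mpn_addmul_1, computes.  It is a reference model only, not GMP's
code: the name addmul_1, the list-of-limbs representation (least significant
limb first) and the limb_bits parameter are assumptions made for illustration.
Each loop iteration handles one limb, which is the unit the cycle counts
below are quoted per.

```python
def addmul_1(rp, s1p, s2limb, limb_bits=32):
    """Model of mpn_addmul_1: rp += s1p * s2limb, limb by limb.

    rp and s1p are lists of limbs, least significant first, of equal length.
    rp is updated in place; the return value is the carry-out limb.
    """
    mask = (1 << limb_bits) - 1
    carry = 0
    for i, limb in enumerate(s1p):
        # One limb-by-limb product plus the old destination limb and carry.
        t = limb * s2limb + rp[i] + carry
        rp[i] = t & mask          # low limb stays in the destination
        carry = t >> limb_bits    # high part propagates to the next limb
    return carry


rp = [0xFFFFFFFF, 0xFFFFFFFF]
carry = addmul_1(rp, [0xFFFFFFFF, 0xFFFFFFFF], 0xFFFFFFFF)
# rp is now [0, 0xFFFFFFFF] and carry is 0xFFFFFFFF
```

mpn_mul_1 is the same loop without the rp[i] term, and mpn_submul_1
subtracts the product instead of adding it.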

A range means that the timing is data-dependent.  The slower end of such a
range is usually the best performance estimate.

The throughput value, measured in Gb/s (gigabits per second), is meaningful
only for comparison between CPUs.
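
The file does not say how the Gb/s figures are derived.  The tabulated
numbers appear consistent with the conversion sketched below: for the shift
and add/subtract columns, clock rate times limb width divided by cycles per
limb; for the multiply columns, clock rate times limb width squared divided
by cycles per limb (that is, counting bit products).  This is an inference
from the numbers, and the function and parameter names are hypothetical.

```python
def throughput_gbps(clock_mhz, cycles_per_limb, limb_bits, multiply=False):
    """Convert a cycles-per-limb figure into Gb/s.

    Assumed convention (inferred from the tables, not stated in the file):
    shift/add/sub count limb_bits of work per limb; the multiply routines
    count limb_bits * limb_bits.
    """
    work_per_limb = limb_bits * limb_bits if multiply else limb_bits
    return clock_mhz * 1e6 * work_per_limb / cycles_per_limb / 1e9


# EV4 at 200MHz, mpn_lshift at 4.75 cycles/64b (tabulated as 2.7 Gb/s)
print(round(throughput_gbps(200, 4.75, 64), 2))                 # 2.69
# EV4 at 200MHz, mpn_mul_1 at 42 cycles/64b (tabulated as 20 Gb/s)
print(round(throughput_gbps(200, 42, 64, multiply=True), 1))    # 19.5
```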

A star before a line means that all values on that line are estimates.  A
star before a number means that that number is an estimate.  A `p' before a
number means that the code is not complete, but the timing is believed to be
accurate.
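
For readers who want to process the tables mechanically, the notation above
can be decoded with a small parser.  The sketch below is an illustration
under stated assumptions: parse_cell and its field names are hypothetical,
and it handles only the notation the legend defines (single values, hyphen
ranges, the star and `p' markers).  Slash-separated pairs and `?' entries
are not covered by the legend, so they are returned as None.

```python
import re

# marker(s), value or hyphen range, then a unit such as cycles/32b or c/32b
CELL = re.compile(
    r"^(?P<star>\*)?\s*(?P<p>p\s+)?"
    r"(?P<fast>\d+(?:\.\d+)?)(?:-(?P<slow>\d+(?:\.\d+)?))?"
    r"\s*(?:cycles|cycl|cyc|c)/(?P<bits>\d+)b$"
)


def parse_cell(cell):
    """Decode one cycle-count cell; return None if it is not recognized."""
    m = CELL.match(cell.strip())
    if m is None:
        return None
    fast = float(m.group("fast"))
    # A single value is treated as a range whose ends coincide.
    slow = float(m.group("slow")) if m.group("slow") else fast
    return {
        "fast": fast,
        "slow": slow,                              # usual best estimate
        "limb_bits": int(m.group("bits")),
        "estimate": m.group("star") is not None,   # star before the number
        "incomplete": m.group("p") is not None,    # `p' before the number
    }
```

For example, parse_cell("37-54 cycl/32b") yields fast 37.0 and slow 54.0
with limb_bits 32; following the rule above, the slow value is the one to
use as the performance estimate.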

	    |	mpn_lshift	mpn_add_n	mpn_mul_1	mpn_addmul_1
	    |	mpn_rshift	mpn_sub_n			mpn_submul_1
------------+-----------------------------------------------------------------
DEC/Alpha   |
EV4	    |	4.75 cycles/64b	7.75 cycles/64b	42 cycles/64b	42 cycles/64b
  200MHz    |	2.7 Gb/s	1.65 Gb/s	20 Gb/s		20 Gb/s
EV5 old code|	4.0 cycles/64b	5.5 cycles/64b	18 cycles/64b	18 cycles/64b
  267MHz    |	4.27 Gb/s	3.10 Gb/s	61 Gb/s		61 Gb/s
  417MHz    |	6.67 Gb/s	4.85 Gb/s	95 Gb/s		95 Gb/s
EV5 tuned   |	3.25 cycles/64b	4.75 cycles/64b
  267MHz    |	5.25 Gb/s	3.59 Gb/s		as above
  417MHz    |	8.21 Gb/s	5.61 Gb/s
------------+-----------------------------------------------------------------
Sun/SPARC   |
SPARC v7    |	14.0 cycles/32b	8.5 cycles/32b	37-54 cycl/32b	37-54 cycl/32b
SuperSPARC  |	3 cycles/32b	2.5 cycles/32b	8.2 cycles/32b	10.8 cycles/32b
  50MHz	    |	0.53 Gb/s	0.64 Gb/s	6.2 Gb/s	4.7 Gb/s
**SuperSPARC|		tuned addmul and submul will take:	9.25 cycles/32b
MicroSPARC2 |	?		6.65 cycles/32b	30 cycles/32b	31.5 cycles/32b
  110MHz    |	?		0.53 Gb/s	3.75 Gb/s	3.58 Gb/s
SuperSPARC2 |	?		?		?		?
Ultra/32 (4)|	2.5 cycles/32b	6.5 cycles/32b	13-27 cyc/32b	16-30 cyc/32b
  182MHz    |	2.33 Gb/s	0.896 Gb/s	14.3-6.9 Gb/s
Ultra/64 (5)|	2.5 cycles/64b	10 cycles/64b	40-70 cyc/64b	46-76 cyc/64b
  182MHz    |	4.66 Gb/s	1.16 Gb/s	18.6-11 Gb/s
HalSPARC64  |	?		?		?		?
------------+-----------------------------------------------------------------
SGI/MIPS    |
R3000	    |	6 cycles/32b	9.25 cycles/32b	16 cycles/32b	16 cycles/32b
  40MHz     |	0.21 Gb/s	0.14 Gb/s	2.56 Gb/s	2.56 Gb/s
R4400/32    |	8.6 cycles/32b	10 cycles/32b	16-18 cyc/32b	19-21 cyc/32b
  200MHz    |	0.74 Gb/s	0.64 Gb/s	13-11 Gb/s	11-9.6 Gb/s
*R4400/64   |	8.6 cycles/64b	10 cycles/64b	22 cycles/64b	22 cycles/64b
  *200MHz   |	1.48 Gb/s	1.28 Gb/s	37 Gb/s		37 Gb/s
R4600/32    |	6 cycles/32b	9.25 cycles/32b	15 cycles/32b	19 cycles/32b
  134MHz    |	0.71 Gb/s	0.46 Gb/s	9.1 Gb/s	7.2 Gb/s
R4600/64    |	6 cycles/64b	9.25 cycles/64b	?		?
  134MHz    |	1.4 Gb/s	0.93 Gb/s	?		?
R8000/64    |	3 cycles/64b	4.6 cycles/64b	8 cycles/64b	8 cycles/64b
  75MHz	    |	1.6 Gb/s	1.0 Gb/s	38 Gb/s		38 Gb/s
*R10000/64  |	2 cycles/64b	3 cycles/64b	11 cycles/64b	11 cycles/64b
  *200MHz   |	6.4 Gb/s	4.27 Gb/s	74 Gb/s		74 Gb/s
  *250MHz   |	8.0 Gb/s	5.33 Gb/s	93 Gb/s		93 Gb/s
------------+-----------------------------------------------------------------
Motorola    |
MC68020     |	?		24 cycles/32b	62 cycles/32b	70 cycles/32b
MC68040     |	?		6 cycles/32b	24 cycles/32b	25 cycles/32b
MC88100	    |	>5 cycles/32b	4.6 cycles/32b	16/21 cyc/32b	p 18/23 cyc/32b
MC88110  wt |	?		3.75 cycles/32b	6 cycles/32b	8.5 cyc/32b
*MC88110 wb |	?		2.25 cycles/32b	4 cycles/32b	5 cycles/32b
------------+-----------------------------------------------------------------
HP/PA-RISC  |
PA7000	    |	4 cycles/32b	5 cycles/32b	9 cycles/32b	11 cycles/32b
  67MHz	    |	0.53 Gb/s	0.43 Gb/s	7.6 Gb/s	6.2 Gb/s
PA7100	    |	3.25 cycles/32b	4.25 cycles/32b	7 cycles/32b	8 cycles/32b
  99MHz	    |	0.97 Gb/s	0.75 Gb/s	14 Gb/s		12.8 Gb/s
PA7100LC    |	?		?		?		?
PA7200  (3) |	3 cycles/32b	4 cycles/32b	7 cycles/32b	6.5 cycles/32b
  100MHz    |	1.07 Gb/s	0.80 Gb/s	14 Gb/s		15.8 Gb/s
PA7300LC    |	?		?		?		?
*PA8000	    |	3 cycles/64b	4 cycles/64b	7 cycles/64b	6.5 cycles/64b
  180MHz    |	3.84 Gb/s	2.88 Gb/s	105 Gb/s	113 Gb/s
------------+-----------------------------------------------------------------
Intel/x86   |
386DX	    |	20 cycles/32b	17 cycles/32b	41-70 cycl/32b	50-79 cycl/32b
  16.7MHz   |	0.027 Gb/s	0.031 Gb/s	0.42-0.24 Gb/s	0.34-0.22 Gb/s
486DX	    |	?		?		?		?
486DX4	    |	9.5 cycles/32b	9.25 cycles/32b	17-23 cycl/32b	20-26 cycl/32b
  100MHz    |	0.34 Gb/s	0.35 Gb/s	6.0-4.5 Gb/s	5.1-3.9 Gb/s
Pentium     |	2/6 cycles/32b	2.5 cycles/32b	13 cycles/32b	14 cycles/32b
  167MHz    |	2.7/0.89 Gb/s	2.1 Gb/s	13.1 Gb/s	12.2 Gb/s
Pentium Pro |	2.5 cycles/32b	3.5 cycles/32b	6 cycles/32b	9 cycles/32b
  200MHz    |	2.6 Gb/s	1.8 Gb/s	34 Gb/s		23 Gb/s
------------+-----------------------------------------------------------------
IBM/POWER   |
RIOS 1	    |	3 cycles/32b	4 cycles/32b	11.5-12.5 c/32b	14.5/15.5 c/32b
RIOS 2	    |	2 cycles/32b	2 cycles/32b	7 cycles/32b	8.5 cycles/32b
------------+-----------------------------------------------------------------
PowerPC	    |
PPC601  (1) |	3 cycles/32b	6 cycles/32b	11-16 cycl/32b	14-19 cycl/32b
PPC601  (2) |	5 cycles/32b	6 cycles/32b	13-22 cycl/32b	16-25 cycl/32b
  67MHz (2) |	0.43 Gb/s	0.36 Gb/s	5.3-3.0 Gb/s	4.3-2.7 Gb/s
PPC603	    |	?		?		?		?
*PPC604	    |	2 cycles/32b	3 cycles/32b	2 cycles/32b	3 cycles/32b
  *167MHz   |							57 Gb/s
PPC620	    |	?		?		?		?
------------+-----------------------------------------------------------------
Tege	    |
Model 1	    |	2 cycles/64b	3 cycles/64b	2 cycles/64b	3 cycles/64b
  250MHz    |	8 Gb/s		5.3 Gb/s	500 Gb/s	340 Gb/s
  500MHz    |	16 Gb/s		11 Gb/s		1000 Gb/s	680 Gb/s
____________|_________________________________________________________________
(1) Using POWER and PowerPC instructions
(2) Using only PowerPC instructions
(3) Actual timing for shift/add/sub depends on code alignment.  PA7000 code
    is smaller and therefore often faster on this CPU.
(4) Multiplication routines modified for bogus UltraSPARC early-out
    optimization.  The smaller operand is put in rs1, not in rs2 where it
    should go according to the SPARC architecture manuals.
(5) Preliminary timings, since there is no stable 64-bit environment.
(6) Use mulu.d at least for mpn_lshift.  With mak/extu/or, we can only get
    to 2 cycles/32b.

=============================================================================
Estimated theoretical asymptotic cycle counts for low-level routines:

	    |	mpn_lshift	mpn_add_n	mpn_mul_1	mpn_addmul_1
	    |	mpn_rshift	mpn_sub_n			mpn_submul_1
------------+-----------------------------------------------------------------
DEC/Alpha   |
EV4	    |	3 cycles/64b	5 cycles/64b	42 cycles/64b	42 cycles/64b
EV5	    |	3 cycles/64b	4 cycles/64b	18 cycles/64b	18 cycles/64b
------------+-----------------------------------------------------------------
Sun/SPARC   |
SuperSPARC  |	2.5 cycles/32b	2 cycles/32b	8 cycles/32b	9 cycles/32b
------------+-----------------------------------------------------------------
SGI/MIPS    |
R4400/32    |	5 cycles/32b	8 cycles/32b	16 cycles/32b	16 cycles/32b
R4400/64    |	5 cycles/64b	8 cycles/64b	22 cycles/64b	22 cycles/64b
R4600	    |
------------+-----------------------------------------------------------------
HP/PA-RISC  |
PA7100	    |	3 cycles/32b	4 cycles/32b	6.5 cycles/32b	7.5 cycles/32b
PA7100LC    |
------------+-----------------------------------------------------------------
Motorola    |
MC88110	    |	1.5 cyc/32b (6)	1.5 cycles/32b	1.5 cycles/32b	2.25 cycles/32b
------------+-----------------------------------------------------------------
Intel/x86   |
486DX4	    |
Pentium P5x |	5 cycles/32b	2 cycles/32b	11.5 cycles/32b	13 cycles/32b
Pentium Pro |	2 cycles/32b	3 cycles/32b	4 cycles/32b	6 cycles/32b
------------+-----------------------------------------------------------------
IBM/POWER   |
RIOS 1	    |	3 cycles/32b	4 cycles/32b
RIOS 2	    |	1.5 cycles/32b	2 cycles/32b	4.5 cycles/32b	5.5 cycles/32b
------------+-----------------------------------------------------------------
PowerPC	    |
PPC601  (1) |	3 cycles/32b	?4 cycles/32b
PPC601  (2) |	4 cycles/32b	?4 cycles/32b
____________|_________________________________________________________________
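
One way to read the two tables together is to ask how much of the measured
time the asymptotic estimate would save if it were reached.  The sketch
below is only an illustration: headroom and the dictionary layout are
hypothetical, and the sample values are copied from the measured and
asymptotic tables in this file.

```python
def headroom(measured_cycles, asymptotic_cycles):
    """Fraction of the measured time saved by reaching the asymptotic count."""
    return 1.0 - asymptotic_cycles / measured_cycles


# (measured, asymptotic) cycles per limb, taken from the two tables
samples = {
    ("EV4", "mpn_lshift"): (4.75, 3),
    ("EV4", "mpn_mul_1"): (42, 42),
    ("Pentium Pro", "mpn_mul_1"): (6, 4),
    ("RIOS 2", "mpn_addmul_1"): (8.5, 5.5),
}

for (cpu, routine), (measured, asymptotic) in samples.items():
    print(f"{cpu:12s} {routine:13s} {headroom(measured, asymptotic):.0%}")
```

A value of 0% (EV4 mpn_mul_1) means the current code already runs at the
estimated limit; larger values mark routines where tuning could still pay.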