Annotation of OpenXM_contrib/gmp/mpn/pa32/README, Revision 1.1.1.1
1.1 ohara 1: Copyright 1996, 1999, 2001 Free Software Foundation, Inc.
2:
3: This file is part of the GNU MP Library.
4:
5: The GNU MP Library is free software; you can redistribute it and/or modify
6: it under the terms of the GNU Lesser General Public License as published by
7: the Free Software Foundation; either version 2.1 of the License, or (at your
8: option) any later version.
9:
10: The GNU MP Library is distributed in the hope that it will be useful, but
11: WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
12: or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
13: License for more details.
14:
15: You should have received a copy of the GNU Lesser General Public License
16: along with the GNU MP Library; see the file COPYING.LIB. If not, write to
17: the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
18: 02111-1307, USA.
19:
20:
21:
22:
23:
24:
25: This directory contains mpn functions for various HP PA-RISC chips. Code
26: that runs faster on the PA7100 and later implementations is in the pa7100
27: directory.
28:
29: RELEVANT OPTIMIZATION ISSUES
30:
31: Load and Store timing
32:
33: On the PA7000, no memory instruction can issue in the two cycles following a
34: store. For the PA7100, this is reduced to one cycle.
35:
36: The PA7100 has a lockup-free (non-blocking) cache, so it helps to schedule a
37: load and its dependent instruction as far apart as possible.
38:
39: STATUS
40:
41: 1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
42: instructions below (some software pipelining is needed to avoid the
43: xmpyu-fstds delay):
44:
45: fldds s1_ptr
46:
47: xmpyu
48: fstds N(%r30)
49: xmpyu
50: fstds N(%r30)
51:
52: ldws N(%r30)
53: ldws N(%r30)
54: ldws N(%r30)
55: ldws N(%r30)
56:
57: addc
58: stws res_ptr
59: addc
60: stws res_ptr
61:
62: addib Loop
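
For reference, the arithmetic that the loop above has to perform can be
sketched in portable C. This is only an illustration of what mpn_mul_1
computes with 32-bit limbs, not the PA-RISC code itself; the names limb_t and
ref_mul_1 are made up for this sketch. The 32x32->64 bit product is the step
xmpyu performs, and the carry limb is what the addc chain propagates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical name: one 32-bit limb, as on pa32 */

/* Multiply {s1, n} by the single limb s2, store the n low limbs of the
   product at rp, and return the most significant (carry) limb.  */
limb_t ref_mul_1(limb_t *rp, const limb_t *s1, size_t n, limb_t s2)
{
    limb_t cy = 0;

    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        /* 32x32 -> 64 bit unsigned product (the work done by xmpyu).  */
        uint64_t prod = (uint64_t)s1[i] * s2;

        /* Add the carry limb from the previous step.  The sum is at most
           (2^32-1)^2 + (2^32-1), so it cannot overflow 64 bits.  */
        prod += cy;

        rp[i] = (limb_t)prod;          /* low half goes to the result */
        cy = (limb_t)(prod >> 32);     /* high half is the next carry */
    }
    return cy;
}
```

Each limb needs one multiply, one store of the 64-bit product and two word
loads to get it back into the integer registers, which is why the listing
above pairs every xmpyu with an fstds and two ldws.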
63:
64: 2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
65: (asymptotically) on the PA7100, using the instructions below. With proper
66: software pipelining and the unrolling level below, the speed becomes 8
67: cycles/limb.
68:
69: fldds s1_ptr
70: fldds s1_ptr
71:
72: xmpyu
73: fstds N(%r30)
74: xmpyu
75: fstds N(%r30)
76: xmpyu
77: fstds N(%r30)
78: xmpyu
79: fstds N(%r30)
80:
81: ldws N(%r30)
82: ldws N(%r30)
83: ldws N(%r30)
84: ldws N(%r30)
85: ldws N(%r30)
86: ldws N(%r30)
87: ldws N(%r30)
88: ldws N(%r30)
89: addc
90: addc
91: addc
92: addc
93: addc %r0,%r0,cy-limb
94:
95: ldws res_ptr
96: ldws res_ptr
97: ldws res_ptr
98: ldws res_ptr
99: add
100: stws res_ptr
101: addc
102: stws res_ptr
103: addc
104: stws res_ptr
105: addc
106: stws res_ptr
107:
108: addib
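
As with mpn_mul_1, a portable C sketch may help to show what the loop above
computes. It is an illustration under the assumption of 32-bit limbs, not the
assembly routine; limb_t and ref_addmul_1 are names made up here. Compared to
mpn_mul_1, each step also reads the existing result limb and adds it in,
which corresponds to the extra "ldws res_ptr" and add/addc group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical name: one 32-bit limb, as on pa32 */

/* Multiply {s1, n} by the single limb s2, add the product to {rp, n} in
   place, and return the carry-out limb.  */
limb_t ref_addmul_1(limb_t *rp, const limb_t *s1, size_t n, limb_t s2)
{
    limb_t cy = 0;

    assert(n >= 1);
    for (size_t i = 0; i < n; i++) {
        /* product + old result limb + incoming carry.  The maximum value is
           (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so 64 bits always suffice.  */
        uint64_t acc = (uint64_t)s1[i] * s2 + rp[i] + cy;

        rp[i] = (limb_t)acc;           /* low half replaces the result limb */
        cy = (limb_t)(acc >> 32);      /* high half is the next carry limb */
    }
    return cy;
}
```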
109:
110: 3. For the PA8000 we have to stick to 32-bit limbs until compiler support
111: emerges. But we want to use 64-bit operations whenever possible, in
112: particular for loads and stores. It is possible to handle mpn_add_n
113: efficiently by rotating (when s1/s2 are aligned) or by masking and bit-field
114: inserting (when they are not). The speed should double compared to the
115: code used today.
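
The idea of item 3 can be sketched in portable C: keep 32-bit limbs in memory
but add them two at a time with 64-bit arithmetic. This is a sketch under
stated assumptions, not the proposed PA8000 code; limb_t and ref_add_n_pairs
are made-up names. The explicit combination of two limbs below stands in for
a single 64-bit load. On a big-endian machine such a load places the lower
limb in the upper half of the register, which is presumably why a rotate is
needed in the aligned case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical name: one 32-bit limb */

/* Add {s1, n} and {s2, n}, store the n-limb sum at rp, return the carry
   (0 or 1).  Limbs are processed in pairs using 64-bit additions.  */
limb_t ref_add_n_pairs(limb_t *rp, const limb_t *s1, const limb_t *s2,
                       size_t n)
{
    limb_t cy = 0;
    size_t i = 0;

    assert(n >= 1);
    for (; i + 1 < n; i += 2) {
        /* Two adjacent limbs in arithmetic order: low limb in the low half.
           This models one 64-bit load followed by a rotate.  */
        uint64_t a = (uint64_t)s1[i] | ((uint64_t)s1[i + 1] << 32);
        uint64_t b = (uint64_t)s2[i] | ((uint64_t)s2[i + 1] << 32);

        uint64_t sum = a + b;
        limb_t c1 = sum < a;           /* carry out of a + b */
        sum += cy;
        limb_t c2 = sum < cy;          /* carry out of adding the old carry */

        rp[i] = (limb_t)sum;
        rp[i + 1] = (limb_t)(sum >> 32);
        cy = c1 | c2;                  /* at most one of the two can be set */
    }
    if (i < n) {
        /* Odd trailing limb: plain 32-bit step done in 64-bit arithmetic.  */
        uint64_t sum = (uint64_t)s1[i] + s2[i] + cy;
        rp[i] = (limb_t)sum;
        cy = (limb_t)(sum >> 32);
    }
    return cy;
}
```

Halving the number of loads, adds and stores per limb is what would make the
expected doubling in speed plausible.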
116:
117:
118:
119:
120: LABEL SYNTAX
121:
122: The HP-UX assembler takes labels starting in column 0 with no colon,
123:
124: L$loop ldws,mb -4(0,%r25),%r22
125:
126: Gas on hppa GNU/Linux, however, requires a colon,
127:
128: L$loop: ldws,mb -4(0,%r25),%r22
129:
130: Fortunately both accept a ".label" pseudo-op,
131:
132: .label L$loop
133: ldws,mb -4(0,%r25),%r22
134:
135:
136:
137:
138:
139: ----------------
140: Local variables:
141: mode: text
142: fill-column: 76
143: End: