Annotation of OpenXM_contrib/gmp/doc/projects.html, Revision 1.1.1.2
1.1 maekawa 1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
2: <html>
3: <head>
4: <title>
5: GMP Development Projects
6: </title>
7: </head>
8: <body bgcolor=pink>
9:
10: <center>
11: <h1>
12: GMP Development Projects
13: </h1>
14: </center>
15:
1.1.1.2 ! ohara 16: <font size=-1>
! 17: Copyright 2000, 2001, 2002 Free Software Foundation, Inc. <br><br>
! 18: This file is part of the GNU MP Library. <br><br>
! 19: The GNU MP Library is free software; you can redistribute it and/or modify
! 20: it under the terms of the GNU Lesser General Public License as published
! 21: by the Free Software Foundation; either version 2.1 of the License, or (at
! 22: your option) any later version. <br><br>
! 23: The GNU MP Library is distributed in the hope that it will be useful, but
! 24: WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
! 25: or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
! 26: License for more details. <br><br>
! 27: You should have received a copy of the GNU Lesser General Public License
! 28: along with the GNU MP Library; see the file COPYING.LIB. If not, write to
! 29: the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
! 30: MA 02111-1307, USA.
! 31: </font>
! 32:
! 33: <hr>
! 34: <!-- NB. timestamp updated automatically by emacs -->
1.1 maekawa 35: <comment>
1.1.1.2 ! ohara 36: This file current as of 14 May 2002. An up-to-date version is available at
1.1 maekawa 37: <a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
1.1.1.2 ! ohara 38: Please send comments about this page to
! 39: <a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
1.1 maekawa 40: </comment>
41:
42: <p> This file lists projects suitable for volunteers. Please see the
43: <a href="tasks.html">tasks file</a> for smaller tasks.
44:
45: <p> If you want to work on any of the projects below, please let tege@swox.com
46: know. If you want to help with a project that already somebody else is
47: working on, please talk to that person too, tege@swox.com can put you in
48: touch. (There are no email addresses of volunteers below, due to spamming
49: problems.)
50:
51: <ul>
1.1.1.2 ! ohara 52: <li> <strong>Faster multiplication</strong>
1.1 maekawa 53:
54: <p> The current multiplication code uses Karatsuba, 3-way Toom-Cook,
55: or Fermat FFT. Several new developments are desirable:
56:
57: <ol>
58: <li> Handle multiplication of operands with different digit count
59: better than today. We now split the operands in a very
60: inefficient way, see mpn/generic/mul.c.
61:
62: <li> Consider N-way Toom-Cook. See Knuth's Seminumerical
63: Algorithms for details on the method. Code implementing it
64: exists. This is asymptotically inferior to FFTs, but is finer
65: grained. A toom-4 might fit in between toom-3 and the FFTs
66: (or it might not).
67:
68: <li> It's possible CPU dependent effects like cache locality will
69: have a greater impact on speed than algorithmic improvements.
70:
1.1.1.2 ! ohara 71: <li> Add support for partial products, either a given number of low limbs
! 72: or high limbs of the result. A high partial product can be used by
! 73: <code>mpf_mul</code> now, a low half partial product might be of use
! 74: in a future sub-quadratic REDC. On small sizes a partial product
! 75: will be faster simply through fewer cross-products, similar to the
! 76: way squaring is faster. But work by Thom Mulders shows that for
! 77: Karatsuba and higher order algorithms the advantage is progressively
! 78: lost, so for large sizes partial products turn out to be no faster.
! 79:
1.1 maekawa 80: </ol>
81:
1.1.1.2 ! ohara 82: <p> Another possibility would be an optimized cube. In the basecase that
! 83: should definitely be able to save cross-products in a similar fashion to
! 84: squaring, but some investigation might be needed for how best to adapt
! 85: the higher-order algorithms. Not sure whether cubing or further small
! 86: powers have any particularly important uses though.
1.1 maekawa 87:
88: <p> <li> <strong>Assembly routines</strong>
89:
1.1.1.2 ! ohara 90: <p> Write new and improve existing assembly routines. The tests/devel
! 91: programs and the tune/speed.c and tune/many.pl programs are useful for
! 92: testing and timing the routines you write. See the README files in those
! 93: directories for more information.
1.1 maekawa 94:
95: <p> Please make sure your new routines are fast for these three situations:
96: <ol>
97: <li> Operands that fit into the cache.
98: <li> Small operands of less than, say, 10 limbs.
99: <li> Huge operands that does not fit into the cache.
100: </ol>
101:
102: <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
103: mpn_sqr_basecase. The latter two don't exist for all machines, while
104: mpn_addmul_1 exists for almost all machines.
105:
106: <p> Standard techniques for these routines are unrolling, software
107: pipelining, and specialization for common operand values. For machines with
108: poor integer multiplication, it is often possible to improve the performance
109: using floating-point operations, or SIMD operations such as MMX or Sun's VIS.
110:
111: <p> Using floating-point operations is interesting but somewhat tricky.
112: Since IEEE double has 53 bit of mantissa, one has to split the operands in
113: small prices, so that no result is greater than 2^53. For 32-bit computers,
114: splitting one operand into 16-bit pieces works. For 64-bit machines, one
115: operand can be split into 21-bit pieces and the other into 32-bit pieces. (A
116: 64-bit operand can be split into just three 21-bit pieces if one allows the
117: split operands to be negative!)
118:
119:
120: <p> <li> <strong>Faster extended GCD</strong>
121:
122: <p> We currently have extended GCD based on Lehmer's method.
123: But the binary method can quite easily be improved for bignums
124: in a similar way Lehmer improved Euclid's algorithm. The result should
125: beat Lehmer's method by about a factor of 2. Furthermore, the multiprecision
126: step of Lehmer's method and the binary method will be identical, so the
127: current code can mostly be reused. It should be possible to share code
128: between GCD and GCDEXT, and probably with Jacobi symbols too.
129:
1.1.1.2 ! ohara 130: <p> Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems
! 131: that the most likely algorithm doesn't kick in until about 3000 limbs.
! 132:
1.1 maekawa 133: <p> <li> <strong>Math functions for the mpf layer</strong>
134:
135: <p> Implement the functions of math.h for the GMP mpf layer! Check the book
136: "Pi and the AGM" by Borwein and Borwein for ideas how to do this. These
137: functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos,
138: cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
139:
1.1.1.2 ! ohara 140: <p> These are implemented in mpfr, redoing them in mpf might not be worth
! 141: bothering with, if the long term plan is to bring mpfr in as the new mpf.
! 142:
1.1 maekawa 143: <p> <li> <strong>Faster sqrt</strong>
144:
1.1.1.2 ! ohara 145: <p> The current code uses divisions, which are reasonably fast, but it'd be
! 146: possible to use only multiplications by computing 1/sqrt(A) using this
! 147: formula:
1.1 maekawa 148: <pre>
149: 2
150: x = x (3 - A x )/2.
151: i+1 i i </pre>
1.1.1.2 ! ohara 152: And the final result
1.1 maekawa 153: <pre>
154: sqrt(A) = A x .
155: n </pre>
1.1.1.2 ! ohara 156: That final multiply might be the full size of the input (though it might
! 157: only need the high half of that), so there may or may not be any speedup
! 158: overall.
1.1 maekawa 159:
160: <p> <li> <strong>Nth root</strong>
161:
162: <p> Implement, at the mpn level, a routine for computing the nth root of a
163: number. The routine should use Newton's method, preferably without using
164: division.
165:
166: <p> If the routine becomes fast enough, perhaps square roots could be computed
167: using this function.
168:
1.1.1.2 ! ohara 169: <p> <li> <strong>Quotient-Only Division</strong>
1.1 maekawa 170:
1.1.1.2 ! ohara 171: <p> Some work can be saved when only the quotient is required in a division,
! 172: basically the necessary correction -0, -1 or -2 to the estimated
! 173: quotient can almost always be determined from only a few limbs of
! 174: multiply and subtract, rather than forming a complete remainder. The
! 175: greatest savings are when the quotient is small compared to the dividend
! 176: and divisor.
! 177:
! 178: <p> Some code along these lines can be found in the current
! 179: <code>mpn_tdiv_qr</code>, though perhaps calculating bigger chunks of
! 180: remainder than might be strictly necessary. That function in its
! 181: current form actually then always goes on to calculate a full remainder.
! 182: Burnikel and Zeigler describe a similar approach for the divide and
! 183: conquer case.
! 184:
! 185: <p> <li> <strong>Sub-Quadratic REDC and Exact Division</strong>
! 186:
! 187: <p> <code>mpn_bdivmod</code> and the <code>redc</code> in
! 188: <code>mpz_powm</code> should use some sort of divide and conquer
! 189: algorithm. This would benefit <code>mpz_divexact</code>, and
! 190: <code>mpn_gcd</code> on large unequal size operands. See "Exact
! 191: Division with Karatsuba Complexity" by Jebelean for a (brief)
! 192: description.
! 193:
! 194: <p> Failing that, some sort of <code>DIVEXACT_THRESHOLD</code> could be
! 195: added to control whether <code>mpz_divexact</code> uses
! 196: <code>mpn_bdivmod</code> or <code>mpn_tdiv_qr</code>, since the latter
! 197: is faster on large divisors.
! 198:
! 199: <p> For the REDC, basically all that's needed is Montgomery's algorithm done
! 200: in multi-limb integers. R is multiple limbs, and the inverse and
! 201: operands are multi-precision.
! 202:
! 203: <p> For exact division the time to calculate a multi-limb inverse is not
! 204: amortized across many modular operations, but instead will probably
! 205: create a threshold below which the current style
! 206: <code>mpn_bdivmod</code> is best. There's also Krandick and Jebelean,
! 207: "Bidirectional Exact Integer Division" to basically use a low to high
! 208: exact division for the low half quotient, and a quotient-only division
! 209: for the high half.
! 210:
! 211: <p> It will be noted that low-half and high-half multiplies, and a low-half
! 212: square, can be used. These ought to each take as little as half the
! 213: time of a full multiply, or square, though work by Thom Mulders shows
! 214: the advantage is progressively lost as Karatsuba and higher algorithms
! 215: are applied.
! 216:
! 217: <p> <li> <strong>Exceptions</strong>
! 218:
! 219: <p> Some sort of scheme for exceptions handling would be desirable.
! 220: Presently the only thing documented is that divide by zero in GMP
! 221: functions provokes a deliberate machine divide by zero (on those systems
! 222: where such a thing exists at least). The global <code>gmp_errno</code>
! 223: is not actually documented, except for the old <code>gmp_randinit</code>
! 224: function. Being currently just a plain global means it's not
! 225: thread-safe.
! 226:
! 227: <p> The basic choices for exceptions are returning an error code or having
! 228: a handler function to be called. The disadvantage of error returns is
! 229: they have to be checked, leading to tedious and rarely executed code,
! 230: and strictly speaking such a scheme wouldn't be source or binary
! 231: compatible. The disadvantage of a handler function is that a
! 232: <code>longjmp</code> or similar recovery from it may be difficult. A
! 233: combination would be possible, for instance by allowing the handler to
! 234: return an error code.
! 235:
! 236: <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
! 237: normally be detected at the start of functions, so exception handling
! 238: would have a clean state. What's worth considering though is that the
! 239: GMP function detecting the exception may have been called via some third
! 240: party library or self contained application module, and hence have
! 241: various bits of state to be cleaned up above it. It'd be highly
! 242: desirable for an exceptions scheme to allow for such cleanups.
! 243:
! 244: <p> The C++ destructor mechanism could help with cleanups both internally
! 245: and externally, but being a plain C library we don't want to depend on
! 246: that.
! 247:
! 248: <p> A C++ <code>throw</code> might be a good optional extra exceptions
! 249: mechanism, perhaps under a build option. For GCC
! 250: <code>-fexceptions</code> will add the necessary frame information to
! 251: plain C code, or GMP could be compiled as C++.
! 252:
! 253: <p> Out-of-memory exceptions are expected to be handled by the
! 254: <code>mp_set_memory_functions</code> routines, rather than being a
! 255: prospective part of divide-by-zero etc. Some similar considerations
! 256: apply but what differs is that out-of-memory can arise deep within GMP
! 257: internals. Even fundamental routines like <code>mpn_add_n</code> and
! 258: <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
! 259: vector systems). Allowing for an error code return would require an
! 260: awful lot of checking internally. Perhaps it'd still be worthwhile, but
! 261: it'd be a lot of changes and the extra code would probably be rather
! 262: rarely executed in normal usages.
! 263:
! 264: <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
! 265: general, lead to memory leaks and may leave GMP variables operated on in
! 266: inconsistent states. Maybe it'd be possible to record recovery
! 267: information for use by the relevant allocate or reallocate function, but
! 268: that too would be a lot of changes.
! 269:
! 270: <p> One scheme for out-of-memory would be to note that all GMP allocations
! 271: go through the <code>mp_set_memory_functions</code> routines. So if the
! 272: application has an intended <code>setjmp</code> recovery point it can
! 273: record memory activity by GMP and abandon space allocated and variables
! 274: initialized after that point. This might be as simple as directing the
! 275: allocation functions to a separate pool, but in general would have the
! 276: disadvantage of needing application-level bookkeeping on top of the
! 277: normal system <code>malloc</code>. An advantage however is that it
! 278: needs nothing from GMP itself and on that basis doesn't burden
! 279: applications not needing recovery. Note that there's probably some
! 280: details to be worked out here about reallocs of existing variables, and
! 281: perhaps about copying or swapping between "permanent" and "temporary"
! 282: variables.
! 283:
! 284: <p> Applications desiring a fine-grained error control, for instance a
! 285: language interpreter, would very possibly not be well served by a scheme
! 286: requiring <code>longjmp</code>. Wrapping every GMP function call with a
! 287: <code>setjmp</code> would be very inconvenient.
! 288:
! 289: <p> Stack overflow is another possible exception, but perhaps not one that
! 290: can be easily detected in general. On i386 GNU/Linux for instance GCC
! 291: normally doesn't generate stack probes for an <code>alloca</code>, but
! 292: merely adjusts <code>%esp</code>. A big enough <code>alloca</code> can
! 293: miss the stack redzone and hit arbitrary data. GMP stack usage is
! 294: normally a function of operand size, knowing that might suffice for some
! 295: applications. Otherwise a fixed maximum usage can probably be obtained
! 296: by building with <code>--enable-alloca=malloc-reentrant</code> (or
! 297: <code>notreentrant</code>).
! 298:
! 299: <p> Actually recovering from stack overflow is of course another problem.
! 300: It might be possible to catch a <code>SIGSEGV</code> in the stack
! 301: redzone and do something in a <code>sigaltstack</code>, on systems which
! 302: have that, but recovery might otherwise not be possible. This is worth
! 303: bearing in mind because there's no point worrying about tight and
! 304: careful out-of-memory recovery if an out-of-stack is fatal.
1.1 maekawa 305:
306:
307: <p> <li> <strong>Test Suite</strong>
308:
309: <p> Add a test suite for old bugs. These tests wouldn't loop or use
310: random numbers, but just test for specific bugs found in the
311: past.
312:
313: <p> More importantly, improve the random number controls for test
314: collections:
315:
316: <ol>
317: <li> Use the new random number framework.
318: <li> Have every test accept a seed parameter.
319: <li> Allow `make check' to set the seed parameter.
320: <li> Add more tests for important, now untested functions.
321: </ol>
322:
323: <p> With this in place, it should be possible to rid GMP of
324: practically all bugs by having some dedicated GMP test machines
325: running "make check" with ever increasing seeds (and
326: periodically updating to the latest GMP).
327:
328: <p> If a few more ASSERTs were sprinkled through the code, running
329: some tests with --enable-assert might be useful, though not a
330: substitute for tests on the normal build.
331:
332: <p> An important feature of any random tests will be to record the
333: seeds used, and perhaps to snapshot operands before performing
334: each test, so any problem exposed can be reproduced.
335:
336:
1.1.1.2 ! ohara 337: <p> <li> <strong>Performance Tool</strong>
! 338:
! 339: <p> It'd be nice to have some sort of tool for getting an overview of
! 340: performance. Clearly a great many things could be done, but some
! 341: primary uses would be,
! 342:
! 343: <ol>
! 344: <li> Checking speed variations between compilers.
! 345: <li> Checking relative performance between systems or CPUs.
! 346: </ol>
! 347:
! 348: <p> A combination of measuring some fundamental routines and some
! 349: representative application routines might satisfy these.
1.1 maekawa 350:
1.1.1.2 ! ohara 351: <p> The tune/time.c routines would be the easiest way to get good
! 352: accurate measurements on lots of different systems. The high level
! 353: <code>speed_measure</code> may or may not suit, but the basic
! 354: <code>speed_starttime</code> and <code>speed_endtime</code> would cover
! 355: lots of portability and accuracy questions.
! 356:
! 357:
! 358: </ul>
! 359: <hr>
1.1 maekawa 360:
361: </body>
362: </html>
1.1.1.2 ! ohara 363:
! 364: <!--
! 365: Local variables:
! 366: eval: (add-hook 'write-file-hooks 'time-stamp)
! 367: time-stamp-start: "This file current as of "
! 368: time-stamp-format: "%:d %3b %:y"
! 369: time-stamp-end: "\\."
! 370: time-stamp-line-limit: 50
! 371: End:
! 372: -->
FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>