[BACK]Return to projects.html CVS log [TXT][DIR] Up to [local] / OpenXM_contrib / gmp / doc

Annotation of OpenXM_contrib/gmp/doc/projects.html, Revision 1.1.1.2

1.1       maekawa     1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
                      2: <html>
                      3: <head>
                      4:   <title>
                      5:     GMP Development Projects
                      6:   </title>
                      7: </head>
                      8: <body bgcolor=pink>
                      9:
                     10: <center>
                     11:   <h1>
                     12:     GMP Development Projects
                     13:   </h1>
                     14: </center>
                     15:
1.1.1.2 ! ohara      16: <font size=-1>
        !            17: Copyright 2000, 2001, 2002 Free Software Foundation, Inc. <br><br>
        !            18: This file is part of the GNU MP Library. <br><br>
        !            19: The GNU MP Library is free software; you can redistribute it and/or modify
        !            20: it under the terms of the GNU Lesser General Public License as published
        !            21: by the Free Software Foundation; either version 2.1 of the License, or (at
        !            22: your option) any later version. <br><br>
        !            23: The GNU MP Library is distributed in the hope that it will be useful, but
        !            24: WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
        !            25: or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
        !            26: License for more details. <br><br>
        !            27: You should have received a copy of the GNU Lesser General Public License
        !            28: along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
        !            29: the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
        !            30: MA 02111-1307, USA.
        !            31: </font>
        !            32:
        !            33: <hr>
        !            34: <!-- NB. timestamp updated automatically by emacs -->
1.1       maekawa    35: <comment>
1.1.1.2 ! ohara      36:   This file current as of 14 May 2002.  An up-to-date version is available at
1.1       maekawa    37:   <a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
1.1.1.2 ! ohara      38:   Please send comments about this page to
        !            39:   <a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
1.1       maekawa    40: </comment>
                     41:
                     42: <p> This file lists projects suitable for volunteers.  Please see the
                     43:     <a href="tasks.html">tasks file</a> for smaller tasks.
                     44:
                     45: <p> If you want to work on any of the projects below, please let tege@swox.com
                     46:     know.  If you want to help with a project that already somebody else is
                     47:     working on, please talk to that person too, tege@swox.com can put you in
                     48:     touch.  (There are no email addresses of volunteers below, due to spamming
                     49:     problems.)
                     50:
                     51: <ul>
1.1.1.2 ! ohara      52: <li> <strong>Faster multiplication</strong>
1.1       maekawa    53:
                     54:   <p> The current multiplication code uses Karatsuba, 3-way Toom-Cook,
                     55:       or Fermat FFT.  Several new developments are desirable:
                     56:
                     57:   <ol>
                     58:     <li> Handle multiplication of operands with different digit count
                     59:         better than today.  We now split the operands in a very
                     60:         inefficient way, see mpn/generic/mul.c.
                     61:
                     62:     <li> Consider N-way Toom-Cook.  See Knuth's Seminumerical
                     63:         Algorithms for details on the method.  Code implementing it
                     64:         exists.  This is asymptotically inferior to FFTs, but is finer
                     65:         grained.  A toom-4 might fit in between toom-3 and the FFTs
                     66:         (or it might not).
                     67:
                     68:     <li> It's possible CPU dependent effects like cache locality will
                     69:          have a greater impact on speed than algorithmic improvements.
                     70:
1.1.1.2 ! ohara      71:     <li> Add support for partial products, either a given number of low limbs
        !            72:          or high limbs of the result.  A high partial product can be used by
        !            73:          <code>mpf_mul</code> now, a low half partial product might be of use
        !            74:          in a future sub-quadratic REDC.  On small sizes a partial product
        !            75:          will be faster simply through fewer cross-products, similar to the
        !            76:          way squaring is faster.  But work by Thom Mulders shows that for
        !            77:          Karatsuba and higher order algorithms the advantage is progressively
        !            78:          lost, so for large sizes partial products turn out to be no faster.
        !            79:
1.1       maekawa    80:   </ol>
                     81:
1.1.1.2 ! ohara      82:   <p> Another possibility would be an optimized cube.  In the basecase that
        !            83:       should definitely be able to save cross-products in a similar fashion to
        !            84:       squaring, but some investigation might be needed for how best to adapt
        !            85:       the higher-order algorithms.  Not sure whether cubing or further small
        !            86:       powers have any particularly important uses though.
1.1       maekawa    87:
                     88: <p> <li> <strong>Assembly routines</strong>
                     89:
1.1.1.2 ! ohara      90:   <p> Write new and improve existing assembly routines.  The tests/devel
        !            91:   programs and the tune/speed.c and tune/many.pl programs are useful for
        !            92:   testing and timing the routines you write.  See the README files in those
        !            93:   directories for more information.
1.1       maekawa    94:
                     95:   <p> Please make sure your new routines are fast for these three situations:
                     96:   <ol>
                     97:     <li> Operands that fit into the cache.
                     98:     <li> Small operands of less than, say, 10 limbs.
                     99:     <li> Huge operands that does not fit into the cache.
                    100:   </ol>
                    101:
                    102:   <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
                    103:   mpn_sqr_basecase.  The latter two don't exist for all machines, while
                    104:   mpn_addmul_1 exists for almost all machines.
                    105:
                    106:   <p> Standard techniques for these routines are unrolling, software
                    107:   pipelining, and specialization for common operand values.  For machines with
                    108:   poor integer multiplication, it is often possible to improve the performance
                    109:   using floating-point operations, or SIMD operations such as MMX or Sun's VIS.
                    110:
                    111:   <p> Using floating-point operations is interesting but somewhat tricky.
                    112:   Since IEEE double has 53 bit of mantissa, one has to split the operands in
                    113:   small prices, so that no result is greater than 2^53.  For 32-bit computers,
                    114:   splitting one operand into 16-bit pieces works.  For 64-bit machines, one
                    115:   operand can be split into 21-bit pieces and the other into 32-bit pieces.  (A
                    116:   64-bit operand can be split into just three 21-bit pieces if one allows the
                    117:   split operands to be negative!)
                    118:
                    119:
                    120: <p> <li> <strong>Faster extended GCD</strong>
                    121:
                    122:   <p> We currently have extended GCD based on Lehmer's method.
                    123:   But the binary method can quite easily be improved for bignums
                    124:   in a similar way Lehmer improved Euclid's algorithm.  The result should
                    125:   beat Lehmer's method by about a factor of 2.  Furthermore, the multiprecision
                    126:   step of Lehmer's method and the binary method will be identical, so the
                    127:   current code can mostly be reused.  It should be possible to share code
                    128:   between GCD and GCDEXT, and probably with Jacobi symbols too.
                    129:
1.1.1.2 ! ohara     130:   <p> Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems
        !           131:   that the most likely algorithm doesn't kick in until about 3000 limbs.
        !           132:
1.1       maekawa   133: <p> <li> <strong>Math functions for the mpf layer</strong>
                    134:
                    135:   <p> Implement the functions of math.h for the GMP mpf layer!  Check the book
                    136:   "Pi and the AGM" by Borwein and Borwein for ideas how to do this.  These
                    137:   functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos,
                    138:   cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
                    139:
1.1.1.2 ! ohara     140:   <p> These are implemented in mpfr, redoing them in mpf might not be worth
        !           141:   bothering with, if the long term plan is to bring mpfr in as the new mpf.
        !           142:
1.1       maekawa   143: <p> <li> <strong>Faster sqrt</strong>
                    144:
1.1.1.2 ! ohara     145:   <p> The current code uses divisions, which are reasonably fast, but it'd be
        !           146:   possible to use only multiplications by computing 1/sqrt(A) using this
        !           147:   formula:
1.1       maekawa   148:   <pre>
                    149:                                     2
                    150:                    x   = x  (3 - A x )/2.
                    151:                     i+1   i         i  </pre>
1.1.1.2 ! ohara     152:   And the final result
1.1       maekawa   153:   <pre>
                    154:                      sqrt(A) = A x .
                    155:                                   n  </pre>
1.1.1.2 ! ohara     156:   That final multiply might be the full size of the input (though it might
        !           157:   only need the high half of that), so there may or may not be any speedup
        !           158:   overall.
1.1       maekawa   159:
                    160: <p> <li> <strong>Nth root</strong>
                    161:
                    162:   <p> Implement, at the mpn level, a routine for computing the nth root of a
                    163:   number.  The routine should use Newton's method, preferably without using
                    164:   division.
                    165:
                    166:   <p> If the routine becomes fast enough, perhaps square roots could be computed
                    167:   using this function.
                    168:
1.1.1.2 ! ohara     169: <p> <li> <strong>Quotient-Only Division</strong>
1.1       maekawa   170:
1.1.1.2 ! ohara     171:   <p> Some work can be saved when only the quotient is required in a division,
        !           172:       basically the necessary correction -0, -1 or -2 to the estimated
        !           173:       quotient can almost always be determined from only a few limbs of
        !           174:       multiply and subtract, rather than forming a complete remainder.  The
        !           175:       greatest savings are when the quotient is small compared to the dividend
        !           176:       and divisor.
        !           177:
        !           178:   <p> Some code along these lines can be found in the current
        !           179:       <code>mpn_tdiv_qr</code>, though perhaps calculating bigger chunks of
        !           180:       remainder than might be strictly necessary.  That function in its
        !           181:       current form actually then always goes on to calculate a full remainder.
        !           182:       Burnikel and Zeigler describe a similar approach for the divide and
        !           183:       conquer case.
        !           184:
        !           185: <p> <li> <strong>Sub-Quadratic REDC and Exact Division</strong>
        !           186:
        !           187:   <p> <code>mpn_bdivmod</code> and the <code>redc</code> in
        !           188:       <code>mpz_powm</code> should use some sort of divide and conquer
        !           189:       algorithm.  This would benefit <code>mpz_divexact</code>, and
        !           190:       <code>mpn_gcd</code> on large unequal size operands.  See "Exact
        !           191:       Division with Karatsuba Complexity" by Jebelean for a (brief)
        !           192:       description.
        !           193:
        !           194:   <p> Failing that, some sort of <code>DIVEXACT_THRESHOLD</code> could be
        !           195:       added to control whether <code>mpz_divexact</code> uses
        !           196:       <code>mpn_bdivmod</code> or <code>mpn_tdiv_qr</code>, since the latter
        !           197:       is faster on large divisors.
        !           198:
        !           199:   <p> For the REDC, basically all that's needed is Montgomery's algorithm done
        !           200:       in multi-limb integers.  R is multiple limbs, and the inverse and
        !           201:       operands are multi-precision.
        !           202:
        !           203:   <p> For exact division the time to calculate a multi-limb inverse is not
        !           204:       amortized across many modular operations, but instead will probably
        !           205:       create a threshold below which the current style
        !           206:       <code>mpn_bdivmod</code> is best.  There's also Krandick and Jebelean,
        !           207:       "Bidirectional Exact Integer Division" to basically use a low to high
        !           208:       exact division for the low half quotient, and a quotient-only division
        !           209:       for the high half.
        !           210:
        !           211:   <p> It will be noted that low-half and high-half multiplies, and a low-half
        !           212:       square, can be used.  These ought to each take as little as half the
        !           213:       time of a full multiply, or square, though work by Thom Mulders shows
        !           214:       the advantage is progressively lost as Karatsuba and higher algorithms
        !           215:       are applied.
        !           216:
        !           217: <p> <li> <strong>Exceptions</strong>
        !           218:
        !           219:   <p> Some sort of scheme for exceptions handling would be desirable.
        !           220:       Presently the only thing documented is that divide by zero in GMP
        !           221:       functions provokes a deliberate machine divide by zero (on those systems
        !           222:       where such a thing exists at least).  The global <code>gmp_errno</code>
        !           223:       is not actually documented, except for the old <code>gmp_randinit</code>
        !           224:       function.  Being currently just a plain global means it's not
        !           225:       thread-safe.
        !           226:
        !           227:   <p> The basic choices for exceptions are returning an error code or having
        !           228:       a handler function to be called.  The disadvantage of error returns is
        !           229:       they have to be checked, leading to tedious and rarely executed code,
        !           230:       and strictly speaking such a scheme wouldn't be source or binary
        !           231:       compatible.  The disadvantage of a handler function is that a
        !           232:       <code>longjmp</code> or similar recovery from it may be difficult.  A
        !           233:       combination would be possible, for instance by allowing the handler to
        !           234:       return an error code.
        !           235:
        !           236:   <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
        !           237:       normally be detected at the start of functions, so exception handling
        !           238:       would have a clean state.  What's worth considering though is that the
        !           239:       GMP function detecting the exception may have been called via some third
        !           240:       party library or self contained application module, and hence have
        !           241:       various bits of state to be cleaned up above it.  It'd be highly
        !           242:       desirable for an exceptions scheme to allow for such cleanups.
        !           243:
        !           244:   <p> The C++ destructor mechanism could help with cleanups both internally
        !           245:       and externally, but being a plain C library we don't want to depend on
        !           246:       that.
        !           247:
        !           248:   <p> A C++ <code>throw</code> might be a good optional extra exceptions
        !           249:       mechanism, perhaps under a build option.  For GCC
        !           250:       <code>-fexceptions</code> will add the necessary frame information to
        !           251:       plain C code, or GMP could be compiled as C++.
        !           252:
        !           253:   <p> Out-of-memory exceptions are expected to be handled by the
        !           254:       <code>mp_set_memory_functions</code> routines, rather than being a
        !           255:       prospective part of divide-by-zero etc.  Some similar considerations
        !           256:       apply but what differs is that out-of-memory can arise deep within GMP
        !           257:       internals.  Even fundamental routines like <code>mpn_add_n</code> and
        !           258:       <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
        !           259:       vector systems).  Allowing for an error code return would require an
        !           260:       awful lot of checking internally.  Perhaps it'd still be worthwhile, but
        !           261:       it'd be a lot of changes and the extra code would probably be rather
        !           262:       rarely executed in normal usages.
        !           263:
        !           264:   <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
        !           265:       general, lead to memory leaks and may leave GMP variables operated on in
        !           266:       inconsistent states.  Maybe it'd be possible to record recovery
        !           267:       information for use by the relevant allocate or reallocate function, but
        !           268:       that too would be a lot of changes.
        !           269:
        !           270:   <p> One scheme for out-of-memory would be to note that all GMP allocations
        !           271:       go through the <code>mp_set_memory_functions</code> routines.  So if the
        !           272:       application has an intended <code>setjmp</code> recovery point it can
        !           273:       record memory activity by GMP and abandon space allocated and variables
        !           274:       initialized after that point.  This might be as simple as directing the
        !           275:       allocation functions to a separate pool, but in general would have the
        !           276:       disadvantage of needing application-level bookkeeping on top of the
        !           277:       normal system <code>malloc</code>.  An advantage however is that it
        !           278:       needs nothing from GMP itself and on that basis doesn't burden
        !           279:       applications not needing recovery.  Note that there's probably some
        !           280:       details to be worked out here about reallocs of existing variables, and
        !           281:       perhaps about copying or swapping between "permanent" and "temporary"
        !           282:       variables.
        !           283:
        !           284:   <p> Applications desiring a fine-grained error control, for instance a
        !           285:       language interpreter, would very possibly not be well served by a scheme
        !           286:       requiring <code>longjmp</code>.  Wrapping every GMP function call with a
        !           287:       <code>setjmp</code> would be very inconvenient.
        !           288:
        !           289:   <p> Stack overflow is another possible exception, but perhaps not one that
        !           290:       can be easily detected in general.  On i386 GNU/Linux for instance GCC
        !           291:       normally doesn't generate stack probes for an <code>alloca</code>, but
        !           292:       merely adjusts <code>%esp</code>.  A big enough <code>alloca</code> can
        !           293:       miss the stack redzone and hit arbitrary data.  GMP stack usage is
        !           294:       normally a function of operand size, knowing that might suffice for some
        !           295:       applications.  Otherwise a fixed maximum usage can probably be obtained
        !           296:       by building with <code>--enable-alloca=malloc-reentrant</code> (or
        !           297:       <code>notreentrant</code>).
        !           298:
        !           299:   <p> Actually recovering from stack overflow is of course another problem.
        !           300:       It might be possible to catch a <code>SIGSEGV</code> in the stack
        !           301:       redzone and do something in a <code>sigaltstack</code>, on systems which
        !           302:       have that, but recovery might otherwise not be possible.  This is worth
        !           303:       bearing in mind because there's no point worrying about tight and
        !           304:       careful out-of-memory recovery if an out-of-stack is fatal.
1.1       maekawa   305:
                    306:
                    307: <p> <li> <strong>Test Suite</strong>
                    308:
                    309:   <p> Add a test suite for old bugs.  These tests wouldn't loop or use
                    310:       random numbers, but just test for specific bugs found in the
                    311:       past.
                    312:
                    313:   <p> More importantly, improve the random number controls for test
                    314:       collections:
                    315:
                    316:       <ol>
                    317:         <li> Use the new random number framework.
                    318:         <li> Have every test accept a seed parameter.
                    319:         <li> Allow `make check' to set the seed parameter.
                    320:         <li> Add more tests for important, now untested functions.
                    321:       </ol>
                    322:
                    323:   <p> With this in place, it should be possible to rid GMP of
                    324:       practically all bugs by having some dedicated GMP test machines
                    325:       running "make check" with ever increasing seeds (and
                    326:       periodically updating to the latest GMP).
                    327:
                    328:   <p> If a few more ASSERTs were sprinkled through the code, running
                    329:       some tests with --enable-assert might be useful, though not a
                    330:       substitute for tests on the normal build.
                    331:
                    332:   <p> An important feature of any random tests will be to record the
                    333:       seeds used, and perhaps to snapshot operands before performing
                    334:       each test, so any problem exposed can be reproduced.
                    335:
                    336:
1.1.1.2 ! ohara     337: <p> <li> <strong>Performance Tool</strong>
        !           338:
        !           339:   <p> It'd be nice to have some sort of tool for getting an overview of
        !           340:       performance.  Clearly a great many things could be done, but some
        !           341:       primary uses would be,
        !           342:
        !           343:       <ol>
        !           344:         <li> Checking speed variations between compilers.
        !           345:         <li> Checking relative performance between systems or CPUs.
        !           346:       </ol>
        !           347:
        !           348:   <p> A combination of measuring some fundamental routines and some
        !           349:       representative application routines might satisfy these.
1.1       maekawa   350:
1.1.1.2 ! ohara     351:   <p> The tune/time.c routines would be the easiest way to get good
        !           352:       accurate measurements on lots of different systems.  The high level
        !           353:       <code>speed_measure</code> may or may not suit, but the basic
        !           354:       <code>speed_starttime</code> and <code>speed_endtime</code> would cover
        !           355:       lots of portability and accuracy questions.
        !           356:
        !           357:
        !           358: </ul>
        !           359: <hr>
1.1       maekawa   360:
                    361: </body>
                    362: </html>
1.1.1.2 ! ohara     363:
        !           364: <!--
        !           365: Local variables:
        !           366: eval: (add-hook 'write-file-hooks 'time-stamp)
        !           367: time-stamp-start: "This file current as of "
        !           368: time-stamp-format: "%:d %3b %:y"
        !           369: time-stamp-end: "\\."
        !           370: time-stamp-line-limit: 50
        !           371: End:
        !           372: -->

FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>