This file lists projects suitable for volunteers. Please see the tasks file for smaller tasks.
If you want to work on any of the projects below, please let tege@swox.com know. If you want to help with a project that somebody else is already working on, please talk to that person too; tege@swox.com can put you in touch. (There are no email addresses of volunteers below, due to spamming problems.)
The current multiplication code uses Karatsuba, 3-way Toom-Cook, or Fermat FFT. Several new developments are desirable:
A partial product, computing just the high or low half of a full product, would also be worthwhile. A high half product could be used by mpf_mul, and a low half partial product might be of use in a future sub-quadratic REDC. On small sizes a partial product will be faster simply through fewer cross-products, similar to the way squaring is faster. But work by Thom Mulders shows that for Karatsuba and higher order algorithms the advantage is progressively lost, so for large sizes partial products turn out to be no faster.
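To make the saving concrete, a basecase low half product only needs the cross-products u[i]*v[j] with i+j < n, roughly half of a full basecase product. A hypothetical sketch with 32-bit limbs (mullo_basecase is an illustrative name, not a GMP function):

```c
#include <stdint.h>
#include <assert.h>

typedef uint32_t limb_t;

/* Compute only the least significant n limbs of {up,n} * {vp,n},
   i.e. the product mod 2^(32n).  Cross-products u[i]*v[j] with
   i+j >= n affect only the discarded high half and are skipped. */
static void mullo_basecase(limb_t *rp, const limb_t *up,
                           const limb_t *vp, int n)
{
    for (int i = 0; i < n; i++)
        rp[i] = 0;
    for (int i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (int j = 0; j < n - i; j++) {     /* only i+j < n */
            uint64_t t = (uint64_t)up[i] * vp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = t >> 32;
        }
        /* carry out of limb n-1 belongs to the high half: discard */
    }
}
```

The inner sum (2^32-1)^2 + 2*(2^32-1) is exactly 2^64 - 1, so the 64-bit accumulator never overflows.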
Another possibility would be an optimized cube. In the basecase that should definitely be able to save cross-products in a similar fashion to squaring, but some investigation might be needed for how best to adapt the higher-order algorithms. It's not clear whether cubing or other small powers have any particularly important uses, though.
Write new and improve existing assembly routines. The tests/devel programs and the tune/speed.c and tune/many.pl programs are useful for testing and timing the routines you write. See the README files in those directories for more information.
Please make sure your new routines are fast for small, medium-sized, and large operands.
The most important routines are mpn_addmul_1, mpn_mul_basecase and mpn_sqr_basecase. The latter two don't exist for all machines, while mpn_addmul_1 exists for almost all machines.
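For reference, a plain C version of mpn_addmul_1 looks like the following (32-bit limbs assumed for illustration; the real GMP routines unroll and software-pipeline this loop in assembly):

```c
#include <stdint.h>
#include <assert.h>

typedef uint32_t limb_t;

/* Multiply the n-limb number {up,n} by the single limb v and add the
   product into {rp,n}, returning the carry-out limb.  This is the
   semantics of GMP's mpn_addmul_1. */
static limb_t addmul_1(limb_t *rp, const limb_t *up, int n, limb_t v)
{
    uint64_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = t >> 32;
    }
    return (limb_t)carry;
}
```

The worst-case term (2^32-1)^2 + 2*(2^32-1) equals 2^64 - 1, so the accumulator cannot overflow; that headroom is what the unrolled assembly versions exploit too.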
Standard techniques for these routines are unrolling, software pipelining, and specialization for common operand values. For machines with poor integer multiplication, it is often possible to improve the performance using floating-point operations, or SIMD operations such as MMX or Sun's VIS.
Using floating-point operations is interesting but somewhat tricky. Since IEEE double has 53 bits of mantissa, one has to split the operands into small pieces, so that no partial product exceeds 2^53. For 32-bit computers, splitting one operand into 16-bit pieces works. For 64-bit machines, one operand can be split into 21-bit pieces and the other into 32-bit pieces. (A 64-bit operand can be split into just three 21-bit pieces if one allows the split operands to be negative!)
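The 32-bit case can be illustrated in a few lines (a hypothetical sketch, not GMP code): split one operand into 16-bit pieces so every partial product is below 2^48, well within the 53-bit mantissa, then recombine exactly in integer arithmetic.

```c
#include <stdint.h>
#include <assert.h>

/* Multiply two 32-bit values exactly via IEEE doubles.  Each partial
   product is at most (2^32-1)*(2^16-1) < 2^48 <= 2^53, so both
   doubles are exact; the recombination is done in integers. */
static uint64_t mul32_via_double(uint32_t a, uint32_t b)
{
    double lo = (double)a * (double)(b & 0xffff);  /* a * b_low  */
    double hi = (double)a * (double)(b >> 16);     /* a * b_high */
    return (uint64_t)lo + ((uint64_t)hi << 16);
}
```

On machines with slow integer multiply but fast FPUs, several such pieces can be kept in flight in floating-point registers, which is the whole point of the technique.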
We currently have extended GCD based on Lehmer's method. But the binary method can quite easily be improved for bignums in a similar way Lehmer improved Euclid's algorithm. The result should beat Lehmer's method by about a factor of 2. Furthermore, the multiprecision step of Lehmer's method and the binary method will be identical, so the current code can mostly be reused. It should be possible to share code between GCD and GCDEXT, and probably with Jacobi symbols too.
Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems that the most likely algorithm doesn't kick in until about 3000 limbs.
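For reference, the binary method on single words looks like this (a hypothetical sketch; a bignum variant would apply the same reductions to the low limbs, accumulating a 2x2 transformation matrix in the way Lehmer's method does):

```c
#include <stdint.h>
#include <assert.h>

/* Binary GCD: only shifts, comparisons and subtractions, no division.
   gcd(2a,2b) = 2*gcd(a,b); gcd(2a,b) = gcd(a,b) for odd b; and for
   both odd, gcd(a,b) = gcd(b-a,a) with b-a even. */
static uint64_t binary_gcd(uint64_t a, uint64_t b)
{
    if (a == 0) return b;
    if (b == 0) return a;

    int shift = 0;
    while (((a | b) & 1) == 0) {        /* factor out common 2s */
        a >>= 1; b >>= 1; shift++;
    }
    while ((a & 1) == 0)
        a >>= 1;                        /* a is now odd */
    while (b != 0) {
        while ((b & 1) == 0)
            b >>= 1;                    /* b is now odd too */
        if (a > b) { uint64_t t = a; a = b; b = t; }
        b -= a;                         /* even result, a stays odd */
    }
    return a << shift;
}
```
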
Implement the functions of math.h for the GMP mpf layer! Check the book "Pi and the AGM" by Borwein and Borwein for ideas on how to do this. These functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
These are implemented in mpfr, redoing them in mpf might not be worth bothering with, if the long term plan is to bring mpfr in as the new mpf.
The current code uses divisions, which are reasonably fast, but it'd be possible to use only multiplications by computing 1/sqrt(A) using this formula:
    x[i+1] = x[i] * (3 - A*x[i]^2) / 2

and then the final result

    sqrt(A) = A * x[n]

That final multiply might be the full size of the input (though it might only need the high half of that), so there may or may not be any speedup overall.
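As a cross-check of the iteration, here is a double-precision sketch (names hypothetical; an mpf/mpn version would use truncated multiplies as discussed). The loop uses only multiplication and subtraction; frexp/ldexp merely handle the exponent.

```c
#include <math.h>
#include <assert.h>

/* Newton's iteration x <- x*(3 - A*x^2)/2 converges quadratically to
   1/sqrt(A).  For A in [0.25, 1) the seed 1.5 lies inside the region
   of convergence, and 8 iterations reach full double precision. */
static double recip_sqrt(double a)
{
    double x = 1.5;
    for (int i = 0; i < 8; i++)
        x = x * (3.0 - a * x * x) * 0.5;
    return x;
}

/* sqrt(A) = A * x[n].  The argument is scaled into [0.25, 1) first,
   so only exponent manipulation, multiplies and subtracts are used. */
static double sqrt_by_mul(double a)
{
    int e;
    double m = frexp(a, &e);        /* a = m * 2^e, m in [0.5, 1) */
    if (e & 1) { m *= 0.5; e++; }   /* make e even, m in [0.25, 1) */
    return ldexp(m * recip_sqrt(m), e / 2);
}
```
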
Implement, at the mpn level, a routine for computing the nth root of a number. The routine should use Newton's method, preferably without using division.
If the routine becomes fast enough, perhaps square roots could be computed using this function.
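A single-word sketch of the idea follows (hypothetical names, not mpn code). For simplicity this version does use division inside the iteration, where a real mpn routine would prefer a Newton inverse; the final fixup shows how little correction the estimate needs.

```c
#include <stdint.h>
#include <assert.h>

/* Does x^n exceed a?  Overflow during the powering implies "yes". */
static int pow_gt(uint64_t x, unsigned n, uint64_t a)
{
    uint64_t r = 1;
    while (n--) {
        if (x != 0 && r > UINT64_MAX / x)
            return 1;
        r *= x;
    }
    return r > a;
}

/* floor(a^(1/n)) by Newton's iteration x <- ((n-1)*x + a/x^(n-1))/n,
   started from a power of two known to be above the root. */
static uint64_t int_nth_root(uint64_t a, unsigned n)
{
    if (a == 0 || n == 1)
        return a;

    unsigned bits = 0;
    for (uint64_t t = a; t != 0; t >>= 1)
        bits++;
    uint64_t x = (uint64_t)1 << ((bits + n - 1) / n);  /* x > a^(1/n) */

    while (x > 1) {
        uint64_t p = 1;                 /* p = x^(n-1), overflow-checked */
        int ok = 1;
        for (unsigned i = 0; i < n - 1; i++) {
            if (p > UINT64_MAX / x) { ok = 0; break; }
            p *= x;
        }
        uint64_t y = ok ? ((n - 1) * x + a / p) / n : x - 1;
        if (y >= x)
            break;                      /* converged from above */
        x = y;
    }
    while (pow_gt(x, n, a))             /* fix up truncation, at most */
        x--;                            /* a step or two */
    while (!pow_gt(x + 1, n, a))
        x++;
    return x;
}
```
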
Some work can be saved when only the quotient is required in a division: the necessary correction of -0, -1 or -2 to the estimated quotient can almost always be determined from only a few limbs of multiply and subtract, rather than by forming a complete remainder. The greatest savings are when the quotient is small compared to the dividend and divisor.
Some code along these lines can be found in the current mpn_tdiv_qr, though perhaps calculating bigger chunks of remainder than might be strictly necessary. That function in its current form always goes on to calculate a full remainder. Burnikel and Ziegler describe a similar approach for the divide and conquer case.
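The single-digit version of that correction, in the style of Knuth's Algorithm D (32-bit limbs, illustrative names), shows the pattern: the estimate from the top limbs is at most 2 too large, and each test costs one multiply instead of a full remainder.

```c
#include <stdint.h>
#include <assert.h>

typedef uint32_t limb_t;

/* One quotient digit of (u2*B^2 + u1*B + u0) / (v1*B + v0), B = 2^32.
   Requires a normalized divisor (top bit of v1 set) and u2 < v1, so
   the quotient fits in one limb. */
static limb_t quotient_digit(limb_t u2, limb_t u1, limb_t u0,
                             limb_t v1, limb_t v0)
{
    uint64_t num  = ((uint64_t)u2 << 32) | u1;
    uint64_t qhat = num / v1;               /* estimate from top limbs */
    uint64_t rhat = num % v1;

    /* qhat is at most 2 too big; correct with cheap multiplies */
    while (qhat * v0 > ((rhat << 32) | u0)) {
        qhat--;
        rhat += v1;
        if (rhat >= ((uint64_t)1 << 32))    /* further tests can't fail */
            break;
    }
    return (limb_t)qhat;
}
```
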
mpn_bdivmod and the redc in mpz_powm should use some sort of divide and conquer algorithm. This would benefit mpz_divexact, and mpn_gcd on large unequal size operands. See "Exact Division with Karatsuba Complexity" by Jebelean for a (brief) description.
Failing that, some sort of DIVEXACT_THRESHOLD could be added to control whether mpz_divexact uses mpn_bdivmod or mpn_tdiv_qr, since the latter is faster on large divisors.
For the REDC, basically all that's needed is Montgomery's algorithm done in multi-limb integers. R is multiple limbs, and the inverse and operands are multi-precision.
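For reference, here is the single-limb version of Montgomery's algorithm (32-bit limbs, R = 2^32; redc and neg_inv are illustrative names). The multi-limb form described above is the same computation with each word operation replaced by an mpn call on R-sized chunks.

```c
#include <stdint.h>
#include <assert.h>

typedef uint32_t limb_t;

/* -1/n mod 2^32 for odd n, by Newton's iteration for the 2-adic
   inverse: n*n == 1 mod 8 seeds 3 bits, each step doubles them. */
static limb_t neg_inv(limb_t n)
{
    limb_t x = n;                   /* inverse mod 2^3 */
    for (int i = 0; i < 4; i++)
        x *= 2 - n * x;             /* mod 2^6, 2^12, 2^24, 2^48 */
    return (limb_t)(0 - x);
}

/* REDC: given t < n*R with R = 2^32 and odd n, return t/R mod n. */
static limb_t redc(uint64_t t, limb_t n, limb_t ninv)
{
    limb_t   m = (limb_t)t * ninv;      /* makes t + m*n == 0 mod R */
    uint64_t s = t + (uint64_t)m * n;
    int carry  = s < t;                 /* 65th bit of the sum */
    limb_t   r = (limb_t)(s >> 32);     /* exact division by R */
    if (carry || r >= n)
        r -= n;                         /* result now < n */
    return r;
}
```
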
For exact division the time to calculate a multi-limb inverse is not amortized across many modular operations, but instead will probably create a threshold below which the current style mpn_bdivmod is best. There's also Krandick and Jebelean, "Bidirectional Exact Integer Division", which basically uses a low-to-high exact division for the low half quotient and a quotient-only division for the high half.
It will be noted that low-half and high-half multiplies, and a low-half square, can be used. These ought to each take as little as half the time of a full multiply, or square, though work by Thom Mulders shows the advantage is progressively lost as Karatsuba and higher algorithms are applied.
Some sort of scheme for exceptions handling would be desirable.
Presently the only thing documented is that divide by zero in GMP functions provokes a deliberate machine divide by zero (on those systems where such a thing exists at least). The global gmp_errno is not actually documented, except for the old gmp_randinit function. Being currently just a plain global means it's not thread-safe.
The basic choices for exceptions are returning an error code or having a handler function to be called. The disadvantage of error returns is that they have to be checked, leading to tedious and rarely executed code, and strictly speaking such a scheme wouldn't be source or binary compatible. The disadvantage of a handler function is that a longjmp or similar recovery from it may be difficult. A combination would be possible, for instance by allowing the handler to return an error code.
Divide-by-zero, sqrt-of-negative, and similar operand range errors can normally be detected at the start of functions, so exception handling would have a clean state. What's worth considering though is that the GMP function detecting the exception may have been called via some third party library or self contained application module, and hence have various bits of state to be cleaned up above it. It'd be highly desirable for an exceptions scheme to allow for such cleanups.
The C++ destructor mechanism could help with cleanups both internally and externally, but being a plain C library we don't want to depend on that.
A C++ throw might be a good optional extra exceptions mechanism, perhaps under a build option. For GCC, -fexceptions will add the necessary frame information to plain C code, or GMP could be compiled as C++.
Out-of-memory exceptions are expected to be handled by the mp_set_memory_functions routines, rather than being a prospective part of divide-by-zero etc. Some similar considerations apply, but what differs is that out-of-memory can arise deep within GMP internals. Even fundamental routines like mpn_add_n and mpn_addmul_1 can use temporary memory (for instance on Cray vector systems). Allowing for an error code return would require an awful lot of checking internally. Perhaps it'd still be worthwhile, but it'd be a lot of changes and the extra code would probably be rather rarely executed in normal usages.
A longjmp recovery for out-of-memory will currently, in general, lead to memory leaks and may leave GMP variables operated on in inconsistent states. Maybe it'd be possible to record recovery information for use by the relevant allocate or reallocate function, but that too would be a lot of changes.
One scheme for out-of-memory would be to note that all GMP allocations go through the mp_set_memory_functions routines. So if the application has an intended setjmp recovery point, it can record memory activity by GMP and abandon space allocated and variables initialized after that point. This might be as simple as directing the allocation functions to a separate pool, but in general would have the disadvantage of needing application-level bookkeeping on top of the normal system malloc. An advantage however is that it needs nothing from GMP itself and on that basis doesn't burden applications not needing recovery. Note that there are probably some details to be worked out here about reallocs of existing variables, and perhaps about copying or swapping between "permanent" and "temporary" variables.
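A minimal sketch of that pool idea, independent of GMP (pool_alloc, pool_unwind and the recovery jmp_buf are hypothetical names; a real application would install pool_alloc and a matching free routine via mp_set_memory_functions):

```c
#include <stdlib.h>
#include <setjmp.h>
#include <assert.h>

/* Recorded-pool allocator: every block handed out after a mark can be
   abandoned in one sweep from a setjmp recovery point. */
#define POOL_MAX 1024

static jmp_buf recovery;            /* set by the application's setjmp */
static void   *pool[POOL_MAX];      /* blocks handed out, in order */
static int     pool_top;

static void *pool_alloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL || pool_top == POOL_MAX) {
        free(p);                    /* free(NULL) is harmless */
        longjmp(recovery, 1);       /* abandon the computation */
    }
    pool[pool_top++] = p;
    return p;
}

static void pool_unwind(int mark)   /* release everything since mark */
{
    while (pool_top > mark)
        free(pool[--pool_top]);
}
```

The application would record `mark = pool_top` and `setjmp(recovery)` before calling into GMP, and call pool_unwind(mark) on recovery. As noted above, reallocs of long-lived variables complicate this sketch considerably.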
Applications desiring fine-grained error control, for instance a language interpreter, would very possibly not be well served by a scheme requiring longjmp. Wrapping every GMP function call in a setjmp would be very inconvenient.
Stack overflow is another possible exception, but perhaps not one that can be easily detected in general. On i386 GNU/Linux for instance GCC normally doesn't generate stack probes for an alloca, but merely adjusts %esp. A big enough alloca can miss the stack redzone and hit arbitrary data. GMP stack usage is normally a function of operand size; knowing that might suffice for some applications. Otherwise a fixed maximum usage can probably be obtained by building with --enable-alloca=malloc-reentrant (or notreentrant).
Actually recovering from stack overflow is of course another problem. It might be possible to catch a SIGSEGV in the stack redzone and do something in a sigaltstack, on systems which have that, but recovery might otherwise not be possible. This is worth bearing in mind, because there's no point worrying about tight and careful out-of-memory recovery if an out-of-stack is fatal.
Add a test suite for old bugs. These tests wouldn't loop or use random numbers, but just test for specific bugs found in the past.
More importantly, improve the random number controls for the test programs.
With this in place, it should be possible to rid GMP of practically all bugs by having some dedicated GMP test machines running "make check" with ever increasing seeds (and periodically updating to the latest GMP).
If a few more ASSERTs were sprinkled through the code, running some tests with --enable-assert might be useful, though not a substitute for tests on the normal build.
An important feature of any random tests will be to record the seeds used, and perhaps to snapshot operands before performing each test, so any problem exposed can be reproduced.
It'd be nice to have some sort of tool for getting an overview of performance. Clearly a great many things could be done, but a combination of measuring some fundamental routines and some representative application routines might satisfy the primary uses.
The tune/time.c routines would be the easiest way to get good accurate measurements on lots of different systems. The high level speed_measure may or may not suit, but the basic speed_starttime and speed_endtime would cover lots of portability and accuracy questions.