This is gmp.info, produced by makeinfo version 4.2 from gmp.texi. This manual describes how to install and use the GNU multiple precision arithmetic library, version 4.1.2. Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being "A GNU Manual", and with the Back-Cover Texts being "You have freedom to copy and modify this GNU Manual, like GNU software". A copy of the license is included in *Note GNU Free Documentation License::. INFO-DIR-SECTION GNU libraries START-INFO-DIR-ENTRY * gmp: (gmp). GNU Multiple Precision Arithmetic Library. END-INFO-DIR-ENTRY  File: gmp.info, Node: C++ Interface General, Next: C++ Interface Integers, Prev: C++ Class Interface, Up: C++ Class Interface C++ Interface General ===================== All the C++ classes and functions are available with #include Programs should be linked with the `libgmpxx' and `libgmp' libraries. For example, g++ mycxxprog.cc -lgmpxx -lgmp The classes defined are - Class: mpz_class - Class: mpq_class - Class: mpf_class The standard operators and various standard functions are overloaded to allow arithmetic with these classes. For example, int main (void) { mpz_class a, b, c; a = 1234; b = "-5678"; c = a+b; cout << "sum is " << c << "\n"; cout << "absolute value is " << abs(c) << "\n"; return 0; } An important feature of the implementation is that an expression like `a=b+c' results in a single call to the corresponding `mpz_add', without using a temporary for the `b+c' part. Expressions which by their nature imply intermediate values, like `a=b*c+d*e', still use temporaries though. The classes can be freely intermixed in expressions, as can the classes and the standard types `long', `unsigned long' and `double'. Smaller types like `int' or `float' can also be intermixed, since C++ will promote them. Note that `bool' is not accepted directly, but must be explicitly cast to an `int' first. This is because C++ will automatically convert any pointer to a `bool', so if GMP accepted `bool' it would make all sorts of invalid class and pointer combinations compile but almost certainly not do anything sensible. Conversions back from the classes to standard C++ types aren't done automatically, instead member functions like `get_si' are provided (see the following sections for details). Also there are no automatic conversions from the classes to the corresponding GMP C types, instead a reference to the underlying C object can be obtained with the following functions, - Function: mpz_t mpz_class::get_mpz_t () - Function: mpq_t mpq_class::get_mpq_t () - Function: mpf_t mpf_class::get_mpf_t () These can be used to call a C function which doesn't have a C++ class interface. For example to set `a' to the GCD of `b' and `c', mpz_class a, b, c; ... mpz_gcd (a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t()); In the other direction, a class can be initialized from the corresponding GMP C type, or assigned to if an explicit constructor is used. In both cases this makes a copy of the value, it doesn't create any sort of association. For example, mpz_t z; // ... init and calculate z ... mpz_class x(z); mpz_class y; y = mpz_class (z); There are no namespace setups in `gmpxx.h', all types and functions are simply put into the global namespace. This is what `gmp.h' has done in the past, and continues to do for compatibility. The extras provided by `gmpxx.h' follow GMP naming conventions and are unlikely to clash with anything.  File: gmp.info, Node: C++ Interface Integers, Next: C++ Interface Rationals, Prev: C++ Interface General, Up: C++ Class Interface C++ Interface Integers ====================== - Function: void mpz_class::mpz_class (type N) Construct an `mpz_class'. All the standard C++ types may be used, except `long long' and `long double', and all the GMP C++ classes can be used. Any necessary conversion follows the corresponding C function, for example `double' follows `mpz_set_d' (*note Assigning Integers::). - Function: void mpz_class::mpz_class (mpz_t Z) Construct an `mpz_class' from an `mpz_t'. The value in Z is copied into the new `mpz_class', there won't be any permanent association between it and Z. - Function: void mpz_class::mpz_class (const char *S) - Function: void mpz_class::mpz_class (const char *S, int base) - Function: void mpz_class::mpz_class (const string& S) - Function: void mpz_class::mpz_class (const string& S, int base) Construct an `mpz_class' converted from a string using `mpz_set_str', (*note Assigning Integers::). If the BASE is not given then 0 is used. - Function: mpz_class operator/ (mpz_class A, mpz_class D) - Function: mpz_class operator% (mpz_class A, mpz_class D) Divisions involving `mpz_class' round towards zero, as per the `mpz_tdiv_q' and `mpz_tdiv_r' functions (*note Integer Division::). This corresponds to the rounding used for plain `int' calculations on most machines. The `mpz_fdiv...' or `mpz_cdiv...' functions can always be called directly if desired. For example, mpz_class q, a, d; ... mpz_fdiv_q (q.get_mpz_t(), a.get_mpz_t(), d.get_mpz_t()); - Function: mpz_class abs (mpz_class OP1) - Function: int cmp (mpz_class OP1, type OP2) - Function: int cmp (type OP1, mpz_class OP2) - Function: double mpz_class::get_d (void) - Function: long mpz_class::get_si (void) - Function: unsigned long mpz_class::get_ui (void) - Function: bool mpz_class::fits_sint_p (void) - Function: bool mpz_class::fits_slong_p (void) - Function: bool mpz_class::fits_sshort_p (void) - Function: bool mpz_class::fits_uint_p (void) - Function: bool mpz_class::fits_ulong_p (void) - Function: bool mpz_class::fits_ushort_p (void) - Function: int sgn (mpz_class OP) - Function: mpz_class sqrt (mpz_class OP) These functions provide a C++ class interface to the corresponding GMP C routines. `cmp' can be used with any of the classes or the standard C++ types, except `long long' and `long double'. Overloaded operators for combinations of `mpz_class' and `double' are provided for completeness, but it should be noted that if the given `double' is not an integer then the way any rounding is done is currently unspecified. The rounding might take place at the start, in the middle, or at the end of the operation, and it might change in the future. Conversions between `mpz_class' and `double', however, are defined to follow the corresponding C functions `mpz_get_d' and `mpz_set_d'. And comparisons are always made exactly, as per `mpz_cmp_d'.  File: gmp.info, Node: C++ Interface Rationals, Next: C++ Interface Floats, Prev: C++ Interface Integers, Up: C++ Class Interface C++ Interface Rationals ======================= In all the following constructors, if a fraction is given then it should be in canonical form, or if not then `mpq_class::canonicalize' called. - Function: void mpq_class::mpq_class (type OP) - Function: void mpq_class::mpq_class (integer NUM, integer DEN) Construct an `mpq_class'. The initial value can be a single value of any type, or a pair of integers (`mpz_class' or standard C++ integer types) representing a fraction, except that `long long' and `long double' are not supported. For example, mpq_class q (99); mpq_class q (1.75); mpq_class q (1, 3); - Function: void mpq_class::mpq_class (mpq_t Q) Construct an `mpq_class' from an `mpq_t'. The value in Q is copied into the new `mpq_class', there won't be any permanent association between it and Q. - Function: void mpq_class::mpq_class (const char *S) - Function: void mpq_class::mpq_class (const char *S, int base) - Function: void mpq_class::mpq_class (const string& S) - Function: void mpq_class::mpq_class (const string& S, int base) Construct an `mpq_class' converted from a string using `mpq_set_str', (*note Initializing Rationals::). If the BASE is not given then 0 is used. - Function: void mpq_class::canonicalize () Put an `mpq_class' into canonical form, as per *Note Rational Number Functions::. All arithmetic operators require their operands in canonical form, and will return results in canonical form. - Function: mpq_class abs (mpq_class OP) - Function: int cmp (mpq_class OP1, type OP2) - Function: int cmp (type OP1, mpq_class OP2) - Function: double mpq_class::get_d (void) - Function: int sgn (mpq_class OP) These functions provide a C++ class interface to the corresponding GMP C routines. `cmp' can be used with any of the classes or the standard C++ types, except `long long' and `long double'. - Function: mpz_class& mpq_class::get_num () - Function: mpz_class& mpq_class::get_den () Get a reference to an `mpz_class' which is the numerator or denominator of an `mpq_class'. This can be used both for read and write access. If the object returned is modified, it modifies the original `mpq_class'. If direct manipulation might produce a non-canonical value, then `mpq_class::canonicalize' must be called before further operations. - Function: mpz_t mpq_class::get_num_mpz_t () - Function: mpz_t mpq_class::get_den_mpz_t () Get a reference to the underlying `mpz_t' numerator or denominator of an `mpq_class'. This can be passed to C functions expecting an `mpz_t'. Any modifications made to the `mpz_t' will modify the original `mpq_class'. If direct manipulation might produce a non-canonical value, then `mpq_class::canonicalize' must be called before further operations. - Function: istream& operator>> (istream& STREAM, mpq_class& ROP); Read ROP from STREAM, using its `ios' formatting settings, the same as `mpq_t operator>>' (*note C++ Formatted Input::). If the ROP read might not be in canonical form then `mpq_class::canonicalize' must be called.  File: gmp.info, Node: C++ Interface Floats, Next: C++ Interface MPFR, Prev: C++ Interface Rationals, Up: C++ Class Interface C++ Interface Floats ==================== When an expression requires the use of temporary intermediate `mpf_class' values, like `f=g*h+x*y', those temporaries will have the same precision as the destination `f'. Explicit constructors can be used if this doesn't suit. - Function: mpf_class::mpf_class (type OP) - Function: mpf_class::mpf_class (type OP, unsigned long PREC) Construct an `mpf_class'. Any standard C++ type can be used, except `long long' and `long double', and any of the GMP C++ classes can be used. If PREC is given, the initial precision is that value, in bits. If PREC is not given, then the initial precision is determined by the type of OP given. An `mpz_class', `mpq_class', string, or C++ builtin type will give the default `mpf' precision (*note Initializing Floats::). An `mpf_class' or expression will give the precision of that value. The precision of a binary expression is the higher of the two operands. mpf_class f(1.5); // default precision mpf_class f(1.5, 500); // 500 bits (at least) mpf_class f(x); // precision of x mpf_class f(abs(x)); // precision of x mpf_class f(-g, 1000); // 1000 bits (at least) mpf_class f(x+y); // greater of precisions of x and y - Function: mpf_class abs (mpf_class OP) - Function: mpf_class ceil (mpf_class OP) - Function: int cmp (mpf_class OP1, type OP2) - Function: int cmp (type OP1, mpf_class OP2) - Function: mpf_class floor (mpf_class OP) - Function: mpf_class hypot (mpf_class OP1, mpf_class OP2) - Function: double mpf_class::get_d (void) - Function: long mpf_class::get_si (void) - Function: unsigned long mpf_class::get_ui (void) - Function: bool mpf_class::fits_sint_p (void) - Function: bool mpf_class::fits_slong_p (void) - Function: bool mpf_class::fits_sshort_p (void) - Function: bool mpf_class::fits_uint_p (void) - Function: bool mpf_class::fits_ulong_p (void) - Function: bool mpf_class::fits_ushort_p (void) - Function: int sgn (mpf_class OP) - Function: mpf_class sqrt (mpf_class OP) - Function: mpf_class trunc (mpf_class OP) These functions provide a C++ class interface to the corresponding GMP C routines. `cmp' can be used with any of the classes or the standard C++ types, except `long long' and `long double'. The accuracy provided by `hypot' is not currently guaranteed. - Function: unsigned long int mpf_class::get_prec () - Function: void mpf_class::set_prec (unsigned long PREC) - Function: void mpf_class::set_prec_raw (unsigned long PREC) Get or set the current precision of an `mpf_class'. The restrictions described for `mpf_set_prec_raw' (*note Initializing Floats::) apply to `mpf_class::set_prec_raw'. Note in particular that the `mpf_class' must be restored to it's allocated precision before being destroyed. This must be done by application code, there's no automatic mechanism for it.  File: gmp.info, Node: C++ Interface MPFR, Next: C++ Interface Random Numbers, Prev: C++ Interface Floats, Up: C++ Class Interface C++ Interface MPFR ================== The C++ class interface to MPFR is provided if MPFR is enabled (*note Build Options::). This interface must be regarded as preliminary and possibly subject to incompatible changes in the future, since MPFR itself is preliminary. All definitions can be obtained with #include This defines - Class: mpfr_class which behaves similarly to `mpf_class' (*note C++ Interface Floats::).  File: gmp.info, Node: C++ Interface Random Numbers, Next: C++ Interface Limitations, Prev: C++ Interface MPFR, Up: C++ Class Interface C++ Interface Random Numbers ============================ - Class: gmp_randclass The C++ class interface to the GMP random number functions uses `gmp_randclass' to hold an algorithm selection and current state, as per `gmp_randstate_t'. - Function: gmp_randclass::gmp_randclass (void (*RANDINIT) (gmp_randstate_t, ...), ...) Construct a `gmp_randclass', using a call to the given RANDINIT function (*note Random State Initialization::). The arguments expected are the same as RANDINIT, but with `mpz_class' instead of `mpz_t'. For example, gmp_randclass r1 (gmp_randinit_default); gmp_randclass r2 (gmp_randinit_lc_2exp_size, 32); gmp_randclass r3 (gmp_randinit_lc_2exp, a, c, m2exp); `gmp_randinit_lc_2exp_size' can fail if the size requested is too big, the behaviour of `gmp_randclass::gmp_randclass' is undefined in this case (perhaps this will change in the future). - Function: gmp_randclass::gmp_randclass (gmp_randalg_t ALG, ...) Construct a `gmp_randclass' using the same parameters as `gmp_randinit' (*note Random State Initialization::). This function is obsolete and the above RANDINIT style should be preferred. - Function: void gmp_randclass::seed (unsigned long int S) - Function: void gmp_randclass::seed (mpz_class S) Seed a random number generator. See *note Random Number Functions::, for how to choose a good seed. - Function: mpz_class gmp_randclass::get_z_bits (unsigned long BITS) - Function: mpz_class gmp_randclass::get_z_bits (mpz_class BITS) Generate a random integer with a specified number of bits. - Function: mpz_class gmp_randclass::get_z_range (mpz_class N) Generate a random integer in the range 0 to N-1 inclusive. - Function: mpf_class gmp_randclass::get_f () - Function: mpf_class gmp_randclass::get_f (unsigned long PREC) Generate a random float F in the range 0 <= F < 1. F will be to PREC bits precision, or if PREC is not given then to the precision of the destination. For example, gmp_randclass r; ... mpf_class f (0, 512); // 512 bits precision f = r.get_f(); // random number, 512 bits  File: gmp.info, Node: C++ Interface Limitations, Prev: C++ Interface Random Numbers, Up: C++ Class Interface C++ Interface Limitations ========================= `mpq_class' and Templated Reading A generic piece of template code probably won't know that `mpq_class' requires a `canonicalize' call if inputs read with `operator>>' might be non-canonical. This can lead to incorrect results. `operator>>' behaves as it does for reasons of efficiency. A canonicalize can be quite time consuming on large operands, and is best avoided if it's not necessary. But this potential difficulty reduces the usefulness of `mpq_class'. Perhaps a mechanism to tell `operator>>' what to do will be adopted in the future, maybe a preprocessor define, a global flag, or an `ios' flag pressed into service. Or maybe, at the risk of inconsistency, the `mpq_class' `operator>>' could canonicalize and leave `mpq_t' `operator>>' not doing so, for use on those occasions when that's acceptable. Send feedback or alternate ideas to . Subclassing Subclassing the GMP C++ classes works, but is not currently recommended. Expressions involving subclasses resolve correctly (or seem to), but in normal C++ fashion the subclass doesn't inherit constructors and assignments. There's many of those in the GMP classes, and a good way to reestablish them in a subclass is not yet provided. Templated Expressions A subtle difficulty exists when using expressions together with application-defined template functions. Consider the following, with `T' intended to be some numeric type, template T fun (const T &, const T &); When used with, say, plain `mpz_class' variables, it works fine: `T' is resolved as `mpz_class'. mpz_class f(1), g(2); fun (f, g); // Good But when one of the arguments is an expression, it doesn't work. mpz_class f(1), g(2), h(3); fun (f, g+h); // Bad This is because `g+h' ends up being a certain expression template type internal to `gmpxx.h', which the C++ template resolution rules are unable to automatically convert to `mpz_class'. The workaround is simply to add an explicit cast. mpz_class f(1), g(2), h(3); fun (f, mpz_class(g+h)); // Good Similarly, within `fun' it may be necessary to cast an expression to type `T' when calling a templated `fun2'. template void fun (T f, T g) { fun2 (f, f+g); // Bad } template void fun (T f, T g) { fun2 (f, T(f+g)); // Good }  File: gmp.info, Node: BSD Compatible Functions, Next: Custom Allocation, Prev: C++ Class Interface, Up: Top Berkeley MP Compatible Functions ******************************** These functions are intended to be fully compatible with the Berkeley MP library which is available on many BSD derived U*ix systems. The `--enable-mpbsd' option must be used when building GNU MP to make these available (*note Installing GMP::). The original Berkeley MP library has a usage restriction: you cannot use the same variable as both source and destination in a single function call. The compatible functions in GNU MP do not share this restriction--inputs and outputs may overlap. It is not recommended that new programs are written using these functions. Apart from the incomplete set of functions, the interface for initializing `MINT' objects is more error prone, and the `pow' function collides with `pow' in `libm.a'. Include the header `mp.h' to get the definition of the necessary types and functions. If you are on a BSD derived system, make sure to include GNU `mp.h' if you are going to link the GNU `libmp.a' to your program. This means that you probably need to give the `-I' option to the compiler, where `' is the directory where you have GNU `mp.h'. - Function: MINT * itom (signed short int INITIAL_VALUE) Allocate an integer consisting of a `MINT' object and dynamic limb space. Initialize the integer to INITIAL_VALUE. Return a pointer to the `MINT' object. - Function: MINT * xtom (char *INITIAL_VALUE) Allocate an integer consisting of a `MINT' object and dynamic limb space. Initialize the integer from INITIAL_VALUE, a hexadecimal, null-terminated C string. Return a pointer to the `MINT' object. - Function: void move (MINT *SRC, MINT *DEST) Set DEST to SRC by copying. Both variables must be previously initialized. - Function: void madd (MINT *SRC_1, MINT *SRC_2, MINT *DESTINATION) Add SRC_1 and SRC_2 and put the sum in DESTINATION. - Function: void msub (MINT *SRC_1, MINT *SRC_2, MINT *DESTINATION) Subtract SRC_2 from SRC_1 and put the difference in DESTINATION. - Function: void mult (MINT *SRC_1, MINT *SRC_2, MINT *DESTINATION) Multiply SRC_1 and SRC_2 and put the product in DESTINATION. - Function: void mdiv (MINT *DIVIDEND, MINT *DIVISOR, MINT *QUOTIENT, MINT *REMAINDER) - Function: void sdiv (MINT *DIVIDEND, signed short int DIVISOR, MINT *QUOTIENT, signed short int *REMAINDER) Set QUOTIENT to DIVIDEND/DIVISOR, and REMAINDER to DIVIDEND mod DIVISOR. The quotient is rounded towards zero; the remainder has the same sign as the dividend unless it is zero. Some implementations of these functions work differently--or not at all--for negative arguments. - Function: void msqrt (MINT *OP, MINT *ROOT, MINT *REMAINDER) Set ROOT to the truncated integer part of the square root of OP, like `mpz_sqrt'. Set REMAINDER to OP-ROOT*ROOT, i.e. zero if OP is a perfect square. If ROOT and REMAINDER are the same variable, the results are undefined. - Function: void pow (MINT *BASE, MINT *EXP, MINT *MOD, MINT *DEST) Set DEST to (BASE raised to EXP) modulo MOD. - Function: void rpow (MINT *BASE, signed short int EXP, MINT *DEST) Set DEST to BASE raised to EXP. - Function: void gcd (MINT *OP1, MINT *OP2, MINT *RES) Set RES to the greatest common divisor of OP1 and OP2. - Function: int mcmp (MINT *OP1, MINT *OP2) Compare OP1 and OP2. Return a positive value if OP1 > OP2, zero if OP1 = OP2, and a negative value if OP1 < OP2. - Function: void min (MINT *DEST) Input a decimal string from `stdin', and put the read integer in DEST. SPC and TAB are allowed in the number string, and are ignored. - Function: void mout (MINT *SRC) Output SRC to `stdout', as a decimal string. Also output a newline. - Function: char * mtox (MINT *OP) Convert OP to a hexadecimal string, and return a pointer to the string. The returned string is allocated using the default memory allocation function, `malloc' by default. It will be `strlen(str)+1' bytes, that being exactly enough for the string and null-terminator. - Function: void mfree (MINT *OP) De-allocate, the space used by OP. *This function should only be passed a value returned by `itom' or `xtom'.*  File: gmp.info, Node: Custom Allocation, Next: Language Bindings, Prev: BSD Compatible Functions, Up: Top Custom Allocation ***************** By default GMP uses `malloc', `realloc' and `free' for memory allocation, and if they fail GMP prints a message to the standard error output and terminates the program. Alternate functions can be specified to allocate memory in a different way or to have a different error action on running out of memory. This feature is available in the Berkeley compatibility library (*note BSD Compatible Functions::) as well as the main GMP library. - Function: void mp_set_memory_functions ( void *(*ALLOC_FUNC_PTR) (size_t), void *(*REALLOC_FUNC_PTR) (void *, size_t, size_t), void (*FREE_FUNC_PTR) (void *, size_t)) Replace the current allocation functions from the arguments. If an argument is `NULL', the corresponding default function is used. These functions will be used for all memory allocation done by GMP, apart from temporary space from `alloca' if that function is available and GMP is configured to use it (*note Build Options::). *Be sure to call `mp_set_memory_functions' only when there are no active GMP objects allocated using the previous memory functions! Usually that means calling it before any other GMP function.* The functions supplied should fit the following declarations: - Function: void * allocate_function (size_t ALLOC_SIZE) Return a pointer to newly allocated space with at least ALLOC_SIZE bytes. - Function: void * reallocate_function (void *PTR, size_t OLD_SIZE, size_t NEW_SIZE) Resize a previously allocated block PTR of OLD_SIZE bytes to be NEW_SIZE bytes. The block may be moved if necessary or if desired, and in that case the smaller of OLD_SIZE and NEW_SIZE bytes must be copied to the new location. The return value is a pointer to the resized block, that being the new location if moved or just PTR if not. PTR is never `NULL', it's always a previously allocated block. NEW_SIZE may be bigger or smaller than OLD_SIZE. - Function: void deallocate_function (void *PTR, size_t SIZE) De-allocate the space pointed to by PTR. PTR is never `NULL', it's always a previously allocated block of SIZE bytes. A "byte" here means the unit used by the `sizeof' operator. The OLD_SIZE parameters to REALLOCATE_FUNCTION and DEALLOCATE_FUNCTION are passed for convenience, but of course can be ignored if not needed. The default functions using `malloc' and friends for instance don't use them. No error return is allowed from any of these functions, if they return then they must have performed the specified operation. In particular note that ALLOCATE_FUNCTION or REALLOCATE_FUNCTION mustn't return `NULL'. Getting a different fatal error action is a good use for custom allocation functions, for example giving a graphical dialog rather than the default print to `stderr'. How much is possible when genuinely out of memory is another question though. There's currently no defined way for the allocation functions to recover from an error such as out of memory, they must terminate program execution. A `longjmp' or throwing a C++ exception will have undefined results. This may change in the future. GMP may use allocated blocks to hold pointers to other allocated blocks. This will limit the assumptions a conservative garbage collection scheme can make. Since the default GMP allocation uses `malloc' and friends, those functions will be linked in even if the first thing a program does is an `mp_set_memory_functions'. It's necessary to change the GMP sources if this is a problem.  File: gmp.info, Node: Language Bindings, Next: Algorithms, Prev: Custom Allocation, Up: Top Language Bindings ***************** The following packages and projects offer access to GMP from languages other than C, though perhaps with varying levels of functionality and efficiency. C++ * GMP C++ class interface, *note C++ Class Interface:: Straightforward interface, expression templates to eliminate temporaries. * ALP `http://www.inria.fr/saga/logiciels/ALP' Linear algebra and polynomials using templates. * Arithmos `http://win-www.uia.ac.be/u/cant/arithmos' Rationals with infinities and square roots. * CLN `http://clisp.cons.org/~haible/packages-cln.html' High level classes for arithmetic. * LiDIA `http://www.informatik.tu-darmstadt.de/TI/LiDIA' A C++ library for computational number theory. * Linbox `http://www.linalg.org' Sparse vectors and matrices. * NTL `http://www.shoup.net/ntl' A C++ number theory library. Fortran * Omni F77 `http://pdplab.trc.rwcp.or.jp/pdperf/Omni/home.html' Arbitrary precision floats. Haskell * Glasgow Haskell Compiler `http://www.haskell.org/ghc' Java * Kaffe `http://www.kaffe.org' * Kissme `http://kissme.sourceforge.net' Lisp * GNU Common Lisp `http://www.gnu.org/software/gcl/gcl.html' In the process of switching to GMP for bignums. * Librep `http://librep.sourceforge.net' M4 * GNU m4 betas `http://www.seindal.dk/rene/gnu' Optionally provides an arbitrary precision `mpeval'. ML * MLton compiler `http://www.mlton.org' Oz * Mozart `http://www.mozart-oz.org' Pascal * GNU Pascal Compiler `http://www.gnu-pascal.de' GMP unit. Perl * GMP module, see `demos/perl' in the GMP sources. * Math::GMP `http://www.cpan.org' Compatible with Math::BigInt, but not as many functions as the GMP module above. * Math::BigInt::GMP `http://www.cpan.org' Plug Math::GMP into normal Math::BigInt operations. Pike * mpz module in the standard distribution, `http://pike.idonex.com' Prolog * SWI Prolog `http://www.swi.psy.uva.nl/projects/SWI-Prolog' Arbitrary precision floats. Python * mpz module in the standard distribution, `http://www.python.org' * GMPY `http://gmpy.sourceforge.net' Scheme * RScheme `http://www.rscheme.org' * STklos `http://kaolin.unice.fr/STklos' Smalltalk * GNU Smalltalk `http://www.smalltalk.org/versions/GNUSmalltalk.html' Other * DrGenius `http://drgenius.seul.org' Geometry system and mathematical programming language. * GiNaC `http://www.ginac.de' C++ computer algebra using CLN. * Maxima `http://www.ma.utexas.edu/users/wfs/maxima.html' Macsyma computer algebra using GCL. * Q `http://www.musikwissenschaft.uni-mainz.de/~ag/q' Equational programming system. * Regina `http://regina.sourceforge.net' Topological calculator. * Yacas `http://www.xs4all.nl/~apinkus/yacas.html' Yet another computer algebra system.  File: gmp.info, Node: Algorithms, Next: Internals, Prev: Language Bindings, Up: Top Algorithms ********** This chapter is an introduction to some of the algorithms used for various GMP operations. The code is likely to be hard to understand without knowing something about the algorithms. Some GMP internals are mentioned, but applications that expect to be compatible with future GMP releases should take care to use only the documented functions. * Menu: * Multiplication Algorithms:: * Division Algorithms:: * Greatest Common Divisor Algorithms:: * Powering Algorithms:: * Root Extraction Algorithms:: * Radix Conversion Algorithms:: * Other Algorithms:: * Assembler Coding::  File: gmp.info, Node: Multiplication Algorithms, Next: Division Algorithms, Prev: Algorithms, Up: Algorithms Multiplication ============== NxN limb multiplications and squares are done using one of four algorithms, as the size N increases. Algorithm Threshold Basecase (none) Karatsuba `MUL_KARATSUBA_THRESHOLD' Toom-3 `MUL_TOOM3_THRESHOLD' FFT `MUL_FFT_THRESHOLD' Similarly for squaring, with the `SQR' thresholds. Note though that the FFT is only used if GMP is configured with `--enable-fft', *note Build Options::. NxM multiplications of operands with different sizes above `MUL_KARATSUBA_THRESHOLD' are currently done by splitting into MxM pieces. The Karatsuba and Toom-3 routines then operate only on equal size operands. This is not very efficient, and is slated for improvement in the future. * Menu: * Basecase Multiplication:: * Karatsuba Multiplication:: * Toom-Cook 3-Way Multiplication:: * FFT Multiplication:: * Other Multiplication::  File: gmp.info, Node: Basecase Multiplication, Next: Karatsuba Multiplication, Prev: Multiplication Algorithms, Up: Multiplication Algorithms Basecase Multiplication ----------------------- Basecase NxM multiplication is a straightforward rectangular set of cross-products, the same as long multiplication done by hand and for that reason sometimes known as the schoolbook or grammar school method. This is an O(N*M) algorithm. See Knuth section 4.3.1 algorithm M (*note References::), and the `mpn/generic/mul_basecase.c' code. Assembler implementations of `mpn_mul_basecase' are essentially the same as the generic C code, but have all the usual assembler tricks and obscurities introduced for speed. A square can be done in roughly half the time of a multiply, by using the fact that the cross products above and below the diagonal are the same. A triangle of products below the diagonal is formed, doubled (left shift by one bit), and then the products on the diagonal added. This can be seen in `mpn/generic/sqr_basecase.c'. Again the assembler implementations take essentially the same approach. u0 u1 u2 u3 u4 +---+---+---+---+---+ u0 | d | | | | | +---+---+---+---+---+ u1 | | d | | | | +---+---+---+---+---+ u2 | | | d | | | +---+---+---+---+---+ u3 | | | | d | | +---+---+---+---+---+ u4 | | | | | d | +---+---+---+---+---+ In practice squaring isn't a full 2x faster than multiplying, it's usually around 1.5x. Less than 1.5x probably indicates `mpn_sqr_basecase' wants improving on that CPU. On some CPUs `mpn_mul_basecase' can be faster than the generic C `mpn_sqr_basecase'. `SQR_BASECASE_THRESHOLD' is the size at which to use `mpn_sqr_basecase', this will be zero if that routine should be used always.  File: gmp.info, Node: Karatsuba Multiplication, Next: Toom-Cook 3-Way Multiplication, Prev: Basecase Multiplication, Up: Multiplication Algorithms Karatsuba Multiplication ------------------------ The Karatsuba multiplication algorithm is described in Knuth section 4.3.3 part A, and various other textbooks. A brief description is given here. The inputs x and y are treated as each split into two parts of equal length (or the most significant part one limb shorter if N is odd). high low +----------+----------+ | x1 | x0 | +----------+----------+ +----------+----------+ | y1 | y0 | +----------+----------+ Let b be the power of 2 where the split occurs, ie. if x0 is k limbs (y0 the same) then b=2^(k*mp_bits_per_limb). With that x=x1*b+x0 and y=y1*b+y0, and the following holds, x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0 This formula means doing only three multiplies of (N/2)x(N/2) limbs, whereas a basecase multiply of NxN limbs is equivalent to four multiplies of (N/2)x(N/2). The factors (b^2+b) etc represent the positions where the three products must be added. high low +--------+--------+ +--------+--------+ | x1*y1 | | x0*y0 | +--------+--------+ +--------+--------+ +--------+--------+ add | x1*y1 | +--------+--------+ +--------+--------+ add | x0*y0 | +--------+--------+ +--------+--------+ sub | (x1-x0)*(y1-y0) | +--------+--------+ The term (x1-x0)*(y1-y0) is best calculated as an absolute value, and the sign used to choose to add or subtract. Notice the sum high(x0*y0)+low(x1*y1) occurs twice, so it's possible to do 5*k limb additions, rather than 6*k, but in GMP extra function call overheads outweigh the saving. Squaring is similar to multiplying, but with x=y the formula reduces to an equivalent with three squares, x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2 The final result is accumulated from those three squares the same way as for the three multiplies above. The middle term (x1-x0)^2 is now always positive. A similar formula for both multiplying and squaring can be constructed with a middle term (x1+x0)*(y1+y0). But those sums can exceed k limbs, leading to more carry handling and additions than the form above. Karatsuba multiplication is asymptotically an O(N^1.585) algorithm, the exponent being log(3)/log(2), representing 3 multiplies each 1/2 the size of the inputs. This is a big improvement over the basecase multiply at O(N^2) and the advantage soon overcomes the extra additions Karatsuba performs. `MUL_KARATSUBA_THRESHOLD' can be as little as 10 limbs. The `SQR' threshold is usually about twice the `MUL'. The basecase algorithm will take a time of the form M(N) = a*N^2 + b*N + c and the Karatsuba algorithm K(N) = 3*M(N/2) + d*N + e. Clearly per-crossproduct speedups in the basecase code reduce a and decrease the threshold, but linear style speedups reducing b will actually increase the threshold. The latter can be seen for instance when adding an optimized `mpn_sqr_diagonal' to `mpn_sqr_basecase'. Of course all speedups reduce total time, and in that sense the algorithm thresholds are merely of academic interest.  File: gmp.info, Node: Toom-Cook 3-Way Multiplication, Next: FFT Multiplication, Prev: Karatsuba Multiplication, Up: Multiplication Algorithms Toom-Cook 3-Way Multiplication ------------------------------ The Karatsuba formula is the simplest case of a general approach to splitting inputs that leads to both Toom-Cook and FFT algorithms. A description of Toom-Cook can be found in Knuth section 4.3.3, with an example 3-way calculation after Theorem A. The 3-way form used in GMP is described here. The operands are each considered split into 3 pieces of equal length (or the most significant part 1 or 2 limbs shorter than the others). high low +----------+----------+----------+ | x2 | x1 | x0 | +----------+----------+----------+ +----------+----------+----------+ | y2 | y1 | y0 | +----------+----------+----------+ These parts are treated as the coefficients of two polynomials X(t) = x2*t^2 + x1*t + x0 Y(t) = y2*t^2 + y1*t + y0 Again let b equal the power of 2 which is the size of the x0, x1, y0 and y1 pieces, ie. if they're k limbs each then b=2^(k*mp_bits_per_limb). With this x=X(b) and y=Y(b). Let a polynomial W(t)=X(t)*Y(t) and suppose its coefficients are W(t) = w4*t^4 + w3*t^3 + w2*t^2 + w1*t + w0 The w[i] are going to be determined, and when they are they'll give the final result using w=W(b), since x*y=X(b)*Y(b)=W(b). The coefficients will be roughly b^2 each, and the final W(b) will be an addition like, high low +-------+-------+ | w4 | +-------+-------+ +--------+-------+ | w3 | +--------+-------+ +--------+-------+ | w2 | +--------+-------+ +--------+-------+ | w1 | +--------+-------+ +-------+-------+ | w0 | +-------+-------+ The w[i] coefficients could be formed by a simple set of cross products, like w4=x2*y2, w3=x2*y1+x1*y2, w2=x2*y0+x1*y1+x0*y2 etc, but this would need all nine x[i]*y[j] for i,j=0,1,2, and would be equivalent merely to a basecase multiply. Instead the following approach is used. X(t) and Y(t) are evaluated and multiplied at 5 points, giving values of W(t) at those points. The points used can be chosen in various ways, but in GMP the following are used Point Value t=0 x0*y0, which gives w0 immediately t=2 (4*x2+2*x1+x0)*(4*y2+2*y1+y0) t=1 (x2+x1+x0)*(y2+y1+y0) t=1/2 (x2+2*x1+4*x0)*(y2+2*y1+4*y0) t=inf x2*y2, which gives w4 immediately At t=1/2 the value calculated is actually 16*X(1/2)*Y(1/2), giving a value for 16*W(1/2), and this is always an integer. At t=inf the value is actually X(t)*Y(t)/t^4 in the limit as t approaches infinity, but it's much easier to think of as simply x2*y2 giving w4 immediately (much like x0*y0 at t=0 gives w0 immediately). Now each of the points substituted into W(t)=w4*t^4+...+w0 gives a linear combination of the w[i] coefficients, and the value of those combinations has just been calculated. W(0) = w0 16*W(1/2) = w4 + 2*w3 + 4*w2 + 8*w1 + 16*w0 W(1) = w4 + w3 + w2 + w1 + w0 W(2) = 16*w4 + 8*w3 + 4*w2 + 2*w1 + w0 W(inf) = w4 This is a set of five equations in five unknowns, and some elementary linear algebra quickly isolates each w[i], by subtracting multiples of one equation from another. In the code the set of five values W(0),...,W(inf) will represent those certain linear combinations. By adding or subtracting one from another as necessary, values which are each w[i] alone are arrived at. This involves only a few subtractions of small multiples (some of which are powers of 2), and so is fast. A couple of divisions remain by powers of 2 and one division by 3 (or by 6 rather), and that last uses the special `mpn_divexact_by3' (*note Exact Division::). In the code the values w4, w2 and w0 are formed in the destination with pointers `E', `C' and `A', and w3 and w1 in temporary space `D' and `B' are added to them. There are extra limbs `tD', `tC' and `tB' at the high end of w3, w2 and w1 which are handled separately. The final addition then is as follows. high low +-------+-------+-------+-------+-------+-------+ | E | C | A | +-------+-------+-------+-------+-------+-------+ +------+-------++------+-------+ | D || B | +------+-------++------+-------+ -- -- -- |tD| |tC| |tB| -- -- -- The conversion of W(t) values to the coefficients is interpolation. A polynomial of degree 4 like W(t) is uniquely determined by values known at 5 different points. The points can be chosen to make the linear equations come out with a convenient set of steps for isolating the w[i]. In `mpn/generic/mul_n.c' the `interpolate3' routine performs the interpolation. The open-coded one-pass version may be a bit hard to understand, the steps performed can be better seen in the `USE_MORE_MPN' version. Squaring follows the same procedure as multiplication, but there's only one X(t) and it's evaluated at 5 points, and those values squared to give values of W(t). The interpolation is then identical, and in fact the same `interpolate3' subroutine is used for both squaring and multiplying. Toom-3 is asymptotically O(N^1.465), the exponent being log(5)/log(3), representing 5 recursive multiplies of 1/3 the original size. This is an improvement over Karatsuba at O(N^1.585), though Toom-Cook does more work in the evaluation and interpolation and so it only realizes its advantage above a certain size. Near the crossover between Toom-3 and Karatsuba there's generally a range of sizes where the difference between the two is small. `MUL_TOOM3_THRESHOLD' is a somewhat arbitrary point in that range and successive runs of the tune program can give different values due to small variations in measuring. A graph of time versus size for the two shows the effect, see `tune/README'. At the fairly small sizes where the Toom-3 thresholds occur it's worth remembering that the asymptotic behaviour for Karatsuba and Toom-3 can't be expected to make accurate predictions, due of course to the big influence of all sorts of overheads, and the fact that only a few recursions of each are being performed. Even at large sizes there's a good chance machine dependent effects like cache architecture will mean actual performance deviates from what might be predicted. The formula given above for the Karatsuba algorithm has an equivalent for Toom-3 involving only five multiplies, but this would be complicated and unenlightening. An alternate view of Toom-3 can be found in Zuras (*note References::), using a vector to represent the x and y splits and a matrix multiplication for the evaluation and interpolation stages. The matrix inverses are not meant to be actually used, and they have elements with values much greater than in fact arise in the interpolation steps. The diagram shown for the 3-way is attractive, but again doesn't have to be implemented that way and for example with a bit of rearrangement just one division by 6 can be done.