Annotation of OpenXM_contrib/gmp/gmp.info-7, Revision 1.1
1.1 ! ohara 1: This is gmp.info, produced by makeinfo version 4.2 from gmp.texi.
! 2:
! 3: This manual describes how to install and use the GNU multiple precision
! 4: arithmetic library, version 4.1.2.
! 5:
! 6: Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
! 7: 2001, 2002 Free Software Foundation, Inc.
! 8:
! 9: Permission is granted to copy, distribute and/or modify this
! 10: document under the terms of the GNU Free Documentation License, Version
! 11: 1.1 or any later version published by the Free Software Foundation;
! 12: with no Invariant Sections, with the Front-Cover Texts being "A GNU
! 13: Manual", and with the Back-Cover Texts being "You have freedom to copy
! 14: and modify this GNU Manual, like GNU software". A copy of the license
! 15: is included in *Note GNU Free Documentation License::.
! 16: INFO-DIR-SECTION GNU libraries
! 17: START-INFO-DIR-ENTRY
! 18: * gmp: (gmp). GNU Multiple Precision Arithmetic Library.
! 19: END-INFO-DIR-ENTRY
! 20:
! 21:
! 22: File: gmp.info, Node: Assembler Cache Handling, Next: Assembler Floating Point, Prev: Assembler Carry Propagation, Up: Assembler Coding
! 23:
! 24: Cache Handling
! 25: --------------
! 26:
! 27: GMP aims to perform well both on operands that fit entirely in L1
! 28: cache and those which don't.
! 29:
! 30: Basic routines like `mpn_add_n' or `mpn_lshift' are often used on
! 31: large operands, so L2 and main memory performance is important for them.
! 32: `mpn_mul_1' and `mpn_addmul_1' are mostly used for multiply and square
! 33: basecases, so L1 performance matters most for them, unless assembler
! 34: versions of `mpn_mul_basecase' and `mpn_sqr_basecase' exist, in which
! 35: case the remaining uses are mostly for larger operands.
! 36:
! 37: For L2 or main memory operands, memory access times will almost
! 38: certainly be more than the calculation time. The aim therefore is to
! 39: maximize memory throughput, by starting a load of the next cache line
! 40: while processing the contents of the previous one.  Clearly this is
! 41: only possible if the chip has a lock-up free cache or some sort of
! 42: prefetch instruction. Most current chips have both these features.
! 43:
! 44: Prefetching sources combines well with loop unrolling, since a
! 45: prefetch can be initiated once per unrolled loop (or more than once if
! 46: the loop covers more than one cache line).
! 47:
! 48: On CPUs without write-allocate caches, prefetching destinations will
! 49: ensure individual stores don't go further down the cache hierarchy,
! 50: which would limit bandwidth.  Of course for calculations which are
! 51: slow anyway, like `mpn_divrem_1', write-throughs might be fine.
! 52:
! 53: The distance ahead to prefetch will be determined by memory latency
! 54: versus throughput. The aim of course is to have data arriving
! 55: continuously, at peak throughput. Some CPUs have limits on the number
! 56: of fetches or prefetches in progress.
! 57:
! 58: If a special prefetch instruction doesn't exist then a plain load
! 59: can be used, but in that case care must be taken not to attempt to read
! 60: past the end of an operand, since that might produce a segmentation
! 61: violation.
! 62:
! 63: Some CPUs or systems have hardware that detects sequential memory
! 64: accesses and initiates suitable cache movements automatically, making
! 65: life easy.
! 66:
! 67:
! 68: File: gmp.info, Node: Assembler Floating Point, Next: Assembler SIMD Instructions, Prev: Assembler Cache Handling, Up: Assembler Coding
! 69:
! 70: Floating Point
! 71: --------------
! 72:
! 73: Floating point arithmetic is used in GMP for multiplications on CPUs
! 74: with poor integer multipliers. It's mostly useful for `mpn_mul_1',
! 75: `mpn_addmul_1' and `mpn_submul_1' on 64-bit machines, and
! 76: `mpn_mul_basecase' on both 32-bit and 64-bit machines.
! 77:
! 78: With IEEE 53-bit double precision floats, integer multiplications
! 79: producing up to 53 bits will give exact results. Breaking a 64x64
! 80: multiplication into eight 16x32->48 bit pieces is convenient. With
! 81: some care though six 21x32->53 bit products can be used, if one of the
! 82: lower two 21-bit pieces also uses the sign bit.
! 83:
! 84: For the `mpn_mul_1' family of functions on a 64-bit machine, the
! 85: invariant single limb is split at the start, into 3 or 4 pieces.
! 86: Inside the loop, the bignum operand is split into 32-bit pieces. Fast
! 87: conversion of these unsigned 32-bit pieces to floating point is highly
! 88: machine-dependent. In some cases, reading the data into the integer
! 89: unit, zero-extending to 64 bits, then transferring to the floating
! 90: point unit via memory is the only option.
! 91:
! 92: Converting partial products back to 64-bit limbs is usually best
! 93: done as a signed conversion. Since all values are smaller than 2^53,
! 94: signed and unsigned are the same, but most processors lack unsigned
! 95: conversions.
! 96:
! 97:
! 99: Here is a diagram showing 16x32 bit products for an `mpn_mul_1' or
! 100: `mpn_addmul_1' with a 64-bit limb. The single limb operand V is split
! 101: into four 16-bit parts. The multi-limb operand U is split in the loop
! 102: into two 32-bit parts.
! 103:
! 104: +---+---+---+---+
! 105: |v48|v32|v16|v00| V operand
! 106: +---+---+---+---+
! 107:
! 108: +-------+---+---+
! 109: x | u32 | u00 | U operand (one limb)
! 110: +---------------+
! 111:
! 112: ---------------------------------
! 113:
! 114: +-----------+
! 115: | u00 x v00 | p00 48-bit products
! 116: +-----------+
! 117: +-----------+
! 118: | u00 x v16 | p16
! 119: +-----------+
! 120: +-----------+
! 121: | u00 x v32 | p32
! 122: +-----------+
! 123: +-----------+
! 124: | u00 x v48 | p48
! 125: +-----------+
! 126: +-----------+
! 127: | u32 x v00 | r32
! 128: +-----------+
! 129: +-----------+
! 130: | u32 x v16 | r48
! 131: +-----------+
! 132: +-----------+
! 133: | u32 x v32 | r64
! 134: +-----------+
! 135: +-----------+
! 136: | u32 x v48 | r80
! 137: +-----------+
! 138:
! 139: p32 and r32 can be summed using floating-point addition, and
! 140: likewise p48 and r48. p00 and p16 can be summed with r64 and r80 from
! 141: the previous iteration.
! 142:
! 143: For each loop then, four 49-bit quantities are transferred to the
! 144: integer unit, aligned as follows,
! 145:
! 146: |-----64bits----|-----64bits----|
! 147: +------------+
! 148: | p00 + r64' | i00
! 149: +------------+
! 150: +------------+
! 151: | p16 + r80' | i16
! 152: +------------+
! 153: +------------+
! 154: | p32 + r32 | i32
! 155: +------------+
! 156: +------------+
! 157: | p48 + r48 | i48
! 158: +------------+
! 159:
! 160: The challenge then is to sum these efficiently and add in a carry
! 161: limb, generating a low 64-bit result limb and a high 33-bit carry limb
! 162: (i48 extends 33 bits into the high half).
! 163:
! 164:
! 165: File: gmp.info, Node: Assembler SIMD Instructions, Next: Assembler Software Pipelining, Prev: Assembler Floating Point, Up: Assembler Coding
! 166:
! 167: SIMD Instructions
! 168: -----------------
! 169:
! 170: The single-instruction multiple-data support in current
! 171: microprocessors is aimed at signal processing algorithms where each
! 172: data point can be treated more or less independently. There's
! 173: generally not much support for propagating the sort of carries that
! 174: arise in GMP.
! 175:
! 176: SIMD multiplications of say four 16x16 bit products only do as much
! 177: work as one 32x32 from GMP's point of view, and need some shifts and
! 178: adds besides.  But of course if the SIMD form is fully pipelined
! 179: and uses less instruction decoding then it may still be worthwhile.
! 180:
! 181: On the 80x86 chips, MMX has so far found a use in `mpn_rshift' and
! 182: `mpn_lshift' since it allows 64-bit operations, and is used in a special
! 183: case for 16-bit multipliers in the P55 `mpn_mul_1'. 3DNow and SSE
! 184: haven't found a use so far.
! 185:
! 186:
! 187: File: gmp.info, Node: Assembler Software Pipelining, Next: Assembler Loop Unrolling, Prev: Assembler SIMD Instructions, Up: Assembler Coding
! 188:
! 189: Software Pipelining
! 190: -------------------
! 191:
! 192: Software pipelining consists of scheduling instructions around the
! 193: branch point in a loop. For example a loop taking a checksum of an
! 194: array of limbs might have a load and an add, but the load wouldn't be
! 195: for that add, rather for the one next time around the loop. Each load
! 196: then is effectively scheduled back in the previous iteration, allowing
! 197: latency to be hidden.
! 198:
! 199: Naturally this is wanted only when doing things like loads or
! 200: multiplies that take a few cycles to complete, and only where a CPU has
! 201: multiple functional units so that other work can be done while waiting.
! 202:
! 203: A pipeline with several stages will have a data value in progress at
! 204: each stage and each loop iteration moves them along one stage. This is
! 205: like juggling.
! 206:
! 207: Within the loop some moves between registers may be necessary to
! 208: have the right values in the right places for each iteration. Loop
! 209: unrolling can help this, with each unrolled block able to use different
! 210: registers for different values, even if some shuffling is still needed
! 211: just before going back to the top of the loop.
! 212:
! 213:
! 214: File: gmp.info, Node: Assembler Loop Unrolling, Prev: Assembler Software Pipelining, Up: Assembler Coding
! 215:
! 216: Loop Unrolling
! 217: --------------
! 218:
! 219: Loop unrolling consists of replicating code so that several limbs are
! 220: processed in each loop. At a minimum this reduces loop overheads by a
! 221: corresponding factor, but it can also allow better register usage, for
! 222: example alternately using one register combination and then another.
! 223: Judicious use of `m4' macros can help avoid lots of duplication in the
! 224: source code.
! 225:
! 226: Unrolling is commonly done to a power of 2 multiple so the number of
! 227: unrolled loops and the number of remaining limbs can be calculated with
! 228: a shift and mask. But other multiples can be used too, just by
! 229: subtracting each N limbs processed from a counter and waiting for less
! 230: than N remaining (or offsetting the counter by N so it goes negative
! 231: when there's less than N remaining).
! 232:
! 233: The limbs not a multiple of the unrolling can be handled in various
! 234: ways, for example
! 235:
! 236: * A simple loop at the end (or the start) to process the excess.
! 237: Care will be wanted that it isn't too much slower than the
! 238: unrolled part.
! 239:
! 240: * A set of binary tests, for example after an 8-limb unrolling, test
! 241: for 4 more limbs to process, then a further 2 more or not, and
! 242: finally 1 more or not. This will probably take more code space
! 243: than a simple loop.
! 244:
! 245: * A `switch' statement, providing separate code for each possible
! 246: excess, for example an 8-limb unrolling would have separate code
! 247: for 0 remaining, 1 remaining, etc, up to 7 remaining. This might
! 248: take a lot of code, but may be the best way to optimize all cases
! 249: in combination with a deep pipelined loop.
! 250:
! 251: * A computed jump into the middle of the loop, thus making the first
! 252: iteration handle the excess. This should make times smoothly
! 253: increase with size, which is attractive, but setups for the jump
! 254: and adjustments for pointers can be tricky and could become quite
! 255: difficult in combination with deep pipelining.
! 256:
! 257: One way to write the setups and finishups for a pipelined unrolled
! 258: loop is simply to duplicate the loop at the start and the end, then
! 259: delete instructions at the start which have no valid antecedents, and
! 260: delete instructions at the end whose results are unwanted. Sizes not a
! 261: multiple of the unrolling can then be handled as desired.
! 262:
! 263:
! 264: File: gmp.info, Node: Internals, Next: Contributors, Prev: Algorithms, Up: Top
! 265:
! 266: Internals
! 267: *********
! 268:
! 269: *This chapter is provided only for informational purposes and the
! 270: various internals described here may change in future GMP releases.
! 271: Applications expecting to be compatible with future releases should use
! 272: only the documented interfaces described in previous chapters.*
! 273:
! 274: * Menu:
! 275:
! 276: * Integer Internals::
! 277: * Rational Internals::
! 278: * Float Internals::
! 279: * Raw Output Internals::
! 280: * C++ Interface Internals::
! 281:
! 282:
! 283: File: gmp.info, Node: Integer Internals, Next: Rational Internals, Prev: Internals, Up: Internals
! 284:
! 285: Integer Internals
! 286: =================
! 287:
! 288: `mpz_t' variables represent integers using sign and magnitude, in
! 289: space dynamically allocated and reallocated. The fields are as follows.
! 290:
! 291: `_mp_size'
! 292: The number of limbs, or the negative of that when representing a
! 293: negative integer. Zero is represented by `_mp_size' set to zero,
! 294: in which case the `_mp_d' data is unused.
! 295:
! 296: `_mp_d'
! 297: A pointer to an array of limbs which is the magnitude. These are
! 298: stored "little endian" as per the `mpn' functions, so `_mp_d[0]'
! 299: is the least significant limb and `_mp_d[ABS(_mp_size)-1]' is the
! 300: most significant. Whenever `_mp_size' is non-zero, the most
! 301: significant limb is non-zero.
! 302:
! 303: Currently there's always at least one limb allocated, so for
! 304: instance `mpz_set_ui' never needs to reallocate, and `mpz_get_ui'
! 305: can fetch `_mp_d[0]' unconditionally (though its value is then
! 306: only wanted if `_mp_size' is non-zero).
! 307:
! 308: `_mp_alloc'
! 309: `_mp_alloc' is the number of limbs currently allocated at `_mp_d',
! 310: and naturally `_mp_alloc >= ABS(_mp_size)'. When an `mpz' routine
! 311: is about to (or might be about to) increase `_mp_size', it checks
! 312: `_mp_alloc' to see whether there's enough space, and reallocates
! 313: if not. `MPZ_REALLOC' is generally used for this.
! 314:
! 315: The various bitwise logical functions like `mpz_and' behave as if
! 316: negative values were twos complement. But sign and magnitude is always
! 317: used internally, and necessary adjustments are made during the
! 318: calculations. Sometimes this isn't pretty, but sign and magnitude are
! 319: best for other routines.
! 320:
! 321: Some internal temporary variables are set up with `MPZ_TMP_INIT' and
! 322: these have `_mp_d' space obtained from `TMP_ALLOC' rather than the
! 323: memory allocation functions. Care is taken to ensure that these are
! 324: big enough that no reallocation is necessary (since it would have
! 325: unpredictable consequences).
! 326:
! 327:
! 328: File: gmp.info, Node: Rational Internals, Next: Float Internals, Prev: Integer Internals, Up: Internals
! 329:
! 330: Rational Internals
! 331: ==================
! 332:
! 333: `mpq_t' variables represent rationals using an `mpz_t' numerator and
! 334: denominator (*note Integer Internals::).
! 335:
! 336: The canonical form adopted is denominator positive (and non-zero),
! 337: no common factors between numerator and denominator, and zero uniquely
! 338: represented as 0/1.
! 339:
! 340: It's believed that casting out common factors at each stage of a
! 341: calculation is best in general. A GCD is an O(N^2) operation so it's
! 342: better to do a few small ones immediately than to delay and have to do
! 343: a big one later. Knowing the numerator and denominator have no common
! 344: factors can be used for example in `mpq_mul' to make only two cross
! 345: GCDs necessary, not four.
! 346:
! 347: This general approach to common factors is badly sub-optimal in the
! 348: presence of simple factorizations or little prospect for cancellation,
! 349: but GMP has no way to know when this will occur. As per *Note
! 350: Efficiency::, that's left to applications. The `mpq_t' framework might
! 351: still suit, with `mpq_numref' and `mpq_denref' for direct access to the
! 352: numerator and denominator, or of course `mpz_t' variables can be used
! 353: directly.
! 354:
! 355:
! 356: File: gmp.info, Node: Float Internals, Next: Raw Output Internals, Prev: Rational Internals, Up: Internals
! 357:
! 358: Float Internals
! 359: ===============
! 360:
! 361: Efficient calculation is the primary aim of GMP floats and the use
! 362: of whole limbs and simple rounding facilitates this.
! 363:
! 364: `mpf_t' floats have a variable precision mantissa and a single
! 365: machine word signed exponent. The mantissa is represented using sign
! 366: and magnitude.
! 367:
! 368: most least
! 369: significant significant
! 370: limb limb
! 371:
! 372: _mp_d
! 373: |---- _mp_exp ---> |
! 374: _____ _____ _____ _____ _____
! 375: |_____|_____|_____|_____|_____|
! 376: . <------------ radix point
! 377:
! 378: <-------- _mp_size --------->
! 379:
! 380: The fields are as follows.
! 381:
! 382: `_mp_size'
! 383: The number of limbs currently in use, or the negative of that when
! 384: representing a negative value. Zero is represented by `_mp_size'
! 385: and `_mp_exp' both set to zero, and in that case the `_mp_d' data
! 386: is unused. (In the future `_mp_exp' might be undefined when
! 387: representing zero.)
! 388:
! 389: `_mp_prec'
! 390: The precision of the mantissa, in limbs. In any calculation the
! 391: aim is to produce `_mp_prec' limbs of result (the most significant
! 392: being non-zero).
! 393:
! 394: `_mp_d'
! 395: A pointer to the array of limbs which is the absolute value of the
! 396: mantissa. These are stored "little endian" as per the `mpn'
! 397: functions, so `_mp_d[0]' is the least significant limb and
! 398: `_mp_d[ABS(_mp_size)-1]' the most significant.
! 399:
! 400: The most significant limb is always non-zero, but there are no
! 401: other restrictions on its value, in particular the highest 1 bit
! 402: can be anywhere within the limb.
! 403:
! 404: `_mp_prec+1' limbs are allocated to `_mp_d', the extra limb being
! 405: for convenience (see below). There are no reallocations during a
! 406: calculation, only in a change of precision with `mpf_set_prec'.
! 407:
! 408: `_mp_exp'
! 409: The exponent, in limbs, determining the location of the implied
! 410: radix point. Zero means the radix point is just above the most
! 411: significant limb. Positive values mean a radix point offset
! 412: towards the lower limbs and hence a value >= 1, as for example in
! 413: the diagram above. Negative exponents mean a radix point further
! 414: above the highest limb.
! 415:
! 416: Naturally the exponent can be any value, it doesn't have to fall
! 417: within the limbs as the diagram shows, it can be a long way above
! 418: or a long way below. Limbs other than those included in the
! 419: `{_mp_d,_mp_size}' data are treated as zero.
! 420:
! 421:
! 422: The following various points should be noted.
! 423:
! 424: Low Zeros
! 425: The least significant limbs `_mp_d[0]' etc can be zero, though
! 426: such low zeros can always be ignored. Routines likely to produce
! 427: low zeros check and avoid them to save time in subsequent
! 428: calculations, but for most routines they're quite unlikely and
! 429: aren't checked.
! 430:
! 431: Mantissa Size Range
! 432: The `_mp_size' count of limbs in use can be less than `_mp_prec' if
! 433: the value can be represented in less. This means low precision
! 434: values or small integers stored in a high precision `mpf_t' can
! 435: still be operated on efficiently.
! 436:
! 437: `_mp_size' can also be greater than `_mp_prec'. Firstly a value is
! 438: allowed to use all of the `_mp_prec+1' limbs available at `_mp_d',
! 439: and secondly when `mpf_set_prec_raw' lowers `_mp_prec' it leaves
! 440: `_mp_size' unchanged and so the size can be arbitrarily bigger than
! 441: `_mp_prec'.
! 442:
! 443: Rounding
! 444: All rounding is done on limb boundaries. Calculating `_mp_prec'
! 445: limbs with the high non-zero will ensure the application requested
! 446: minimum precision is obtained.
! 447:
! 448: The use of simple "trunc" rounding towards zero is efficient,
! 449: since there's no need to examine extra limbs and increment or
! 450: decrement.
! 451:
! 452: Bit Shifts
! 453: Since the exponent is in limbs, there are no bit shifts in basic
! 454: operations like `mpf_add' and `mpf_mul'. When differing exponents
! 455: are encountered all that's needed is to adjust pointers to line up
! 456: the relevant limbs.
! 457:
! 458: Of course `mpf_mul_2exp' and `mpf_div_2exp' will require bit
! 459: shifts, but the choice is between an exponent in limbs which
! 460: requires shifts there, or one in bits which requires them almost
! 461: everywhere else.
! 462:
! 463: Use of `_mp_prec+1' Limbs
! 464: The extra limb on `_mp_d' (`_mp_prec+1' rather than just
! 465: `_mp_prec') helps when an `mpf' routine might get a carry from its
! 466: operation. `mpf_add' for instance will do an `mpn_add' of
! 467: `_mp_prec' limbs. If there's no carry then that's the result, but
! 468: if there is a carry then it's stored in the extra limb of space and
! 469: `_mp_size' becomes `_mp_prec+1'.
! 470:
! 471: Whenever `_mp_prec+1' limbs are held in a variable, the low limb
! 472: is not needed for the intended precision, only the `_mp_prec' high
! 473: limbs. But zeroing it out or moving the rest down is unnecessary.
! 474: Subsequent routines reading the value will simply take the high
! 475: limbs they need, and this will be `_mp_prec' if their target has
! 476: that same precision. This is no more than a pointer adjustment,
! 477: and must be checked anyway since the destination precision can be
! 478: different from the source's.
! 479:
! 480: Copy functions like `mpf_set' will retain a full `_mp_prec+1' limbs
! 481: if available. This ensures that a variable which has `_mp_size'
! 482: equal to `_mp_prec+1' will get its full exact value copied.
! 483: Strictly speaking this is unnecessary since only `_mp_prec' limbs
! 484: are needed for the application's requested precision, but it's
! 485: considered that an `mpf_set' from one variable into another of the
! 486: same precision ought to produce an exact copy.
! 487:
! 488: Application Precisions
! 489: `__GMPF_BITS_TO_PREC' converts an application requested precision
! 490: to an `_mp_prec'. The value in bits is rounded up to a whole limb
! 491: then an extra limb is added since the most significant limb of `_mp_d'
! 492: is only required to be non-zero and therefore might contain only one bit.
! 493:
! 494: `__GMPF_PREC_TO_BITS' does the reverse conversion, and removes the
! 495: extra limb from `_mp_prec' before converting to bits. The net
! 496: effect of reading back with `mpf_get_prec' is simply the precision
! 497: rounded up to a multiple of `mp_bits_per_limb'.
! 498:
! 499: Note that the extra limb added here for the high only being
! 500: non-zero is in addition to the extra limb allocated to `_mp_d'.
! 501: For example with a 32-bit limb, an application request for 250
! 502: bits will be rounded up to 8 limbs, then an extra added for the
! 503: high being only non-zero, giving an `_mp_prec' of 9. `_mp_d' then
! 504: gets 10 limbs allocated. Reading back with `mpf_get_prec' will
! 505: take `_mp_prec' subtract 1 limb and multiply by 32, giving 256
! 506: bits.
! 507:
! 508: Strictly speaking, the fact the high limb has at least one bit
! 509: means that a float with, say, 3 limbs of 32-bits each will be
! 510: holding at least 65 bits, but for the purposes of `mpf_t' it's
! 511: considered simply to be 64 bits, a nice multiple of the limb size.
! 512:
! 513:
! 514: File: gmp.info, Node: Raw Output Internals, Next: C++ Interface Internals, Prev: Float Internals, Up: Internals
! 515:
! 516: Raw Output Internals
! 517: ====================
! 518:
! 519: `mpz_out_raw' uses the following format.
! 520:
! 521: +------+------------------------+
! 522: | size | data bytes |
! 523: +------+------------------------+
! 524:
! 525: The size is 4 bytes written most significant byte first, being the
! 526: number of subsequent data bytes, or the twos complement negative of
! 527: that when a negative integer is represented. The data bytes are the
! 528: absolute value of the integer, written most significant byte first.
! 529:
! 530: The most significant data byte is always non-zero, so the output is
! 531: the same on all systems, irrespective of limb size.
! 532:
! 533: In GMP 1, leading zero bytes were written to pad the data bytes to a
! 534: multiple of the limb size. `mpz_inp_raw' will still accept this, for
! 535: compatibility.
! 536:
! 537: The use of "big endian" for both the size and data fields is
! 538: deliberate, it makes the data easy to read in a hex dump of a file.
! 539: Unfortunately it also means that the limb data must be reversed when
! 540: reading or writing, so neither a big endian nor little endian system
! 541: can just read and write `_mp_d'.
! 542:
! 543:
! 544: File: gmp.info, Node: C++ Interface Internals, Prev: Raw Output Internals, Up: Internals
! 545:
! 546: C++ Interface Internals
! 547: =======================
! 548:
! 549: A system of expression templates is used to ensure something like
! 550: `a=b+c' turns into a simple call to `mpz_add' etc. For `mpf_class' and
! 551: `mpfr_class' the scheme also ensures the precision of the final
! 552: destination is used for any temporaries within a statement like
! 553: `f=w*x+y*z'. These are important features which a naive implementation
! 554: cannot provide.
! 555:
! 556: A simplified description of the scheme follows. The true scheme is
! 557: complicated by the fact that expressions have different return types.
! 558: For detailed information, refer to the source code.
! 559:
! 560: To perform an operation, say, addition, we first define a "function
! 561: object" evaluating it,
! 562:
! 563: struct __gmp_binary_plus
! 564: {
! 565: static void eval(mpf_t f, mpf_t g, mpf_t h) { mpf_add(f, g, h); }
! 566: };
! 567:
! 568: And an "additive expression" object,
! 569:
! 570: __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
! 571: operator+(const mpf_class &f, const mpf_class &g)
! 572: {
! 573: return __gmp_expr
! 574: <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
! 575: }
! 576:
! 577: The seemingly redundant `__gmp_expr<__gmp_binary_expr<...>>' is used
! 578: to encapsulate any possible kind of expression into a single template
! 579: type. In fact even `mpf_class' etc are `typedef' specializations of
! 580: `__gmp_expr'.
! 581:
! 582: Next we define assignment of `__gmp_expr' to `mpf_class'.
! 583:
! 584: template <class T>
! 585: mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
! 586: {
! 587: expr.eval(this->get_mpf_t(), this->precision());
! 588: return *this;
! 589: }
! 590:
! 591: template <class Op>
! 592: void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
! 593: (mpf_t f, unsigned long int precision)
! 594: {
! 595: Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
! 596: }
! 597:
! 598: where `expr.val1' and `expr.val2' are references to the expression's
! 599: operands (here `expr' is the `__gmp_binary_expr' stored within the
! 600: `__gmp_expr').
! 601:
! 602: This way, the expression is actually evaluated only at the time of
! 603: assignment, when the required precision (that of `f') is known.
! 604: Furthermore the target `mpf_t' is now available, thus we can call
! 605: `mpf_add' directly with `f' as the output argument.
! 606:
! 607: Compound expressions are handled by defining operators taking
! 608: subexpressions as their arguments, like this:
! 609:
! 610: template <class T, class U>
! 611: __gmp_expr
! 612: <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
! 613: operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
! 614: {
! 615: return __gmp_expr
! 616: <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
! 617: (expr1, expr2);
! 618: }
! 619:
! 620: And the corresponding specializations of `__gmp_expr::eval':
! 621:
! 622: template <class T, class U, class Op>
! 623: void __gmp_expr
! 624: <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
! 625: (mpf_t f, unsigned long int precision)
! 626: {
! 627: // declare two temporaries
! 628: mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
! 629: Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
! 630: }
! 631:
! 632: The expression is thus recursively evaluated to any level of
! 633: complexity and all subexpressions are evaluated to the precision of `f'.
! 634:
! 635:
! 636: File: gmp.info, Node: Contributors, Next: References, Prev: Internals, Up: Top
! 637:
! 638: Contributors
! 639: ************
! 640:
! 641: Torbjorn Granlund wrote the original GMP library and is still
! 642: developing and maintaining it. Several other individuals and
! 643: organizations have contributed to GMP in various ways. Here is a list
! 644: in chronological order:
! 645:
! 646: Gunnar Sjoedin and Hans Riesel helped with mathematical problems in
! 647: early versions of the library.
! 648:
! 649: Richard Stallman contributed to the interface design and revised the
! 650: first version of this manual.
! 651:
! 652: Brian Beuning and Doug Lea helped with testing of early versions of
! 653: the library and made creative suggestions.
! 654:
! 655: John Amanatides of York University in Canada contributed the function
! 656: `mpz_probab_prime_p'.
! 657:
! 658: Paul Zimmermann of Inria sparked the development of GMP 2, with his
! 659: comparisons between bignum packages.
! 660:
! 661: Ken Weber (Kent State University, Universidade Federal do Rio Grande
! 662: do Sul) contributed `mpz_gcd', `mpz_divexact', `mpn_gcd', and
! 663: `mpn_bdivmod', partially supported by CNPq (Brazil) grant 301314194-2.
! 664:
! 665: Per Bothner of Cygnus Support helped to set up GMP to use Cygnus'
! 666: configure. He has also made valuable suggestions and tested numerous
! 667: intermediary releases.
! 668:
! 669: Joachim Hollman was involved in the design of the `mpf' interface,
! 670: and in the `mpz' design revisions for version 2.
! 671:
! 672: Bennet Yee contributed the initial versions of `mpz_jacobi' and
! 673: `mpz_legendre'.
! 674:
! 675: Andreas Schwab contributed the files `mpn/m68k/lshift.S' and
! 676: `mpn/m68k/rshift.S' (now in `.asm' form).
! 677:
! 678: The development of floating point functions of GNU MP 2, were
! 679: supported in part by the ESPRIT-BRA (Basic Research Activities) 6846
! 680: project POSSO (POlynomial System SOlving).
! 681:
! 682: GNU MP 2 was finished and released by SWOX AB, SWEDEN, in
! 683: cooperation with the IDA Center for Computing Sciences, USA.
! 684:
! 685: Robert Harley of Inria, France and David Seal of ARM, England,
! 686: suggested clever improvements for population count.
! 687:
! 688: Robert Harley also wrote highly optimized Karatsuba and 3-way Toom
! 689: multiplication functions for GMP 3. He also contributed the ARM
! 690: assembly code.
! 691:
! 692: Torsten Ekedahl of the Mathematical department of Stockholm
! 693: University provided significant inspiration during several phases of
! 694: the GMP development. His mathematical expertise helped improve several
! 695: algorithms.
! 696:
! 697: Paul Zimmermann wrote the Divide and Conquer division code, the REDC
! 698: code, the REDC-based mpz_powm code, the FFT multiply code, and the
! 699: Karatsuba square root. The ECMNET project Paul is organizing was a
! 700: driving force behind many of the optimizations in GMP 3.
! 701:
! 702: Linus Nordberg wrote the new configure system based on autoconf and
! 703: implemented the new random functions.
! 704:
! 705: Kent Boortz made the Macintosh port.
! 706:
! 707: Kevin Ryde worked on a number of things: optimized x86 code, m4 asm
! 708: macros, parameter tuning, speed measuring, the configure system,
! 709: function inlining, divisibility tests, bit scanning, Jacobi symbols,
! 710: Fibonacci and Lucas number functions, printf and scanf functions, perl
! 711: interface, demo expression parser, the algorithms chapter in the
! 712: manual, `gmpasm-mode.el', and various miscellaneous improvements
! 713: elsewhere.
! 714:
! 715: Steve Root helped write the optimized alpha 21264 assembly code.
! 716:
! 717: Gerardo Ballabio wrote the `gmpxx.h' C++ class interface and the C++
! 718: `istream' input routines.
! 719:
! 720: GNU MP 4.0 was finished and released by Torbjorn Granlund and Kevin
! 721: Ryde. Torbjorn's work was partially funded by the IDA Center for
! 722: Computing Sciences, USA.
! 723:
! 724:     (This list is chronological, not ordered by significance.  If you
! 725: have contributed to GMP but are not listed above, please tell
! 726: <tege@swox.com> about the omission!)
! 727:
! 728:     Thanks go to Hans Thorsen for donating an SGI system for the GMP
! 729: test system environment.
! 730:
! 731:
! 732: File: gmp.info, Node: References, Next: GNU Free Documentation License, Prev: Contributors, Up: Top
! 733:
! 734: References
! 735: **********
! 736:
! 737: Books
! 738: =====
! 739:
! 740:    * Jonathan M. Borwein and Peter B. Borwein, "Pi and the AGM: A Study
! 741:      in Analytic Number Theory and Computational Complexity", John
! 742:      Wiley & Sons, 1998.
! 743:
! 744: * Henri Cohen, "A Course in Computational Algebraic Number Theory",
! 745: Graduate Texts in Mathematics number 138, Springer-Verlag, 1993.
! 746: `http://www.math.u-bordeaux.fr/~cohen'
! 747:
! 748: * Donald E. Knuth, "The Art of Computer Programming", volume 2,
! 749: "Seminumerical Algorithms", 3rd edition, Addison-Wesley, 1998.
! 750: `http://www-cs-faculty.stanford.edu/~knuth/taocp.html'
! 751:
! 752: * John D. Lipson, "Elements of Algebra and Algebraic Computing", The
! 753: Benjamin Cummings Publishing Company Inc, 1981.
! 754:
! 755: * Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone,
! 756: "Handbook of Applied Cryptography",
! 757: `http://www.cacr.math.uwaterloo.ca/hac/'
! 758:
! 759: * Richard M. Stallman, "Using and Porting GCC", Free Software
! 760: Foundation, 1999, available online
! 761: `http://www.gnu.org/software/gcc/onlinedocs/', and in the GCC
! 762: package `ftp://ftp.gnu.org/gnu/gcc/'
! 763:
! 764: Papers
! 765: ======
! 766:
! 767: * Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division",
! 768: Max-Planck-Institut fuer Informatik Research Report
! 769: MPI-I-98-1-022,
! 770: `http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022'
! 771:
! 772: * Torbjorn Granlund and Peter L. Montgomery, "Division by Invariant
! 773: Integers using Multiplication", in Proceedings of the SIGPLAN
! 774: PLDI'94 Conference, June 1994. Also available
! 775: `ftp://ftp.cwi.nl/pub/pmontgom/divcnst.psa4.gz' (and .psl.gz).
! 776:
! 777: * Peter L. Montgomery, "Modular Multiplication Without Trial
! 778: Division", in Mathematics of Computation, volume 44, number 170,
! 779: April 1985.
! 780:
! 781: * Tudor Jebelean, "An algorithm for exact division", Journal of
! 782: Symbolic Computation, volume 15, 1993, pp. 169-180. Research
! 783: report version available
! 784: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz'
! 785:
! 786: * Tudor Jebelean, "Exact Division with Karatsuba Complexity -
! 787: Extended Abstract", RISC-Linz technical report 96-31,
! 788: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz'
! 789:
! 790: * Tudor Jebelean, "Practical Integer Division with Karatsuba
! 791: Complexity", ISSAC 97, pp. 339-341. Technical report available
! 792: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz'
! 793:
! 794: * Tudor Jebelean, "A Generalization of the Binary GCD Algorithm",
! 795: ISSAC 93, pp. 111-116. Technical report version available
! 796: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz'
! 797:
! 798: * Tudor Jebelean, "A Double-Digit Lehmer-Euclid Algorithm for
! 799: Finding the GCD of Long Integers", Journal of Symbolic
! 800: Computation, volume 19, 1995, pp. 145-157. Technical report
! 801: version also available
! 802: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz'
! 803:
! 804: * Werner Krandick and Tudor Jebelean, "Bidirectional Exact Integer
! 805: Division", Journal of Symbolic Computation, volume 21, 1996, pp.
! 806: 441-455. Early technical report version also available
! 807: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz'
! 808:
! 809: * R. Moenck and A. Borodin, "Fast Modular Transforms via Division",
! 810: Proceedings of the 13th Annual IEEE Symposium on Switching and
! 811: Automata Theory, October 1972, pp. 90-96. Reprinted as "Fast
! 812: Modular Transforms", Journal of Computer and System Sciences,
! 813: volume 8, number 3, June 1974, pp. 366-386.
! 814:
! 815:    * Arnold Schönhage and Volker Strassen, "Schnelle Multiplikation
! 816:      grosser Zahlen", Computing 7, 1971, pp. 281-292.
! 817:
! 818: * Kenneth Weber, "The accelerated integer GCD algorithm", ACM
! 819: Transactions on Mathematical Software, volume 21, number 1, March
! 820: 1995, pp. 111-122.
! 821:
! 822: * Paul Zimmermann, "Karatsuba Square Root", INRIA Research Report
! 823: 3805, November 1999, `http://www.inria.fr/RRRT/RR-3805.html'
! 824:
! 825: * Paul Zimmermann, "A Proof of GMP Fast Division and Square Root
! 826: Implementations",
! 827: `http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz'
! 828:
! 829:    * Dan Zuras, "On Squaring and Multiplying Large Integers", ARITH-11:
! 830:      IEEE Symposium on Computer Arithmetic, 1993, pp. 260-271.
! 831:      Reprinted as "More on Multiplying and Squaring Large Integers",
! 832:      IEEE Transactions on Computers, volume 43, number 8, August 1994,
! 833:      pp. 899-908.
! 834: