This is gmp.info, produced by makeinfo version 4.2 from gmp.texi.
                      2:
                      3: This manual describes how to install and use the GNU multiple precision
                      4: arithmetic library, version 4.1.2.
                      5:
                      6:    Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
                      7: 2001, 2002 Free Software Foundation, Inc.
                      8:
                      9:    Permission is granted to copy, distribute and/or modify this
                     10: document under the terms of the GNU Free Documentation License, Version
                     11: 1.1 or any later version published by the Free Software Foundation;
                     12: with no Invariant Sections, with the Front-Cover Texts being "A GNU
                     13: Manual", and with the Back-Cover Texts being "You have freedom to copy
                     14: and modify this GNU Manual, like GNU software".  A copy of the license
                     15: is included in *Note GNU Free Documentation License::.
                     16: INFO-DIR-SECTION GNU libraries
                     17: START-INFO-DIR-ENTRY
                     18: * gmp: (gmp).                   GNU Multiple Precision Arithmetic Library.
                     19: END-INFO-DIR-ENTRY
                     20:
                     21: 
                     22: File: gmp.info,  Node: Assembler Cache Handling,  Next: Assembler Floating Point,  Prev: Assembler Carry Propagation,  Up: Assembler Coding
                     23:
                     24: Cache Handling
                     25: --------------
                     26:
                     27:    GMP aims to perform well both on operands that fit entirely in L1
                     28: cache and those which don't.
                     29:
                     30:    Basic routines like `mpn_add_n' or `mpn_lshift' are often used on
                     31: large operands, so L2 and main memory performance is important for them.
                     32: `mpn_mul_1' and `mpn_addmul_1' are mostly used for multiply and square
                     33: basecases, so L1 performance matters most for them, unless assembler
                     34: versions of `mpn_mul_basecase' and `mpn_sqr_basecase' exist, in which
                     35: case the remaining uses are mostly for larger operands.
                     36:
                     37:    For L2 or main memory operands, memory access times will almost
                     38: certainly be more than the calculation time.  The aim therefore is to
                     39: maximize memory throughput, by starting a load of the next cache line
while processing the contents of the previous one.  Clearly this is
                     41: only possible if the chip has a lock-up free cache or some sort of
                     42: prefetch instruction.  Most current chips have both these features.
                     43:
                     44:    Prefetching sources combines well with loop unrolling, since a
                     45: prefetch can be initiated once per unrolled loop (or more than once if
                     46: the loop covers more than one cache line).
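
   GMP's own code does this in assembler with CPU-specific prefetch
instructions, but the idea can be sketched in C using GCC's
`__builtin_prefetch' (the limb type and the prefetch distance below are
illustrative assumptions, not GMP source).

     #include <stddef.h>

     typedef unsigned long mp_limb_t;   /* assumed limb type */

     /* Checksum of n limbs, 4-way unrolled, with one prefetch issued per
        unrolled block.  The distance of 16 limbs ahead is an arbitrary
        example value; in practice it's tuned to latency and line size.  */
     mp_limb_t
     checksum (const mp_limb_t *up, size_t n)
     {
       mp_limb_t  sum = 0;
       size_t     i;
       for (i = 0; i + 4 <= n; i += 4)
         {
           __builtin_prefetch (up + i + 16, 0, 3);   /* read, high locality */
           sum += up[i];
           sum += up[i+1];
           sum += up[i+2];
           sum += up[i+3];
         }
       for ( ; i < n; i++)                           /* leftover limbs */
         sum += up[i];
       return sum;
     }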
                     47:
                     48:    On CPUs without write-allocate caches, prefetching destinations will
                     49: ensure individual stores don't go further down the cache hierarchy,
which would otherwise limit bandwidth.  Of course for calculations
which are slow anyway, like `mpn_divrem_1', write-throughs might be
fine.
                     52:
                     53:    The distance ahead to prefetch will be determined by memory latency
                     54: versus throughput.  The aim of course is to have data arriving
                     55: continuously, at peak throughput.  Some CPUs have limits on the number
                     56: of fetches or prefetches in progress.
                     57:
                     58:    If a special prefetch instruction doesn't exist then a plain load
                     59: can be used, but in that case care must be taken not to attempt to read
                     60: past the end of an operand, since that might produce a segmentation
                     61: violation.
                     62:
                     63:    Some CPUs or systems have hardware that detects sequential memory
                     64: accesses and initiates suitable cache movements automatically, making
                     65: life easy.
                     66:
                     67: 
                     68: File: gmp.info,  Node: Assembler Floating Point,  Next: Assembler SIMD Instructions,  Prev: Assembler Cache Handling,  Up: Assembler Coding
                     69:
                     70: Floating Point
                     71: --------------
                     72:
                     73:    Floating point arithmetic is used in GMP for multiplications on CPUs
                     74: with poor integer multipliers.  It's mostly useful for `mpn_mul_1',
                     75: `mpn_addmul_1' and `mpn_submul_1' on 64-bit machines, and
                     76: `mpn_mul_basecase' on both 32-bit and 64-bit machines.
                     77:
                     78:    With IEEE 53-bit double precision floats, integer multiplications
                     79: producing up to 53 bits will give exact results.  Breaking a 64x64
                     80: multiplication into eight 16x32->48 bit pieces is convenient.  With
                     81: some care though six 21x32->53 bit products can be used, if one of the
                     82: lower two 21-bit pieces also uses the sign bit.
                     83:
                     84:    For the `mpn_mul_1' family of functions on a 64-bit machine, the
                     85: invariant single limb is split at the start, into 3 or 4 pieces.
                     86: Inside the loop, the bignum operand is split into 32-bit pieces.  Fast
                     87: conversion of these unsigned 32-bit pieces to floating point is highly
                     88: machine-dependent.  In some cases, reading the data into the integer
unit, zero-extending to 64 bits, and then transferring it to the
floating point unit via memory, is the only option.
                     91:
                     92:    Converting partial products back to 64-bit limbs is usually best
                     93: done as a signed conversion.  Since all values are smaller than 2^53,
                     94: signed and unsigned are the same, but most processors lack unsigned
                     95: conversions.
                     96:
                     97:
                     98:
                     99:    Here is a diagram showing 16x32 bit products for an `mpn_mul_1' or
                    100: `mpn_addmul_1' with a 64-bit limb.  The single limb operand V is split
                    101: into four 16-bit parts.  The multi-limb operand U is split in the loop
                    102: into two 32-bit parts.
                    103:
                    104:                      +---+---+---+---+
                    105:                      |v48|v32|v16|v00|    V operand
                    106:                      +---+---+---+---+
                    107:
                    108:                      +-------+---+---+
                    109:                  x   |  u32  |  u00  |    U operand (one limb)
                    110:                      +---------------+
                    111:
                    112:      ---------------------------------
                    113:
                    114:                          +-----------+
                    115:                          | u00 x v00 |    p00    48-bit products
                    116:                          +-----------+
                    117:                      +-----------+
                    118:                      | u00 x v16 |        p16
                    119:                      +-----------+
                    120:                  +-----------+
                    121:                  | u00 x v32 |            p32
                    122:                  +-----------+
                    123:              +-----------+
                    124:              | u00 x v48 |                p48
                    125:              +-----------+
                    126:                  +-----------+
                    127:                  | u32 x v00 |            r32
                    128:                  +-----------+
                    129:              +-----------+
                    130:              | u32 x v16 |                r48
                    131:              +-----------+
                    132:          +-----------+
                    133:          | u32 x v32 |                    r64
                    134:          +-----------+
                    135:      +-----------+
                    136:      | u32 x v48 |                        r80
                    137:      +-----------+
                    138:
                    139:    p32 and r32 can be summed using floating-point addition, and
                    140: likewise p48 and r48.  p00 and p16 can be summed with r64 and r80 from
                    141: the previous iteration.
                    142:
   For each loop then, four 49-bit quantities are transferred to the
                    144: integer unit, aligned as follows,
                    145:
                    146:      |-----64bits----|-----64bits----|
                    147:                         +------------+
                    148:                         | p00 + r64' |    i00
                    149:                         +------------+
                    150:                     +------------+
                    151:                     | p16 + r80' |        i16
                    152:                     +------------+
                    153:                 +------------+
                    154:                 | p32 + r32  |            i32
                    155:                 +------------+
                    156:             +------------+
                    157:             | p48 + r48  |                i48
                    158:             +------------+
                    159:
                    160:    The challenge then is to sum these efficiently and add in a carry
                    161: limb, generating a low 64-bit result limb and a high 33-bit carry limb
                    162: (i48 extends 33 bits into the high half).
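
   The scheme can be illustrated with a C sketch (not GMP source).  It
forms the eight 48-bit products in double precision exactly as in the
diagrams above, then reassembles them using `unsigned __int128'
arithmetic, a compiler extension assumed here purely for clarity:

     #include <stdint.h>

     /* u (64 bits) times v (64 bits) -> 128-bit product, via eight
        16x32->48 bit pieces computed exactly in IEEE doubles.  */
     void
     mul_via_doubles (uint64_t u, uint64_t v, uint64_t *hi, uint64_t *lo)
     {
       double u00 = (double) (u & 0xffffffff);
       double u32 = (double) (u >> 32);
       double v00 = (double) (v & 0xffff);
       double v16 = (double) ((v >> 16) & 0xffff);
       double v32 = (double) ((v >> 32) & 0xffff);
       double v48 = (double) (v >> 48);

       /* each product is below 2^48, hence exact in a double */
       uint64_t p00 = (uint64_t) (u00 * v00);
       uint64_t p16 = (uint64_t) (u00 * v16);
       uint64_t p32 = (uint64_t) (u00 * v32);
       uint64_t p48 = (uint64_t) (u00 * v48);
       uint64_t r32 = (uint64_t) (u32 * v00);
       uint64_t r48 = (uint64_t) (u32 * v16);
       uint64_t r64 = (uint64_t) (u32 * v32);
       uint64_t r80 = (uint64_t) (u32 * v48);

       /* reassemble at bit positions 0, 16, 32, 48, 64 and 80 */
       unsigned __int128 acc = p00;
       acc += (unsigned __int128) p16 << 16;
       acc += ((unsigned __int128) p32 + r32) << 32;
       acc += ((unsigned __int128) p48 + r48) << 48;
       acc += (unsigned __int128) r64 << 64;
       acc += (unsigned __int128) r80 << 80;

       *lo = (uint64_t) acc;
       *hi = (uint64_t) (acc >> 64);
     }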
                    163:
                    164: 
                    165: File: gmp.info,  Node: Assembler SIMD Instructions,  Next: Assembler Software Pipelining,  Prev: Assembler Floating Point,  Up: Assembler Coding
                    166:
                    167: SIMD Instructions
                    168: -----------------
                    169:
                    170:    The single-instruction multiple-data support in current
                    171: microprocessors is aimed at signal processing algorithms where each
                    172: data point can be treated more or less independently.  There's
                    173: generally not much support for propagating the sort of carries that
                    174: arise in GMP.
                    175:
   A SIMD operation doing say four 16x16 bit multiplies only does as
much work as one 32x32 multiply from GMP's point of view, and needs
some shifts and adds besides.  But of course if the SIMD form is fully
pipelined and uses less instruction decoding then it may still be
worthwhile.
                    180:
                    181:    On the 80x86 chips, MMX has so far found a use in `mpn_rshift' and
                    182: `mpn_lshift' since it allows 64-bit operations, and is used in a special
                    183: case for 16-bit multipliers in the P55 `mpn_mul_1'.  3DNow and SSE
                    184: haven't found a use so far.
                    185:
                    186: 
                    187: File: gmp.info,  Node: Assembler Software Pipelining,  Next: Assembler Loop Unrolling,  Prev: Assembler SIMD Instructions,  Up: Assembler Coding
                    188:
                    189: Software Pipelining
                    190: -------------------
                    191:
                    192:    Software pipelining consists of scheduling instructions around the
                    193: branch point in a loop.  For example a loop taking a checksum of an
                    194: array of limbs might have a load and an add, but the load wouldn't be
                    195: for that add, rather for the one next time around the loop.  Each load
                    196: then is effectively scheduled back in the previous iteration, allowing
                    197: latency to be hidden.
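
   In C the idea looks roughly as follows (GMP applies it at the
assembler level; `mp_limb_t' is just an assumed typedef here, and at
least one limb is required):

     #include <stddef.h>

     typedef unsigned long mp_limb_t;   /* assumed limb type */

     /* Checksum with the load pipelined one iteration ahead of the add.
        Requires n >= 1.  */
     mp_limb_t
     checksum (const mp_limb_t *up, size_t n)
     {
       mp_limb_t  sum = 0;
       mp_limb_t  cur = up[0];          /* prologue: first load         */
       size_t     i;
       for (i = 1; i < n; i++)
         {
           mp_limb_t next = up[i];      /* load for the *next* add      */
           sum += cur;                  /* add using the previous load  */
           cur = next;
         }
       sum += cur;                      /* epilogue: final add          */
       return sum;
     }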
                    198:
                    199:    Naturally this is wanted only when doing things like loads or
                    200: multiplies that take a few cycles to complete, and only where a CPU has
                    201: multiple functional units so that other work can be done while waiting.
                    202:
                    203:    A pipeline with several stages will have a data value in progress at
                    204: each stage and each loop iteration moves them along one stage.  This is
                    205: like juggling.
                    206:
                    207:    Within the loop some moves between registers may be necessary to
                    208: have the right values in the right places for each iteration.  Loop
                    209: unrolling can help this, with each unrolled block able to use different
                    210: registers for different values, even if some shuffling is still needed
                    211: just before going back to the top of the loop.
                    212:
                    213: 
                    214: File: gmp.info,  Node: Assembler Loop Unrolling,  Prev: Assembler Software Pipelining,  Up: Assembler Coding
                    215:
                    216: Loop Unrolling
                    217: --------------
                    218:
                    219:    Loop unrolling consists of replicating code so that several limbs are
                    220: processed in each loop.  At a minimum this reduces loop overheads by a
                    221: corresponding factor, but it can also allow better register usage, for
                    222: example alternately using one register combination and then another.
                    223: Judicious use of `m4' macros can help avoid lots of duplication in the
                    224: source code.
                    225:
                    226:    Unrolling is commonly done to a power of 2 multiple so the number of
                    227: unrolled loops and the number of remaining limbs can be calculated with
                    228: a shift and mask.  But other multiples can be used too, just by
                    229: subtracting each N limbs processed from a counter and waiting for less
                    230: than N remaining (or offsetting the counter by N so it goes negative
                    231: when there's less than N remaining).
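
   For instance with an unrolling factor of 8 the two counts are
obtained like this (a sketch, with `n' the total number of limbs):

     /* n is the total number of limbs to process */
     size_t  blocks = n >> 3;    /* full 8-limb unrolled passes */
     size_t  extra  = n & 7;     /* 0 to 7 leftover limbs       */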
                    232:
                    233:    The limbs not a multiple of the unrolling can be handled in various
                    234: ways, for example
                    235:
                    236:    * A simple loop at the end (or the start) to process the excess.
                    237:      Care will be wanted that it isn't too much slower than the
                    238:      unrolled part.
                    239:
                    240:    * A set of binary tests, for example after an 8-limb unrolling, test
                    241:      for 4 more limbs to process, then a further 2 more or not, and
                    242:      finally 1 more or not.  This will probably take more code space
                    243:      than a simple loop.
                    244:
                    245:    * A `switch' statement, providing separate code for each possible
                    246:      excess, for example an 8-limb unrolling would have separate code
                    247:      for 0 remaining, 1 remaining, etc, up to 7 remaining.  This might
                    248:      take a lot of code, but may be the best way to optimize all cases
                    249:      in combination with a deep pipelined loop.
                    250:
                    251:    * A computed jump into the middle of the loop, thus making the first
                    252:      iteration handle the excess.  This should make times smoothly
                    253:      increase with size, which is attractive, but setups for the jump
                    254:      and adjustments for pointers can be tricky and could become quite
                    255:      difficult in combination with deep pipelining.
                    256:
                    257:    One way to write the setups and finishups for a pipelined unrolled
                    258: loop is simply to duplicate the loop at the start and the end, then
                    259: delete instructions at the start which have no valid antecedents, and
                    260: delete instructions at the end whose results are unwanted.  Sizes not a
                    261: multiple of the unrolling can then be handled as desired.
                    262:
                    263: 
                    264: File: gmp.info,  Node: Internals,  Next: Contributors,  Prev: Algorithms,  Up: Top
                    265:
                    266: Internals
                    267: *********
                    268:
                    269:    *This chapter is provided only for informational purposes and the
                    270: various internals described here may change in future GMP releases.
                    271: Applications expecting to be compatible with future releases should use
                    272: only the documented interfaces described in previous chapters.*
                    273:
                    274: * Menu:
                    275:
                    276: * Integer Internals::
                    277: * Rational Internals::
                    278: * Float Internals::
                    279: * Raw Output Internals::
                    280: * C++ Interface Internals::
                    281:
                    282: 
                    283: File: gmp.info,  Node: Integer Internals,  Next: Rational Internals,  Prev: Internals,  Up: Internals
                    284:
                    285: Integer Internals
                    286: =================
                    287:
                    288:    `mpz_t' variables represent integers using sign and magnitude, in
                    289: space dynamically allocated and reallocated.  The fields are as follows.
                    290:
                    291: `_mp_size'
                    292:      The number of limbs, or the negative of that when representing a
                    293:      negative integer.  Zero is represented by `_mp_size' set to zero,
                    294:      in which case the `_mp_d' data is unused.
                    295:
                    296: `_mp_d'
                    297:      A pointer to an array of limbs which is the magnitude.  These are
                    298:      stored "little endian" as per the `mpn' functions, so `_mp_d[0]'
                    299:      is the least significant limb and `_mp_d[ABS(_mp_size)-1]' is the
                    300:      most significant.  Whenever `_mp_size' is non-zero, the most
                    301:      significant limb is non-zero.
                    302:
                    303:      Currently there's always at least one limb allocated, so for
                    304:      instance `mpz_set_ui' never needs to reallocate, and `mpz_get_ui'
                    305:      can fetch `_mp_d[0]' unconditionally (though its value is then
                    306:      only wanted if `_mp_size' is non-zero).
                    307:
                    308: `_mp_alloc'
                    309:      `_mp_alloc' is the number of limbs currently allocated at `_mp_d',
                    310:      and naturally `_mp_alloc >= ABS(_mp_size)'.  When an `mpz' routine
                    311:      is about to (or might be about to) increase `_mp_size', it checks
                    312:      `_mp_alloc' to see whether there's enough space, and reallocates
                    313:      if not.  `MPZ_REALLOC' is generally used for this.
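
   Putting the fields together, the layout corresponds to a declaration
along the following lines (a simplified sketch of what's in `gmp.h'; the
exact field types can vary between GMP versions and ABIs):

     typedef struct
     {
       int         _mp_alloc;   /* limbs allocated at _mp_d             */
       int         _mp_size;    /* limbs in use, negated for negatives  */
       mp_limb_t  *_mp_d;       /* limb array, least significant first  */
     } __mpz_struct;

     typedef __mpz_struct mpz_t[1];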
                    314:
                    315:    The various bitwise logical functions like `mpz_and' behave as if
                    316: negative values were twos complement.  But sign and magnitude is always
                    317: used internally, and necessary adjustments are made during the
                    318: calculations.  Sometimes this isn't pretty, but sign and magnitude are
                    319: best for other routines.
                    320:
   Some internal temporary variables are set up with `MPZ_TMP_INIT' and
                    322: these have `_mp_d' space obtained from `TMP_ALLOC' rather than the
                    323: memory allocation functions.  Care is taken to ensure that these are
                    324: big enough that no reallocation is necessary (since it would have
                    325: unpredictable consequences).
                    326:
                    327: 
                    328: File: gmp.info,  Node: Rational Internals,  Next: Float Internals,  Prev: Integer Internals,  Up: Internals
                    329:
                    330: Rational Internals
                    331: ==================
                    332:
                    333:    `mpq_t' variables represent rationals using an `mpz_t' numerator and
                    334: denominator (*note Integer Internals::).
                    335:
                    336:    The canonical form adopted is denominator positive (and non-zero),
                    337: no common factors between numerator and denominator, and zero uniquely
                    338: represented as 0/1.
                    339:
                    340:    It's believed that casting out common factors at each stage of a
                    341: calculation is best in general.  A GCD is an O(N^2) operation so it's
                    342: better to do a few small ones immediately than to delay and have to do
                    343: a big one later.  Knowing the numerator and denominator have no common
                    344: factors can be used for example in `mpq_mul' to make only two cross
                    345: GCDs necessary, not four.
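
   For example, (a/b)*(c/d) needs only the cross GCDs gcd(a,d) and
gcd(c,b).  The following sketch uses public `mpz' functions (it is not
the actual `mpq_mul' code); inputs are assumed canonical and the result
operands are assumed not to alias the inputs:

     #include <gmp.h>

     /* rn/rd = (a/b) * (c/d) in canonical form */
     void
     rational_mul (mpz_t rn, mpz_t rd,
                   const mpz_t a, const mpz_t b,
                   const mpz_t c, const mpz_t d)
     {
       mpz_t  g1, g2, t;
       mpz_init (g1);
       mpz_init (g2);
       mpz_init (t);

       mpz_gcd (g1, a, d);          /* cross GCD 1 */
       mpz_gcd (g2, c, b);          /* cross GCD 2 */

       mpz_divexact (rn, a, g1);
       mpz_divexact (t, c, g2);
       mpz_mul (rn, rn, t);         /* numerator = (a/g1)*(c/g2)   */

       mpz_divexact (rd, b, g2);
       mpz_divexact (t, d, g1);
       mpz_mul (rd, rd, t);         /* denominator = (b/g2)*(d/g1) */

       mpz_clear (g1);
       mpz_clear (g2);
       mpz_clear (t);
     }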
                    346:
                    347:    This general approach to common factors is badly sub-optimal in the
                    348: presence of simple factorizations or little prospect for cancellation,
                    349: but GMP has no way to know when this will occur.  As per *Note
                    350: Efficiency::, that's left to applications.  The `mpq_t' framework might
                    351: still suit, with `mpq_numref' and `mpq_denref' for direct access to the
                    352: numerator and denominator, or of course `mpz_t' variables can be used
                    353: directly.
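
   As an example of such direct access, the parts of an `mpq_t' can be
operated on with `mpz' functions, re-establishing canonical form
afterwards:

     #include <gmp.h>

     /* q = q * 3/2, working on the numerator and denominator directly */
     void
     scale_by_three_halves (mpq_t q)
     {
       mpz_mul_ui (mpq_numref (q), mpq_numref (q), 3);
       mpz_mul_ui (mpq_denref (q), mpq_denref (q), 2);
       mpq_canonicalize (q);    /* restore gcd(num,den)=1 and den>0 */
     }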
                    354:
                    355: 
                    356: File: gmp.info,  Node: Float Internals,  Next: Raw Output Internals,  Prev: Rational Internals,  Up: Internals
                    357:
                    358: Float Internals
                    359: ===============
                    360:
                    361:    Efficient calculation is the primary aim of GMP floats and the use
                    362: of whole limbs and simple rounding facilitates this.
                    363:
                    364:    `mpf_t' floats have a variable precision mantissa and a single
                    365: machine word signed exponent.  The mantissa is represented using sign
                    366: and magnitude.
                    367:
                    368:         most                   least
                    369:      significant            significant
                    370:         limb                   limb
                    371:
                    372:                                  _mp_d
                    373:       |---- _mp_exp --->           |
                    374:        _____ _____ _____ _____ _____
                    375:       |_____|_____|_____|_____|_____|
                    376:                         . <------------ radix point
                    377:
                    378:        <-------- _mp_size --------->
                    379:
                    380: The fields are as follows.
                    381:
                    382: `_mp_size'
                    383:      The number of limbs currently in use, or the negative of that when
                    384:      representing a negative value.  Zero is represented by `_mp_size'
                    385:      and `_mp_exp' both set to zero, and in that case the `_mp_d' data
                    386:      is unused.  (In the future `_mp_exp' might be undefined when
                    387:      representing zero.)
                    388:
                    389: `_mp_prec'
                    390:      The precision of the mantissa, in limbs.  In any calculation the
                    391:      aim is to produce `_mp_prec' limbs of result (the most significant
                    392:      being non-zero).
                    393:
                    394: `_mp_d'
                    395:      A pointer to the array of limbs which is the absolute value of the
                    396:      mantissa.  These are stored "little endian" as per the `mpn'
                    397:      functions, so `_mp_d[0]' is the least significant limb and
                    398:      `_mp_d[ABS(_mp_size)-1]' the most significant.
                    399:
                    400:      The most significant limb is always non-zero, but there are no
                    401:      other restrictions on its value, in particular the highest 1 bit
                    402:      can be anywhere within the limb.
                    403:
                    404:      `_mp_prec+1' limbs are allocated to `_mp_d', the extra limb being
                    405:      for convenience (see below).  There are no reallocations during a
                    406:      calculation, only in a change of precision with `mpf_set_prec'.
                    407:
                    408: `_mp_exp'
                    409:      The exponent, in limbs, determining the location of the implied
                    410:      radix point.  Zero means the radix point is just above the most
                    411:      significant limb.  Positive values mean a radix point offset
                    412:      towards the lower limbs and hence a value >= 1, as for example in
                    413:      the diagram above.  Negative exponents mean a radix point further
                    414:      above the highest limb.
                    415:
     Naturally the exponent can be any value; it doesn't have to fall
     within the limbs as the diagram shows, and it can be a long way
     above or a long way below.  Limbs other than those included in the
                    419:      `{_mp_d,_mp_size}' data are treated as zero.
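
   These fields correspond to a declaration along the following lines
(again a simplified sketch of `gmp.h', with the exact types varying
between versions):

     typedef struct
     {
       int         _mp_prec;   /* precision, in limbs                  */
       int         _mp_size;   /* limbs in use, negated for negatives  */
       mp_exp_t    _mp_exp;    /* exponent, in limbs                   */
       mp_limb_t  *_mp_d;      /* limb array, least significant first  */
     } __mpf_struct;

     typedef __mpf_struct mpf_t[1];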
                    420:
                    421:
The following points should be noted.
                    423:
                    424: Low Zeros
                    425:      The least significant limbs `_mp_d[0]' etc can be zero, though
                    426:      such low zeros can always be ignored.  Routines likely to produce
                    427:      low zeros check and avoid them to save time in subsequent
                    428:      calculations, but for most routines they're quite unlikely and
                    429:      aren't checked.
                    430:
                    431: Mantissa Size Range
                    432:      The `_mp_size' count of limbs in use can be less than `_mp_prec' if
                    433:      the value can be represented in less.  This means low precision
                    434:      values or small integers stored in a high precision `mpf_t' can
                    435:      still be operated on efficiently.
                    436:
                    437:      `_mp_size' can also be greater than `_mp_prec'.  Firstly a value is
                    438:      allowed to use all of the `_mp_prec+1' limbs available at `_mp_d',
                    439:      and secondly when `mpf_set_prec_raw' lowers `_mp_prec' it leaves
                    440:      `_mp_size' unchanged and so the size can be arbitrarily bigger than
                    441:      `_mp_prec'.
                    442:
                    443: Rounding
                    444:      All rounding is done on limb boundaries.  Calculating `_mp_prec'
     limbs with the high limb non-zero ensures that the minimum
     precision requested by the application is obtained.
                    447:
                    448:      The use of simple "trunc" rounding towards zero is efficient,
                    449:      since there's no need to examine extra limbs and increment or
                    450:      decrement.
                    451:
                    452: Bit Shifts
                    453:      Since the exponent is in limbs, there are no bit shifts in basic
                    454:      operations like `mpf_add' and `mpf_mul'.  When differing exponents
                    455:      are encountered all that's needed is to adjust pointers to line up
                    456:      the relevant limbs.
                    457:
                    458:      Of course `mpf_mul_2exp' and `mpf_div_2exp' will require bit
                    459:      shifts, but the choice is between an exponent in limbs which
                    460:      requires shifts there, or one in bits which requires them almost
                    461:      everywhere else.
                    462:
                    463: Use of `_mp_prec+1' Limbs
                    464:      The extra limb on `_mp_d' (`_mp_prec+1' rather than just
                    465:      `_mp_prec') helps when an `mpf' routine might get a carry from its
                    466:      operation.  `mpf_add' for instance will do an `mpn_add' of
                    467:      `_mp_prec' limbs.  If there's no carry then that's the result, but
                    468:      if there is a carry then it's stored in the extra limb of space and
                    469:      `_mp_size' becomes `_mp_prec+1'.
                    470:
                    471:      Whenever `_mp_prec+1' limbs are held in a variable, the low limb
                    472:      is not needed for the intended precision, only the `_mp_prec' high
                    473:      limbs.  But zeroing it out or moving the rest down is unnecessary.
                    474:      Subsequent routines reading the value will simply take the high
                    475:      limbs they need, and this will be `_mp_prec' if their target has
                    476:      that same precision.  This is no more than a pointer adjustment,
                    477:      and must be checked anyway since the destination precision can be
     different from the source's.
                    479:
                    480:      Copy functions like `mpf_set' will retain a full `_mp_prec+1' limbs
                    481:      if available.  This ensures that a variable which has `_mp_size'
                    482:      equal to `_mp_prec+1' will get its full exact value copied.
                    483:      Strictly speaking this is unnecessary since only `_mp_prec' limbs
                    484:      are needed for the application's requested precision, but it's
                    485:      considered that an `mpf_set' from one variable into another of the
                    486:      same precision ought to produce an exact copy.
                    487:
                    488: Application Precisions
                    489:      `__GMPF_BITS_TO_PREC' converts an application requested precision
                    490:      to an `_mp_prec'.  The value in bits is rounded up to a whole limb
                    491:      then an extra limb is added since the most significant limb of
     `_mp_d' is only required to be non-zero, and so might contain as
     little as one bit.
                    493:
                    494:      `__GMPF_PREC_TO_BITS' does the reverse conversion, and removes the
                    495:      extra limb from `_mp_prec' before converting to bits.  The net
                    496:      effect of reading back with `mpf_get_prec' is simply the precision
                    497:      rounded up to a multiple of `mp_bits_per_limb'.
                    498:
     Note that the extra limb added here, because the high limb might
     hold only one bit, is in addition to the extra limb allocated to
     `_mp_d'.  For example with a 32-bit limb, an application request
     for 250 bits will be rounded up to 8 limbs, then an extra limb
     added because the high might hold only one bit, giving an
     `_mp_prec' of 9.  `_mp_d' then
                    504:      gets 10 limbs allocated.  Reading back with `mpf_get_prec' will
                    505:      take `_mp_prec' subtract 1 limb and multiply by 32, giving 256
                    506:      bits.
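
     In effect the two conversions behave roughly as follows (a sketch,
     not the actual `gmp.h' macros; B stands for the bits per limb, the
     documented `mp_bits_per_limb'):

          #define SKETCH_BITS_TO_PREC(bits)  (((bits) + B - 1) / B + 1)
          #define SKETCH_PREC_TO_BITS(prec)  (((prec) - 1) * B)

          /* With B = 32 and bits = 250: 250 rounds up to 8 limbs, plus 1
             gives _mp_prec = 9; reading back gives (9-1)*32 = 256 bits.  */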
                    507:
                    508:      Strictly speaking, the fact the high limb has at least one bit
                    509:      means that a float with, say, 3 limbs of 32-bits each will be
                    510:      holding at least 65 bits, but for the purposes of `mpf_t' it's
                    511:      considered simply to be 64 bits, a nice multiple of the limb size.
                    512:
                    513: 
                    514: File: gmp.info,  Node: Raw Output Internals,  Next: C++ Interface Internals,  Prev: Float Internals,  Up: Internals
                    515:
                    516: Raw Output Internals
                    517: ====================
                    518:
                    519: `mpz_out_raw' uses the following format.
                    520:
                    521:      +------+------------------------+
                    522:      | size |       data bytes       |
                    523:      +------+------------------------+
                    524:
                    525:    The size is 4 bytes written most significant byte first, being the
                    526: number of subsequent data bytes, or the twos complement negative of
                    527: that when a negative integer is represented.  The data bytes are the
                    528: absolute value of the integer, written most significant byte first.
                    529:
                    530:    The most significant data byte is always non-zero, so the output is
                    531: the same on all systems, irrespective of limb size.
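
   A sketch of emitting that size field (illustration only, not the GMP
source; the magnitude bytes would follow it):

     #include <stdio.h>
     #include <stdint.h>

     /* Write the 4-byte size field, most significant byte first, negated
        (twos complement) for a negative integer of nbytes data bytes.  */
     static void
     write_size (FILE *fp, uint32_t nbytes, int negative)
     {
       uint32_t       size = negative ? - nbytes : nbytes;
       unsigned char  buf[4];
       buf[0] = size >> 24;
       buf[1] = size >> 16;
       buf[2] = size >> 8;
       buf[3] = size;
       fwrite (buf, 1, 4, fp);
     }

     /* For example, mpz_out_raw of 65537 (0x10001) writes the bytes
        00 00 00 03  01 00 01.  */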
                    532:
                    533:    In GMP 1, leading zero bytes were written to pad the data bytes to a
                    534: multiple of the limb size.  `mpz_inp_raw' will still accept this, for
                    535: compatibility.
                    536:
                    537:    The use of "big endian" for both the size and data fields is
                    538: deliberate, it makes the data easy to read in a hex dump of a file.
                    539: Unfortunately it also means that the limb data must be reversed when
                    540: reading or writing, so neither a big endian nor little endian system
                    541: can just read and write `_mp_d'.
                    542:
                    543: 
                    544: File: gmp.info,  Node: C++ Interface Internals,  Prev: Raw Output Internals,  Up: Internals
                    545:
                    546: C++ Interface Internals
                    547: =======================
                    548:
                    549:    A system of expression templates is used to ensure something like
                    550: `a=b+c' turns into a simple call to `mpz_add' etc.  For `mpf_class' and
                    551: `mpfr_class' the scheme also ensures the precision of the final
                    552: destination is used for any temporaries within a statement like
                    553: `f=w*x+y*z'.  These are important features which a naive implementation
                    554: cannot provide.
                    555:
                    556:    A simplified description of the scheme follows.  The true scheme is
                    557: complicated by the fact that expressions have different return types.
                    558: For detailed information, refer to the source code.
                    559:
                    560:    To perform an operation, say, addition, we first define a "function
                    561: object" evaluating it,
                    562:
                    563:      struct __gmp_binary_plus
                    564:      {
                    565:        static void eval(mpf_t f, mpf_t g, mpf_t h) { mpf_add(f, g, h); }
                    566:      };
                    567:
                    568: And an "additive expression" object,
                    569:
                    570:      __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
                    571:      operator+(const mpf_class &f, const mpf_class &g)
                    572:      {
                    573:        return __gmp_expr
                    574:          <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
                    575:      }
                    576:
                    577:    The seemingly redundant `__gmp_expr<__gmp_binary_expr<...>>' is used
                    578: to encapsulate any possible kind of expression into a single template
                    579: type.  In fact even `mpf_class' etc are `typedef' specializations of
                    580: `__gmp_expr'.
                    581:
                    582:    Next we define assignment of `__gmp_expr' to `mpf_class'.
                    583:
                    584:      template <class T>
                    585:      mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
                    586:      {
                    587:        expr.eval(this->get_mpf_t(), this->precision());
                    588:        return *this;
                    589:      }
                    590:
                    591:      template <class Op>
                    592:      void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
                    593:      (mpf_t f, unsigned long int precision)
                    594:      {
                    595:        Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
                    596:      }
                    597:
                    598:    where `expr.val1' and `expr.val2' are references to the expression's
                    599: operands (here `expr' is the `__gmp_binary_expr' stored within the
                    600: `__gmp_expr').
                    601:
                    602:    This way, the expression is actually evaluated only at the time of
                    603: assignment, when the required precision (that of `f') is known.
                    604: Furthermore the target `mpf_t' is now available, thus we can call
                    605: `mpf_add' directly with `f' as the output argument.
                    606:
                    607:    Compound expressions are handled by defining operators taking
                    608: subexpressions as their arguments, like this:
                    609:
                    610:      template <class T, class U>
                    611:      __gmp_expr
                    612:      <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
                    613:      operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
                    614:      {
                    615:        return __gmp_expr
                    616:          <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
                    617:          (expr1, expr2);
                    618:      }
                    619:
                    620:    And the corresponding specializations of `__gmp_expr::eval':
                    621:
                    622:      template <class T, class U, class Op>
                    623:      void __gmp_expr
                    624:      <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
                    625:      (mpf_t f, unsigned long int precision)
                    626:      {
                    627:        // declare two temporaries
                    628:        mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
                    629:        Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
                    630:      }
                    631:
                    632:    The expression is thus recursively evaluated to any level of
                    633: complexity and all subexpressions are evaluated to the precision of `f'.
                    634:
                    635: 
                    636: File: gmp.info,  Node: Contributors,  Next: References,  Prev: Internals,  Up: Top
                    637:
                    638: Contributors
                    639: ************
                    640:
                    641:    Torbjorn Granlund wrote the original GMP library and is still
                    642: developing and maintaining it.  Several other individuals and
                    643: organizations have contributed to GMP in various ways.  Here is a list
                    644: in chronological order:
                    645:
                    646:    Gunnar Sjoedin and Hans Riesel helped with mathematical problems in
                    647: early versions of the library.
                    648:
                    649:    Richard Stallman contributed to the interface design and revised the
                    650: first version of this manual.
                    651:
                    652:    Brian Beuning and Doug Lea helped with testing of early versions of
                    653: the library and made creative suggestions.
                    654:
                    655:    John Amanatides of York University in Canada contributed the function
                    656: `mpz_probab_prime_p'.
                    657:
                    658:    Paul Zimmermann of Inria sparked the development of GMP 2, with his
                    659: comparisons between bignum packages.
                    660:
                    661:    Ken Weber (Kent State University, Universidade Federal do Rio Grande
                    662: do Sul) contributed `mpz_gcd', `mpz_divexact', `mpn_gcd', and
                    663: `mpn_bdivmod', partially supported by CNPq (Brazil) grant 301314194-2.
                    664:
                    665:    Per Bothner of Cygnus Support helped to set up GMP to use Cygnus'
                    666: configure.  He has also made valuable suggestions and tested numerous
                    667: intermediary releases.
                    668:
                    669:    Joachim Hollman was involved in the design of the `mpf' interface,
                    670: and in the `mpz' design revisions for version 2.
                    671:
                    672:    Bennet Yee contributed the initial versions of `mpz_jacobi' and
                    673: `mpz_legendre'.
                    674:
                    675:    Andreas Schwab contributed the files `mpn/m68k/lshift.S' and
                    676: `mpn/m68k/rshift.S' (now in `.asm' form).
                    677:
   The development of the floating point functions of GNU MP 2 was
supported in part by the ESPRIT-BRA (Basic Research Activities) 6846
                    680: project POSSO (POlynomial System SOlving).
                    681:
                    682:    GNU MP 2 was finished and released by SWOX AB, SWEDEN, in
                    683: cooperation with the IDA Center for Computing Sciences, USA.
                    684:
                    685:    Robert Harley of Inria, France and David Seal of ARM, England,
                    686: suggested clever improvements for population count.
                    687:
                    688:    Robert Harley also wrote highly optimized Karatsuba and 3-way Toom
                    689: multiplication functions for GMP 3.  He also contributed the ARM
                    690: assembly code.
                    691:
                    692:    Torsten Ekedahl of the Mathematical department of Stockholm
                    693: University provided significant inspiration during several phases of
                    694: the GMP development.  His mathematical expertise helped improve several
                    695: algorithms.
                    696:
                    697:    Paul Zimmermann wrote the Divide and Conquer division code, the REDC
                    698: code, the REDC-based mpz_powm code, the FFT multiply code, and the
                    699: Karatsuba square root.  The ECMNET project Paul is organizing was a
                    700: driving force behind many of the optimizations in GMP 3.
                    701:
                    702:    Linus Nordberg wrote the new configure system based on autoconf and
                    703: implemented the new random functions.
                    704:
                    705:    Kent Boortz made the Macintosh port.
                    706:
                    707:    Kevin Ryde worked on a number of things: optimized x86 code, m4 asm
                    708: macros, parameter tuning, speed measuring, the configure system,
                    709: function inlining, divisibility tests, bit scanning, Jacobi symbols,
                    710: Fibonacci and Lucas number functions, printf and scanf functions, perl
                    711: interface, demo expression parser, the algorithms chapter in the
                    712: manual, `gmpasm-mode.el', and various miscellaneous improvements
                    713: elsewhere.
                    714:
                    715:    Steve Root helped write the optimized alpha 21264 assembly code.
                    716:
                    717:    Gerardo Ballabio wrote the `gmpxx.h' C++ class interface and the C++
                    718: `istream' input routines.
                    719:
                    720:    GNU MP 4.0 was finished and released by Torbjorn Granlund and Kevin
                    721: Ryde.  Torbjorn's work was partially funded by the IDA Center for
                    722: Computing Sciences, USA.
                    723:
   (This list is chronological, not ordered by significance.  If you
                    725: have contributed to GMP but are not listed above, please tell
                    726: <tege@swox.com> about the omission!)
                    727:
   Thanks go to Hans Thorsen for donating an SGI system for the GMP
                    729: test system environment.
                    730:
                    731: 
                    732: File: gmp.info,  Node: References,  Next: GNU Free Documentation License,  Prev: Contributors,  Up: Top
                    733:
                    734: References
                    735: **********
                    736:
                    737: Books
                    738: =====
                    739:
                    740:    * Jonathan M. Borwein and Peter B. Borwein, "Pi and the AGM: A Study
                    741:      in Analytic Number Theory and Computational Complexity", Wiley,
                    742:      John & Sons, 1998.
                    743:
                    744:    * Henri Cohen, "A Course in Computational Algebraic Number Theory",
                    745:      Graduate Texts in Mathematics number 138, Springer-Verlag, 1993.
                    746:      `http://www.math.u-bordeaux.fr/~cohen'
                    747:
                    748:    * Donald E. Knuth, "The Art of Computer Programming", volume 2,
                    749:      "Seminumerical Algorithms", 3rd edition, Addison-Wesley, 1998.
                    750:      `http://www-cs-faculty.stanford.edu/~knuth/taocp.html'
                    751:
                    752:    * John D. Lipson, "Elements of Algebra and Algebraic Computing", The
                    753:      Benjamin Cummings Publishing Company Inc, 1981.
                    754:
                    755:    * Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone,
                    756:      "Handbook of Applied Cryptography",
                    757:      `http://www.cacr.math.uwaterloo.ca/hac/'
                    758:
                    759:    * Richard M. Stallman, "Using and Porting GCC", Free Software
                    760:      Foundation, 1999, available online
                    761:      `http://www.gnu.org/software/gcc/onlinedocs/', and in the GCC
                    762:      package `ftp://ftp.gnu.org/gnu/gcc/'
                    763:
                    764: Papers
                    765: ======
                    766:
                    767:    * Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division",
                    768:      Max-Planck-Institut fuer Informatik Research Report
                    769:      MPI-I-98-1-022,
                    770:      `http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022'
                    771:
                    772:    * Torbjorn Granlund and Peter L. Montgomery, "Division by Invariant
                    773:      Integers using Multiplication", in Proceedings of the SIGPLAN
                    774:      PLDI'94 Conference, June 1994.  Also available
                    775:      `ftp://ftp.cwi.nl/pub/pmontgom/divcnst.psa4.gz' (and .psl.gz).
                    776:
                    777:    * Peter L. Montgomery, "Modular Multiplication Without Trial
                    778:      Division", in Mathematics of Computation, volume 44, number 170,
                    779:      April 1985.
                    780:
                    781:    * Tudor Jebelean, "An algorithm for exact division", Journal of
                    782:      Symbolic Computation, volume 15, 1993, pp. 169-180.  Research
                    783:      report version available
                    784:      `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz'
                    785:
                    786:    * Tudor Jebelean, "Exact Division with Karatsuba Complexity -
                    787:      Extended Abstract", RISC-Linz technical report 96-31,
                    788:      `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz'
                    789:
                    790:    * Tudor Jebelean, "Practical Integer Division with Karatsuba
                    791:      Complexity", ISSAC 97, pp. 339-341.  Technical report available
                    792:      `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz'
                    793:
                    794:    * Tudor Jebelean, "A Generalization of the Binary GCD Algorithm",
                    795:      ISSAC 93, pp. 111-116.  Technical report version available
                    796:      `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz'
                    797:
                    798:    * Tudor Jebelean, "A Double-Digit Lehmer-Euclid Algorithm for
                    799:      Finding the GCD of Long Integers", Journal of Symbolic
                    800:      Computation, volume 19, 1995, pp. 145-157.  Technical report
                    801:      version also available
                    802:      `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz'
                    803:
                    804:    * Werner Krandick and Tudor Jebelean, "Bidirectional Exact Integer
                    805:      Division", Journal of Symbolic Computation, volume 21, 1996, pp.
                    806:      441-455.  Early technical report version also available
                    807:      `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz'
                    808:
                    809:    * R. Moenck and A. Borodin, "Fast Modular Transforms via Division",
                    810:      Proceedings of the 13th Annual IEEE Symposium on Switching and
                    811:      Automata Theory, October 1972, pp. 90-96.  Reprinted as "Fast
                    812:      Modular Transforms", Journal of Computer and System Sciences,
                    813:      volume 8, number 3, June 1974, pp. 366-386.
                    814:
                    815:    * Arnold Scho"nhage and Volker Strassen, "Schnelle Multiplikation
                    816:      grosser Zahlen", Computing 7, 1971, pp. 281-292.
                    817:
                    818:    * Kenneth Weber, "The accelerated integer GCD algorithm", ACM
                    819:      Transactions on Mathematical Software, volume 21, number 1, March
                    820:      1995, pp. 111-122.
                    821:
                    822:    * Paul Zimmermann, "Karatsuba Square Root", INRIA Research Report
                    823:      3805, November 1999, `http://www.inria.fr/RRRT/RR-3805.html'
                    824:
                    825:    * Paul Zimmermann, "A Proof of GMP Fast Division and Square Root
                    826:      Implementations",
                    827:      `http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz'
                    828:
                    829:    * Dan Zuras, "On Squaring and Multiplying Large Integers", ARITH-11:
                    830:      IEEE Symposium on Computer Arithmetic, 1993, pp. 260 to 271.
                    831:      Reprinted as "More on Multiplying and Squaring Large Integers",
                    832:      IEEE Transactions on Computers, volume 43, number 8, August 1994,
                    833:      pp. 899-908.
                    834:
