Annotation of OpenXM_contrib/gmp/gmp.info-7, Revision 1.1
1.1 ! ohara 1: This is gmp.info, produced by makeinfo version 4.2 from gmp.texi.
! 2:
! 3: This manual describes how to install and use the GNU multiple precision
! 4: arithmetic library, version 4.1.2.
! 5:
! 6: Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
! 7: 2001, 2002 Free Software Foundation, Inc.
! 8:
! 9: Permission is granted to copy, distribute and/or modify this
! 10: document under the terms of the GNU Free Documentation License, Version
! 11: 1.1 or any later version published by the Free Software Foundation;
! 12: with no Invariant Sections, with the Front-Cover Texts being "A GNU
! 13: Manual", and with the Back-Cover Texts being "You have freedom to copy
! 14: and modify this GNU Manual, like GNU software". A copy of the license
! 15: is included in *Note GNU Free Documentation License::.
! 16: INFO-DIR-SECTION GNU libraries
! 17: START-INFO-DIR-ENTRY
! 18: * gmp: (gmp). GNU Multiple Precision Arithmetic Library.
! 19: END-INFO-DIR-ENTRY
! 20:
! 21:
! 22: File: gmp.info, Node: Assembler Cache Handling, Next: Assembler Floating Point, Prev: Assembler Carry Propagation, Up: Assembler Coding
! 23:
! 24: Cache Handling
! 25: --------------
! 26:
! 27: GMP aims to perform well both on operands that fit entirely in L1
! 28: cache and those which don't.
! 29:
! 30: Basic routines like `mpn_add_n' or `mpn_lshift' are often used on
! 31: large operands, so L2 and main memory performance is important for them.
! 32: `mpn_mul_1' and `mpn_addmul_1' are mostly used for multiply and square
! 33: basecases, so L1 performance matters most for them, unless assembler
! 34: versions of `mpn_mul_basecase' and `mpn_sqr_basecase' exist, in which
! 35: case the remaining uses are mostly for larger operands.
! 36:
! 37: For L2 or main memory operands, memory access times will almost
! 38: certainly be more than the calculation time. The aim therefore is to
! 39: maximize memory throughput, by starting a load of the next cache line
! 40: while processing the contents of the previous one.  Clearly this is
! 41: only possible if the chip has a lock-up free cache or some sort of
! 42: prefetch instruction. Most current chips have both these features.
! 43:
! 44: Prefetching sources combines well with loop unrolling, since a
! 45: prefetch can be initiated once per unrolled loop (or more than once if
! 46: the loop covers more than one cache line).
! 47:
! 48: On CPUs without write-allocate caches, prefetching destinations will
! 49: ensure individual stores don't go further down the cache hierarchy,
! 50: which would limit bandwidth.  Of course for calculations which are
! 51: slow anyway, like `mpn_divrem_1', write-throughs might be fine.
! 52:
! 53: The distance ahead to prefetch will be determined by memory latency
! 54: versus throughput. The aim of course is to have data arriving
! 55: continuously, at peak throughput. Some CPUs have limits on the number
! 56: of fetches or prefetches in progress.
! 57:
! 58: If a special prefetch instruction doesn't exist then a plain load
! 59: can be used, but in that case care must be taken not to attempt to read
! 60: past the end of an operand, since that might produce a segmentation
! 61: violation.
! 62:
! 63: Some CPUs or systems have hardware that detects sequential memory
! 64: accesses and initiates suitable cache movements automatically, making
! 65: life easy.
! 66:
! 67:
! 68: File: gmp.info, Node: Assembler Floating Point, Next: Assembler SIMD Instructions, Prev: Assembler Cache Handling, Up: Assembler Coding
! 69:
! 70: Floating Point
! 71: --------------
! 72:
! 73: Floating point arithmetic is used in GMP for multiplications on CPUs
! 74: with poor integer multipliers. It's mostly useful for `mpn_mul_1',
! 75: `mpn_addmul_1' and `mpn_submul_1' on 64-bit machines, and
! 76: `mpn_mul_basecase' on both 32-bit and 64-bit machines.
! 77:
! 78: With IEEE 53-bit double precision floats, integer multiplications
! 79: producing up to 53 bits will give exact results. Breaking a 64x64
! 80: multiplication into eight 16x32->48 bit pieces is convenient. With
! 81: some care though six 21x32->53 bit products can be used, if one of the
! 82: lower two 21-bit pieces also uses the sign bit.
! 83:
! 84: For the `mpn_mul_1' family of functions on a 64-bit machine, the
! 85: invariant single limb is split at the start, into 3 or 4 pieces.
! 86: Inside the loop, the bignum operand is split into 32-bit pieces. Fast
! 87: conversion of these unsigned 32-bit pieces to floating point is highly
! 88: machine-dependent. In some cases, reading the data into the integer
! 89: unit, zero-extending to 64 bits, then transferring to the floating
! 90: point unit via memory is the only option.
! 91:
! 92: Converting partial products back to 64-bit limbs is usually best
! 93: done as a signed conversion. Since all values are smaller than 2^53,
! 94: signed and unsigned are the same, but most processors lack unsigned
! 95: conversions.
! 96:
! 97:
! 99: Here is a diagram showing 16x32 bit products for an `mpn_mul_1' or
! 100: `mpn_addmul_1' with a 64-bit limb. The single limb operand V is split
! 101: into four 16-bit parts. The multi-limb operand U is split in the loop
! 102: into two 32-bit parts.
! 103:
! 104: +---+---+---+---+
! 105: |v48|v32|v16|v00| V operand
! 106: +---+---+---+---+
! 107:
! 108: +-------+---+---+
! 109: x | u32 | u00 | U operand (one limb)
! 110: +---------------+
! 111:
! 112: ---------------------------------
! 113:
! 114: +-----------+
! 115: | u00 x v00 | p00 48-bit products
! 116: +-----------+
! 117: +-----------+
! 118: | u00 x v16 | p16
! 119: +-----------+
! 120: +-----------+
! 121: | u00 x v32 | p32
! 122: +-----------+
! 123: +-----------+
! 124: | u00 x v48 | p48
! 125: +-----------+
! 126: +-----------+
! 127: | u32 x v00 | r32
! 128: +-----------+
! 129: +-----------+
! 130: | u32 x v16 | r48
! 131: +-----------+
! 132: +-----------+
! 133: | u32 x v32 | r64
! 134: +-----------+
! 135: +-----------+
! 136: | u32 x v48 | r80
! 137: +-----------+
! 138:
! 139: p32 and r32 can be summed using floating-point addition, and
! 140: likewise p48 and r48. p00 and p16 can be summed with r64 and r80 from
! 141: the previous iteration.
! 142:
! 143: For each loop then, four 49-bit quantities are transferred to the
! 144: integer unit, aligned as follows,
! 145:
! 146: |-----64bits----|-----64bits----|
! 147: +------------+
! 148: | p00 + r64' | i00
! 149: +------------+
! 150: +------------+
! 151: | p16 + r80' | i16
! 152: +------------+
! 153: +------------+
! 154: | p32 + r32 | i32
! 155: +------------+
! 156: +------------+
! 157: | p48 + r48 | i48
! 158: +------------+
! 159:
! 160: The challenge then is to sum these efficiently and add in a carry
! 161: limb, generating a low 64-bit result limb and a high 33-bit carry limb
! 162: (i48 extends 33 bits into the high half).
! 163:
! 164:
! 165: File: gmp.info, Node: Assembler SIMD Instructions, Next: Assembler Software Pipelining, Prev: Assembler Floating Point, Up: Assembler Coding
! 166:
! 167: SIMD Instructions
! 168: -----------------
! 169:
! 170: The single-instruction multiple-data support in current
! 171: microprocessors is aimed at signal processing algorithms where each
! 172: data point can be treated more or less independently. There's
! 173: generally not much support for propagating the sort of carries that
! 174: arise in GMP.
! 175:
! 176: SIMD multiplications of say four 16x16 bit products only do as much
! 177: work as one 32x32 from GMP's point of view, and need some shifts and
! 178: adds besides.  But of course if the SIMD form is fully pipelined
! 179: and uses less instruction decoding then it may still be worthwhile.
! 180:
! 181: On the 80x86 chips, MMX has so far found a use in `mpn_rshift' and
! 182: `mpn_lshift' since it allows 64-bit operations, and is used in a special
! 183: case for 16-bit multipliers in the P55 `mpn_mul_1'. 3DNow and SSE
! 184: haven't found a use so far.
! 185:
! 186:
! 187: File: gmp.info, Node: Assembler Software Pipelining, Next: Assembler Loop Unrolling, Prev: Assembler SIMD Instructions, Up: Assembler Coding
! 188:
! 189: Software Pipelining
! 190: -------------------
! 191:
! 192: Software pipelining consists of scheduling instructions around the
! 193: branch point in a loop. For example a loop taking a checksum of an
! 194: array of limbs might have a load and an add, but the load wouldn't be
! 195: for that add, rather for the one next time around the loop. Each load
! 196: then is effectively scheduled back in the previous iteration, allowing
! 197: latency to be hidden.
! 198:
! 199: Naturally this is wanted only when doing things like loads or
! 200: multiplies that take a few cycles to complete, and only where a CPU has
! 201: multiple functional units so that other work can be done while waiting.
! 202:
! 203: A pipeline with several stages will have a data value in progress at
! 204: each stage and each loop iteration moves them along one stage. This is
! 205: like juggling.
! 206:
! 207: Within the loop some moves between registers may be necessary to
! 208: have the right values in the right places for each iteration. Loop
! 209: unrolling can help this, with each unrolled block able to use different
! 210: registers for different values, even if some shuffling is still needed
! 211: just before going back to the top of the loop.
! 212:
! 213:
! 214: File: gmp.info, Node: Assembler Loop Unrolling, Prev: Assembler Software Pipelining, Up: Assembler Coding
! 215:
! 216: Loop Unrolling
! 217: --------------
! 218:
! 219: Loop unrolling consists of replicating code so that several limbs are
! 220: processed in each loop. At a minimum this reduces loop overheads by a
! 221: corresponding factor, but it can also allow better register usage, for
! 222: example alternately using one register combination and then another.
! 223: Judicious use of `m4' macros can help avoid lots of duplication in the
! 224: source code.
! 225:
! 226: Unrolling is commonly done to a power of 2 multiple so the number of
! 227: unrolled loops and the number of remaining limbs can be calculated with
! 228: a shift and mask. But other multiples can be used too, just by
! 229: subtracting each N limbs processed from a counter and waiting for less
! 230: than N remaining (or offsetting the counter by N so it goes negative
! 231: when there's less than N remaining).
! 232:
! 233: The limbs not a multiple of the unrolling can be handled in various
! 234: ways, for example
! 235:
! 236: * A simple loop at the end (or the start) to process the excess.
! 237: Care will be wanted that it isn't too much slower than the
! 238: unrolled part.
! 239:
! 240: * A set of binary tests, for example after an 8-limb unrolling, test
! 241: for 4 more limbs to process, then a further 2 more or not, and
! 242: finally 1 more or not. This will probably take more code space
! 243: than a simple loop.
! 244:
! 245: * A `switch' statement, providing separate code for each possible
! 246: excess, for example an 8-limb unrolling would have separate code
! 247: for 0 remaining, 1 remaining, etc, up to 7 remaining. This might
! 248: take a lot of code, but may be the best way to optimize all cases
! 249: in combination with a deep pipelined loop.
! 250:
! 251: * A computed jump into the middle of the loop, thus making the first
! 252: iteration handle the excess. This should make times smoothly
! 253: increase with size, which is attractive, but setups for the jump
! 254: and adjustments for pointers can be tricky and could become quite
! 255: difficult in combination with deep pipelining.
! 256:
! 257: One way to write the setups and finishups for a pipelined unrolled
! 258: loop is simply to duplicate the loop at the start and the end, then
! 259: delete instructions at the start which have no valid antecedents, and
! 260: delete instructions at the end whose results are unwanted. Sizes not a
! 261: multiple of the unrolling can then be handled as desired.
! 262:
! 263:
! 264: File: gmp.info, Node: Internals, Next: Contributors, Prev: Algorithms, Up: Top
! 265:
! 266: Internals
! 267: *********
! 268:
! 269: *This chapter is provided only for informational purposes and the
! 270: various internals described here may change in future GMP releases.
! 271: Applications expecting to be compatible with future releases should use
! 272: only the documented interfaces described in previous chapters.*
! 273:
! 274: * Menu:
! 275:
! 276: * Integer Internals::
! 277: * Rational Internals::
! 278: * Float Internals::
! 279: * Raw Output Internals::
! 280: * C++ Interface Internals::
! 281:
! 282:
! 283: File: gmp.info, Node: Integer Internals, Next: Rational Internals, Prev: Internals, Up: Internals
! 284:
! 285: Integer Internals
! 286: =================
! 287:
! 288: `mpz_t' variables represent integers using sign and magnitude, in
! 289: space dynamically allocated and reallocated. The fields are as follows.
! 290:
! 291: `_mp_size'
! 292: The number of limbs, or the negative of that when representing a
! 293: negative integer. Zero is represented by `_mp_size' set to zero,
! 294: in which case the `_mp_d' data is unused.
! 295:
! 296: `_mp_d'
! 297: A pointer to an array of limbs which is the magnitude. These are
! 298: stored "little endian" as per the `mpn' functions, so `_mp_d[0]'
! 299: is the least significant limb and `_mp_d[ABS(_mp_size)-1]' is the
! 300: most significant. Whenever `_mp_size' is non-zero, the most
! 301: significant limb is non-zero.
! 302:
! 303: Currently there's always at least one limb allocated, so for
! 304: instance `mpz_set_ui' never needs to reallocate, and `mpz_get_ui'
! 305: can fetch `_mp_d[0]' unconditionally (though its value is then
! 306: only wanted if `_mp_size' is non-zero).
! 307:
! 308: `_mp_alloc'
! 309: `_mp_alloc' is the number of limbs currently allocated at `_mp_d',
! 310: and naturally `_mp_alloc >= ABS(_mp_size)'. When an `mpz' routine
! 311: is about to (or might be about to) increase `_mp_size', it checks
! 312: `_mp_alloc' to see whether there's enough space, and reallocates
! 313: if not. `MPZ_REALLOC' is generally used for this.
! 314:
! 315: The various bitwise logical functions like `mpz_and' behave as if
! 316: negative values were twos complement. But sign and magnitude is always
! 317: used internally, and necessary adjustments are made during the
! 318: calculations. Sometimes this isn't pretty, but sign and magnitude are
! 319: best for other routines.
! 320:
! 321: Some internal temporary variables are set up with `MPZ_TMP_INIT' and
! 322: these have `_mp_d' space obtained from `TMP_ALLOC' rather than the
! 323: memory allocation functions. Care is taken to ensure that these are
! 324: big enough that no reallocation is necessary (since it would have
! 325: unpredictable consequences).
! 326:
! 327:
! 328: File: gmp.info, Node: Rational Internals, Next: Float Internals, Prev: Integer Internals, Up: Internals
! 329:
! 330: Rational Internals
! 331: ==================
! 332:
! 333: `mpq_t' variables represent rationals using an `mpz_t' numerator and
! 334: denominator (*note Integer Internals::).
! 335:
! 336: The canonical form adopted is denominator positive (and non-zero),
! 337: no common factors between numerator and denominator, and zero uniquely
! 338: represented as 0/1.
! 339:
! 340: It's believed that casting out common factors at each stage of a
! 341: calculation is best in general. A GCD is an O(N^2) operation so it's
! 342: better to do a few small ones immediately than to delay and have to do
! 343: a big one later. Knowing the numerator and denominator have no common
! 344: factors can be used for example in `mpq_mul' to make only two cross
! 345: GCDs necessary, not four.
! 346:
! 347: This general approach to common factors is badly sub-optimal in the
! 348: presence of simple factorizations or little prospect for cancellation,
! 349: but GMP has no way to know when this will occur. As per *Note
! 350: Efficiency::, that's left to applications. The `mpq_t' framework might
! 351: still suit, with `mpq_numref' and `mpq_denref' for direct access to the
! 352: numerator and denominator, or of course `mpz_t' variables can be used
! 353: directly.
! 354:
! 355:
! 356: File: gmp.info, Node: Float Internals, Next: Raw Output Internals, Prev: Rational Internals, Up: Internals
! 357:
! 358: Float Internals
! 359: ===============
! 360:
! 361: Efficient calculation is the primary aim of GMP floats and the use
! 362: of whole limbs and simple rounding facilitates this.
! 363:
! 364: `mpf_t' floats have a variable precision mantissa and a single
! 365: machine word signed exponent. The mantissa is represented using sign
! 366: and magnitude.
! 367:
! 368: most least
! 369: significant significant
! 370: limb limb
! 371:
! 372: _mp_d
! 373: |---- _mp_exp ---> |
! 374: _____ _____ _____ _____ _____
! 375: |_____|_____|_____|_____|_____|
! 376: . <------------ radix point
! 377:
! 378: <-------- _mp_size --------->
! 379:
! 380: The fields are as follows.
! 381:
! 382: `_mp_size'
! 383: The number of limbs currently in use, or the negative of that when
! 384: representing a negative value. Zero is represented by `_mp_size'
! 385: and `_mp_exp' both set to zero, and in that case the `_mp_d' data
! 386: is unused. (In the future `_mp_exp' might be undefined when
! 387: representing zero.)
! 388:
! 389: `_mp_prec'
! 390: The precision of the mantissa, in limbs. In any calculation the
! 391: aim is to produce `_mp_prec' limbs of result (the most significant
! 392: being non-zero).
! 393:
! 394: `_mp_d'
! 395: A pointer to the array of limbs which is the absolute value of the
! 396: mantissa. These are stored "little endian" as per the `mpn'
! 397: functions, so `_mp_d[0]' is the least significant limb and
! 398: `_mp_d[ABS(_mp_size)-1]' the most significant.
! 399:
! 400: The most significant limb is always non-zero, but there are no
! 401: other restrictions on its value, in particular the highest 1 bit
! 402: can be anywhere within the limb.
! 403:
! 404: `_mp_prec+1' limbs are allocated to `_mp_d', the extra limb being
! 405: for convenience (see below). There are no reallocations during a
! 406: calculation, only in a change of precision with `mpf_set_prec'.
! 407:
! 408: `_mp_exp'
! 409: The exponent, in limbs, determining the location of the implied
! 410: radix point. Zero means the radix point is just above the most
! 411: significant limb. Positive values mean a radix point offset
! 412: towards the lower limbs and hence a value >= 1, as for example in
! 413: the diagram above. Negative exponents mean a radix point further
! 414: above the highest limb.
! 415:
! 416: Naturally the exponent can be any value, it doesn't have to fall
! 417: within the limbs as the diagram shows, it can be a long way above
! 418: or a long way below. Limbs other than those included in the
! 419: `{_mp_d,_mp_size}' data are treated as zero.
! 420:
! 421:
! 422: The following various points should be noted.
! 423:
! 424: Low Zeros
! 425: The least significant limbs `_mp_d[0]' etc can be zero, though
! 426: such low zeros can always be ignored. Routines likely to produce
! 427: low zeros check and avoid them to save time in subsequent
! 428: calculations, but for most routines they're quite unlikely and
! 429: aren't checked.
! 430:
! 431: Mantissa Size Range
! 432: The `_mp_size' count of limbs in use can be less than `_mp_prec' if
! 433: the value can be represented in less. This means low precision
! 434: values or small integers stored in a high precision `mpf_t' can
! 435: still be operated on efficiently.
! 436:
! 437: `_mp_size' can also be greater than `_mp_prec'. Firstly a value is
! 438: allowed to use all of the `_mp_prec+1' limbs available at `_mp_d',
! 439: and secondly when `mpf_set_prec_raw' lowers `_mp_prec' it leaves
! 440: `_mp_size' unchanged and so the size can be arbitrarily bigger than
! 441: `_mp_prec'.
! 442:
! 443: Rounding
! 444: All rounding is done on limb boundaries. Calculating `_mp_prec'
! 445: limbs with the high non-zero will ensure the application requested
! 446: minimum precision is obtained.
! 447:
! 448: The use of simple "trunc" rounding towards zero is efficient,
! 449: since there's no need to examine extra limbs and increment or
! 450: decrement.
! 451:
! 452: Bit Shifts
! 453: Since the exponent is in limbs, there are no bit shifts in basic
! 454: operations like `mpf_add' and `mpf_mul'. When differing exponents
! 455: are encountered all that's needed is to adjust pointers to line up
! 456: the relevant limbs.
! 457:
! 458: Of course `mpf_mul_2exp' and `mpf_div_2exp' will require bit
! 459: shifts, but the choice is between an exponent in limbs which
! 460: requires shifts there, or one in bits which requires them almost
! 461: everywhere else.
! 462:
! 463: Use of `_mp_prec+1' Limbs
! 464: The extra limb on `_mp_d' (`_mp_prec+1' rather than just
! 465: `_mp_prec') helps when an `mpf' routine might get a carry from its
! 466: operation. `mpf_add' for instance will do an `mpn_add' of
! 467: `_mp_prec' limbs. If there's no carry then that's the result, but
! 468: if there is a carry then it's stored in the extra limb of space and
! 469: `_mp_size' becomes `_mp_prec+1'.
! 470:
! 471: Whenever `_mp_prec+1' limbs are held in a variable, the low limb
! 472: is not needed for the intended precision, only the `_mp_prec' high
! 473: limbs. But zeroing it out or moving the rest down is unnecessary.
! 474: Subsequent routines reading the value will simply take the high
! 475: limbs they need, and this will be `_mp_prec' if their target has
! 476: that same precision. This is no more than a pointer adjustment,
! 477: and must be checked anyway since the destination precision can be
! 478: different from the source's.
! 479:
! 480: Copy functions like `mpf_set' will retain a full `_mp_prec+1' limbs
! 481: if available. This ensures that a variable which has `_mp_size'
! 482: equal to `_mp_prec+1' will get its full exact value copied.
! 483: Strictly speaking this is unnecessary since only `_mp_prec' limbs
! 484: are needed for the application's requested precision, but it's
! 485: considered that an `mpf_set' from one variable into another of the
! 486: same precision ought to produce an exact copy.
! 487:
! 488: Application Precisions
! 489: `__GMPF_BITS_TO_PREC' converts an application requested precision
! 490: to an `_mp_prec'. The value in bits is rounded up to a whole limb
! 491: then an extra limb is added since the most significant limb of `_mp_d'
! 492: is only required to be non-zero and therefore might contain only one bit.
! 493:
! 494: `__GMPF_PREC_TO_BITS' does the reverse conversion, and removes the
! 495: extra limb from `_mp_prec' before converting to bits. The net
! 496: effect of reading back with `mpf_get_prec' is simply the precision
! 497: rounded up to a multiple of `mp_bits_per_limb'.
! 498:
! 499: Note that the extra limb added here for the high only being
! 500: non-zero is in addition to the extra limb allocated to `_mp_d'.
! 501: For example with a 32-bit limb, an application request for 250
! 502: bits will be rounded up to 8 limbs, then an extra added for the
! 503: high being only non-zero, giving an `_mp_prec' of 9. `_mp_d' then
! 504: gets 10 limbs allocated. Reading back with `mpf_get_prec' will
! 505: take `_mp_prec' subtract 1 limb and multiply by 32, giving 256
! 506: bits.
! 507:
! 508: Strictly speaking, the fact the high limb has at least one bit
! 509: means that a float with, say, 3 limbs of 32-bits each will be
! 510: holding at least 65 bits, but for the purposes of `mpf_t' it's
! 511: considered simply to be 64 bits, a nice multiple of the limb size.
! 512:
! 513:
! 514: File: gmp.info, Node: Raw Output Internals, Next: C++ Interface Internals, Prev: Float Internals, Up: Internals
! 515:
! 516: Raw Output Internals
! 517: ====================
! 518:
! 519: `mpz_out_raw' uses the following format.
! 520:
! 521: +------+------------------------+
! 522: | size | data bytes |
! 523: +------+------------------------+
! 524:
! 525: The size is 4 bytes written most significant byte first, being the
! 526: number of subsequent data bytes, or the twos complement negative of
! 527: that when a negative integer is represented. The data bytes are the
! 528: absolute value of the integer, written most significant byte first.
! 529:
! 530: The most significant data byte is always non-zero, so the output is
! 531: the same on all systems, irrespective of limb size.
! 532:
! 533: In GMP 1, leading zero bytes were written to pad the data bytes to a
! 534: multiple of the limb size. `mpz_inp_raw' will still accept this, for
! 535: compatibility.
! 536:
! 537: The use of "big endian" for both the size and data fields is
! 538: deliberate, it makes the data easy to read in a hex dump of a file.
! 539: Unfortunately it also means that the limb data must be reversed when
! 540: reading or writing, so neither a big endian nor little endian system
! 541: can just read and write `_mp_d'.
! 542:
! 543:
! 544: File: gmp.info, Node: C++ Interface Internals, Prev: Raw Output Internals, Up: Internals
! 545:
! 546: C++ Interface Internals
! 547: =======================
! 548:
! 549: A system of expression templates is used to ensure something like
! 550: `a=b+c' turns into a simple call to `mpz_add' etc. For `mpf_class' and
! 551: `mpfr_class' the scheme also ensures the precision of the final
! 552: destination is used for any temporaries within a statement like
! 553: `f=w*x+y*z'. These are important features which a naive implementation
! 554: cannot provide.
! 555:
! 556: A simplified description of the scheme follows. The true scheme is
! 557: complicated by the fact that expressions have different return types.
! 558: For detailed information, refer to the source code.
! 559:
! 560: To perform an operation, say, addition, we first define a "function
! 561: object" evaluating it,
! 562:
! 563: struct __gmp_binary_plus
! 564: {
! 565: static void eval(mpf_t f, mpf_t g, mpf_t h) { mpf_add(f, g, h); }
! 566: };
! 567:
! 568: And an "additive expression" object,
! 569:
! 570: __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
! 571: operator+(const mpf_class &f, const mpf_class &g)
! 572: {
! 573: return __gmp_expr
! 574: <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
! 575: }
! 576:
! 577: The seemingly redundant `__gmp_expr<__gmp_binary_expr<...>>' is used
! 578: to encapsulate any possible kind of expression into a single template
! 579: type. In fact even `mpf_class' etc are `typedef' specializations of
! 580: `__gmp_expr'.
! 581:
! 582: Next we define assignment of `__gmp_expr' to `mpf_class'.
! 583:
! 584: template <class T>
! 585: mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
! 586: {
! 587: expr.eval(this->get_mpf_t(), this->precision());
! 588: return *this;
! 589: }
! 590:
! 591: template <class Op>
! 592: void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
! 593: (mpf_t f, unsigned long int precision)
! 594: {
! 595: Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
! 596: }
! 597:
! 598: where `expr.val1' and `expr.val2' are references to the expression's
! 599: operands (here `expr' is the `__gmp_binary_expr' stored within the
! 600: `__gmp_expr').
! 601:
! 602: This way, the expression is actually evaluated only at the time of
! 603: assignment, when the required precision (that of `f') is known.
! 604: Furthermore the target `mpf_t' is now available, thus we can call
! 605: `mpf_add' directly with `f' as the output argument.
! 606:
! 607: Compound expressions are handled by defining operators taking
! 608: subexpressions as their arguments, like this:
! 609:
! 610: template <class T, class U>
! 611: __gmp_expr
! 612: <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
! 613: operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
! 614: {
! 615: return __gmp_expr
! 616: <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
! 617: (expr1, expr2);
! 618: }
! 619:
! 620: And the corresponding specializations of `__gmp_expr::eval':
! 621:
! 622: template <class T, class U, class Op>
! 623: void __gmp_expr
! 624: <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
! 625: (mpf_t f, unsigned long int precision)
! 626: {
! 627: // declare two temporaries
! 628: mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
! 629: Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
! 630: }
! 631:
! 632: The expression is thus recursively evaluated to any level of
! 633: complexity and all subexpressions are evaluated to the precision of `f'.
! 634:
! 635:
! 636: File: gmp.info, Node: Contributors, Next: References, Prev: Internals, Up: Top
! 637:
! 638: Contributors
! 639: ************
! 640:
! 641: Torbjorn Granlund wrote the original GMP library and is still
! 642: developing and maintaining it. Several other individuals and
! 643: organizations have contributed to GMP in various ways. Here is a list
! 644: in chronological order:
! 645:
! 646: Gunnar Sjoedin and Hans Riesel helped with mathematical problems in
! 647: early versions of the library.
! 648:
! 649: Richard Stallman contributed to the interface design and revised the
! 650: first version of this manual.
! 651:
! 652: Brian Beuning and Doug Lea helped with testing of early versions of
! 653: the library and made creative suggestions.
! 654:
! 655: John Amanatides of York University in Canada contributed the function
! 656: `mpz_probab_prime_p'.
! 657:
! 658: Paul Zimmermann of Inria sparked the development of GMP 2, with his
! 659: comparisons between bignum packages.
! 660:
! 661: Ken Weber (Kent State University, Universidade Federal do Rio Grande
! 662: do Sul) contributed `mpz_gcd', `mpz_divexact', `mpn_gcd', and
! 663: `mpn_bdivmod', partially supported by CNPq (Brazil) grant 301314194-2.
! 664:
! 665: Per Bothner of Cygnus Support helped to set up GMP to use Cygnus'
! 666: configure. He has also made valuable suggestions and tested numerous
! 667: intermediary releases.
! 668:
! 669: Joachim Hollman was involved in the design of the `mpf' interface,
! 670: and in the `mpz' design revisions for version 2.
! 671:
! 672: Bennet Yee contributed the initial versions of `mpz_jacobi' and
! 673: `mpz_legendre'.
! 674:
! 675: Andreas Schwab contributed the files `mpn/m68k/lshift.S' and
! 676: `mpn/m68k/rshift.S' (now in `.asm' form).
! 677:
! 678: The development of floating point functions of GNU MP 2, were
! 679: supported in part by the ESPRIT-BRA (Basic Research Activities) 6846
! 680: project POSSO (POlynomial System SOlving).
! 681:
! 682: GNU MP 2 was finished and released by SWOX AB, SWEDEN, in
! 683: cooperation with the IDA Center for Computing Sciences, USA.
! 684:
! 685: Robert Harley of Inria, France and David Seal of ARM, England,
! 686: suggested clever improvements for population count.
! 687:
! 688: Robert Harley also wrote highly optimized Karatsuba and 3-way Toom
! 689: multiplication functions for GMP 3. He also contributed the ARM
! 690: assembly code.
! 691:
! 692: Torsten Ekedahl of the Mathematical department of Stockholm
! 693: University provided significant inspiration during several phases of
! 694: the GMP development. His mathematical expertise helped improve several
! 695: algorithms.
! 696:
! 697: Paul Zimmermann wrote the Divide and Conquer division code, the REDC
! 698: code, the REDC-based mpz_powm code, the FFT multiply code, and the
! 699: Karatsuba square root. The ECMNET project Paul is organizing was a
! 700: driving force behind many of the optimizations in GMP 3.
! 701:
! 702: Linus Nordberg wrote the new configure system based on autoconf and
! 703: implemented the new random functions.
! 704:
! 705: Kent Boortz made the Macintosh port.
! 706:
! 707: Kevin Ryde worked on a number of things: optimized x86 code, m4 asm
! 708: macros, parameter tuning, speed measuring, the configure system,
! 709: function inlining, divisibility tests, bit scanning, Jacobi symbols,
! 710: Fibonacci and Lucas number functions, printf and scanf functions, perl
! 711: interface, demo expression parser, the algorithms chapter in the
! 712: manual, `gmpasm-mode.el', and various miscellaneous improvements
! 713: elsewhere.
! 714:
! 715: Steve Root helped write the optimized alpha 21264 assembly code.
! 716:
! 717: Gerardo Ballabio wrote the `gmpxx.h' C++ class interface and the C++
! 718: `istream' input routines.
! 719:
! 720: GNU MP 4.0 was finished and released by Torbjorn Granlund and Kevin
! 721: Ryde. Torbjorn's work was partially funded by the IDA Center for
! 722: Computing Sciences, USA.
! 723:
! 724:     (This list is chronological, not ordered by significance.  If you
! 725: have contributed to GMP but are not listed above, please tell
! 726: <tege@swox.com> about the omission!)
! 727:
! 728:     Thanks go to Hans Thorsen for donating an SGI system for the GMP
! 729: test system environment.
! 730:
! 731:
! 732: File: gmp.info, Node: References, Next: GNU Free Documentation License, Prev: Contributors, Up: Top
! 733:
! 734: References
! 735: **********
! 736:
! 737: Books
! 738: =====
! 739:
! 740:    * Jonathan M. Borwein and Peter B. Borwein, "Pi and the AGM: A Study
! 741:      in Analytic Number Theory and Computational Complexity", John
! 742:      Wiley & Sons, 1998.
! 743:
! 744: * Henri Cohen, "A Course in Computational Algebraic Number Theory",
! 745: Graduate Texts in Mathematics number 138, Springer-Verlag, 1993.
! 746: `http://www.math.u-bordeaux.fr/~cohen'
! 747:
! 748: * Donald E. Knuth, "The Art of Computer Programming", volume 2,
! 749: "Seminumerical Algorithms", 3rd edition, Addison-Wesley, 1998.
! 750: `http://www-cs-faculty.stanford.edu/~knuth/taocp.html'
! 751:
! 752: * John D. Lipson, "Elements of Algebra and Algebraic Computing", The
! 753: Benjamin Cummings Publishing Company Inc, 1981.
! 754:
! 755: * Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone,
! 756: "Handbook of Applied Cryptography",
! 757: `http://www.cacr.math.uwaterloo.ca/hac/'
! 758:
! 759: * Richard M. Stallman, "Using and Porting GCC", Free Software
! 760: Foundation, 1999, available online
! 761: `http://www.gnu.org/software/gcc/onlinedocs/', and in the GCC
! 762: package `ftp://ftp.gnu.org/gnu/gcc/'
! 763:
! 764: Papers
! 765: ======
! 766:
! 767: * Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division",
! 768: Max-Planck-Institut fuer Informatik Research Report
! 769: MPI-I-98-1-022,
! 770: `http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022'
! 771:
! 772: * Torbjorn Granlund and Peter L. Montgomery, "Division by Invariant
! 773: Integers using Multiplication", in Proceedings of the SIGPLAN
! 774: PLDI'94 Conference, June 1994. Also available
! 775: `ftp://ftp.cwi.nl/pub/pmontgom/divcnst.psa4.gz' (and .psl.gz).
! 776:
! 777: * Peter L. Montgomery, "Modular Multiplication Without Trial
! 778: Division", in Mathematics of Computation, volume 44, number 170,
! 779: April 1985.
! 780:
! 781: * Tudor Jebelean, "An algorithm for exact division", Journal of
! 782: Symbolic Computation, volume 15, 1993, pp. 169-180. Research
! 783: report version available
! 784: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz'
! 785:
! 786: * Tudor Jebelean, "Exact Division with Karatsuba Complexity -
! 787: Extended Abstract", RISC-Linz technical report 96-31,
! 788: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz'
! 789:
! 790: * Tudor Jebelean, "Practical Integer Division with Karatsuba
! 791: Complexity", ISSAC 97, pp. 339-341. Technical report available
! 792: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz'
! 793:
! 794: * Tudor Jebelean, "A Generalization of the Binary GCD Algorithm",
! 795: ISSAC 93, pp. 111-116. Technical report version available
! 796: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz'
! 797:
! 798: * Tudor Jebelean, "A Double-Digit Lehmer-Euclid Algorithm for
! 799: Finding the GCD of Long Integers", Journal of Symbolic
! 800: Computation, volume 19, 1995, pp. 145-157. Technical report
! 801: version also available
! 802: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz'
! 803:
! 804: * Werner Krandick and Tudor Jebelean, "Bidirectional Exact Integer
! 805: Division", Journal of Symbolic Computation, volume 21, 1996, pp.
! 806: 441-455. Early technical report version also available
! 807: `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz'
! 808:
! 809: * R. Moenck and A. Borodin, "Fast Modular Transforms via Division",
! 810: Proceedings of the 13th Annual IEEE Symposium on Switching and
! 811: Automata Theory, October 1972, pp. 90-96. Reprinted as "Fast
! 812: Modular Transforms", Journal of Computer and System Sciences,
! 813: volume 8, number 3, June 1974, pp. 366-386.
! 814:
! 815:    * Arnold Schönhage and Volker Strassen, "Schnelle Multiplikation
! 816:      grosser Zahlen", Computing 7, 1971, pp. 281-292.
! 817:
! 818: * Kenneth Weber, "The accelerated integer GCD algorithm", ACM
! 819: Transactions on Mathematical Software, volume 21, number 1, March
! 820: 1995, pp. 111-122.
! 821:
! 822: * Paul Zimmermann, "Karatsuba Square Root", INRIA Research Report
! 823: 3805, November 1999, `http://www.inria.fr/RRRT/RR-3805.html'
! 824:
! 825: * Paul Zimmermann, "A Proof of GMP Fast Division and Square Root
! 826: Implementations",
! 827: `http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz'
! 828:
! 829:    * Dan Zuras, "On Squaring and Multiplying Large Integers", ARITH-11:
! 830:      IEEE Symposium on Computer Arithmetic, 1993, pp. 260-271.
! 831:      Reprinted as "More on Multiplying and Squaring Large Integers",
! 832:      IEEE Transactions on Computers, volume 43, number 8, August 1994,
! 833:      pp. 899-908.
! 834: