Annotation of OpenXM_contrib/gmp/gmp.info-7, Revision 1.1.1.1
This is gmp.info, produced by makeinfo version 4.2 from gmp.texi.

This manual describes how to install and use the GNU multiple precision
arithmetic library, version 4.1.2.

   Copyright 1991, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002 Free Software Foundation, Inc.

   Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License, Version
1.1 or any later version published by the Free Software Foundation;
with no Invariant Sections, with the Front-Cover Texts being "A GNU
Manual", and with the Back-Cover Texts being "You have freedom to copy
and modify this GNU Manual, like GNU software".  A copy of the license
is included in *Note GNU Free Documentation License::.
INFO-DIR-SECTION GNU libraries
START-INFO-DIR-ENTRY
* gmp: (gmp).                   GNU Multiple Precision Arithmetic Library.
END-INFO-DIR-ENTRY

File: gmp.info, Node: Assembler Cache Handling, Next: Assembler Floating Point, Prev: Assembler Carry Propagation, Up: Assembler Coding

Cache Handling
--------------

   GMP aims to perform well both on operands that fit entirely in L1
cache and those which don't.

   Basic routines like `mpn_add_n' or `mpn_lshift' are often used on
large operands, so L2 and main memory performance is important for them.
`mpn_mul_1' and `mpn_addmul_1' are mostly used for multiply and square
basecases, so L1 performance matters most for them, unless assembler
versions of `mpn_mul_basecase' and `mpn_sqr_basecase' exist, in which
case the remaining uses are mostly for larger operands.

   For L2 or main memory operands, memory access times will almost
certainly be more than the calculation time.  The aim therefore is to
maximize memory throughput, by starting a load of the next cache line
while processing the contents of the previous one.  Clearly this is
only possible if the chip has a lock-up free cache or some sort of
prefetch instruction.  Most current chips have both these features.

   Prefetching sources combines well with loop unrolling, since a
prefetch can be initiated once per unrolled loop (or more than once if
the loop covers more than one cache line).

   On CPUs without write-allocate caches, prefetching destinations will
ensure individual stores don't go further down the cache hierarchy,
which would limit bandwidth.  Of course for calculations which are slow
anyway, like `mpn_divrem_1', write-throughs might be fine.

   The distance ahead to prefetch will be determined by memory latency
versus throughput.  The aim of course is to have data arriving
continuously, at peak throughput.  Some CPUs have limits on the number
of fetches or prefetches in progress.

   If a special prefetch instruction doesn't exist then a plain load
can be used, but in that case care must be taken not to attempt to read
past the end of an operand, since that might produce a segmentation
violation.

   Some CPUs or systems have hardware that detects sequential memory
accesses and initiates suitable cache movements automatically, making
life easy.


File: gmp.info, Node: Assembler Floating Point, Next: Assembler SIMD Instructions, Prev: Assembler Cache Handling, Up: Assembler Coding

Floating Point
--------------

   Floating point arithmetic is used in GMP for multiplications on CPUs
with poor integer multipliers.  It's mostly useful for `mpn_mul_1',
`mpn_addmul_1' and `mpn_submul_1' on 64-bit machines, and
`mpn_mul_basecase' on both 32-bit and 64-bit machines.

   With IEEE 53-bit double precision floats, integer multiplications
producing up to 53 bits will give exact results.  Breaking a 64x64
multiplication into eight 16x32->48 bit pieces is convenient.  With
some care though six 21x32->53 bit products can be used, if one of the
lower two 21-bit pieces also uses the sign bit.
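
   The exactness claim can be checked directly.  The following Python
sketch (illustrative only, not GMP code) performs a 64x64 multiply as
eight 16x32->48 bit pieces; every piece is below 2^48 < 2^53, so each
floating point multiply is exact and the shifted sum reconstructs the
full 128-bit product.

```python
# Illustrative sketch (not GMP source): a 64x64->128 bit multiply done
# as eight 16x32->48 bit pieces, each computed with an IEEE double.
# Every partial product is below 2^48 < 2^53, so no rounding occurs.
def mul_64x64_via_doubles(u, v):
    total = 0
    for i in range(4):                        # v split into 16-bit parts
        v16 = (v >> (16 * i)) & 0xFFFF
        for j in range(2):                    # u split into 32-bit parts
            u32 = (u >> (32 * j)) & 0xFFFFFFFF
            p = float(u32) * float(v16)       # at most 48 bits: exact
            assert int(p) == u32 * v16        # the double held it exactly
            total += int(p) << (16 * i + 32 * j)
    return total
```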

   For the `mpn_mul_1' family of functions on a 64-bit machine, the
invariant single limb is split at the start, into 3 or 4 pieces.
Inside the loop, the bignum operand is split into 32-bit pieces.  Fast
conversion of these unsigned 32-bit pieces to floating point is highly
machine-dependent.  In some cases, reading the data into the integer
unit, zero-extending to 64 bits, then transferring it to the floating
point unit via memory is the only option.

   Converting partial products back to 64-bit limbs is usually best
done as a signed conversion.  Since all values are smaller than 2^53,
signed and unsigned are the same, but most processors lack unsigned
conversions.

   Here is a diagram showing 16x32 bit products for an `mpn_mul_1' or
`mpn_addmul_1' with a 64-bit limb.  The single limb operand V is split
into four 16-bit parts.  The multi-limb operand U is split in the loop
into two 32-bit parts.

     +---+---+---+---+
     |v48|v32|v16|v00|   V operand
     +---+---+---+---+

       +-------+-------+
     x |  u32  |  u00  |   U operand (one limb)
       +-------+-------+

     ---------------------------------

     +-----------+
     | u00 x v00 |   p00   48-bit products
     +-----------+
     +-----------+
     | u00 x v16 |   p16
     +-----------+
     +-----------+
     | u00 x v32 |   p32
     +-----------+
     +-----------+
     | u00 x v48 |   p48
     +-----------+
     +-----------+
     | u32 x v00 |   r32
     +-----------+
     +-----------+
     | u32 x v16 |   r48
     +-----------+
     +-----------+
     | u32 x v32 |   r64
     +-----------+
     +-----------+
     | u32 x v48 |   r80
     +-----------+

   p32 and r32 can be summed using floating-point addition, and
likewise p48 and r48.  p00 and p16 can be summed with r64 and r80 from
the previous iteration.

   For each loop then, four 49-bit quantities are transferred to the
integer unit, aligned as follows,

     |-----64bits----|-----64bits----|
     +------------+
     | p00 + r64' |   i00
     +------------+
     +------------+
     | p16 + r80' |   i16
     +------------+
     +------------+
     | p32 + r32  |   i32
     +------------+
     +------------+
     | p48 + r48  |   i48
     +------------+

   The challenge then is to sum these efficiently and add in a carry
limb, generating a low 64-bit result limb and a high 33-bit carry limb
(i48 extends 33 bits into the high half).
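
   The summation itself can be sketched in Python (illustrative only;
exact integers stand in for the machine registers): the four aligned
quantities and an incoming carry collapse into a low result limb and a
high carry limb.

```python
# Sketch of the final integer-unit step (illustrative, not GMP source):
# i00, i16, i32, i48 are the four 49-bit sums shown above, aligned at
# bit offsets 0, 16, 32 and 48, plus an incoming carry limb.
def sum_partials(i00, i16, i32, i48, carry):
    total = i00 + (i16 << 16) + (i32 << 32) + (i48 << 48) + carry
    low = total & (2**64 - 1)       # the 64-bit result limb
    high = total >> 64              # the high carry limb
    return low, high
```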

File: gmp.info, Node: Assembler SIMD Instructions, Next: Assembler Software Pipelining, Prev: Assembler Floating Point, Up: Assembler Coding

SIMD Instructions
-----------------

   The single-instruction multiple-data support in current
microprocessors is aimed at signal processing algorithms where each
data point can be treated more or less independently.  There's
generally not much support for propagating the sort of carries that
arise in GMP.

   A SIMD operation performing, say, four 16x16 bit multiplies does
only as much work as one 32x32 multiply from GMP's point of view, and
needs some shifts and adds besides.  But of course if say the SIMD form
is fully pipelined and uses less instruction decoding then it may still
be worthwhile.
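
   The work comparison can be made concrete.  This Python sketch
(illustrative only) shows the shifts and adds needed to assemble one
32x32 product from the four 16x16 products a SIMD multiply might
deliver at once.

```python
# Illustrative sketch: building one 32x32->64 bit product out of the
# four 16x16->32 bit products a SIMD unit could compute at once.
def mul_32x32_from_16x16(u, v):
    u_hi, u_lo = u >> 16, u & 0xFFFF
    v_hi, v_lo = v >> 16, v & 0xFFFF
    ll = u_lo * v_lo                  # the four SIMD-sized multiplies
    lh = u_lo * v_hi
    hl = u_hi * v_lo
    hh = u_hi * v_hi
    # ...plus the extra shifts and adds noted above
    return ll + ((lh + hl) << 16) + (hh << 32)
```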

   On the 80x86 chips, MMX has so far found a use in `mpn_rshift' and
`mpn_lshift' since it allows 64-bit operations, and is used in a special
case for 16-bit multipliers in the P55 `mpn_mul_1'.  3DNow and SSE
haven't found a use so far.


File: gmp.info, Node: Assembler Software Pipelining, Next: Assembler Loop Unrolling, Prev: Assembler SIMD Instructions, Up: Assembler Coding

Software Pipelining
-------------------

   Software pipelining consists of scheduling instructions around the
branch point in a loop.  For example a loop taking a checksum of an
array of limbs might have a load and an add, but the load wouldn't be
for that add, rather for the one next time around the loop.  Each load
then is effectively scheduled back in the previous iteration, allowing
latency to be hidden.
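
   The restructuring can be sketched structurally in Python (real
software pipelining is an assembler scheduling matter; the `checksum'
loop here is a hypothetical example): each element is "loaded" one
iteration before the add that consumes it, giving a prologue load, a
steady-state loop, and an epilogue add.

```python
# Structural sketch of a software-pipelined checksum loop: the load
# for iteration i+1 is issued while iteration i's value is added.
def checksum(limbs):
    if not limbs:
        return 0
    total = 0
    loaded = limbs[0]          # prologue: load for the first add
    for i in range(1, len(limbs)):
        nxt = limbs[i]         # load for the *next* iteration's add
        total += loaded        # add the value loaded last time around
        loaded = nxt
    return total + loaded      # epilogue: the final add
```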

   Naturally this is wanted only when doing things like loads or
multiplies that take a few cycles to complete, and only where a CPU has
multiple functional units so that other work can be done while waiting.

   A pipeline with several stages will have a data value in progress at
each stage and each loop iteration moves them along one stage.  This is
like juggling.

   Within the loop some moves between registers may be necessary to
have the right values in the right places for each iteration.  Loop
unrolling can help this, with each unrolled block able to use different
registers for different values, even if some shuffling is still needed
just before going back to the top of the loop.


File: gmp.info, Node: Assembler Loop Unrolling, Prev: Assembler Software Pipelining, Up: Assembler Coding

Loop Unrolling
--------------

   Loop unrolling consists of replicating code so that several limbs are
processed in each loop.  At a minimum this reduces loop overheads by a
corresponding factor, but it can also allow better register usage, for
example alternately using one register combination and then another.
Judicious use of `m4' macros can help avoid lots of duplication in the
source code.

   Unrolling is commonly done to a power of 2 multiple so the number of
unrolled loops and the number of remaining limbs can be calculated with
a shift and mask.  But other multiples can be used too, just by
subtracting each N limbs processed from a counter and waiting for less
than N remaining (or offsetting the counter by N so it goes negative
when there's less than N remaining).
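
   The two kinds of bookkeeping look like this in a Python sketch
(illustrative only; the unrolling factors are arbitrary):

```python
# Sketch of the unrolling bookkeeping described above.
def split_power_of_2(n):
    # 8-way unrolling: block count and leftover by shift and mask
    return n >> 3, n & 7

def split_general(n, unroll):
    # any factor: subtract per block until fewer than `unroll` remain
    blocks = 0
    while n >= unroll:
        n -= unroll
        blocks += 1
    return blocks, n
```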

   The limbs not a multiple of the unrolling can be handled in various
ways, for example

   * A simple loop at the end (or the start) to process the excess.
     Care will be wanted that it isn't too much slower than the
     unrolled part.

   * A set of binary tests, for example after an 8-limb unrolling, test
     for 4 more limbs to process, then a further 2 more or not, and
     finally 1 more or not.  This will probably take more code space
     than a simple loop.

   * A `switch' statement, providing separate code for each possible
     excess, for example an 8-limb unrolling would have separate code
     for 0 remaining, 1 remaining, etc, up to 7 remaining.  This might
     take a lot of code, but may be the best way to optimize all cases
     in combination with a deep pipelined loop.

   * A computed jump into the middle of the loop, thus making the first
     iteration handle the excess.  This should make times smoothly
     increase with size, which is attractive, but setups for the jump
     and adjustments for pointers can be tricky and could become quite
     difficult in combination with deep pipelining.

   One way to write the setups and finishups for a pipelined unrolled
loop is simply to duplicate the loop at the start and the end, then
delete instructions at the start which have no valid antecedents, and
delete instructions at the end whose results are unwanted.  Sizes not a
multiple of the unrolling can then be handled as desired.

File: gmp.info, Node: Internals, Next: Contributors, Prev: Algorithms, Up: Top

Internals
*********

   *This chapter is provided only for informational purposes and the
various internals described here may change in future GMP releases.
Applications expecting to be compatible with future releases should use
only the documented interfaces described in previous chapters.*

* Menu:

* Integer Internals::
* Rational Internals::
* Float Internals::
* Raw Output Internals::
* C++ Interface Internals::


File: gmp.info, Node: Integer Internals, Next: Rational Internals, Prev: Internals, Up: Internals

Integer Internals
=================

   `mpz_t' variables represent integers using sign and magnitude, in
space dynamically allocated and reallocated.  The fields are as follows.

`_mp_size'
     The number of limbs, or the negative of that when representing a
     negative integer.  Zero is represented by `_mp_size' set to zero,
     in which case the `_mp_d' data is unused.

`_mp_d'
     A pointer to an array of limbs which is the magnitude.  These are
     stored "little endian" as per the `mpn' functions, so `_mp_d[0]'
     is the least significant limb and `_mp_d[ABS(_mp_size)-1]' is the
     most significant.  Whenever `_mp_size' is non-zero, the most
     significant limb is non-zero.

     Currently there's always at least one limb allocated, so for
     instance `mpz_set_ui' never needs to reallocate, and `mpz_get_ui'
     can fetch `_mp_d[0]' unconditionally (though its value is then
     only wanted if `_mp_size' is non-zero).

`_mp_alloc'
     `_mp_alloc' is the number of limbs currently allocated at `_mp_d',
     and naturally `_mp_alloc >= ABS(_mp_size)'.  When an `mpz' routine
     is about to (or might be about to) increase `_mp_size', it checks
     `_mp_alloc' to see whether there's enough space, and reallocates
     if not.  `MPZ_REALLOC' is generally used for this.
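
   A toy Python model (not GMP code; 32-bit limbs assumed for
illustration) may make the `_mp_size'/`_mp_d' convention concrete:

```python
# Toy model of mpz sign-and-magnitude: a little-endian limb array plus
# a signed limb count.  32-bit limbs are assumed for illustration.
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1

def to_mpz(n):
    a, d = abs(n), []
    while a:
        d.append(a & LIMB_MASK)    # least significant limb first
        a >>= LIMB_BITS
    size = len(d) if n >= 0 else -len(d)
    return size, d or [0]          # at least one limb is allocated

def from_mpz(size, d):
    mag = sum(limb << (LIMB_BITS * i) for i, limb in enumerate(d[:abs(size)]))
    return mag if size >= 0 else -mag
```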

   The various bitwise logical functions like `mpz_and' behave as if
negative values were twos complement.  But sign and magnitude is always
used internally, and necessary adjustments are made during the
calculations.  Sometimes this isn't pretty, but sign and magnitude are
best for other routines.

   Some internal temporary variables are set up with `MPZ_TMP_INIT' and
these have `_mp_d' space obtained from `TMP_ALLOC' rather than the
memory allocation functions.  Care is taken to ensure that these are
big enough that no reallocation is necessary (since it would have
unpredictable consequences).


File: gmp.info, Node: Rational Internals, Next: Float Internals, Prev: Integer Internals, Up: Internals

Rational Internals
==================

   `mpq_t' variables represent rationals using an `mpz_t' numerator and
denominator (*note Integer Internals::).

   The canonical form adopted is denominator positive (and non-zero),
no common factors between numerator and denominator, and zero uniquely
represented as 0/1.

   It's believed that casting out common factors at each stage of a
calculation is best in general.  A GCD is an O(N^2) operation so it's
better to do a few small ones immediately than to delay and have to do
a big one later.  Knowing the numerator and denominator have no common
factors can be used for example in `mpq_mul' to make only two cross
GCDs necessary, not four.
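
   The two-cross-GCD multiply can be sketched in Python (illustrative
only; it assumes both inputs are already canonical):

```python
# Sketch of a canonical rational multiply needing only two GCDs: with
# gcd(n1,d1) == gcd(n2,d2) == 1, only the cross pairs can share
# factors, so removing them leaves the result already canonical.
from math import gcd

def rational_mul(n1, d1, n2, d2):
    g1 = gcd(n1, d2)               # first cross GCD
    g2 = gcd(n2, d1)               # second cross GCD
    return (n1 // g1) * (n2 // g2), (d1 // g2) * (d2 // g1)
```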

   This general approach to common factors is badly sub-optimal in the
presence of simple factorizations or little prospect for cancellation,
but GMP has no way to know when this will occur.  As per *Note
Efficiency::, that's left to applications.  The `mpq_t' framework might
still suit, with `mpq_numref' and `mpq_denref' for direct access to the
numerator and denominator, or of course `mpz_t' variables can be used
directly.

File: gmp.info, Node: Float Internals, Next: Raw Output Internals, Prev: Rational Internals, Up: Internals

Float Internals
===============

   Efficient calculation is the primary aim of GMP floats and the use
of whole limbs and simple rounding facilitates this.

   `mpf_t' floats have a variable precision mantissa and a single
machine word signed exponent.  The mantissa is represented using sign
and magnitude.

        most                   least
     significant            significant
        limb                    limb

                                  _mp_d
      |---- _mp_exp --->            |
       _____ _____ _____ _____ _____
      |_____|_____|_____|_____|_____|
                        . <------------ radix point

       <-------- _mp_size --------->

   The fields are as follows.

`_mp_size'
     The number of limbs currently in use, or the negative of that when
     representing a negative value.  Zero is represented by `_mp_size'
     and `_mp_exp' both set to zero, and in that case the `_mp_d' data
     is unused.  (In the future `_mp_exp' might be undefined when
     representing zero.)

`_mp_prec'
     The precision of the mantissa, in limbs.  In any calculation the
     aim is to produce `_mp_prec' limbs of result (the most significant
     being non-zero).

`_mp_d'
     A pointer to the array of limbs which is the absolute value of the
     mantissa.  These are stored "little endian" as per the `mpn'
     functions, so `_mp_d[0]' is the least significant limb and
     `_mp_d[ABS(_mp_size)-1]' the most significant.

     The most significant limb is always non-zero, but there are no
     other restrictions on its value, in particular the highest 1 bit
     can be anywhere within the limb.

     `_mp_prec+1' limbs are allocated to `_mp_d', the extra limb being
     for convenience (see below).  There are no reallocations during a
     calculation, only in a change of precision with `mpf_set_prec'.

`_mp_exp'
     The exponent, in limbs, determining the location of the implied
     radix point.  Zero means the radix point is just above the most
     significant limb.  Positive values mean a radix point offset
     towards the lower limbs and hence a value >= 1, as for example in
     the diagram above.  Negative exponents mean a radix point further
     above the highest limb.

     Naturally the exponent can be any value, it doesn't have to fall
     within the limbs as the diagram shows, it can be a long way above
     or a long way below.  Limbs other than those included in the
     `{_mp_d,_mp_size}' data are treated as zero.
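
   A toy Python model (not GMP code; 8-bit "limbs" keep the numbers
readable) shows how `{_mp_d,_mp_size}' and `_mp_exp' combine into a
value:

```python
# Toy model of the mpf representation: the value is the little-endian
# limb magnitude scaled so the radix point lies _mp_exp limbs below
# the top of the mantissa.  8-bit limbs are assumed for illustration.
LIMB_BITS = 8

def mpf_value(d, size, exp):
    limbs = d[:abs(size)]
    mag = sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))
    value = mag * 2.0 ** (LIMB_BITS * (exp - abs(size)))
    return -value if size < 0 else value
```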

   The following various points should be noted.

Low Zeros
     The least significant limbs `_mp_d[0]' etc can be zero, though
     such low zeros can always be ignored.  Routines likely to produce
     low zeros check and avoid them to save time in subsequent
     calculations, but for most routines they're quite unlikely and
     aren't checked.

Mantissa Size Range
     The `_mp_size' count of limbs in use can be less than `_mp_prec' if
     the value can be represented in less.  This means low precision
     values or small integers stored in a high precision `mpf_t' can
     still be operated on efficiently.

     `_mp_size' can also be greater than `_mp_prec'.  Firstly a value is
     allowed to use all of the `_mp_prec+1' limbs available at `_mp_d',
     and secondly when `mpf_set_prec_raw' lowers `_mp_prec' it leaves
     `_mp_size' unchanged and so the size can be arbitrarily bigger than
     `_mp_prec'.

Rounding
     All rounding is done on limb boundaries.  Calculating `_mp_prec'
     limbs with the high non-zero will ensure the application's
     requested minimum precision is obtained.

     The use of simple "trunc" rounding towards zero is efficient,
     since there's no need to examine extra limbs and increment or
     decrement.

Bit Shifts
     Since the exponent is in limbs, there are no bit shifts in basic
     operations like `mpf_add' and `mpf_mul'.  When differing exponents
     are encountered all that's needed is to adjust pointers to line up
     the relevant limbs.

     Of course `mpf_mul_2exp' and `mpf_div_2exp' will require bit
     shifts, but the choice is between an exponent in limbs which
     requires shifts there, or one in bits which requires them almost
     everywhere else.

Use of `_mp_prec+1' Limbs
     The extra limb on `_mp_d' (`_mp_prec+1' rather than just
     `_mp_prec') helps when an `mpf' routine might get a carry from its
     operation.  `mpf_add' for instance will do an `mpn_add' of
     `_mp_prec' limbs.  If there's no carry then that's the result, but
     if there is a carry then it's stored in the extra limb of space and
     `_mp_size' becomes `_mp_prec+1'.

     Whenever `_mp_prec+1' limbs are held in a variable, the low limb
     is not needed for the intended precision, only the `_mp_prec' high
     limbs.  But zeroing it out or moving the rest down is unnecessary.
     Subsequent routines reading the value will simply take the high
     limbs they need, and this will be `_mp_prec' if their target has
     that same precision.  This is no more than a pointer adjustment,
     and must be checked anyway since the destination precision can be
     different from the source's.

     Copy functions like `mpf_set' will retain a full `_mp_prec+1' limbs
     if available.  This ensures that a variable which has `_mp_size'
     equal to `_mp_prec+1' will get its full exact value copied.
     Strictly speaking this is unnecessary since only `_mp_prec' limbs
     are needed for the application's requested precision, but it's
     considered that an `mpf_set' from one variable into another of the
     same precision ought to produce an exact copy.

Application Precisions
     `__GMPF_BITS_TO_PREC' converts an application requested precision
     to an `_mp_prec'.  The value in bits is rounded up to a whole limb,
     then an extra limb is added, since the most significant limb of
     `_mp_d' is only guaranteed to be non-zero and therefore might
     contain only one bit.

     `__GMPF_PREC_TO_BITS' does the reverse conversion, and removes the
     extra limb from `_mp_prec' before converting to bits.  The net
     effect of reading back with `mpf_get_prec' is simply the precision
     rounded up to a multiple of `mp_bits_per_limb'.

     Note that the extra limb added here for the high limb possibly
     holding only one bit is in addition to the extra limb allocated to
     `_mp_d'.  For example with a 32-bit limb, an application request
     for 250 bits will be rounded up to 8 limbs, then an extra added
     for the high being only non-zero, giving an `_mp_prec' of 9.
     `_mp_d' then gets 10 limbs allocated.  Reading back with
     `mpf_get_prec' will subtract 1 limb from `_mp_prec' and multiply
     by 32, giving 256 bits.

     Strictly speaking, the fact the high limb has at least one bit
     means that a float with, say, 3 limbs of 32-bits each will be
     holding at least 65 bits, but for the purposes of `mpf_t' it's
     considered simply to be 64 bits, a nice multiple of the limb size.
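
   The precision arithmetic above can be sketched directly in Python
(32-bit limbs assumed; this follows the description in the text, not
the actual macro definitions):

```python
# Sketch of the precision conversions for a 32-bit limb: round the
# requested bits up to whole limbs, then add one limb because the
# high limb might hold only a single bit.
BITS_PER_LIMB = 32

def bits_to_prec(bits):
    return (bits + BITS_PER_LIMB - 1) // BITS_PER_LIMB + 1

def prec_to_bits(prec):
    return (prec - 1) * BITS_PER_LIMB
```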

File: gmp.info, Node: Raw Output Internals, Next: C++ Interface Internals, Prev: Float Internals, Up: Internals

Raw Output Internals
====================

   `mpz_out_raw' uses the following format.

     +------+------------------------+
     | size |       data bytes       |
     +------+------------------------+

   The size is 4 bytes written most significant byte first, being the
number of subsequent data bytes, or the twos complement negative of
that when a negative integer is represented.  The data bytes are the
absolute value of the integer, written most significant byte first.
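
   The format is simple enough to reproduce in a few lines of Python
(a sketch of the byte layout just described, not a drop-in replacement
for `mpz_out_raw'):

```python
# Sketch of the mpz_out_raw byte format: a 4-byte big-endian count of
# data bytes (negated, twos complement, for a negative integer), then
# the magnitude most significant byte first with no leading zeros.
import struct

def out_raw(n):
    mag = abs(n)
    data = mag.to_bytes((mag.bit_length() + 7) // 8, "big")
    size = len(data) if n >= 0 else -len(data)
    return struct.pack(">i", size) + data
```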

   The most significant data byte is always non-zero, so the output is
the same on all systems, irrespective of limb size.

   In GMP 1, leading zero bytes were written to pad the data bytes to a
multiple of the limb size.  `mpz_inp_raw' will still accept this, for
compatibility.

   The use of "big endian" for both the size and data fields is
deliberate; it makes the data easy to read in a hex dump of a file.
Unfortunately it also means that the limb data must be reversed when
reading or writing, so neither a big endian nor little endian system
can just read and write `_mp_d'.

File: gmp.info, Node: C++ Interface Internals, Prev: Raw Output Internals, Up: Internals

C++ Interface Internals
=======================

   A system of expression templates is used to ensure something like
`a=b+c' turns into a simple call to `mpz_add' etc.  For `mpf_class' and
`mpfr_class' the scheme also ensures the precision of the final
destination is used for any temporaries within a statement like
`f=w*x+y*z'.  These are important features which a naive implementation
cannot provide.

   A simplified description of the scheme follows.  The true scheme is
complicated by the fact that expressions have different return types.
For detailed information, refer to the source code.

   To perform an operation, say, addition, we first define a "function
object" evaluating it,

     struct __gmp_binary_plus
     {
       static void eval(mpf_t f, mpf_t g, mpf_t h) { mpf_add(f, g, h); }
     };

   And an "additive expression" object,

     __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
     operator+(const mpf_class &f, const mpf_class &g)
     {
       return __gmp_expr
         <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
     }

   The seemingly redundant `__gmp_expr<__gmp_binary_expr<...>>' is used
to encapsulate any possible kind of expression into a single template
type.  In fact even `mpf_class' etc are `typedef' specializations of
`__gmp_expr'.

   Next we define assignment of `__gmp_expr' to `mpf_class'.

     template <class T>
     mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
     {
       expr.eval(this->get_mpf_t(), this->precision());
       return *this;
     }

     template <class Op>
     void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
     (mpf_t f, unsigned long int precision)
     {
       Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
     }

   where `expr.val1' and `expr.val2' are references to the expression's
operands (here `expr' is the `__gmp_binary_expr' stored within the
`__gmp_expr').

   This way, the expression is actually evaluated only at the time of
assignment, when the required precision (that of `f') is known.
Furthermore the target `mpf_t' is now available, thus we can call
`mpf_add' directly with `f' as the output argument.

   Compound expressions are handled by defining operators taking
subexpressions as their arguments, like this:

     template <class T, class U>
     __gmp_expr
     <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
     operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
     {
       return __gmp_expr
         <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
         (expr1, expr2);
     }

   And the corresponding specializations of `__gmp_expr::eval':

     template <class T, class U, class Op>
     void __gmp_expr
     <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
     (mpf_t f, unsigned long int precision)
     {
       // declare two temporaries
       mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
       Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
     }

   The expression is thus recursively evaluated to any level of
complexity and all subexpressions are evaluated to the precision of `f'.
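
   For comparison, the same deferred-evaluation idea can be mimicked in
Python (a loose analogue, not part of GMP; `decimal' contexts stand in
for mpf precisions): operands combine into lazy expression objects, and
evaluation happens only at assignment, when the destination's precision
is known.

```python
# A loose Python analogue of the expression-template scheme: operator+
# builds a lazy Expr, and eval runs only at assignment time, in the
# destination's context (the stand-in for the target mpf precision).
from decimal import Context, Decimal

class Expr:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def __add__(self, other):
        return Expr(self, other)          # nests to any depth
    def eval(self, ctx):
        return ctx.add(_eval(self.left, ctx), _eval(self.right, ctx))

def _eval(operand, ctx):
    return operand.eval(ctx) if isinstance(operand, Expr) else ctx.plus(operand)

class MpfLike:
    def __init__(self, precision):
        self.ctx = Context(prec=precision)
        self.value = Decimal(0)
    def assign(self, expr):              # plays the role of operator=
        self.value = expr.eval(self.ctx)
        return self
```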

File: gmp.info, Node: Contributors, Next: References, Prev: Internals, Up: Top

Contributors
************

   Torbjorn Granlund wrote the original GMP library and is still
developing and maintaining it.  Several other individuals and
organizations have contributed to GMP in various ways.  Here is a list
in chronological order:

   Gunnar Sjoedin and Hans Riesel helped with mathematical problems in
early versions of the library.

   Richard Stallman contributed to the interface design and revised the
first version of this manual.

   Brian Beuning and Doug Lea helped with testing of early versions of
the library and made creative suggestions.

   John Amanatides of York University in Canada contributed the function
`mpz_probab_prime_p'.

   Paul Zimmermann of Inria sparked the development of GMP 2, with his
comparisons between bignum packages.

   Ken Weber (Kent State University, Universidade Federal do Rio Grande
do Sul) contributed `mpz_gcd', `mpz_divexact', `mpn_gcd', and
`mpn_bdivmod', partially supported by CNPq (Brazil) grant 301314194-2.

   Per Bothner of Cygnus Support helped to set up GMP to use Cygnus'
configure.  He has also made valuable suggestions and tested numerous
intermediary releases.

   Joachim Hollman was involved in the design of the `mpf' interface,
and in the `mpz' design revisions for version 2.

   Bennet Yee contributed the initial versions of `mpz_jacobi' and
`mpz_legendre'.

   Andreas Schwab contributed the files `mpn/m68k/lshift.S' and
`mpn/m68k/rshift.S' (now in `.asm' form).

   The development of the floating point functions of GNU MP 2 was
supported in part by the ESPRIT-BRA (Basic Research Activities) 6846
project POSSO (POlynomial System SOlving).

   GNU MP 2 was finished and released by SWOX AB, SWEDEN, in
cooperation with the IDA Center for Computing Sciences, USA.

   Robert Harley of Inria, France and David Seal of ARM, England,
suggested clever improvements for population count.

   Robert Harley also wrote highly optimized Karatsuba and 3-way Toom
multiplication functions for GMP 3.  He also contributed the ARM
assembly code.

   Torsten Ekedahl of the Mathematical department of Stockholm
University provided significant inspiration during several phases of
the GMP development.  His mathematical expertise helped improve several
algorithms.

   Paul Zimmermann wrote the Divide and Conquer division code, the REDC
code, the REDC-based mpz_powm code, the FFT multiply code, and the
Karatsuba square root.  The ECMNET project Paul is organizing was a
driving force behind many of the optimizations in GMP 3.

   Linus Nordberg wrote the new configure system based on autoconf and
implemented the new random functions.

   Kent Boortz made the Macintosh port.

   Kevin Ryde worked on a number of things: optimized x86 code, m4 asm
macros, parameter tuning, speed measuring, the configure system,
function inlining, divisibility tests, bit scanning, Jacobi symbols,
Fibonacci and Lucas number functions, printf and scanf functions, perl
interface, demo expression parser, the algorithms chapter in the
manual, `gmpasm-mode.el', and various miscellaneous improvements
elsewhere.

   Steve Root helped write the optimized alpha 21264 assembly code.

   Gerardo Ballabio wrote the `gmpxx.h' C++ class interface and the C++
`istream' input routines.

   GNU MP 4.0 was finished and released by Torbjorn Granlund and Kevin
Ryde.  Torbjorn's work was partially funded by the IDA Center for
Computing Sciences, USA.

   (This list is chronological, not ordered by significance.  If you
have contributed to GMP but are not listed above, please tell
<tege@swox.com> about the omission!)

   Thanks go to Hans Thorsen for donating an SGI system for the GMP
test system environment.
730:
731:
732: File: gmp.info, Node: References, Next: GNU Free Documentation License, Prev: Contributors, Up: Top
733:
734: References
735: **********
736:
737: Books
738: =====
739:
   * Jonathan M. Borwein and Peter B. Borwein, "Pi and the AGM: A Study
     in Analytic Number Theory and Computational Complexity", John
     Wiley & Sons, 1998.

   * Henri Cohen, "A Course in Computational Algebraic Number Theory",
     Graduate Texts in Mathematics number 138, Springer-Verlag, 1993.
     `http://www.math.u-bordeaux.fr/~cohen'

   * Donald E. Knuth, "The Art of Computer Programming", volume 2,
     "Seminumerical Algorithms", 3rd edition, Addison-Wesley, 1998.
     `http://www-cs-faculty.stanford.edu/~knuth/taocp.html'

   * John D. Lipson, "Elements of Algebra and Algebraic Computing", The
     Benjamin Cummings Publishing Company Inc, 1981.

   * Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone,
     "Handbook of Applied Cryptography",
     `http://www.cacr.math.uwaterloo.ca/hac/'

   * Richard M. Stallman, "Using and Porting GCC", Free Software
     Foundation, 1999, available online
     `http://www.gnu.org/software/gcc/onlinedocs/', and in the GCC
     package `ftp://ftp.gnu.org/gnu/gcc/'

Papers
======

   * Christoph Burnikel and Joachim Ziegler, "Fast Recursive Division",
     Max-Planck-Institut fuer Informatik Research Report
     MPI-I-98-1-022,
     `http://data.mpi-sb.mpg.de/internet/reports.nsf/NumberView/1998-1-022'

   * Torbjorn Granlund and Peter L. Montgomery, "Division by Invariant
     Integers using Multiplication", in Proceedings of the SIGPLAN
     PLDI'94 Conference, June 1994.  Also available
     `ftp://ftp.cwi.nl/pub/pmontgom/divcnst.psa4.gz' (and .psl.gz).

   * Peter L. Montgomery, "Modular Multiplication Without Trial
     Division", in Mathematics of Computation, volume 44, number 170,
     April 1985.

   * Tudor Jebelean, "An algorithm for exact division", Journal of
     Symbolic Computation, volume 15, 1993, pp. 169-180.  Research
     report version available
     `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz'

   * Tudor Jebelean, "Exact Division with Karatsuba Complexity -
     Extended Abstract", RISC-Linz technical report 96-31,
     `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz'

   * Tudor Jebelean, "Practical Integer Division with Karatsuba
     Complexity", ISSAC 97, pp. 339-341.  Technical report available
     `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz'

   * Tudor Jebelean, "A Generalization of the Binary GCD Algorithm",
     ISSAC 93, pp. 111-116.  Technical report version available
     `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz'

   * Tudor Jebelean, "A Double-Digit Lehmer-Euclid Algorithm for
     Finding the GCD of Long Integers", Journal of Symbolic
     Computation, volume 19, 1995, pp. 145-157.  Technical report
     version also available
     `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz'

   * Werner Krandick and Tudor Jebelean, "Bidirectional Exact Integer
     Division", Journal of Symbolic Computation, volume 21, 1996,
     pp. 441-455.  Early technical report version also available
     `ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz'

   * R. Moenck and A. Borodin, "Fast Modular Transforms via Division",
     Proceedings of the 13th Annual IEEE Symposium on Switching and
     Automata Theory, October 1972, pp. 90-96.  Reprinted as "Fast
     Modular Transforms", Journal of Computer and System Sciences,
     volume 8, number 3, June 1974, pp. 366-386.

   * Arnold Schönhage and Volker Strassen, "Schnelle Multiplikation
     großer Zahlen", Computing 7, 1971, pp. 281-292.

   * Kenneth Weber, "The accelerated integer GCD algorithm", ACM
     Transactions on Mathematical Software, volume 21, number 1, March
     1995, pp. 111-122.

   * Paul Zimmermann, "Karatsuba Square Root", INRIA Research Report
     3805, November 1999, `http://www.inria.fr/RRRT/RR-3805.html'

   * Paul Zimmermann, "A Proof of GMP Fast Division and Square Root
     Implementations",
     `http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz'

   * Dan Zuras, "On Squaring and Multiplying Large Integers", ARITH-11:
     IEEE Symposium on Computer Arithmetic, 1993, pp. 260-271.
     Reprinted as "More on Multiplying and Squaring Large Integers",
     IEEE Transactions on Computers, volume 43, number 8, August 1994,
     pp. 899-908.
