
Annotation of OpenXM_contrib/gmp/mpn/x86/p6/README, Revision 1.1

                      INTEL P6 MPN SUBROUTINES



This directory contains code optimized for Intel P6 class CPUs, meaning
PentiumPro, Pentium II and Pentium III.  The mmx and p3mmx subdirectories
have routines using MMX instructions.



STATUS

Times for the loops, with all code and data in L1 cache, are as follows.
Some of these could probably be improved.

                               cycles/limb

       mpn_add_n/sub_n           3.7

       mpn_copyi                 0.75
       mpn_copyd                 2.4

       mpn_divrem_1             39.0
       mpn_mod_1                39.0
       mpn_divexact_by3          8.5

       mpn_mul_1                 5.5
       mpn_addmul/submul_1       6.35

       mpn_l/rshift              2.5

       mpn_mul_basecase          8.2 cycles/crossproduct (approx)
       mpn_sqr_basecase          4.0 cycles/crossproduct (approx)
                                 or 7.75 cycles/triangleproduct (approx)

Pentium II and III have MMX and get the following improvements.

       mpn_divrem_1             25.0 integer part, 17.5 fractional part
       mpn_mod_1                24.0

       mpn_l/rshift              1.75
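
The figures above can be turned into rough cost estimates.  The following
is a minimal sketch in C, not part of GMP: the macro and function names are
hypothetical, the rates are copied from the plain P6 table, and call
overhead and loop setup are ignored since the table only covers the loops
themselves.

```c
#include <stddef.h>

/* Rates copied from the STATUS table (plain P6, no MMX). */
#define P6_ADD_N_CYCLES_PER_LIMB                  3.7
#define P6_MUL_1_CYCLES_PER_LIMB                  5.5
#define P6_ADDMUL_1_CYCLES_PER_LIMB               6.35
#define P6_MUL_BASECASE_CYCLES_PER_CROSSPRODUCT   8.2

/* Hypothetical helper: loop cost of an n limb operation at a given
   cycles/limb rate.  Assumes everything is in L1 cache, as the table does,
   and leaves out call overhead and loop setup. */
double est_loop_cycles(double cycles_per_limb, size_t n)
{
    return cycles_per_limb * (double) n;
}

/* Hypothetical helper: an un by vn limb mpn_mul_basecase forms un*vn
   crossproducts, each costed at the approximate table rate. */
double est_mul_basecase_cycles(size_t un, size_t vn)
{
    return P6_MUL_BASECASE_CYCLES_PER_CROSSPRODUCT
           * (double) un * (double) vn;
}
```

By this estimate a 100 limb mpn_add_n costs roughly 370 cycles, and a 10 by
10 limb mpn_mul_basecase roughly 820 cycles.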




NOTES

The L1 data cache is write-allocate, so prefetching of destinations is
unnecessary.

Mispredicted branches have a penalty of between 9 and 15 cycles, and even up
to 26 cycles depending on how far speculative execution has gone.  The 9
cycle minimum penalty comes from the issue pipeline being 9 stages.

A copy with rep movs seems to copy 16 bytes at a time, since speeds for 4,
5, 6 or 7 limb operations are all the same.  The 0.75 cycles/limb would be 3
cycles per 16 byte block.
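
The block arithmetic in the rep movs note can be written out as follows.
This is only a sketch of the inferred behaviour, not a measured model: the
function and macro names are hypothetical, limbs are assumed to be 4 bytes
(as on 32-bit x86), and startup overhead and leftover limbs beyond the last
whole block are not modelled.

```c
#include <stddef.h>

#define BYTES_PER_LIMB     4   /* assumption: 32-bit limbs */
#define BLOCK_BYTES       16   /* block size inferred in the note above */
#define CYCLES_PER_BLOCK   3   /* 0.75 cycles/limb * 4 limbs per block */

/* Hypothetical model: cycles spent on whole 16 byte blocks when copying
   n_limbs limbs.  Limbs that do not fill a whole block are treated as
   free, which matches 4, 5, 6 and 7 limbs all taking the same time. */
unsigned long rep_movs_block_cycles(size_t n_limbs)
{
    size_t whole_blocks = (n_limbs * BYTES_PER_LIMB) / BLOCK_BYTES;
    return (unsigned long) whole_blocks * CYCLES_PER_BLOCK;
}
```

Under these assumptions 4, 5, 6 and 7 limbs all come to 3 cycles, and 8
limbs to 6 cycles.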




CODING

Instructions in general code are shown grouped when they can execute
together: up to three instructions with no successive dependencies, of
which only the first may be a multiple micro-op instruction.

P6 has out-of-order execution, so the groupings really only show dependent
paths where some reordering might allow latencies to be hidden.
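
The idea of dependent paths can be illustrated in C, although the code in
this directory is assembly and a C compiler is free to reorder or
transform either loop.  The function names below are hypothetical, and
whether the second form is actually faster depends on the latencies of the
instructions involved; it is shown only to make the two kinds of dependency
chain visible.

```c
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each add needs the result of the previous one. */
uint32_t sum_one_chain(const uint32_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Two independent chains: s0 and s1 never depend on each other inside the
   loop, so an out-of-order core has the option of overlapping them. */
uint32_t sum_two_chains(const uint32_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += p[i];
        s1 += p[i + 1];
    }
    if (i < n)          /* odd element left over */
        s0 += p[i];
    return s0 + s1;
}
```

Both functions return the same value for any input, since unsigned addition
wraps and is associative.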




REFERENCES

"Intel Architecture Optimization Reference Manual", 1999, revision 001 dated
02/99, order number 245127 (order number 730795-001 is in the document too).
Available on-line:

       http://download.intel.com/design/PentiumII/manuals/245127.htm

"Intel Architecture Optimization Manual", 1997, order number 242816.  This
is an older document mostly about P5 and not as good as the above.
Available on-line:

       http://download.intel.com/design/PentiumII/manuals/242816.htm



----------------
Local variables:
mode: text
fill-column: 76
End:
