
Annotation of OpenXM_contrib/gmp/doc/assembly_code, Revision 1.1

Most mpn subdirectories contain machine-dependent code, written in
assembly or C.  The `generic' subdirectory contains default code, used
when there is no machine-dependent replacement for a particular
machine.

There is one subdirectory for each ISA family.  Note that, for example,
32-bit SPARC and 64-bit SPARC are very different ISAs and thus cannot
share any code.

A particular build uses code from only one ISA subdirectory, plus the
`generic' subdirectory.  The ISA-specific subdirectories contain
hierarchies of directories for various architecture variants and
implementations; the top-most level contains code that runs correctly
on all variants.

HOW TO WRITE FAST ASSEMBLY CODE FOR GMP

[This should ultimately be made into a chapter of the GMP manual.]

The most basic techniques are software pipelining and loop unrolling.

Software pipelining is the technique of scheduling instructions around
the branch point in a loop, so that consecutive iterations overlap.
It is very much like juggling.  Typically, the loads for the next
iteration are issued while the current iteration is still computing
and storing its result; a prologue before the loop starts this overlap
and an epilogue after it finishes the last iteration.
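A minimal C sketch of that shape follows.  It assumes a hypothetical
64-bit limb type limb_t and a hypothetical function add_n_pipelined
(this is not GMP's mpn_add_n).  Each loop trip loads the operands of
the next limb before finishing the add and store of the current one.
A C compiler reschedules instructions on its own, so this only shows
the structure (prologue, overlapped body, epilogue) one would write by
hand in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* hypothetical stand-in for a 64-bit limb type */

/* Hypothetical add_n_pipelined: {rp,n} = {up,n} + {vp,n}, returns the
   carry-out.  Written in software-pipelined form: each loop trip loads
   the operands of the NEXT limb before finishing the CURRENT one. */
limb_t add_n_pipelined(limb_t *rp, const limb_t *up, const limb_t *vp,
                       size_t n)
{
    assert(n >= 1);
    limb_t cy = 0;

    /* Prologue: start the first iteration by loading its operands. */
    limb_t u = up[0], v = vp[0];

    for (size_t i = 0; i + 1 < n; i++) {
        /* Stage 1 of iteration i+1: issue its loads early. */
        limb_t u_next = up[i + 1], v_next = vp[i + 1];

        /* Stage 2 of iteration i: add with carry, then store. */
        limb_t s = u + v;
        limb_t c1 = s < u;      /* carry out of u + v */
        limb_t r = s + cy;
        limb_t c2 = r < s;      /* carry out of adding cy */
        rp[i] = r;
        cy = c1 | c2;           /* c1 and c2 are never both 1 */

        u = u_next;
        v = v_next;
    }

    /* Epilogue: drain the pipeline by finishing the last iteration. */
    limb_t s = u + v;
    limb_t c1 = s < u;
    limb_t r = s + cy;
    limb_t c2 = r < s;
    rp[n - 1] = r;
    return c1 | c2;
}
```

Storing rp[i] never overwrites up[i + 1] or vp[i + 1], so the early
loads stay correct even when rp is the same array as up or vp.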

Unrolling is useful when software pipelining does not get us close
enough to the peak performance of a processor's pipeline.  Unrolling
reduces the loop overhead (fewer counter updates and branches per
limb), and it also often allows a more even load on a processor's
functional units.
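Under the same assumptions (hypothetical limb_t and function names,
plain C rather than assembly), here is a sketch of the same addition
unrolled four times.  The counter update and loop branch are paid once
per four limbs, and a short tail loop handles the n % 4 leftover limbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* hypothetical stand-in for a 64-bit limb type */

/* Add one limb pair plus a carry-in of 0 or 1; the carry-out replaces *cy. */
static inline limb_t add_limb(limb_t u, limb_t v, limb_t *cy)
{
    assert(*cy <= 1);
    limb_t s = u + v;
    limb_t c1 = s < u;          /* carry out of u + v */
    limb_t r = s + *cy;
    *cy = c1 | (r < s);         /* plus carry out of adding *cy */
    return r;
}

/* Hypothetical add_n_unrolled4: {rp,n} = {up,n} + {vp,n}, returns the
   carry-out.  The main loop handles four limbs per trip, so the counter
   update and branch are paid once per four limbs. */
limb_t add_n_unrolled4(limb_t *rp, const limb_t *up, const limb_t *vp,
                       size_t n)
{
    limb_t cy = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        rp[i]     = add_limb(up[i],     vp[i],     &cy);
        rp[i + 1] = add_limb(up[i + 1], vp[i + 1], &cy);
        rp[i + 2] = add_limb(up[i + 2], vp[i + 2], &cy);
        rp[i + 3] = add_limb(up[i + 3], vp[i + 3], &cy);
    }
    /* Tail: the remaining n % 4 limbs, one per trip. */
    for (; i < n; i++)
        rp[i] = add_limb(up[i], vp[i], &cy);
    return cy;
}
```

Unrolling does not shorten the carry chain: each add_limb call still
waits for the carry produced by the previous one.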

For processors with very few registers, software pipelining is not
feasible, as it increases register pressure.

For superscalar machines, it is often the case that not all available
execution resources are used.  Scheduling some instructions onto these
otherwise unused resources will never cost us anything.

Try to determine which alternative instructions can be used for a
particular processor.  For GMP, the problem that presents the most
challenges is propagating the carry from one iteration to the next.
Explore the different possibilities for doing that with the available
instructions!
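As an illustration, here are two ways to express the per-limb carry in
C, sketched with hypothetical 32-bit helpers.  add_limb_cmp recovers
each carry with an unsigned compare, as one might on an ISA without a
carry flag.  add_limb_wide adds in a 64-bit type and takes the carry
from the high half, which a compiler may lower to add-with-carry on
ISAs that have it.  Which form maps to the fewest dependent
instructions depends on the target and the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scheme A: recover each carry with an unsigned compare,
   as on an ISA without a carry flag. */
uint32_t add_limb_cmp(uint32_t u, uint32_t v, uint32_t *cy)
{
    assert(*cy <= 1);           /* carry-in must be 0 or 1 */
    uint32_t s = u + v;
    uint32_t c1 = s < u;        /* carry out of u + v */
    uint32_t r = s + *cy;
    uint32_t c2 = r < s;        /* carry out of adding the carry-in */
    *cy = c1 | c2;              /* at most one of c1, c2 is set */
    return r;
}

/* Hypothetical scheme B: add in a double-width type and take the carry
   from the high half.  A compiler may lower this to add-with-carry on
   ISAs that have a carry flag. */
uint32_t add_limb_wide(uint32_t u, uint32_t v, uint32_t *cy)
{
    assert(*cy <= 1);           /* carry-in must be 0 or 1 */
    uint64_t t = (uint64_t)u + v + *cy;
    *cy = (uint32_t)(t >> 32);  /* 0 or 1 */
    return (uint32_t)t;         /* low 32 bits */
}
```

Both helpers produce the same sum and carry for every input.  As
written in the source, scheme A puts three operations between the
incoming and outgoing carry (add, compare, or) and scheme B puts two
(add, shift); the instructions actually emitted may differ.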

For wide superscalar processors, the performance might be completely
determined by the number of dependent instructions required between
accepting the carry-in from the previous iteration and producing the
carry-out for the next iteration.  This is particularly true for simple
operations like mpn_add_n and mpn_sub_n.  Some carry propagation
schemes require 4 dependent instructions, translating to at least four
cycles per iteration.  Other schemes can propagate carry in two cycles
or even just one cycle.
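One way to shorten that chain is the generate/propagate formulation
known from carry-lookahead adders.  In this sketch (hypothetical names,
a hypothetical 64-bit limb_t, not taken from GMP), g and p depend only
on the input limbs, so they can be computed ahead of the carry.  The
only operation that turns the incoming carry into the outgoing one is a
select, which may compile to a single conditional move on ISAs that
have one; that is not guaranteed.  The result limb s + cy also uses the
carry, but nothing in the next iteration waits for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* hypothetical stand-in for a 64-bit limb type */

/* Hypothetical add_n_genprop: {rp,n} = {up,n} + {vp,n} + cy_in,
   returns the carry-out (0 or 1). */
limb_t add_n_genprop(limb_t *rp, const limb_t *up, const limb_t *vp,
                     size_t n, limb_t cy_in)
{
    assert(cy_in <= 1);  /* carry-in must be 0 or 1 */
    limb_t cy = cy_in;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + vp[i];
        /* generate: u + v overflows on its own (independent of cy) */
        limb_t g = s < up[i];
        /* propagate: u + v is all ones, so an incoming carry passes
           through (independent of cy); g and p are never both 1 */
        limb_t p = (s == ~(limb_t)0);
        /* The result limb uses cy, but the next iteration does not wait on it. */
        rp[i] = s + cy;
        /* The only step from carry-in to carry-out: a single select. */
        cy = p ? cy : g;
    }
    return cy;
}
```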

Therefore, for wide superscalar processors, finding methods with
"shallow" carry propagation for a given instruction set is often the
central problem we need to address.  The rest is just hard coding
work.

[Describe: First find issue maps with desired performance
          Then schedule for latency]
