Annotation of OpenXM/src/kan96xx/gmp-2.0.2-ssh-2/mpn/alpha/README, Revision 1.1 (takayama)
This directory contains mpn functions optimized for DEC Alpha processors.

RELEVANT OPTIMIZATION ISSUES

EV4

1. This chip has very limited store bandwidth.  The on-chip L1 cache is
write-through, and a cache line is transferred from the store buffer to the
off-chip L2 in as many as 15 cycles on most systems.  This delay hurts
mpn_add_n, mpn_sub_n, mpn_lshift, and mpn_rshift.

2. Pairing is possible between memory instructions and integer arithmetic
instructions.

3. mulq and umulh are documented to have a latency of 23 cycles, but 2 of
these cycles are pipelined.  Thus, multiply instructions can be issued at a
rate of one every 21 cycles.

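As an illustration of why the multiply issue rate matters, the following C
sketch models what the two instructions compute: mulq yields the low 64 bits
of a 64x64-bit product and umulh the high 64 bits.  The names mulq_lo,
umulh_hi, and mul_1_sketch are hypothetical and are not the routines in this
directory.  The loop assumes that a limb-by-vector multiply needs both halves
for every limb, so under the issue rate quoted above the multiplier alone
would cost about 2 * 21 = 42 cycles per limb on EV4.

```c
#include <stddef.h>
#include <stdint.h>

/* Low 64 bits of the product, as mulq returns them. */
static uint64_t mulq_lo(uint64_t a, uint64_t b)
{
    return a * b;                 /* unsigned arithmetic wraps mod 2^64 */
}

/* High 64 bits of the unsigned 128-bit product, as umulh returns them.
   Built from 32-bit halves so that no 128-bit integer type is needed. */
static uint64_t umulh_hi(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

    uint64_t ll = a_lo * b_lo;
    uint64_t lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo;
    uint64_t hh = a_hi * b_hi;

    /* Everything that lands in bits 32..63; its overflow above bit 31
       is the carry into the high word. */
    uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);

    return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* rp[] = up[] * v over n limbs; returns the most significant limb.
   One mulq and one umulh are needed per limb (assumption stated above). */
static uint64_t mul_1_sketch(uint64_t *rp, const uint64_t *up,
                             size_t n, uint64_t v)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t lo = mulq_lo(up[i], v);
        uint64_t hi = umulh_hi(up[i], v);
        lo += cy;
        cy = hi + (lo < cy);      /* cannot overflow: hi <= 2^64 - 2 */
        rp[i] = lo;
    }
    return cy;
}
```

Calling mul_1_sketch(rp, up, n, v) stores the n low limbs of the product in
rp and returns the limb that carries out of the top.
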
EV5

1. The memory bandwidth of this chip seems excellent, both for loads and
stores.  Even when the working set is larger than the on-chip L1 and L2
caches, the performance remains almost unaffected.

2. mulq has a measured latency of 13 cycles and an issue rate of one every
8 cycles.  umulh has a measured latency of 15 cycles and an issue rate of
one every 10 cycles.  However, the exact timing is somewhat confusing.

3. mpn_add_n.  With 4-fold unrolling, we need 37 instructions, of which 12
are memory operations.  This will take at least
     ceil(37/2) [dual issue] + 1 [taken branch] = 20 cycles
We have 12 memory cycles, plus 4 after-store conflict cycles, or 16 data
cache cycles, which should be completely hidden in the 20 issue cycles.
The computation is inherently serial, with these dependencies:
     addq
     /   \
   addq  cmpult
     |     |
   cmpult  |
       \  /
        or
I.e., there is a 4 cycle path for each limb, making 16 cycles the absolute
minimum for the 4 limbs of one unrolled pass.  We could replace the `or'
with a cmoveq/cmovne, which would save a cycle on EV5, but that might waste
a cycle on EV4.  Also, cmov takes 2 cycles.
     addq
     /   \
   addq  cmpult
     |      \
   cmpult -> cmovne

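The two dependency graphs above can be written out in C as a rough sketch.
Each statement is labelled with the Alpha instruction it is meant to stand
for; the function names add_n_or and add_n_cmov are hypothetical, and a
compiler is free to schedule or select instructions differently from what
the comments suggest.  In the first variant the longest path per limb is
addq, addq, cmpult, or, which is the 4 cycle path mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* rp[] = up[] + vp[] over n limbs, combining the two carries with `or'.
   Returns the carry out (0 or 1). */
static uint64_t add_n_or(uint64_t *rp, const uint64_t *up,
                         const uint64_t *vp, size_t n)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = up[i] + vp[i];   /* addq:   limb sum            */
        uint64_t c1 = s < vp[i];       /* cmpult: carry from the sum  */
        uint64_t r  = s + cy;          /* addq:   add incoming carry  */
        uint64_t c2 = r < cy;          /* cmpult: carry from that add */
        cy = c1 | c2;                  /* or:     at most one is set  */
        rp[i] = r;
    }
    return cy;
}

/* Same result, with the `or' replaced by a conditional move. */
static uint64_t add_n_cmov(uint64_t *rp, const uint64_t *up,
                           const uint64_t *vp, size_t n)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = up[i] + vp[i];   /* addq                        */
        uint64_t c1 = s < vp[i];       /* cmpult                      */
        uint64_t r  = s + cy;          /* addq                        */
        cy = r < cy;                   /* cmpult                      */
        if (c1 != 0)                   /* cmovne: force the carry to  */
            cy = 1;                    /*         1 when c1 is set    */
        rp[i] = r;
    }
    return cy;
}
```

Both functions return the same carry for the same inputs; they differ only
in how the two partial carries are merged.
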
STATUS