This directory contains mpn functions for various HP PA-RISC chips.  Code
that runs faster on the PA7100 and later implementations is in the pa7100
directory.

RELEVANT OPTIMIZATION ISSUES

Load and Store timing

On the PA7000, no memory instruction can issue in the two cycles after a
store.  For the PA7100, this is reduced to one cycle.

The PA7100 has a lockup-free cache, so it helps to schedule each load and
its dependent instruction as far apart as possible.
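
The load scheduling advice above can be illustrated in C.  This is only a
sketch of the idea, not code from this directory: the function name
sum_pipelined is hypothetical, and a C compiler is free to reorder these
statements, whereas hand-written assembly fixes the order.  The load for the
next element is issued before the add that consumes the previous one, so the
load latency overlaps with useful work.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: sum n words, issuing each load one
   iteration ahead of the add that depends on it. */
uint32_t sum_pipelined(const uint32_t *p, size_t n)
{
    if (n == 0)
        return 0;

    uint32_t acc = 0;
    uint32_t cur = p[0];          /* first load, issued ahead of its use */

    for (size_t i = 1; i < n; i++) {
        uint32_t next = p[i];     /* start the next load early */
        acc += cur;               /* consume the value loaded one iteration ago */
        cur = next;
    }
    return acc + cur;             /* drain the last loaded value */
}
```

The same pattern, with the distance between load and use stretched over
several iterations, is what software pipelining of the loops below amounts to.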

STATUS

1. mpn_mul_1 could be improved to 6.5 cycles/limb on the PA7100, using the
   instructions below (but some software pipelining is needed to avoid the
   xmpyu-fstds delay):

        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)

        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib   Loop

2. mpn_addmul_1 could be improved from the current 10 to 7.5 cycles/limb
   (asymptotically) on the PA7100, using the instructions below.  With proper
   software pipelining and the unrolling level shown below, the speed becomes
   8 cycles/limb.

        fldds   s1_ptr
        fldds   s1_ptr

        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)
        xmpyu
        fstds   N(%r30)

        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        ldws    N(%r30)
        addc
        addc
        addc
        addc
        addc    %r0,%r0,cy-limb

        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        ldws    res_ptr
        add
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr
        addc
        stws    res_ptr

        addib
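
For reference, the arithmetic that the two instruction sequences above carry
out can be written in portable C.  This is a minimal sketch, not the GMP
source: the names ref_mul_1, ref_addmul_1 and limb_t are hypothetical, and a
32-bit limb is assumed, matching the 32x32->64-bit unsigned product that
xmpyu delivers.  The 64-bit intermediate plays the role of the product stored
with fstds and reloaded as two words with ldws.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one limb is 32 bits wide. */
typedef uint32_t limb_t;

/* res[0..n-1] = s1[0..n-1] * s2; returns the carry-out limb. */
limb_t ref_mul_1(limb_t *res, const limb_t *s1, size_t n, limb_t s2)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + (2^32-1) < 2^64, so this cannot overflow. */
        uint64_t p = (uint64_t)s1[i] * s2 + cy;
        res[i] = (limb_t)p;           /* low word goes to the result */
        cy = (limb_t)(p >> 32);       /* high word carries into the next limb */
    }
    return cy;
}

/* res[0..n-1] += s1[0..n-1] * s2; returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *res, const limb_t *s1, size_t n, limb_t s2)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so this still fits in 64 bits. */
        uint64_t p = (uint64_t)s1[i] * s2 + res[i] + cy;
        res[i] = (limb_t)p;
        cy = (limb_t)(p >> 32);
    }
    return cy;
}
```

The extra load of res_ptr and the second carry chain in sequence 2 correspond
to the "+ res[i]" term in ref_addmul_1, which is why that loop costs more
cycles per limb than the plain multiply.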