% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.5 2000/01/15 00:20:45 takayama Exp $

\section{Applications}
\subsection{Distributed computation with homogeneous servers}
\label{section:homog}

One of the aims of OpenXM is parallel speedup by distributed computation
with homogeneous servers. As the current specification of OpenXM does
not include communication between servers, one cannot expect
the maximal parallel speedup. However, it is possible to execute
several types of distributed computation, as the following examples show.

\subsubsection{Product of univariate polynomials}

Shoup \cite{Shoup} showed that the product of univariate polynomials
with large degrees and large coefficients can be computed efficiently
by FFT over small finite fields and the Chinese remainder theorem.
This method can be parallelized easily:

\begin{tabbing}
Input :\= $f_1, f_2 \in Z[x]$\\
\> such that $\deg(f_1), \deg(f_2) < 2^M$\\
Output : $f = f_1f_2$\\
$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is a prime, \\
\> $2^{M+1}|m_i-1$ and $m=\prod m_i$ is sufficiently large. \\
Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
\> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
\> ($f_1, f_2$ are regarded as integral.\\
\> The product is computed by FFT.)\\
return $\phi_m(\sum F_j)$\\
(For $a \in Z$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
\end{tabbing}
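
The structure of the algorithm can be illustrated by the following small
Python sketch. It is given only for exposition and is not part of the
{\tt Risa/Asir} implementation: the FFT-based multiplication modulo each
$m_i$ is replaced by naive multiplication, and all names are our own.
\begin{verbatim}
# Illustrative sketch (not the Risa/Asir code): recover f1*f2 over Z
# from images modulo pairwise coprime moduli M_j.  The real algorithm
# computes each image by FFT on a separate server.

def polymul_mod(f1, f2, mod):
    # product of coefficient lists f1, f2, coefficients reduced mod `mod`
    res = [0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            res[i + j] = (res[i + j] + a * b) % mod
    return res

def phi(a, m):
    # phi_m(a) lies in (-m/2, m/2) and phi_m(a) = a mod m
    r = a % m
    return r - m if r > m // 2 else r

def crt_product(f1, f2, subsets):
    # subsets = [P_1, ..., P_L]; server j would receive P_j and return
    # F_j with F_j = f1*f2 mod M_j and F_j = 0 mod m/M_j
    m = 1
    for P_j in subsets:
        for p in P_j:
            m *= p
    total = [0] * (len(f1) + len(f2) - 1)
    for P_j in subsets:                      # done in parallel on servers
        M_j = 1
        for p in P_j:
            M_j *= p
        img = polymul_mod(f1, f2, M_j)
        # c = 1 mod M_j and c = 0 mod m/M_j
        c = (m // M_j) * pow(m // M_j, -1, M_j)
        for k, v in enumerate(img):
            total[k] += v * c
    return [phi(a, m) for a in total]

# toy example: (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(crt_product([1, 2], [3, 4], [[101, 103], [107]]))
\end{verbatim}
In the actual setting each $F_j$ is computed on a separate server, so only
the final summation and the application of $\phi_m$ remain on the client.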

Figure \ref{speedup}
shows the speedup factor achieved by the above distributed computation
on {\tt Risa/Asir}. For each $n$, two polynomials of degree $n$
with 3000-bit coefficients are generated and their product is computed.
The machine is a Fujitsu AP3000, a cluster of Sun workstations connected
by a high speed network, and MPI over this network is used to implement
OpenXM.
\begin{figure}[htbp]
\epsfxsize=8.5cm
\epsffile{speedup.ps}
\caption{Speedup factor}
\label{speedup}
\end{figure}

The tasks of the client are the generation and partition of $P$, the
sending and receiving of polynomials, and the synthesis of the result.
If the number of servers is $L$ and the inputs are fixed, then the cost to
compute the $F_j$ in parallel is $O(1/L)$, whereas the cost
to send and receive polynomials is $O(L)$,
because we do not have broadcast and reduce
operations. Therefore the speedup is limited, and the upper bound of
the speedup factor depends on the ratio of
the computational cost to the communication cost.
Figure \ref{speedup} shows that
the speedup is satisfactory if the degree is large and $L$
is not large, say up to 10, under the above environment.
If OpenXM provides broadcast and reduce operations, the cost of
sending $f_1$, $f_2$ and gathering the $F_j$ may be reduced to $O(\log_2 L)$,
and we will obtain better results in such a case.
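
As a rough illustration of this limitation (a back-of-the-envelope model,
not part of the measurements above), write $T_p$ for the time of the
sequential product and $T_c$ for the per-server communication cost; then
the elapsed time with $L$ servers behaves like
\[
T(L) \approx \frac{T_p}{L} + T_c\,L ,
\]
which is minimized around $L \approx \sqrt{T_p/T_c}$, where
$T(L) \approx 2\sqrt{T_p T_c}$. The attainable speedup is thus about
$\frac{1}{2}\sqrt{T_p/T_c}$, which explains why adding servers beyond a
moderate $L$ does not pay off.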

\subsubsection{Competitive distributed computation by various strategies}

Singular \cite{Singular} implements the {\tt MP} interface for distributed
computation, and a competitive Gr\"obner basis computation is
illustrated as an example of distributed computation.
Such a distributed computation is also possible on OpenXM.
The following {\tt Risa/Asir} function computes a Gr\"obner basis by
starting the computations simultaneously from the homogenized input and
the input itself. The client watches the streams by {\tt ox\_select()},
and the result that is returned first is taken. Then the remaining
server is reset.

\begin{verbatim}
/* G:set of polys; V:list of variables */
/* O:type of order; P0,P1: id's of servers */
def dgr(G,V,O,P0,P1)
{
  P = [P0,P1];            /* server list */
  map(ox_reset,P);        /* reset servers */
  /* P0 executes non-homogenized computation */
  ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
  /* P1 executes homogenized computation */
  ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
  map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
  F = ox_select(P);       /* wait for data */
  /* F[0] is a server's id which is ready */
  R = ox_get(F[0]);
  if ( F[0] == P0 ) {
    Win = "nonhomo"; Lose = P1;
  } else {
    Win = "homo"; Lose = P0;
  }
  ox_reset(Lose);         /* reset the loser */
  return [Win,R];
}
\end{verbatim}
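
As a hypothetical usage sketch (the input ideal and the order type are
illustrative only, and the server identifiers are assumed to refer to two
{\tt ox\_asir} servers started beforehand), {\tt dgr()} could be called as
follows:
\begin{verbatim}
/* P0, P1 : ids of two ox_asir servers started beforehand */
Result = dgr([x^2+y^2-1, x*y-3], [x,y], 0, P0, P1);
/* Result[0] : the winning strategy ("homo" or "nonhomo") */
/* Result[1] : the Groebner basis returned by the winner  */
\end{verbatim}
Since one cannot predict in advance whether the homogenized or the original
input finishes first, running both and keeping the first answer gives the
better of the two strategies at the cost of one extra server.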