OpenXM/doc/issac2000/homogeneous-network.tex - annotate

Return to homogeneous-network.tex CVS log
Up to [local] / OpenXM / doc / issac2000
Annotation of OpenXM/doc/issac2000/homogeneous-network.tex, Revision 1.9

1.9     ! noro        1: % $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.8 2000/01/16 03:15:49 noro Exp $
1.2       takayama    2:
1.3       noro        3: \subsection{Distributed computation with homogeneous servers}
1.6       takayama    4: \label{section:homog}
1.2       takayama    5:
1.5       takayama    6: One of the aims of OpenXM is a parallel speedup by a distributed computation
1.3       noro        7: with homogeneous servers. As the current specification of OpenXM does
                      8: not include communication between servers, one cannot expect
                      9: the maximal parallel speedup. However it is possible to execute
                     10: several types of distributed computation as follows.
                     11:
                     12: \subsubsection{Product of univariate polynomials}
                     13:
                     14: Shoup \cite{Shoup} showed that the product of univariate polynomials
                     15: with large degrees and large coefficients can be computed efficiently
                     16: by FFT over small finite fields and Chinese remainder theorem.
                     17: It can be easily parallelized:
                     18:
                     19: \begin{tabbing}
1.8       noro       20: Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $deg(f_1), deg(f_2) < 2^M$\\
                     21: Output : $f = f_1f_2$ \\
                     22: $P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime, \\
1.3       noro       23: \> $2^{M+1}|m_i-1$ and $m=\prod m_i $ is sufficiently large. \\
                     24: Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
                     25: for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
                     26: Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
                     27: \> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
1.8       noro       28: \> (The product is computed by FFT.)\\
1.3       noro       29: return $\phi_m(\sum F_j)$\\
1.8       noro       30: (For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
1.3       noro       31: \end{tabbing}
                     32:
                     33: Figure \ref{speedup}
                     34: shows the speedup factor under the above distributed computation
1.8       noro       35: on Risa/Asir. For each $n$, two polynomials of degree $n$
1.3       noro       36: with 3000bit coefficients are generated and the product is computed.
                     37: The machine is Fujitsu AP3000,
                     38: a cluster of Sun connected with a high speed network and MPI over the
                     39: network is used to implement OpenXM.
                     40: \begin{figure}[htbp]
                     41: \epsfxsize=8.5cm
                     42: \epsffile{speedup.ps}
                     43: \caption{Speedup factor}
                     44: \label{speedup}
                     45: \end{figure}
                     46:
1.8       noro       47: If the number of servers is $L$ and the inputs are fixed, then the cost to
1.5       takayama   48: compute $F_j$ in parallel is $O(1/L)$, whereas the cost
1.8       noro       49: to send and receive polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and
                     50: {\tt ox\_pop\_cmo()} are repeatedly applied on the client.
                     51: Therefore the speedup is limited and the upper bound of
1.4       noro       52: the speedup factor depends on the ratio of
1.8       noro       53: the computational cost and the communication cost for each unit operation.
1.4       noro       54: Figure \ref{speedup} shows that
1.5       takayama   55: the speedup is satisfactory if the degree is large and $L$
                     56: is not large, say, up to 10 under the above envionment.
                     57: If OpenXM provides the broadcast and the reduce operations, the cost of
                     58: sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(log_2L)$
1.8       noro       59: and we can expect better results in such a case.
1.3       noro       60:
1.5       takayama   61: \subsubsection{Competitive distributed computation by various strategies}
1.3       noro       62:
1.9     ! noro       63: SINGULAR \cite{Singular} implements {\tt MP} interface for distributed
1.3       noro       64: computation and a competitive Gr\"obner basis computation is
1.5       takayama   65: illustrated as an example of distributed computation.
                     66: Such a distributed computation is also possible on OpenXM.
1.8       noro       67: The following Risa/Asir function computes a Gr\"obner basis by
1.3       noro       68: starting the computations simultaneously from the homogenized input and
                     69: the input itself.  The client watches the streams by {\tt ox\_select()}
1.9     ! noro       70: and the result which is returned first is taken. Then the remaining
1.3       noro       71: server is reset.
                     72:
                     73: \begin{verbatim}
                     74: /* G:set of polys; V:list of variables */
                     75: /* O:type of order; P0,P1: id's of servers */
                     76: def dgr(G,V,O,P0,P1)
                     77: {
                     78:   P = [P0,P1]; /* server list */
                     79:   map(ox_reset,P); /* reset servers */
                     80:   /* P0 executes non-homogenized computation */
                     81:   ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
                     82:   /* P1 executes homogenized computation */
                     83:   ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
                     84:   map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
                     85:   F = ox_select(P); /* wait for data */
                     86:   /* F[0] is a server's id which is ready */
                     87:   R = ox_get(F[0]);
                     88:   if ( F[0] == P0 ) {
                     89:     Win = "nonhomo"; Lose = P1;
                     90:   } else {
                     91:     Win = "homo"; Lose = P0;
                     92:   }
                     93:   ox_reset(Lose); /* reset the loser */
                     94:   return [Win,R];
                     95: }
                     96: \end{verbatim}
FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>