% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.6 2000/01/15 02:24:18 takayama Exp $

\subsection{Distributed computation with homogeneous servers}
\label{section:homog}

One of the aims of OpenXM is parallel speedup by distributed computation
with homogeneous servers. As the current specification of OpenXM does
not include communication between servers, one cannot expect
the maximal parallel speedup. However, it is possible to execute
several types of distributed computation, as illustrated below.

\subsubsection{Product of univariate polynomials}

Shoup \cite{Shoup} showed that the product of univariate polynomials
with large degrees and large coefficients can be computed efficiently
by FFTs over small finite fields and the Chinese remainder theorem.
This method can easily be parallelized:

\begin{tabbing}
Input :\= $f_1, f_2 \in Z[x]$\\
\> such that $\deg(f_1), \deg(f_2) < 2^M$\\
Output : $f = f_1f_2$\\
$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is a prime, \\
\> $2^{M+1}|m_i-1$ and $m=\prod m_i$ is sufficiently large. \\
Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
for \= $j=1$ to $L$ do $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
\> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
\> ($f_1, f_2$ are regarded as integral.\\
\> The product is computed by FFT.)\\
return $\phi_m(\sum F_j)$\\
(For $a \in Z$, $\phi_m(a) \in (-m/2,m/2)$, $\phi_m(a)\equiv a \bmod m$;
$\phi_m$ is applied to each coefficient.)
\end{tabbing}

Figure \ref{speedup}
shows the speedup factor under the above distributed computation
on {\tt Risa/Asir}. For each $n$, two polynomials of degree $n$
with 3000-bit coefficients are generated and their product is computed.
The machine is a Fujitsu AP3000, a cluster of Sun workstations connected
by a high-speed network, and MPI over this network is used to implement
OpenXM.
\begin{figure}[htbp]
\epsfxsize=8.5cm
\epsffile{speedup.ps}
\caption{Speedup factor}
\label{speedup}
\end{figure}

The tasks of the client are the generation and partition of $P$, the
sending and receiving of polynomials, and the synthesis of the result.
If the number of servers is $L$ and the inputs are fixed, then the cost
to compute the $F_j$ in parallel is $O(1/L)$, whereas the cost to send
and receive the polynomials is $O(L)$, because we do not have broadcast
and reduce operations. Therefore the speedup is limited, and the upper
bound of the speedup factor depends on the ratio of the computational
cost to the communication cost. Figure \ref{speedup} shows that the
speedup is satisfactory if the degree is large and $L$ is not large,
say up to 10, under the above environment. If OpenXM provides broadcast
and reduce operations, the cost of sending $f_1$, $f_2$ and of gathering
the $F_j$ may be reduced to $O(\log_2 L)$, and we will obtain better
results in such a case.
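
The client part of the above algorithm can be written as a {\tt Risa/Asir}
function in the same style as the function {\tt dgr()} shown in the next
subsection. The following is only a minimal sketch: the name
{\tt partial\_product()} is hypothetical and merely stands for a
server-side routine that computes $F_j$ from $f_1$, $f_2$ and $M_j$ by FFT;
{\tt ox\_cmo\_rpc()}, {\tt ox\_push\_cmd()} and {\tt ox\_get()} are the
actual OpenXM calls, and the final reduction $\phi_m$ is omitted.

\begin{verbatim}
/* F1,F2: input polys; M: list of the moduli M_j */
/* P: list of server id's (length(M) == length(P)) */
/* partial_product() is a hypothetical server-side function */
/* returning F_j with F_j = F1*F2 mod M_j, F_j = 0 mod m/M_j */
def para_mul(F1,F2,M,P)
{
    L = length(P);
    /* request one modular product from each server */
    for ( J = 0; J < L; J++ )
        ox_cmo_rpc(P[J],"partial_product",F1,F2,M[J]);
    map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
    /* collect the partial results and sum them up */
    R = 0;
    for ( J = 0; J < L; J++ )
        R = R + ox_get(P[J]);
    return R; /* phi_m still has to be applied to R */
}
\end{verbatim}

Since each {\tt ox\_get()} blocks until the corresponding server has
finished, the partial results are simply collected in the order of the
server list.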

\subsubsection{Competitive distributed computation by various strategies}

Singular \cite{Singular} implements the {\tt MP} interface for distributed
computation, and a competitive Gr\"obner basis computation is illustrated
as an example of distributed computation. Such a distributed computation
is also possible on OpenXM. The following {\tt Risa/Asir} function computes
a Gr\"obner basis by starting the computations simultaneously from the
homogenized input and the input itself. The client watches the streams by
{\tt ox\_select()}, and the result which is returned first is taken. Then
the remaining server is reset.

\begin{verbatim}
/* G:set of polys; V:list of variables */
/* O:type of order; P0,P1: id's of servers */
def dgr(G,V,O,P0,P1)
{
    P = [P0,P1];      /* server list */
    map(ox_reset,P);  /* reset servers */
    /* P0 executes the non-homogenized computation */
    ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
    /* P1 executes the homogenized computation */
    ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
    map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
    F = ox_select(P);       /* wait for data */
    /* F[0] is the id of a server which is ready */
    R = ox_get(F[0]);
    if ( F[0] == P0 ) {
        Win = "nonhomo"; Lose = P1;
    } else {
        Win = "homo"; Lose = P0;
    }
    ox_reset(Lose);   /* reset the loser */
    return [Win,R];
}
\end{verbatim}
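
For instance, assuming that two {\tt ox\_asir} servers have been started
with {\tt ox\_launch()} (whose arguments depend on the local OpenXM
installation), the function may be called as follows; the input system is
chosen only for illustration.

\begin{verbatim}
P0 = ox_launch();  /* launch two ox_asir servers */
P1 = ox_launch();
/* compete on a small system with respect to order type 0 */
dgr([x^2+y^2-1,x^3-y],[x,y],0,P0,P1);
\end{verbatim}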