===================================================================
RCS file: /home/cvs/OpenXM/doc/issac2000/homogeneous-network.tex,v
retrieving revision 1.5
retrieving revision 1.13
diff -u -p -r1.5 -r1.13
--- OpenXM/doc/issac2000/homogeneous-network.tex	2000/01/15 00:20:45	1.5
+++ OpenXM/doc/issac2000/homogeneous-network.tex	2000/01/17 08:50:56	1.13
@@ -1,7 +1,7 @@
-% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.4 2000/01/11 05:17:11 noro Exp $
+% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.12 2000/01/17 08:06:15 noro Exp $
-\section{Applications}
 \subsection{Distributed computation with homogeneous servers}
+\label{section:homog}
 
 One of the aims of OpenXM is a parallel speedup by a distributed
 computation with homogeneous servers. As the current specification of
 OpenXM does
@@ -17,28 +17,26 @@ by FFT over small finite fields and Chinese remainder
 It can be easily parallelized:
 \begin{tabbing}
-Input :\= $f_1, f_2 \in Z[x]$\\
-\> such that $deg(f_1), deg(f_2) < 2^M$\\
-Output : $f = f_1f_2 \bmod p$\\
-$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is a prime, \\
+Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $deg(f_1), deg(f_2) < 2^M$\\
+Output : $f = f_1f_2$ \\
+$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime, \\
 \> $2^{M+1}|m_i-1$ and $m=\prod m_i $ is sufficiently large.
 \\ Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
 for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
 Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
 \> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
-\> ($f_1, f_2$ are regarded as integral.\\
-\> The product is computed by FFT.)\\
+\> (The product is computed by FFT.)\\
 return $\phi_m(\sum F_j)$\\
-(For $a \in Z$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
+(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
 \end{tabbing}
 
 Figure \ref{speedup} shows the speedup factor under the above
 distributed computation
-on {\tt Risa/Asir}. For each $n$, two polynomials of degree $n$
+on Risa/Asir. For each $n$, two polynomials of degree $n$
 with 3000bit coefficients are generated and the product is computed.
-The machine is Fujitsu AP3000,
-a cluster of Sun connected with a high speed network and MPI over the
-network is used to implement OpenXM.
+The machine is FUJITSU AP3000,
+a cluster of Sun workstations connected with a high speed network
+and MPI over the network is used to implement OpenXM.
 
 \begin{figure}[htbp]
 \epsfxsize=8.5cm
 \epsffile{speedup.ps}
@@ -46,32 +44,31 @@ network is used to implement OpenXM.
 \label{speedup}
 \end{figure}
 
-The task of a client is the generation and partition of $P$, sending
-and receiving of polynomials and the synthesis of the result. If the
-number of servers is $L$ and the inputs are fixed, then the cost to
+If the number of servers is $L$ and the inputs are fixed, then the cost to
 compute $F_j$ in parallel is $O(1/L)$, whereas the cost
-to send and receive polynomials is $O(L)$
-because we don't have the broadcast and the reduce
-operations. Therefore the speedup is limited and the upper bound of
+to send and receive polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and
+{\tt ox\_pop\_cmo()} are repeatedly applied on the client.
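[Editor's note: the CRT scheme in the algorithm above can be sketched in a few lines. The following Python sketch is illustrative only and is not part of the revised paper: it uses naive coefficient-wise products modulo each prime where the paper uses FFT over finite fields, small hand-picked primes rather than FFT primes with $2^{M+1} \mid m_i - 1$, and sequential loops where the paper distributes the $F_j$ to servers. The function names (`crt_polmul`, `phi`, etc.) are hypothetical.]

```python
# Sketch of the CRT-based multiplication described above (assumptions noted
# in the lead-in: naive modular products stand in for FFT, small primes,
# sequential rather than distributed computation of the modular images).

def polmul_mod(f, g, p):
    """Product of coefficient lists f, g, coefficients reduced mod p."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return h

def crt_pair(r1, m1, r2, m2):
    """Combine residues r1 mod m1 and r2 mod m2 (m1, m2 coprime)."""
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return r1 + m1 * t

def phi(a, m):
    """Symmetric lift phi_m: representative of a mod m in (-m/2, m/2)."""
    a %= m
    return a - m if a > m // 2 else a

def crt_polmul(f, g, primes):
    """f * g over Z, recovered from its images mod each prime by CRT.

    The product of the primes must exceed twice the largest absolute
    value of a coefficient of f * g, so the symmetric lift is exact.
    """
    res = [0] * (len(f) + len(g) - 1)
    mod = 1
    for p in primes:            # in the paper, one batch P_j per server
        hp = polmul_mod(f, g, p)
        res = [crt_pair(r, mod, h, p) for r, h in zip(res, hp)]
        mod *= p
    return [phi(c, mod) for c in res]
```

For example, `crt_polmul([1, -2, 3], [4, 5], [7, 11, 13])` recovers the exact integer product $(1 - 2x + 3x^2)(4 + 5x) = 4 - 3x + 2x^2 + 15x^3$ from its three modular images, including the negative coefficient via the symmetric lift.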
+Therefore the speedup is limited and the upper bound of
 the speedup factor depends on the ratio of
-the computational cost and the communication cost.
+the computational cost and the communication cost for each unit operation.
 Figure \ref{speedup} shows that
 the speedup is satisfactory if the degree is large and $L$
-is not large, say, up to 10 under the above envionment.
-If OpenXM provides the broadcast and the reduce operations, the cost of
-sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(log_2L)$
-and we will obtain better results in such a case.
+is not large, say, up to 10 under the above environment.
+If OpenXM provides operations for the broadcast and the reduction
+such as {\tt MPI\_Bcast} and {\tt MPI\_Reduce} respectively, the cost of
+sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(\log_2L)$
+and we can expect better results in such a case.
 
 \subsubsection{Competitive distributed computation by various strategies}
 
-Singular \cite{Singular} implements {\tt MP} interface for distributed
+SINGULAR \cite{Singular} implements {\it MP} interface for distributed
 computation and a competitive Gr\"obner basis computation is
 illustrated as an example of distributed computation.
 Such a distributed computation is also possible on OpenXM.
-The following {\tt Risa/Asir} function computes a Gr\"obner basis by
+The following Risa/Asir function computes a Gr\"obner basis by
 starting the computations simultaneously from the homogenized input and
 the input itself. The client watches the streams by {\tt ox\_select()}
-and The result which is returned first is taken. Then the remaining
+and the result which is returned first is taken. Then the remaining
 server is reset.
 \begin{verbatim}
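[Editor's note: the Risa/Asir function itself continues in the verbatim block beyond this hunk. As an aside, the competitive pattern the text describes, starting the same problem under two strategies and keeping whichever answer arrives first, can be mimicked in Python as below. This is an illustrative sketch, not the paper's code: the two strategies are dummies standing in for the homogenized and the direct Gr\"obner basis computations, and threads stand in for the OpenXM servers watched by {\tt ox\_select()}.]

```python
# Sketch of the competitive pattern described above (assumptions noted in
# the lead-in: dummy strategies, local threads instead of OpenXM servers).

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time

def strategy_a(data):
    time.sleep(0.05)           # stands in for the homogenized computation
    return ("A", sorted(data))

def strategy_b(data):
    time.sleep(0.01)           # stands in for the direct computation
    return ("B", sorted(data))

def competitive(data):
    """Run both strategies at once; return the first finished result."""
    with ThreadPoolExecutor(max_workers=2) as ex:
        futures = [ex.submit(strategy_a, data), ex.submit(strategy_b, data)]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for f in pending:
            # Best-effort analogue of resetting the remaining server;
            # a future that has already started still runs to completion,
            # whereas the OpenXM client actively resets the slower server.
            f.cancel()
        return next(iter(done)).result()
```

Usage: `competitive([3, 1, 2])` returns a `(label, result)` pair from whichever strategy completes first; both strategies must of course compute the same mathematical answer for the race to be sound.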