===================================================================
RCS file: /home/cvs/OpenXM/doc/issac2000/homogeneous-network.tex,v
retrieving revision 1.4
retrieving revision 1.8
diff -u -p -r1.4 -r1.8
--- OpenXM/doc/issac2000/homogeneous-network.tex	2000/01/11 05:17:11	1.4
+++ OpenXM/doc/issac2000/homogeneous-network.tex	2000/01/16 03:15:49	1.8
@@ -1,9 +1,9 @@
-% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.3 2000/01/07 06:27:55 noro Exp $
+% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.7 2000/01/15 06:11:17 takayama Exp $
 
-\section{Applications}
 \subsection{Distributed computation with homogeneous servers}
+\label{section:homog}
 
-OpenXM also aims at speedup by a distributed computation
+One of the aims of OpenXM is a parallel speedup by a distributed computation
 with homogeneous servers. As the current specification of OpenXM
 does not include communication between servers, one cannot expect
 the maximal parallel speedup. However it is possible to execute
@@ -17,24 +17,22 @@ by FFT over small finite fields and Chinese remainder
 It can be easily parallelized:
 \begin{tabbing}
-Input :\= $f_1, f_2 \in Z[x]$\\
-\> such that $deg(f_1), deg(f_2) < 2^M$\\
-Output : $f = f_1f_2 \bmod p$\\
-$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is a prime, \\
+Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $\deg(f_1), \deg(f_2) < 2^M$\\
+Output : $f = f_1f_2$ \\
+$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime, \\
 \> $2^{M+1}|m_i-1$ and $m=\prod m_i$ is sufficiently large.
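The recombination step of the algorithm above can be sketched in plain Python. This is only an illustration, not the Risa/Asir implementation: the real servers multiply by FFT over small finite fields, so the schoolbook multiplication modulo $M_j$ below is a stand-in for that step, and the small primes chosen in the test need not satisfy the $2^{M+1} \mid m_i-1$ condition that the FFT requires.

```python
# Sketch of the CRT-based distributed multiplication described in the text.
# Each "server" computes F_j with F_j = f1*f2 mod M_j and F_j = 0 mod m/M_j;
# the client returns phi_m(sum F_j).  Polynomials are coefficient lists.

from functools import reduce

def poly_mul_mod(f1, f2, mod):
    """Schoolbook product of two coefficient lists modulo `mod`
    (stand-in for the FFT over a small finite field)."""
    res = [0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            res[i + j] = (res[i + j] + a * b) % mod
    return res

def phi(a, m):
    """Symmetric remainder: phi_m(a) in (-m/2, m/2), phi_m(a) == a mod m."""
    a %= m
    return a - m if a > m // 2 else a

def crt_poly_mul(f1, f2, subsets):
    """Combine the per-subset products F_j into f1*f2 over the integers.
    `subsets` plays the role of the partition P_1, ..., P_L of P."""
    Ms = [reduce(lambda x, y: x * y, P_j) for P_j in subsets]
    m = reduce(lambda x, y: x * y, Ms)
    total = [0] * (len(f1) + len(f2) - 1)
    for M_j in Ms:
        cof = m // M_j                 # cof = m / M_j
        inv = pow(cof, -1, M_j)        # cof^{-1} mod M_j
        prod = poly_mul_mod(f1, f2, M_j)
        # c*inv%M_j * cof is 0 mod m/M_j and c mod M_j, as required of F_j.
        for k, c in enumerate(prod):
            total[k] += c * inv % M_j * cof
    return [phi(c, m) for c in total]
```

Since each $F_j$ depends only on its own subset $P_j$, the loop body is exactly the work that can be shipped to one server.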
 \\
 Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
 for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
 Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
 \> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
-\> ($f_1, f_2$ are regarded as integral.\\
-\> The product is computed by FFT.)\\
+\> (The product is computed by FFT.)\\
 return $\phi_m(\sum F_j)$\\
-(For $a \in Z$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
+(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
 \end{tabbing}
 
 Figure \ref{speedup} shows the speedup factor
 under the above distributed computation
-on {\tt Risa/Asir}. For each $n$, two polynomials of degree $n$
+on Risa/Asir. For each $n$, two polynomials of degree $n$
 with 3000 bit coefficients are generated and the product is computed.
 The machine is Fujitsu AP3000,
 a cluster of Sun workstations connected with a high speed network, and MPI over the
@@ -46,30 +44,27 @@ network is used to implement OpenXM.
 \label{speedup}
 \end{figure}
 
-The task of a client is the generation and partition of $P$, sending
-and receiving of polynomials and the synthesis of the result. If the
-number of servers is $L$ and the inputs are fixed, then the time to
-compute $F_j$ in parallel is proportional to $1/L$, whereas the time
-for sending and receiving of polynomials is proportional to $L$
-because we don't have the broadcast and the reduce
-operations. Therefore the speedup is limited and the upper bound of
+If the number of servers is $L$ and the inputs are fixed, then the cost to
+compute $F_j$ in parallel is $O(1/L)$, whereas the cost
+to send and receive polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and
+{\tt ox\_pop\_cmo()} are repeatedly applied on the client.
+Therefore the speedup is limited and the upper bound of
 the speedup factor depends on the ratio of
-the computational cost and the communication cost.
+the computational cost and the communication cost for each unit operation.
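The limit on the speedup can be made concrete with a toy cost model. Assuming, purely for illustration, that computation contributes $c/L$ and client-side communication contributes $dL$ to the wall-clock time (the constants below are invented, not measured on the AP3000):

```python
# Hypothetical cost model for the analysis above: with L servers, the
# parallel FFT work scales as O(1/L) while repeated ox_push_cmo()/
# ox_pop_cmo() traffic on the client scales as O(L).

def total_time(L, compute=100.0, comm=1.0):
    """Wall-clock model T(L) = compute/L + comm*L (illustrative constants)."""
    return compute / L + comm * L

def speedup(L, compute=100.0, comm=1.0):
    """Speedup over a single server under the same model."""
    return total_time(1, compute, comm) / total_time(L, compute, comm)

# T(L) is minimized near L = sqrt(compute/comm); past that point,
# adding servers makes the computation slower, which matches the
# observation that the speedup is satisfactory only for moderate L.
best = min(range(1, 31), key=lambda L: total_time(L))
```

With these constants the optimum is $L = 10$, echoing the "up to 10 servers" observation, though the real crossover depends on the actual ratio of computation to communication cost.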
 Figure \ref{speedup} shows that
-the speedup is satisfactory if the degree is large and the number of
-servers is not large, say, up to 10 under the above envionment.
+the speedup is satisfactory if the degree is large and $L$
+is not large, say, up to 10 under the above environment.
+If OpenXM provides the broadcast and the reduce operations, the cost of
+sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(\log_2 L)$
+and we can expect better results in such a case.
 
-\subsubsection{Gr\"obner basis computation by various methods}
+\subsubsection{Competitive distributed computation by various strategies}
 
 Singular \cite{Singular} implements the {\tt MP} interface for distributed
 computation and a competitive Gr\"obner basis computation is
-illustrated as an example of distributed computation. However,
-interruption has not implemented yet and the looser process have to be
-killed explicitly. As stated in Section \ref{secsession} OpenXM
-provides such a function and one can safely reset the server and
-continue to use it. Furthermore, if a client provides synchronous I/O
-multiplexing by {\tt select()}, then a polling is not necessary. The
-following {\tt Risa/Asir} function computes a Gr\"obner basis by
+illustrated as an example of distributed computation.
+Such a distributed computation is also possible on OpenXM.
+The following Risa/Asir function computes a Gr\"obner basis by
 starting the computations simultaneously from the homogenized input
 and the input itself. The client watches the streams by {\tt ox\_select()}
 and the result which is returned first is taken. Then the remaining
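The first-result-wins pattern behind this competitive computation can be sketched in Python with threads standing in for OpenXM servers. This is an analogue, not the Risa/Asir client code: `race()` below plays the role of the loop around {\tt ox\_select()}, and the two toy strategies stand in for "Gr\"obner basis from the homogenized input" versus "from the input as is". A real client would also reset the losing servers, which the sketch omits.

```python
# Minimal analogue of a competitive distributed computation: several
# strategies race on the same input and the client takes whichever
# answer arrives first, as ox_select() lets the OpenXM client do.

import queue
import threading
import time

def race(strategies, arg):
    """Run each (name, fn) strategy in its own thread and return the
    (name, result) pair of the first one to finish."""
    results = queue.Queue()
    for name, fn in strategies:
        t = threading.Thread(
            target=lambda n=name, f=fn: results.put((n, f(arg))),
            daemon=True)  # losers are simply abandoned in this sketch
        t.start()
    return results.get()  # blocks until the fastest strategy reports

# Toy stand-ins for the two computation strategies; both must return
# the same mathematically correct answer, only their speed differs.
def slow_strategy(x):
    time.sleep(0.2)
    return x * x

def fast_strategy(x):
    return x * x

winner, value = race([("homogenized", slow_strategy),
                      ("direct", fast_strategy)], 7)
```

Because both strategies compute the same object, correctness does not depend on which one wins; the race only buys the minimum of the two running times.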