OpenXM/doc/issac2000/homogeneous-network.tex - diff

Diff for /OpenXM/doc/issac2000/homogeneous-network.tex between version 1.2 and 1.3

-version 1.2, 2000/01/02 07:32:12
+version 1.3, 2000/01/07 06:27:55
 Line 1
- % $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.1 1999/12/23 10:25:08 takayama Exp $
+ % $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.2 2000/01/02 07:32:12 takayama Exp $
  \section{Applications}
- \subsection{Homogeneous Network}  (Noro)
+ \subsection{Distributed computation with homogeneous servers}
- Interactive distributed computation?
+ OpenXM also aims at speedup by distributed computation
+ with homogeneous servers. As the current specification of OpenXM does
+ not include communication between servers, one cannot expect
+ the maximal parallel speedup. However, it is possible to execute
+ several types of distributed computation as follows.
+ \subsubsection{Product of univariate polynomials}
+ Shoup \cite{Shoup} showed that the product of univariate polynomials
+ with large degrees and large coefficients can be computed efficiently
+ by FFT over small finite fields and the Chinese remainder theorem.
+ This method can be easily parallelized:
+ \begin{tabbing}
+ Input :\= $f_1, f_2 \in Z[x]$\\
+ \> such that $\deg(f_1), \deg(f_2) < 2^M$\\
+ Output : $f = f_1f_2$\\
+ $P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is a prime, \\
+ \> $2^{M+1}|m_i-1$ and $m=\prod m_i $ is sufficiently large. \\
+ Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
+ for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
+ Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
+ \> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
+ \> ($f_1, f_2$ are regarded as integral.\\
+ \> The product is computed by FFT.)\\
+ return $\phi_m(\sum F_j)$\\
+ (For $a \in Z$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
+ \end{tabbing}
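The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the {\tt Risa/Asir} implementation: the function names are hypothetical, the $L$ servers are simulated by a sequential loop, the four primes are a fixed list of the form $k\cdot 2^e+1$, and the FFT product is replaced by a schoolbook product modulo each prime. Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
from math import prod

# Assumption of this sketch: a fixed list of primes of the form k*2^e + 1.
PRIMES = [998244353, 469762049, 167772161, 754974721]

def mul_mod(f1, f2, p):
    """Schoolbook product of coefficient lists modulo p (stand-in for FFT)."""
    out = [0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def server_task(f1, f2, subset, m):
    """Compute F_j with F_j = f1*f2 mod M_j and F_j = 0 mod m/M_j."""
    n = len(f1) + len(f2) - 1
    F = [0] * n
    for p in subset:
        cofactor = m // p
        # lift is 1 modulo p and 0 modulo every other prime dividing m
        lift = cofactor * pow(cofactor, -1, p)
        c = mul_mod(f1, f2, p)
        for k in range(n):
            F[k] = (F[k] + c[k] * lift) % m
    return F

def phi(a, m):
    """Symmetric representative of a modulo an odd m, in (-m/2, m/2)."""
    a %= m
    return a - m if a > m // 2 else a

def distributed_product(f1, f2, primes=PRIMES, L=2):
    """Client side: partition the primes, collect the F_j, and synthesize."""
    m = prod(primes)
    subsets = [primes[j::L] for j in range(L)]   # disjoint P_1, ..., P_L
    # Each call below would run on its own server; here it runs in sequence.
    parts = [server_task(f1, f2, P_j, m) for P_j in subsets]
    return [phi(sum(col), m) for col in zip(*parts)]

# (3 - 2x + 5x^2) * (-7 + 4x^2 + x^3), coefficients in ascending degree
print(distributed_product([3, -2, 5], [-7, 0, 4, 1]))
# prints [-21, 14, -23, -5, 18, 5]
```

The result is correct only while every coefficient of $f_1f_2$ lies in $(-m/2,m/2)$, which is the sense in which $m$ must be sufficiently large.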
+ Figure \ref{speedup}
+ shows the speedup factor under the above distributed computation
+ on {\tt Risa/Asir}. For each $n$, two polynomials of degree $n$
+ with 3000-bit coefficients are generated and their product is computed.
+ The machine is a Fujitsu AP3000,
+ a cluster of Sun machines connected by a high-speed network, and MPI over the
+ network is used to implement OpenXM.
+ \begin{figure}[htbp]
+ \epsfxsize=8.5cm
+ \epsffile{speedup.ps}
+ \caption{Speedup factor}
+ \label{speedup}
+ \end{figure}
+ The task of the client is the generation and partition of $P$, the sending
+ and receiving of polynomials, and the synthesis of the result. If the
+ number of servers is $L$ and the inputs are fixed, then the time to
+ compute the $F_j$ in parallel is proportional to $1/L$, whereas the time
+ for sending and receiving polynomials is proportional to $L$
+ because broadcast and reduce operations are not available.
+ Therefore the speedup is limited, and the upper bound of
+ the speedup factor depends on the communication cost and the degree
+ of the inputs. Figure \ref{speedup} shows that
+ the speedup is satisfactory if the degree is large and the number of
+ servers is not large, say, up to 10.
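As a rough illustration of this limit, one may assume a simple cost model. The constants $a$ and $b$ are hypothetical and not measured: $a$ is the time to compute all $F_j$ on one server, and $b$ is the communication time per server. The client's own work is ignored. Then
\[
T(L) = \frac{a}{L} + bL, \qquad
S(L) = \frac{a}{T(L)} = \frac{L}{1 + (b/a)L^2} .
\]
Under this model $T(L)$ is minimal at $L^{*} = \sqrt{a/b}$, where
\[
S(L^{*}) = \frac{a}{2\sqrt{ab}} = \frac{1}{2}\sqrt{\frac{a}{b}} .
\]
Adding servers beyond $L^{*}$ lowers the speedup, and the bound grows only when $a/b$ grows, that is, when the computation is large relative to the communication.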
+ \subsubsection{Order counting of an elliptic curve}
+ \subsubsection{Gr\"obner basis computation by various methods}
+ Singular \cite{Singular} implements an {\tt MP} interface for distributed
+ computation, and a competitive Gr\"obner basis computation is
+ illustrated as an example of distributed computation.  However,
+ interruption has not been implemented yet and the losing process has to be
+ killed explicitly. As stated in Section \ref{secsession}, OpenXM
+ provides such a function and one can safely reset the server and
+ continue to use it.  Furthermore, if a client provides synchronous I/O
+ multiplexing by {\tt select()}, then polling is not necessary.  The
+ following {\tt Risa/Asir} function computes a Gr\"obner basis by
+ starting the computations simultaneously from the homogenized input and
+ the input itself.  The client watches the streams by {\tt ox\_select()}
+ and the result which is returned first is taken. Then the remaining
+ server is reset.
+ \begin{verbatim}
+ /* G:set of polys; V:list of variables */
+ /* O:type of order; P0,P1: id's of servers */
+ def dgr(G,V,O,P0,P1)
+ {
+   P = [P0,P1]; /* server list */
+   map(ox_reset,P); /* reset servers */
+   /* P0 executes non-homogenized computation */
+   ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
+   /* P1 executes homogenized computation */
+   ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
+   map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
+   F = ox_select(P); /* wait for data */
+   /* F[0] is a server's id which is ready */
+   R = ox_get(F[0]);
+   if ( F[0] == P0 ) {
+     Win = "nonhomo"; Lose = P1;
+   } else {
+     Win = "homo"; Lose = P0;
+   }
+   ox_reset(Lose); /* reset the loser */
+   return [Win,R];
+ }
+ \end{verbatim}
