% $OpenXM$

\subsection{Distributed computation with homogeneous servers}
\label{section:homog}

One of the aims of OpenXM is parallel speedup by distributed computation
with homogeneous servers. As the current specification of OpenXM does
not include communication between servers, one cannot expect
the maximal parallel speedup. However, it is possible to execute
several types of distributed computation, as described below.

\subsubsection{Product of univariate polynomials}

Shoup \cite{Shoup} showed that the product of univariate polynomials
with large degrees and large coefficients can be computed efficiently
by FFT over small finite fields and the Chinese remainder theorem,
an approach which can be easily parallelized.
%
%\begin{tabbing}
%Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $deg(f_1), deg(f_2) < 2^M$\\
%Output : $f = f_1f_2$ \\
%$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime, \\
%\> $2^{M+1}|m_i-1$ and $m=\prod m_i $ is sufficiently large. \\
%Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
%for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
%Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
%\> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
%\> (The product is computed by FFT.)\\
%return $\phi_m(\sum F_j)$\\
%(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
%\end{tabbing}
%
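In outline, the scheme is the following:
for input polynomials $f_1, f_2 \in {\bf Z}[x]$ with
$\deg(f_1), \deg(f_2) < 2^M$, choose odd primes $m_1,\cdots,m_N$ such that
$2^{M+1} | m_i-1$ and $m = \prod m_i$ is sufficiently large, partition them
into $L$ groups with products $M_1,\cdots,M_L$, and let the $j$-th server
compute by FFT a polynomial $F_j$ such that
\[
F_j \equiv f_1f_2 \bmod M_j, \quad F_j \equiv 0 \bmod m/M_j .
\]
The client then recovers the product as $f_1f_2 = \phi_m(\sum_{j=1}^L F_j)$,
where $\phi_m$ denotes the map determined by $\phi_m(a) \in (-m/2,m/2)$ and
$\phi_m(a) \equiv a \bmod m$, applied to each coefficient.
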
Figure \ref{speedup}
shows the speedup factor under the above distributed computation
on Risa/Asir. For each $n$, two polynomials of degree $n$
with 3000-bit coefficients are generated and their product is computed.
The machine is a FUJITSU AP3000,
a cluster of Sun workstations connected by a high-speed network;
MPI over this network is used to implement OpenXM.
\begin{figure}[htbp]
\epsfxsize=10cm
\epsffile{speedup.ps}
\caption{Speedup factor}
\label{speedup}
\end{figure}
If the number of servers is $L$ and the inputs are fixed, then the cost to
compute the products modulo some integers in parallel is $O(1/L)$,
whereas the cost
to send and receive polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and
{\tt ox\_pop\_cmo()} are repeatedly applied on the client.
Therefore the speedup is limited and the upper bound of
the speedup factor depends on the ratio of
the computational cost to the communication cost for each unit operation.
Figure \ref{speedup} shows that
the speedup is satisfactory if the degree is large and $L$
is not large, say, up to 10 in the above environment.
If OpenXM provides collective operations for broadcast and reduction
such as {\tt MPI\_Bcast} and {\tt MPI\_Reduce} respectively, the cost of
broadcasting the inputs and gathering the results on the servers
may be reduced to $O(\log_2 L)$,
and we can expect better results in such a case. In order to implement
such operations we need new specifications for inter-server communication
and session management, which will be proposed as OpenXM-RFC 102.
We note that preliminary experiments show that the collective operations
work well on OpenXM.
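
As a concrete illustration of the point-to-point pattern discussed above
({\tt ox\_cmo\_rpc()} requests followed by {\tt ox\_pop\_cmo()} on the
client), the following Asir fragment is a minimal sketch of the client side:
the inputs are sent to each server and the partial results are received one
by one, so the number of messages grows linearly in $L$. The server-side
function {\tt partial\_product} and the recombination {\tt crt\_combine} are
hypothetical names used only for illustration; only {\tt ox\_cmo\_rpc()} and
{\tt ox\_pop\_cmo()} are actual OpenXM calls.

\begin{verbatim}
/* F1,F2 : input polynomials; Procs : list of server id's */
/* Ms : list of moduli M_j, one per server (hypothetical) */
def parallel_product(F1,F2,Procs,Ms)
{
  L = length(Procs);
  /* send the inputs; one request per server => O(L) messages */
  for ( I = 0; I < L; I++ )
    ox_cmo_rpc(Procs[I],"partial_product",F1,F2,Ms[I]);
  /* receive the partial results; again O(L) messages */
  R = [];
  for ( I = L-1; I >= 0; I-- )
    R = cons(ox_pop_cmo(Procs[I]),R);
  /* recombine the results by CRT on the client (hypothetical) */
  return crt_combine(R,Ms);
}
\end{verbatim}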

%\subsubsection{Competitive distributed computation by various strategies}
%
%SINGULAR \cite{Singular} implements the {\it MP} interface for distributed
%computation, and a competitive Gr\"obner basis computation is
%illustrated as an example of distributed computation.
%Such a distributed computation is also possible on OpenXM as follows:
%
%The client creates two servers and requests
%Gr\"obner basis computations of the homogenized input and of the input itself
%from the two servers.
%The client watches the streams by {\tt ox\_select()}
%and the result which is returned first is taken. Then the remaining
%server is reset.
%
%\begin{verbatim}
%/* G:set of polys; V:list of variables */
%/* O:type of order; P0,P1: id's of servers */
%def dgr(G,V,O,P0,P1)
%{
%  P = [P0,P1]; /* server list */
%  map(ox_reset,P); /* reset servers */
%  /* P0 executes non-homogenized computation */
%  ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
%  /* P1 executes homogenized computation */
%  ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
%  map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
%  F = ox_select(P); /* wait for data */
%  /* F[0] is a server's id which is ready */
%  R = ox_get(F[0]);
%  if ( F[0] == P0 ) {
%    Win = "nonhomo"; Lose = P1;
%  } else {
%    Win = "homo"; Lose = P0;
%  }
%  ox_reset(Lose); /* reset the loser */
%  return [Win,R];
%}
%\end{verbatim}

\subsubsection{Nesting of client-server communication}

Under OpenXM-RFC 100, an OpenXM server can be a client of other servers.
Figure \ref{tree} illustrates a tree-like structure of OpenXM
client-server communication.

\begin{figure}
\begin{center}
\begin{picture}(200,140)(0,0)
\put(70,120){\framebox(40,15){client}}
\put(20,60){\framebox(40,15){server}}
\put(70,60){\framebox(40,15){server}}
\put(120,60){\framebox(40,15){server}}
\put(0,0){\framebox(40,15){server}}
\put(50,0){\framebox(40,15){server}}
\put(135,0){\framebox(40,15){server}}

\put(90,120){\vector(-1,-1){43}}
\put(90,120){\vector(0,-1){43}}
\put(90,120){\vector(1,-1){43}}
\put(40,60){\vector(-1,-2){22}}
\put(40,60){\vector(1,-2){22}}
\put(140,60){\vector(1,-3){14}}
\end{picture}
\caption{Tree-like structure of client-server communication}
\label{tree}
\end{center}
\end{figure}

Such a computational model is useful for the parallel implementation of
algorithms whose task can be divided into subtasks recursively.
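
The following Asir fragment is a minimal sketch of such a recursive
delegation, assuming a preset recursion bound {\tt LevelMax} and assuming
that {\tt solve} itself is also defined on the launched server. The
task-specific functions {\tt split\_task}, {\tt solve\_locally} and
{\tt combine} are hypothetical placeholders; only {\tt ox\_launch()},
{\tt ox\_cmo\_rpc()} and {\tt ox\_pop\_cmo()} are actual OpenXM calls.

\begin{verbatim}
#define LevelMax 2

/* Generic divide-and-conquer over OpenXM servers.         */
/* split_task, solve_locally and combine are hypothetical. */
def solve(Task,Level)
{
  if ( Level >= LevelMax )
    return solve_locally(Task);
  Sub = split_task(Task);        /* Sub = [Sub0,Sub1] */
  P = ox_launch();               /* this server acts as a client */
  /* delegate one subtask to the child server ... */
  ox_cmo_rpc(P,"solve",Sub[0],Level+1);
  /* ... and handle the other one locally */
  R1 = solve(Sub[1],Level+1);
  R0 = ox_pop_cmo(P);            /* collect the delegated result */
  return combine(R0,R1);
}
\end{verbatim}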

%A typical example is {\it quicksort}, where an array to be sorted is
%partitioned into two sub-arrays and the algorithm is applied to each
%sub-array. In each level of recursion, two subtasks are generated
%and one can ask other OpenXM servers to execute them.
%Though it makes little contribution to the efficiency in the case of
%quicksort, we present an Asir program of this distributed quicksort
%to demonstrate that OpenXM gives an easy way to test this algorithm.
%In the program, a predefined constant {\tt LevelMax} determines
%whether new servers are launched or whole subtasks are done on the server.
%
%\begin{verbatim}
%#define LevelMax 2
%extern Proc1, Proc2;
%Proc1 = -1$ Proc2 = -1$
%
%/* sort [A[P],...,A[Q]] by quicksort */
%def quickSort(A,P,Q,Level) {
%  if (Q-P < 1) return A;
%  Mp = idiv(P+Q,2); M = A[Mp]; B = P; E = Q;
%  while (1) {
%    while (A[B] < M) B++;
%    while (A[E] > M && B <= E) E--;
%    if (B >= E) break;
%    else { T = A[B]; A[B] = A[E]; A[E] = T; E--; }
%  }
%  if (E < P) E = P;
%  if (Level < LevelMax) {
%    /* launch new servers if necessary */
%    if (Proc1 == -1) Proc1 = ox_launch(0);
%    if (Proc2 == -1) Proc2 = ox_launch(0);
%    /* send the requests to the servers */
%    ox_rpc(Proc1,"quickSort",A,P,E,Level+1);
%    ox_rpc(Proc2,"quickSort",A,E+1,Q,Level+1);
%    if (E-P < Q-E) {
%      A1 = ox_pop_local(Proc1);
%      A2 = ox_pop_local(Proc2);
%    } else {
%      A2 = ox_pop_local(Proc2);
%      A1 = ox_pop_local(Proc1);
%    }
%    for (I=P; I<=E; I++) A[I] = A1[I];
%    for (I=E+1; I<=Q; I++) A[I] = A2[I];
%    return(A);
%  } else {
%    /* everything is done on this server */
%    quickSort(A,P,E,Level+1);
%    quickSort(A,E+1,Q,Level+1);
%    return(A);
%  }
%}
%\end{verbatim}

A typical example is a parallelization of the Cantor-Zassenhaus
algorithm for polynomial factorization over finite fields,
which is a recursive algorithm.
At each level of the recursion, a given polynomial can be
divided into two non-trivial factors with some probability by using
a randomly generated polynomial as a {\it separator}.
We can apply the following simple parallelization:
when two non-trivial factors are generated on a server,
one is sent to another server and the other factor is factorized on the
server itself.
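
A minimal Asir sketch of this dispatch step is as follows; here {\tt c\_z}
denotes the sequential Cantor-Zassenhaus factorizer, {\tt ox\_c\_z} a wrapper
of it registered on the server, {\tt Proc1} the identifier of an auxiliary
server ($-1$ when not yet launched) and {\tt LevelMax} a preset recursion
bound.

\begin{verbatim}
#define LevelMax 2
extern Proc1$
Proc1 = -1$

/* G : a non-trivial factor of F found by the random splitting step */
def dispatch_factors(F,G,E,Level)
{
  if ( Level >= LevelMax ) {
    /* everything is done on this server */
    L1 = c_z(G,E,Level+1);
    L2 = c_z(sdiv(F,G),E,Level+1);
  } else {
    /* launch an auxiliary server if necessary */
    if ( Proc1 < 0 ) Proc1 = ox_launch();
    /* delegate one factor ... */
    ox_cmo_rpc(Proc1,"ox_c_z",lmptop(G),E,setmod_ff(),Level+1);
    /* ... factorize the cofactor locally, then collect the rest */
    L2 = c_z(sdiv(F,G),E,Level+1);
    L1 = map(simp_ff,ox_pop_cmo(Proc1));
  }
  return append(L1,L2);
}
\end{verbatim}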
%\begin{verbatim}
%/* factorization of F */
%/* E = degree of irreducible factors in F */
%def c_z(F,E,Level)
%{
%  V = var(F); N = deg(F,V);
%  if ( N == E ) return [F];
%  M = field_order_ff(); K = idiv(N,E); L = [F];
%  while ( 1 ) {
%    /* generate a random polynomial */
%    W = monic_randpoly_ff(2*E,V);
%    /* compute a power of the random polynomial */
%    T = generic_pwrmod_ff(W,F,idiv(M^E-1,2));
%    if ( !(W = T-1) ) continue;
%    /* G = GCD(F, W^((M^E-1)/2)-1 mod F) */
%    G = ugcd(F,W);
%    if ( deg(G,V) && deg(G,V) < N ) {
%      /* G is a non-trivial factor of F */
%      if ( Level >= LevelMax ) {
%        /* everything is done on this server */
%        L1 = c_z(G,E,Level+1);
%        L2 = c_z(sdiv(F,G),E,Level+1);
%      } else {
%        /* launch a server if necessary */
%        if ( Proc1 < 0 ) Proc1 = ox_launch();
%        /* send a request with Level = Level+1 */
%        /* ox_c_z is a wrapper of c_z on the server */
%        ox_cmo_rpc(Proc1,"ox_c_z",lmptop(G),E,
%                   setmod_ff(),Level+1);
%        /* the rest is done on this server */
%        L2 = c_z(sdiv(F,G),E,Level+1);
%        L1 = map(simp_ff,ox_pop_cmo(Proc1));
%      }
%      return append(L1,L2);
%    }
%  }
%}
%\end{verbatim}
%