% $OpenXM: OpenXM/doc/ascm2001p/homogeneous-network.tex,v 1.7 2001/06/20 05:42:47 takayama Exp $

\subsection{Distributed computation with homogeneous servers}
\label{section:homog}

One of the aims of OpenXM is to achieve parallel speedup through distributed
computation with homogeneous servers. We present some examples.
%As the current specification of OpenXM does
%not include communication between servers, one cannot expect
%the maximal parallel speedup. However it is possible to execute
%several types of distributed computation as follows.

\subsubsection{Competitive distributed computation by various strategies}

SINGULAR \cite{Singular} implements an MP interface for distributed
computation, and a competitive Gr\"obner basis computation is
given as an example of distributed computation through this interface.
The same kind of distributed computation is also possible on OpenXM.

\begin{verbatim}
extern Proc1,Proc2$
Proc1 = -1$ Proc2 = -1$
/* G:set of polys; V:list of variables */
/* Mod: the Ground field GF(Mod); O:type of order */
def dgr(G,V,Mod,O)
{
  /* invoke servers if necessary */
  if ( Proc1 == -1 ) Proc1 = ox_launch();
  if ( Proc2 == -1 ) Proc2 = ox_launch();
  P = [Proc1,Proc2];
  map(ox_reset,P); /* reset servers */
  /* P0 executes Buchberger algorithm over GF(Mod) */
  ox_cmo_rpc(P[0],"dp_gr_mod_main",G,V,0,Mod,O);
  /* P1 executes F4 algorithm over GF(Mod) */
  ox_cmo_rpc(P[1],"dp_f4_mod_main",G,V,Mod,O);
  map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
  F = ox_select(P); /* wait for data */
  /* F[0] is a server's id which is ready */
  R = ox_get(F[0]);
  if ( F[0] == P[0] ) { Win = "Buchberger"; Lose = P[1]; }
  else { Win = "F4"; Lose = P[0]; }
  ox_reset(Lose); /* reset the loser */
  return [Win,R];
}
\end{verbatim}
In the above Asir program, the client creates two servers and requests
them to compute a Gr\"obner basis of the same input, one server by the
Buchberger algorithm and the other by the $F_4$ algorithm.
The client watches the streams with {\tt ox\_select()}
and takes the result that is returned first. Then the remaining
server is reset.
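The same race can be sketched outside OpenXM. The Python fragment below is only an illustration under our own assumptions: threads stand in for servers, the hypothetical functions {\tt fast} and {\tt slow} stand in for the $F_4$ and Buchberger computations, and a shared {\tt threading.Event} plays the role of {\tt ox\_reset}. Since a thread cannot be killed from outside, each strategy must poll the event itself, whereas {\tt ox\_reset} interrupts a server from the client side.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def race(strategies, *args):
    """Run every (name, fn) pair on the same input and keep the first result.

    Each fn is called as fn(stop, *args) and should return early once
    stop.is_set() becomes true (cooperative cancellation).
    """
    stop = threading.Event()  # stands in for ox_reset on the losers
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = {pool.submit(fn, stop, *args): name for name, fn in strategies}
        done, _ = wait(futures, return_when=FIRST_COMPLETED)  # like ox_select
        winner = next(iter(done))
        stop.set()  # tell the remaining strategies to give up
        return futures[winner], winner.result()


def fast(stop, n):
    # hypothetical "good" strategy: closed-form sum 0 + 1 + ... + (n-1)
    return n * (n - 1) // 2


def slow(stop, n):
    # hypothetical "bad" strategy: term-by-term sum that polls for a reset
    total = 0
    for i in range(n):
        if stop.is_set():
            return None  # abandoned after the other strategy won
        total += i
        time.sleep(0.001)
    return total


print(race([("slow", slow), ("fast", fast)], 1000))  # ('fast', 499500)
```

As in {\tt dgr()}, the winner is reported together with its result; leaving the {\tt with} block waits until the loser has noticed the reset and returned.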

\subsubsection{Nesting of client-server communication}

\begin{figure}
\begin{center}
\begin{picture}(200,70)(0,0)
\put(70,70){\framebox(40,15){client}}
\put(20,30){\framebox(40,15){server}}
\put(70,30){\framebox(40,15){server}}
\put(120,30){\framebox(40,15){server}}
\put(0,0){\framebox(40,15){server}}
\put(50,0){\framebox(40,15){server}}
\put(150,0){\framebox(40,15){server}}

\put(90,70){\vector(-2,-1){43}}
\put(90,70){\vector(0,-1){21}}
\put(90,70){\vector(2,-1){43}}
\put(40,30){\vector(-2,-1){22}}
\put(40,30){\vector(2,-1){22}}
\put(140,30){\vector(2,-1){22}}
\end{picture}
\caption{Tree-like structure of client-server communication}
\label{tree}
\end{center}
\end{figure}
%%Prog: load ("dfff"); df_demo(); enter 100.
Under OpenXM-RFC 100, an OpenXM server can also act as a client of other servers.
Figure \ref{tree}
illustrates a tree-like structure of OpenXM
client-server communication.
Such a computational model is useful for parallel implementation of
algorithms whose tasks can be divided into subtasks recursively.

%A typical example is {\it quicksort}, where an array to be sorted is
%partitioned into two sub-arrays and the algorithm is applied to each
%sub-array. In each level of recursion, two subtasks are generated
%and one can ask other OpenXM servers to execute them.
%Though it makes little contribution to the efficiency in the case of
%quicksort, we present an Asir program of this distributed quicksort
%to demonstrate that OpenXM gives an easy way to test this algorithm.
%In the program, a predefined constant {\tt LevelMax} determines
%whether new servers are launched or whole subtasks are done on the server.
%
%\begin{verbatim}
%#define LevelMax 2
%extern Proc1, Proc2;
%Proc1 = -1$ Proc2 = -1$
%
%/* sort [A[P],...,A[Q]] by quicksort */
%def quickSort(A,P,Q,Level) {
% if (Q-P < 1) return A;
% Mp = idiv(P+Q,2); M = A[Mp]; B = P; E = Q;
% while (1) {
% while (A[B] < M) B++;
% while (A[E] > M && B <= E) E--;
% if (B >= E) break;
% else { T = A[B]; A[B] = A[E]; A[E] = T; E--; }
% }
% if (E < P) E = P;
% if (Level < LevelMax) {
% /* launch new servers if necessary */
% if (Proc1 == -1) Proc1 = ox_launch(0);
% if (Proc2 == -1) Proc2 = ox_launch(0);
% /* send the requests to the servers */
% ox_rpc(Proc1,"quickSort",A,P,E,Level+1);
% ox_rpc(Proc2,"quickSort",A,E+1,Q,Level+1);
% if (E-P < Q-E) {
% A1 = ox_pop_local(Proc1);
% A2 = ox_pop_local(Proc2);
% }else{
% A2 = ox_pop_local(Proc2);
% A1 = ox_pop_local(Proc1);
% }
% for (I=P; I<=E; I++) A[I] = A1[I];
% for (I=E+1; I<=Q; I++) A[I] = A2[I];
% return(A);
% }else{
% /* everything is done on this server */
% quickSort(A,P,E,Level+1);
% quickSort(A,E+1,Q,Level+1);
% return(A);
% }
%}
%\end{verbatim}
%
A typical example is a parallelization of the Cantor-Zassenhaus
algorithm for polynomial factorization over finite fields,
which is a recursive algorithm.
At each level of the recursion, a given polynomial can be
split into two non-trivial factors with some probability by using
a randomly generated polynomial as a {\it separator}.
We can apply the following simple parallelization:
when two non-trivial factors are generated on a server,
one is sent to another server and the other is factored on the server
itself.
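The following Python fragment sketches this parallelization for equal-degree factorization over a prime field $GF(p)$ with $p$ odd. It is not the Risa/Asir implementation: polynomials are coefficient lists (lowest degree first), and the prime {\tt P}, the depth bound {\tt LEVEL\_MAX} and all helper names are hypothetical. A fresh single-thread executor plays the role of a newly launched server. The input must be monic and squarefree with all irreducible factors of degree $e$; the separator test $\gcd(f, w^{(p^e-1)/2}-1)$ is the standard Cantor-Zassenhaus splitting step.

```python
import random
from concurrent.futures import ThreadPoolExecutor

P = 7          # hypothetical odd prime: we work over GF(P)
LEVEL_MAX = 2  # hypothetical depth up to which one factor is delegated


def trim(a):
    # drop zero leading coefficients (lists are lowest degree first)
    while a and a[-1] == 0:
        a.pop()
    return a


def sub(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x - y) % P for x, y in zip(a, b)])


def mul(a, b):
    r = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % P
    return trim(r)


def divmod_poly(a, b):
    # long division over GF(P); b must be nonzero
    a, inv = a[:], pow(b[-1], P - 2, P)
    q = [0] * max(len(a) - len(b) + 1, 0)
    while a and len(a) >= len(b):
        c, k = a[-1] * inv % P, len(a) - len(b)
        q[k] = c
        for i, y in enumerate(b):
            a[i + k] = (a[i + k] - c * y) % P
        trim(a)
    return trim(q), a


def gcd(a, b):
    while b:
        a, b = b, divmod_poly(a, b)[1]
    inv = pow(a[-1], P - 2, P)
    return [x * inv % P for x in a]  # monic gcd


def powmod(w, e, f):
    # w^e mod f by repeated squaring
    result, base = [1], divmod_poly(w, f)[1]
    while e:
        if e & 1:
            result = divmod_poly(mul(result, base), f)[1]
        base = divmod_poly(mul(base, base), f)[1]
        e >>= 1
    return result


def c_z(f, e, level=0):
    """Split f (monic, squarefree, all irreducible factors of degree e)."""
    n = len(f) - 1
    if n == e:
        return [f]
    while True:
        w = trim([random.randrange(P) for _ in range(n)])  # random separator
        if not w:
            continue
        t = sub(powmod(w, (P**e - 1) // 2, f), [1])
        if not t:
            continue
        g = gcd(f, t)
        if 0 < len(g) - 1 < n:  # g is a non-trivial factor of f
            break
    h = divmod_poly(f, g)[0]
    if level >= LEVEL_MAX:  # everything is done on this "server"
        return c_z(g, e, level + 1) + c_z(h, e, level + 1)
    with ThreadPoolExecutor(max_workers=1) as server:  # "launch" a server
        remote = server.submit(c_z, g, e, level + 1)   # send g away
        local = c_z(h, e, level + 1)                   # factor h here
        return remote.result() + local


f = mul(mul([6, 1], [5, 1]), mul([4, 1], [3, 1]))  # (x-1)(x-2)(x-3)(x-4) over GF(7)
print(sorted(c_z(f, 1)))  # [[3, 1], [4, 1], [5, 1], [6, 1]]
```

Creating a new executor for each delegation, rather than sharing one bounded pool, avoids a deadlock in which every worker waits for a subtask that no free worker can run; it loosely mirrors the Asir code, where a server launches its own server when it needs one.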
%\begin{verbatim}
%/* factorization of F */
%/* E = degree of irreducible factors in F */
%def c_z(F,E,Level)
%{
% V = var(F); N = deg(F,V);
% if ( N == E ) return [F];
% M = field_order_ff(); K = idiv(N,E); L = [F];
% while ( 1 ) {
% /* generate a random polynomial */
% W = monic_randpoly_ff(2*E,V);
% /* compute a power of the random polynomial */
% T = generic_pwrmod_ff(W,F,idiv(M^E-1,2));
% if ( !(W = T-1) ) continue;
% /* G = GCD(F,W^((M^E-1)/2)) mod F) */
% G = ugcd(F,W);
% if ( deg(G,V) && deg(G,V) < N ) {
% /* G is a non-trivial factor of F */
% if ( Level >= LevelMax ) {
% /* everything is done on this server */
% L1 = c_z(G,E,Level+1);
% L2 = c_z(sdiv(F,G),E,Level+1);
% } else {
% /* launch a server if necessary */
% if ( Proc1 < 0 ) Proc1 = ox_launch();
% /* send a request with Level = Level+1 */
% /* ox_c_z is a wrapper of c_z on the server */
% ox_cmo_rpc(Proc1,"ox_c_z",lmptop(G),E,
% setmod_ff(),Level+1);
% /* the rest is done on this server */
% L2 = c_z(sdiv(F,G),E,Level+1);
% L1 = map(simp_ff,ox_pop_cmo(Proc1));
% }
% return append(L1,L2);
% }
% }
%}
%\end{verbatim}
%
%
%
%
%
%
%

\subsubsection{Product of univariate polynomials}

Shoup \cite{Shoup} showed that the product of univariate polynomials
with large degrees and large coefficients can be computed efficiently
by FFT over small finite fields and the Chinese remainder theorem.
This method is easily parallelized:

\begin{tabbing}
Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $\deg(f_1), \deg(f_2) < 2^M$\\
Output : $f = f_1f_2$ \\
$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where each $m_i$ is an odd prime, \\
\> $2^{M+1} \mid m_i-1$ and $m=\prod m_i$ is sufficiently large \\
\> (every coefficient of $f$ lies in $(-m/2,m/2)$). \\
Partition $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
for \= $j=1$ to $L$ in parallel\\
\> $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
\> Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
\> and $F_j \equiv 0 \bmod m/M_j$.\\
\> (The product is computed by FFT.)\\
return $\phi_m(\sum_{j=1}^{L} F_j)$\\
(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$.)
\end{tabbing}
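The following Python fragment is a small sequential sketch of this scheme under our own assumptions. It replaces the FFT over $GF(m_i)$ by schoolbook multiplication modulo $M_j$, finds the primes $m_i$ by trial division, and runs the loop over $j$ serially, although each iteration is independent and is what one would send to a server. It builds $F_j$ as $e_j \cdot (f_1f_2 \bmod M_j)$, where $e_j = (m/M_j)\cdot((m/M_j)^{-1} \bmod M_j)$ satisfies $e_j \equiv 1 \bmod M_j$ and $e_j \equiv 0 \bmod m/M_j$. The helper names are hypothetical.

```python
from math import prod


def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def fft_primes(M, count):
    """The first `count` primes m with 2^(M+1) | m - 1."""
    step, primes, k = 2 ** (M + 1), [], 1
    while len(primes) < count:
        if is_prime(k * step + 1):
            primes.append(k * step + 1)
        k += 1
    return primes


def mul_mod(f1, f2, mod):
    # coefficient lists, lowest degree first; stands in for the FFT product
    r = [0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            r[i + j] = (r[i + j] + a * b) % mod
    return r


def crt_product(f1, f2, groups):
    """f1*f2 over Z from products modulo M_j = prod(group), one group per server."""
    m = prod(p for g in groups for p in g)
    total = [0] * (len(f1) + len(f2) - 1)
    for g in groups:  # independent iterations: one task per server
        Mj = prod(g)
        c = m // Mj
        e = c * pow(c, -1, Mj)  # e = 1 mod Mj, e = 0 mod m/Mj
        Fj = [e * x for x in mul_mod(f1, f2, Mj)]
        total = [t + x for t, x in zip(total, Fj)]
    half = m // 2  # m is odd, so phi_m maps into [-half, half]
    return [(t + half) % m - half for t in total]


primes = fft_primes(2, 4)              # degrees < 2^2, so 8 | m_i - 1
groups = [primes[0::2], primes[1::2]]  # L = 2 disjoint subsets
print(crt_product([3, -5, 7], [-2, 4, 1, 9], groups))  # [-6, 22, -31, 50, -38, 63]
```

The final symmetric reduction is $\phi_m$; it returns the true coefficients only when $m$ is large enough, which here holds by a wide margin since $m = 17 \cdot 41 \cdot 73 \cdot 89$.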

Figure \ref{speedup}
shows the speedup factor of the above distributed computation
on Risa/Asir. For each $n$, two polynomials of degree $n$
with 3000-bit coefficients are generated and their product is computed.
The machine is a FUJITSU AP3000,
a cluster of Sun workstations connected by a high-speed network,
and OpenXM is implemented on top of MPI over this network.
\begin{figure}[htbp]
\epsfxsize=8.5cm
\epsffile{speedup.ps}
\caption{Speedup factor}
\label{speedup}
\end{figure}

If the number of servers is $L$ and the inputs are fixed, then the cost to
compute the $F_j$ in parallel is $O(1/L)$, whereas the cost
to send and receive polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and
{\tt ox\_pop\_cmo()} are repeatedly applied on the client.
Therefore the speedup is limited, and its upper bound depends on
the ratio between the computational cost and the communication cost
of each unit operation.
Figure \ref{speedup} shows that
the speedup is satisfactory if the degree is large and $L$
is not large, say, at most 10 in the above environment.
If OpenXM provided collective operations for broadcast and reduction,
such as {\tt MPI\_Bcast} and {\tt MPI\_Reduce} respectively, the cost of
sending $f_1$, $f_2$ and gathering the $F_j$ could be reduced to $O(\log_2 L)$,
and we could expect better results. In order to implement
such operations, we need new specifications for inter-server communication
and session management, which will be proposed as OpenXM-RFC 102.
We note that preliminary experiments show that the collective operations
work well on OpenXM.
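The trade-off can be made concrete with a toy cost model, which is our own simplification and not a measurement. We assume that a computation costing {\tt comp} on one machine costs {\tt comp}$/L$ on $L$ servers, that exchanging inputs and results with one server costs {\tt comm}, and that the client either talks to the servers one by one or uses a tree-structured broadcast in which every node already holding the data forwards it to one new node per round. The Python sketch below evaluates both schemes; all names and constants are hypothetical.

```python
def broadcast_rounds(servers):
    """Rounds for a tree broadcast from the client to `servers` servers,
    assuming every node holding the data forwards it to one new node per round."""
    holders, rounds = 1, 0  # only the client holds the data initially
    while holders < servers + 1:
        holders *= 2
        rounds += 1
    return rounds


def speedup_sequential(L, comp, comm):
    # client exchanges data with the L servers one after another: O(L)
    return comp / (comp / L + comm * L)


def speedup_tree(L, comp, comm):
    # collective operations: O(log2 L) communication rounds
    return comp / (comp / L + comm * broadcast_rounds(L))


for L in (1, 5, 10, 20, 40):
    print(L, round(speedup_sequential(L, 100.0, 1.0), 2),
          round(speedup_tree(L, 100.0, 1.0), 2))
```

In this model the sequential scheme has total time ${\tt comp}/L + {\tt comm}\cdot L$, which is minimized at $L=\sqrt{{\tt comp}/{\tt comm}}$ with speedup $\sqrt{{\tt comp}/{\tt comm}}/2$ (here $L=10$ and speedup 5), so the bound indeed depends only on the ratio of the two costs; the tree scheme keeps improving over this range of $L$.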

%\subsubsection{Competitive distributed computation by various strategies}
%
%SINGULAR \cite{Singular} implements {\it MP} interface for distributed
%computation and a competitive Gr\"obner basis computation is
%illustrated as an example of distributed computation.
%Such a distributed computation is also possible on OpenXM as follows:
%
%The client creates two servers and it requests
%Gr\"obner basis computations from the homogenized input and the input itself
%to the servers.
%The client watches the streams by {\tt ox\_select()}
%and the result which is returned first is taken. Then the remaining
%server is reset.
%
%\begin{verbatim}
%/* G:set of polys; V:list of variables */
%/* O:type of order; P0,P1: id's of servers */
%def dgr(G,V,O,P0,P1)
%{
% P = [P0,P1]; /* server list */
% map(ox_reset,P); /* reset servers */
% /* P0 executes non-homogenized computation */
% ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
% /* P1 executes homogenized computation */
% ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
% map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
% F = ox_select(P); /* wait for data */
% /* F[0] is a server's id which is ready */
% R = ox_get(F[0]);
% if ( F[0] == P0 ) {
% Win = "nonhomo"; Lose = P1;
% } else {
% Win = "homo"; Lose = P0;
% }
% ox_reset(Lose); /* reset the loser */
% return [Win,R];
%}
%\end{verbatim}
