% $OpenXM: OpenXM/doc/ascm2001/homogeneous-network.tex,v 1.1 2001/03/07 02:42:10 noro Exp $

\subsection{Distributed computation with homogeneous servers}
\label{section:homog}

One of the aims of OpenXM is parallel speedup through distributed
computation with homogeneous servers. Since the current specification of
OpenXM does not include inter-server communication, the maximal parallel
speedup cannot be expected. Nevertheless, several types of distributed
computation can be executed, as the following examples show.

\subsubsection{Product of univariate polynomials}

Shoup \cite{Shoup} showed that the product of univariate polynomials
with large degrees and large coefficients can be computed efficiently
by FFTs over small finite fields and the Chinese remainder theorem.
This method is easily parallelized:

\begin{tabbing}
Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $\deg(f_1), \deg(f_2) < 2^M$\\
Output : $f = f_1f_2$ \\
$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime, \\
\> $2^{M+1}|m_i-1$ and $m=\prod m_i$ is sufficiently large. \\
Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
\> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
\> (The product is computed by FFT.)\\
return $\phi_m(\sum F_j)$\\
(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$.)
\end{tabbing}
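As an illustrative sketch (ours, not part of the actual implementation), the scheme can be written in Python. The per-modulus products, computed here naively instead of by FFT, are exactly the pieces that would be distributed to the servers; the small moduli below are chosen for readability only, not for the FFT-friendly condition $2^{M+1} \mid m_i-1$.

```python
from math import prod

def polymul_mod(f, g, m):
    """Product of coefficient lists f, g (low degree first) modulo m.
    Naive O(n^2) multiplication; the real algorithm uses an FFT here."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % m
    return h

def crt_product(f, g, moduli):
    """Compute f*g over Z by CRT from products modulo pairwise-coprime moduli."""
    m = prod(moduli)
    h = [0] * (len(f) + len(g) - 1)
    for mi in moduli:            # each iteration would run on its own server
        hi = polymul_mod(f, g, mi)
        # e_i = (m/mi) * ((m/mi)^(-1) mod mi) is 1 mod mi and 0 mod m/mi,
        # so hi * e_i plays the role of F_j in the algorithm above
        ei = (m // mi) * pow(m // mi, -1, mi)
        for k, c in enumerate(hi):
            h[k] += c * ei
    # reduce each coefficient into the symmetric range (-m/2, m/2): phi_m
    return [((c + m // 2) % m) - m // 2 for c in h]
```

The final list comprehension implements $\phi_m$, recovering possibly negative coefficients from their residues.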

Figure \ref{speedup}
shows the speedup factor achieved by the above distributed computation
on Risa/Asir. For each $n$, two polynomials of degree $n$
with 3000-bit coefficients are generated and their product is computed.
The machine is a FUJITSU AP3000,
a cluster of Sun workstations connected by a high-speed network,
and OpenXM is implemented over MPI on this network.
\begin{figure}[htbp]
\epsfxsize=8.5cm
\epsffile{speedup.ps}
\caption{Speedup factor}
\label{speedup}
\end{figure}

If the number of servers is $L$ and the inputs are fixed, the cost of
computing the $F_j$ in parallel is $O(1/L)$, whereas the cost of
sending and receiving the polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and
{\tt ox\_pop\_cmo()} are applied repeatedly on the client.
Therefore the speedup is limited, and its upper bound depends on the
ratio of the computational cost to the communication cost per unit
operation. Figure \ref{speedup} shows that
the speedup is satisfactory if the degree is large and $L$
is not large, say up to 10, in the above environment.
If OpenXM provides collective operations for broadcast and reduction
such as {\tt MPI\_Bcast} and {\tt MPI\_Reduce} respectively, the cost of
sending $f_1$, $f_2$ and gathering the $F_j$ may be reduced to $O(\log_2 L)$,
and better results can be expected in such a case. In order to implement
such operations we need new specifications for inter-server communication
and session management; these will be proposed as OpenXM-RFC-102 in the future.
We note that preliminary experiments show that such collective operations
work well on OpenXM.
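The shape of this tradeoff can be seen with a toy cost model (our own illustration in arbitrary units, not measured data): if computation costs $T/L$ and point-to-point communication costs $cL$, the speedup $T/(T/L+cL)$ peaks at $L=\sqrt{T/c}$ and then degrades, whereas an $O(\log_2 L)$ collective cost degrades it much more slowly.

```python
from math import log2

def speedup_p2p(T, c, L):
    """Speedup when computation costs T/L and point-to-point
    communication costs c*L (the O(1/L) vs O(L) tradeoff)."""
    return T / (T / L + c * L)

def speedup_collective(T, c, L):
    """Speedup when communication is a collective operation
    costing c*log2(L) instead of c*L."""
    return T / (T / L + c * log2(L))

# With T = 100 and c = 1, the point-to-point speedup peaks at
# L = sqrt(100) = 10 servers and decreases beyond that.
best_L = max(range(1, 50), key=lambda L: speedup_p2p(100, 1, L))
```

The particular values of $T$ and $c$ are arbitrary; only the asymptotic shapes reflect the discussion above.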

\subsubsection{Competitive distributed computation by various strategies}

SINGULAR \cite{Singular} implements the {\it MP} interface for distributed
computation, and a competitive Gr\"obner basis computation is
illustrated as an example of distributed computation.
Such a distributed computation is also possible on OpenXM.
The following Risa/Asir function computes a Gr\"obner basis by
starting the computation simultaneously from the homogenized input and
from the input itself. The client watches the streams by {\tt ox\_select()},
and the result which is returned first is taken. Then the remaining
server is reset.

\begin{verbatim}
/* G:set of polys; V:list of variables */
/* O:type of order; P0,P1: id's of servers */
def dgr(G,V,O,P0,P1)
{
  P = [P0,P1];     /* server list */
  map(ox_reset,P); /* reset servers */
  /* P0 executes non-homogenized computation */
  ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
  /* P1 executes homogenized computation */
  ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
  map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
  F = ox_select(P);       /* wait for data */
  /* F[0] is a server's id which is ready */
  R = ox_get(F[0]);
  if ( F[0] == P0 ) {
    Win = "nonhomo"; Lose = P1;
  } else {
    Win = "homo"; Lose = P0;
  }
  ox_reset(Lose); /* reset the loser */
  return [Win,R];
}
\end{verbatim}

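The first-result-wins pattern of {\tt dgr()} can be mimicked in Python with standard concurrency primitives (a sketch by analogy only; the {\tt compete()} helper and the thread pool are ours, standing in for {\tt ox\_select()} and the OpenXM servers):

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def compete(strategies, task):
    """Run one task under several competing strategies; return the label
    and result of whichever finishes first.  Cancelling the still-pending
    futures plays the role of ox_reset() on the losing server."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = {pool.submit(f, task): label
                   for label, f in strategies.items()}
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for p in pending:
            p.cancel()                  # "reset the loser"
        winner = done.pop()
        return futures[winner], winner.result()
```

For example, {\tt compete(\{"homo": strategy1, "nonhomo": strategy2\}, input)} returns the label and output of the faster strategy, just as {\tt dgr()} returns {\tt [Win,R]}.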
\subsubsection{Nesting of client-server communication}

Under OpenXM-RFC-100, an OpenXM server can be a client of other servers.
Figure \ref{tree} illustrates a tree-like structure of such
client-server communication.
\begin{figure}
\begin{center}
\begin{picture}(200,140)(0,0)
\put(70,120){\framebox(40,15){client}}
\put(20,60){\framebox(40,15){server}}
\put(70,60){\framebox(40,15){server}}
\put(120,60){\framebox(40,15){server}}
\put(0,0){\framebox(40,15){server}}
\put(50,0){\framebox(40,15){server}}
\put(135,0){\framebox(40,15){server}}

\put(90,120){\vector(-1,-1){43}}
\put(90,120){\vector(0,-1){43}}
\put(90,120){\vector(1,-1){43}}
\put(40,60){\vector(-1,-2){22}}
\put(40,60){\vector(1,-2){22}}
\put(140,60){\vector(1,-3){14}}
\end{picture}
\caption{Tree-like structure of client-server communication}
\label{tree}
\end{center}
\end{figure}
Such a computational model is useful for parallel implementation of
algorithms whose tasks can be divided into subtasks recursively. A
typical example is {\it quicksort}, where an array to be sorted is
partitioned into two sub-arrays and the algorithm is applied to each
sub-array. At each level of recursion, two subtasks are generated,
and one can ask other OpenXM servers to execute them. Though
this makes little contribution to efficiency, it is worth
showing that such an attempt is very easy under OpenXM.
Here is an Asir program.
A predefined constant {\tt LevelMax} determines
whether new servers are launched or the whole subtasks are done on the
current server.

\begin{verbatim}
#define LevelMax 2
extern Proc1, Proc2;
Proc1 = -1$ Proc2 = -1$

/* sort [A[P],...,A[Q]] by quicksort */
def quickSort(A,P,Q,Level) {
  if (Q-P < 1) return A;
  Mp = idiv(P+Q,2); M = A[Mp]; B = P; E = Q;
  while (1) {
    while (A[B] < M) B++;
    while (A[E] > M && B <= E) E--;
    if (B >= E) break;
    else { T = A[B]; A[B] = A[E]; A[E] = T; E--; }
  }
  if (E < P) E = P;
  if (Level < LevelMax) {
    /* launch new servers if necessary */
    if (Proc1 == -1) Proc1 = ox_launch(0);
    if (Proc2 == -1) Proc2 = ox_launch(0);
    /* send the requests to the servers */
    ox_rpc(Proc1,"quickSort",A,P,E,Level+1);
    ox_rpc(Proc2,"quickSort",A,E+1,Q,Level+1);
    if (E-P < Q-E) {
      A1 = ox_pop_local(Proc1);
      A2 = ox_pop_local(Proc2);
    } else {
      A2 = ox_pop_local(Proc2);
      A1 = ox_pop_local(Proc1);
    }
    for (I=P; I<=E; I++) A[I] = A1[I];
    for (I=E+1; I<=Q; I++) A[I] = A2[I];
    return(A);
  } else {
    /* everything is done on this server */
    quickSort(A,P,E,Level+1);
    quickSort(A,E+1,Q,Level+1);
    return(A);
  }
}
\end{verbatim}

Another example is a parallelization of the Cantor-Zassenhaus
algorithm for polynomial factorization over finite fields. Its
fundamental structure is similar to that of quicksort. By choosing a
random polynomial, a polynomial is divided into two sub-factors with
some probability, and each sub-factor is then factorized recursively. In
the following program, one of the two sub-factors generated on a server
is sent to another server, and the other sub-factor is factorized on the
server itself.
\begin{verbatim}
/* factorization of F */
/* E = degree of irreducible factors in F */
def c_z(F,E,Level)
{
  V = var(F); N = deg(F,V);
  if ( N == E ) return [F];
  M = field_order_ff(); K = idiv(N,E); L = [F];
  while ( 1 ) {
    W = monic_randpoly_ff(2*E,V);
    T = generic_pwrmod_ff(W,F,idiv(M^E-1,2));
    if ( !(W = T-1) ) continue;
    G = ugcd(F,W);
    if ( deg(G,V) && deg(G,V) < N ) {
      if ( Level >= LevelMax ) {
        /* everything is done on this server */
        L1 = c_z(G,E,Level+1);
        L2 = c_z(sdiv(F,G),E,Level+1);
      } else {
        /* launch a server if necessary */
        if ( Proc1 < 0 ) Proc1 = ox_launch();
        /* send a request with Level = Level+1 */
        ox_cmo_rpc(Proc1,"ox_c_z",lmptop(G),E,
                   setmod_ff(),Level+1);
        /* the rest is done on this server */
        L2 = c_z(sdiv(F,G),E,Level+1);
        L1 = map(simp_ff,ox_pop_cmo(Proc1));
      }
      return append(L1,L2);
    }
  }
}
\end{verbatim}
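
For reference, the splitting step performed by {\tt c\_z()} can be sketched in Python over a small prime field (a toy sequential version with naive polynomial arithmetic; the helper names and the coefficient-list representation are ours):

```python
import random

def pmul(f, g, p):
    """Product of coefficient lists (low degree first) over GF(p)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return h

def pdivmod(f, g, p):
    """Quotient and remainder of f by g over GF(p)."""
    f = [c % p for c in f]
    q = [0] * max(len(f) - len(g) + 1, 1)
    inv = pow(g[-1] % p, -1, p)
    for s in range(len(f) - len(g), -1, -1):
        c = f[s + len(g) - 1] * inv % p
        q[s] = c
        for i, gc in enumerate(g):
            f[s + i] = (f[s + i] - c * gc) % p
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return q, f

def pgcd(f, g, p):
    """Monic gcd over GF(p) by the Euclidean algorithm."""
    while any(g):
        f, g = g, pdivmod(f, g, p)[1]
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]

def ppowmod(w, e, f, p):
    """w^e mod f over GF(p) by repeated squaring."""
    r = [1]
    w = pdivmod(w, f, p)[1]
    while e:
        if e & 1:
            r = pdivmod(pmul(r, w, p), f, p)[1]
        w = pdivmod(pmul(w, w, p), f, p)[1]
        e >>= 1
    return r

def cantor_zassenhaus(f, e, p):
    """Split monic squarefree f, all of whose irreducible factors
    have degree e, into those factors (p an odd prime)."""
    n = len(f) - 1
    if n == e:
        return [f]
    while True:
        # random monic polynomial of degree 2e, as in c_z()
        w = [random.randrange(p) for _ in range(2 * e)] + [1]
        t = ppowmod(w, (p ** e - 1) // 2, f, p)
        t[0] = (t[0] - 1) % p      # t = w^((p^e-1)/2) - 1 mod f
        if not any(t):
            continue
        g = pgcd(f, t, p)
        if 0 < len(g) - 1 < n:
            # in the distributed version one of these recursive calls
            # is shipped to another server via ox_cmo_rpc
            return cantor_zassenhaus(g, e, p) + \
                   cantor_zassenhaus(pdivmod(f, g, p)[0], e, p)
```

Each trial splits $f$ with probability roughly $1/2$, so the expected number of iterations per split is small; the two recursive calls are the subtasks that the Asir program distributes.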