% $OpenXM$

\subsection{Distributed computation with homogeneous servers}
\label{section:homog}

One of the aims of OpenXM is to achieve parallel speedup by distributed
computation with homogeneous servers. Since the current specification of
OpenXM does not include communication between servers, maximal parallel
speedup cannot be expected. Nevertheless, several types of distributed
computation can be executed, as the following examples show.

\subsubsection{Product of univariate polynomials}

Shoup \cite{Shoup} showed that the product of univariate polynomials
with large degrees and large coefficients can be computed efficiently
by FFTs over small finite fields combined with the Chinese remainder
theorem, and this method can easily be parallelized.
%
%\begin{tabbing}
%Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $deg(f_1), deg(f_2) < 2^M$\\
%Output : $f = f_1f_2$ \\
%$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime, \\
%\> $2^{M+1}|m_i-1$ and $m=\prod m_i $ is sufficiently large. \\
%Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
%for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
%Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
%\> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
%\> (The product is computed by FFT.)\\
%return $\phi_m(\sum F_j)$\\
%(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
%\end{tabbing}
%
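
For illustration, the client-side dispatch in this method can be written
as a short Asir program like the following sketch.  It assumes that each
server registers a function, here called {\tt mulmod\_fft}, returning
the product of the two inputs modulo its assigned product of FFT primes,
and that a client-side function {\tt crt\_combine} performs the final
Chinese remaindering; both names are hypothetical and are not part of
OpenXM or Risa/Asir.
\begin{verbatim}
/* Sketch only: Procs is a list of server id's created by ox_launch(),
   F1 and F2 are the input polynomials, and Mods is the list of the
   products of primes assigned to the servers, one entry per server. */
def parallel_mul(Procs,F1,F2,Mods)
{
  N = length(Procs); R = [];
  /* send one request to each server; ox_cmo_rpc() does not wait */
  for ( I = 0; I < N; I++ )
    ox_cmo_rpc(Procs[I],"mulmod_fft",F1,F2,Mods[I]);
  /* collect the modular images; ox_pop_cmo() waits for each result */
  for ( I = 0; I < N; I++ )
    R = append(R,[ox_pop_cmo(Procs[I])]);
  /* recover the integer coefficients by Chinese remaindering */
  return crt_combine(R,Mods);
}
\end{verbatim}
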
Figure \ref{speedup}
shows the speedup factor obtained by the above distributed computation
on Risa/Asir. For each $n$, two polynomials of degree $n$
with 3000-bit coefficients are generated and their product is computed.
The machine is a FUJITSU AP3000,
a cluster of Sun workstations connected by a high-speed network,
and OpenXM is implemented on top of MPI over this network.
\begin{figure}[htbp]
\epsfxsize=10cm
\epsffile{speedup.ps}
\caption{Speedup factor}
\label{speedup}
\end{figure}
If the number of servers is $L$ and the inputs are fixed, then the cost
of computing the products modulo the chosen integers in parallel is
$O(1/L)$, whereas the cost of sending and receiving the polynomials is
$O(L)$ if {\tt ox\_push\_cmo()} and {\tt ox\_pop\_cmo()} are applied
repeatedly on the client.
Therefore the speedup is limited, and the upper bound of
the speedup factor depends on the ratio of
the computational cost to the communication cost per unit operation.
Figure \ref{speedup} shows that
the speedup is satisfactory if the degree is large and $L$
is not too large, say up to 10 in the above environment.
If OpenXM provides collective operations for broadcast and reduction,
such as {\tt MPI\_Bcast} and {\tt MPI\_Reduce} respectively, the cost of
broadcasting the inputs and gathering the results from the servers
can be reduced to $O(\log_2 L)$,
and better results can be expected in that case. In order to implement
such operations, we need new specifications for inter-server
communication and session management, which will be proposed as
OpenXM-RFC 102.
We note that preliminary experiments show that the collective
operations work well on OpenXM.

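The above observations can be summarized by a rough cost model, with
constant factors ignored (the quantities $T_{comp}$ and $T_{comm}$ are
introduced here only for this estimate): if $T_{comp}$ denotes the time
of the sequential computation and $T_{comm}$ the time to transfer one
polynomial between the client and a server, the elapsed time with $L$
servers behaves like
\[
T(L) \approx \frac{T_{comp}}{L} + T_{comm}\,L ,
\]
which is minimized around $L \approx \sqrt{T_{comp}/T_{comm}}$; with
tree-structured collective operations the second term would be replaced
by one proportional to $T_{comm}\log_2 L$.
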
%\subsubsection{Competitive distributed computation by various strategies}
%
%SINGULAR \cite{Singular} implements the {\it MP} interface for distributed
%computation, and a competitive Gr\"obner basis computation is
%illustrated as an example of distributed computation.
%Such a distributed computation is also possible on OpenXM as follows:
%
%The client creates two servers and requests
%Gr\"obner basis computations of the homogenized input and of the input
%itself from the two servers.
%The client watches the streams by {\tt ox\_select()}
%and the result which is returned first is taken. Then the remaining
%server is reset.
%
%\begin{verbatim}
%/* G:set of polys; V:list of variables */
%/* O:type of order; P0,P1: id's of servers */
%def dgr(G,V,O,P0,P1)
%{
%  P = [P0,P1]; /* server list */
%  map(ox_reset,P); /* reset servers */
%  /* P0 executes non-homogenized computation */
%  ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
%  /* P1 executes homogenized computation */
%  ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
%  map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
%  F = ox_select(P); /* wait for data */
%  /* F[0] is a server's id which is ready */
%  R = ox_get(F[0]);
%  if ( F[0] == P0 ) {
%    Win = "nonhomo"; Lose = P1;
%  } else {
%    Win = "homo"; Lose = P0;
%  }
%  ox_reset(Lose); /* reset the loser */
%  return [Win,R];
%}
%\end{verbatim}

\subsubsection{Nesting of client-server communication}

Under OpenXM-RFC 100, an OpenXM server can be a client of other servers.
Figure \ref{tree} illustrates a tree-like structure of OpenXM
client-server communication.

\begin{figure}[htbp]
\begin{center}
\begin{picture}(200,140)(0,0)
\put(70,120){\framebox(40,15){client}}
\put(20,60){\framebox(40,15){server}}
\put(70,60){\framebox(40,15){server}}
\put(120,60){\framebox(40,15){server}}
\put(0,0){\framebox(40,15){server}}
\put(50,0){\framebox(40,15){server}}
\put(135,0){\framebox(40,15){server}}

\put(90,120){\vector(-1,-1){43}}
\put(90,120){\vector(0,-1){43}}
\put(90,120){\vector(1,-1){43}}
\put(40,60){\vector(-1,-2){22}}
\put(40,60){\vector(1,-2){22}}
\put(140,60){\vector(1,-3){14}}
\end{picture}
\caption{Tree-like structure of client-server communication}
\label{tree}
\end{center}
\end{figure}

Such a computational model is useful for the parallel implementation of
algorithms whose tasks can be divided recursively into subtasks.

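As a minimal sketch of this pattern (not part of OpenXM itself), the
following Asir fragment delegates one half of a recursively divided task
to a newly launched server while the other half is processed locally;
the functions {\tt split}, {\tt merge}, {\tt solve\_directly} and the
server-side wrapper {\tt ox\_solve} are hypothetical problem-specific
routines.
\begin{verbatim}
#define LevelMax 2
/* Sketch only: split(), merge(), solve_directly() and the server
   side wrapper "ox_solve" are hypothetical. */
def solve(Problem,Level)
{
  if ( Level >= LevelMax )
    return solve_directly(Problem);
  Sub = split(Problem);       /* Sub = [Sub1,Sub2] */
  P = ox_launch();            /* this server now acts as a client */
  /* delegate the first subtask; ox_cmo_rpc() does not wait */
  ox_cmo_rpc(P,"ox_solve",Sub[0],Level+1);
  R1 = solve(Sub[1],Level+1); /* solve the second subtask locally */
  R0 = ox_pop_cmo(P);         /* collect the delegated result */
  return merge(R0,R1);
}
\end{verbatim}
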
%A typical example is {\it quicksort}, where an array to be sorted is
%partitioned into two sub-arrays and the algorithm is applied to each
%sub-array. In each level of recursion, two subtasks are generated
%and one can ask other OpenXM servers to execute them.
%Though it makes little contribution to the efficiency in the case of
%quicksort, we present an Asir program of this distributed quicksort
%to demonstrate that OpenXM gives an easy way to test this algorithm.
%In the program, a predefined constant {\tt LevelMax} determines
%whether new servers are launched or whole subtasks are done on the server.
%
%\begin{verbatim}
%#define LevelMax 2
%extern Proc1, Proc2;
%Proc1 = -1$ Proc2 = -1$
%
%/* sort [A[P],...,A[Q]] by quicksort */
%def quickSort(A,P,Q,Level) {
%  if (Q-P < 1) return A;
%  Mp = idiv(P+Q,2); M = A[Mp]; B = P; E = Q;
%  while (1) {
%    while (A[B] < M) B++;
%    while (A[E] > M && B <= E) E--;
%    if (B >= E) break;
%    else { T = A[B]; A[B] = A[E]; A[E] = T; E--; }
%  }
%  if (E < P) E = P;
%  if (Level < LevelMax) {
%   /* launch new servers if necessary */
%   if (Proc1 == -1) Proc1 = ox_launch(0);
%   if (Proc2 == -1) Proc2 = ox_launch(0);
%   /* send the requests to the servers */
%   ox_rpc(Proc1,"quickSort",A,P,E,Level+1);
%   ox_rpc(Proc2,"quickSort",A,E+1,Q,Level+1);
%   if (E-P < Q-E) {
%     A1 = ox_pop_local(Proc1);
%     A2 = ox_pop_local(Proc2);
%   }else{
%     A2 = ox_pop_local(Proc2);
%     A1 = ox_pop_local(Proc1);
%   }
%   for (I=P; I<=E; I++) A[I] = A1[I];
%   for (I=E+1; I<=Q; I++) A[I] = A2[I];
%   return(A);
%  }else{
%   /* everything is done on this server */
%   quickSort(A,P,E,Level+1);
%   quickSort(A,E+1,Q,Level+1);
%   return(A);
%  }
%}
%\end{verbatim}

A typical example is a parallelization of the Cantor-Zassenhaus
algorithm for polynomial factorization over finite fields,
which is a recursive algorithm.
At each level of the recursion, a given polynomial can be
divided into two non-trivial factors with some probability by using
a randomly generated polynomial as a {\it separator}.
We can apply the following simple parallelization:
when two non-trivial factors are generated on a server,
one factor is sent to another server and the other is factorized
on the server itself.
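More precisely, in the equal-degree setting where all irreducible
factors of $F$ have the same degree $E$ over a finite field with $M$
elements ($M$ odd), the separator is a random monic polynomial $W$ and
the candidate factor is
\[
G = \gcd\left(F,\ W^{(M^E-1)/2}-1 \bmod F\right),
\]
which is a non-trivial factor of $F$ with probability at least about
$1/2$, so a few random choices suffice at each level of the recursion.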
%\begin{verbatim}
%/* factorization of F */
%/* E = degree of irreducible factors in F */
%def c_z(F,E,Level)
%{
%  V = var(F); N = deg(F,V);
%  if ( N == E ) return [F];
%  M = field_order_ff(); K = idiv(N,E); L = [F];
%  while ( 1 ) {
%    /* generate a random polynomial */
%    W = monic_randpoly_ff(2*E,V);
%    /* compute a power of the random polynomial */
%    T = generic_pwrmod_ff(W,F,idiv(M^E-1,2));
%    if ( !(W = T-1) ) continue;
%    /* G = GCD(F, W^((M^E-1)/2)-1 mod F) */
%    G = ugcd(F,W);
%    if ( deg(G,V) && deg(G,V) < N ) {
%      /* G is a non-trivial factor of F */
%      if ( Level >= LevelMax ) {
%        /* everything is done on this server */
%        L1 = c_z(G,E,Level+1);
%        L2 = c_z(sdiv(F,G),E,Level+1);
%      } else {
%        /* launch a server if necessary */
%        if ( Proc1 < 0 ) Proc1 = ox_launch();
%        /* send a request with Level = Level+1 */
%        /* ox_c_z is a wrapper of c_z on the server */
%        ox_cmo_rpc(Proc1,"ox_c_z",lmptop(G),E,
%            setmod_ff(),Level+1);
%        /* the rest is done on this server */
%        L2 = c_z(sdiv(F,G),E,Level+1);
%        L1 = map(simp_ff,ox_pop_cmo(Proc1));
%      }
%      return append(L1,L2);
%    }
%  }
%}
%\end{verbatim}