- 2007-04-20: Get HYPRE compiling on my system
- 2007-04-23: copy down the notes I got from Luke when I talked with him eariler.
- 2007-04-26: read multigrid stuff Luke has given us in class
- 2007-04-28: get the HYPRE examples working on the Turing cluster.
Reference
Organization of the paper
- motivation
- methods
- numerics
- conclusion
Analysis of parallel algebraic multigrid methods in theory and in practice: a survey and analysis of the Hypre framework.
- motivation
- multigrid has good time and space complexity in theory. Therefore, we want to be able t outilize it on a parallel computer.
- survey of the literature
- domain decomposition
- what is it?
- other alternatives
- fault tolerance
- easing the implementation of some of these matrix systems
- ghost rows?
- constructing coarse grids
- multi-grid book and other sources say that constructing the grids can take as long as actually constructing the matrix
- what's worse, the traditional RS implementation is a recursive and sequential algorithm, so a naive implemenation is even slower in parallel.
- two main familes of methods
- adaptations of RS
- maximal independent set methods
- triple matrix product
- restriction and prolongation implemented as matrix vector products in Hypre
- implementation of sparse matrix vector product in Hypre
- unknown if there are better implementations.
- also unknown if better methods can be written due to formula in original RS paper
- perhaps the original course grid connectiviity information could help?
- does this formula differ from matrix transposition?
- parallel smoothers
- hypre seems to choose hybrid methods for convenience.
- coarse grid parallelism
- V-cycles preferred because they offer better parallelism
- methods for study
- Examined both Hypre, compared to knowledge already obtained through study of petsc
- decided on hypre because the implementation was simple and easy to experiment with
- Looked at various profiling tools, examined their strengths and weaknesses.
- mpe
- Tau
- eventually decided to roll my own.
- decided to track all the points of interest:
- time spent in coarsening
- to test if the parallel performance of coarsening schemes was as bad as was predicted
- time spent in the triple matrix product
- used to test the speed of this operation in comparison to the coarsening
- time spent in various phases of the V cycle
- needed to test the time tradeoffs between various V-cycle solves, and to see if parallelization degraded at coarser levels
- conclusions
- necessity of a general krylov solver standard
- hypre is successful because it is simple and easy to plug into existing code bases
- PETSc is not simple, but very well designed and organized.
- Krylov methods feed off one another. AMG can be implemented as a preconditioner, and it takes other iterative methods as arguments. There is clearly a need for a united Krylov framework.
- In this author's experience, people don't want to link against a big framework. How can these contradicting goals be united?
- both PETSC and hypre use the same techniques to implement the equations.
Experiments with parallel multigrid algorithms using the suprenum communication library Hempel and Schuller
Componets of AMG in parallel
- setup
- triple matrix product
- give results on how hypre does it
- Compute the local transpose for the R
- Do a matrix vector transpose to find out where the nonzeros in the corse grid are
- repeat the process again, actually compute the values
- for C/F grids, RS possibly has a more effective implementation
- matrix summation of A(N(C),N(D)). Therefore, for most reasonable smoothers we can compute the local parts while waiting for values to propagate.
- is this done in practice? Not in Hypre, as far as I can tell.
- picking course grid points
- David's results
- PMIS, CJLP, David's improvements.
- other results in the summary paper.
- solve
- efficient use of idle time on corse grids
- alternative formulations that use multiple grids, combine the results.
- efficient comunication construction of matrix vector multiplication
- O(p) MPI_ALLGATHER, DTD systems
- efficient matrix transpose vector multiplication
- inverse of communication patterns for matrix vector
- minimization of communication?
- grid methods versus domain decomposition
- parmetis usually used
- research that parmetis may not be best, hypergraphs could be better.
- efficient decomposition of matricies and vectors
- standard "ownership" model
- other versions? Caching of data? I didn't see anything.
- possible duplication of matrix values? Possible source of improvement? Couldn't find anything, not refeenced.
- Parallel SC in March + petascale iterative solvers paper
- setup *solve
- load balancing
- parallel triple matrix products?
Focus paper on components of AMG in parallel.
Topic revision: r7 - 05 May 2007 - 07:23:41 -
RobBlake