Automatic Performance Tuning (Autotuning)

Autotuning uses a combination of empirical search and performance modeling to create highly optimized libraries tailored to specific machines. We will identify critical libraries based on application needs and build autotuners in the efficiency layer. We will also develop fundamental technology, including optimization and code generation strategies for manycore, search algorithms, and performance models that guide the search and help identify hardware bottlenecks. We will leverage existing autotuners, both our own and others', adding support for novel hardware features, and build new autotuners where none exist. We will expand our tuning activities in the BeBOP and LAPACK [Anderson, Bai et al. 1995] groups across several dwarfs: linear algebra, regular and irregular meshes, and collective communication routines. Priorities will come from application needs. For example, the health application requires repeated sparse matrix-vector multiplies, which can be treated as a single operation to avoid repeated reads of the matrix. The resulting code is very complicated, so we plan to use sketching to fill in pieces of the optimized version.
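
As a rough illustration of the empirical-search half of autotuning, the following minimal Python sketch times one kernel under several values of a tuning parameter and keeps the fastest. It is not the BeBOP or LAPACK tuning code: the names blocked_matvec and autotune, the dense matrix-vector kernel, the candidate tile widths, and the exhaustive search are all assumptions made for illustration. A production autotuner would generate compiled code variants and would normally use a performance model to prune the search.

```python
import time


def blocked_matvec(a, x, block):
    """Dense y = a * x, visiting the columns in tiles of width `block`.

    Hypothetical kernel used only to give the tuner something to tune.
    """
    n_rows, n_cols = len(a), len(x)
    y = [0.0] * n_rows
    for j0 in range(0, n_cols, block):
        j1 = min(j0 + block, n_cols)
        for i in range(n_rows):
            row = a[i]
            acc = 0.0
            for j in range(j0, j1):
                acc += row[j] * x[j]
            y[i] += acc
    return y


def autotune(kernel, args, candidates, repeats=3):
    """Exhaustive empirical search over `candidates`.

    Runs kernel(*args, c) `repeats` times for each candidate c, records the
    best wall-clock time, and returns (fastest candidate, all timings).
    """
    timings = {}
    for c in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(*args, c)
            best = min(best, time.perf_counter() - start)
        timings[c] = best
    best_param = min(timings, key=timings.get)
    return best_param, timings


# Small integer-valued test problem so results are exact in floating point.
n = 64
a = [[float((i * j) % 7) for j in range(n)] for i in range(n)]
x = [1.0] * n

candidates = [4, 16, 64]  # assumed tile widths, chosen arbitrarily
best_block, timings = autotune(blocked_matvec, (a, x), candidates)
print("fastest tile width on this machine:", best_block)
```

The selected tile width depends on the machine and on timing noise, which is the point of searching empirically rather than fixing the parameter in advance. Pure Python timings are not representative of a compiled kernel.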