An Auto-Tuning Framework for Parallel Multicore Stencil Computations

Title: An Auto-Tuning Framework for Parallel Multicore Stencil Computations
Publication Type: Journal Article
Year of Publication: 2010
Authors: Kamil, S., Chan, C., Oliker, L., Shalf, J., & Williams, S.

Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addition, the large variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a library. This work presents a stencil auto-tuning framework that significantly advances programmer productivity by automatically converting a straightforward sequential Fortran 95 stencil expression into tuned parallel implementations in Fortran, C, or CUDA, thus allowing performance portability across diverse computer architectures, including the AMD Barcelona, Intel Nehalem, Sun Victoria Falls, and the latest NVIDIA GPUs. Results show that our generalized methodology delivers significant performance gains of up to 22× speedup over the reference serial implementation. Overall we demonstrate that such domain-specific auto-tuners hold enormous promise for architectural efficiency, programmer productivity, performance portability, and algorithmic adaptability on existing and emerging multicore systems.
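
To make the idea of stencil auto-tuning concrete, here is a minimal sketch in Python. It is not the framework described in the abstract, which takes Fortran 95 input and generates Fortran, C, or CUDA. The sketch assumes a 5-point 2D stencil and tunes a single parameter (the cache-block size) by timing each candidate and keeping the fastest. All function names and the candidate block sizes are hypothetical and chosen only for illustration.

```python
import time


def laplacian_reference(grid):
    """Naive 5-point stencil over the interior of a 2D list-of-lists grid."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]
                         - 4.0 * grid[i][j])
    return out


def laplacian_blocked(grid, block):
    """Same stencil, traversed in block x block tiles (cache blocking)."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, block):
        for jj in range(1, m - 1, block):
            # Clamp each tile to the interior of the grid.
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, m - 1)):
                    out[i][j] = (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][j - 1] + grid[i][j + 1]
                                 - 4.0 * grid[i][j])
    return out


def autotune(grid, candidates=(4, 8, 16, 32), repeats=3):
    """Time each candidate block size and return the fastest one."""
    best_block, best_time = None, float("inf")
    for block in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            laplacian_blocked(grid, block)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block
```

Calling `autotune(grid)` on a representative grid returns the block size that ran fastest on the current machine. The result depends on the hardware and on timing noise, which is why a search of this kind is repeated on each target platform. A production auto-tuner would search a much larger space and generate compiled code for each variant.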
