Programming at the Efficiency Layer.  We offer two novel program-synthesis techniques that shift some of the burden of library and framework construction from human time to computer time. Sketching [Solar-Lezama and Rabbah 2005] lets programmers specify the skeleton of an optimized algorithm and have the sketching system fill in the missing pieces, so that the completed program is verifiably correct relative to an executable specification of the function. Autotuning [Demmel, Dongarra et al. 2005] uses performance feedback from models and experiments to automatically search a space of proposed implementations of a given function and select the one that is best for a given machine and problem class. Both are described in more detail in Appendix A (Sections A.9 and A.10).
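As a rough illustration of the autotuning idea, the following minimal Python sketch times a small space of candidate implementations on a sample input and keeps the fastest one that agrees with a reference implementation. It uses experiments only (no performance model), and every name in it (`autotune`, `sum_blocked`, the block sizes) is hypothetical and chosen for illustration; it is not the autotuner described in Appendix A.

```python
import time
from functools import partial


def sum_reference(xs):
    """Straightforward reference implementation (plays the role of the specification)."""
    total = 0
    for x in xs:
        total += x
    return total


def sum_blocked(xs, block):
    """Candidate implementation, parameterized by a tunable block size."""
    total = 0
    for i in range(0, len(xs), block):
        total += sum(xs[i:i + block])
    return total


def time_once(fn, data):
    """Wall-clock time of a single call."""
    start = time.perf_counter()
    fn(data)
    return time.perf_counter() - start


def autotune(candidates, sample_input, reference, repeats=5):
    """Return (name, seconds) for the fastest candidate that matches the reference.

    Returns (None, inf) if no candidate produces the reference result.
    """
    expected = reference(sample_input)
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        if fn(sample_input) != expected:
            continue  # discard variants that give the wrong answer
        elapsed = min(time_once(fn, sample_input) for _ in range(repeats))
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name, best_time


# Hypothetical search space: one variant per block size.
CANDIDATES = {f"blocked_{b}": partial(sum_blocked, block=b) for b in (16, 256, 4096)}

if __name__ == "__main__":
    data = list(range(200_000))  # stands in for one "problem class"
    name, seconds = autotune(CANDIDATES, data, sum_reference)
    print(f"selected {name} ({seconds:.6f} s)")
```

Which variant wins depends on the machine and the input size, which is the point: the selection is made by measurement on the target system, not fixed by the programmer in advance.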
Programs in the efficiency layer are written very close to the machine, with the goal of allowing the best possible algorithm to be expressed in the primitives of this layer. Unfortunately, existing multicore and planned manycore systems do not offer a common low-level programming model for parallel code. Using the idea of a software "Research Accelerator for Multi-Processors", we will define a thin portability layer (software RAMP) that runs efficiently across single-socket platforms and provides features for parallel job creation, synchronization, memory allocation, and bulk memory access. To provide a common model of memory across machines with coherent caches, local stores, and relatively slow off-chip memory, we will define an API based on the idea of logically partitioned shared memory. It is inspired by our experience with Unified Parallel C [UPC 2005], which partitions memory between processors but currently not between on-chip and off-chip memory. This efficiency language may be implemented either as a set of runtime primitives or as a language extension of C. It will be extensible with libraries for experimenting with various architectural features, such as transactions, dynamic multithreading, active messages, and collective communication. This API will be implemented on some existing multicore and manycore platforms and on our own emulated manycore design.
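To make the four features listed above concrete, here is a toy Python model of what such an interface might look like: parallel job creation, barrier synchronization, allocation of a logically partitioned shared array, and bulk reads and writes. It is only a sketch under stated assumptions. The names `PartitionedArray`, `run_parallel`, `bulk_get`, and `bulk_put` are hypothetical, the actual layer is envisioned as runtime primitives or a C extension, and the on-chip versus off-chip distinction is not modeled here.

```python
import threading


class PartitionedArray:
    """Shared array logically split into one contiguous partition per worker."""

    def __init__(self, size, num_workers):
        self.data = [0] * size  # "memory allocation"
        chunk = (size + num_workers - 1) // num_workers
        # bounds[w] = (start, end) of the partition owned by worker w
        self.bounds = [
            (min(w * chunk, size), min((w + 1) * chunk, size))
            for w in range(num_workers)
        ]

    def bulk_get(self, start, count):
        """Copy `count` elements starting at `start` out of shared memory."""
        return self.data[start:start + count]

    def bulk_put(self, start, values):
        """Copy a block of values into shared memory starting at `start`."""
        self.data[start:start + len(values)] = values


def run_parallel(num_workers, body):
    """Create one job per worker, pass each a shared barrier, and wait for all."""
    barrier = threading.Barrier(num_workers)
    threads = [
        threading.Thread(target=body, args=(w, barrier))
        for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo(size=8, num_workers=4):
    """Each worker fills its own partition, then bulk-reads its neighbor's."""
    arr = PartitionedArray(size, num_workers)
    neighbor_sums = [0] * num_workers

    def body(worker, barrier):
        lo, hi = arr.bounds[worker]
        arr.bulk_put(lo, [worker] * (hi - lo))  # write to the local partition
        barrier.wait()                          # all writes finish before any remote read
        nlo, nhi = arr.bounds[(worker + 1) % num_workers]
        neighbor_sums[worker] = sum(arr.bulk_get(nlo, nhi - nlo))

    run_parallel(num_workers, body)
    return arr.data, neighbor_sums


if __name__ == "__main__":
    print(demo())
```

The barrier separates the local-write phase from the remote-read phase, which is the kind of explicit synchronization a partitioned shared-memory model leaves to the programmer.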

Projects