A Predictive Model for Solving Small Linear Algebra Problems in GPU Registers

TitleA Predictive Model for Solving Small Linear Algebra Problems in GPU Registers
Publication TypeConference Paper
Year of Publication2012
AuthorsAnderson, M., Sheffield D., & Keutzer K.

We examine the problem of solving many thousands
of small dense linear algebra factorizations simultaneously
on Graphics Processing Units (GPUs). We are interested
in problems ranging from several hundred of rows and columns
to 4 × 4 matrices. Problems of this size are common, especially
in signal processing. However, they have received very
little attention from current numerical linear algebra libraries
for GPUs, which have thus far focused only on very large
problems found in traditional supercomputing applications
and benchmarks. To solve small problems efficiently we tailor
our implementation to the GPUs inverted memory hierarchy
and multi-level parallelism hierarchy. We provide a model
of the GPU memory subsystem that can accurately predict
and explain the performance of our approach across different
problem sizes.
As a motivating example, we look at space-time adaptive
radar processing, a real-time application that requires
hundreds of independent QR factorizations of small complex
matrices (e.g. 240 × 66). For realistic matrix sizes from a
standard radar processing benchmark, our implementation on
an NVIDIA Quadro 6000 GPU runs 2.8× to 25× faster than
Intel’s Math Kernel Library (MKL) on an Intel Core i7-2600.
For the QR factorizations of 5,000 56 × 56 single-precision
matrices, our approach runs 29× faster than MKL and 140× faster than the state-of-the-art linear algebra library for GPUs.
In each of these cases we are using the GPU’s hardwareaccelerated
division and square root functions that are accurate
up to 22 mantissa bits.

A Predictive Model for Solving Small Linear Problems.pdf884.56 KB