James Demmel
Publications
- 2013, Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication
- 2013, Graph Expansion and Communication Costs of Algorithms
- 2013, Perfect Strong Scaling Using No Additional Energy
- 2013, Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures
- 2013, Communication Efficient Gaussian Elimination with Partial Pivoting using a Shape Morphing Data Layout
- 2013, Communication Costs of Strassen's Matrix Multiplication
- 2012, Communication-Avoiding Parallel Strassen: Implementation and Performance
- 2012, Communication-Optimal Parallel Algorithm for Strassen’s Matrix Multiplication
- 2012, Exploiting Low-Rank Structure in Computing Matrix Powers with Applications to Preconditioning
- 2012, Communication-Optimal Parallel Algorithm for Strassen's Matrix Multiplication
- 2012, LU Factorization with Panel Rank Revealing Pivoting and its Communication Avoiding Version
- 2012, Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Lower Bounds
- 2012, Matrix Multiplication on Multidimensional Torus Networks
- 2012, Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication
- 2011, Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
- 2011, BEST PAPER AWARD: Graph Expansion and Communication Costs of Fast Matrix Multiplication
- 2011, Brief Announcement: Communication Bounds for Heterogeneous Architectures
- 2011, DISTINGUISHED PAPER: Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms
- 2011, Improving Communication Performance in Dense Linear Algebra via Topology Aware Collectives
- 2011, Rethinking Algorithms for Future Architectures: Communication-Avoiding Algorithms
- 2011, Avoiding Communication in Two-Sided Krylov Subspace Methods
- 2010, Brief Announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem
- 2010, SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization
- 2010, CALU: A Communication Optimal LU Factorization Algorithm
- 2009, Minimizing Communication in Sparse Matrix Solvers
- 2009, A View of the Parallel Computing Landscape
- 2009, SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization
- 2009, Communication Optimal Parallel and Sequential Cholesky Factorization
- 2009, Accelerating Time-to-Solution for Computational Science and Engineering
- 2009, Accelerating Time-to-Solution for Computational Science and Engineering
- 2009, Minimizing Communication in Linear Algebra
- 2009, Communication-Optimal Parallel and Sequential Eigenvalue/SVD Algorithms
- 2008, Benchmarking GPUs to tune dense linear algebra
- 2008, Communication-Avoiding Gaussian Elimination
- 2008, LU, QR and Cholesky factorizations using vector capabilities of GPUs
- 2008, Avoiding Communication in Sparse Matrix Computations
- 2008, The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View
- 2008, Communication-optimal parallel and sequential QR and LU factorizations
- 2007, Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms