Publications
Köksal, A. S., Pu Y., Srivastava S., Bodik R., Fisher J., & Piterman N.
(2013). Synthesis of Biological Models from Mutation Experiments.
POPL 2013.
Battenberg, E., Huang V., & Wessel D.
(2013). Live Drum Separation Using Probabilistic Spectral Clustering Based on the Itakura-Saito Divergence.
45th Conference of the Audio Engineering Society.
Beamer, S., Buluc A., Asanović K., & Patterson D.
(2013). Distributed Memory Breadth-First Search Revisited: Enabling Bottom-Up Search.
IPDPS 2013.
Demmel, J., Eliahu D., Fox A., Kamil S., Lipshitz B., Schwartz O., et al.
(2013). Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication.
27th IEEE International Parallel & Distributed Processing Symposium 2013.
Demmel, J., Gearhart A., Lipshitz B., & Schwartz O.
(2013). Perfect Strong Scaling Using No Additional Energy.
27th IEEE International Parallel & Distributed Processing Symposium 2013.
Benson, A., Gleich D., & Demmel J.
(2013). Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures.
IPDPS 2013.
Kamil, A., & Yelick K. A.
(2013). Hierarchical Additions to the SPMD Programming Model.
PPoPP 2013.
Lipshitz, B., Ballard G., Demmel J., & Schwartz O.
(2012). Communication-Avoiding Parallel Strassen: Implementation and Performance.
Supercomputing 2012.
Kamil, S., Coetzee D., Beamer S., Cook H., Gonina E., Harper J., et al.
(2012). Portable Parallel Performance from Sequential, Productive, Embedded Domain-Specific Language.
SPLASH 2012.
Meyerovich, L., & Rabkin A.
(2012). Socio-PLT: Sociological Principles for Programming Language Adoption.
Onward! 2012.
Su, B. - Y., & Keutzer K.
(2012). clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs.
ICS 2012.
Friedland, G.
(2012). On a GPU Online Diarization = Offline Diarization.
ISCA Interspeech 2011.
Bachrach, J., Vo H., Richards B., Avizienis R., Wawrzynek J., & Asanović K.
(2012). Chisel: Constructing Hardware in a Scala Embedded Language.
DAC 2012.
Ballard, G., Demmel J., Holtz O., Lipshitz B., & Schwartz O.
(2012). Communication-Optimal Parallel Algorithm for Strassen's Matrix Multiplication.
24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2012).
Ballard, G., Demmel J., Holtz O., Lipshitz B., & Schwartz O.
(2012). Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Lower Bounds.
24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2012).
Maas, M., Reames P., Morlan J., Asanovic K., Joseph A., & Kubiatowicz J.
(2012). GPUs: An Opportunity for Offloading Garbage Collection.
International Symposium on Memory Management - ISMM'12.
Solomonik, E., & Demmel J.
(2012). Matrix Multiplication on Multidimensional Torus Networks.
10th International Meeting on High-Performance Computing for Computational Science (VECPAR 2012).
Burnim, J., Elmas T., Necula G., & Sen K.
(2012). CONCURRIT: Testing Concurrent Programs with Programmable State-Space Exploration.
HotPar 2012.
Sheffield, D., Anderson M., & Keutzer K.
(2012). Automatic Generation of Application-Specific Accelerators for FPGAs from Python Loop Nests.
Field Programmable Logic (FPL) 2012.
Prasad, A., Howard D., Kamil S., & Fox A.
(2012). Parallel High-Performance Statistical Bootstrapping in Python.
SciPy 2012.
Lee, Y., Avizienis R., Bishara A., Xia R., Lockhart D., Batten C., et al.
(2012). Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators.
ACM Transactions on Computer Systems.
Chong, J., Gonina E., You K., & Keutzer K.
(2011). Scalable Parallelization of Automatic Speech Recognition.
Scaling Up Machine Learning.
Alcantara, D. A., Volkov V., Sengupta S., Mitzenmacher M., Owens J. D., & Amenta N.
(2011). Building an Efficient Hash Table on the GPU.
GPU Computing Gems 2 Jade Edition.
Bauer, M., Cook H. M., & Khailany B.
(2011). CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization.
International Conference on Supercomputing, SC'11.
Kamil, S., Coetzee D., & Fox A.
(2011). Bringing Parallel Performance to Python with Domain-Specific Selective Just-in-Time Specialization.
Python for Scientific Computing Conference 2011.
Battenberg, E., Freed A., & Wessel D.
(2010). Advances in the Parallelization of Music and Audio Applications.
Proceedings of the International Computer Music Conference (2010).
Chong, J., Friedland G., Janin A., Morgan N., & Oei C.
(2010). Opportunities and Challenges of Parallelizing Speech Recognition.
Second USENIX Workshop on Hot Topics in Parallelism (HotPar 2010).
Arnold, G., Holzl J., Koksal A. S., Bodik R., & Sagiv M.
(2010). Specifying and Verifying Sparse Matrix Codes.
The 15th Annual ACM SIGPLAN International Conference on Functional Programming (ICFP 2010).
Colmenares, J. A., Bird S., Cook H. M., Pearce P., Zhu D., Shalf J., et al.
(2010). Resource Management in the Tessellation Manycore OS.
2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10).
Tan, Z., Waterman A., Cook H. M., Bird S., Asanović K., & Patterson D.
(2010). A Case for FAME: FPGA Architecture Model Execution.
International Symposium on Computer Architecture (ISCA-2010).
Pan, H., Hindman B., & Asanović K.
(2010). Composing Parallel Software Efficiently with Lithe.
Programming Language Design and Implementation (PLDI-2010).
Burnim, J., Necula G., & Sen K.
(2010). Separating Functional and Parallel Correctness using Nondeterministic Sequential Specifications.
2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10).
Asanović, K., Patterson D., Tan Z., Waterman A., Avizienis R., & Lee Y.
(2010). RAMP Gold: An FPGA-based Architecture Simulator for Multiprocessors.
Design Automation Conference (DAC-2010).
Burnim, J., & Sen K.
(2010). DETERMIN: Inferring Likely Deterministic Specifications of Multithreaded Programs.
32nd International Conference on Software Engineering (ICSE'10).
Stojanovic, V., Joshi P., Batten C., Kwon Y. - J., Beamer S., Chen S., et al.
(2010). Design-Space Exploration for CMOS Photonic Processor Networks.
Optical Fiber Communication Conference and Exposition and the National Optic Engineers Conference (OFC/NFOEC).
Tan, Z., Asanović K., & Patterson D.
(2010). An FPGA-based Simulator for Datacenter Networks.
The Exascale Evaluation and Research Techniques Workshop (EXERT 2010), at the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2010).
Barman, S., Bodik R., Chandra S., Galenson J., Kimelman D., Rodarmor C., et al.
(2010). Programming with Angelic Nondeterminism.
37th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '10).
Grigori, L., David P. - Y., Demmel J., & Peyronnet S.
(2010). Brief Announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem.
22nd ACM Symposium on Parallelism in Algorithms and Architectures.
Schmeder, A., Freed A., & Wessel D.
(2010). Best Practices for Open Sound Control.
Proceedings of the Linux Audio Conference (2010).
Catanzaro, B., Fox A., Keutzer K., Patterson D., & Su B. - Y.
(2010). Ubiquitous Parallel Computing from Berkeley, Illinois and Stanford.
IEEE Micro.
Tan, Z., Waterman A., Asanović K., & Patterson D.
(2009). Midas: An FPGA-based Architecture Simulator for Multiprocessors.
The 47th Design Automation Conference.
Ganapathi, A., Datta K., Fox A., & Patterson D.
(2009). A Case for Machine Learning to Optimize Multicore Performance.
HotPar09.
Shalf, J., Carloni L., Bergman K., Jain A., Lee B. G., Kubiatowicz J. D., et al.
(2009). Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications.
ACM/IEEE International Symposium on Networks-on-Chip.
Fuerlinger, K., & Moore S.
(2009). Recording the control flow of parallel applications to determine iterative and phase-based behavior.
Future Generation Computer Systems.
Burnim, J., Juvekar S., & Sen K.
(2009). WISE: Automated Test Generation for Worst-Case Complexity.
Proc. 31st International Conference on Software Engineering (ICSE'09). 463-473.
[Anonymous]
(2009). SNIFF: A Search Engine for Java using Free-Form Queries.
Proc. Fundamental Approaches to Software Engineering (FASE'09). 385-400.
Mohiyuddin, M., Hoemmen M., Demmel J., & Yelick K.
(2009). Minimizing Communication in Sparse Matrix Solvers.
Supercomputing '09.
Madduri, K., Williams S., Ethier S., Oliker L., Shalf J., Strohmaier E., et al.
(2009). Memory-efficient Optimization of Gyrokinetic Particle-to-Grid Interpolation for Multicore Processors.
Supercomputing '09.
Mohiyuddin, M., Murphy M., Oliker L., Shalf J., Wawrzynek J., & Williams S.
(2009). A Design Methodology for Domain-Optimized Power-Efficient Supercomputing.
Supercomputing '09.
Catanzaro, B., Kamil S., Lee Y., Asanović K., Demmel J., Keutzer K., et al.
(2009). SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization.
First Workshop on Programming Models for Emerging Architectures (PMEA) at the 18th International Conference on Parallel Architectures and Compilation Techniques.
Ballard, G., Demmel J., Holtz O., & Schwartz O.
(2009). Communication Optimal Parallel and Sequential Cholesky Factorization.
Symposium on Parallelism in Algorithms and Architectures.
Beamer, S., Stojanovic V., Asanović K., Batten C., & Joshi P.
(2009). Advantages of Silicon Photonics for Multi-socket Systems.
23rd International Conference on Supercomputing (ICS-09).
Asanović, K., Stojanovic V., Joshi P., Batten C., & Kwon Y. - J.
(2009). Manycore processor networks with monolithic integrated CMOS photonics.
29th Conference on Lasers and Electro-Optics (CLEO'09).
Nishtala, R., Hargrove P., Bonachea D., & Yelick K.
(2009). Scaling communication intensive applications in BlueGene/P using one-sided communication and overlap.
International Parallel and Distributed Processing Symposium.
Joshi, P., Batten C., Kwon Y. - J., Beamer S., Shamim I., Asanović K., et al.
(2009). Silicon-Photonic Clos Networks for Global On-Chip Communication.
3rd ACM/IEEE International Symposium on Networks-on-Chip (NoCS) 2009.
Naik, M., Park C. - S., Sen K., & Gay D.
(2009). Effective Static Deadlock Detection.
31st International Conference on Software Engineering, Vancouver (ICSE 09).
Satish, N., Harris M., & Garland M.
(2009). Designing Efficient Sorting Algorithms for Manycore GPUs.
International Parallel and Distributed Processing Symposium.
Jones, C. G., Liu R., Meyerovich L., Asanović K., & Bodik R.
(2009). Parallelizing the Web Browser.
HotPar09.
Liu, R., Klues K., Bird S., Hofmeyr S., Asanović K., & Kubiatowicz J. D.
(2009). Tessellation: Space-Time Partitioning in a Manycore Client OS.
HotPar09.
Pan, H., Hindman B., & Asanović K.
(2009). Lithe: Enabling Efficient Composition of Parallel Libraries.
HotPar09.
Nishtala, R., & Yelick K. A.
(2009). Optimizing Collective Communication on Multicores.
HotPar09.
Yelick, K. A., Gebis J., Oliker L., Shalf J., & Williams S.
(2009). Improving Memory Subsystem Performance Using ViVA.
Architecture of Computing Systems - ARCS 2009, 22nd International Conference.
Joshi, P., Naik M., Park C. - S., & Sen K.
(2009). An Extensible Active Testing Framework for Concurrent Programs.
Computer Aided Verification 2009 (CAV2009).
Burnim, J., Jalbert N., Stergiou C., & Sen K.
(2009). Looper: Lightweight Detection of Infinite Loops at Runtime.
Proc. 24th IEEE/ACM International Conference on Automated Software Engineering (ASE'09).
Catanzaro, B., Su B. - Y., Sundaram N., Lee Y., Murphy M., & Keutzer K.
(2009). Efficient, High-Quality Image Contour Detection.
International Conference on Computer Vision (ICCV). 2381-2388.
Burnim, J., & Sen K.
(2009). Asserting and Checking Determinism for Multithreaded Programs.
7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2009).
Joshi, P., Park C. - S., Naik M., & Sen K.
(2009). A randomized dynamic program analysis technique for detecting real deadlocks.
Conference on Programming Language Design and Implementation (PLDI'09). 110-120.
Meyerovich, L., Man C. S., On C. S., Pan H., Asanović K., & Bodik R.
(2009). Parallel Web Page Layout.
HotPar.
Oei, C., Friedland G., & Janin A.
(2009). Parallel Training of a Multi-Layer Perceptron on a GPU.
ICSI Technical Report.
Asanović, K., Bodik R., Demmel J., Keaveny T., Keutzer K., Kubiatowicz J. D., et al.
(2009). A View of the Parallel Computing Landscape.
Communications of the ACM. 52.
Williams, S., Carter J., Oliker L., Shalf J., & Yelick K.
(2009). Optimization of a Lattice Boltzmann Computation on State-of-the-Art Multicore Platforms.
Journal of Parallel and Distributed Computing. 69, 762-777.
Fuerlinger, K., & Moore S.
(2009). Capturing and analyzing the execution control flow of OpenMP Applications.
International Journal of Parallel Programming.
Sundaram, N., Raghunathan A., & Chakradhar S.
(2009). A framework for efficient and scalable execution of domain-specific templates on GPUs.
IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1-12.
Datta, K., Kamil S., Williams S., Oliker L., Shalf J., & Yelick K.
(2009). Optimizations and Performance Modeling of Stencil Computations on Modern Microprocessors.
SIAM Review (SIREV). 51, 129-159.
Shalf, J., Asanović K., Patterson D., Keutzer K., Mattson T., & Yelick K.
(2009). The Manycore Revolution: Will the HPC Community Lead or Follow?
SciDAC Review.
Williams, S., Patterson D., & Waterman A.
(2009). Roofline: an insightful visual performance model for multicore architectures.
Communications of the ACM. 52, 65-76.
Wilkerson, D., Molnar D. A., Harren M., & Kubiatowicz J. D.
(2009). Hard-Object: Enforcing Object Interfaces Using Code-Range Data Protection.
Technical Report.
Chatterjee, K., Sen K., & Henzinger T. A.
(2008). Model-Checking omega-Regular Properties of Interval Markov Chains.
Proc. 11th International Conference on Foundations of Software Science and Computation Structures (FoSSaCS'08). 302-317.
Batten, C., Aoki H., & Asanović K.
(2008). The Case for Malleable Stream Architectures.
Workshop on Streaming Systems at 41st International Symposium on Microarchitecture (MICRO-41).
Park, C. - S., & Sen K.
(2008). Randomized Active Atomicity Violation Detection in Concurrent Programs.
16th International Symposium on Foundations of Software Engineering (FSE'08).
Joshi, P., & Sen K.
(2008). Predictive Typestate Checking of Multithreaded Java Programs.
23rd IEEE/ACM International Conference on Automated Software Engineering (ASE'08).
Zhao, S., & Morgan N.
(2008). Multi-stream spectro-temporal features for robust speech recognition.
Proceedings of Interspeech 2008.
Burnim, J., & Sen K.
(2008). Heuristics for Scalable Dynamic Test Generation.
23rd IEEE/ACM International Conference on Automated Software Engineering (ASE '08).
Jain, A., Kamil S., Mohiyuddin M., Shalf J., & Kubiatowicz J. D.
(2008). Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPs.
Workshop on High Performance Embedded Computing.
Batten, C., Joshi P., Orcutt J., Khilo A., Moss B., Holzwarth C., et al.
(2008). Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics.
Hot Interconnects.
Garland, M., Legrand S., Nickolls J., Anderson J., Hardwick J., Morton S., et al.
(2008). Parallel Computing Experiences with CUDA.
IEEE Micro 28.
Burke, D., Wawrzynek J., Asanović K., Krasnov A., Shultz A., Gibeling G., et al.
(2008). RAMP Blue: Implementation of a Multicore 1008 Processor FPGA System.
Reconfigurable Systems Summer Institute.
Kannan, Y., & Sen K.
(2008). Universal Symbolic Execution and its Application to Likely Data Structure Invariant Generation.
International Symposium on Software Testing and Analysis (ISSTA '08).
Tan, Z., Asanović K., & Patterson D.
(2008). An FPGA Host-Multithreaded Functional Model for Sparc v8.
35th International Symposium on Computer Architecture (ISCA-35).
Sen, K.
(2008). Race Directed Randomized Dynamic Analysis of Concurrent Programs.
ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI'08).
Solar-Lezama, A., Jones C., & Bodik R.
(2008). Sketching Concurrent Data Structures.
ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI'08).
Lee, J. W., Ng M. C., & Asanović K.
(2008). Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks.
35th International Symposium on Computer Architecture (ISCA-35).
Chong, J., Yi Y., Faria A., Satish N., & Keutzer K.
(2008). Data-Parallel Large Vocabulary Continuous Speech Recognition on Graphics Processors.
Proceedings of the 1st Annual Workshop on Emerging Applications and Many Core Architecture (EAMA).
Cook, H. M., & Skadron K.
(2008). Predictive Design Space Exploration Using Genetically Programmed Response Surfaces.
Design Automation Conference.
Catanzaro, B., Sundaram N., & Keutzer K.
(2008). A Map Reduce Framework for Programming Graphics Processors.
Third Workshop on Software Tools for MultiCore Systems (STMCS).
Hampton, M., & Asanović K.
(2008). Compiling for Vector-thread Architectures.
International Symposium on Code Generation and Optimization (CGO).
Faria, A., & Morgan N.
(2008). Corrected Tandem Features for Acoustic Model Training.
ICASSP 2008.
Demmel, J., Hoemmen M., Mohiyuddin M., & Yelick K. A.
(2008). Avoiding Communication in Sparse Matrix Computations.
IPDPS.
Nishtala, R., Almasi G., & Cascaval G.
(2008). Performance without Pain = Productivity: Data Layouts and Collectives in UPC.
Principles and Practices of Parallel Programming (PPoPP) 2008.
Wessel, D.
(2008). Reinventing Audio and Music Computation for Many-Core Processors.
Proceedings of the International Computer Music Conference 2008.
Catanzaro, B., Keutzer K., & Su B. - Y.
(2008). Parallelizing CAD: A Timely Research Agenda for EDA.
DAC 08.
Kamil, S., Shalf J., & Strohmaier E.
(2008). Power Efficiency in High Performance Computing.
High-Performance, Power-Aware Computing (HPPAC 2008).
Ramanathan, M. K., Sen K., Grama A., & Jagannathan S.
(2008). Protocol Inference Using Static Path Profiles.
Proc. 15th International Static Analysis Symposium (SAS'08). 78-92.
Fuerlinger, K., & Moore S.
(2008). OpenMP-centric Performance Analysis of Hybrid Applications.
2008 IEEE International Conference on Cluster Computing (CLUSTER 2008).
Catanzaro, B., Sundaram N., & Keutzer K.
(2008). Fast Support Vector Machine Training and Classification on Graphics Processors.
International Conference on Machine Learning (ICML).
Demmel, J., Grigori L., & Xiang H.
(2008). Communication-Avoiding Gaussian Elimination.
Supercomputing '08.
Schwenke, R., Zotter F., Wessel D., & Schmeder A.
(2008). Room Acoustics Measurements with an Approximately Spherical Source of 120 Drivers.
Journal of the Acoustical Society of America.
Krashinsky, R., Batten C., & Asanović K.
(2008). Implementing the Scale Vector-Thread Processor.
ACM Transaction on Design Automation of Electronic Systems (TODAES). 13(3).
Keutzer, K., Hwu W. - M., & Mattson T.
(2008). The Concurrency Challenge.
IEEE Design and Test of Computers. 25.
Volkov, V., & Demmel J.
(2008). LU, QR and Cholesky factorizations using vector capabilities of GPUs.
Technical Report No. UCB/EECS-2008-49.
Williams, S., Datta K., Carter J., Oliker L., Shalf J., Yelick K. A., et al.
(2008). PERI: Autotuning Memory Intensive Kernels for Multicore.
Journal of Physics, SciDAC PI Conference: Conference Series: 123012001.
Williams, S., Oliker L., Vuduc R., Shalf J., Yelick K. A., & Demmel J.
(2007). Optimization of Sparse-Matrix-Vector Multiplication on Emerging Multicore Platforms.
Parallel Computing, Special Issue on Revolutionary Hardware.