Live Drum Separation Using Probabilistic Spectral Clustering Based on the Itakura-Saito Divergence. 45th Conference of the Audio Engineering Society.(2013).
Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication. 27th IEEE International Parallel & Distributed Processing Symposium 2013.(2013).
Perfect Strong Scaling Using No Additional Energy. 27th IEEE International Parallel & Distributed Processing Symposium 2013.(2013).
Communication-Avoiding Parallel Strassen: Implementation and Performance. Supercomputing 2012.(2012).
Communication-Optimal Parallel Algorithm for Strassen's Matrix Multiplication. 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2012).(2012).
Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Lower Bounds. 24th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2012).(2012).
GPUs: An Opportunity for Offloading Garbage Collection. International Symposium on Memory Management - ISMM'12.(2012).
Matrix Multiplication on Multidimensional Torus Networks. 10th International Meeting on High-Performance Computing for Computational Science (VECPAR 2012).(2012).
Automatic Generation of Application-Specific Accelerators for FPGAs from Python Loop Nests. Field Programmable Logic (FPL) 2012.(2012).
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators. ACM Transactions on Computer Systems.(2012).
CudaDMA: Optimizing GPU Memory Bandwidth via Warp Specialization. International Conference on Super Computing, SC'11.(2011).
Bringing Parallel Performance to Python with Domain-Specific Selective Just-in-Time Specialization. Python for Scientific Computing Conference 2011.(2011).
Advances in the Parallelization of Music and Audio Applications. Proceedings of the International Computer Music Conference (2010).(2010).
Opportunities and Challenges of Parallelizing Speech Recognition. Second USENIX Workshop on Hot Topics in Parallelism (HotPar 2010).(2010).
Specifying and Verifying Sparse Matrix Codes. The 15th Annual ACM SIGPLAN International Conference on Functional Programming (ICFP 2010).(2010).
RAMP Gold: An FPGA-based Architecture Simulator for Multiprocessors. Design Automation Conference (DAC-2010).(2010).
Resource Management in the Tessellation Manycore OS. 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10).(2010).
A Case for FAME: FPGA Architecture Model Execution. International Symposium on Computer Architecture (ISCA-2010).(2010).
Composing Parallel Software Efficiently with Lithe. Programming Language Design and Implementation (PLDI-2010).(2010).
Separating Functional and Parallel Correctness using Nondeterministic Sequential Specifications. 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10).(2010).
DETERMIN: Inferring Likely Deterministic Specifications of Multithreaded Programs. 32nd International Conference on Software Engineering (ICSE'10).(2010).
An FPGA-based Simulator for Datacenter Networks. The Exascale Evaluation and Research Techniques Workshop (EXERT 2010), at the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2010).(2010).
Design-Space Exploration for CMOS Photonic Processor Networks. Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC).(2010).
Programming with Angelic Nondeterminism. 37th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '10).(2010).
Brief Announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem. 22nd ACM Symposium on Parallelism in Algorithms and Architectures.(2010).
Midas: An FPGA-based Architecture Simulator for Multiprocessors. The 47th Design Automation Conference.(2009).
Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. ACM/IEEE International Symposium on Networks-on-Chip.(2009).
Recording the control flow of parallel applications to determine iterative and phase-based behavior. Future Generation Computer Systems.(2009).
WISE: Automated Test Generation for Worst-Case Complexity. Proc. 31st International Conference on Software Engineering (ICSE'09). 463-473.(2009).
SNIFF: A Search Engine for Java using Free-Form Queries. Proc. Fundamental Approaches to Software Engineering (FASE'09), 2009. 385-400.(2009).
A Design Methodology for Domain-Optimized Power-Efficient Supercomputing. Supercomputing '09.(2009).
SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization. First Workshop on Programmable Models for Emerging Architecture (PMEA) at the 18th International Conference on Parallel Architectures and Compilation Techniques.(2009).
Communication Optimal Parallel and Sequential Cholesky Factorization. Symposium on Parallelism in Algorithms and Architectures.(2009).
Advantages of Silicon Photonics for Multi-socket Systems. 23rd International Conference on Supercomputing (ICS-09).(2009).
Manycore processor networks with monolithic integrated CMOS photonics. 29th Conference on Lasers and Electro-Optics (CLEO'09).(2009).
Scaling communication intensive applications in BlueGene/P using one-sided communication and overlap. International Parallel and Distributed Processing Symposium.(2009).
Silicon-Photonic Clos Networks for Global On-Chip Communication. 3rd ACM/IEEE International Symposium on Networks-on-Chip (NoCS) 2009.(2009).
Effective Static Deadlock Detection. 31st International Conference on Software Engineering, Vancouver (ICSE 09).(2009).
Designing Efficient Sorting Algorithms for Manycore GPUs. International Parallel and Distributed Processing Symposium.(2009).
Improving Memory Subsystem Performance Using ViVA. Architecture of Computing Systems - ARCS 2009, 22nd International Conference.(2009).
An Extensible Active Testing Framework for Concurrent Programs. Computer Aided Verification 2009 (CAV2009).(2009).
Scalable HMM-based Inference Engine in Large Vocabulary Continuous Speech Recognition. IEEE International Conference on Multimedia and Expo.(2009).
Looper: Lightweight Detection of Infinite Loops at Runtime. Proc. 24th IEEE/ACM International Conference on Automated Software Engineering (ASE'09).(2009).
Efficient, High-Quality Image Contour Detection. International Conference on Computer Vision (ICCV). 2381-2388.(2009).
Asserting and Checking Determinism for Multithreaded Programs. 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering.(2009).
A randomized dynamic program analysis technique for detecting real deadlocks. Conference on Programming Language Design and Implementation (PLDI'09). 110 - 120.(2009).
Optimization of a Lattice Boltzmann Computation on State-of-the-Art Multicore Platforms. Journal of Parallel and Distributed Computing. 69, 762-777.(2009).
Capturing and analyzing the execution control flow of OpenMP Applications. International Journal of Parallel Programming.(2009).
A framework for efficient and scalable execution of domain-specific templates on GPUs. IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1-12.(2009).
Communication Requirements and Interconnect Optimization of High-End Scientific Applications. IEEE Transactions on Parallel and Distributed Systems. 21,(2009).
Optimizations and Performance Modeling of Stencil Computations on Modern Microprocessors. SIAM Review (SIREV). 51, 129-159.(2009).
Roofline: an insightful visual performance model for multicore architectures. Communications of the ACM. 52, 65-76.(2009).
Hard-Object: Enforcing Object Interfaces Using Code-Range Data Protection. Technical Report.(2009).
Model-Checking omega-Regular Properties of Interval Markov Chains. Proc. 11th International Conference on Foundations of Software Science and Computation Structures (FoSSaCS'08). 302-317.(2008).
The Case for Malleable Stream Architectures. Workshop on Streaming Systems at 41st International Symposium on Microarchitecture (MICRO-41).(2008).
Randomized Active Atomicity Violation Detection in Concurrent Programs. 16th International Symposium on Foundations of Software Engineering (FSE'08).(2008).
Predictive Typestate Checking of Multithreaded Java Programs. 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE'08).(2008).
Multi-stream spectro-temporal features for robust speech recognition. Proceedings of Interspeech 2008.(2008).
Heuristics for Scalable Dynamic Test Generation. 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE '08).(2008).
Hybrid Electric/Photonic Networks for Scientific Applications on Tiled CMPs. Workshop on High Performance Embedded Computing.(2008).
Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics. Hot Interconnects.(2008).
RAMP Blue: Implementation of a Multicore 1008 Processor FPGA System. Reconfigurable Systems Summer Institute.(2008).
Universal Symbolic Execution and its Application to Likely Data Structure Invariant Generation. International Symposium on Software Testing and Analysis (ISSTA '08).(2008).
An FPGA Host-Multithreaded Functional Model for Sparc v8. 35th International Symposium on Computer Architecture (ISCA-35).(2008).
Race Directed Randomized Dynamic Analysis of Concurrent Programs. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI'08).(2008).
Sketching Concurrent Data Structures. ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI'08).(2008).
Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks. 35th International Symposium on Computer Architecture (ISCA-35).(2008).
Data-Parallel Large Vocabulary Continuous Speech Recognition on Graphics Processors. Proceedings of the 1st Annual Workshop on Emerging Applications and Many Core Architecture (EAMA).(2008).
Predictive Design Space Exploration Using Genetically Programmed Response Surfaces. Design Automation Conference.(2008).
A Map Reduce Framework for Programming Graphics Processors. Third Workshop on Software Tools for MultiCore Systems (STMCS).(2008).
Compiling for Vector-thread Architectures. International Symposium on Code Generation and Optimization (CGO).(2008).
Performance without Pain = Productivity, Data Layouts and Collectives in UPC. Principles and Practices of Parallel Programming (PPoPP) 2008.(2008).
Reinventing Audio and Music Computation for Many-Core Processors. Proceedings of the International Computer Music Conference 2008.(2008).
Power Efficiency in High Performance Computing. High-Performance, Power-Aware Computing (HPPAC 2008).(2008).
Protocol Inference Using Static Path Profiles. Proc. 15th International Static Analysis Symposium (SAS'08). 78-92.(2008).
OpenMP-centric Performance Analysis of Hybrid Applications. 2008 IEEE International Conference on Cluster Computing (CLUSTER 2008).(2008).
Fast Support Vector Machine Training and Classification on Graphics Processors. International Conference on Machine Learning (ICML).(2008).
Room Acoustics Measurements with an Approximately Spherical Source of 120 Drivers. Journal of the Acoustical Society of America.(2008).
Implementing the Scale Vector-Thread Processor. ACM Transactions on Design Automation of Electronic Systems (TODAES). 13(3),(2008).
LU, QR and Cholesky factorizations using vector capabilities of GPUs. Technical Report No. UCB/EECS-2008-49.(2008).
PERI: Autotuning Memory Intensive Kernels for Multicore. Journal of Physics, SciDAC PI Conference: Conference Series: 123012001.(2008).
Optimization of Sparse-Matrix-Vector Multiplication on Emerging Multicore Platforms. Parallel Computing - Special Issue on Revolutionary Hardware.(2007).