Richard W. Vuduc, Ph.D. - Publications

Affiliations:

1997-2004

Computer Science Division

University of California, Berkeley, Berkeley, CA, United States

Area:

High-performance computing, performance engineering, autotuning

Website:

https://vuduc.org

Year	Citation	Score
2020	Li Z, Jia H, Zhang Y, Chen T, Yuan L, Vuduc R. Automatic Generation of High-Performance FFT Kernels on Arm and X86 CPUs IEEE Transactions on Parallel and Distributed Systems. 31: 1925-1941. DOI: 10.1109/Tpds.2020.2977629	0.522
2019	Sao P, Li XS, Vuduc R. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems Journal of Parallel and Distributed Computing. 131: 218-234. DOI: 10.1016/J.Jpdc.2019.03.004	0.358
2019	Ma Y, Li J, Wu X, Yan C, Sun J, Vuduc R. Optimizing sparse tensor times matrix on GPUs Journal of Parallel and Distributed Computing. 129: 99-109. DOI: 10.1016/J.Jpdc.2018.07.018	0.576
2018	Hossain MM, Nath C, Tucker TM, Vuduc RW, Kurfess TR. A Graphics Processor Unit-Accelerated Freeform Surface Offsetting Method for High-Resolution Subtractive Three-Dimensional Printing (Machining) Journal of Manufacturing Science and Engineering. 140. DOI: 10.1115/1.4038599	0.459
2017	Du Z, Ge R, Lee VW, Vuduc R, Bader DA, He L. Modeling the Power Variability of Core Speed Scaling on Homogeneous Multicore Systems Scientific Programming. 2017: 1-13. DOI: 10.1155/2017/8686971	0.354
2017	You Y, Demmel J, Czechowski K, Song L, Vuduc R. Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines IEEE Transactions on Parallel and Distributed Systems. 28: 974-988. DOI: 10.1109/Tpds.2016.2608823	0.706
2016	Wu Z, Tucker TM, Nath C, Kurfess TR, Vuduc RW. Step Ring-Based Three-Dimensional Path Planning Via Graphics Processing Unit Simulation for Subtractive Three-Dimensional Printing Journal of Manufacturing Science and Engineering. 139. DOI: 10.1115/1.4034662	0.402
2016	Hossain MM, Tucker TM, Kurfess TR, Vuduc RW. Hybrid Dynamic Trees for Extreme-Resolution 3D Sparse Data Modeling Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016. 132-141. DOI: 10.1109/IPDPS.2016.75	0.305
2015	Park S, Vuduc R, Harrold MJ. UNICORN: A unified approach for localizing non-deadlock concurrency bugs Software Testing Verification and Reliability. 25: 167-190. DOI: 10.1002/Stvr.1523	0.414
2014	Choi J, Chandramowlishwaran A, Madduri K, Vuduc R. A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method ACM International Conference Proceeding Series. 64-71. DOI: 10.1145/2576779.2576787	0.442
2014	Choi J, Dukhan M, Liu X, Vuduc R. Algorithmic time, energy, and power on candidate HPC compute building blocks Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS. 447-457. DOI: 10.1109/IPDPS.2014.54	0.349
2014	Dukhan M, Vuduc R. Methods for high-throughput computation of elementary functions Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 8384: 86-95. DOI: 10.1007/978-3-642-55224-3_9	0.366
2014	Sao P, Vuduc R, Li XS. A distributed CPU-GPU sparse direct solver Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 8632: 487-498. DOI: 10.1007/978-3-319-09873-9_41	0.305
2014	Lee D, Sao P, Vuduc R, Gray AG. A distributed kernel summation framework for general-dimension machine learning Statistical Analysis and Data Mining. 7: 1-13. DOI: 10.1002/Sam.11207	0.463
2013	Czechowski K, Vuduc R. A theoretical framework for algorithm-architecture co-design Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium, IPDPS 2013. 791-802. DOI: 10.1109/IPDPS.2013.99	0.332
2012	Kim H, Vuduc R, Baghsorkhi S, Hwu WM, Choi J. Performance analysis and tuning for general purpose graphics processing units (GPGPU) Synthesis Lectures on Computer Architecture. 20: 1-94. DOI: 10.2200/S00451ED1V01Y201209CAC020	0.409
2012	Sim J, Dasgupta A, Kim H, Vuduc R. A performance analysis framework for identifying potential benefits in GPGPU applications ACM SIGPLAN Notices. 47: 11-21. DOI: 10.1145/2370036.2145819	0.362
2012	Chandramowlishwaran A, Choi JW, Madduri K, Vuduc R. Brief announcement: Towards a communication optimal Fast Multipole Method and its implications at exascale Annual ACM Symposium on Parallelism in Algorithms and Architectures. 182-184. DOI: 10.1145/2312005.2312039	0.412
2012	Lashuk I, Chandramowlishwaran A, Langston H, Nguyen TA, Sampath R, Shringarpure A, Vuduc R, Ying L, Zorin D, Biros G. A massively parallel adaptive fast multipole method on heterogeneous architectures Communications of the ACM. 55: 101-109. DOI: 10.1145/2160718.2160740	0.428
2012	Lee J, Kim H, Vuduc R. When prefetching works, when it doesn't, and why Transactions on Architecture and Code Optimization. 9. DOI: 10.1145/2133382.2133384	0.386
2012	Chandramowlishwaran A, Vuduc RW. Communication-optimal parallel N-body solvers Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012. 2462-2465. DOI: 10.1109/IPDPSW.2012.303	0.374
2012	Park S, Vuduc R, Harrold MJ. A unified approach for localizing non-deadlock concurrency bugs Proceedings - IEEE 5th International Conference on Software Testing, Verification and Validation, ICST 2012. 51-60. DOI: 10.1109/ICST.2012.85	0.323
2011	Vuduc R, Czechowski K. What GPU computing means for high-end systems IEEE Micro. 31: 74-78. DOI: 10.1109/Mm.2011.78	0.427
2010	Chandramowlishwaran A, Knobe K, Vuduc R. Applying the concurrent collections programming model to asynchronous parallel dense linear algebra ACM SIGPLAN Notices. 45: 345-346. DOI: 10.1145/1837853.1693506	0.355
2010	Choi JW, Singh A, Vuduc RW. Model-driven autotuning of sparse matrix-vector multiply on GPUs ACM SIGPLAN Notices. 45: 115-125. DOI: 10.1145/1837853.1693471	0.47
2010	Chandramowlishwaran A, Madduri K, Vuduc R. Diagnosis, tuning, and redesign for multicore performance: A case study of the fast multipole method 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010. DOI: 10.1109/SC.2010.19	0.349
2010	Lee J, Lakshminarayana NB, Kim H, Vuduc R. Many-thread aware prefetching mechanisms for GPGPU applications Proceedings of the Annual International Symposium on Microarchitecture, MICRO. 213-224. DOI: 10.1109/MICRO.2010.44	0.39
2010	Chandramowlishwaran A, Williams S, Oliker L, Lashuk I, Biros G, Vuduc R. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010. DOI: 10.1109/IPDPS.2010.5470415	0.394
2009	Lashuk I, Chandramowlishwaran A, Langston H, Nguyen TA, Sampath R, Shringarpure A, Vuduc R, Ying L, Zorin D, Biros G. A massively parallel adaptive fast-multipole method on heterogeneous architectures Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09. DOI: 10.1145/1654059.1654118	0.391
2009	Kang S, Bader DA, Vuduc R. Understanding the design trade-offs among current multicore systems for numerical computations IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium. DOI: 10.1109/IPDPS.2009.5161055	0.325
2009	Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, Demmel J. Optimization of sparse matrix-vector multiplication on emerging multicore platforms Parallel Computing. 35: 178-194. DOI: 10.1016/j.parco.2008.12.006	0.728
2007	Nishtala R, Vuduc RW, Demmel JW, Yelick KA. When cache blocking of sparse matrix vector multiply works and why Applicable Algebra in Engineering, Communications and Computing. 18: 297-311. DOI: 10.1007/S00200-007-0038-9	0.646
2005	Demmel J, Dongarra J, Eijkhout V, Fuentes E, Petitet A, Vuduc R, Whaley RC, Yelick K. Self-adapting Linear Algebra algorithms and software Proceedings of the IEEE. 93: 293-311. DOI: 10.1109/JPROC.2004.840848	0.601
2005	Vuduc R, Demmel JW, Yelick KA. OSKI: A library of automatically tuned sparse matrix kernels Journal of Physics: Conference Series. 16: 521-530. DOI: 10.1088/1742-6596/16/1/071	0.724
2005	Vuduc RW, Moon HJ. Fast sparse matrix-vector multiplication by exploiting variable block structure Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 3726: 807-816. DOI: 10.1007/11557654_91	0.497
2004	Im EJ, Yelick K, Vuduc R. Sparsity: Optimization framework for sparse matrix kernels International Journal of High Performance Computing Applications. 18: 135-158.	0.72
2004	Vuduc R, Demmel JW, Bilmes JA. Statistical models for empirical search-based performance tuning International Journal of High Performance Computing Applications. 18: 65-94.	0.674
2004	Lee BC, Vuduc RW, Demmel JW, Yelick KA. Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply Proceedings of the International Conference on Parallel Processing. 169-176.	0.75
2003	Vuduc R, Gyulassy A, Demmel JW, Yelick KA. Memory hierarchy optimizations and performance bounds for sparse ATAx Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2659: 705-714.	0.74
2001	Vuduc R, Demmel JW, Bilmes J. Statistical models for automatic performance tuning Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2073: 117-126.	0.655
2000	Vuduc R, Demmel JW. Code generators for automatic tuning of Numerical Kernels: Experiences with FFTW position paper Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 1924: 190-211.	0.637