Richard W. Vuduc, Ph.D. - Publications

Affiliations: 
1997-2004: Computer Science Division, University of California, Berkeley, Berkeley, CA, United States
Area:
High-performance computing, performance engineering, autotuning
Website:
https://vuduc.org

41 high-probability publications.

Year Citation  Score
2020 Li Z, Jia H, Zhang Y, Chen T, Yuan L, Vuduc R. Automatic generation of high-performance FFT kernels on Arm and x86 CPUs. IEEE Transactions on Parallel and Distributed Systems. 31: 1925-1941. DOI: 10.1109/TPDS.2020.2977629  0.522
2019 Sao P, Li XS, Vuduc R. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems. Journal of Parallel and Distributed Computing. 131: 218-234. DOI: 10.1016/j.jpdc.2019.03.004  0.358
2019 Ma Y, Li J, Wu X, Yan C, Sun J, Vuduc R. Optimizing sparse tensor times matrix on GPUs. Journal of Parallel and Distributed Computing. 129: 99-109. DOI: 10.1016/j.jpdc.2018.07.018  0.576
2018 Hossain MM, Nath C, Tucker TM, Vuduc RW, Kurfess TR. A graphics processor unit-accelerated freeform surface offsetting method for high-resolution subtractive three-dimensional printing (machining). Journal of Manufacturing Science and Engineering. 140. DOI: 10.1115/1.4038599  0.459
2017 Du Z, Ge R, Lee VW, Vuduc R, Bader DA, He L. Modeling the power variability of core speed scaling on homogeneous multicore systems. Scientific Programming. 2017: 1-13. DOI: 10.1155/2017/8686971  0.354
2017 You Y, Demmel J, Czechowski K, Song L, Vuduc R. Design and implementation of a communication-optimal classifier for distributed kernel support vector machines. IEEE Transactions on Parallel and Distributed Systems. 28: 974-988. DOI: 10.1109/TPDS.2016.2608823  0.706
2016 Wu Z, Tucker TM, Nath C, Kurfess TR, Vuduc RW. Step ring-based three-dimensional path planning via graphics processing unit simulation for subtractive three-dimensional printing. Journal of Manufacturing Science and Engineering. 139. DOI: 10.1115/1.4034662  0.402
2016 Hossain MM, Tucker TM, Kurfess TR, Vuduc RW. Hybrid dynamic trees for extreme-resolution 3D sparse data modeling. Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016. 132-141. DOI: 10.1109/IPDPS.2016.75  0.305
2015 Park S, Vuduc R, Harrold MJ. UNICORN: A unified approach for localizing non-deadlock concurrency bugs. Software Testing, Verification and Reliability. 25: 167-190. DOI: 10.1002/stvr.1523  0.414
2014 Choi J, Chandramowlishwaran A, Madduri K, Vuduc R. A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method. ACM International Conference Proceeding Series. 64-71. DOI: 10.1145/2576779.2576787  0.442
2014 Choi J, Dukhan M, Liu X, Vuduc R. Algorithmic time, energy, and power on candidate HPC compute building blocks. Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS. 447-457. DOI: 10.1109/IPDPS.2014.54  0.349
2014 Dukhan M, Vuduc R. Methods for high-throughput computation of elementary functions. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 8384: 86-95. DOI: 10.1007/978-3-642-55224-3_9  0.366
2014 Sao P, Vuduc R, Li XS. A distributed CPU-GPU sparse direct solver. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 8632: 487-498. DOI: 10.1007/978-3-319-09873-9_41  0.305
2014 Lee D, Sao P, Vuduc R, Gray AG. A distributed kernel summation framework for general-dimension machine learning. Statistical Analysis and Data Mining. 7: 1-13. DOI: 10.1002/sam.11207  0.463
2013 Czechowski K, Vuduc R. A theoretical framework for algorithm-architecture co-design. Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium, IPDPS 2013. 791-802. DOI: 10.1109/IPDPS.2013.99  0.332
2012 Kim H, Vuduc R, Baghsorkhi S, Hwu WM, Choi J. Performance analysis and tuning for general purpose graphics processing units (GPGPU). Synthesis Lectures on Computer Architecture. 20: 1-94. DOI: 10.2200/S00451ED1V01Y201209CAC020  0.409
2012 Sim J, Dasgupta A, Kim H, Vuduc R. A performance analysis framework for identifying potential benefits in GPGPU applications. ACM SIGPLAN Notices. 47: 11-21. DOI: 10.1145/2370036.2145819  0.362
2012 Chandramowlishwaran A, Choi JW, Madduri K, Vuduc R. Brief announcement: Towards a communication optimal fast multipole method and its implications at exascale. Annual ACM Symposium on Parallelism in Algorithms and Architectures. 182-184. DOI: 10.1145/2312005.2312039  0.412
2012 Lashuk I, Chandramowlishwaran A, Langston H, Nguyen TA, Sampath R, Shringarpure A, Vuduc R, Ying L, Zorin D, Biros G. A massively parallel adaptive fast multipole method on heterogeneous architectures. Communications of the ACM. 55: 101-109. DOI: 10.1145/2160718.2160740  0.428
2012 Lee J, Kim H, Vuduc R. When prefetching works, when it doesn't, and why. ACM Transactions on Architecture and Code Optimization. 9. DOI: 10.1145/2133382.2133384  0.386
2012 Chandramowlishwaran A, Vuduc RW. Communication-optimal parallel N-body solvers. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012. 2462-2465. DOI: 10.1109/IPDPSW.2012.303  0.374
2012 Park S, Vuduc R, Harrold MJ. A unified approach for localizing non-deadlock concurrency bugs. Proceedings - IEEE 5th International Conference on Software Testing, Verification and Validation, ICST 2012. 51-60. DOI: 10.1109/ICST.2012.85  0.323
2011 Vuduc R, Czechowski K. What GPU computing means for high-end systems. IEEE Micro. 31: 74-78. DOI: 10.1109/MM.2011.78  0.427
2010 Chandramowlishwaran A, Knobe K, Vuduc R. Applying the concurrent collections programming model to asynchronous parallel dense linear algebra. ACM SIGPLAN Notices. 45: 345-346. DOI: 10.1145/1837853.1693506  0.355
2010 Choi JW, Singh A, Vuduc RW. Model-driven autotuning of sparse matrix-vector multiply on GPUs. ACM SIGPLAN Notices. 45: 115-125. DOI: 10.1145/1837853.1693471  0.47
2010 Chandramowlishwaran A, Madduri K, Vuduc R. Diagnosis, tuning, and redesign for multicore performance: A case study of the fast multipole method. 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010. DOI: 10.1109/SC.2010.19  0.349
2010 Lee J, Lakshminarayana NB, Kim H, Vuduc R. Many-thread aware prefetching mechanisms for GPGPU applications. Proceedings of the Annual International Symposium on Microarchitecture, MICRO. 213-224. DOI: 10.1109/MICRO.2010.44  0.39
2010 Chandramowlishwaran A, Williams S, Oliker L, Lashuk I, Biros G, Vuduc R. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures. Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010. DOI: 10.1109/IPDPS.2010.5470415  0.394
2009 Lashuk I, Chandramowlishwaran A, Langston H, Nguyen TA, Sampath R, Shringarpure A, Vuduc R, Ying L, Zorin D, Biros G. A massively parallel adaptive fast-multipole method on heterogeneous architectures. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09. DOI: 10.1145/1654059.1654118  0.391
2009 Kang S, Bader DA, Vuduc R. Understanding the design trade-offs among current multicore systems for numerical computations. IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium. DOI: 10.1109/IPDPS.2009.5161055  0.325
2009 Williams S, Oliker L, Vuduc R, Shalf J, Yelick K, Demmel J. Optimization of sparse matrix-vector multiplication on emerging multicore platforms. Parallel Computing. 35: 178-194. DOI: 10.1016/j.parco.2008.12.006  0.728
2007 Nishtala R, Vuduc RW, Demmel JW, Yelick KA. When cache blocking of sparse matrix vector multiply works and why. Applicable Algebra in Engineering, Communication and Computing. 18: 297-311. DOI: 10.1007/s00200-007-0038-9  0.646
2005 Demmel J, Dongarra J, Eijkhout V, Fuentes E, Petitet A, Vuduc R, Whaley RC, Yelick K. Self-adapting linear algebra algorithms and software. Proceedings of the IEEE. 93: 293-311. DOI: 10.1109/JPROC.2004.840848  0.601
2005 Vuduc R, Demmel JW, Yelick KA. OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series. 16: 521-530. DOI: 10.1088/1742-6596/16/1/071  0.724
2005 Vuduc RW, Moon HJ. Fast sparse matrix-vector multiplication by exploiting variable block structure. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 3726: 807-816. DOI: 10.1007/11557654_91  0.497
2004 Im EJ, Yelick K, Vuduc R. Sparsity: Optimization framework for sparse matrix kernels. International Journal of High Performance Computing Applications. 18: 135-158.  0.72
2004 Vuduc R, Demmel JW, Bilmes JA. Statistical models for empirical search-based performance tuning. International Journal of High Performance Computing Applications. 18: 65-94.  0.674
2004 Lee BC, Vuduc RW, Demmel JW, Yelick KA. Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply. Proceedings of the International Conference on Parallel Processing. 169-176.  0.75
2003 Vuduc R, Gyulassy A, Demmel JW, Yelick KA. Memory hierarchy optimizations and performance bounds for sparse AᵀAx. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2659: 705-714.  0.74
2001 Vuduc R, Demmel JW, Bilmes J. Statistical models for automatic performance tuning. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2073: 117-126.  0.655
2000 Vuduc R, Demmel JW. Code generators for automatic tuning of numerical kernels: Experiences with FFTW (position paper). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 1924: 190-211.  0.637