Year |
Citation |
Score |
2020 |
Zimmer B, Venkatesan R, Shao YS, Clemons J, Fojtik M, Jiang N, Keller B, Klinefelter A, Pinckney N, Raina P, Tell SG, Zhang Y, Dally WJ, Emer JS, Gray CT, ... Keckler SW, et al. A 0.32–128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm Ieee Journal of Solid-State Circuits. 55: 920-932. DOI: 10.1109/Jssc.2019.2960488 |
0.754 |
|
2019 |
Crago NC, Stephenson M, Keckler SW. Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs Acm Transactions On Architecture and Code Optimization. 15: 45. DOI: 10.1145/3280851 |
0.517 |
|
2018 |
Voitsechov D, Zulfiqar A, Stephenson M, Gebhart M, Keckler SW. Software-Directed Techniques for Improved GPU Register File Utilization Acm Transactions On Architecture and Code Optimization. 15: 38. DOI: 10.1145/3243905 |
0.448 |
|
2016 |
Agarwal N, Nellans D, Ebrahimi E, Wenisch TF, Danskin J, Keckler SW. Selective GPU caches to eliminate CPU-GPU HW cache coherence Proceedings - International Symposium On High-Performance Computer Architecture. 2016: 494-506. DOI: 10.1109/HPCA.2016.7446089 |
0.454 |
|
2016 |
Zheng T, Nellans D, Zulfiqar A, Stephenson M, Keckler SW. Towards high performance paged memory for GPUs Proceedings - International Symposium On High-Performance Computer Architecture. 2016: 345-357. DOI: 10.1109/HPCA.2016.7446077 |
0.446 |
|
2015 |
Jog A, Kayiran O, Kesten T, Pattnaik A, Bolotin E, Chatterjee N, Keckler SW, Kandemir MT, Das CR. Anatomy of GPU memory system for multi-application execution Acm International Conference Proceeding Series. 5: 223-234. DOI: 10.1145/2818950.2818979 |
0.41 |
|
2015 |
Rogers TG, Johnson DR, O'Connor M, Keckler SW. A variable warp size architecture Proceedings - International Symposium On Computer Architecture. 13: 489-501. DOI: 10.1145/2749469.2750410 |
0.402 |
|
2015 |
Agarwal N, Nellans D, Stephenson M, O'Connor M, Keckler SW. Page placement strategies for GPUs within heterogeneous memory systems International Conference On Architectural Support For Programming Languages and Operating Systems - Asplos. 2015: 607-618. DOI: 10.1145/2694344.2694381 |
0.339 |
|
2015 |
Bolotin E, Nellans D, Villa O, O'Connor M, Ramirez A, Keckler SW. Designing Efficient Heterogeneous Memory Architectures Ieee Micro. 35: 60-68. DOI: 10.1109/Mm.2015.72 |
0.534 |
|
2015 |
Lee Y, Grover V, Krashinsky R, Stephenson M, Keckler SW, Asanovic K. Exploring the design space of SPMD divergence management on data-parallel architectures Proceedings of the Annual International Symposium On Microarchitecture, Micro. 2015: 101-113. DOI: 10.1109/MICRO.2014.48 |
0.3 |
|
2015 |
Keckler SW. Increasing interconnection network throughput with virtual channels Computer. 48: 10. DOI: 10.1109/Mc.2015.191 |
0.389 |
|
2015 |
Pekhimenko G, Bolotin E, O'Connor M, Mutlu O, Mowry TC, Keckler SW. Toggle-Aware Compression for GPUs Ieee Computer Architecture Letters. 14: 164-168. DOI: 10.1109/Lca.2015.2430853 |
0.447 |
|
2015 |
Hestness J, Keckler SW, Wood DA. GPU computing pipeline inefficiencies and optimization opportunities in heterogeneous CPU-GPU processors Proceedings - 2015 Ieee International Symposium On Workload Characterization, Iiswc 2015. 87-97. DOI: 10.1109/IISWC.2015.15 |
0.483 |
|
2014 |
Keckler SW. Rethinking caches for throughput processors: technical perspective Communications of the Acm. 57: 90-90. DOI: 10.1145/2682585 |
0.437 |
|
2014 |
Keckler SW. Rethinking caches for throughput processors Communications of the Acm. 57: 90. DOI: 10.1145/2682583 |
0.447 |
|
2014 |
Huh J, Kim C, Shafi H, Zhang L, Burger D, Keckler SW. Author retrospective for a NUCA substrate for flexible CMP cache sharing Proceedings of the International Conference On Supercomputing. 74-76. DOI: 10.1145/2591635.2591667 |
0.373 |
|
2014 |
Jog A, Bolotin E, Guz Z, Parker M, Keckler SW, Kandemir MT, Das CR. Application-aware memory system for fair and efficient execution of concurrent GPGPU applications Acm International Conference Proceeding Series. 1-8. DOI: 10.1145/2576779.2576780 |
0.456 |
|
2014 |
Huh J, Kim C, Shafi H, Zhang L, Burger D, Keckler SW. A NUCA substrate for flexible CMP cache sharing Proceedings of the International Conference On Supercomputing. 380-389. DOI: 10.1109/Tpds.2007.1091 |
0.508 |
|
2014 |
Govindan MSS, Robatmili B, Li D, Maher BA, Smith A, Keckler SW, Burger D. Scaling power and performance via processor composability Ieee Transactions On Computers. 63: 2025-2038. DOI: 10.1109/Tc.2013.48 |
0.341 |
|
2014 |
Keckler SW. 2014 International symposium on computer architecture influential paper award Ieee Micro. 34: 95-96. DOI: 10.1109/Mm.2014.91 |
0.335 |
|
2014 |
Hestness J, Keckler SW, Wood DA. A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior Iiswc 2014 - Ieee International Symposium On Workload Characterization. 150-160. DOI: 10.1109/IISWC.2014.6983054 |
0.459 |
|
2013 |
Lee Y, Krashinsky R, Grover V, Keckler SW, Asanovic K. Convergence and scalarization for data-parallel architectures Proceedings of the 2013 Ieee/Acm International Symposium On Code Generation and Optimization, Cgo 2013. DOI: 10.1109/CGO.2013.6494995 |
0.426 |
|
2012 |
Gebhart M, Johnson DR, Tarjan D, Keckler SW, Dally WJ, Lindholm E, Skadron K. A hierarchical thread scheduler and register file for energy-efficient throughput processors Acm Transactions On Computer Systems. 30. DOI: 10.1145/2166879.2166882 |
0.47 |
|
2012 |
Grot B, Hestness J, Keckler SW, Mutlu O. A QoS-enabled on-die interconnect fabric for kilo-node chips Ieee Micro. 32: 17-25. DOI: 10.1109/Mm.2012.18 |
0.441 |
|
2012 |
Gebhart M, Keckler SW, Khailany B, Krashinsky R, Dally WJ. Unifying primary cache, scratch, and register file memories in a throughput processor Proceedings - 2012 Ieee/Acm 45th International Symposium On Microarchitecture, Micro 2012. 96-106. DOI: 10.1109/MICRO.2012.18 |
0.482 |
|
2012 |
Grot B, Keckler SW, Mutlu O. Topology-aware quality-of-service support in highly integrated chip multiprocessors Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 6161: 357-375. DOI: 10.1007/978-3-642-24322-6_28 |
0.313 |
|
2011 |
Gebhart M, Keckler SW, Dally WJ. A compile-time managed multi-level register file hierarchy Proceedings of the Annual International Symposium On Microarchitecture, Micro. 465-476. DOI: 10.1145/2155620.2155675 |
0.32 |
|
2011 |
Grot B, Hestness J, Keckler SW, Mutlu O. Kilo-NOC: A heterogeneous network-on-chip architecture for scalability and service guarantees Proceedings - International Symposium On Computer Architecture. 401-412. DOI: 10.1145/2000064.2000112 |
0.332 |
|
2011 |
Keckler SW, Dally WJ, Khailany B, Garland M, Glasco D. GPUs and the future of parallel computing Ieee Micro. 31: 7-17. DOI: 10.1109/Mm.2011.89 |
0.744 |
|
2010 |
Hestness J, Grot B, Keckler SW. Netrace: Dependency-driven trace-based network-on-chip simulation 3rd International Workshop On Network On Chip Architectures, Nocarc 2010, in Conjunction With the 43rd Annual Ieee/Acm International Symposium On Microarchitecture, Micro-43. 31-36. DOI: 10.1145/1921249.1921258 |
0.327 |
|
2009 |
Grot B, Keckler SW, Mutlu O. Preemptive virtual clock: A flexible, efficient, and cost-effective QOS scheme for networks-on-chip Proceedings of the Annual International Symposium On Microarchitecture, Micro. 268-279. DOI: 10.1145/1669112.1669149 |
0.415 |
|
2009 |
Grot B, Hestness J, Keckler SW, Mutlu O. Express cube topologies for on-chip interconnects Proceedings - International Symposium On High-Performance Computer Architecture. 163-174. DOI: 10.1109/HPCA.2009.4798251 |
0.312 |
|
2008 |
Gulati DP, Kim C, Sethumadhavan S, Keckler SW, Burger D. Multitasking workload scheduling on flexible-core chip multiprocessors Parallel Architectures and Compilation Techniques - Conference Proceedings, Pact. 187-196. DOI: 10.1145/1399972.1399981 |
0.443 |
|
2008 |
Roesner F, Burger D, Keckler SW. Counting dependence predictors Proceedings - International Symposium On Computer Architecture. 215-226. DOI: 10.1109/ISCA.2008.6 |
0.305 |
|
2008 |
Diamond J, Robatmili B, Keckler SW, Van De Geijn R, Goto K, Burger D. High performance dense linear algebra on a spatially distributed processor Proceedings of the Acm Sigplan Symposium On Principles and Practice of Parallel Programming, Ppopp. 63-72. |
0.347 |
|
2007 |
Mudigonda J, Vin HM, Keckler SW. Reconciling performance and programmability in networking systems Acm Sigcomm 2007: Conference On Computer Communications. 73-84. DOI: 10.1145/1282380.1282390 |
0.5 |
|
2007 |
Owens JD, Dally WJ, Ho R, Jayasimha DN, Keckler SW, Peh LS. Research challenges for on-chip interconnection networks Ieee Micro. 27: 96-108. DOI: 10.1109/Mm.2007.91 |
0.621 |
|
2007 |
Gratz P, Kim C, Sankaralingam K, Hanson H, Shivakumar P, Keckler SW, Burger D. On-chip interconnection networks of the TRIPS chip Ieee Micro. 27: 41-50. DOI: 10.1109/Mm.2007.90 |
0.771 |
|
2007 |
Kim C, Sethumadhavan S, Gulati D, Burger D, Govindan MS, Ranganathan N, Keckler SW. Composable lightweight processors Proceedings of the Annual International Symposium On Microarchitecture, Micro. 381-393. DOI: 10.1109/MICRO.2007.41 |
0.384 |
|
2006 |
Agaram KK, Keckler SW, Lin C, McKinley KS. Decomposing memory performance: Data structures and phases International Symposium On Memory Management, Ismm. 2006: 95-103. DOI: 10.1145/1133956.1133970 |
0.713 |
|
2006 |
Sankaralingam K, Nagarajan R, McDonald R, Desikan R, Drolia S, Govindan MS, Gratz P, Gulati D, Hanson H, Kim C, Liu H, Ranganathan N, Sethumadhavan S, Sharif S, Shivakumar P, ... Keckler SW, et al. Distributed microarchitectural protocols in the TRIPS prototype processor Proceedings of the Annual International Symposium On Microarchitecture, Micro. 480-491. DOI: 10.1109/MICRO.2006.19 |
0.749 |
|
2006 |
Smith A, Nagarajan R, Sankaralingam K, McDonald R, Burger D, Keckler SW, McKinley KS. Dataflow predication Proceedings of the Annual International Symposium On Microarchitecture, Micro. 89-100. DOI: 10.1109/MICRO.2006.17 |
0.66 |
|
2006 |
Gratz P, Kim C, McDonald R, Keckler SW, Burger D. Implementation and evaluation of on-chip network architectures Ieee International Conference On Computer Design, Iccd 2006. 477-484. DOI: 10.1109/ICCD.2006.4380859 |
0.359 |
|
2006 |
Sethumadhavan S, McDonald R, Desikan R, Burger D, Keckler SW. Design and implementation of the TRIPS primary memory system Ieee International Conference On Computer Design, Iccd 2006. 470-476. DOI: 10.1109/ICCD.2006.4380858 |
0.423 |
|
2006 |
Agaram KK, Keckler SW, Lin C, McKinley KS. The memory behavior of data structures in C SPEC CPU2000 benchmarks 2006 Spec Benchmark Workshop. |
0.716 |
|
2006 |
Nagarajan R, Xia C, McDonald RG, Burger D, Keckler SW. Critical path analysis of the TRIPS architecture Ispass 2006: Ieee International Symposium On Performance Analysis of Systems and Software, 2006. 2006: 37-47. |
0.345 |
|
2004 |
Sankaralingam K, Nagarajan R, Liu H, Kim C, Huh J, Ranganathan N, Burger D, Keckler SW, McDonald RG, Moore CR. TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP Acm Transactions On Architecture and Code Optimization. 1: 62-93. DOI: 10.1145/980152.980156 |
0.71 |
|
2004 |
Desikan R, Sethumadhavan S, Burger D, Keckler SW. Scalable selective re-execution for EDGE architectures Operating Systems Review (Acm). 38: 120-132. DOI: 10.1145/1037949.1024408 |
0.387 |
|
2004 |
Nagarajan R, Kushwaha SK, Burger D, McKinley KS, Lin C, Keckler SW. Static Placement, Dynamic Issue (SPDI) scheduling for EDGE architectures Parallel Architectures and Compilation Techniques - Conference Proceedings, Pact. 74-84. DOI: 10.1109/PACT.2004.1342543 |
0.34 |
|
2004 |
Sethumadhavan S, Desikan R, Burger D, Moore CR, Keckler SW. Scalable hardware memory disambiguation for high-ILP processors Ieee Micro. 24: 118-127. DOI: 10.1109/MM.2004.87 |
0.38 |
|
2004 |
Burger D, Keckler SW, McKinley KS, Dahlin M, John LK, Lin C, Moore CR, Burrill J, McDonald RG, Yoder W. Scaling to the end of silicon with EDGE architectures Computer. 37: 44-55. DOI: 10.1109/Mc.2004.65 |
0.352 |
|
2004 |
Desikan R, Sethumadhavan S, Burger D, Keckler SW. Scalable selective re-execution for EDGE architectures 11th International Conference On Architectural Support For Programming Languages and Operating Systems, Asplos Xi. 120-132. |
0.387 |
|
2003 |
Hanson H, Hrishikesh MS, Agarwal V, Keckler SW, Burger D. Static energy reduction techniques for microprocessor caches Ieee Transactions On Very Large Scale Integration (Vlsi) Systems. 11: 303-313. DOI: 10.1109/Tvlsi.2003.812370 |
0.702 |
|
2003 |
Kim C, Burger D, Keckler SW. Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches Ieee Micro. 23. DOI: 10.1109/Mm.2003.1261393 |
0.5 |
|
2003 |
Sankaralingam K, Nagarajan R, Liu H, Kim C, Huh J, Burger D, Keckler SW, Moore CR. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture Conference Proceedings - Annual International Symposium On Computer Architecture, Isca. 422-433. DOI: 10.1109/Mm.2003.1261386 |
0.707 |
|
2003 |
Sankaralingam K, Keckler SW, Mark WR, Burger D. Universal mechanisms for data-parallel architectures Proceedings of the Annual International Symposium On Microarchitecture, Micro. 2003: 303-314. DOI: 10.1109/MICRO.2003.1253204 |
0.458 |
|
2003 |
Shivakumar P, Keckler SW, Moore CR, Burger D. Exploiting microarchitectural redundancy for defect tolerance Proceedings - Ieee International Conference On Computer Design: Vlsi in Computers and Processors. 481-488. DOI: 10.1109/ICCD.2012.6378613 |
0.604 |
|
2003 |
Keckler SW, Burger D, Moore CR, Nagarajan R, Sankaralingam K, Agarwal V, Hrishikesh MS, Ranganathan N, Shivakumar P. A wire-delay scalable microprocessor architecture for high performance systems Digest of Technical Papers - Ieee International Solid-State Circuits Conference. |
0.742 |
|
2003 |
Sankaralingam K, Singh VA, Keckler SW, Burger D. Routed Inter-ALU Networks for ILP scalability and performance Proceedings - Ieee International Conference On Computer Design: Vlsi in Computers and Processors. 170-177. |
0.653 |
|
2002 |
Kim C, Burger D, Keckler SW. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches International Conference On Architectural Support For Programming Languages and Operating Systems - Asplos. 211-222. DOI: 10.1145/635508.605420 |
0.376 |
|
2002 |
Shivakumar P, Kistler M, Keckler SW, Burger D, Alvisi L. Modeling the effect of technology trends on the soft error rate of combinational logic Proceedings of the 2002 International Conference On Dependable Systems and Networks. 389-398. DOI: 10.1109/DSN.2002.1028924 |
0.602 |
|
2002 |
Hrishikesh MS, Jouppi NP, Farkas KI, Burger D, Keckler SW, Shivakumar P. The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays Conference Proceedings - Annual International Symposium On Computer Architecture, Isca. 14-24. |
0.599 |
|
2001 |
Agaram K, Keckler SW, Burger D. A characterization of speech recognition on modern computer systems 2001 Ieee International Workshop On Workload Characterization, Wwc 2001. 45-53. DOI: 10.1109/WWC.2001.990743 |
0.355 |
|
2001 |
Nagarajan R, Sankaralingam K, Burger D, Keckler SW. A design space evaluation of grid processor architectures Proceedings of the Annual International Symposium On Microarchitecture. 40-51. |
0.725 |
|
2000 |
Carter NP, Dally WJ, Lee WS, Keckler SW, Chang A. Processor mechanisms for software shared memory Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 1940: 120-133. |
0.412 |
|
1999 |
Keckler SW, Chang A, Lee WS, Chatterjee S, Dally WJ. Concurrent event handling through multithreading Ieee Transactions On Computers. 48: 903-916. DOI: 10.1109/12.795220 |
0.367 |
|
1998 |
Lee WS, Dally WJ, Keckler SW, Carter NP, Chang A. An efficient, protected message interface Computer. 31: 69-75. DOI: 10.1109/2.730739 |
0.345 |
|
1997 |
Fillo M, Keckler SW, Dally WJ, Carter NP, Chang A, Gurevich Y, Lee WS. The M-machine multicomputer International Journal of Parallel Programming. 25: 183-212. DOI: 10.1007/Bf02700035 |
0.434 |
|