1987 — 1990 |
Jha, Niraj |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Concurrent Error Detection Circuits For VLSI Applications |
0.915 |
1989 — 1991 |
Jha, Niraj |
N/A |
Design For Robust Testability of CMOS VLSI Circuits
This research is to develop test procedures for robust testing and design for testability (DFT) of CMOS circuits. The most likely faults in CMOS circuits are stuck-open, stuck-at and bridging faults. Thus the traditional stuck-at fault model is inadequate for MOS technologies. Furthermore, a test set for a CMOS circuit may be invalidated by arbitrary signal delays and charge distribution among the internal nodes of the CMOS gates in the circuit. Professor Jha is developing methods to generate test sets to robustly test for faults in spite of delays and charge distribution. His approach is based on two-pattern tests to detect faults from a comprehensive fault model via logic and current monitoring. This avoids an increase in test generation time in order to gain comprehensive fault coverage. The DFT research is based on a theoretical result of Professor Jha that a universal test set can be found which is valid for all implementations from a class of CMOS circuits. Finally, testing issues related to dynamic CMOS circuits are being investigated because they enjoy the advantage of greater testability over fully complementary CMOS circuits.
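To illustrate why stuck-open faults call for two-pattern tests, here is a minimal switch-level sketch of a two-input CMOS NAND gate. The function names and the single modeled fault (an open pMOS transistor on input `a`) are assumptions made for illustration only; the sketch does not model the delay or charge-distribution effects that the robust test methods in this abstract are designed to handle.

```python
def nand2(a, b, prev, open_pmos_a=False):
    """Switch-level CMOS NAND2. `prev` is the value retained on the output node.

    open_pmos_a=True models a stuck-open fault on the pMOS transistor driven by `a`
    (a hypothetical fault chosen for this example).
    """
    pull_up = (a == 0 and not open_pmos_a) or (b == 0)  # parallel pMOS network
    pull_down = (a == 1 and b == 1)                      # series nMOS network
    if pull_up:
        return 1
    if pull_down:
        return 0
    return prev  # neither network conducts: the floating output keeps its old charge


def apply_two_pattern(init, test, open_pmos_a=False):
    """Apply an initialization vector, then a test vector, and return the final output."""
    out = nand2(*init, prev=None, open_pmos_a=open_pmos_a)
    return nand2(*test, prev=out, open_pmos_a=open_pmos_a)


# Initialize the output to 0 with (1, 1), then try to pull it up through the pMOS on `a`.
good = apply_two_pattern((1, 1), (0, 1))
faulty = apply_two_pattern((1, 1), (0, 1), open_pmos_a=True)
print(good, faulty)
```

With the initialization vector the fault-free and faulty gates disagree; a single vector `(0, 1)` applied while the output already holds 1 would not expose the fault.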
|
0.915 |
1990 — 1993 |
Jha, Niraj |
N/A |
Design and Synthesis of Self-Checking VLSI Circuits and Systems
This research is on developing a new fault-tolerant design method based on self-checking checker designs. The focus is on combinational and sequential circuits based on self-checking concepts and on the exploitation of self-checking systems for testability. A design methodology based on theoretical results for the design of self-checking circuits is being investigated. Efficient codes, which can be used to encode the outputs of functional circuits such that the overhead is low, are being used in the research. The work is oriented primarily to CMOS technology.
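The abstract does not name the codes used. As one well-known example of an output code for self-checking circuits, the sketch below uses a Berger code, whose check bits hold the binary count of zeros in the information bits and which detects all unidirectional errors. The choice of code and the function names are assumptions for illustration, not the project's method.

```python
def berger_encode(bits):
    """Append Berger check bits (binary count of zeros) to a list of 0/1 information bits."""
    k = len(bits).bit_length()          # check width: ceil(log2(n + 1))
    zeros = bits.count(0)
    check = [(zeros >> i) & 1 for i in reversed(range(k))]
    return bits + check


def berger_check(word, n):
    """Return True if the n information bits of `word` agree with its check bits."""
    return berger_encode(word[:n]) == word


codeword = berger_encode([1, 0, 1, 1])
corrupted = [0, 0, 0, 1] + codeword[4:]   # two 1 -> 0 errors (unidirectional)
print(berger_check(codeword, 4), berger_check(corrupted, 4))
```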
|
0.915 |
1994 — 1997 |
Jha, Niraj |
N/A |
High-Level Synthesis For Hierarchical Testability
This research is concerned with finding efficient hierarchical testability techniques for controller-data path systems. The approach is to start with module level test sets, derived for any suitable fault model, and use high level synthesis to ensure that these test sets can be combined into a system level test set which can provide complete test coverage of all the embedded modules. The aim is to invent algorithms that reduce test generation and application times, yet obtain complete, or nearly complete, system level test coverage with little or no area and delay overhead. The testability techniques are being embedded into algorithms for scheduling and allocation. Also being explored are methods for performing synthesis with both low power and testability as design criteria.
|
0.915 |
1995 — 1998 |
Jha, Niraj |
N/A |
Fault Tolerance in Distributed Systems
This project is concerned with a novel, cost-effective scheme, called task-based fault tolerance (TBFT), for providing fault tolerance to heterogeneous distributed computing systems. The technique considers computations at the task level and places assertion checks on the tasks to achieve fault tolerance. It requires no hardware overhead and a relatively small amount of delay overhead. Allocation and scheduling of the tasks is performed in the task graph to support fault tolerance, while at the same time an attempt is made to minimize the schedule length and obtain a balanced load. Simulation results show that the overhead in schedule length incurred due to the introduction of assertion checks is very small compared with the case when all the tasks in the system are duplicated. The topics under investigation include the following: (1) applying TBFT techniques to multiple faults, (2) developing mixed allocation/scheduling methods to reduce schedule length and fault tolerance overhead, (3) developing methods to ensure safety of operation without much adverse effect on the reliability of the system, (4) developing allocation and scheduling methods which would permit the use of redundancy at the task level to provide fast on-line diagnosis, (5) developing probabilistic diagnosis methods, (6) integrating detection/diagnosis results with a backward error recovery method to minimize the rollback domino effect using asynchronous checkpointing, and (7) developing allocation and scheduling methods for periodic and aperiodic tasks in fault-tolerant real-time distributed systems based on the notion of forward error recovery for safety-critical tasks.
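The idea of placing an assertion check on a task, rather than duplicating the task, can be sketched as follows. The sorting task, the injected fault and all function names are hypothetical and chosen only to make the example concrete; the allocation and scheduling aspects of TBFT are not modeled.

```python
from collections import Counter


def sort_task(data, faulty=False):
    """A sample task. With faulty=True, a fault is injected by swapping two outputs."""
    out = sorted(data)
    if faulty and len(out) > 1:
        out[0], out[-1] = out[-1], out[0]
    return out


def sort_assertion(inp, out):
    """Assertion check: output is ordered and is a permutation of the input."""
    ordered = all(x <= y for x, y in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)


def run_with_check(task, assertion, inp, **kwargs):
    """Run a task, then its assertion; return (result, passed)."""
    out = task(inp, **kwargs)
    return out, assertion(inp, out)


print(run_with_check(sort_task, sort_assertion, [3, 1, 2]))
print(run_with_check(sort_task, sort_assertion, [3, 1, 2], faulty=True))
```

The assertion here runs in linear time, which is cheaper than re-running the sort on a second processor and comparing results.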
|
0.915 |
1998 — 2001 |
Jha, Niraj |
N/A |
Hierarchical Test Generation and Design For Testability of Core-Based Systems
This research is on hierarchical and symbolic test generation and design for testability techniques for core-based systems. The system can be a combination of soft, firm and hard cores. In the case of untestable soft cores, hierarchical testability techniques are being applied to make them testable. Techniques for adding hardware so that the cores are functionally transparent and can propagate test data without information loss are also being developed. The approach is to integrate testable and transparent cores to ensure justification of precomputed test sequences. This test generation approach is independent of the bit-width of the cores. The case of firm and hard cores which have testability features built into them by core providers is also being explored. Approaches here include: using information supplied by the core vendor; use of in-built testability features; and providing transparency through glue logic outside the core.
|
0.915 |
2000 — 2004 |
Jha, Niraj Wolf, Marilyn Malik, Sharad (co-PI) [⬀] Martonosi, Margaret [⬀] |
N/A |
Cise Research Instrumentation: Support For System-On-a-Chip and Embedded System Research
EIA-9911078 Margaret Martonosi Princeton University
CISE Research Instrumentation: Instrumentation Support for System-On-A-Chip and Embedded System Research
The Department of Electrical Engineering at Princeton University will purchase a high-end server, workstations, networking hardware, and CAD tools, which will be dedicated to support research in computer engineering. The equipment will be used for several research projects, all generally in the areas of Computer Architecture and Computer-Aided Design, and particularly focused on advancing design and architecture techniques for embedded systems and systems-on-a-chip.
In a fundamental paradigm shift in system design in the semiconductor industry, entire systems are being built on a single chip, using multiple embedded functional blocks called cores. This has been made possible by the ever-increasing density of chips. The current 0.25-micron technology has made it possible to integrate tens of millions of transistors on one chip, and considerable interest is focused on discussing what the contents of billion-transistor systems-on-a-chip (SOCs) ought to be.
We propose to develop algorithms and tools to provide key technologies with breakthrough potential to semiconductor companies, and to develop efficient software environments and tools to deal with all aspects of the SOC design problem.
|
0.915 |
2003 — 2006 |
Jha, Niraj |
N/A |
Application-Specific Instruction Set Processor Synthesis For Low Energy and High Performance Using Extensible Processor Platforms
The objective of the proposed research is to investigate and develop a comprehensive framework for the automatic synthesis of custom processors. The scope of the proposed framework includes all the necessary steps to generate the custom processor's architecture, starting from one or more embedded software programs that it is required to execute. The problems that we will tackle are as follows. We will develop techniques for efficient automatic generation of hardware extensions -- custom instructions and co-processors -- to a base processor platform, for large hierarchical application programs. The complexity of realistic application software, together with the multi-granular nature of the hardware extensions, necessitates the development of new techniques that start from a hierarchical program representation and identify custom instructions and co-processors that maximize performance or minimize energy consumption under given design constraints. The number of individual candidate custom instructions and co-processors can be quite large for even moderately sized programs, and hence the number of combinations thereof is even larger. We will develop techniques to efficiently explore the unified custom instruction and co-processor design space, and select an optimal combination of custom instructions and co-processors (from individual candidates) that maximizes performance or energy efficiency. High levels of hardware re-use are critical for deriving efficient implementations of custom processors. Various custom instructions derived for a given application may exhibit commonality that can be exploited to either reduce the area overhead or improve performance/energy impact under a given area constraint. In addition to conventional fine-grained resource sharing techniques, we will develop new coarse-grained sharing techniques to obtain high quality designs.
Software transformations, if appropriately applied to the target application program before attempting to derive hardware extensions, can facilitate the generation of higher quality custom instructions or co-processors, leading to much higher performance and energy gains. We will develop a method to automatically apply a suitable sequence of enabling transformations to the application.
Efficiency and flexibility are two major requirements driving embedded system design. Unfortunately, these two requirements are typically in conflict with each other: performance and energy efficiency are often obtained by hardwiring functionality and optimizing the system in an application-specific manner, which limits flexibility, while flexibility is obtained through configurability and/or programmability, which carry associated overheads. Negotiating this tradeoff is critical in a wide variety of applications, ranging from high-performance systems to battery-driven systems.
Application-specific instruction set processors (ASIPs) offer a good tradeoff between efficiency and flexibility by realizing only the critical operations in the application(s) of interest using custom hardware. Conventional approaches to ASIP design are based on designing and implementing a new instruction set and processor architecture from scratch for each application. Unfortunately, the design turnaround time for such approaches may be large and is comparable to design cycles for custom hardware implementations. The recent evolution of customizable and extensible processor technology, such as Tensilica's Xtensa and ARC's ARCTangent processor cores, has provided embedded system designers with a mechanism to design ASIPs with rapid turnaround times through the use of re-targetable software development tool flows and configurable soft intellectual property (IP). However, the task of customizing the processor and extending it with custom hardware (instruction units, co-processors, peripherals) is still largely manual and left to the designer's expertise. In order to realize the potential for energy efficiency as well as flexibility that ASIPs offer, it is necessary to develop high-level methodologies that automatically identify application hot-spots and map them to custom hardware that extends the underlying configurable platform.
No algorithms or tools exist for extensible processor platforms that address any of the above problems. We have taken the first step in this direction by developing algorithms and a tool to automatically identify custom instructions for extensible processors. This tool results in an average performance improvement of 3.4X (up to 5.4X) and an average energy-delay product improvement of 12.6X (up to 24.2X).
|
0.915 |
2003 — 2007 |
Jha, Niraj Lee, Ruby [⬀] |
N/A |
Itr: Architectures and Design Methodologies For Secure Low-Power Embedded Systems
Embedded systems, for example in information appliances and networked sensors, face some of the most demanding security concerns - they are resource constrained while frequently needing to handle sensitive information in physically insecure environments. Security processing can easily overwhelm the limited computation and memory resources of embedded processors, especially with the escalation in the amount of data to be processed and the data rates of high-speed networks. This "performance gap" is compounded by the "battery gap", which is the disparity between energy requirements and slow improvements in battery technology, for secure low-power embedded systems.
This project is an inter-disciplinary study of several core technologies that will enable the design of secure, low-power embedded systems. It spans the fields of security, cryptographic algorithms, embedded processor architecture, computer arithmetic, low power design, and enabling design methodologies and tools. It addresses the performance, energy and security requirements, and their tradeoffs, in embedded processors and systems. The research goals include a comprehensive analysis of the performance requirements and power consumption for security in embedded systems. The project is developing efficient architectures for security processing in low-power embedded systems, including configurable security modules for system-on-chip designs, and architectural guidelines for tiny cryptographic processors for embedded systems. The performance, power and security tradeoffs based on customizations at the protocol, cryptographic algorithm, and hardware and software implementation levels are studied. Design methodology and tools include processor design tools to facilitate the design of new security processing architectures based on open frameworks such as the PLX (hosted at Princeton) and SimpleScalar toolkits. The project explores domain-specific design methodologies that jointly co-design security protocols and processing architectures to meet the required security, performance and power priorities and constraints.
The research enables the design of embedded systems with higher levels of security, while achieving an order of magnitude or more in performance and battery life, compared to conventional approaches. Broad security impacts are expected for embedded system design, in addition to impact for future research and education. Results are disseminated to industry through the Princeton Architecture Lab for Multimedia and Security (PALMS) and the NJCST Center for Embedded System-on-a-Chip Design. The research results are being woven into graduate and undergraduate courses.
|
0.915 |
2003 — 2008 |
Peh, Li-Shiuan [⬀] Prucnal, Paul (co-PI) [⬀] Jha, Niraj |
N/A |
Itr: Collaborative Research: a Multi-Level Approach to Power-Efficient Opto-Electronic Interconnection Networks
Abstract: Interconnection network fabrics are becoming an ever-more critical communication backbone of many digital systems, providing the means for systems to scale up in capacity. As these systems become power-constrained, networks stand to be the weakest link, unless there is a shift from prior performance-driven approaches to one that focuses on the power efficiency of networks.
This project takes a multi-level approach to power-efficient interconnection networks that synergistically bridges research in circuits, architecture and software, providing a complete solution that will make power-aware network fabrics a reality. Research into novel low-power circuit techniques develops basic network building blocks for new power-efficient network architecture designs. New link and switch mechanisms uncovered at the circuit-level are leveraged for run-time architectural tradeoffs in network power and performance. Higher up in the hierarchy, the software level closest to users analyzes and factors in user power-performance requirements.
This research targets a diverse range of systems -- (1) both general-purpose and embedded systems; (2) both electrical and optical networking technologies, seeking to determine the optimal choice between optical versus electrical interconnection at each level in the network hierarchy from a power perspective; (3) from tiny on-chip networks to data center-wide chassis-to-chassis networks.
The research is integrated into the education curriculum, through existing and new graduate courses, and in undergraduate research programs. The project seeks to further broaden the participation of underrepresented groups in research activities. As a pioneering effort in power-efficient interconnection networks, the project seeks to facilitate future research and education in this area through the release of simulation tools and other results.
|
0.915 |
2003 — 2004 |
Jha, Niraj |
N/A |
Ner: Nanocad - Computer-Aided Design Algorithms and Tools For Nanotechnologies
Proposal No: 303789 Title: NER: NanoCAD - Computer-Aided Design Algorithms and Tools for Nanotechnologies
There has been a huge surge of research activity in recent years in the area of nanoelectronic device design and fabrication. Such "nanodevices" include solid-state quantum-effect devices and molecular devices. In terms of work at the logic and architecture levels, quantum cellular automata (QCA) based on quantum dot cells, and resonant tunneling diode (RTD) based implementations seem ahead of others, and chips based on these devices are likely to be feasible within this decade. While working devices and logic gates, and even some small circuits, have been demonstrated for various nanotechnologies, no logic synthesis methodology has evolved yet for automatically synthesizing large logic circuits in these technologies. The aim of this proposal is to develop automatic logic synthesis algorithms and implement them as software tools to bridge this gap. In this exploratory phase of research, we will target QCA and RTD based logic circuits.
Specifically, we will develop a general multi-level logic synthesis algorithm and software tool using threshold gates and majority gates as primitive gates. We will derive bit-level pipelined versions of different arithmetic units to exploit the concept of nanopipelining. We will also derive physical synthesis rules for QCA to ensure its correct functionality.
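To make the notion of threshold and majority gates as primitives concrete, the following sketch builds AND, OR and a full adder from a three-input majority gate and inverters. It illustrates the primitives only and is not the synthesis algorithm proposed here; the function names are assumptions.

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Output 1 when the weighted sum of the inputs reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def maj(a, b, c):
    """Three-input majority gate: a threshold gate with unit weights and threshold 2."""
    return threshold_gate((a, b, c), (1, 1, 1), 2)


def and2(a, b):
    return maj(a, b, 0)   # tie one input to constant 0


def or2(a, b):
    return maj(a, b, 1)   # tie one input to constant 1


def full_adder(a, b, cin):
    """Full adder using three majority gates and two inverters."""
    cout = maj(a, b, cin)
    total = maj(1 - cout, maj(a, b, 1 - cin), cin)
    return total, cout


# Exhaustively compare against ordinary integer addition.
for a, b, cin in product((0, 1), repeat=3):
    total, cout = full_adder(a, b, cin)
    assert 2 * cout + total == a + b + cin
```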
The software tools developed in this research will be made available on the world-wide web for wide dissemination. The material will be included in a new graduate-level course on Computer-Aided Design for Emerging Nanotechnologies. Undergraduates will also be involved in this research. Princeton encourages applications from women and minority students through special fellowships. Such students will be specially welcome to join this research.
|
0.915 |
2004 — 2007 |
Jha, Niraj |
N/A |
Collaborative Research: Modeling, Simulation, Circuit Design, Logic Synthesis, Testing and Defect Tolerance of Resonant Tunneling Device Based Nanotechnology
Proposal No: 429745 Principal Investigator: Niraj Jha (Princeton U) and Pinaki Mazumder (U Michigan) Title: COLLABORATIVE RESEARCH: Modeling, Simulation, Circuit Design, Logic Synthesis, Testing and Defect Tolerance of Resonant Tunneling Device based Nanotechnology
Abstract: Nanotechnologies, such as resonant-tunneling devices (RTDs), quantum cellular automata, single electron transistors, atom relays, refined molecular relays, carbon nanotube transistors, etc., have seen significant advances in the last few years. They offer orders of magnitude improvements in chip density, performance and power consumption, making many as-yet undreamed-of applications feasible. However, comprehensive circuit analysis/synthesis methodologies and tools have not yet been developed for any nanotechnology. Such tools are urgently needed in order to realize their potential. In this proposal, the investigators will target RTD-based nanotechnology, which has already shown industrial promise. Specifically, we will develop an electronic circuit design optimization environment, and techniques for circuit design, logic synthesis, testing and fault tolerance for this nanotechnology.
|
0.915 |
2004 — 2007 |
Jha, Niraj |
N/A |
Energy-Aware Gui Design and Run-Time Application Adaptation, and Power Management/Scaling For Mobile Computers
Abstract
Modern mobile computing systems are an integral part of the lives of millions of users. Although the latest microprocessors enable these devices to provide improved services, reducing energy consumption remains a major design challenge. Energy reduction techniques for interactive mobile systems are not well understood. The objective of this work is to develop methodologies and tools to reduce energy consumption of interactive systems from the display, application and operating system (OS) perspectives. The problems that we will tackle are as follows: Energy-efficient graphical user interface (GUI) design, Energy-aware application adaptation framework, and OS-supported dynamic power management (DPM) and dynamic voltage scaling (DVS).
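As a rough illustration of why dynamic voltage scaling saves energy, the sketch below uses the standard first-order model in which dynamic energy per cycle is proportional to the square of the supply voltage. It additionally assumes that voltage scales linearly with clock frequency and ignores static power; both are simplifying assumptions, and the function name and parameters are hypothetical.

```python
def scaled_energy_ratio(cycles, deadline_s, f_max_hz):
    """Energy at the slowest deadline-meeting frequency, relative to running at f_max.

    Model (assumed): energy per cycle ~ V^2 and V ~ f, so the ratio is (f / f_max)^2.
    """
    f_needed = cycles / deadline_s
    if f_needed > f_max_hz:
        raise ValueError("deadline cannot be met even at full speed")
    speed = f_needed / f_max_hz
    return speed ** 2


# A task that needs only half of the peak frequency to meet its deadline.
print(scaled_energy_ratio(cycles=500_000_000, deadline_s=1.0, f_max_hz=1e9))
```

Under these assumptions, a task with 50% slack would use about a quarter of the dynamic energy when slowed down instead of running at full speed and idling.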
The results of the research will be made available on the Web. It will also be disseminated to the industry through the companies affiliated with the NJCST Center for Embedded System-on-a-Chip Design that the PI heads. Currently, three companies are commercializing tools developed in the PI's group. This offers additional avenues for technology transfer.
|
0.915 |
2006 — 2010 |
Jha, Niraj |
N/A |
Csr---Sma: Thermal Modeling, Management, and Characterization of Chip Multiprocessors
Ever-increasing power densities and cooling costs have led to the consideration of the power and thermal impact of chip multiprocessor (CMP) designs during early design stages. The aim of this project is to perform thermal modeling and simulation of CMPs, develop dynamic thermal management (DTM) techniques and perform thermal characterization of such designs. Thermal models being developed can accurately capture the thermal characteristics of the processors, on-chip network and packaging. The DTM techniques target the processors and network synergistically. A large suite of applications is being thermally characterized on both fine-grain and coarse-grain CMPs. Broader impacts will include making the tools available on the web to help other researchers, inclusion of material in a graduate-level course, involvement of undergraduates in research and outreach to female and minority students.
|
0.915 |
2007 — 2011 |
Jha, Niraj |
N/A |
CSR---EHS: Non-Volatile Carbon Nanotube RAM Based FPGA Architectures
Emerging nanodevices, such as carbon nanotubes, have been recognized as very promising for future technology scaling. A recent innovation has enabled the fabrication of nanotube-based non-volatile random-access memory (NRAM) that has the promise to become a universal memory. NRAMs are suitable for both standalone and embedded memory applications and are fabricated using traditional lithography-compatible manufacturing processes. An NRAM is considerably faster and denser than DRAM, has much lower power consumption than DRAM or flash, has similar speed to SRAM and is highly resistant to environmental forces (temperature, magnetism).
The aim of this project is to explore an NRAM-based class of programmable computer architectures, known as a field-programmable gate array (FPGA) family. Initial work in the PI's group on such FPGAs has shown the feasibility of increasing the logic density by over an order of magnitude compared to traditional FPGAs, using the novel concept of fine-grain temporal logic folding, which makes cycle-by-cycle dynamic reconfiguration feasible.
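The intuition behind temporal logic folding can be sketched as follows: a combinational netlist is split into stages by logic level, and the stages are executed one after another on the same logic elements, which are reconfigured between stages. The netlist format, function names and folding rule below are assumptions made for illustration; they are not the mapping tool described in this abstract.

```python
from functools import lru_cache


def levelize(netlist):
    """Map each LUT to its logic level. Names absent from `netlist` are primary inputs."""
    @lru_cache(maxsize=None)
    def level(node):
        if node not in netlist:
            return 0
        return 1 + max((level(f) for f in netlist[node]), default=0)
    return {node: level(node) for node in netlist}


def fold(netlist, levels_per_stage):
    """Group LUTs into folding stages of `levels_per_stage` consecutive logic levels."""
    stages = {}
    for node, lvl in levelize(netlist).items():
        stages.setdefault((lvl - 1) // levels_per_stage, []).append(node)
    return [sorted(stages[s]) for s in sorted(stages)]


# Hypothetical four-LUT circuit: each entry lists the fan-ins of one LUT.
netlist = {"g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"], "g4": ["g3", "e"]}
stages = fold(netlist, levels_per_stage=1)
print(stages, "LUTs needed:", max(len(s) for s in stages), "of", len(netlist))
```

In this toy case, folding at one level per stage needs only two physical LUTs instead of four, at the cost of three reconfiguration cycles.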
The objectives of this project include design space exploration of the NRAM-based FPGA space using a parameterized technology mapping and temporal logic folding tool that takes a mixed register-transfer/gate level description of a circuit and maps it to an NRAM-based FPGA family instance. The research will also explore the applicability of such FPGAs to computation-unit integrated memories for memory-intensive multimedia applications.
The proposed work will make a well-characterized and highly versatile family of NRAM-based FPGAs and associated mapper/folder available to industry. The material will be included in a new senior-level course on Design with Nanotechnologies. Undergraduate students are expected to participate in this research. Female and minority students will be attracted to this research through Princeton's Presidential Fellowship Program.
|
0.915 |
2007 — 2011 |
Jha, Niraj |
N/A |
Csr---Ehs: Hardware/Software Architectures For Secure Embedded Systems
Many embedded systems handle sensitive data or perform critical functions, making security an important consideration. While some of these threats are common in desktop systems, the general-purpose nature of a desktop and the commodity nature of the components comprising it prevent the deployment of meaningful architectural countermeasures. Such restrictions are less stringent in embedded systems, permitting the investigation of new security approaches, covering all aspects of system architecture design. However, such systems are severely resource-constrained in processing and battery capacities. Thus, purely software security solutions can overwhelm these capacities. Providing a secure implementation requires security measures that span various components in the system-on-chip (SoC), including hardware and software.
The aim of this research is to develop design methodologies to obtain efficient hardware/software implementations that can facilitate secure program execution or implement a given security policy in embedded systems. The first objective includes developing hardware/software design methodologies to support trusted platform module (TPM) functionality in resource-constrained embedded systems. TPM acts as a root of trust for the system that contains it, providing capabilities for secure storage, secure reporting of platform configuration measurements, and cryptographic key generation, among other functions. It is reported that by 2010, shipments of TPMs will reach 250 million, giving impetus to this research. The second objective includes design of a security-aware SoC communication architecture that can enforce a system-level security policy. The third objective includes developing techniques to facilitate the deployment of type-safe software
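The platform configuration measurements mentioned above are accumulated in a TPM by an "extend" operation, in which a register is replaced by the hash of its old value concatenated with a new measurement. The sketch below imitates that operation in software with SHA-256; it is an illustration only, the function names and component strings are hypothetical, and a real TPM performs the operation inside the chip.

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def extend(pcr, component):
    """Return the new register value: H(old value || H(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot_chain(components):
    """Fold a sequence of software components into one register, starting from zeros."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


expected = measure_boot_chain([b"bootloader", b"kernel"])
tampered = measure_boot_chain([b"bootloader", b"kernel-with-rootkit"])
print(expected.hex() != tampered.hex())
```

Because each value depends on all earlier measurements, changing or reordering any component yields a different final value.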
|
0.915 |
2009 — 2013 |
Jha, Niraj |
N/A |
Csr:Small:Preventing the Exploitation of Software Vulnerabilities and Execution of Malicious Software On Embedded Systems
This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
Software security attack prevention, which addresses threats posed by software vulnerabilities and malicious software, is important for modern computing, especially for embedded systems. Despite widespread research efforts, the increasing complexity of software and sophistication and ingenuity of software attacks have led to a constant need for innovation. Some of the shortcomings of conventional techniques are insufficient detection accuracy (false positives/negatives) and high performance penalties.
In this project, a new methodology will be investigated for detecting and preventing malicious code execution and software vulnerability exploits, with the potential to significantly improve the accuracy and efficiency beyond current techniques. It will leverage recent advances in related areas, such as virtualization and dynamic binary instrumentation, which enable efficient creation of isolated execution environments and dynamic monitoring and analysis of program execution. The key aspects of the project are safe post-execution analysis to detect violation of specific security policies, derivation of a hybrid model that represents a dynamic control of the program/data flow in terms of regular expressions and data invariants, run-time prevention of malicious behavior, and several software/hardware enhancements for efficiently deploying the defense framework on embedded systems.
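A toy version of a behavior model that combines a regular expression over control events with a data invariant might look like the sketch below. The event alphabet, the policy and the byte limit are all hypothetical and chosen only to show the shape of such a check; they are not the model derived in this project.

```python
import re

# Hypothetical event alphabet: each event name maps to one symbol.
SYMBOL = {"open": "o", "read": "r", "write": "w", "close": "c"}

# Hypothetical control-flow policy: every file is opened, used, then closed.
POLICY = re.compile(r"(o[rw]*c)*")

# Hypothetical data invariant: no single transfer exceeds this many bytes.
MAX_BYTES = 4096


def conforms(trace):
    """Check a trace of (event, nbytes) pairs against the policy and the invariant."""
    names = [name for name, _ in trace]
    if any(name not in SYMBOL for name in names):
        return False  # an event outside the model is treated as a violation
    control_ok = POLICY.fullmatch("".join(SYMBOL[n] for n in names)) is not None
    data_ok = all(0 <= nbytes <= MAX_BYTES for _, nbytes in trace)
    return control_ok and data_ok


print(conforms([("open", 0), ("read", 512), ("close", 0)]))
print(conforms([("open", 0), ("exec", 0), ("close", 0)]))
```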
The methodologies will be disseminated through research articles, and software tools developed will be placed on the world-wide web. Undergraduates will be encouraged to carry out independent research projects on this topic. Princeton encourages applications from female and minority students through special fellowships, which will be leveraged. Several other outreach activities are also planned for promoting education among underrepresented high school students.
|
0.915 |
2012 — 2016 |
Jha, Niraj |
N/A |
SHF:Small: Fine-Grain Dynamically Reconfigurable FPGA Architecture Aimed At Reducing the ASIC-FPGA Gaps
Field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) are two very important processing elements for computation. FPGAs are very attractive because of their lower design cost and shorter time-to-market compared to ASICs. Still, the market share of FPGAs remains less than a fifth of that of ASICs because ASICs enjoy an advantage over FPGAs in terms of circuit area, power consumption, and delay. The objective of the proposed work is to significantly reduce these area/power/delay gaps through a new dynamically reconfigurable FPGA design and thus enable FPGAs to become much more competitive with ASICs. Continued scaling of bulk CMOS technology faces significant hurdles. To alleviate these problems, Intel and TSMC have already announced a switch to multi-gate field-effect transistors, e.g., Trigate and FinFETs, at the upcoming semiconductor technology nodes. Another important trend is towards 3D integrated circuits (ICs), in which multiple die layers are stacked on top of each other. 3D ICs promise a revolution in so-called "More than Moore" computing. The proposed work aims to take advantage of the multi-gate and 3D IC technologies to further reduce the gaps mentioned above.
The proposed FPGA architecture significantly deviates from the conventional island-style FPGA architecture by enabling the logic element to either perform computation or local communication or both. It is aided by the key concept of temporal logic folding that allows a circuit to be drastically folded, aided by on-chip reconfiguration memory, before being mapped to the FPGA. It attacks the main reason for the area/power/delay gaps -- the vast amount of chip resources allocated to reconfigurable interconnects in FPGAs. Logic folding makes the communication local, thus making it possible to reduce the amount of resources devoted to interconnects very significantly. The work entails design space exploration of the different components of the architecture, investigation of novel multi-gate computation/communication structures, and algorithms and design automation tools to map arbitrary circuits to the FPGA architecture. It is expected to yield a well-characterized and highly versatile family of 3D multi-gate transistor based FPGAs that are competitive with ASICs. Work on various design methodologies and tools developed in this research will be disseminated through conference and journal articles. Technology transfer will be done through companies interested in using such FPGAs as accelerators. The material will be included in a senior-level course on Design with Nanotechnologies and a graduate-level course on Low Power IC and System Design introduced by the PI. Female and minority students will be attracted to this research through Princeton's Presidential Fellowship Program.
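The resource trade-off behind temporal logic folding can be sketched with a toy model. The code below assumes a circuit already split into logic levels, groups consecutive levels into folding stages that run one after another on the same logic elements, and reports how many logic elements and reconfigurations result. The `fold` function and the level counts are hypothetical simplifications, not the project's mapping algorithm.

```python
def fold(luts_per_level, levels_per_stage):
    """Group consecutive logic levels into folding stages.

    Each stage executes in its own cycle on the same physical logic
    elements, which are reconfigured between stages, so the number of
    logic elements needed is set by the largest stage.
    """
    if levels_per_stage < 1:
        raise ValueError("levels_per_stage must be at least 1")
    stages = [
        luts_per_level[i:i + levels_per_stage]
        for i in range(0, len(luts_per_level), levels_per_stage)
    ]
    luts_per_stage = [sum(stage) for stage in stages]
    return {
        "stages": len(stages),
        "logic_elements": max(luts_per_stage, default=0),
        "reconfigurations": max(len(stages) - 1, 0),
    }


# Hypothetical circuit: number of lookup tables at each logic level.
circuit = [8, 6, 6, 4]
no_folding = fold(circuit, len(circuit))   # whole circuit mapped at once
level_1 = fold(circuit, 1)                 # one logic level per stage
```

In this toy model the unfolded circuit needs 24 logic elements, while folding one level at a time needs 8 at the cost of 3 reconfigurations. This is the area-versus-time trade that the on-chip reconfiguration memory makes possible.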
|
0.915 |
2012 — 2016 |
Jha, Niraj |
N/A |
Twc: Small: Collaborative Research: Enhancing the Safety and Trustworthiness of Medical Devices
Personal healthcare systems based on wearable/implantable medical devices are being increasingly deployed for a variety of diagnostic, monitoring, and therapeutic applications. A consequence of the increased functional complexity, software programmability, and wireless network connectivity of such devices is that they are now vulnerable to security attacks that have plagued general-purpose computing systems. Recent demonstration of security attacks on commercially deployed systems has raised medical device security concerns significantly. Unfortunately, medical devices come with extreme size/power constraints and unique usage models, making it infeasible to simply borrow conventional security solutions.
This research focuses on developing a non-intrusive medical security monitor that snoops on all wireless communication to/from medical devices and uses multi-layered anomaly detection to identify potentially malicious transactions. While formal methods have been previously used to check for implementation flaws, they are not geared towards verifying the safety behavior of the medical device software in its interactions with the real world, which can expose logical flaws as well. The work investigates these interactions by transforming properties specified at the real-world interfaces (sensors and actuators) into program properties against which the medical device software can be verified. The findings will be disseminated through conferences and journals. The hardware and software developed will be placed in the public domain, and disseminated to the industry. The knowledge developed will be integrated into various courses. Undergraduates will be encouraged to perform independent research on this topic. Fellowships and outreach programs will be leveraged to encourage participation of female and minority students.
|
0.915 |
2012 — 2016 |
Jha, Niraj |
N/A |
Shf: Small: Efficient and Accurate Methodologies For Unifying the Layout, Device Simulation, and Process Simulation Worlds
Due to the severely degraded short-channel behavior of MOSFETs, Intel and TSMC have announced their switch to multi-gate FETs at the upcoming technology nodes. Hardware experiments with multi-gate devices and larger circuits entail very high cost and turnaround time. Thus, efficient predictive 3D-Technology CAD (3D-TCAD) based process/device characterization methods for such devices/circuits are urgently needed. A lack of such methods currently poses a significant impediment to rapid progress in this area. Though 3D-TCAD based exploration is essential for accurate predictive modeling, it is beset with major challenges, which make it necessary to develop a seamless set of methodologies/algorithms integrated with 3D-TCAD eco-systems for resolving process, layout, and device level issues quickly. The main aim of the proposed work is to develop efficient and accurate methodologies for unifying the layout, 2D/3D device simulation, and process simulation worlds, thereby, for the first time, expanding the horizon of predictive modeling for multi-gate devices beyond the many-device TCAD barrier, which is a major showstopper at lower technology nodes. The project aims to develop a set of versatile methodologies for synthesizing contiguous 2D/3D device-simulation-ready structures corresponding to given layouts, without the need for repetitive and expensive 3D process simulations on each layout. These methodologies are expected to yield several orders of magnitude speedup in TCAD structure generation for large layouts, with run-time reduction from days/weeks to a few hours per design and decreased memory footprints. The project will also develop fast cache-extrapolate-update techniques to alleviate the problem of obtaining convergence with iterative linear solvers for both mixed-mode and contiguous 3D device simulation.
The methodologies developed in this research will break the many-device TCAD barrier and, by unifying layout with process/device simulation, make accurate and efficient predictive 3D-TCAD possible. The methodologies/tools that are developed will be disseminated through the web, conferences and journals. The material will be included in a course on Design with Nanotechnologies that the PI teaches at Princeton University. Princeton has a tradition of undergraduate independent research. Many senior students are expected to do their research project on this topic. Female and minority students will be attracted to this research through Princeton's Fellowship Program. Further outreach activities are also planned for high-school students.
|
0.915 |
2013 — 2017 |
Jha, Niraj |
N/A |
Shf: Small: Parasitics-Aware Exploration of the Finfet Sram Design Space
Planar complementary metal-oxide-semiconductor (CMOS) technology scaling is coming to an end with the adoption of FinFETs (a type of field-effect transistor or FET) at the 22nm technology node and beyond. As a result of having multiple gates wrapped around the fin body, FinFETs exhibit better control of the channel potential with scaling, thereby alleviating the explosive leakage current problem faced by planar short-channel devices. Since static random access memory (SRAM) bit-cells are often the densest features patterned on an integrated circuit, researchers have begun investigating the design and manufacturability of FinFET SRAMs. Most of these investigations focus on enhancing/contrasting SRAM direct current (DC) metric targets. However, such metrics are not an accurate guide to FinFET bit-cell designers, as parasitic capacitances for two topologically equivalent bit-cells can be very different (due to differing fin pitches, etc.), resulting in widely varying transient characteristics. Thus, in order to predict array-scale metrics through simulation, capturing transient behavior accurately is absolutely essential. To accomplish the latter, SRAM parasitic capacitances need to be extracted accurately from the layout. Using an accurate parasitic capacitance extraction method that the principal investigator's (PI's) group has developed, the project plans to explore the FinFET SRAM design space from both DC metrics and transient behavior points of view, under process-voltage-temperature variations.
Since SRAMs account for more than 50% of the area of modern microprocessors, it is very important to base them on the best SRAM bit-cell design. A successful conclusion of this work, hence, should be very beneficial to the semiconductor industry. The designs/methodologies/tools that are to be developed will be disseminated through the web. Technology transfer will be done through various companies the PI interacts with. The material will be included in a course on Design with Nanotechnologies that the PI teaches. Princeton has a tradition of undergraduate independent research. Many seniors are expected to do their research project on this topic. Female and minority students will be attracted to this research through Princeton's Presidential Fellowship Program. The PI has supervised 10 female Ph.D. students so far. Further outreach activities are also planned for high-school students. The PI has supervised the research of four high-school students in the last two years, including a female student.
|
0.915 |
2016 — 2019 |
Jha, Niraj |
N/A |
Twc: Small: Physiological Information Leakage: a New Front On Health Information Security
With the growing use of implantable and wearable medical devices, information security for such devices has become a major concern. Prior work in this area mostly focuses on attacks on the wireless communication channel among these devices and health data stored in online databases. The proposed work is a departure from this line of research and is motivated by acoustic and electromagnetic physiological information leakage from the medical devices. This type of information leakage can also directly occur from the human body, thus raising privacy concerns. This proposal will investigate physiological information leakage from both the medical devices and the body itself and find countermeasures against this form of information leakage. This research will be accompanied by rapid dissemination of results, placement of software tools on the web for other researchers to take advantage of, and inclusion of the newly discovered material in a course that the PI teaches. In addition to graduate students, undergraduates and high school students will also take part in this research, with emphasis on participation by students from underrepresented groups.
This proposal investigates privacy attacks on human health by targeting physiological signals that continuously emanate from the human body due to the normal functioning of its organs, as well as the signals that emanate from implantable and wearable medical devices. It will also investigate metadata (frequency of communication, time between consecutive transmissions, communication protocol, packet size, detection range, modulation protocol, etc.) collected from such devices to see how they leak vital health information. Finally, it will develop countermeasures against health information leakage from these devices. A comprehensive evaluation of data leakage from the human heart, lungs, and skin, and from devices like insulin pumps and blood pressure monitors, would follow.
|
0.915 |
2016 — 2019 |
Jha, Niraj |
N/A |
Csr: Small: Energy-Efficient Embedded Signal-Processing Inference Systems
Machine-learning algorithms enable pattern recognition from data that are too complex to model analytically. This pattern recognition is of fundamental importance in diverse domains. These algorithms are becoming an essential part of embedded systems that find use in infrastructure, environmental monitoring, personal health monitoring, energy management, food supply chain, assembly lines, etc. This research has the potential to enable significant advances in such systems by enabling highly energy-efficient on-sensor inference to be performed. With its plans for involving students from underrepresented groups, industrial engagement, outreach to the broader public, and online distribution of tools, it is expected to have a broad impact. The aim of the proposed work is to explore the energy savings achievable by embedded signal-processing inference systems through random projections. Random projections have previously been employed in the context of compressive sensing to reduce system energy. We have found that when random projections are used to compress Nyquist signals, the compression mechanism is far more robust, while offering the possibility of two orders of magnitude system energy savings. We term this mechanism compressed signal processing. We propose work on bringing this concept to fruition through new methodologies and signal-processing architectures. In addition, we propose the use of genetic programming and error-aware inference to tackle the nonlinear signal-processing problem. We plan extensive evaluations of the system-level energy-accuracy tradeoffs the proposed mechanisms offer.
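The idea of compressing a Nyquist-sampled signal with a random projection can be sketched as follows. This is a minimal illustration, assuming a random ±1 projection matrix scaled by 1/sqrt(m) and a synthetic sine-wave signal; the function names and sizes are hypothetical, and the abstract does not specify the projection the project actually uses.

```python
import math
import random


def random_projection_matrix(m, n, seed=0):
    """Build an m x n matrix with entries +-1/sqrt(m) (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.choice((-scale, scale)) for _ in range(n)] for _ in range(m)]


def project(phi, x):
    """Compress x (length n) to length m by a matrix-vector product."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def squared_norm(v):
    return sum(c * c for c in v)


n, m = 256, 64                       # 4x compression, chosen arbitrarily
phi = random_projection_matrix(m, n)
# Synthetic stand-in for a Nyquist-sampled sensor signal.
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
compressed = project(phi, signal)

# With this scaling the projection preserves signal energy in expectation,
# so the ratio below tends to be close to 1 for typical draws.
ratio = squared_norm(compressed) / squared_norm(signal)
```

Because the projection is linear and roughly preserves norms and inner products, a linear inference step can in principle be applied to `compressed` directly rather than to the full-length signal, which is where the energy saving would come from.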
|
0.915 |
2017 — 2020 |
Jha, Niraj |
N/A |
Shf: Small: Exploration of the Transistor-Level Monolithic 3d Sram Design Space
Static random-access memories (SRAMs) constitute an important part of modern microprocessors, occupying more than half of their area. They are part of the memory hierarchy that enables applications to be sped up on these processors. Since traditional 2D scaling of integrated circuits (ICs) is running out of steam, an alternative in the coming decade would be to go vertical, i.e., have several layers of logic and memory in the same IC package. There are two ways to go 3D: using through-silicon vias (TSVs) or monolithic 3D integration. Since monolithic 3D integration enjoys many advantages over TSV based 3D integration, this work is aimed at the latter. In particular, its aim is to explore the design space of transistor-level monolithic (TLM) 3D SRAMs implemented in the modern semiconductor technology of FinFETs. A successful conclusion of this work, hence, should be very beneficial to the semiconductor industry. The designs/methodologies/tools that are developed will be made available on the web. They will also be disseminated to the industry through various companies the PI interacts with. The material will be included in a course that the PI teaches. Many seniors are expected to do their thesis on this topic. Female and minority PhD students will be attracted to this research through Princeton Fellowships available for this purpose. Further outreach activities are also planned for high-school students. Results will be disseminated through research articles and seminars.
In the TLM design style, the n-type and p-type transistors can be placed on different layers, thus reducing the footprint area of an SRAM bitcell significantly. This also enables separate optimizations of the two layers, which has the added advantage of improving stability of SRAM cells. Stability is a very important metric for SRAMs since they push the semiconductor technology to its limits in order to accommodate as much memory into the microprocessor as possible. However, there is hardly any work on the TLM FinFET SRAM bitcell design space exploration. The proposed work fills this gap through accurate capacitance extraction and device simulation, under process-voltage-temperature variations on the IC.
|
0.915 |
2019 — 2022 |
Jha, Niraj |
N/A |
Cns Core: Small: Ultra-Efficient Neural Network and Lstm Architectures
Neural networks (NNs) have begun to have widespread impact on various important applications, such as image recognition, speech recognition, and machine translation. The spurt of interest in machine learning and artificial intelligence in this decade can be traced back to the increase in accuracy that NNs have enabled. Yet, how to come up with the best NN architecture has remained an open problem. Hence, it is attracting a lot of attention from academia and industry. This work will address this problem.
NN synthesis has largely been limited to big-data applications and the NN models are typically expected to run in the cloud. However, there is recent interest from the industry to have edge-level (e.g., in smartphone or smartwatch) NN models. The current edge-level NNs sacrifice accuracy (by 4-5%) for energy and latency efficiency. NNs are also often not competitive with other models for medium-data and small-data applications. Finally, sequence-to-sequence modeling (e.g., for language translation) also needs to be made much more accurate, fast, and compact enough for edge devices. All these problems will be tackled in this work through new NN synthesis techniques and tools.
This research has the potential to enable transformative advances in overcoming the deficiencies of current NN synthesis methodologies. Due to the explosion in machine learning applications, this research has the promise to provide a significant boost to U.S. companies and the economy. Thus, it will involve significant industrial engagements. Several underrepresented (minority/female) students will be involved in the research. The research outcomes will be included in two undergraduate courses on Machine Learning and Embedded Computing. Broad dissemination to the academic and industrial communities will be achieved through published papers, posters, and seminars. Additionally, various tools and models will be distributed online.
The list of publications/students and tools/data with appropriate documentation will be made available at https://www.princeton.edu/~jha/. Free use of data and artifacts will be permitted for research and educational purposes. The data will be available online for at least five years following the completion of the project.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |
2022 — 2025 |
Jha, Niraj |
N/A |
Cns Core: Small: Cnn-Accelerator Co-Design
Recently, automated co-design of machine learning (ML) models and accelerator architectures has attracted significant attention from both industry and academia. However, most co-design frameworks explore a limited search space. Furthermore, training the ML model and simulating the accelerator performance are computationally expensive. To address these limitations, this project proposes a convolutional neural network (CNN) and hardware-accelerator co-design framework. It will consist of two new benchmarking sub-frameworks, CNNBench and AccelBench, that will explore vastly expanded design spaces of CNNs and CNN accelerators. CNNBench will use a new search technique to converge to the optimal CNN architecture. AccelBench will perform cycle-accurate simulations for a diverse set of accelerator architectures in a vast design space. Values for a large number of hyperparameters need to be chosen while designing an accelerator for a given application. They include the number and size of processing elements and on-chip buffers, dataflow, main memory size and type, and those for many domain-specific modules like sparsity-aware computation and reduced-precision designs. Similarly, many hyperparameters come into play while designing a CNN as well: the number of layers, convolution type and size, normalization type, pooling type and size, structure of the final multi-layer perceptron head, activation function, training recipe, and many more. At the same time, a CNN architecture that has the highest performance may not meet resource constraints like energy consumption and memory footprint. The proposed co-design framework will efficiently navigate the joint CNN-accelerator design space for the target application under given user constraints. The project expects at least an order of magnitude efficiency improvement in navigating this space relative to current baselines while delivering higher performance.
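To make the constrained joint search concrete, the toy sketch below enumerates a tiny CNN-plus-accelerator space and keeps the most accurate design that fits an energy budget. The hyperparameter values, the `toy_accuracy` and `toy_energy` formulas, and the exhaustive search are all made up for illustration; they stand in for the trained models, cycle-accurate simulation, and efficient search technique that the abstract describes but does not specify.

```python
from itertools import product

# Hypothetical, drastically reduced design spaces.
CNN_SPACE = {"layers": (8, 16, 32), "kernel": (3, 5)}
ACCEL_SPACE = {"pes": (64, 256), "buffer_kb": (128, 512)}


def toy_accuracy(layers, kernel):
    """Made-up stand-in for training and evaluating a CNN."""
    return 0.70 + 0.005 * layers + 0.01 * kernel


def toy_energy(layers, kernel, pes, buffer_kb):
    """Made-up stand-in for simulating the CNN on an accelerator."""
    return 10.0 * layers * kernel * kernel / pes + 0.01 * buffer_kb


def candidates():
    """Yield every joint CNN/accelerator configuration as a dict."""
    keys = list(CNN_SPACE) + list(ACCEL_SPACE)
    values = list(CNN_SPACE.values()) + list(ACCEL_SPACE.values())
    for combo in product(*values):
        yield dict(zip(keys, combo))


def best_under_budget(energy_budget):
    """Return the most accurate feasible design, breaking ties by lower energy."""
    feasible = []
    for c in candidates():
        energy = toy_energy(**c)
        if energy <= energy_budget:
            feasible.append((toy_accuracy(c["layers"], c["kernel"]), -energy, c))
    if not feasible:
        return None
    return max(feasible, key=lambda item: item[:2])[2]


best = best_under_budget(20.0)
```

Exhaustive enumeration only works because this space has 24 points. With the hyperparameters listed above the joint space grows combinatorially, which is why a more efficient search strategy is needed.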
Given the importance of the CNN-accelerator co-design problem to the industry, the proposed co-design framework will likely have a transformative impact on it. To effectively transfer research results, the project plans to continue its active industrial engagements. The project will recruit graduate/undergraduate female/minority students (the PI has supervised 15 female Ph.D. students so far as well as many female/minority undergraduate and high-school students). The project will ensure that the participants, including the broader public engaged through outreach, experience the interdisciplinary nature of the research. The research outcomes will be included in two undergraduate courses on Machine Learning and Embedded Computing. Outreach to high-school students is planned through the Princeton Laboratory Learning Program. Broad dissemination to the academic and industrial communities will be achieved through published papers, posters, and seminars. In addition, various tools and models will be distributed online.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |
2022 — 2025 |
Jha, Niraj |
N/A |
Ccf: Shf: Small: Transformer Synthesis
Within just four years of being first proposed, transformers have had a dramatic impact on the natural language processing (NLP) field and are also beginning to have an impact on other fields, such as computer vision. This success has largely been driven by large-scale pre-training datasets, increasing computational power, and robust training techniques. However, a major challenge that remains is efficient optimal transformer model synthesis for a specific task and set of user requirements. This is not easy to do, since the design space of transformer models is vast. This project addresses this challenge through the development of transformer-synthesis methodologies and tools. Given the importance of transformers, such tools are likely to have a transformative impact on many application areas. The research will be disseminated to industry via tech transfer, e.g., via open-source online distribution of source codes, summer internships, and by leveraging the PI's involvement with local companies. Outreach and curriculum development plans will also be undertaken within the context of the proposed research.
There is currently no universal framework that can navigate the vast transformer hyperparameter design space. Previously proposed transformer models are homogeneous in terms of data flow through the network. Unfortunately, this leads to very suboptimal transformer architectures. This project expands the transformer design space to incorporate heterogeneous architectures that venture beyond self-attention by employing other operations like convolutions and linear transforms. It will also explore novel projection layers and positional encodings to make hidden sizes flexible across various transformer layers. It will use a dense embedding to capture model similarity to significantly enhance search efficiency. It will develop a heteroscedastic surrogate model to further speed up search. It will include operations that optimize long-range interactions for long input sequences. It will also explore skipped connections and block-level grow-and-prune synthesis to improve architectural search efficiency.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |