Research Interests

 

My research interests include embedded systems, high-level synthesis for VLSI, secure architectures and systems, hardware/software codesign, computer architecture, and compilers.

 

Research Focus

Over the past several years, my research has focused on developing models, methodologies, and algorithms for high-performance, low-power, and secure embedded systems. This work has produced many novel results, reported in 5 journal papers, 30 conference papers, and 10 journal submissions (see The Past Research Projects for details). Currently, I am developing new code-generation techniques at both the source and assembly code levels to optimize timing, power consumption, and code size for embedded systems.

 

Current Research Projects

 

Generating high-quality software that is optimized for timing, power consumption, and code size is very important for embedded systems, because embedded applications have strict timing constraints, memory resources are limited, and mobile devices require low power consumption. Loops are usually the most time-consuming and power-consuming parts of an application. My research therefore focuses on developing effective loop optimization techniques that reduce schedule length, memory accesses, dynamic power, static power, energy (with dynamic voltage scaling), and code size. Building on our previous work, we will develop new techniques for dynamic power optimization, static power optimization, memory access optimization, and code size optimization. My current research projects are as follows:

 

1. Dynamic Power Optimization

Among the three major sources of power dissipation (switching, direct-path short-circuit current, and leakage current), the dynamic power caused by switching is dominant. In our previous work, we developed several efficient instruction-level scheduling techniques for VLIW DSP processors that optimize dynamic power by reducing schedule length and switching activity on instruction buses. Because of their large capacitance and high transition activity, buses (including instruction buses and address buses) consume a significant fraction of the total power dissipated in a processor. For example, buses in the DEC Alpha 21064 processor dissipate more than 15% of the total power, and buses in the Intel 80386 processor dissipate more than 30%. Reducing switching activity on buses can therefore greatly reduce power consumption. In this project, we will develop new techniques for reducing switching activity on other components, such as data address buses.
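To make the notion of switching activity concrete, the following minimal sketch counts bit toggles on a bus as the Hamming distance between consecutive instruction words, and shows how reordering instructions changes that count. It is only an illustration, not the scheduling technique described above: the function names and the 8-bit example words are hypothetical, the instructions are assumed to be mutually independent (a real scheduler must respect data dependencies and resource constraints), and the exhaustive search is practical only for toy inputs.

```python
from itertools import permutations


def hamming(a: int, b: int) -> int:
    """Number of bit positions that differ between two instruction words."""
    return bin(a ^ b).count("1")


def bus_transitions(words) -> int:
    """Total bit toggles on the bus when the words are fetched in this order."""
    return sum(hamming(x, y) for x, y in zip(words, words[1:]))


def best_order(words):
    """Try every ordering and keep the one with the fewest toggles.

    Assumption: all instructions are independent, so any order is legal.
    """
    return min(permutations(words), key=bus_transitions)


# Hypothetical 8-bit instruction encodings, chosen only for illustration.
words = [0b1111_0000, 0b0000_1111, 0b1111_0001, 0b0000_1110]

original_cost = bus_transitions(words)
reordered_cost = bus_transitions(best_order(words))
print(original_cost, reordered_cost)
```

In this toy case, grouping words with similar bit patterns next to each other cuts the number of toggles by more than half, which is the effect a switching-aware schedule tries to achieve.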

 

2. Static Power Optimization

As transistor sizes and threshold voltages continue to shrink, reducing the static power caused by leakage current becomes increasingly important for power-constrained embedded systems. Recent studies have begun to address compiler optimizations that reduce static power consumption. However, existing work either does not consider loop optimization or optimizes only the DAG (Directed Acyclic Graph) part of a loop. Our previous work on dynamic power optimization shows that exploiting the inter-iteration dependencies of a loop through software pipelining provides great opportunities for power optimization. In this project, therefore, we will develop new loop scheduling techniques for static power optimization.

 

3. Memory Access Optimization

The memory system consumes a large share of a system's energy. For example, caches dissipate about 25% of the total chip power in the DEC 21164a, and the memory system can account for up to 90% of the energy in some embedded systems. It is therefore important to save energy by reducing the number of memory accesses through software. DSP processors provide special features such as dual memory banks, and several techniques have been proposed to improve timing performance by partitioning data across banks and increasing parallel accesses. However, most existing work focuses on optimizing a single one-level loop. For nested loops subject to loop transformations such as loop fusion and loop distribution, memory layout must be considered in order to reduce cache misses and increase data locality. Building on our previous work on nested loop transformations, in this project we will develop new loop transformation techniques that incorporate memory layout optimization and instruction scheduling.
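The interaction between loop order and memory layout can be illustrated with a small simulation. The sketch below counts misses in a toy direct-mapped cache for two traversals of the same row-major array. The cache geometry (8 lines of 4 words), the array size, and all names are assumptions chosen for illustration; they do not model any particular processor or the techniques proposed in this project.

```python
def count_misses(addresses, num_lines=8, line_words=4):
    """Count misses in a toy direct-mapped cache indexed by word address."""
    tags = [None] * num_lines          # block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_words     # memory block containing this word
        idx = block % num_lines        # cache line the block maps to
        if tags[idx] != block:
            tags[idx] = block          # fetch the block, evicting the old one
            misses += 1
    return misses


N = 16  # N x N array stored row-major: element (i, j) lives at address i*N + j

# Inner loop over j walks along a row, matching the layout.
row_order = [i * N + j for i in range(N) for j in range(N)]
# Inner loop over i walks down a column, striding N words per access.
col_order = [i * N + j for j in range(N) for i in range(N)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print(row_misses, col_misses)
```

Both traversals touch the same 256 elements, but under these assumed parameters the column-wise walk misses on every access while the row-wise walk misses only once per cache line. A loop transformation that changes the traversal order therefore needs to be paired with a compatible layout.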

4. Code Size Optimization

Embedded systems have very limited memory resources, so code size must be considered when timing, power consumption, and other metrics are optimized. Some optimization techniques, such as function inlining, loop unrolling, and software pipelining, can effectively improve timing performance but incur code size expansion. Several studies have addressed this problem by improving function inlining and software pipelining under code size constraints. Code compression is another effective code-size-reduction technique: the embedded program is stored in compressed form and decompressed at run time. Various compression techniques and architectural issues have been studied for RISC processors and VLIW processors. Some embedded RISC processors, such as ARM and MIPS, offer a 16-bit compact instruction set (called Thumb for ARM and MIPS16e for MIPS) alongside the 32-bit instruction set for code size optimization. The compact instruction set introduces some performance penalties: more 16-bit instructions than 32-bit instructions are needed to finish the same task, fewer registers can be accessed in 16-bit mode, and switching into and out of 16-bit mode takes time and requires extra instructions. In recent work, two novel techniques have been proposed to enhance the Thumb instruction set and reduce the performance overhead by providing new instructions and architectural modifications.

Building on our previous work, in this project we will continue our research and propose new theoretical models and algorithms to optimize timing and code size for compact instruction sets (such as ARM Thumb) and software pipelining.

 

 The Past Research Projects

 

My dissertation research focused on developing models, methodologies, and algorithms for high-performance, low-power, and secure embedded systems. My Ph.D. dissertation, titled "High Performance, Low Power and Secure Embedded Systems", won the best dissertation award of the Erik Jonsson School of Engineering and Computer Science at the University of Texas at Dallas. An embedded system is a device that combines hardware and software to perform one or more dedicated functions. Embedded systems can be found everywhere: in consumer electronics, process control systems, aircraft, vehicles, and so on. One of the most challenging problems in embedded systems research is how to design a high-performance, reliable, and secure system with limited resources (such as memory and area) and strict requirements (such as timing and power). In my dissertation research, I attacked this problem from several angles, including architecture synthesis, low-power optimization, secure architecture design, and compiler optimization.

Architecture Synthesis for High Performance Embedded Systems

In this project, our focus is to optimize the timing, power, and reliability of an embedded system through high-level architecture synthesis. In particular, I studied architecture synthesis for real-time DSP (Digital Signal Processing), one of the most popular classes of embedded applications. DSP applications, which process signals by digital means, need special high-speed functional units (FUs) such as adders and multipliers to perform addition and multiplication operations. As more types of FUs become available, the same type of operation can be processed by heterogeneous FUs with different costs, where the cost may relate to power, reliability, and so on. For such special-purpose architecture synthesis, an important problem is therefore how to assign a proper FU type to each operation of a DSP application and generate a schedule such that all requirements are met and the total cost is minimized.

We proved that this problem is NP-complete and proposed several algorithms to solve it. The results of this work have been accepted by IEEE Transactions on Parallel and Distributed Systems, one of the top scholarly archival journals.
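The shape of the assignment problem can be seen in a deliberately simplified special case. The sketch below is not one of the algorithms from this work; it assumes the operations form a single linear chain (so the finish time is just the sum of the chosen execution times) and that times are small integers, which allows a simple dynamic program over elapsed time. The FU type names, the (time, cost) values, and the function name are hypothetical.

```python
import math


def min_cost_assignment(ops, deadline):
    """Pick one FU type per operation in a chain to minimize total cost.

    ops: list of dicts mapping fu_type -> (time, cost), one dict per operation.
    Returns (total_cost, [fu_type, ...]) or (math.inf, None) if infeasible.
    """
    # best[t] = cheapest (cost, assignment) whose chosen times sum to exactly t
    best = {0: (0, [])}
    for options in ops:
        nxt = {}
        for t, (cost, chosen) in best.items():
            for fu, (dt, dc) in options.items():
                nt = t + dt
                if nt > deadline:
                    continue           # this choice would miss the deadline
                cand = (cost + dc, chosen + [fu])
                if nt not in nxt or cand[0] < nxt[nt][0]:
                    nxt[nt] = cand
        best = nxt
    if not best:
        return math.inf, None
    return min(best.values(), key=lambda entry: entry[0])


# Hypothetical chain of three operations; each FU type trades time for cost.
ops = [
    {"fast": (1, 10), "slow": (3, 4)},
    {"fast": (2, 8), "slow": (4, 3)},
    {"fast": (1, 6), "slow": (2, 5)},
]

cost, assignment = min_cost_assignment(ops, deadline=7)
print(cost, assignment)
```

With a looser deadline the cheaper, slower FU types can be used more often, while a deadline shorter than the all-fast schedule makes the instance infeasible. General data-flow graphs with resource constraints do not reduce to this simple chain formulation.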

Low Power Optimization for Embedded Systems

In this project, our focus is to optimize power consumption in embedded systems. Low power is becoming a critical design issue and performance metric in embedded system design because of the wide use of portable devices, especially those powered by batteries. In particular, my research aims to reduce power consumption on VLIW (Very Long Instruction Word) DSP architectures. The VLIW architecture is widely adopted in high-end DSP processors. A VLIW processor has multiple FUs and can process several instructions simultaneously. While this multiple-FU architecture can be exploited to increase instruction-level parallelism and improve timing performance, it also increases power consumption. In embedded systems, high-performance DSP must deliver not only high data throughput but also low power consumption. It is therefore an important problem to reduce the power consumption of a DSP application on VLIW processors while also optimizing its timing performance.

To solve this problem, we proposed several techniques. For applications both with and without loops, we proposed scheduling techniques that reduce switching activity and schedule length together. The results of this work have been accepted by two high-quality journals, International Journal of High Performance Computing and Networking and International Journal of Computational Science and Engineering. We also proposed an instruction-level bus transition optimization technique to reduce power consumption on VLIW architectures. Buses consume a significant fraction of the total power dissipated in a processor because of their large capacitance and high transition activity. A VLIW processor usually has a large number of instruction bus wires so that it can fetch several instructions simultaneously. Power consumption can therefore be greatly reduced by reducing switching activity on the instruction bus. This work significantly improves on state-of-the-art techniques, and the results have been accepted by ACM Transactions on Design Automation of Electronic Systems.

Secure Architecture for Embedded Systems

In this project, our focus is to design secure embedded systems that can withstand malicious attacks. As more embedded systems become networked, defending them against such attacks becomes an important problem. Many special-purpose embedded systems are used in military and critical commercial applications such as battleships, routers, aircraft, and nuclear plants, where a hostile penetration could cause dramatic damage. In particular, I studied how to defend embedded systems against buffer overflow attacks, which have been causing serious security problems for decades. More than 50% of today's widely exploited vulnerabilities are caused by buffer overflows, and the ratio is increasing over time. Almost all worms, such as the Internet worm, Code Red, Code Red II, Sapphire (also known as SQL Slammer), and MSBlaster, use buffer overflow vulnerabilities to break into systems. Protecting embedded systems against buffer overflow attacks is therefore an important problem.

We proposed a hardware/software codesign method to solve this problem. The preliminary results of this work were published at the 19th Annual Computer Security Applications Conference in 2003; that paper has attracted a lot of attention and has been cited many times. The final results have been accepted by IEEE Transactions on Computers.

Compiler Optimization for Embedded Systems

In this project, our focus is to develop compiler techniques that optimize timing, code size, power, memory usage, register usage, and so on by exploiting application-specific features of embedded systems. We have developed many techniques that significantly improve on the state of the art in this area. Our paper, “Optimizing DSP Scheduling via Address Assignment with Array and Loop Transformation”, was selected for the best student paper award as the only finalist in the Design and Implementation of Signal Processing Systems track at the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing.