Reconfigurable Computing

From Embedded Systems to Reconfigurable Hyperscale Servers

A workshop at FPL 2016, providing a platform for open discussion of ongoing research in industry and academia.

2nd September 2016

Overview

Reconfigurable computing platforms, which offer massive parallelism coupled with the capability of run-time adaptation to changing application requirements, are becoming core components of information processing in embedded systems with high computational demand but a limited energy budget. In parallel to their use in IoT devices and cyber-physical systems, reconfigurable systems are used together with GPGPUs in data centres for high-performance and cloud computing. The synergistic use of multiprocessing techniques and reconfigurable parallelism has shown orders-of-magnitude improvements in performance, power efficiency, and cost for a wide range of applications. Partial reconfiguration, a research topic for two decades, is becoming mainstream for embedded systems and is seen as an important requirement for efficient utilization in HPC and cloud computing. However, developing systems and applications that employ such architectures still poses many challenges, which are currently being tackled in several research projects.

In this workshop, we want to bring together researchers from a wide variety of international projects to share their achievements and innovations in the area of reconfigurable computing, ranging from embedded systems to reconfigurable hyperscale servers. The workshop will provide a platform for open discussion of ongoing research with interested attendees from industry and academia.

Programme

The programme comprises two sessions of invited presentations:

Keynote Talks
14:15
Welcome
Zain Ul-Abdin (Halmstad University, Sweden)
14:20
Heterogeneous Multi-Core Platform for Software-Defined Radio
Jari Nurmi (Tampere University of Technology, Finland)
14:55
An Open Research Platform Exploiting Reconfigurable Technology Towards Exascale High-Performance Computing
Dirk Stroobandt (Ghent University, Belgium)
15:30
Coffee Break
EU Project Talks
16:00
Enabling a 2 Watt Mouse Brain Based on BCPNN Model of Cortex as a Cognition Engine for Autonomous Embedded Systems
Ahmed Hemani (KTH, Sweden)
16:20
AXIOM: Enabling Parallel Processing in Cyber-Physical Systems
Dionisios Pnevmatikatos (FORTH-ICS, Greece)
16:40
Opportunities for deferring application partitioning and accelerator synthesis to runtime
Tobias Kenter (Paderborn University, Germany)
17:00
FiPS and M2DC: Novel Architectures for Reconfigurable Hyperscale Servers
René Griessl (Bielefeld University, Germany)
17:20
Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability
Zain Ul-Abdin (Halmstad University, Sweden)

Keynote Talks

Heterogeneous Multi-Core Platform for Software-Defined Radio
Jari Nurmi, Tampere University of Technology, Finland

In the race for embedded computing power, both homogeneous and heterogeneous approaches have been widely explored. Homogeneous processor arrays are easier to program, but they generally cannot achieve high performance and low power consumption at the same time. Heterogeneous solutions exploit specialized processors such as DSPs, GPUs, or Application-Specific Instruction-set Processors (ASIPs) in addition to general-purpose processing, and/or accelerate processing with fixed or reconfigurable hardware blocks. One of the challenges in heterogeneous systems is the diversity of programming interfaces and computing models.
Coarse-Grained Reconfigurable Arrays (CGRAs) offer one alternative for accelerating embedded computing. They share the flexibility of FPGAs but are much faster to reconfigure, and because they operate on word-level computations they are more compatible with general-purpose programmable processors. At the same time, they are very powerful, as they exploit hardware parallelism more efficiently than processors do.
Recently, a phenomenon known as Dark Silicon has been identified: the inability to operate the entire chip at its maximum clock frequency, or even to keep it clocked at all, due to the excessive power density of modern silicon technologies. It turns out that CGRAs inherently solve this issue in part, as they spread the active computation sufficiently in space and time compared to centralized processing in a high-clock-rate CPU. A prototype heterogeneous reconfigurable accelerator-rich multicore platform (HARP) was designed to study how heterogeneity can tackle the Dark Silicon issues. HARP is made up of one or a few general-purpose CPUs that control a set of application-optimized CGRA blocks. The processors can also run the less intensive parts of the application software, whereas the most intensive kernels are implemented on the reconfigurable hardware blocks.
The main application area is Software-Defined Radio (SDR), which refers to flexible functionality based on programmable and/or reconfigurable implementation technologies. HARP has turned out to be well suited for SDR implementation. This talk will present the baseline CGRA architectures and the heterogeneous HARP platform, and will show examples of SDR-style wireless communication implementations on the HARP blocks. Preliminary performance and power-efficiency results based on an FPGA prototype will also be discussed.

Jari Nurmi has worked as a Professor at Tampere University of Technology (TUT), Finland, since 1999, in the Faculty of Computing and Electrical Engineering. He works on embedded computing systems, wireless localization, positioning receiver prototyping, and software-defined radio. He has held various research, education and management positions at TUT since 1987 (e.g. Acting Associate Professor 1991-1994) and was the Vice President of the SME VLSI Solution Oy 1995-1998. Since 2013 he has also been a partner and co-founder of Ekin Labs Oy, a research spin-off company commercializing technology for human presence detection, now headquartered in Silicon Valley as Radiomaze, Inc. He has supervised 19 PhD and over 130 MSc theses at TUT, and has been the opponent or reviewer of 29 PhD theses for other universities worldwide. He is a Senior Member of the IEEE, a member of the technical committee on VLSI Systems and Applications at IEEE CAS, and a board member of Tampere Convention Bureau. In 2004 he was one of the recipients of the Nokia Educational Award, and in 2005 he received the Tampere Congress Award. He was awarded one of the Academy of Finland Research Fellow grants for 2007-2008. In 2011 he received the IIDA Innovation Award, and in 2013 the Scientific Congress Award and the HiPEAC Technology Transfer Award. He is a steering committee member of four international conferences (chairman of two), and participates actively in organizing conferences, tutorials, workshops, and special sessions, and in editing special issues of international journals. He has edited three Springer books and has published over 300 international conference and journal articles and book chapters.

An Open Research Platform Exploiting Reconfigurable Technology Towards Exascale High-Performance Computing
Dirk Stroobandt, Ghent University, Belgium

To handle the stringent performance requirements of future exascale-class applications, High Performance Computing (HPC) systems need ultra-efficient heterogeneous compute nodes. To reduce power and increase performance, such compute nodes will require hardware accelerators with a high degree of specialization. Ideally, dynamic reconfiguration will be an intrinsic feature, so that specific HPC application features can be optimally accelerated even if they change regularly over time. In our project, we are creating a new and flexible exploration platform for developing reconfigurable architectures, design tools and HPC applications, with run-time reconfiguration built in as a fundamental core feature instead of an add-on.
Our project covers the entire stack, from the architecture up to the application, focusing on the fundamental building blocks for run-time reconfigurable exascale HPC systems: new chip architectures with very low reconfiguration overhead, new tools that truly treat reconfiguration as a central design concept, and applications that are tuned to benefit maximally from the proposed run-time reconfiguration techniques. Ultimately, this open platform aims to improve Europe's competitive advantage and leadership in the field.

Dirk Stroobandt graduated in 1994 as an electrotechnical engineer from Ghent University. In May 1998, he obtained the Ph.D. degree in electrotechnical engineering from the same university. From October 1994 to September 1998, Dirk Stroobandt was a research assistant, and from October 1998 to September 2002 he was a post-doctoral fellow, with the Fund for Scientific Research - Flanders (Belgium) (F.W.O.). Since October 2002, he has been a Professor at Ghent University (promoted to Senior Lecturer in 2006 and to Full Professor in 2014), affiliated with the Department of Electronics and Information Systems (ELIS), Computer Systems Lab (CSL). He currently leads the research group HES (Hardware and Embedded Systems) of about 10 people, with interests in semi-automatic hardware design methodologies and tools, run-time FPGA reconfiguration, and reconfigurable multiprocessor networks.
Dirk Stroobandt is a member of IEEE, ACM, AIG (Alumni Ghent), and KVIV (Royal Flemish Engineering Association). He was the inaugural winner of the ACM/SIGDA Outstanding Doctoral Thesis Award in Design Automation (June 1999). He also received the "Scientific prize Alcatel Bell" in 2002. Dirk Stroobandt initiated and co-organized the International Workshop on System-Level Interconnect Prediction (SLIP) in 1999 and was the General Chair of SLIP 2000. He is still actively involved in this workshop. He was also Special Session Chair, General Chair and Program Chair of IWLS (ACM/IEEE International Workshop on Logic & Synthesis) in 2013, 2014 and 2015 respectively. He has been guest editor of two special issues of the IEEE Transactions on VLSI Systems on System-Level Interconnect Prediction and of a special issue on SLIP for Integration, the VLSI Journal. He is also lead editor of a special issue of the International Journal of Reconfigurable Computing, and he was an associate editor of ACM's TODAES for three years. He is currently associate editor for ACM TRETS (Transactions on Reconfigurable Technology and Systems). Dirk Stroobandt is involved in the organisation of several conferences in the field and is a reviewer for numerous conferences and journals.

EU Project Talks

Enabling a 2 Watt Mouse Brain Based on BCPNN Model of Cortex as a Cognition Engine for Autonomous Embedded Systems
Ahmed Hemani, KTH, Sweden

As the complexity of the interaction between machines and their environment increases, the complexity of von Neumann-based machines is increasing exponentially. In contrast, brain-like computing machines show a more promising scaling trend. For this reason, using brain-inspired computers as cognitive engines for next-generation embedded systems, which are expected to interact with unpredictable, dynamic environments, has great appeal. The challenge is to implement such cognitive engines at a sufficiently low-power performance point to enable their deployment in the field. Custom hardware design, optimizing the complete architecture rather than just the computation, is the approach we have followed to achieve this goal. The VLSI design community has stayed away from custom design due to concerns about high engineering and manufacturing costs. We have developed a structured VLSI design methodology called SiLago (Structured Silicon Large Grain Objects) that addresses these concerns, and we have applied it to design a mouse-sized model of cortex based on the Bayesian Confidence Propagation Neural Network (BCPNN). The modularity and the embarrassingly parallel nature of the BCPNN, along with the SiLago methodology and 3D integration of a custom DRAM design, are exploited to show that a mouse-sized BCPNN cortex consuming approximately 2 watts of power can be achieved in today's technology to serve as the brain-like cognitive engine of next-generation embedded systems.

AXIOM: Enabling Parallel Processing in Cyber-Physical Systems
Dionisios Pnevmatikatos, FORTH-ICS, Greece

The AXIOM project focuses on developing an affordable CPS node that features general-purpose capability coupled with reconfigurable resources. The nodes will be interconnected, and a programming layer will turn them into a parallel processing system. The programming layer also eases the use of the reconfigurable resources for accelerators. Harnessing the combined CPS resources enables a new level of "edge" processing. We will focus on the interconnection and modularity aspects of the project, and present the current status and the challenges we are facing, mainly in performance and efficiency.

Opportunities for deferring application partitioning and accelerator synthesis to runtime
Tobias Kenter, Paderborn University, Germany

Design flows for reconfigurable computing systems typically follow a static approach that performs computational kernel identification, hardware/software partitioning and accelerator synthesis at compile time. This approach is proven and well understood, but it originates from the era of hardware/software codesign for ASICs and inherits implicit assumptions from this use case, in particular that accelerator design and production are very time-consuming (ASIC fabrication) and costly. Hence, static design flows spend a lot of effort on making good decisions at design time based on code analysis or hints provided by the developer.
FPGAs not only have design cycles that are orders of magnitude faster; reconfiguration also allows additional accelerators to be added at virtually no extra cost. Further, FPGAs offer the opportunity to build computing systems that adapt themselves to the concrete application needs that actually arise at runtime, instead of relying on conservative decisions determined from estimations at design time.
In this talk, we will present our work on reconfigurable computing systems that use a combination of offline and online analysis and compilation techniques, which allow us to defer the hardware/software partitioning and accelerator generation process to runtime. We discuss the opportunities and challenges that result from these possibilities and outline directions for future research in runtime compilation and partitioning for reconfigurable computing systems.

FiPS and M2DC: Novel Architectures for Reconfigurable Hyperscale Servers
René Griessl, Bielefeld University, Germany

Within the EU projects FiPS and M2DC, we aim to significantly increase the energy efficiency of compute platforms for cloud and high-performance computing. With the RECS Box System, we are developing a highly scalable heterogeneous hardware platform that seamlessly integrates CPUs, embedded CPUs, FPGAs, GPUs and many-core processors. To ease programming of the platform, FiPS is establishing a programming methodology that simplifies the use of the heterogeneous computing devices as processing elements in a holistic, integrated hardware and software server ecosystem.
On this basis, a new class of low-power appliances optimised for total cost of ownership (TCO), with built-in efficiency and dependability enhancements, is being developed within M2DC. The heterogeneous server architecture will enable customisation and smooth adaptation to various types of applications, which can be integrated easily using a broad ecosystem of management software and system efficiency enhancements.

Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability
Zain Ul-Abdin, Halmstad University, Sweden

This project envisions that the diversity of applications and contrasting performance constraints in next-generation embedded systems will necessitate the use of emerging technologies such as reconfigurable architectures and many-core processors. The project aims to develop, experiment with, and evaluate advanced embedded computing platforms for diverse application needs such as real-time response, low energy consumption and low cost. Our focus is to address these needs through simplified programming and flexible architectures. This approach will provide industry with tools and methodologies for meeting user-defined performance constraints with a quick time-to-market.

Organizers

Madhura Purnaprajna (Amrita University, India)

Zain Ul-Abdin (Halmstad University, Sweden)

Mario Porrmann (Bielefeld University, Germany)