## **PREFACE**

The ACACES summer school aims to give PhD students an opportunity to learn new things and to meet new people. We believe that the 12 courses and the 2 invited talks, all by world-class experts, will help you reach the first goal. Reaching the second goal is a bigger challenge. When you come to our summer school, we want to help you get to know as many other participants as possible in one week. We do this by organizing joint meals and (long) coffee breaks where participants can meet each other and, very importantly, by organizing a poster session on Wednesday afternoon. During this session you can present your own research to the other participants and at the same time learn about other students' research. The poster session is held right in the middle of the summer school week, so people with common research interests still have enough time in the following days to discuss them, hopefully resulting in long-lasting research collaborations and joint research contributions. The summer school and poster session will help you broaden your professional network, and that is exactly what HiPEAC is all about.

A total of 71 posters will be presented during the poster session on Wednesday afternoon. One afternoon is too short to look at them all in detail. Therefore, we have collected the poster abstracts in this book in advance. Please note that the abstracts were not reviewed, because we did not want to exclude anybody from participating in the poster session and from networking with fellow students. This book is meant to help you prepare your visit to the poster session. By reading it in advance, you can select the posters that are most interesting to you and know whom to look for to discuss topics related to your own research interests. During the poster session, the posters will be arranged in the same order as in this book.

If you are planning to present a poster yourself, make sure to spend only about half of your time explaining your own poster and the other half visiting other posters.

I wish you a very productive poster session.

Koen De Bosschere

Summer School Organizer

## **CONTENTS**

| Title and authors | Page |
|---|---|
| Hardware Realization of an FPGA Processor - Operating System Call Offload and Experiences<br>Andreas Hindborg and Sven Karlsson | 1 |
| Run-Time Hardware Task Scheduling for Partially Reconfigurable FPGAs<br>George Charitopoulos, Kyprianos Papadimitriou and Dionisios Pnevmatikatos                                            | 5  |
| Memory-Centric Design for FPGA SpMV Accelerators<br>Yaman Umuroglu and Magnus Jahre | 9 |
| Towards High-Performance Reconfigurable Computing: Current challenges of a Rapid and High-Level Design Flow<br>Christian Brugger and Norbert Wehn | 13 |
| Large dataset encryption on the Maxeler platform: a service-oriented approach<br>Nikola Bezanic, Jelena Popovic-Bozovic, Ivan Popovic, Goran Dimic and Veljko Milutinovic | 17 |
| 10GigE Virtualized NIC on ARM based FPGAs<br>Konstantinos Harteros, Iakovos Mavroidis, George Kalokerinos, Vassilis Papaefstathiou, John Goodacre, Angelos Bilas and Manolis GH Katevenis | 21 |
| A Heterogeneous Architecture for Brain-Inspired Computer Vision<br>Francesco Conti and Luca Benini                                                                                           | 25 |
| CHAMELEON: A Ring-based Optical Network-on-Chip with Reconfigurable Channels<br>Hui Li, Sébastien Le Beux and Ian O'Connor                                                                   | 29 |
| An Adaptive Transmitting Power Technique for Energy Efficient mm-Wave Wireless NoCs<br>Andrea Mineo, Maurizio Palesi, Giuseppe Ascia and Vincenzo Catania                                    | 33 |
| Energy Efficiency Analysis in Embedded Systems using DFS<br>Lubomir Bogdanov and Racho Ivanov | 37 |
| Self-testing of embedded and cyber-physical systems<br>R. Seinauskas | 41 |
| Programming and Mapping Strategies for Embedded Computing Runtime Adaptability<br>Tiago Carvalho and João Cardoso | 45 |
| Distributed Processing of Data Streams in Embedded Networks<br>Ilya Korobkov | 49 |
| Hardware and Software Tools Development for High Performance Computing<br>Süleyman Savas and Essayas Gebrewahid                                                                              | 53 |
| Analyzing GPGPU Pipeline Latency<br>Michael Andersch, Jan Lucas, Mauricio Alvarez-Mesa, Ben Juurlink                                                                                         | 57 |
| Efficient Design Space Exploration for Many-Accelerator Systems<br>Efstathios Sotiriou-Xanthopoulos, Sotirios Xydis, Kostas Siozios, George Economakos and Dimitrios Soudris | 61 |
| Thermal Modelling of 3D Stacked DRAM with Virtual Platforms<br>Matthias Jung, MohammadSadegh Sadri and Norbert Wehn | 65 |
| Sharing the Instruction Cache Among Multiple Cores for HPC Applications<br>Ugljesa Milic, Alejandro Rico and Alex Ramirez | 69 |
| Low Complexity Improvements for Chip Multiprocessors Shared Caches at Ultra-low Voltages<br>Alexandra Ferrerón, Darío Suárez, Jesús Alastruey, Teresa Monreal and Víctor Viñals | 73 |
| Extending Statistical Cache Models to Support Detailed Pipeline Simulators<br>Nikos Nikoleris and Erik Hagersten                                                                             | 77                 |
| Proximity Coherence for Instruction Caches in Tiled CMP Architectures<br>Tareq Alawneh, Chi Ching Chi and Ben Juurlink | 81 |
| Instruction Based Management of Faulty Data Caches<br>Georgios Keramidas, Michail Mavropoulos, Anna Karvouniari and Dimitris Nikolos | 85 |
| Should extension units be in each core of a multi-core processor?<br>Alexandre Aminot | 89 |
| Aftermath: Performance analysis of task-parallel applications on many-core NUMA systems<br>Andi Drebes, Karine Heydemann, Nathalie Drach, Antoniu Pop and Albert Cohen | 93 |
| ORWL, Ordered Read-Write Locks for Multicores and Accelerators<br>Mariem Saied, Jens Gustedt, Gilles Muller and Gaël Thomas                                                                  | 97                 |
| Hybrid Barrier Synchronization For Many Core Architectures<br>A. Rodchenko, A. Nisbet, A. Pop and M. Lujan | 101 |
| Towards Operating System Support for Remote Memory Usage on ARM Microservers<br>John Velegrakis, Manolis Marazakis, Iakovos Mavroidis, Angelos Bilas, John Goodacre and Manolis Katevenis | 105 |
| A novel loop scheduling strategy for heterogeneous chips<br>Antonio Vilches, Rafael Asenjo, Angeles Navarro, Francisco Corbera and Maria Garzaran | 109 |
| A Virtualization Framework for IOMMU-less Manycores<br>Christian Pinto, Andrea Marongiu and Luca Benini                                                                                      | 113                |
| Efficient Offload Support for Heterogeneous Embedded MPSoCs<br>Giuseppe Tagliavini, Andrea Marongiu and Luca Benini                                                                          | 117                |
| A Dynamic Memory Allocator for heterogeneous platforms<br>Marco Aldinucci, Maurizio Drocco and Massimo Torquati | 121 |
| Maximizing the performance of HPC clusters with rCUDA<br>Carlos Reaño, Federico Silla and José Duato | 125 |
| Towards Transparently Tackling Functionality and Performance Issues across different OpenCL | 129 |
| A Compiler Framework for Data-layout Aware Vectorization<br>Shixiong Xu and David Gregg | 131 |
| deGoal a tool to embed dynamic code generators into applications<br>Henri-Pierre Charles | 135 |
| Graph-based Kernel Recognition for Compiler Guidance<br>Maria Rodriguez | 137 |
| A LARA-controlled C-to-C Compiler for Code Transformations and Instrumentation<br>Pedro Pinto, Tiago Carvalho and João M. P. Cardoso                                                                  | 141 |
| Metaheuristics Based Approach for Parallelizing Applications in On-Chip Multiprocessors<br>Vladimir Dimic and Olga Dziegielewska                                                                      | 145 |
| Active Learning Accelerated Automatic Heuristic Construction for Parallel Program Mapping<br>William Ogilvie, Pavlos Petoumenos, Hugh Leather and Zheng Wang | 149 |
| APOLLO: A speculative loop polyhedral optimizer<br>Aravind Sukumaran Rajam, Juan Manuel Martinez Caamaño, Willy Wolff and Philippe Clauss | 153 |
| Communication forecasting for large-scale applications<br>Nikela Papadopoulou, Georgios Goumas and Nectarios Koziris | 157 |
| Performance Prediction of Task-Based Programs<br>Isil Oz, Ananya Muddukrishna, Muhammad Khurram Bhatti, Konstantin Popov and Mats Brorsson | 161 |
| Evaluating Execution Time Predictability of Task-Based Applications<br>Thomas Grass, Alejandro Rico, Miquel Moreto, Marc Casas and Alex Ramirez | 165 |
| Dynamic command scheduling for real-time memory controller<br>Yonghui Li                                                                                                                              | 169 |
| Exploration of NVMs for High Level Memories<br>Manu Perumkunnil Komalan, Jose Ignacio Gomez Perez, Christian Tenllado and Francky Catthoor | 173 |
| P-SOCRATES: Time Criticality Challenge in the Presence of Parallelised Execution<br>Roberto Vargas Caballero and Eduardo Quiñones | 177 |
| On the Analysis of SPTA Computational Requirements<br>Suzana Milutinovic, Jaume Abella, Eduardo Quinones, Damien Hardy, Isabelle Puaut and Francisco J. Cazorla | 181 |
| Leveraging Intel Restricted Transactional Memory for Fault-Tolerance and Deterministic Execution in Safety-Critical Systems<br>Florian Haas, Sebastian Weis and Theo Ungerer | 185 |
| GemFI: A Fault Injection Tool for Studying the Behavior of Applications on Unreliable Substrates<br>Konstantinos Parasyris, George Tziantzoulis, Christos D. Antonopoulos and Nikolaos Bellas | 189 |
| Implicit-Storing and Redundant-Encoding-of-Attribute Information in Error-Correction-Codes<br>Yiannakis Sazeides, Emre Özer, Danny Kershaw, Panagiota Nikolaou, Marios Kleanthous and Jaume Abella | 193 |
| Building Efficient Wide-SIMD Accelerators with Transport Triggered Architecture<br>Timo Viitanen, Pekka Jääskeläinen, Heikki Kultala, Mikko Järvelä, Janne Helkala and Jarmo Takala | 197 |
| Alleycat: An Adaptive and Energy-aware Task Model for Data-intensive Applications on Accelerated Communication-centric Architectures<br>Benjamin Klenk, Alexander Matz and Holger Fröning | 201 |
| SmartM: Automated Lifecycle Management Framework for Distributed Applications on Multi-Clouds<br>Antonis Papaioannou, Damianos Metallidis and Kostas Magoutis | 205 |
| ERAC - Efficient and Robust Architecture for the Big Data Clouds<br>Evangelos Tasoulas and Feroz Zahid | 209 |
| Comparing topological discovery methods for a massively parallel computer<br>Kier J. Dugan, Jeff S. Reeve, Andrew D. Brown, and Steve B. Furber                                                           | 213 |
| Unit Testing for Operating System Kernels<br>Maxwell Walter and Sven Karlsson | 217 |
| Outer Temperature of Mobile Devices and User Satisfaction<br>Begum Birsen Egilmez, Gokhan Memik, Seda Ogrenci-Memik and Oguz Ergin                                                                        | 221 |
| An accelerated gesture segmentation algorithm for embedded wearable devices<br>Francesco Paci and Luca Benini                                                                                             | 225 |
| Speeding Up Computer Vision Applications on Mobile Computing Platforms<br>Luna Backes Drault and Björn Franke | 229 |
| Performance comparison between HAMA and HADOOP<br>Katsogridakis Pavlos and Polyvios Pratikakis | 233 |
| Implementation of a new constraint algorithm for Molecular Dynamics<br>MªAstón Serrano-Gracia, Carl Christian Kjelgaard Mikkelsen, Jesús Alastruey-Benedé, Pablo Ibáñez-Marín and Pablo García-Risueño | 237 |
| Software Platforms for Multi-Domain Multi-Physics Simulations<br>Christos Antonopoulos, Manolis Maroudas and Manolis Vavalis                                                                              | 241 |
| On the use of contention information for adaptive routing<br>Pablo Fuentes, Enrique Vallejo, Marina García and Ramón Beivide                                                                              | 243 |