Transparent Autonomicity for OpenMP Applications
Daniele De Sensi and Marco Danelutto
One of the key needs of an autonomic computing system is the ability to monitor application performance with minimal intrusiveness and performance overhead. Several solutions have been proposed, differing in the effort required from application programmers to add autonomic capabilities to their applications. In this work we extend the Nornir autonomic framework, allowing it to transparently monitor OpenMP applications thanks to the novel OpenMP Tools (OMPT) API. By using this interface, we are able to transparently transfer performance monitoring information from the application to the Nornir framework. This does not require any manual intervention by the programmer, who can seamlessly control an already existing application, enforcing any performance and/or power consumption requirement. We evaluate our approach on some real applications from the PARSEC and NAS benchmarks, showing that our solution introduces a negligible performance overhead, while being able to correctly control applications’ performance and power consumption.
Parallelization of Massive Multiway Stream Joins on Manycore CPUs
Constantin Pohl and Kai-Uwe Sattler
Joining a high number of data streams efficiently in terms of required memory and CPU time still poses a challenge. While binary join trees are very common in database systems, they are mostly unusable for streaming queries with tight latency constraints when the number of streaming sources is increasing. Multiway stream joins, on the other hand, are very suitable for this task since they are mostly independent of the non-optimal ordering of join operators or huge intermediate join results.
In this paper, we discuss challenges but also opportunities for multiway stream joins on modern hardware, especially manycore processors. We describe different parallelization and optimization strategies that allow a streaming query to join up to 256 streams on a single CPU while keeping both individual tuple response time and memory footprint low. Our results show that a multiway join can perform orders of magnitude faster than a binary join tree. In addition, further tuning for efficient parallelism can improve performance by up to another order of magnitude.
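The contrast with a binary join tree can be illustrated with a minimal sketch of a symmetric multiway hash join. It is not the authors' implementation: the class name `MultiwayHashJoin`, the single shared join key, and the absence of windowing and parallelism are all simplifying assumptions made for illustration. Each arriving tuple is stored in the hash table of its own stream and probed against the tables of all other streams, so no intermediate join results are materialized between operators.

```python
from collections import defaultdict
from itertools import product


class MultiwayHashJoin:
    """Hypothetical single-threaded multiway equi-join over N streams.

    Assumptions (not from the abstract): all streams join on one shared
    key, state is never evicted (no window), and no parallelism is used.
    """

    def __init__(self, num_streams):
        # One hash table per input stream: key -> list of stored values.
        self.tables = [defaultdict(list) for _ in range(num_streams)]

    def insert(self, stream_id, key, value):
        """Store a new tuple and return every join result it completes."""
        self.tables[stream_id][key].append(value)

        partners = []
        for i, table in enumerate(self.tables):
            if i == stream_id:
                # The new tuple is the only candidate from its own stream.
                partners.append([value])
                continue
            matches = table.get(key)  # .get() does not create an empty entry
            if not matches:
                # Some stream has no partner yet, so no result is possible.
                return []
            partners.append(matches)

        # One output tuple per combination of matching partners,
        # ordered by stream index.
        return [tuple(combo) for combo in product(*partners)]
```

In this sketch a result is emitted only once every stream holds at least one tuple for the key, and the per-tuple cost is one insert plus N-1 probes, independent of any join ordering.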
Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores
Adriano Vogel, Dalvan Griebler and Luiz Gustavo Fernandes
The stream processing paradigm is present in several applications that apply computations over continuous data flowing in the form of streams (e.g., video feeds, image, and data analytics). Employing self-adaptivity in stream processing applications can provide higher-level programming abstractions and autonomic resource management. However, there are cases where the performance is suboptimal. In this paper, the goal is to optimize parallelism adaptations in terms of stability and accuracy, which can improve the performance of parallel stream processing applications. To this end, we present a new optimized self-adaptive strategy that is experimentally evaluated. The proposed solution provided high-level programming abstractions, reduced the adaptation overhead, and achieved performance competitive with the best static executions.
A Fully Decentralized Autoscaling Algorithm for Stream Processing Applications
Mehdi Belkhiria and Cédric Tedeschi
Stream Processing deals with the efficient, real-time processing of continuous streams of data. Stream Processing engines ease the development and deployment of such applications, which are commonly pipelines of operators to be traversed by each data item. Due to the varying velocity of the streams, autoscaling is needed to dynamically adapt the number of instances of each operator. With the advent of geographically-dispersed computing platforms such as Fog platforms, operators are dispersed accordingly, and autoscaling needs to be decentralized as well. In this paper, we propose an algorithm allowing scaling decisions to be taken and enforced in a fully-decentralized way. In particular, in spite of scaling actions being triggered concurrently, each operator maintains a view of its neighbours in the graph so that no data message is lost. The protocol is detailed and its correctness discussed. Its performance is captured through early simulation experiments.
Adaptive Crown Scheduling for Streaming Tasks on Many-Core Systems with Discrete DVFS
Christoph Kessler, Sebastian Litzinger and Jörg Keller
We consider temperature-aware, energy-efficient scheduling of streaming applications with parallelizable tasks and a throughput requirement on multi-/many-core embedded devices with discrete dynamic voltage and frequency scaling (DVFS). Given the few available discrete frequency levels, we provide the task schedule in a conservative and a relaxed form so that using them adaptively decreases power consumption, i.e., lowers chip temperature, without hurting throughput in the long run. We support our proposal with a toolchain to compute the schedules and evaluate the power reduction with synthetic task sets.
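The abstract does not say how the two schedule forms are alternated, so the following is only a hypothetical sketch of one simple switching policy, not the authors' method. It assumes that each schedule round has a fixed duration, that the conservative schedule finishes a round no later than the required period, and that the relaxed schedule (running at lower frequency levels) may overrun it. The function name `choose_schedules` and all parameters are illustrative.

```python
def choose_schedules(rounds, required_period, conservative_period, relaxed_period):
    """Hypothetical policy: run the relaxed schedule whenever slack allows.

    Assumes conservative_period <= required_period, so the cumulative
    elapsed time after round r never exceeds r * required_period.
    Returns the per-round choices and the total elapsed time.
    """
    if conservative_period > required_period:
        raise ValueError("conservative schedule must meet the required period")

    elapsed = 0
    choices = []
    for r in range(1, rounds + 1):
        budget = r * required_period  # deadline for finishing round r
        if elapsed + relaxed_period <= budget:
            # Enough accumulated slack: use the lower-power schedule.
            choices.append("relaxed")
            elapsed += relaxed_period
        else:
            # Otherwise fall back to the schedule that meets the period.
            choices.append("conservative")
            elapsed += conservative_period
    return choices, elapsed
```

With a required period of 10 time units, a conservative round of 8 and a relaxed round of 12, this policy alternates between the two forms and finishes four rounds in exactly 40 units, so the long-run throughput requirement is met while half of the rounds run in the relaxed form.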