Evaluation and Optimization of Big Data Processing in High-Performance Computing Systems


Today, many organizations use Big Data technologies to extract information from large volumes of data.

As these volumes grow, it becomes harder to meet the performance demands of massive data processing applications.

This thesis focuses on evaluating and optimizing these applications, presenting two new tools called BDEv and Flame-MR. On the one hand, BDEv analyzes the behavior of Big Data processing frameworks such as Hadoop, Spark and Flink, which are very popular today.

BDEv manages the configuration and deployment of these frameworks, generates the input data sets and executes the workloads previously chosen by the user. During each execution, BDEv extracts several evaluation metrics covering performance, resource utilization, energy efficiency and behavior at the microarchitecture level.

On the other hand, Flame-MR optimizes the performance of Hadoop MapReduce applications. Its design is based on an event-driven architecture that improves the efficiency of system resources by overlapping computation with communication. In addition to reducing the number of in-memory copies that Hadoop performs, it uses efficient algorithms to sort and merge the data.
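The idea of overlapping computation with communication can be illustrated with a minimal Python sketch. This is not Flame-MR's code, which the abstract does not show; it swaps the event-driven design for a simpler stand-in (two threads and a bounded queue), and every name in it (`fetch_blocks`, `process_blocks`, `run_pipeline`) is hypothetical. Data transfer is simulated with a short sleep.

```python
import queue
import threading
import time

SENTINEL = object()  # marks the end of the block stream


def fetch_blocks(n_blocks, out_queue, delay=0.01):
    """Producer: stands in for network or disk transfers (simulated with sleep)."""
    for i in range(n_blocks):
        time.sleep(delay)  # pretend to wait on I/O
        out_queue.put(list(range(i * 4, i * 4 + 4)))
    out_queue.put(SENTINEL)


def process_blocks(in_queue, results):
    """Consumer: processes each block as soon as it arrives."""
    while True:
        block = in_queue.get()
        if block is SENTINEL:
            break
        results.append(sum(block))  # stand-in for the real computation


def run_pipeline(n_blocks=5):
    """Run transfer and computation concurrently, linked by a bounded buffer."""
    blocks = queue.Queue(maxsize=2)  # bounded, so the producer cannot run far ahead
    results = []
    producer = threading.Thread(target=fetch_blocks, args=(n_blocks, blocks))
    consumer = threading.Thread(target=process_blocks, args=(blocks, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(3))
```

While the producer waits on the next (simulated) transfer, the consumer is already working on the previous block, so neither stage sits idle for the whole duration of the other. The bounded queue also caps how much data is held in memory at once.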

Flame-MR replaces the MapReduce data processing engine in a completely transparent way, so the existing application code does not need to be modified. The performance improvement of Flame-MR has been extensively evaluated in clusters and cloud systems, running both standard benchmarks and applications from real use cases.

The results show that application execution time is reduced by between 40% and 90%. This thesis provides Big Data users and developers with two powerful tools to analyze and understand the behavior of data processing frameworks and to reduce application runtime without requiring expert knowledge.

Today, many organizations use Big Data technologies to extract information from large volumes of data. As the size of these volumes grows, satisfying the performance demands of massive data processing applications becomes more difficult.

This thesis focuses on evaluating and optimizing these applications, presenting two new tools called BDEv and Flame-MR. On the one hand, BDEv analyzes the behavior of Big Data processing frameworks such as Hadoop, Spark and Flink, which are very popular today. BDEv manages their configuration and deployment, generates the input data sets and executes the workloads previously chosen by the user. During each execution, BDEv extracts several evaluation metrics covering performance, resource utilization, energy efficiency and behavior at the microarchitecture level. On the other hand, Flame-MR optimizes the performance of Hadoop MapReduce applications.

Its design is based on an event-driven architecture that improves the efficiency of system resources by overlapping computation with communication. In addition to reducing the number of in-memory copies that Hadoop performs, it uses efficient algorithms to sort and merge the data. Flame-MR replaces the MapReduce data processing engine in a completely transparent way, so the code of existing applications does not need to be modified.

The performance improvement of Flame-MR has been exhaustively evaluated in cluster and cloud systems, running both standard benchmarks and applications from real use cases. The results show that application execution time is reduced by between 40% and 90%. This thesis provides Big Data users and developers with two powerful tools to analyze and understand the behavior of data processing frameworks and to reduce application runtime without requiring expert knowledge.

Today, Big Data technologies are used by many organizations to extract valuable information from large-scale data sets. As the size of these data sets increases, meeting the enormous performance requirements of data processing applications becomes more challenging. This thesis focuses on evaluating and optimizing these applications by proposing two new tools, namely BDEv and Flame-MR. On the one hand, BDEv allows a comprehensive evaluation of the behavior of widely used Big Data processing frameworks, such as Hadoop, Spark and Flink. It manages the configuration and deployment of the frameworks, generating the input data sets and launching the workloads specified by the user. During each workload, it automatically extracts several evaluation metrics that include performance, resource utilization, energy efficiency and microarchitectural behavior. On the other hand, Flame-MR optimizes the performance of existing Hadoop MapReduce applications.

Its overall design is based on an event-driven architecture that improves the efficiency of system resources by pipelining data movement and computation. In addition, it avoids the redundant memory copies present in Hadoop, while using efficient sorting and merging algorithms for data processing.
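The merging of sorted data mentioned above can be sketched in general terms. The abstract does not say which algorithms Flame-MR uses, so the following is only a generic illustration using the k-way merge from the Python standard library; the helper names (`sort_run`, `merge_runs`, `group_by_key`) are hypothetical. Each input chunk is sorted on its own, the sorted runs are merged lazily, and the values are then grouped per key, as a reduce phase would need.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def sort_run(records):
    """Sort one in-memory chunk of (key, value) pairs by key."""
    return sorted(records, key=itemgetter(0))


def merge_runs(sorted_runs):
    """Lazily merge several key-sorted runs into one key-sorted stream."""
    return heapq.merge(*sorted_runs, key=itemgetter(0))


def group_by_key(merged):
    """Group a key-sorted stream so each key appears once with all its values."""
    for key, pairs in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in pairs]


if __name__ == "__main__":
    # Three unsorted chunks, e.g. word-count pairs emitted by different map tasks.
    chunks = [
        [("b", 1), ("a", 1)],
        [("c", 1), ("a", 1)],
        [("b", 1)],
    ]
    runs = [sort_run(chunk) for chunk in chunks]
    for key, values in group_by_key(merge_runs(runs)):
        print(key, sum(values))
```

Because `heapq.merge` returns an iterator, the merged stream is never fully materialized, which is one simple way to avoid an extra copy of the data in memory.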

Flame-MR replaces the underlying MapReduce data processing engine transparently, so the source code of existing applications does not need to be modified. The performance benefits provided by Flame-MR have been comprehensively evaluated in cluster systems and in the cloud using standard benchmarks and real-world applications, showing reductions in execution time ranging from 40% to 90%.
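To put the reported range in perspective, a reduction in execution time can be converted into the equivalent speedup factor with 1 / (1 - r), where r is the fractional reduction. The short Python sketch below applies that standard conversion to the two ends of the range; the function name is hypothetical and the calculation is plain arithmetic, not a result taken from the thesis.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional runtime reduction (0 <= r < 1) into a speedup factor."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    for r in (0.40, 0.90):
        print(f"{r:.0%} less time -> {speedup_from_reduction(r):.2f}x faster")
```

A 40% reduction corresponds to running roughly 1.67 times faster, while a 90% reduction corresponds to running about 10 times faster.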
