ANR SONGS
SONGS: Simulation of Next Generation Systems
[2011-2015]
Recent and foreseen technical evolutions make it possible to build information systems of unprecedented dimensions. The potential power of the resulting distributed systems offers new possibilities in terms of applications, be they scientific, such as multi-physics simulations in High Performance Computing (HPC), commercial, in the Cloud with the data centers underlying the Internet, or public, in very large peer-to-peer systems. Evaluating the scalability, robustness and performance of computer systems of such scale raises severe methodological challenges. Simply executing them is not always possible, as it requires building the complete system beforehand, and it may not even be enough when uncontrolled external load prevents reproducibility. Simulation is an appealing alternative for studying such systems. It may not be sufficient in some cases to capture the whole complexity of the phenomena, but it captures some important trends in an easy and convenient way, while ensuring the controllability and reproducibility of experiments.
The overall goal of this proposal is to design a unified and open simulation framework for the performance evaluation of next generation systems. This framework should support the study of the following four domains: Grids, Peer-to-Peer (P2P) systems, Clouds and HPC systems. The rationale for addressing these seemingly different application domains together is that they actually have many similarities. In terms of hardware, clusters are central to Clouds and HPC, while wide area networks are used in Clouds and P2P, whereas Grids and HPC rely on high performance networks. Some scientific questions also cut across domains, such as energy, reliability (churn in P2P, MTTF in other systems) or I/O, and call for models shared between domains.
Because of their characteristics, these systems impose strong methodological constraints. The simulation framework must address them by relying on four common pillars of simulation methodology: an efficient simulation kernel, sound validated models, simulation analysis tools, and simulation campaign management. Within this project, we propose to build upon the SimGrid framework. Its relevance has already been demonstrated for Grid and Peer-to-Peer systems, and it has been shown on several occasions to be superior to domain-specific tools in terms of performance and validity. Moreover, its modularity makes it possible to address several application domains.
Our research effort will be driven by use cases representative of the research agenda in each targeted application domain, led by members of the consortium recognized as experts in these domains. These use cases constitute the context in which any improvement to the simulation models, interfaces and associated tools will be applied and leveraged. This approach is intended to maximize the potential impact of the project results. It implies that work on the use cases will begin as soon as the project starts. We plan cycles of incremental improvements to the solutions to these use cases, so that any unforeseen difficulties can still be addressed within the project.
Although simulation alone is not the definitive answer for studying large-scale systems, we are convinced that it constitutes a key methodology to this end. The SONGS project builds upon the previous USS SimGrid ANR project, which allowed us to simulate very large systems efficiently and accurately, and to realize the importance of some methodological aspects. The originality of the SONGS project lies in its multidisciplinary approach. Indeed, we gather in the same project experts in compilers, operating systems, virtualization, grid infrastructures and high-performance computing, statistics, trace analysis and workload characterization, algorithms and optimization, and simulation of large systems. We think that the wide range of expertise gathered in this new project will allow us to propose a foundational infrastructure enabling sound studies of every kind of large-scale distributed infrastructure.