The goal of this tutorial is to present the concepts that lie behind the most successfully applied ideas for managing heterogeneous storage devices in cluster environments. Among these ideas, we highlight storage virtualization, which can be coupled either with deterministic mechanisms that optimize heterogeneous disk usage or with randomization techniques. Focusing on the concepts rather than on real implementations and systems (as is normally done) gives attendees a much clearer view of the ideas presented and enables them to transfer those ideas to their own fields.
The first part of the tutorial will describe the problem of heterogeneity and why it is so important in real clusters. Once the audience understands the problem and the motivation for solving it, we will go into the details of the main solutions proposed. Building on the concept of storage virtualization, we will discuss, on the one hand, deterministic mechanisms that take advantage of the different characteristics of the storage devices and network interconnects found in the cluster. On the other hand, we will describe alternatives such as randomization or adaptation to access patterns, which also try to take advantage of the different characteristics of the disks. We will compare these main options with respect to their functionality, ease of use, scalability, and performance.
Once the static solutions have been presented, we will describe mechanisms that allow devices to be added and removed (which is the main cause of heterogeneity) and the effect such changes may have on system performance. Once again, we will compare the proposed solutions in this dynamic environment.
The benefits for the audience will depend on the group they fall into. Researchers in the I/O field will have an opportunity to survey all the important concepts and may discover ideas that lie somewhat apart from their specific areas. Researchers from other fields may find new perspectives that can be applied to their own research. Finally, designers and developers concerned with I/O will learn how to exploit heterogeneity more efficiently, because they will see the problems, opportunities, and potential solutions.