Abstract:
Summary form only given. In this talk, inspired by work on the Lustre cluster file system over the last ten years, we will take a tour of the internal features that are required for recovery and data management to work reliably and to scale horizontally. Scale now means handling hundreds of servers and tens of thousands of clients in larger data centers, often with replicas spanning the globe. We will look at search, striping, clustering of metadata services and its recovery, caching and replication, as well as HSM, migration, and other data management features. Currently such features are implemented in an ad hoc manner, deep in the guts of many systems. We will demonstrate that there is an opportunity to define concise semantics that enable all such features, including their recovery, with reasonable algorithmic complexity. This can be the first step towards a more modular, interoperable approach to file-based data management, spanning file servers, middleware, backends for data and metadata, and data management applications. This is similar to what relational algebra did 30 years ago for database applications.