Regional Information Center for Science and Technology

Abstract :

Summary form only given. Database systems have long optimized for parallel execution; the research community has pursued parallel database machines since the early '80s, and several key ideas from that era underlie the design and success of commercial database engines today. Computer microarchitecture, however, has shifted drastically during the intervening decades. Until the end of the 20th century Moore's Law translated into single-processor performance gains; today's constraints in semiconductor technology cause transistor integration to increase the number of processors per chip roughly every two years. Chip multiprocessor, or multicore, platforms are commodity; harvesting scalable performance from the available raw parallelism, however, is increasingly challenging for conventional database servers running business intelligence and transaction processing workloads. A careful analysis of database performance scaling trends on future chip multiprocessors [7] demonstrates that current parallelism methods are of bounded utility as the number of processors per chip increases exponentially. Common sense is often contradicted; for instance, increasing on-chip cache size or aggressively sharing data among processors is often detrimental to performance [6]. When designing high performance database systems, there are tradeoffs between single-thread performance and scalability; as the number of hardware contexts grows, favoring scalability wins [5]. In order to transform a database storage manager from a single-threaded Atlas into a multi-threaded Lernaean Hydra which scales to infinity, substantial rethinking of fundamental constructs at all levels of the system is in order. Primitives such as the mechanism to access critical sections become crucial: spinning wastes cycles, while blocking incurs high overhead [3].
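The spinning-versus-blocking tradeoff for critical sections can be illustrated with a small hybrid lock. The following is a minimal sketch only, not the mechanism used in the cited work: the class `SpinThenBlockLock`, the parameter `max_spins`, and the helper `run_counter` are hypothetical names introduced for illustration. The lock retries a non-blocking acquire a bounded number of times (the spin phase) and then falls back to a blocking acquire (the block phase). In CPython the global interpreter lock limits any real benefit from spinning, so the sketch shows the control flow rather than a performance result.

```python
import threading


class SpinThenBlockLock:
    """Hypothetical hybrid lock: spin briefly, then block (illustrative only)."""

    def __init__(self, max_spins=100):
        self._lock = threading.Lock()
        self._max_spins = max_spins  # assumed tuning knob, not from the source

    def acquire(self):
        # Spin phase: retry without sleeping. Cheap if the holder releases
        # soon, but every failed attempt is a wasted cycle.
        for _ in range(self._max_spins):
            if self._lock.acquire(blocking=False):
                return
        # Block phase: let the OS park the thread. No cycles are burned,
        # but parking and waking the thread carries overhead.
        self._lock.acquire()

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()


def run_counter(num_threads=4, increments=1000):
    """Increment a shared counter from several threads under the hybrid lock."""
    lock = SpinThenBlockLock()
    total = 0

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:  # critical section protected by the hybrid lock
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Calling `run_counter(4, 1000)` exercises the lock from four threads. A small `max_spins` favors blocking and a large one favors spinning; where the balance lies on a given platform is an empirical question this sketch does not answer.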
At the database processing level, converting concurrency into parallelism proves to be a challenging task, even for transactional workloads that are inherently concurrent. Typical obstacles are by-definition centralized operations, such as locking; we need to ensure consistency by decoupling transaction data access from process assignment, while adapting lessons from the first parallel database machines on the multicore platforms of the future [1][4]. Often, parallelism needs to be extracted from seemingly serial operations such as logging; extensive research in distributed systems proves to be very useful in this context [2]. At the query processing level, service-oriented architectures provide an excellent framework to exploit available parallelism. In this talk, I present lessons learned when trying to scale database workloads on chip multiprocessors. I discuss the tradeoffs and trends and outline the above research directions using examples from the StagedDB/CMP and ShoreMT projects at EPFL.

Keywords :

cache storage; competitive intelligence; microprocessor chips; multi-threading; multiprocessing systems; parallel databases; peer-to-peer computing; query processing; service-oriented architecture; transaction processing; CMP project; Moore Law; ShoreMT project; StagedDB project; business intelligence; chip multiprocessor; commercial database engine; computer microarchitecture; conventional database server; data sharing; database storage manager; multithreaded Lernaean Hydra; on-chip cache size; parallel database machine; query processing level; scalable database system; semiconductor technology; service oriented architecture; single processor performance; single threaded Atlas; transaction data access; transaction processing; transistor integration; waste cycle; Awards activities; Computers; Database systems; Parallel processing; Program processors; Scalability;