This tutorial focuses on how techniques from computer science allow high-performance programming to be elevated from an art practiced by the high priests of high performance to a science that exposes a systematic methodology, one that is accessible to the masses and naturally supports the multicore revolution in architecture design that has arrived.

The arrival of massively parallel architectures in the early 1990s was an opportunity to start retiring legacy codes and to embrace abstraction to clean up our habits. Unfortunately, by and large, the reaction from the scientific computing community was to roll up its sleeves and insist on evolution rather than revolution. It is easy to find programs in support of scientific computing, evolved from legacy code, that computational scientists broadly view as glorious examples of beauty, while many of us in computer science hold up these same examples as representative of what used to be the state of the art but is best retired now. A great opportunity to lead in the area of parallel programming was lost, even as great success in the practical application of parallel computing was attained.

With the advent of multicore and the realization that parallelism has to be tackled for and by the masses, it is no longer acceptable to evolve legacy codes. Capturing parallelism at a high level of abstraction is critical to the success of multicore architectures as multicore evolves into many-multicore. We therefore need to understand examples in which abstraction has been successfully employed to manage the complexity of parallel programming. In this tutorial we familiarize the audience with one such example and use it to illustrate techniques that are applicable beyond dense linear algebra.