Coming: Shanghai Many-Core Workshop, March 27-28, 2008

Backward Compatibility ≠ Forward Scalability?

Research@Intel
March 6th, 2008

Anwar Ghuloum wrote: "One of the constants valued by our developers is the backward compatibility provided by our architectures in the form of a consistent ISA. Historically, a corollary of this has been that legacy software has benefited from process and micro-architectural improvement. Of course, we are doing our best to make sure this "forward scalability" corollary still holds true (AKA the "free lunch"). But, the stakes are increasing to re-optimize software to better take advantage of new micro-architectural features that don't obviously benefit legacy binaries. As I've mentioned previously in this blog, core counts increase (this shouldn't surprise anyone at this point), memory hierarchies change, and ISA evolves as power-efficiency becomes the first-order concern. (The graph below shows a combination of a retrospective look at changes in Vector ISA and a speculative look at how Vector ISA might change in the future considering stuff like co-processor/GPU trends, application usage, and so on.) These changes may have the unintended consequence of tempering the forward scalability corollary or even regressing performance in some cases."

Getting C++ Threads Right

Google Tech Talks
December 12th, 2007

Abstract

The advent of multicore processors has generated profound debate on the merits of writing parallel programs with threads and locks. Nonetheless, for many application domains, this remains the standard paradigm for writing parallel programs, and at the moment there is no apparent universal replacement. That paradigm is the focus of this talk.

Somewhat surprisingly, there are a number of often subtle, but generally fixable, industry-wide problems with current approaches to threads programming. We'll focus on probably the most widely used environments, consisting of C or C++ with a standard threads library. Problems span the spectrum from system libraries through language implementations to supporting hardware. They get in the way both because they often make it difficult to write 100% reliable multi-threaded software and because they confuse even the basics of the programming model, which makes it hard to teach. A surprising number of "experts" do not understand the basic rules. Arguably, these problems need to be addressed before a meaningful comparison to other parallel programming approaches is even possible.

Since solutions to these problems generally require a coordinated industry effort, we helped to persuade the C++ standards committee to address them by pursuing a coherent approach to threads in the next C++ standard. The talk will outline some of the proposed solutions, and give an update on this effort.
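The effort described here fed into the memory model and thread library of the standard that became C++11. As a minimal sketch using those later C++11 facilities (not code from the talk), the function below has several threads increment one shared counter. Declaring the counter `std::atomic<int>` makes the concurrent increments well-defined; with a plain `int` the same code would contain a data race, which C++11 treats as undefined behavior. The name `parallel_count` is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each add 1 to a shared counter
// `iterations` times, then the final total is returned.
int parallel_count(int num_threads, int iterations) {
    // std::atomic makes concurrent read-modify-write operations race-free.
    // A plain `int` here would be a data race (undefined behavior in C++11),
    // with lost updates being only one possible symptom.
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Relaxed ordering is enough: the total is only read after
                // join(), which synchronizes with each thread's completion.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```

For example, `parallel_count(4, 100000)` returns 400000 on every run, because no increment can be lost.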

[PDP 2008] Program Released

Euromicro International Conference on Parallel, Distributed and network-based Processing
February 13-15, 2008
LAAS-CNRS, Toulouse, France

"The Sixteenth Euromicro Conference on Parallel, Distributed and network-based Processing will cover all the fields of high-performance computing, from advances in scientific and engineering applications to new proposals in programming and problem solving environments, models, languages and architectures. The conference will provide a forum for discussion on recent results dealing with parallel, distributed and network computing. Emerging new areas of research like global, peer-to-peer and grid computing will be represented in PDP 2008. Challenging new applications will be reported on."

The program includes two presentations about multi-core systems and one about transactional memory:

  • R4-16 Evaluating the cache architecture of multicore processors
  • R25-78 Characterization of conflicts in Log-Based Transactional Memory (LogTM)
  • R27-85 Scheduling of QR factorization algorithms on SMP and multi-core architectures

[ICPP-2008] Call for Papers

2008 International Conference on Parallel Processing
September 8-12, 2008
Portland, Oregon, USA

"The International Conference on Parallel Processing provides a forum for engineers and scientists in academia, industry and government to present their latest research findings in aspects of parallel and distributed computing."

The topics include but are not limited to (see the call for papers for the complete list):

  • Architecture

[SPAA'08] Call for Papers

20th ACM Symposium on Parallelism in Algorithms and Architectures
June 14-16, 2008
Munich, Germany

"This year we will celebrate 20 years of SPAA with a series of invited talks, a special track, and a poster session. Contributed papers are sought in all areas of parallel algorithms and architectures. SPAA defines the term 'parallel' broadly, encompassing any computational system that can perform multiple operations or tasks simultaneously."

The topics include but are not limited to (see the call for papers for the complete list):

  • Multi-Core Architectures
  • Compilers and Tools for Concurrent Programming
  • Transactional Memory Hardware and Software

In addition, the conference organizes a special track on Hardware and Software Techniques for Multicore Machines.

[SPAA 07] Proceedings

19th ACM Symposium on Parallelism in Algorithms and Architectures
June 9-11, 2007
San Diego, CA, USA

The 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '07) is part of the 2007 Federated Computing Research Conference (FCRC 2007), June 8-16, 2007.

"The SPAA Symposium was called the ACM Symposium on Parallel Algorithms and Architectures from 1989 to 2002. The new name reflects the expanded scope of the conference as detailed below. SPAA '07 will feature regular papers, each with a 25-minute talk, and brief announcements, each with a 10-minute talk. The SPAA brief announcements are for brief communications including work in progress or demos."

The proceedings of the conference are available online now. MPPUs-related papers were:

Session: Brief Announcements I: parallel and multicore systems

Session: Multicore architectures and Algorithms

Evaluating MapReduce for Multi-core and Multiprocessor Systems

Google engEDU
February 27, 2007

Abstract

This paper evaluates the suitability of the MapReduce model for multi-core and multi-processor systems. MapReduce was created by Google for application development on data-centers with thousands of servers. It allows programmers to write functional-style code that is automatically parallelized and scheduled in a distributed system.

We describe Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. The Phoenix runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance across processor nodes. We study Phoenix with multi-core and symmetric multiprocessor systems and evaluate its performance potential and error recovery features. We also compare MapReduce code to code written in lower-level APIs such as Pthreads. Overall, we establish that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.
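The Phoenix API itself is not reproduced here. As a hedged sketch of the MapReduce model on a shared-memory machine, the C++ below counts words with `std::thread`: the input lines are partitioned across workers (map), each worker combines its counts in a private map, and a final pass merges the partial results (reduce). The names `map_words` and `word_count` are hypothetical, and unlike Phoenix this sketch uses static partitioning with no dynamic task scheduling or fault tolerance.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical map phase: emit (word, 1) for every word in lines[begin, end),
// combining the pairs locally in a map owned by this worker only.
static void map_words(const std::vector<std::string>& lines,
                      std::size_t begin, std::size_t end,
                      std::map<std::string, int>& out) {
    for (std::size_t i = begin; i < end; ++i) {
        std::istringstream in(lines[i]);
        std::string word;
        while (in >> word) ++out[word];
    }
}

// Hypothetical driver: partition the input, run the map workers in parallel,
// then reduce by summing the per-worker counts for each key.
std::map<std::string, int> word_count(const std::vector<std::string>& lines,
                                      unsigned num_workers) {
    if (num_workers == 0) num_workers = 1;
    std::vector<std::map<std::string, int>> partial(num_workers);
    std::vector<std::thread> workers;
    const std::size_t chunk = (lines.size() + num_workers - 1) / num_workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        // Clamp so surplus workers get an empty range instead of overrunning.
        std::size_t begin = std::min(lines.size(), w * chunk);
        std::size_t end = std::min(lines.size(), begin + chunk);
        workers.emplace_back(map_words, std::cref(lines), begin, end,
                             std::ref(partial[w]));
    }
    for (auto& t : workers) t.join();

    // Reduce phase: runs after all joins, so no locking is needed.
    std::map<std::string, int> result;
    for (const auto& p : partial)
        for (const auto& kv : p) result[kv.first] += kv.second;
    return result;
}
```

Giving each worker a private map avoids locks during the map phase; the trade-off is a sequential merge at the end, which a fuller runtime could also parallelize.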