Backward Compatibility ≠ Forward Scalability?
Research@Intel
March 6th, 2008
Anwar Ghuloum wrote: "One
of the constants valued by our developers is the backward compatibility
provided by our architectures in the form of a consistent ISA.
Historically, a corollary of this has been that legacy software has
benefited from process and micro-architectural improvement. Of course,
we are doing our best to make sure this "forward scalability" corollary
still holds true (AKA the "free lunch"). But, the stakes are increasing
to re-optimize software to better take advantage of new
micro-architectural features that don't obviously benefit legacy
binaries. As I've mentioned previously in this blog, core counts
increase (this shouldn't surprise anyone at this point), memory
hierarchies change, and ISA evolves as power-efficiency becomes the
first-order concern. (The graph below shows a combination of a
retrospective look at changes in Vector ISA and a speculative look at
how Vector ISA might change in the future considering stuff like
co-processor/GPU trends, application usage, and so on.) These changes
may have the unintended consequence of tempering the forward
scalability corollary or even regressing performance in some cases."
Getting C++ Threads Right
Google Tech Talks
December 12th, 2007
Abstract
The advent of multicore processors has generated profound debate on
the merits of writing parallel programs with threads and locks.
Nonetheless, for many application domains, this remains the standard
paradigm for writing parallel programs, and at the moment, there is no
apparent universal replacement. And it is the focus of this talk.
Somewhat surprisingly, there are a number of often subtle, but
generally fixable, industry-wide problems with current approaches to
threads programming. We'll focus on probably the most widely used
environments, consisting of C or C++ with a standard threads library.
Problems span the spectrum from system libraries through language
implementations through supporting hardware. They get in the way both
in that they often make it difficult to write 100% reliable
multi-threaded software, and in that they confuse even the basics of
the programming model, thus making it hard to teach. A surprising
number of "experts" do not understand the basic rules. Arguably, these
problems really need to be addressed to even allow a meaningful
comparison to other parallel programming approaches.
Since solutions to these problems generally require a coordinated
industry effort, we helped to persuade the C++ standards committee to
address them by pursuing a coherent approach to threads in the next C++
standard. The talk will outline some of the proposed solutions, and
give an update on this effort.
[PDP 2008] Program Released
Euromicro International Conference on Parallel, Distributed and network-based Processing
February 13-15, 2008
LAAS-CNRS, Toulouse, France
"The
Sixteenth Euromicro Conference on Parallel Distributed and
network-based Processing will cover all the fields of high-performance
computing, from advances in scientific and engineering applications to
new proposals in programming and problem solving environments, models,
languages and architectures. The conference will provide a forum for
discussion on recent results dealing with parallel, distributed and
network computing. Emerging new areas of research like global, peer to
peer and grid computing will be represented in PDP 2008. Challenging
new applications will be reported on."
The program includes 2 presentations about Multi-Core systems and 1 about transactional memory:
- R4-16 Evaluating the cache architecture of multicore processors
- R25-78 Characterization of conflicts in Log-Based Transactional Memory (LogTM)
- R27-85 Scheduling of QR factorization algorithms on SMP and multi-core architectures
Posted on February 06, 2008
[ICPP-2008] Call for Papers
2008 International Conference on Parallel Processing
September 8-12, 2008
Portland, Oregon, USA
"The
International Conference on Parallel Processing provides a forum for
engineers and scientists in academia, industry and government to
present their latest research findings in aspects of parallel and
distributed computing."
The topics include but are not limited to (see the call for papers for the complete list):
Posted on January 01, 2008
[SPAA'08] Call for Papers
20th ACM Symposium on Parallelism in Algorithms and Architectures
June 14-16, 2008
Munich, Germany
"This
year we will celebrate 20 years of SPAA with a series of invited talks,
a special track, and a poster session. Contributed papers are sought in
all areas of parallel algorithms and architectures. SPAA defines the
term ``parallel'' broadly, encompassing any computational system that
can perform multiple operations or tasks simultaneously."
The topics include but are not limited to (see the call for papers for the complete list):
- Multi-Core Architectures
- Compilers and Tools for Concurrent Programming
- Transactional Memory Hardware and Software
In addition, the conference is organizing a special track on Hardware and Software Techniques for Multicore Machines.
Posted on September 27, 2007
[SPAA 07] Proceedings
19th ACM Symposium on Parallelism in Algorithms and Architectures
June 9-11, 2007
San Diego, CA, USA
The
19th Annual ACM Symposium on Parallelism in Algorithms and
Architectures (SPAA '07) is part of the 2007 Federated Computing
Research Conference (FCRC 2007), June 8-16, 2007.
"The SPAA Symposium was called the ACM Symposium on Parallel
Algorithms and Architectures from 1989 to 2002. The new name reflects
the expanded scope of the conference as detailed below. SPAA '07 will
feature regular papers, each with a 25-minute talk, and brief
announcements, each with a 10-minute talk. The SPAA brief announcements
are for brief communications including work in progress or demos."
The proceedings of the conference are available online now. MPPUs-related papers were:
Session: Brief Announcements I: parallel and multicore systems
Session: Multicore architectures and Algorithms
Evaluating MapReduce for Multi-core and Multiprocessor Systems
Google engEDU
February 27, 2007
Abstract
This paper evaluates the suitability of the MapReduce model for
multi-core and multi-processor systems. MapReduce was created
by Google for application development on data-centers with thousands of
servers. It allows programmers to write functional-style code that is
automatically parallelized and scheduled in a distributed system.
We describe Phoenix, an implementation of MapReduce for
shared-memory systems that includes a programming API and an efficient
runtime system. The Phoenix runtime automatically manages thread
creation, dynamic task scheduling, data partitioning, and fault
tolerance across processor nodes. We study Phoenix with multi-core and
symmetric multiprocessor systems and evaluate its performance potential
and error recovery features. We also compare MapReduce code to code
written in lower-level APIs such as P-threads. Overall, we establish
that, given a careful implementation, MapReduce is a promising model
for scalable performance on shared-memory systems with simple parallel
code.
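
To make the programming model in the abstract concrete, here is a minimal word-count sketch of MapReduce on a single shared-memory machine. It is not the Phoenix API (Phoenix is a separate runtime, and nothing below is taken from it). The names map_chunk, reduce_key and map_reduce are hypothetical, and the thread pool only stands in for a real runtime's scheduler. In CPython, threads share one interpreter lock, so this shows the structure of the model rather than a real multi-core speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map phase: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_key(item):
    """Reduce phase: sum all values collected for one key."""
    key, values = item
    return key, sum(values)


def map_reduce(lines, workers=4):
    """Hypothetical driver: partition, map, group by key, then reduce."""
    # Data partitioning: one strided chunk of the input per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map tasks are submitted to the pool, one per chunk.
        mapped = list(pool.map(map_chunk, chunks))
        # Group the intermediate pairs by key in shared memory.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)
        # Reduce tasks are submitted to the pool, one per distinct key.
        return dict(pool.map(reduce_key, groups.items()))
```

Calling map_reduce(["a b a", "b c"], workers=2) returns a dictionary of word counts. The caller writes only the two small functions. Partitioning, scheduling and grouping live in the driver, which is the division of labour the abstract attributes to the runtime.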