Presentations—Gelato ICE | Singapore | October 2007

 

Welcome and Introduction of Host Institutions

Hing Yan Lee, National Grid Office

Mark K. Smith, Gelato Central Operations

Kevin Veragoo, Institute for High Performance Computing

No presentation available currently

 

Keynote: Linux in the Enterprise

Randy Hergett, HP

In the overall IT market, Linux is the fastest growing operating system, and has been widely adopted in many IT environments. However, the large majority of Linux deployments to date have been for "edge of network" applications. Its adoption for true mission-critical applications in the enterprise data center lags that of other operating systems. Why is that? Many enterprise customers believe that Linux still doesn't meet the stringent requirements to be at the core of their data centers. While that view may be partly due to misperception, there is definitely some reality to it as well. We need to continue to put energy into areas such as performance and scalability, and we need to focus on areas such as manageability, availability, and serviceability in order to meet the requirements of the next-generation data center. This presentation will provide an overview of the customer needs research that HP has done over the past year, and will outline the features that we believe Gelato and the development community will need to focus on to drive Linux adoption into the data center.

Presentation (pdf, 1.7 MB)

 

Basic Intel Itanium Architecture

Cameron McNairy, Intel

The Itanium architecture and the paradigm of explicit parallel instruction computing (EPIC) are often poorly understood. This presentation will cover important aspects of the EPIC paradigm, including software pipelining, the register stack engine, predication, parallel instruction groups, data and control speculation, and many other mysteries of the Itanium-based application and system architectures.

No presentation available currently

 

Oracle Grid Computing on Itanium Linux

Yoke Kew Ling, Oracle

Oracle is a strong supporter of grid computing, and has been providing grid solutions around storage, data, application, and grid control.

Itanium architecture technologies with Linux fit in well with Oracle's grid strategy, in line with Oracle's vision of providing low-cost, high-performance computing resources on demand for enterprises.



Presentation (pdf, 615 KB)

 

Utility Computing in a Heterogeneous World

Frank Feldmann, Novell

During this session, I will discuss various aspects of grid-based computing using heterogeneous infrastructure and how grid computing could play an integral part in an organization's utility computing strategy. At the end of the session, you will see a short demonstration of grid workload orchestration.

Presentation (pdf, 1.9 MB)

 

A Reliable and Efficient Adaptive Parallelism Framework

Rick Goh, Institute for High Performance Computing

We describe an adaptive parallelism framework for the efficient parallelization of applications and the attainment of good overall system utilization. This new framework allows an application to monitor its own performance during execution so that it can adjust to runtime situations. The derived algorithm can adapt its own degree of parallelism to the prevailing system load throughout the job execution.
When a job has sufficient work and the system has spare capacity, more threads are activated. Fewer threads are used when there is resource contention. The framework enables the system to adjust automatically so that it can perform optimally. It also proposes a lock-freedom property to help improve the reliability, efficiency, and performance of applications. Benchmark results demonstrated a linear speedup of serial/sequential applications with minimal overhead on multiprocessor and multi-core systems. The proposed framework outperforms the traditional non-adaptive approach by about 100% on a Hyper-Threading 16-processor system.
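
The adjustment rule described above can be illustrated with a minimal sketch in C. This is not the framework's actual algorithm, which the abstract does not give; the function `next_thread_count` and all of its parameters are hypothetical names chosen for illustration, and the step size of one thread is an assumption.

```c
#include <assert.h>

/* Hypothetical policy sketch, NOT the framework's real algorithm.
 *   current       - threads the job is using now
 *   pending_tasks - units of work waiting to be executed
 *   idle_cpus     - processors currently idle (spare capacity)
 *   contended     - nonzero if the job observed resource contention
 *   max_threads   - upper bound on threads for this job
 * Returns the thread count to use for the next interval. */
int next_thread_count(int current, int pending_tasks, int idle_cpus,
                      int contended, int max_threads)
{
    assert(current >= 1 && max_threads >= 1);

    if (contended && current > 1)
        return current - 1;   /* back off when resources are contended */

    if (pending_tasks > current && idle_cpus > 0 && current < max_threads)
        return current + 1;   /* grow when there is work and spare capacity */

    return current;           /* otherwise keep the current degree */
}
```

A job would call such a function periodically with values it measured about itself and the system, then start or park worker threads to match the returned count.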

No presentation available currently

 

Computational Challenges in Detection of Pathogens in the Presence of Complex Backgrounds

Yuriy Fofanov, University of Houston

No presentation available currently

 

State of Linux RAS Features for Mission Critical Environments

Ping-Hui Kao, HP

As Linux inches into the mission-critical environment (e.g., the corporate data center), it meets the incumbents: various UN*X platforms, such as HP-UX, AIX, and Solaris, which have dominated data centers for decades. For Linux to penetrate this market, it must meet the requirements and expectations of the customer in a mission-critical environment. RAS (reliability, availability, and serviceability) is a MUST for such environments. The Itanium architecture provides many RAS features that do not exist in the popular x86 platforms. This presentation will lay out the requirements and the gaps that Linux on Itanium must close to meet the RAS requirements of data centers.

Presentation (pdf, 227 KB)

 

Virtualization on the SGI Altix

Jes Sorensen, SGI

Virtualization has become increasingly important in the server space during the last couple of years and has been moving to the IA-64 platform as well. Virtualization on IA-64 started out on smaller SMP systems, but is now moving onto larger NUMA systems, such as the SGI Altix.

This talk will present the current status of virtualization on the SGI Altix, what has been achieved, and what the near and mid-term future of virtualization on IA-64 and NUMA systems holds for us. With multiple projects on the horizon, status and expectations for the main hypervisor projects will be discussed.

Presentation (pdf, 174 KB)

 

64-Bit Migration to Linux on Itanium

Tony Luck, Intel

The objective of this talk is to bring forth the benefits and challenges involved in the 64-bit migration path, in general as well as specific to Linux on Itanium. This presentation will provide insights into the tools and techniques available for aiding 64-bit migration, while also sharing some tips with the programming community on how to avoid common pitfalls.

No presentation available currently

 

Hyper-Threading on Dual-Core Itanium 2

Cameron McNairy, Intel

Hyper-threading on the dual-core Intel Itanium 2 processor provides what appears as two logical processors for each core. As a result, the benefits of hyper-threading are available to an application automatically. However, there are things that the application and operating system can do to optimize for hyper-threading. This presentation introduces hyper-threading and then transitions into what software can and should do to best realize performance.

No presentation available currently

 

Total Cost of Ownership: Advantages of Genuinely Secure Applications

Steve Goodbarn, Secure64

Bill S. Worley, Secure64

This presentation is intended for server application developers, CIOs, and IT managers seeking to differentiate their products and gain efficiencies.

What you will get from this session: (1) How Itanium with Secure64 makes applications genuinely secure and self-protecting, able to be connected to the Web without bodyguard protections. (2) How genuinely secure applications can improve the bottom line, reduce power consumption and cooling, save space, and make life easier. (3) How to make applications genuinely secure.

A case study demonstrating these benefits will also be presented.

Presentation (pdf, 1.9 MB)

 

Java on the Itanium

Adesh Gupta, Intel

No presentation available currently

 

Measuring & Optimizing Performance and Correctness Factors of Large Enterprise Applications

Kumar Rangarajan, S7

Performance tuning is an important aspect of any application, and one of its challenges is data collection. A question always raised in such situations is how to collect reliable performance statistics without extensive modifications to the build system. What options does the system provide to collect and improve my application's performance characteristics?

Similarly, when large enterprise applications are developed and run on multi-processor systems, they sometimes behave strangely. An application that works perfectly well on a single-processor system may start throwing tantrums when moved to a multi-processor system. What factors affect behavior in such systems?

This presentation is derived from our company's experience and lessons learned while migrating a large enterprise-class database application onto Linux and HP-UX on IA-64. It will help you understand the options available on IA-64 for performance tuning and optimization, and the low-level details that affect program behavior when building large applications. It will concentrate primarily on the Caliper toolset, PBO (profile-based optimization), the atomic operations available in the architecture for performing quick, small tasks, and the issues related to multi-processor systems (e.g., memory coherency) and how applications can shield themselves from their effects.

Presentation (pdf, 332 KB)

 

Optimizing Software with Intel Compilers

Xinmin Tian, Intel

The Intel C++/Fortran compiler provides an essential tool for unleashing the power of the Itanium family of Montecito and Montvale dual-core processors by means of advanced high-level and low-level optimizations. In this talk, I will present a set of optimizations built into the Intel compiler, which has intimate knowledge of micro-architectural performance aspects, that users can apply to optimize their software on Itanium dual-core processors.

No presentation available currently

 

OpenMP

Om P Sachan, Intel

This presentation will introduce OpenMP (Open Multi-Processing), a simple and flexible interface for developing shared-memory parallel applications. You will learn how to add parallelism incrementally. The serial and parallel versions of the code live together in the same source, and a compiler option selects whether a single-threaded or multi-threaded binary is generated.
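
As a rough illustration of how one source can serve both builds, here is a minimal sketch in C. It is not taken from the talk; the function names `sum_squares` and `built_with_openmp` are hypothetical. It assumes a compiler with OpenMP support, for example GCC with `-fopenmp`. Without that option, the pragma is ignored and the loop runs serially.

```c
#include <assert.h>

/* Hypothetical example: sums the squares of 0..n-1.
 * With OpenMP enabled, loop iterations are divided among threads and the
 * per-thread partial sums are combined by the reduction clause.
 * Without OpenMP, the pragma is ignored and the loop runs serially. */
double sum_squares(long n)
{
    double total = 0.0;
    long i;

    assert(n >= 0);

#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += (double)i * (double)i;
    }
    return total;
}

/* Reports whether this file was compiled with OpenMP enabled.
 * _OPENMP is the macro the OpenMP specification defines for this purpose. */
int built_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

Adding one pragma to one loop at a time, and checking the results against the serial build after each step, is one way to parallelize incrementally.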

No presentation available currently

 

The Viability of Commercial Computational Grids

Kumaran Pillai, Protégé Software

The talk will explore the implementations of commercial grids, including business models and case studies.

No presentation available currently

 

Reading and Interpreting Stall Counters

Matthieu Delahaye, Gelato Central Operations

This presentation will cover the fundamentals of reading and interpreting stall counters on the Intel Itanium architecture.

No presentation available currently

 

Update on LinuxOnLinux

Peter Chubb, University of New South Wales

Since I reported on the LinuxOnLinux virtual machine at the last Gelato ICE, we've been doing a lot of work on it. In this talk, I'll give a very brief introduction to LinuxOnLinux, and then will talk about what's changed since April. The main thing we've been working on is direct device access via the UserLevelDrivers work that UNSW has been doing. The aim is to allow unmodified device drivers in the guest to access a selection of the PCI devices available on the host hardware. As I write this abstract in August, this goal has been achieved, albeit at low performance. I expect that by the time of the conference, the main performance problems will be understood, and some fixed.

Presentation (pdf, 155 KB)

 

Spotlight: The Software Stack to Unleash Multi-Core Power from the Intel Perspective

Xinmin Tian, Intel

Ever-increasing performance requirements, coupled with power constraints, have led to the emergence of multi-core platforms. Unleashing the power of multi-core processors depends critically on the availability of a parallel software stack. In this talk, I will provide Intel’s perspective on building a software stack for platforms configured with multi-core processors. In addition, an overview of Intel’s new 10.1 C++/Fortran compiler, Intel’s C++ Software Transactional Memory (STM) compiler, and Intel Threading Building Blocks (TBB) will be given to uncover the threading efforts being made inside Intel for the new multi-core era.

No presentation available currently

 

Itanium Processor Road Map

Cameron McNairy, Intel

The Itanium processor road map is always a point of interest and this presentation will discuss where Intel and OEMs are taking the Itanium Processor Family in the near future.

No presentation available currently

 

Locating Optimization Opportunities with VTune

Om P Sachan, Intel

The VTune analyzer provides an integrated performance analysis and tuning environment that helps you analyze your code's performance on systems with Intel Itanium architecture. VTune provides non-intrusive techniques to locate hot spots in your application. This lecture will focus on finding hot spots with Intel's VTune Performance Analyzer, analyzing the call tree and finding bottlenecks on the Intel Itanium 2 processor, using processor cycle accounting methodology.

No presentation available currently

 

Parallel Programming Concepts

Cameron McNairy, Intel

This presentation will refresh attendees' memories on parallel programming concepts, advantages, and pitfalls.

No presentation available currently

 

Storage Layout Optimizations to Improve Parallel Distributed Filesystem Performance

Doug Johnson, Ohio Supercomputer Center

Every parallel filesystem is built on a combination of many instances of local filesystems. Their performance directly impacts the overall performance of the distributed system. In this talk, results will be presented from a detailed investigation of the effect that filesystem journal placement, device type, journal mode, and journal size have on metadata and bulk storage performance. The tests were performed on Fibre Channel storage arrays with multi-terabyte ext3 and XFS filesystems. The journal devices tested include internal and SAN disks, as well as solid-state storage (SSD). Results using the kernel blktrace facility will be presented to quantify the differences at the block layer for the different tests.

No presentation available currently

 

GCC and Osprey Project Update

Shin-Ming Liu, HP

The goal of the Osprey Project is to deliver a high-performance GCC with an alternative backend based on Open64, in collaboration with three Gelato Member universities: University of Delaware, Tsinghua University, and Institute of Computing Technology, Chinese Academy of Sciences (ICT). Since the project started in November 2005, it has released Open64 3.0, 3.1, and 4.0 over the Internet (http://www.open64.net). The most recent release is ABI-compatible with GCC 4.0 and offers significantly better runtime performance than stock GCC. In this talk, we will share recent developments on this compiler as well as some detailed performance comparisons with GCC and other compilers.

No presentation available currently

 

Cross Compilers for Fun, Pleasure and Profit

Peter Chubb, University of New South Wales

Now that the Ski simulator has been open-sourced, people more and more will be attempting to run IPF code on non-IPF platforms. To do this effectively, you need a cross compiler. In this talk, I'll run through the steps you take to make GCC into a cross compiler for your host.

Although I'll be using IA-64 as the example (after all, this *is* Gelato!), similar techniques can be used to create a cross compiler for any architecture.

Presentation (pdf, 100 KB)

 

Multi-Core Support for Helper and Speculative Threads

Wei Chung Hsu, University of Minnesota

Starting with Montecito, Itanium-based CMPs will come out with more and more processor cores. At present, such CMPs are mainly used to improve throughput or highly parallel application performance rather than single thread performance. Helper threading and speculative multi-threading are two ways to utilize the multiple threads to improve general purpose single thread performance.

For speculative multi-threading, architectural support is needed to detect any dependence violation, and also to buffer the results produced by speculatively created threads. For helper threading, architectural support for efficient thread communication enables more effective helper-threaded prefetching. Based on our experience implementing speculative and helper threads, we propose architectural support that may be useful for implementing speculative and helper threads on future Itanium-based CMPs.

No presentation available currently

 

An Inside Look at Scaling Linux to 2048 Processors

Steve Neuner, SGI

Standard Linux distributions are now supporting 1024 processor systems, realtime, and other high-performance OS features, without using a customized or special kernel. This session will look at recent 2.6 Linux kernel improvements that make scaling to 2048p possible. We will also discuss general considerations with scaling systems to large CPU counts.

Presentation (pdf, 866 KB)

 

Using TotalView to Debug Multi-Core Code

Daphne Ng, Qast Singapore Pte Ltd

No presentation available currently

 

NFS Benchmarking Using nfsreplay

Shehjar Tikoo, University of New South Wales

Linux NFS implementations have often been criticized for not performing as well as those of other contemporary operating systems. An important step to remedy this is constructing realistic and repeatable benchmarks that can illustrate bottlenecks. Previous research has shown that SPEC filesystem benchmarks (the most widely used load generator for NFS benchmarking) do not accurately reflect many workload classes found in the real world.

nfsreplay is motivated by the idea that the most realistic workload for a particular setup will be best represented by a traffic capture from that network itself.

The nfsreplay project provides various tools to replay such a capture at a target NFS server. It facilitates scaled replay so that the rate of replay can be increased while maintaining the characteristics of the trace.
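
One simple way to picture scaled replay is to compress the gaps between captured requests by a constant factor, which raises the request rate while keeping the order and relative spacing of the trace. The sketch below assumes that approach; it is not taken from nfsreplay itself, and the function `replay_offset_us` and its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch, NOT nfsreplay's actual code.
 * Maps a request's capture timestamp to the time, relative to the start
 * of the replay, at which it should be sent.
 *   first_capture_us  - timestamp of the first request in the trace
 *   packet_capture_us - timestamp of the request being scheduled
 *   speedup           - replay rate multiplier (1 = original rate)
 * All times are in microseconds; integer division truncates. */
long long replay_offset_us(long long first_capture_us,
                           long long packet_capture_us,
                           int speedup)
{
    assert(speedup > 0);
    assert(packet_capture_us >= first_capture_us);

    return (packet_capture_us - first_capture_us) / speedup;
}
```

With a speedup of 2, a request captured 4000 microseconds into the trace is sent 2000 microseconds into the replay, so the offered load doubles while the ordering of requests is unchanged.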

We'll present Linux server benchmarks performed using nfsreplay on an Itanium-based system.

No presentation available currently

 

VML: A Step Towards Efficient Multi-Core HPC Applications

Clemens C. J. Roothaan, Gelato Honorary Member

In large-scale scientific and engineering computer applications, one frequently encounters critically important tasks where modulo scheduled execution can achieve unprecedented efficiency. The Itanium processor was specifically designed to facilitate the construction of modulo scheduled codes. An important practical implementation in this domain, targeting the mass-production of elementary functions, is the vector math library (VML) released by HP in the public domain. The principal characteristics of the VML programming model are: (1) a source code in which the function at hand is defined by a universal algorithm, without branches, in terms of explicit arithmetic statements; (2) handling of all exceptions by housekeeping instructions that execute in parallel with the floating point calculations; and (3) executing two computational tracks in parallel, thus using the single Itanium chip as a two-core processor. Clearly, generalizing the VML codes for a multi-core computer should not be difficult; some hardware and software considerations will be discussed. It should also be noted that the VML model is applicable to many other large-scale procedures that can be formulated by a universal algorithm without branches.

No presentation available currently

 

Discussion: Synchronization and Memory

Cameron McNairy, Intel

This session will be a discussion of synchronization and memory ordering for Itanium processors.

No presentation available currently

 

The Itanium PAL (Processor Abstraction Layer)

Tony Luck, Intel

Like all good software structures, the Itanium software stack follows the "lasagna" model, with several layers of software abstracting away implementation details of lower levels while providing higher levels of functionality at the upper levels.

The processor abstraction layer (PAL) sits at the bottom of this stack. This talk explains how the Linux kernel interacts with the PAL.

No presentation available currently

 

Overview of Research Applications Run on TLC2 Itanium Platforms at the University of Houston

Rosalinda C. Mendez, University of Houston

The University of Houston has been a great supporter of the Itanium platform. Researchers from several disciplines use the shared computing environment of the Texas Learning and Computation Center to develop performance tools and to run applications for high energy physics, bioinformatics, and the geophysical and life sciences. This presentation will cover the vast array of projects at the University of Houston.

No presentation available currently

 

Power Management in Itanium Linux

Fenghua Yu, Intel

We will give an overview of the current status of power management for the Itanium architecture on the Linux operating system. We will discuss C states and P states, DBS (demand-based switching), the power-management-aware scheduler, power consumption, etc.

Presentation (pdf, 128 KB)

 

The Impact of ANSI C/C++ Aliasing Rules on Real-World Applications Performance

Alexander Isaev, Intel

ANSI C/C++ standards have a simple and clean set of aliasing rules. Most modern compilers (including the Intel compiler, HP compiler, and GCC) support them, and apply more aggressive optimizations for conformant programs.

In this talk, we will describe the ANSI C/C++ aliasing rules in detail, show how they can be enabled in compilers, and demonstrate their impact on performance using programs from the SPEC CPU2006 suite as examples.
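
To give a flavor of what the type-based aliasing rules allow, here is a minimal sketch in C. It is not from the talk; the function names `store_both` and `float_bits` are hypothetical. The `float_bits` example assumes that `float` is a 32-bit IEEE 754 type. In GCC, the relevant options are `-fstrict-aliasing` and `-fno-strict-aliasing`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Under the ANSI rules an int object may not be accessed through a
 * float lvalue, so a compiler may assume that the store through `f`
 * leaves `*i` unchanged and may return the constant 1 without
 * reloading `*i` from memory. */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 0.0f;
    return *i;
}

/* A conformant way to inspect the bits of a float: copy the bytes
 * instead of casting the pointer to uint32_t*.
 * Assumption: float is a 32-bit IEEE 754 type on this platform. */
uint32_t float_bits(float value)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

A program that instead reads a `float` through a cast `uint32_t *` is not conformant, and it may behave differently once the compiler applies the more aggressive optimizations that the rules permit.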

Presentation (pdf, 237 KB)

 

Introduction to TBB

Om P Sachan, Intel

Intel Threading Building Blocks (TBB) is a cross-platform, compiler-independent framework that simplifies programming for multi-core architectures by providing pre-built abstractions for parallel execution patterns. This lecture focuses on the presenter’s experience of porting applications to use Intel TBB and the performance benefits it can deliver. You will learn how applications can be multi-threaded with Intel TBB and how to change your code from using STL-like containers and algorithms to Intel TBB. We will demonstrate elegant ways of building parallel solutions with Intel TBB that are simpler than solutions based on raw threads.

No presentation available currently

 

Leverage Optimized Integrated Software Virtualization on Itanium with Red Hat Enterprise Linux 5.1

Sivaram Shunmugam, Red Hat

Understand how the upcoming release of Red Hat Enterprise Linux Advanced Platform 5.1 provides additional high-availability and storage virtualization technologies and scales to support any number of processors and guest environments, making it the solution for today's large, high-performance Itanium systems.

No presentation available currently