Presentation: "Fault Tolerance"

Time: Wednesday 11:00 - 12:00

Location: Conference Hall

Abstract:

Software is everywhere, and every day we rely on it more to conduct our business, maintain our links with society, and enhance our lives. We want that software to process our requests whenever we want, regardless of whether it controls web servers, ATMs, the internet, or the phone system. We don't want to wait in mid-transaction until someone in some far-away place turns the power on and off to reset the software.

Software that tolerates faults has fewer user-impacting failures. When software is designed to detect, isolate, contain, and process faults and the errors they cause, the system's availability and reliability increase. Non-functional requirements for availability, reliability, and fault tolerance have been important in some application domains for years and are steadily becoming more important in others.

This session will introduce many patterns for increasing software's fault tolerance and hence its reliability and availability. These patterns have been used for many years in software systems from many different application domains. The patterns are scoped so that individual architects and designers can apply them to build fault tolerance into their software systems.

Robert S. Hanmer, Alcatel-Lucent

Robert Hanmer has designed and maintained fault-tolerant software and has been an internal consultant on software and system architecture, performance, and reliability for Bell Laboratories and now Alcatel-Lucent. Currently, he evaluates and selects COTS middleware for inclusion in Alcatel-Lucent products.

He is active in the software patterns community and frequently teaches pattern-writing workshops. He has authored many patterns, especially in the area of reliability and fault tolerance. His book, "Patterns for Fault Tolerant Software," will be available this fall from Wiley.