Design of High Availability Systems & Software
Course Highlights:
This course examines the high-level design of embedded systems and software that must provide their services with
near-perfect availability.
High availability systems must tolerate both expected and unexpected faults. Their design is based on redundant
hardware and software combined in ways that will achieve “five-nines” (99.999%) or greater availability, equivalent to
less than 1 second of downtime per day. Basic hardware N-plexing and voting issues are discussed, followed by an in-
depth study of a number of backward error recovery fault tolerance techniques including static N-version programming,
Checkpoint-Rollback, Process Pairs, and Recovery Blocks. The class concludes with several forward error recovery
techniques. Many real-world examples are presented.
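As a quick check of the "five-nines" figures quoted above, the downtime budget can be worked out directly from the availability target. This is only an illustrative sketch: the function name `downtime_budget` and the 365.25-day year are assumptions made for the example, not part of the course material.

```python
# Illustrative arithmetic only; names and the 365.25-day year are assumptions.
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget(availability):
    """Return the allowed downtime as (seconds per day, minutes per year)."""
    unavailability = 1.0 - availability
    return unavailability * SECONDS_PER_DAY, unavailability * MINUTES_PER_YEAR


per_day, per_year = downtime_budget(0.99999)
print(f"{per_day:.3f} s/day, {per_year:.2f} min/year")
# prints "0.864 s/day, 5.26 min/year"
```

The result matches the "less than 1 second of downtime per day" figure, and corresponds to a little over 5 minutes of downtime per year.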
This is not a general course on system or software design theory. It is tightly focused on the design of embedded
systems and software that must keep their services available at all times, with only about 5 minutes of downtime
per year.
Objective of the Course:
The primary goal of this course is to give the participant the skills necessary to design software for real-time and
embedded computer systems that must relentlessly provide service despite the occurrence of internal and external
faults. This is a very practical, results-oriented course that will provide knowledge and skills that can be applied
immediately.
Who Should Attend:
This course is intended for practicing software and system architects, project managers and technical consultants
who are responsible for designing, structuring and implementing the software for real-time and embedded computer
systems that must continue providing service despite the occurrence of internal and external faults.
Course participants are expected to be familiar with general embedded and real-time software design. [This knowledge
can be gained by attending a prerequisite embedded software design course such as "Architectural Design of Real-Time Software".]
Course Co-Requisite:
Many (but not all) high-availability systems are also safety-critical systems, which can threaten human safety or even
human life if the system fails and remains unavailable for a significant period of time. For those high-availability
systems that also have safety-critical requirements, we recommend taking the course "Design of Safety-Critical Systems and Software" at the same time as this course. The two courses have little overlap in content,
and offer complementary approaches and perspectives. It is also possible to combine these two one-day courses into a
unified two-day course for presentation at customer sites.
Course Outline:
Definitions and Background
High Availability
Fault -> Error -> Failure
Single Points of Failure
Fault Tree Analysis
Exercise: Probabilistic Fault Tree Analysis
Underlying Principles
Fault Avoidance vs. Tolerance
Failure Curves
Redundancy
Replication vs. Functional Redundancy vs. Analytic Redundancy
Dynamic vs. Static Redundancy
Extended Example: Space Shuttle Software
Fundamental System-Level Design Patterns
Static Hardware Fault Tolerance
N-Plex Design
Exercise: MTBF, MTTF Calculations in Triple Modular Redundancy
Dynamic System Fault Tolerance
Redundant Pairs
Clusters
Cluster Failover Strategy Choices
Examples: Redundant Cluster Design
Concepts for Backward Error Recovery
Design Diversity
Dynamic System Redundancy
Backward Error Recovery
Transactions
Checkpointing
System and Software Design Patterns for High Availability
Checkpoint-Rollback
Process Pairs
Recovery Blocks
Limitations of Backward Error Recovery Patterns
Forward Error Recovery Design Patterns
C Language in Critical Systems
Software Robustness: MISRA-C, LINT, Static Code Analyzers
Exercise: C-Language Shenanigans
Final Examination.
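To give a flavour of the "N-Plex Design" and "MTBF, MTTF Calculations in Triple Modular Redundancy" topics in the outline, here is a minimal sketch of majority voting and the textbook TMR reliability figures. It is not taken from the course material. It assumes three independent modules, a perfect voter, and a constant failure rate, and all function names are hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value agreed on by a strict majority of replicas, else raise."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


def tmr_reliability(r):
    """Probability that at least 2 of 3 independent modules work.

    Assumes each module works with probability r and the voter never fails.
    """
    return 3 * r**2 - 2 * r**3


def simplex_mttf(failure_rate):
    """MTTF of a single module with constant failure rate (exponential model)."""
    return 1.0 / failure_rate


def tmr_mttf(failure_rate):
    """MTTF of a non-repairable TMR system: 5 / (6 * failure_rate).

    Obtained by integrating 3*exp(-2*l*t) - 2*exp(-3*l*t) over t >= 0.
    """
    return 5.0 / (6.0 * failure_rate)


# One replica disagrees; the voter masks the fault.
print(majority_vote([7, 7, 9]))
```

Under these assumptions, TMR without repair has a shorter MTTF than a single module (5/6 of it), while its reliability is higher than a single module's whenever module reliability exceeds 0.5, that is, for short mission times.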
INSTRUCTOR: Dr. David Kalinsky
Dr. David Kalinsky has more than thirty years of experience in the design and construction of real-time and embedded
computer systems software. He is a popular lecturer and seminar leader on technologies for embedded software
development, appearing before audiences of professional engineers in North America, Europe and Israel. David
regularly presents classes at the Embedded Systems Conferences on topics such as "Architectural Design of Device
Drivers" and "Principles of High Availability Embedded Systems Design".
He has built and managed high-tech training programs on aspects of software engineering for the development of
real-time and embedded systems for a number of Silicon Valley companies. He has also been involved in the design
of many embedded medical and aerospace systems. In addition, he has in the past developed and taught training
courses on a number of major real-time operating systems (RTOSs), including VRTX, pSOS, VxWorks, OSEK / VDX,
Nucleus, OSE and others. With his broad experience, he has trained thousands of embedded systems software
engineers and architectural designers throughout the world.
We are a professional organisation providing training services to companies. We offer a comprehensive range of training courses, workshops and seminars covering every aspect of engineering.
We provide various training programs that meet the immediate and future needs of engineers. Training is delivered as seminars, hands-on workshops, project-based tutorials or a mixture of these, to bring the maximum learning benefit to the engineers.
We have a quality pool of leading authorities, worldwide experts and fully trained professionals who are constantly striving to uncover the pitfalls and best practices of modern technology development.