Dependable Computing Systems
ECE/CS 4434/6434
Spring 2021
Due to the COVID-19 pandemic, all course activities will be carried out online.
For the latest updates, please visit UVA's Return to Grounds page.
Computing systems are used in various critical domains including aerospace, energy, transportation, healthcare, and commerce.
Failures of these systems may lead to catastrophic consequences such as injury, loss of life, damage to equipment, or financial loss.
This course focuses on techniques for designing and analyzing dependable computing systems that continue to operate correctly in the presence of software and hardware problems.
We will learn what can go wrong; how to predict, prevent, and detect faults and errors; and how to design systems that tolerate faults and recover from failures.
Topics:
- Introduction to dependable computing
- Basic terminology, attributes, and evaluation techniques
- Combinatorial and state-space modeling
- Hardware fault tolerance
- Information redundancy
- Software fault tolerance
- Checkpointing and recovery
- Reliable networked systems
- Error detection techniques
- Dependability evaluation techniques
- Safety and security
Time: Tue/Thu 11:00am - 12:15pm
Location: Online via Zoom
Virtual Instructor Office Hours: Tuesdays 2pm or by appointment - via Zoom
Virtual TA Office Hours: Wednesdays 3pm - via Zoom
UVA Collab Site (For lecture notes, homework submission, grading)
Piazza (For questions, discussions, and polls)
Prerequisites: This course is intended for graduate and senior-level undergraduate students. Basic knowledge of probability and computer architecture is required. A working knowledge of programming is required for the homework and the final project.
Undergraduate requirements: APMA 3100 or APMA 3110; CS 3330 or ECE 4435 (co-requisite); ECE 3430 (preferred)
Grading:
Component | UG | GRAD |
Class Participation/Activity | 5% | 5% |
Presentations: short presentations on real-world reliability/safety/security incidents/issues | 5% | 5% |
Presentations: paper presentations | 0% | 10% |
Homework | 25% | 15% |
Final Project | 30% | 30% |
Midterm Exam | 15% | 15% |
Final Exam (Take Home) | 20% | 20% |
References:
- I. Koren and C. Mani Krishna, Fault-Tolerant Systems, 1st edition, 2007, Morgan Kaufmann (available through UVA Library).
- J. Knight, Fundamentals of Dependable Computing for Software Engineers, 2012, CRC Press (available through UVA Library).
- K. Trivedi, Probability and Statistics with Reliability, Queuing, and Computer Science Applications, 2nd edition, 2001, John Wiley & Sons (available through UVA Library).
- D. K. Pradhan, Fault Tolerant Computer System Design, 1st edition, 1996, Prentice-Hall.
- B. W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, 1988, Addison-Wesley Longman Publishing Co.