Title: Exploiting Stand-in Redundancy to Improve Resilience in a System-of-Systems (SoS)
Publication Date: 3/19/2013
Conference: 11th Annual Conference on Systems Engineering Research
Resilience is the ability of a system or organization to react to and recover from disturbances with minimal effect on its dynamic stability. While the resilience of systems-of-systems (SoSs) depends on the reliability of their constituent systems, traditional reliability approaches cannot adequately quantify that resilience. Given the heterogeneity and often wide geographic distribution of SoS constituent systems, including redundant backup systems in a SoS is usually impractical and costly. In this paper, we quantitatively assess the impact of compensating for a loss of performance in one constituent system by re-tasking the remaining systems. We call this “stand-in redundancy”, and we develop two concepts to implement it in a SoS. First, reactive resilience deals with performance recovery after a system failure has occurred. We provide a method to determine alternative SoS configurations based on the level of performance recovered and the cost of implementation. Second, proactive resilience takes into account the gradual degradation of systems over time. The corresponding reduction in SoS performance could trigger a forced transition to a different SoS configuration before the degrading system actually fails. These concepts, and their resulting upstream effects on development costs and risks, can be used by decision-makers to quantitatively assess how different SoS architectures affect resilience and how well they inherently resist failures throughout the SoS lifecycle.
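The two concepts can be illustrated with a minimal sketch. It is not the paper's method, which the abstract does not specify. It assumes each alternative configuration is summarized by a recovered performance fraction and a re-tasking cost, and that a single performance threshold drives both decisions. All names (`Configuration`, `rank_reactive_options`, `should_transition`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    """Hypothetical summary of one alternative SoS configuration."""
    name: str
    recovered_performance: float  # fraction of nominal SoS performance (0..1), assumed metric
    cost: float                   # cost of re-tasking, arbitrary units (assumed)


def rank_reactive_options(candidates, min_performance):
    """Reactive case: after a failure, keep configurations that recover at
    least min_performance, cheapest first (ties broken by higher recovery)."""
    feasible = [c for c in candidates if c.recovered_performance >= min_performance]
    return sorted(feasible, key=lambda c: (c.cost, -c.recovered_performance))


def should_transition(current_performance, threshold):
    """Proactive case: degradation has pushed SoS performance below the
    threshold, so reconfigure before the constituent system actually fails."""
    return current_performance < threshold


# Illustrative values only; none of these come from the paper.
candidates = [
    Configuration("retask_A", 0.90, 5.0),
    Configuration("retask_B", 0.75, 2.0),
    Configuration("retask_C", 0.95, 9.0),
]
ranked = rank_reactive_options(candidates, min_performance=0.8)
print([c.name for c in ranked])
```

In this sketch the cheapest option, `retask_B`, is excluded because it recovers too little performance. That exclusion is the performance-versus-cost trade-off the abstract describes.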