Like any engineering system, a network (or a part thereof) can fail for a number of reasons: faulty hardware, software bugs, breakage of the physical medium (e.g., fiber cables), and even power outages. These are examples of failures that affect specific and, generally, separate network elements. It is typical to assume that such failures are independent events and that the probability of two of them happening simultaneously is quite low. The resulting setting is commonly referred to as a single failure scenario.
A large number of techniques exist for dealing with failures, collectively known as network recovery techniques. The fundamental idea underlying recovery is redundancy, whereby network elements deemed unreliable are backed up with one or more spare resources that come into play upon a failure. Almost all recovery techniques focus on single failures and aim to offer a specific trade-off between resilience guarantees and resource consumption. These techniques are mature and their efficacy is proven.
As the size, integration and complexity of networks increase, multiple failures become more probable during the operation of the system. The distinguishing trait here is that a significant portion of the network fails simultaneously. From the point of view of traditional recovery techniques, a large-scale failure is difficult to handle because the redundancy-based approach that is effective for single failures is no longer suitable: the cost of implementing massive redundancy for rarely occurring events is simply prohibitive.
Although large-scale failures may be relatively rare, they usually have serious consequences in terms of the economic loss they cause and the disruption they bring upon thousands or even millions of users. Therefore, it is mandatory to have methods and tools that can be used in the design and operation of the communications infrastructure so that essential services can be preserved as much as possible when large-scale failures occur. In this context, a key requirement is the ability to evaluate the robustness of the network, that is, to assess the performance degradation that should be expected as a consequence of a failure.
The robustness of a network depends on the type of impairment that occurs. From here on, the term impairment refers to any kind of attack, multiple failure or cascading failure that can happen within a network (it does not refer to physical-layer impairments such as cross-talk or chromatic dispersion). Several taxonomies have been proposed in order to classify network attacks [1,2]. The discussion presented here simplifies such previous categorizations and focuses on classifying the types of impairments that can occur on the nodes of a network. Nonetheless, it has also been defined so as to be easily extended to other components of a network, such as links.
Impairments or multiple failures are basically divided into two groups: static and dynamic. The former affects a network permanently and just once, while the latter has a temporal dimension. Static impairments are further divided into static random (SR) and static targeted (ST) attacks, and dynamic impairments into dynamic epidemic (DE) and dynamic periodic (DP) impairments.
In the SR case, nodal attacks occur indiscriminately, selecting nodes at random. Fig. 1 shows this kind of impairment.
Figure 1: Examples of an SR impairment. Panels (a) and (b) show nodes chosen at random
Nodes in an ST attack are chosen so as to maximize the effect of the attack; there is an element of discrimination in the impairment. The choice of target may be a function of network-defined features, such as nodal degree or betweenness centrality, as well as of "real-world" features, such as the number of users potentially affected and socio-political and economic considerations. Figs. 2 and 3 show some examples of ST attacks, considering different elements of discrimination.
Figure 2: Example of an ST impairment. The attack targets the node with the largest node degree
Figure 3: Example of an ST impairment. The attack aims to partition the network into two disconnected sub-networks
In a DE, a failure occurs at a node (or a set of nodes) and may then spread through the network (becoming an epidemic) or remain contained. Fig. 4 shows an example of how an epidemic can affect a network.
Figure 4: Example of a DE impairment. A failure occurs at a node and, after a period of time, spreads to its neighbours
This type of failure is based on epidemic models (EMs), which come in several forms depending on how susceptible nodes (nodes in contact with an infected node) evolve once they become infected: infected nodes may remain infected forever (the Susceptible-Infected model, used for worst-case propagation analysis), eventually become susceptible again (the Susceptible-Infected-Susceptible model), or recover and become immune, no longer passing the infection on to others (the Susceptible-Infected-Recovered model). Different applications of epidemic models to communication networks have been studied in [3,4].
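To make the SIS dynamics concrete, the sketch below simulates a discrete-time SIS epidemic on a graph. Python with the networkx library is assumed here for convenience; the rates β and δ, the random graph and the seed node are illustrative choices, not values prescribed by the text.

```python
import random

import networkx as nx

def simulate_sis(g, beta, delta, seeds, steps, rng=None):
    """Discrete-time SIS epidemic: at each step, every infected node
    infects each susceptible neighbour with probability beta, and each
    infected node is cured (becoming susceptible again) with probability delta."""
    rng = rng or random.Random(42)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        # Independent infection attempt over every (infected, susceptible) link
        newly_infected = {n for i in infected for n in g.neighbors(i)
                          if n not in infected and rng.random() < beta}
        cured = {i for i in infected if rng.random() < delta}
        infected = (infected | newly_infected) - cured
        history.append(len(infected))
    return history

# Illustrative run: epidemic seeded at one node of a random graph
g = nx.erdos_renyi_graph(100, 0.05, seed=1)
print(simulate_sis(g, beta=0.2, delta=0.4, seeds=[0], steps=50))
```

Dropping the cure step gives the SI ("worst case propagation") variant; adding an immune state that nodes enter upon recovery gives SIR.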
A DP is, simply, any kind of impairment that occurs periodically following its characteristic cycle.
Many measures have been proposed in the literature with the aim of capturing or characterizing properties of the network, measuring in different ways whether or not the network is "well prepared" to survive (i.e., to remain connected) under multiple failures of nodes and/or links. Most of these metrics are computed from the network topology alone. The table below lists the metrics considered here; each of them is described in the paragraphs that follow.
Metric | Value | Comment |
---|---|---|
Algebraic connectivity | #algebraicConnectivity# | #comments_algebraicConnectivity# |
Assortativity | #assortativity# | #comments_assortativity# |
Average link betweenness centrality | #averageLinkBC# | #comments_averageLinkBC# |
Average neighbor connectivity | #averageNeighborConnectivity# | #comments_averageNeighborConnectivity# |
Average (out) node degree | #outNodeDegree# | #comments_outNodeDegree# |
Average node betweenness centrality | #averageNodeBC# | #comments_averageNodeBC# |
Average shortest path length | #avgPathLength# | #comments_avgPathLength# |
Average two-term reliability | #a2tr# | #comments_a2tr# |
Clustering coefficient | #clusteringCoeff# | #comments_clusteringCoeff# |
Density | #density# | #comments_density# |
Diameter | #diameter# | #comments_diameter# |
Heterogeneity | #heterogeneity# | #comments_heterogeneity# |
Link connectivity | #linkConnectivity# | #comments_linkConnectivity# |
Link set size (|E|) | #numberOfLinks# | #comments_numberOfLinks# |
Maximum node degree | #maxNodeDegree# | #comments_maxNodeDegree# |
Node connectivity | #nodeConnectivity# | #comments_nodeConnectivity# |
Node set size (|N|) | #numberOfNodes# | #comments_numberOfNodes# |
Spectral radius | #spectralRadius# | #comments_spectralRadius# |
Symmetry ratio | #symmetryRatio# | #comments_symmetryRatio# |
This metric is defined as the second-smallest eigenvalue of the Laplacian matrix of the graph. It measures how difficult it is to break the network into islands or individual components. The larger λ2, the greater the robustness of a topology against both node and link removal [5]. The value of λ2 is in the range:

$$ 0 \le \lambda_2 \le \frac{N}{N-1}\,\min_i d_i $$

where di is the (out) nodal degree of node i.
Given the N×N adjacency matrix A of a graph G and the N×N matrix Δ=diag(di), the Laplacian matrix L of the graph is given by [6]:

$$ L = \Delta - A $$
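As a minimal sketch of this definition (Python with numpy assumed; the 4-node ring is just an illustrative topology):

```python
import numpy as np

def algebraic_connectivity(adj):
    """lambda_2: second-smallest eigenvalue of L = Delta - A."""
    delta = np.diag(adj.sum(axis=1))   # Delta = diag(d_i)
    laplacian = delta - adj            # L = Delta - A
    return np.sort(np.linalg.eigvalsh(laplacian))[1]

# Illustrative check on a 4-node ring: the eigenvalues of L are {0, 2, 2, 4},
# so the algebraic connectivity is 2
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
print(algebraic_connectivity(ring))
```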
The assortativity coefficient r can take values in the range -1≤r≤1. When r<0 the network is said to be disassortative, which means that it has an excess of links connecting nodes of dissimilar degrees. Such networks are vulnerable to both static random and static targeted attacks (SR and ST). The opposite properties apply to assortative networks with r>0, which have an excess of links connecting nodes of similar degrees. Many social networks have significant assortative structure, while technological and biological networks seem to be disassortative.
The global assortativity coefficient r can be determined as the Pearson correlation coefficient of the (out) degrees at the two ends of the edges:

$$ r = \frac{M^{-1}\sum_{e\in E} j_e k_e - \left[ M^{-1}\sum_{e\in E} \tfrac{1}{2}(j_e+k_e) \right]^2}{M^{-1}\sum_{e\in E} \tfrac{1}{2}\left(j_e^2+k_e^2\right) - \left[ M^{-1}\sum_{e\in E} \tfrac{1}{2}(j_e+k_e) \right]^2} $$

where je and ke are the degrees of the vertices at the ends of edge e∈E and M=|E|.
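A direct transcription of this formula (Python with numpy assumed; the edge list, degree lookup and star example are illustrative):

```python
import numpy as np

def assortativity(edges, degree):
    """Global assortativity r: Pearson correlation of the degrees
    (j_e, k_e) at the two ends of each edge, with M = |E|."""
    j = np.array([degree[u] for u, v in edges], dtype=float)
    k = np.array([degree[v] for u, v in edges], dtype=float)
    m_inv = 1.0 / len(edges)
    mean_term = (m_inv * np.sum(0.5 * (j + k))) ** 2
    num = m_inv * np.sum(j * k) - mean_term
    den = m_inv * np.sum(0.5 * (j ** 2 + k ** 2)) - mean_term
    return num / den

# Illustrative check: a star is maximally disassortative (r = -1)
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
degree = {0: 4, 1: 1, 2: 1, 3: 1, 4: 1}
print(assortativity(edges, degree))
```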
This metric provides information about the 1-hop neighborhood around a node. It is a summary statistic of the joint degree distribution (JDD), calculated as the average neighbor degree of the average k-degree node [7].
This metric is the average (out) node degree of a network.
The average shortest path length (ASPL) is calculated as the average of the shortest paths between all possible origin-destination node pairs of the network. Networks with a small ASPL are more robust because, in response to any kind of impairment (SR, ST, DE or DP), they are likely to lose fewer connections.
This metric is the probability that a randomly chosen pair of nodes is connected. If the network is fully connected, the value of A2TR is 1. Otherwise, it is the sum of the number of node pairs in every connected component, divided by the total number of node pairs in the network. This ratio gives the fraction of node pairs that are connected to each other. Therefore, the higher the value (for a given number of removed nodes), the more robust the network is in response to a static random (SR) attack that affects that number of nodes.
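A sketch of this computation over connected components (Python with networkx assumed; the path graph and the removed node are illustrative):

```python
import networkx as nx

def a2tr(g):
    """Fraction of node pairs that remain connected: sum of pairs
    inside each connected component over all pairs in the network."""
    n = g.number_of_nodes()
    total_pairs = n * (n - 1) / 2
    connected_pairs = sum(len(c) * (len(c) - 1) / 2
                          for c in nx.connected_components(g))
    return connected_pairs / total_pairs

# Illustrative check: removing a cut node drops A2TR from 1 to 2/6
g = nx.path_graph(5)   # 0-1-2-3-4
print(a2tr(g))         # 1.0: every pair is connected
g.remove_node(2)
print(a2tr(g))         # two 2-node components: 2 connected pairs out of 6
```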
This metric is equal to the number of shortest paths, from all vertices to all others, that pass through a given node (or link). Betweenness centrality is a more useful measure than mere connectivity of both the load placed on a node (or link) and its importance to the network: connectivity captures only a local effect, while betweenness is global to the network.
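For instance, node and link betweenness centrality (and their network-wide averages) can be obtained as follows, assuming Python with networkx and an illustrative topology:

```python
import networkx as nx

g = nx.petersen_graph()

# Fraction of shortest paths passing through each node / link (normalized)
node_bc = nx.betweenness_centrality(g)
link_bc = nx.edge_betweenness_centrality(g)

avg_node_bc = sum(node_bc.values()) / len(node_bc)
avg_link_bc = sum(link_bc.values()) / len(link_bc)
print(avg_node_bc, avg_link_bc)
```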
This metric measures the degree to which nodes in a network tend to cluster together. Clustering expresses local robustness in the network and thus has practical implications: the higher the clustering, the more interconnected the neighbors of a given node are, increasing the path diversity locally around that node.
The clustering coefficient of the network is the average of the local coefficients. The local coefficient of a node i is given by:

$$ C_i = \frac{2 E_i}{d_i (d_i - 1)} $$

where Ei is the number of links between the neighbors of node i and di is its degree.
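A minimal sketch of the local coefficient (Python with networkx assumed; the comparison against nx.clustering is just a sanity check):

```python
import networkx as nx

def local_clustering(g, i):
    """C_i = 2*E_i / (d_i*(d_i - 1)): fraction of the possible links
    among the neighbours of node i that actually exist."""
    neigh = list(g.neighbors(i))
    d = len(neigh)
    if d < 2:
        return 0.0
    e = sum(1 for a in range(d) for b in range(a + 1, d)
            if g.has_edge(neigh[a], neigh[b]))
    return 2.0 * e / (d * (d - 1))

g = nx.karate_club_graph()
print(local_clustering(g, 0), nx.clustering(g, 0))  # values should match
```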
This metric measures how many links the network has, compared to the maximum possible number of links between its nodes. If a network has no self-loops and a simple graph model is assumed, that network can have at most |N|×(|N|-1) directional links.
This metric is the longest of all the shortest paths between pairs of nodes. In general, low-diameter networks are more robust, but only if node connectivity is high.
Heterogeneity is the standard deviation of the all-pairs shortest path lengths divided by the average all-pairs shortest path length. The lower the heterogeneity, the greater the robustness of the topology.
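A sketch of this ratio (Python with numpy and networkx assumed; only distinct node pairs of a connected graph are considered):

```python
import numpy as np
import networkx as nx

def heterogeneity(g):
    """Standard deviation of all-pairs shortest path lengths
    divided by their average."""
    lengths = [d for _, dist in nx.shortest_path_length(g)
               for _, d in dist.items() if d > 0]
    return np.std(lengths) / np.mean(lengths)

print(heterogeneity(nx.petersen_graph()))  # low: very homogeneous topology
```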
This metric represents the smallest number of edge-disjoint paths between any two nodes. This metric gives a crude indication of the robustness of a network in response to any of the impairments.
This metric is equal to the number of directional links in the network.
This metric is equal to the highest (out) node degree in the network.
This metric represents the smallest number of node-disjoint paths between any two nodes. This metric gives a crude indication of the robustness of a network in response to any of the impairments.
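Both this metric and the link connectivity above are available directly in networkx; a minimal illustration (the Petersen graph is 3-node- and 3-link-connected):

```python
import networkx as nx

g = nx.petersen_graph()
print(nx.node_connectivity(g))   # smallest number of node-disjoint paths: 3
print(nx.edge_connectivity(g))   # smallest number of edge-disjoint paths: 3
```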
This metric is equal to the number of nodes in the network.
This metric is defined as the largest eigenvalue of the adjacency matrix A, often referred to simply as λ1. It plays an important role in determining epidemic thresholds, which correlate with the severity of dynamic epidemic (DE) failures on a network. For example, in a Susceptible-Infected-Susceptible (SIS) model, if the infection rate along each link is β and the cure rate for each node is δ, then the effective spreading rate of the virus can be defined as β/δ. The epidemic threshold τ is defined as follows: for effective spreading rates below τ the virus contamination in the network dies out, while for effective spreading rates above τ the virus is prevalent, i.e. a persisting fraction of nodes remains infected. It was shown in [8] that τ=1/λ1. The value of λ1 is upper-bounded by the maximum (out) nodal degree.
For non-bidirectional topologies, it is computed from the symmetric adjacency matrix, defined as A' = max{A, Aᵀ}, where the max operator is applied element-wise.
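A minimal numpy sketch of λ1 and the resulting epidemic threshold τ = 1/λ1 (Python assumed; the ring topology is illustrative):

```python
import numpy as np

def spectral_radius(adj):
    """Largest eigenvalue of the (symmetrized) adjacency matrix."""
    sym = np.maximum(adj, adj.T)   # A' = max{A, A^T}, element-wise
    return float(np.max(np.linalg.eigvalsh(sym)))

# Illustrative check on a 4-node ring: lambda_1 = 2, so tau = 0.5;
# an SIS epidemic with beta/delta < 0.5 dies out on this topology
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
lam1 = spectral_radius(ring)
print(lam1, 1.0 / lam1)
```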
This ratio is essentially the quotient between the number of distinct eigenvalues of the network (obtained from the adjacency matrix, or from the symmetric adjacency matrix for non-bidirectional topologies) and the network diameter. On high-symmetry networks, with symmetry ratios between 1 and 3 (with the network diameter measured in hops), the impact of losing a node does not depend on which node is lost, which means that the network performs equally in response to a random (SR) or a targeted (ST) attack.
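A sketch of the ratio as described (Python with numpy and networkx assumed; eigenvalues are rounded so that numerically equal values are counted once):

```python
import numpy as np
import networkx as nx

def symmetry_ratio(g):
    """Number of distinct adjacency eigenvalues over the diameter (in hops)."""
    adj = nx.to_numpy_array(g)
    eig = np.linalg.eigvalsh(np.maximum(adj, adj.T))
    distinct = len(np.unique(np.round(eig, 8)))
    return distinct / nx.diameter(g)

# The Petersen graph has 3 distinct eigenvalues and diameter 2 -> ratio 1.5
print(symmetry_ratio(nx.petersen_graph()))
```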