Elsevier

Computer Networks

Volume 166, 15 January 2020, 106989

Modeling and analysis of robust service composition for network functions virtualization

https://doi.org/10.1016/j.comnet.2019.106989

Abstract

Fault tolerance is critical for constructing reliable services in Network Functions Virtualization (NFV). In this paper, we propose novel models and algorithms that make NFV services resilient to multiple node and link failures. We first design an optimization model and the PAR protection algorithm, which can efficiently protect an NFV service demand from network failures without any action from a controller, owing to the diversity of the flow assignment. We then develop an optimization model for total demand protection that guarantees recovery of the whole demand volume. Further, we propose a new restoration algorithm, UNIT, for the design of large survivable NFV-based networks; it recovers the affected bandwidth under the uncertainty of multiple network failures. We analytically prove the performance guarantee of UNIT relative to the optimal static solution. The results of our experimental study in a Mininet-based environment with the Ryu controller show that combining PAR and UNIT efficiently protects NFV-based networks from failures in terms of both resource restoration and recovery time.

Introduction

To meet the ever-increasing demand for new services, network service providers must continuously install, operate, and maintain new physical equipment. This significantly raises the investment and operating costs of launching new services. Network Functions Virtualization (NFV) is a promising approach to network architecture that enables providers to deploy and manage new network services with agility. In an NFV-capable system, network functions (e.g., routing, firewall, deep packet inspection) are migrated from traditional hardware-based appliances to a virtualized pool of resources deployed in the cloud. NFV allows network service providers to dynamically compose an ordered list of virtual network functions (VNFs), a practice known as service function chaining (SFC), to provide a network service in a fast and flexible manner. In such agile systems, it is critical to ensure service continuity rather than focusing only on platform availability. In particular, node or link failures should not result in service disruption. Gill et al. show that link failures in datacenters are frequent and disruptive, and can reduce the volume of delivered traffic by up to 40% [1]. In addition, because of the strict performance requirements of network functions, even a short interruption may severely degrade network service quality. Therefore, a desirable goal is to design service composition mechanisms for NFV that are resilient to the inevitable reality of failures [2].

To ensure network resilience upon failures, spare capacity must be pre-allocated in the network. Designing a resilient network means optimizing the spare capacity and flow assignment for a given survivability requirement. Approaches used in survivable networks fall roughly into two categories: protection and restoration. A protection mechanism pre-calculates backup paths before a failure occurs, whereas restoration refers to mechanisms that search for alternate paths only after a failure. A restoration mechanism is more efficient in terms of resource consumption; however, it requires more sophisticated signalling schemes, and hence its recovery time is longer. In this paper, we consider a combination of protection and restoration for robust service composition in NFV-enabled networks. The first question is how to develop an efficient protection mechanism under which a service demand can rapidly survive node or link failures without any action from the controller. For a critical service, an NFV provider needs a solution that ensures that the whole bandwidth volume of a service demand can be restored. This raises the challenging question of finding the optimal spare capacity for protecting the whole demand volume from failures. Finally, to provide a practical restoration solution, it is necessary to study the efficient use of shared capacity for restoration, taking into account the uncertainty of multiple node and link failures in the NFV system. We address these questions as an important part of fault-tolerant network design.
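To make the protection/restoration distinction concrete, the following is a minimal Python sketch on a small, hypothetical directed topology (the node names `s`, `a`, `b`, `c`, `t` and the helper names are illustrative assumptions, not part of the paper). Protection computes a primary path and a link-disjoint backup path before any failure; restoration searches the residual graph for an alternate path only after a failure is observed. Hop-count BFS stands in for whatever path computation a real controller would use.

```python
from collections import deque


def shortest_path(adj, src, dst, failed=frozenset()):
    """Hop-count BFS path from src to dst that avoids every link in `failed`.

    `adj` maps each node to its out-neighbours; links are directed (u, v) pairs.
    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:  # walk parents back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent and (u, v) not in failed:
                parent[v] = u
                queue.append(v)
    return None


def links(path):
    """Set of directed links traversed by a node path."""
    return set(zip(path, path[1:]))


# Hypothetical topology used only for illustration.
ADJ = {"s": ["a", "b"], "a": ["t", "b"], "b": ["c"], "c": ["t"], "t": []}

# Protection: primary and link-disjoint backup are computed BEFORE any failure.
primary = shortest_path(ADJ, "s", "t")
backup = shortest_path(ADJ, "s", "t", failed=frozenset(links(primary)))

# Restoration: an alternate path is searched for only AFTER a failure occurs.
restored = shortest_path(ADJ, "s", "t", failed=frozenset({("a", "t")}))
```

Removing the primary path's links and searching again is a simple heuristic: on some topologies it finds no backup even though a disjoint pair exists, which is why exact disjoint-path methods such as Suurballe's algorithm are often preferred.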

The major contributions of this paper are as follows:

  • We develop an optimization model for an efficient protection mechanism based on the concept of partially disjoint paths in the flow assignment of a service demand in NFV with service function chaining. In case of failures, part of the bandwidth required by a demand survives without any action from a network controller, owing to the diversity of the flow assignment. We propose a novel algorithm, PAR, which efficiently solves this protection problem for large-scale NFV networks.

  • We extend our optimization model to investigate the optimal spare capacity for 100% demand protection, guaranteeing recovery of the total demand volume of services. The model repairs individual flows using both shared spare capacity and the capacity released by the failed flows. The computation of the optimal spare capacity is not limited to single node or single link failures.

  • We propose a novel restoration algorithm, UNIT, which provides an online restoration solution for multiple link and node failures that arrive one by one, without knowledge of future arrivals, in a large-scale NFV network. We analytically prove the performance guarantee of UNIT relative to the optimal static solution.

  • We complement our theoretical analysis with numerical results and experiments in Mininet [3] with the Ryu controller [4]. We observe that our protection approach based on partially disjoint paths outperforms the traditional protection approach based on fully disjoint paths. The results on both generated network topologies and real-world datasets show that combining PAR and UNIT provides fast service restoration after link and node failures.
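The idea that part of a demand survives without controller action under a split flow assignment can be illustrated with a small Python sketch. It assumes, for illustration only, that a demand is split into fixed per-path bandwidths and that a sub-flow survives exactly when none of its links fails; the paths and the 4/3/3 split of a 10-unit demand are hypothetical values, not results from the paper.

```python
def surviving_bandwidth(flows, failed_links):
    """Bandwidth still delivered when every link in `failed_links` is down.

    `flows` is a list of (path, bandwidth) pairs, a path being a list of nodes.
    A sub-flow survives only if none of its links failed (no rerouting).
    """
    total = 0.0
    for path, bw in flows:
        if set(zip(path, path[1:])).isdisjoint(failed_links):
            total += bw
    return total


def worst_single_link_survival(flows):
    """Minimum surviving bandwidth over all single-link failures on used links."""
    used = set()
    for path, _ in flows:
        used |= set(zip(path, path[1:]))
    return min(surviving_bandwidth(flows, {e}) for e in used)


# Hypothetical split of a 10-unit demand over three partially disjoint paths:
# the first two share the link (s, a); the third is disjoint from both.
FLOWS = [
    (["s", "a", "b", "t"], 4.0),
    (["s", "a", "c", "t"], 3.0),
    (["s", "d", "t"], 3.0),
]
```

Here the worst single-link failure is on the shared link `(s, a)`, which leaves 3 of 10 units flowing; every other single-link failure leaves at least 6 units. Node failures could be handled the same way by also checking each path's intermediate nodes.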

In summary, we develop both optimization models and efficient algorithms for NFV resilience upon multiple node and link failures. The effectiveness of the proposed schemes is demonstrated by theoretical analyses and extensive simulations. Our models and algorithms enable service providers to quickly restore all affected traffic upon network failures that occur along a service path, which is particularly important in NFV. To the best of our knowledge, this is the first paper that provides a systematic study, with proven guarantees, of the problem of robust service composition for NFV under the uncertainty of failures. In addition, the paper tackles several practical factors in the optimal resource allocation for protection and restoration, including multiple link and node failures, the dynamics of a link metric system, and a fairness criterion.

The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 describes the model and the problem of designing service composition in NFV that is robust to network failures. Section 4 presents our optimization model (PT-O) and algorithm (PAR) for protection using partially disjoint paths. Section 5 presents our optimization model (RT-O) and algorithm (UNIT) for restoration under the uncertainty of failures. Section 6 evaluates our models and algorithms upon network failures on both generated network topologies and real datasets. We conclude the paper in Section 7.


Related work

NFV has attracted significant attention in recent years as it offers a flexible and agile way to design networks by leveraging virtualization technology. With the rapid development of NFV, many issues have been studied, such as the NFV architecture, resource management and orchestration, and NFV resilience [5], [6]. We refer readers to the survey by Mijumbi et al. for a comprehensive overview of research results and challenges in NFV [7]. NFV standardization activities have been supported by the

Model and problem statement

We represent an NFV system by a directed graph $G=(V,E)$ with a set of nodes $V=V_1\cup V_2$ and a set of links $E=E_1\cup E_2$. $V_1$ is the set of nodes in the NFV infrastructure (NFVI), i.e., NFVI nodes, and $V_2$ is the set of end nodes. $E_1$ is the set of NFVI links among NFVI nodes, and $E_2$ is the set of NFVI links between an NFVI node and an end node. We assume that traffic starts and terminates at end nodes and that no directed link connects two end nodes. An NFVI node is a standard high volume server that can
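The graph model can be captured in a minimal Python sketch. The class and attribute names (`NFVSystem`, `nfvi_nodes`, `end_nodes`) are hypothetical; the sketch encodes the constraints stated in the model ($E_1$ links join two NFVI nodes, $E_2$ links join an NFVI node and an end node, and no link joins two end nodes) plus one assumption of our own, that $V_1$ and $V_2$ are disjoint.

```python
from dataclasses import dataclass, field


@dataclass
class NFVSystem:
    """Directed graph G = (V, E), V = V1 | V2, E = E1 | E2 (hypothetical names)."""
    nfvi_nodes: set = field(default_factory=set)  # V1: NFVI nodes (servers)
    end_nodes: set = field(default_factory=set)   # V2: where traffic starts/ends
    links: set = field(default_factory=set)       # E: directed (u, v) pairs

    def __post_init__(self):
        # Assumption: a node is either an NFVI node or an end node, never both.
        if self.nfvi_nodes & self.end_nodes:
            raise ValueError("V1 and V2 must be disjoint")

    def add_link(self, u, v):
        known = self.nfvi_nodes | self.end_nodes
        if u not in known or v not in known:
            raise KeyError(f"unknown endpoint in link {u}->{v}")
        # The model forbids a directed link between two end nodes.
        if u in self.end_nodes and v in self.end_nodes:
            raise ValueError(f"link {u}->{v} would connect two end nodes")
        self.links.add((u, v))

    @property
    def e1(self):
        """NFVI links: both endpoints are NFVI nodes."""
        return {(u, v) for (u, v) in self.links
                if u in self.nfvi_nodes and v in self.nfvi_nodes}

    @property
    def e2(self):
        """Links between an NFVI node and an end node (all remaining links)."""
        return self.links - self.e1


demo = NFVSystem(nfvi_nodes={"n1", "n2"}, end_nodes={"h1", "h2"})
for u, v in [("h1", "n1"), ("n1", "n2"), ("n2", "h2")]:
    demo.add_link(u, v)
```

Because `add_link` rejects end-to-end links, every link outside `e1` necessarily has exactly one NFVI endpoint, so `e2` can be derived as a set difference.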

Joint VNF placement and routing for protection

Our objective is to find the optimal VNF placement and routing that provides failure protection through path diversity, maximizing system performance while satisfying a fairness criterion under the dynamics of the link metric vector. In particular, we consider minimizing the maximum link and node utilization and maximizing demand traffic. In addition, we employ the max-min fairness criterion in bandwidth allocation for a service demand.
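The max-min fairness criterion can be illustrated independently of the MILP. The Python sketch below uses the textbook progressive-filling (water-filling) algorithm rather than the paper's formulation: all unfrozen flows grow at the same rate until some link saturates, and flows crossing a saturated link are frozen. The link capacities and flow routes are hypothetical, and a helper also reports the maximum link utilization mentioned in the objective.

```python
def max_min_fair(capacity, flow_links):
    """Max-min fair rates by progressive filling (water-filling).

    capacity: dict link -> capacity; flow_links: dict flow -> set of links used.
    Flows that use no link are ignored and keep rate 0.
    """
    rate = {f: 0.0 for f in flow_links}
    residual = dict(capacity)
    active = {f for f, ls in flow_links.items() if ls}
    while active:
        # Number of still-growing flows on each link.
        load = {}
        for f in active:
            for e in flow_links[f]:
                load[e] = load.get(e, 0) + 1
        # Largest common increment before the tightest link saturates.
        inc = min(residual[e] / n for e, n in load.items())
        for f in active:
            rate[f] += inc
        for e, n in load.items():
            residual[e] -= inc * n
        saturated = {e for e in load if residual[e] <= 1e-9}
        # Freeze every flow that crosses a saturated link.
        active = {f for f in active if not (flow_links[f] & saturated)}
    return rate


def max_link_utilization(capacity, flow_links, rate):
    """Largest ratio of carried traffic to capacity over all links."""
    carried = {e: 0.0 for e in capacity}
    for f, ls in flow_links.items():
        for e in ls:
            carried[e] += rate[f]
    return max(carried[e] / capacity[e] for e in capacity)


CAPACITY = {"L1": 10.0, "L2": 4.0}
FLOW_LINKS = {"A": {"L1"}, "B": {"L1", "L2"}, "C": {"L2"}}
RATES = max_min_fair(CAPACITY, FLOW_LINKS)
```

In this example link `L2` saturates first and freezes flows B and C at 2 units each; flow A then takes the remaining 6 units of `L1`, giving rates of 8, 2, and 2.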

We formulate the PT problem as a Mixed Integer Linear

Optimal spare capacity requirement

In this section, we further develop the path diversity model to design an NFV system that can protect the entire demand volume against multiple link and node failures.

In a network $G=(V=V_1\cup V_2,\ E=E_1\cup E_2)$, we consider a set of failure scenarios at NFVI nodes $v\in V_1$ and NFVI links $e\in E_1$. Each NFVI component (node or link) has two states: normal state 0 and failure state 1. We denote by $S$ the set of NFVI states, and let $s=0$ denote the normal NFVI operating state (no failure).
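To make the failure-state model concrete, here is a small Python sketch that enumerates every NFVI state with at most $k$ simultaneously failed components, representing each state by the set of components in failure state 1 (the empty set is $s=0$), and computes the bandwidth a given flow assignment loses in each state. The topology, flows, helper names, and the bound $k$ are illustrative assumptions; the worst-case loss across states is the volume that a total-protection design must be able to restore.

```python
from itertools import combinations


def failure_scenarios(components, k):
    """Yield each NFVI state with at most k failed components.

    A state is the frozenset of components (nodes in V1 or links in E1) in
    failure state 1; frozenset() is the normal operating state s = 0.
    """
    ordered = sorted(components, key=str)  # deterministic enumeration order
    for r in range(k + 1):
        for failed in combinations(ordered, r):
            yield frozenset(failed)


def lost_bandwidth(flows, failed):
    """Bandwidth of the flows whose path crosses a failed node or link."""
    lost = 0.0
    for path, bw in flows:
        elements = set(path) | set(zip(path, path[1:]))
        if elements & failed:
            lost += bw
    return lost


# Hypothetical instance: V1 = {n1, n2, n3}, E1 = {(n1, n2)}, end nodes h1, h2.
COMPONENTS = {"n1", "n2", "n3", ("n1", "n2")}
FLOWS = [(["h1", "n1", "n2", "h2"], 6.0), (["h1", "n3", "h2"], 4.0)]


def worst_case_loss(k):
    """Largest bandwidth lost over all states with at most k failures."""
    return max(lost_bandwidth(FLOWS, s) for s in failure_scenarios(COMPONENTS, k))
```

With at most one failure the worst loss is 6 units (any component on the first path); allowing two failures, losing `n1` and `n3` together removes all 10 units. The number of states grows combinatorially with $k$, so exhaustive enumeration like this is only practical for small instances.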

Evaluation

In this section, we present the scenarios and parameter settings of our evaluation using both synthetic topologies and real-world datasets. We also discuss the implementation of our protection and restoration solutions in a virtual network that uses the OpenFlow protocol to connect and configure the network devices. We then analyze the robustness of service composition with the proposed models and algorithms in handling link and node failures in NFV.

Conclusion

In this paper, we studied a robust service composition scheme for NFV using multiple paths to protect service demands from network failures. We formulated the optimization problem as a MILP model. Our optimization models capture the essential aspects of network resilience for NFV, including the failure protection scheme, multiple routing, fairness condition, VNF placement and service function chaining. We developed the PAR algorithm that provides a protection solution close to the optimal

Declaration of competing interest

None.

Acknowledgments

A substantial part of this work was done while T.-M. Pham was visiting the Paris 6 Computer Science Laboratory (LIP6), Sorbonne University, France.


References (38)

  • P. Gill, N. Jain, N. Nagappan, Understanding network failures in data centers: Measurement, analysis, and implications,...
  • ETSI, Network Functions Virtualisation: Resiliency requirements, GS NFV-REL 001 V1.1.1,...
  • Mininet,...
  • Ryu SDN framework,...
  • Network Function Virtualization research group,...
  • ETSI, Network Functions Virtualisation: Architectural framework, GS NFV 002 V1.2.1,...
  • R. Mijumbi et al., Network function virtualization: state-of-the-art and research challenges, IEEE Commun. Surveys Tuts. (2016)
  • ETSI, Network Functions Virtualisation: Infrastructure overview, GS NFV-INF 001 V1.1.1,...
  • ETSI, Network Functions Virtualisation: Use cases, GS NFV 001 V1.1.1,...
  • C. Bernardos, A. Rahman, J. Zuniga, L. Contreras, P. Aranda, P. Lynch, Network virtualization research challenges,...
  • ETSI, Network Functions Virtualisation; reliability; report on models and features for end-to-end reliability, GS...
  • Y. Xiong et al., Restoration strategies and spare capacity requirements in self-healing ATM networks, IEEE/ACM Trans. Netw. (1999)
  • M. Kodialam, T.V. Lakshman, Dynamic routing of bandwidth guaranteed tunnels with restoration, in: Proc. INFOCOM 2000,...
  • Y. Wang et al., R3: Resilient routing reconfiguration, SIGCOMM Comput. Commun. Rev. (2010)
  • S. Cho et al., Independent directed acyclic graphs for resilient multipath routing, IEEE/ACM Trans. Netw. (2012)
  • B. Yang, J. Liu, S. Shenker, J. Li, K. Zheng, Keep forwarding: Towards k-link failure resilient routing, in: Proc....
  • T. Nguyen, S. Fdida, T. Pham, A comprehensive resource management and placement for network function virtualization,...
  • R. Soulé, S. Basu, P.J. Marandi, F. Pedone, R. Kleinberg, E.G. Sirer, N. Foster, Merlin: A language for provisioning...
  • V. Eramo et al., An approach for service function chain routing and virtual function network instance migration in network function virtualization architectures, IEEE/ACM Trans. Netw. (2017)

Tuan-Minh Pham received the Ph.D. degree in computer science from University Pierre et Marie Curie, France, in 2011. He was a Visiting Scientist with Pennsylvania State University and University Pierre et Marie Curie, in 2013 and 2017, respectively. He currently holds a faculty position with the Computer Science Department, Phenikaa University. His research interests include the future Internet architectures, the modeling and analysis of networked systems, and network measurement for protocol evaluation. He has served as a Technical Committee Member and a Reviewer for the IEEE ICC, the IEEE CCNC, the IEEE LCN, the Computer Communications Journal (Elsevier), and the IEEE Transactions on Services Computing, among others. He was a recipient of the Best Paper Award from the IEEE International Conference on Social Computing, in 2013.

Serge Fdida is a Professor with Sorbonne University (formerly UPMC). His research interests relate to future Internet technology and architecture. He has led many research projects in Future Networking in France and Europe, notably pioneering the European activity on federated Internet testbeds. He currently leads the Equipex FIT, a large-scale testbed on the Future Internet of Things. Serge Fdida has published numerous scientific papers, in addition to a few patents and one RFC. He is a Distinguished ACM Member and an IEEE Senior Member. Serge Fdida has also developed strong experience in innovation and industry transfer. He was the co-founder of the Qosmos company and one of the active contributors to the creation of the Cap Digital cluster in Paris. He served as Vice-President for Europe and International Affairs at UPMC from 2011 to 2017.

Thi-Thuy-Lien Nguyen received her Bachelor degree and Master degree in Computer Science from the Hanoi University of Technology in 2011 and 2014, respectively. She has been a lecturer with the Faculty of Information Technology at Hanoi National University of Education since 2012. She is currently a Ph.D. candidate at the Computer Science Department at University of Engineering and Technology, Vietnam. Her research interests include performance evaluation of computer networks and optimization problems in network functions virtualization.

Hoai-Nam Chu received the bachelor's degree in electrical and electronics engineering from the Hanoi University of Science and Technology, in 2009, and the master's degree in information technology from University of Canberra, Australia, in 2014. He has been a Lecturer with the Faculty of Electrical and Electronic Engineering, University of Transport and Communications, since 2009. His research interests include network performance analysis and NFV/SDN.
