Network Working Group Joseph Internet Draft (Juniper Networks) Intended Status: Proposed Standard August 19, 2009 Expiration Date: February 2010 Experience with rsvp-te p2mp based mvpn draft-joseph-p2mp-mvpn-experience-00.txt Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Copyright and License Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document. This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Joseph [Page 1] Internet Draft draft-joseph-p2mp-mvpn-experience-00.txt August 2009 Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English. Abstract Multicast based VPNs have been deployed for a while now, based on the Draft Rosen solution. In today's scenario, network deployments are moving towards a choice Label Switched Multicast, primarily to garner some of the advantages that a Label Switched Network can offer. In short, the requirement is to achieve more optimal multicast replication and in other words achieve better and effective bandwidth savings. This document describes some of the experiences gained from the implementation and deployment of Label Switched multicast using the RSVP-TE P2MP Label Switched Path approach, and such is information only. The intent is to translate the experiences gained into valuable practices for the Service Provider and Enterprise community who intend to deploy this class of mVPNs. Information based on "Hierarchical Multicast Trees" and "Aggregated P Tunnels" have not been included, and will follow in the next version of this draft. Specification of Requirements The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 1. Introduction Draft Rosen based mVPN deployments did not impose any mVPN awareness on the Provider routers, since it used overlay GRE based PIM siganlling between PE routers in order to build indiviual and customer specific Multicast Domains. Going back into a little bit of history, RFC4364 originally RFC 2547bis (and draft-ietf-l3vpn-rfc2547bis), has not described a mechanism to provide multicast signaling and multicast data delivery through the service provider network for Layer 3 VPN service. Thus, a number of solutions were discussed and various architectures were implemented based on a given vendor. Currently, the Layer 3 VPN working group has two drafts in draft-ietf-l3vpn-2547bis-mcast-07.txt (we use the term [LSM-MVPN-DP] to refer to this draft and draft-ietf-l3vpn-2547bis-mcast-bgp-05.txt (We use the term [LSM-MVPN-CP] to refer to this draft. The draft-ietf-l3vpn-2547bis-mcast document is a superset of previous solutions and provides many approaches as options without mandating any of the them. In other words it forms a framework for a more detailed specification of each option. One of the options available is RSVP-TE. So when we refer to the term "LSM-MVPN-DP", we are actually referring to the option of using RSVP-TE. The draft "ietf-l3vpn-2547bis-mcast-bgp-05.txt" defines the control plane signalling using MP-BGP for exchanging Customer Multicast routes within the Provider network. In this document, we describe some of the lessons learnt from implementing and deploying Multicast VPN using the P2MP RSVP-TE approach. We hope it will benefit service providers and network operators looking to deploy this mVPN services based on this approach. Joseph [Page 2] Internet Draft draft-joseph-p2mp-mvpn-experience-00.txt August 2009 2. Implementation As of writing, there are two known implementations in IOS-XR from Cisco and JunOS from Juniper Networks. Contact these vendors for implementation details beyond what is provided in this draft. In the following sections, we describe some of lessons we learnt during implementation and early interoperability testing stage. 3. ASM vs. SSM One of the most common questions asked or assessed while deploying mVPN, especially while deploying Draft Rosen mVPNs. In this case, the SSM Model offers two options namely "I-PMSI" which indeed replicates the ASM model to a large extent, followed by the "S-PMSI" model which is purely source based. In other words, the S-PMSI model ensures that traffic only reaches PE routers for a given mVPN that has interested receivers, which is indicated by the Receiver PE using a Type 7 Multicast route advertisement. The I-PMSI model is similar to the ASM approach as mentioned earlier in this section - as multicast traffic is sent to all PEs within a given mVPN. However Receiver PEs that do not have interested receivers drop the traffic at that point. So in reality the choice here can directly rule out the ASM Model, and instead focus on the two SSM alternates available. Before we answer this question, let us look at the following sections for certain considerations first. 4. RSVP Scalability One of the points to be considered is RSVP Scalability. Looking at a scenario where we have 1 Source PE and 3 Receiver PEs physically connected via a single Provider router, we would have the following: Based on the BGP A-D received from the leaf nodes (Receiver PEs), the Source PE signals RSVP-PATH messages to each Leaf Node, which is reciprocated with RSVP-RESV messages. In this case, we are talking about 6 PATH/RESV messages and state for 3 LSPs in total. The number of messages increases at the Provider router, which is the point of replication for the Sub-LSPs to each Lead Node, as it creates a P2MP LSP which results in 2 PATH/RESV message to each leaf node. Therefore the core router forwards/receives 12 PATH/RESV messages and maintains state for 3 LSPs in total. Joseph [Page 3] Internet Draft draft-joseph-p2mp-mvpn-experience-00.txt August 2009 5. BGP Scalability It is recommended to use BGP Route Reflectors for scalability reasons. It is very common to see operators deploy dedicated Reflectors per address family to simplify administrations, and operations and also to provide scalability. Therefore it may be a good idea to look at deploying Route Reflectors per address family. Certain hardware vendors do offer the capability of dedicating Routing Engines/Route Processors within a single chassis to serve each address family. This could also be an option, since it would offer better consolidation, and provide a cost effective solution. 6. Resiliency and Convergence within the Provider Core At the time of writing this document, RSVP-TE supports Fast Re-route Facility backup/Link Protection for sub-second convergence. Therefore traffic from a failed link can be switched over to a bypass path even before the Routing Protocol process is aware of the failure. In addition, we recommend tuning the IGP to achieve Fast Convergence, in order for the local PE router to be aware of a remote link failure. IGP fast convergence only has an impact on the time taken for the headend router to signal a failed LSP over an alternate path and therefore does not directly have a bearing on the switchover time for RSVP-TE. The most common IGP parameters tuned for quick convergence are; initial delay for generating Link State updates upon a failure - which should be set to take effect immediately, and SPF calculation which should also be set to the best minimum possible as per the hardware vendor's recommendation. Node failures in certain cases caused recovery times to increase upto 180 seconds, and the use of BFD in the context of an LSP reduced the convergence times, and is recommended. It is also important to quickly detect a failure. It has been found that - interface dampening, at times can result in the physical status of the link being suppressed. This would have an impact on the convergence. Certain hardware has default timers set for Gigabit Ethernet interfaces which are referred as "Carrier-Delay". It is recommended that this value be set to "0", in order to enforce an immediate link-down trigger in the event of a link failure. Both Juniper and Cisco support BGP Fast convergence features that avoid long and cumbersome re-writes of the FIB tables in the event of a path to a given Next-Hop changes. Therefore no special or additional configuration is required. Joseph [Page 4] Internet Draft draft-joseph-p2mp-mvpn-experience-00.txt August 2009 7. Resiliency End-to-End Assuming there is more than one Source available for a given group. Two options have been evaluated. The first is to use Anycast RP for two multicast sources for instance, and the address with the longest prefix match automatically becomes the preferred primary source. The second approach is using separate unicast IP addresses - let us assume we are using two sources. In this case, two separate PIM joins are sent to both the sources (Let us say PIM joins are sent to both S1 and S2). Therefore we may have (S1,G1) and (S2,G2). It is also possible to have a single source multi-homed to the ingress PE and achieve link level resilliency at the sending site (between the CE and PE). In this case, something similar to the illustration provided below was tested. Two streams are delivered to the Leaf PE, however using two VRFs: +-----------+ +-----------+ | in PE | | LeafPE | __|__,-----. | ,-------.| ,-----.__|_____ ,---. / | ( VRF1 )..( M-VPN1 ... VRF1 ) | \ ,---. / \/ | `-----' | `-------'| `-----' | \/ Dest\ (SOURCE | | | | | ( CE ) \ /\___|__,-----. | ,-------.| ,-----. | /\ / `---' | ( VRF2 )... M-VPN2 ... VRF2 )_|_____/ `---' | `-----' | `-------'| `-----' | +-----------+ +-----------+ The former is subject to network convergence, which includes IGP, and BGP and RSVP (in case S-PMSI) is used. The reason here is that [LSM-mVPN-DP] chooses a single UMH (Upstream Multicast Hop) for forwarding. The later adds more burden on the Provider network, since two copies are being simultaneously forwarded in the network for the same set of groups. In other words one copy is redundant. However failover in this case, is much quicker than the first option - since the application can re-converge to the redundant/second copy in the event of the primary copy being un-available. Moreover there is no re-synchronization at any control plane protocols needed to achieve failover. Many operators can choose option-1 for standard service offerings, and option-2 may be offered as a premium - for the simple fact that provider bandwidth usage is being doubled. Joseph [Page 5] Internet Draft draft-joseph-p2mp-mvpn-experience-00.txt August 2009 8. Recommendations for the Type of Source Trees In the section above we discussed some of the details in regard to convergence and scalability, and now let us address some recommendations based on experience. The goal of a network designer evolves around achieving an acceptable and optimal level of multicast forwarding in the provider core. By virtue of being acceptable, firstly we recommend that as far as possible traffic should only be forwarded to PEs that have interested receivers for a given group. Secondly while in the backbone, No more than one copy a packet traverses any link. Thirdly, if bandwidth utilization needs to be optimized within the backbone, then a minimum cost tree should be followed rather than a shortest path tree. The number of states in a core network is proportional to the following: + Inclusive P-tunnel: number of VRF members of all MVPNs in the network. + Selective P-tunnels: total Number of multicast flows ((C-S, C-G)) of all MVPNs. It is found that I-PMSI tunnels work well for most implementations, and can tuned with a threshold rate for multicast traffic flows - which will force traffic to switch to a S-PMSI tunnels. We have found that if the ratio between number of member PEs vs. number of receiver PEs for a given mVPN is low - it is more than obvious than the I-PMSI tunnel approach would be recommended, since Selective Tunnels do not help in bandwidth savings. This will have a saving on the state information, since lesser trees are used in the provider network. We have seen convergence being directly proportionate to the amount of state information the network carries. S-PMSI tunnels can be used within an mVPN on a per group basis, for high bandwidth sessions on an individual basis - if needed. This could be a typical requirement for an operator for whom bandwidth savings is more important than scalability. Joseph [Page 6] Internet Draft draft-joseph-p2mp-mvpn-experience-00.txt August 2009 9. PHP The egress PE device does not maintain any multicast state information with the core, similar to the mode used in Draft Rosen. There is no PIM relationship, nor any GRE tunnels with the Provider and Provider Edge devices. The only binding fact, is the MPLS Multicast Tree and its respective label association (which is based on a combination of incoming interface and label space). Therefore, the packet needs to have the labels intact in order to associate the respective multicast tree and egress interface (CE Facing). Therefore PHP is not used in this case. This typically is the default behavior and no additional configuration should be required to enable this. 10. QoS Classification and queuing is based on MPLS EXP bits. The recommendation is to have multicast traffic not subject to early drops such as WRED/RED and provide temporal buffering within the queue that carries multicast traffic, in order to ensure that the latency/jitter is predictable. 11. MTU No special requirements for handling Label Switched mVPN traffic. However it is recommended to have the largest MTU value supported on the entire network, set. 12. IANA Considerations This document introduces no new IANA Considerations. 13. Security Considerations This document introduces no new Security Considerations. 14. Acknowledgements 15. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [MVPN] E. Rosen, R. Aggarwal [Editors], "Multicast in MPLS/BGP IP VPNs", draft-ietf-l3vpn-2547bis-mcast, work in progress [MVPN-BGP], R. Aggarwal, E. Rosen, T. Morin, Y. Rekhter, "BGP Encodings for Multicast in MPLS/BGP IP VPNs", draft-ietf- l3vpn-2547bis-mcast-bgp, work in progress Joseph [Page 7] Internet Draft draft-joseph-p2mp-mvpn-experience-00.txt August 2009 16. Non-normative References 17. Author Information Joseph Juniper Networks, Inc. e-mail: vjoseph@juniper.net Joseph [Page 8]