BGP Route Leak at Angola Cables Slows Connectivity for Many Australians 

On Thursday, 25 May 2023, we saw another BGP route leak, this time causing significant delays in connectivity from Australia to sites in the US, including Amazon’s AWS services and Akamai’s content distribution network (CDN). People across Australia were either unable to reach some sites and services or experienced severe delays, and companies operating in affected data centers were seriously hampered in their ability to do business.

This route leak lasted about three hours, and highlights the importance of implementing MANRS recommendations and continually working to improve routing security. In this post, we’re going to do a technical deep dive into this incident, but the bottom line is that continuously improving routing security is critical.  

The good news is that there are solutions for these kinds of incidents. One is RFC9234 – Route Leak Prevention and Detection Using Roles in UPDATE and OPEN Messages. RFC9234 introduces a proactive method to prevent route leaks by defining new capabilities called BGP Roles. These roles are exchanged during the establishment of eBGP sessions and enable BGP routers to understand the AS relationships between local and remote ASes. This understanding helps prevent the unintended spread of route leaks. Read more about RFC9234 in the MANRS blog “There is Still Hope for BGP Route Leak Prevention”, written by MANRS friend Mingwei Zhang from the Cloudflare Radar team. 
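
To make the mechanism more concrete, here is a minimal Python sketch of the Only-to-Customer (OTC) attribute rules that RFC9234 defines on top of BGP Roles. It is a simplified illustration, not router code: the `Route` class and the `ingress` and `egress` function names are hypothetical, roles are expressed as the neighbor's role from the local router's point of view, and the AS numbers and prefix come from the ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Optional

# Role of the *neighbor* as seen from the local AS (RFC 9234 role names).
PROVIDER, CUSTOMER, PEER, RS, RS_CLIENT = (
    "provider", "customer", "peer", "rs", "rs-client")


@dataclass
class Route:
    prefix: str
    otc: Optional[int] = None  # Only-to-Customer attribute (an AS number), if set


def ingress(route: Route, neighbor_role: str, neighbor_as: int) -> Optional[Route]:
    """Apply the OTC ingress rules; return None when the route is a leak."""
    if route.otc is not None:
        # OTC is set, so the route may only travel "down" towards customers.
        if neighbor_role in (CUSTOMER, RS_CLIENT):
            return None
        if neighbor_role == PEER and route.otc != neighbor_as:
            return None
        return route
    if neighbor_role in (PROVIDER, PEER, RS):
        # Mark the route so it cannot later be sent back "up" or sideways.
        return Route(route.prefix, otc=neighbor_as)
    return route


def egress(route: Route, neighbor_role: str, local_as: int) -> Optional[Route]:
    """Apply the OTC egress rules; return None when the route must not be sent."""
    if route.otc is not None:
        if neighbor_role in (PROVIDER, PEER, RS):
            return None
        return route
    if neighbor_role in (CUSTOMER, PEER, RS_CLIENT):
        return Route(route.prefix, otc=local_as)
    return route


# AS64500 learns a route from its provider AS64496, then tries to pass it to a peer.
learned = ingress(Route("192.0.2.0/24"), PROVIDER, neighbor_as=64496)
print(egress(learned, PEER, local_as=64500))  # None: the leak is blocked on egress
```

In this sketch, a route learned from a provider, peer, or route server is marked on the way in, and the mark stops it from being re-advertised to another provider, peer, or route server, which is the pattern a route leak follows.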

There is also another new standard in development, Autonomous System Provider Authorization (ASPA), which, had it been implemented, would have significantly reduced the effect of this route leak.

What Is a Route Leak?

Before going into the details of this incident, let’s first remind ourselves of what a Border Gateway Protocol (BGP) route leak is, as defined in RFC7908:

“A route leak is the propagation of routing announcement(s) beyond their intended scope.  That is, an announcement from an Autonomous System (AS) of a learned BGP route to another AS is in violation of the intended policies of the receiver, the sender, and/or one of the ASes along the preceding AS path.” 

Route leaks can have various causes, including misconfigurations, software bugs, or human errors. When a route leak happens, incorrect BGP routing information can spread across the Internet and have significant impacts, such as: 

  • Increased Network Congestion: Incorrect route announcements can attract traffic that was not intended for the affected network, causing congestion and potential performance issues. 
  • Suboptimal Routing: The leaked routes can disrupt the normal routing hierarchy, leading to inefficient traffic paths and higher latencies. 
  • Denial of Service (DoS) Attacks: Malicious route leaks can be exploited to redirect traffic to unauthorized destinations, potentially facilitating DoS attacks or unauthorized monitoring. 
  • Internet Instability: Large-scale route leaks can propagate widely, affecting multiple networks and disrupting the stability and resilience of the global Internet routing system. 

Mistakes happen on the Internet, just as in any complex system. The Internet is a vast interconnection of more than 77,000 networks, each made up of numerous devices, governed by different policies and procedures, and operated by humans.

Today’s Incident

Around 8:30 PM Australian Eastern time (10:30 UTC), complaints started appearing in a local technical chat group that many were facing high latency toward US-East coast networks, predominantly toward AS20940 (Akamai) and AS16509 (Amazon – AWS), but also toward many other networks. Many people jumped in to investigate, and an hour later Jarryd Sullivan shared some details on Twitter.  

Tweet from Jarryd Sullivan: "Possible BGP hijack of various Amazon (mostly US East) IP ranges by Angola Cables (AS37468). Snipped AS Path below, impacting service connectivity from Australia (and probably elsewhere). 🧵1/2
BGP routing table entry for 44.192.0.0/11
37468 3356 16509 14618"

By then it was obvious that AS37468 (Angola Cables) was leaking routes via an IX Route Server, and that the leaked routes were being accepted by many peers and propagated further around the world. Here is a snapshot of what was visible in the global routing table:

A route to Akamai’s AS20940 normally reaches my route collector in Sydney (at Vultr) like this:

N*> 2.16.0.0/13      169.254.169.254     0 64515 65534 20473 6939 4826 20940 i 

During the leak, the same route was instead coming through AS37468 (Angola Cables). That AS path is one hop shorter, which is why it was accepted:

N*> 2.16.0.0/13      169.254.169.254     0 64515 65534 20473 37468 20940 i 
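
The preference for the leaked route can be illustrated with a small Python sketch. It assumes that everything BGP evaluates before AS path length (such as local preference) is equal for both routes, which the output above does not show. The function name `shortest_as_path` is hypothetical.

```python
def shortest_as_path(paths):
    """Return the AS path with the fewest hops (ties keep the first one seen)."""
    return min(paths, key=len)


# AS paths copied from the two routing table entries above.
normal = [64515, 65534, 20473, 6939, 4826, 20940]
leaked = [64515, 65534, 20473, 37468, 20940]

best = shortest_as_path([normal, leaked])
print(len(normal), len(leaked))  # 6 5
print(37468 in best)             # True: the path via AS37468 wins
```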

The pybgpkit tool shows the timestamps of the announcements for this prefix:

2023-05-25T10:13:09Z    2.16.0.0/13     8218 6939 4826 20940 
2023-05-25T11:01:42Z    2.16.0.0/13     24482 4826 20940 
2023-05-25T11:00:19Z    2.16.0.0/13     20932 6939 4826 20940 
2023-05-25T11:05:11Z    2.16.0.0/13     28634 6939 4826 20940 
2023-05-25T11:12:01Z    2.16.0.0/13     202032 6939 4826 20940 
2023-05-25T11:14:16Z    2.16.0.0/13     202032 6939 4826 20940 
2023-05-25T11:18:05Z    2.16.0.0/13     202032 6939 4826 20940 
2023-05-25T11:21:40Z    2.16.0.0/13     61292 37468 20940 
2023-05-25T11:21:41Z    2.16.0.0/13     37989 37468 20940 
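
A listing like this can also be scanned programmatically. The following Python sketch works on the plain text lines shown above rather than on any pybgpkit API; the `parse_line` and `first_seen_via` helpers are hypothetical names, and the embedded log is only an excerpt of the output.

```python
from datetime import datetime

# Excerpt of the announcements listed above (timestamp, prefix, AS path).
LOG = """\
2023-05-25T11:14:16Z    2.16.0.0/13     202032 6939 4826 20940
2023-05-25T11:18:05Z    2.16.0.0/13     202032 6939 4826 20940
2023-05-25T11:21:40Z    2.16.0.0/13     61292 37468 20940
2023-05-25T11:21:41Z    2.16.0.0/13     37989 37468 20940
"""


def parse_line(line):
    """Split one 'timestamp prefix as-path' line into typed fields."""
    timestamp, prefix, *path = line.split()
    when = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return when, prefix, [int(asn) for asn in path]


def first_seen_via(log, asn):
    """Return the earliest timestamp at which `asn` appears in an AS path."""
    rows = [parse_line(line) for line in log.splitlines() if line.strip()]
    times = [when for when, _, path in rows if asn in path]
    return min(times) if times else None


print(first_seen_via(LOG, 37468))  # 2023-05-25 11:21:40
```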

At 11:21 UTC, the route changed its path and started coming via AS37468. Cloudflare Radar and RIPE Stat also picked up unusually high BGP announcements from AS37468: 

Graph: BGP announcements for AS37468 (ANGOLA-CABLES), showing a spike in announcements at 11 UTC.
Graph: AS37468 announcements spike from 11:40 UTC to 11:50 UTC.

Looking again at the log of announcements for 2.16.0.0/13, we can see that the path with AS37468 in it disappeared after 12:41 UTC. This is the final announcement captured by one of the RIPE RIS nodes. 

2023-05-25T12:41:31Z    2.16.0.0/13     133210 4826 20940 
2023-05-25T12:41:31Z    2.16.0.0/13     49544 4826 20940 
2023-05-25T12:41:32Z    2.16.0.0/13     138064 4826 20940 
2023-05-25T12:41:36Z    2.16.0.0/13     45489 37468 20940 
2023-05-25T12:43:36Z    2.16.0.0/13     45489 4826 20940 

This route leak lasted around three hours. If you search for AS paths containing “37468 3356”, you will get some interesting results. There are hundreds of IPv6 routes that were propagated by AS37468, such as these:

2023-05-25T01:56:45Z 2001:502:be98::/48 49544 13786 265038 265038 37468 3356 2914 7342 
2023-05-25T01:56:45Z 2001:503:f189::/48 49544 13786 265038 265038 37468 3356 2914 7342 
2023-05-25T01:56:45Z 2001:500:ed30::/48 49544 13786 265038 265038 37468 3356 2914 7342 
2023-05-25T01:56:45Z 2001:503:3227::/48 49544 13786 265038 265038 37468 3356 2914 7342 
2023-05-25T01:56:45Z 2001:503:7bbf::/48 49544 13786 265038 265038 37468 3356 2914 7342 
2023-05-25T01:56:45Z 2001:503:f3da::/48 49544 13786 265038 265038 37468 3356 2914 7342 
2023-05-25T01:56:45Z 2001:503:ff39::/48 49544 13786 265038 265038 37468 3356 2914 7342 
2023-05-25T01:56:45Z 2001:503:4872::/48 49544 13786 265038 265038 37468 3356 2914 7342 
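
Searching for such a pattern means looking for two ASNs that sit next to each other in the path, in that order. A minimal Python sketch, using a hypothetical helper named `has_adjacent` and the AS path from the routes above:

```python
def has_adjacent(path, left, right):
    """True if `left` is immediately followed by `right` in the AS path."""
    return any(a == left and b == right for a, b in zip(path, path[1:]))


path = [49544, 13786, 265038, 265038, 37468, 3356, 2914, 7342]
print(has_adjacent(path, 37468, 3356))  # True
print(has_adjacent(path, 3356, 37468))  # False: order matters
```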

Angola Cables (AS37468) peers extensively across the world. As per PeeringDB data, they are peering at 21 IX locations. As Doug Madory (Director of Internet Analysis at Kentik) rightly said in this Twitter thread:  

Tweet from Doug Madory: Part of the challenge (unique to AS37468) is that they running a network which spans the South Atlantic. Routing traffic from AO to US via BR is very tricky. Likewise, EU to BR via AO. This leads to path leaks because it is hard to guarantee where routes propagate.

A representative from the Angola Cables engineering team indicated this was an automation error. For more details, refer to the NANOG mailing list.

How Could This Route Leak Have Been Avoided?

If you are wondering whether Resource Public Key Infrastructure (RPKI) and the use of Route Origin Authorizations (ROAs) could have helped: unfortunately, in this case they would not have. Consider this data:

V*> 3.163.234.0/23   169.254.169.254  0 64515 65534 20473 37468 16509 i 
V*> 5.8.25.0/24      169.254.169.254  0 64515 65534 20473 37468 199524 ? 
V*> 5.8.33.0/24      169.254.169.254  0 64515 65534 20473 37468 199524 202422 i 
V*> 5.9.0.0/16       169.254.169.254  0 64515 65534 20473 37468 24940 24940 i 
V*> 5.39.0.0/17      169.254.169.254  0 64515 65534 20473 37468 16276 i 

All the above routes have ‘Valid’ Route Origin Authorizations (ROAs). A ROA is a cryptographic object in the RPKI system that associates an IP address prefix with the Autonomous System (AS) that is authorized to announce that prefix. In this leak, the Origin AS remained the same and so all of the routes were valid according to the RPKI. It’s wonderful that these routes have Valid ROAs as that can prevent many other types of routing problems, but in this case they didn’t do anything to help. 
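
The reason can be shown with a short Python sketch of origin validation in the style of RFC 6811. The single ROA in `ROAS` is hypothetical and only mirrors the prefix and origin of one route listed above; the function names are also assumptions made for illustration. The check looks only at the prefix and the origin AS (the last ASN in the path), so an unexpected AS in the middle of the path does not change the result.

```python
import ipaddress

# Hypothetical ROA for illustration: (prefix, max length, authorized origin AS).
ROAS = [(ipaddress.ip_network("5.9.0.0/16"), 16, 24940)]


def origin_of(as_path):
    """The origin AS is the last ASN in the AS path."""
    return as_path[-1]


def rov_state(prefix, origin_as, roas=ROAS):
    """Classify a route as 'valid', 'invalid', or 'not-found'."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, max_len, roa_as in roas:
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True
            if origin_as == roa_as and net.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"


# Leaked path from the output above: AS37468 is in the middle, origin is unchanged.
leaked = [64515, 65534, 20473, 37468, 24940, 24940]
print(rov_state("5.9.0.0/16", origin_of(leaked)))  # valid
print(rov_state("5.9.0.0/16", 64500))              # invalid: wrong origin AS
```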

However, in this scenario a newer technology called Autonomous System Provider Authorization (ASPA) would have helped. ASPA objects are created and distributed the same way as ROAs. While ROAs state which ASNs are authorized to originate given prefixes, ASPAs state which provider ASNs are authorized to propagate an AS’s routes. If you want to understand more about ASPA, read this MANRS blog post explaining how ASPA can protect from route leaks. Apart from ASPA, a maximum-prefix limit would have stopped thousands of routes from being accepted from a peer that normally announces only a few hundred.
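
As a rough illustration, here is a simplified Python sketch of the upstream path check described in the ASPA verification draft, applied to a path received from a customer or peer. The draft is still in development and the full procedure has more cases, so treat this as an assumption-laden sketch. The provider set in `ASPA` is hypothetical, not a real registration, and the function names are made up for this example.

```python
# Hypothetical ASPA data: customer AS -> set of authorized provider ASes.
# This provider set is illustrative only, not a real registration.
ASPA = {20940: {4826}}


def hop_check(customer, provider, aspa):
    """Check one customer-to-provider hop against the ASPA data."""
    if customer not in aspa:
        return "no-attestation"
    return "provider" if provider in aspa[customer] else "not-provider"


def verify_upstream(as_path, aspa=ASPA):
    """Simplified check for a path received from a customer or peer.

    `as_path` is listed neighbor-first, origin-last, as in BGP output.
    Every hop from the origin towards the receiver must be customer-to-provider.
    """
    # Collapse prepending, then walk from the origin towards the receiver.
    hops = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    hops.reverse()
    results = [hop_check(c, p, aspa) for c, p in zip(hops, hops[1:])]
    if "not-provider" in results:
        return "invalid"
    if "no-attestation" in results:
        return "unknown"
    return "valid"


# The leaked path as an IX peer of AS37468 would have received it.
print(verify_upstream([37468, 20940]))  # invalid
print(verify_upstream([4826, 20940]))   # valid
```

Under these assumed registrations, a receiver doing ASPA verification would reject the path through AS37468, because AS37468 is not listed as a provider of the origin AS.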

Determining the nature and intent behind an incident requires careful analysis, investigation, and evidence gathering. It is essential to maintain a balanced perspective when dealing with such incidents, recognizing that while non-malicious incidents are common, malicious incidents also pose significant risk. This incident can be categorized as non-malicious unless proven otherwise.  

MANRS emphasizes the importance of continuously improving routing security. MANRS provides a framework and set of guidelines for network operators to enhance their routing security practices and collaborate toward a safer and more resilient Internet. MANRS is technology agnostic, and fully supports all technologies that can help mitigate route misorigination and route leaks. We strongly recommend that all network operators not only create ROAs for all their address space but also move toward validating routes using Route Origin Validation (ROV). Once ASPA becomes a standard (perhaps soon), we will encourage operators to document their BGP relationships, which can then be verified.

Within the MANRS framework, network operators are encouraged to implement measures such as filtering, anti-spoofing, coordination, and global validation. Improving routing security is an ongoing effort and not a one-time task. Learn more about MANRS and join the community today. 
