Earlier this month, an academic experiment studying the impact of newly released security features for the Border Gateway Protocol (BGP) went horribly wrong and crashed a bunch of Linux-based internet routers.
The experiment, organized by academics from around the world, was first announced in mid-December last year and was described as “an experiment to evaluate alternatives for speeding up adoption of BGP route origin validation.”
BGP Route Origin Validation, or ROV, is one part of a three-pronged set of security features for BGP, alongside the Resource Public Key Infrastructure (RPKI) and BGP Path Validation (also known as BGPsec).
BGP ROV lets routers use RPKI data to filter out unauthorized BGP route announcements and block BGP hijacks that would redirect internet traffic away from legitimate servers to malicious networks.
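To make that filtering concrete, here is a minimal Python sketch of the origin-validation check described in RFC 6811, which sorts a route announcement into Valid, Invalid, or NotFound by comparing it against Validated ROA Payloads (VRPs) derived from RPKI data. The VRP list, the documentation prefixes, and the AS numbers are hypothetical, and the sketch skips details a real router handles, such as routes whose origin is an AS_SET.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from enum import Enum


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


@dataclass(frozen=True)
class VRP:
    """A Validated ROA Payload: an authorized (prefix, maxLength, origin AS) triple."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int
    asn: int


def origin_validation(route: str, origin_asn: int, vrps: list[VRP]) -> Validity:
    """Classify a route announcement following the RFC 6811 procedure (simplified)."""
    net = ipaddress.ip_network(route)
    covered = False
    for vrp in vrps:
        # A VRP only "covers" routes of the same address family that sit inside its prefix.
        if vrp.prefix.version != net.version or not net.subnet_of(vrp.prefix):
            continue
        covered = True
        # Valid only if the origin AS matches and the route is no more specific than maxLength.
        if vrp.asn == origin_asn and net.prefixlen <= vrp.max_length:
            return Validity.VALID
    # Covered but never matched means the announcement is unauthorized (e.g. a hijack).
    return Validity.INVALID if covered else Validity.NOT_FOUND


# Hypothetical RPKI data: AS64496 is authorized to originate 192.0.2.0/24 and nothing more specific.
vrps = [VRP(ipaddress.ip_network("192.0.2.0/24"), 24, 64496)]

print(origin_validation("192.0.2.0/24", 64496, vrps))     # Validity.VALID
print(origin_validation("192.0.2.0/24", 64511, vrps))     # Validity.INVALID: wrong origin AS
print(origin_validation("192.0.2.128/25", 64496, vrps))   # Validity.INVALID: exceeds maxLength
print(origin_validation("198.51.100.0/24", 64511, vrps))  # Validity.NOT_FOUND: no covering VRP
```

A common policy is for a router to drop Invalid routes while still accepting NotFound ones, since a route with no covering ROA is not evidence of a hijack.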
This month’s experiment, which continued the work of a previously published research paper on the adoption of BGP security features, was scheduled to run from January 8 to January 23.
The initial plan was for the research team to announce a BGP route “with a valid standards-compliant unassigned BGP attribute” from a network the researchers controlled, and then study how that announcement propagated across the networks of other internet service providers.
The idea was to track how the BGP attribute moved between networks and identify which ISP networks were vulnerable to internet traffic manipulation.
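BGP is designed to let routers pass along attributes they don't understand. Under RFC 4271, an unrecognized optional transitive attribute should be forwarded with its Partial bit set, an unrecognized optional non-transitive attribute is quietly ignored, and an unrecognized well-known attribute is treated as an error. The minimal Python sketch below models those rules for a single attribute. It assumes the researchers’ attribute was flagged optional transitive (the article does not say so, but that is what would let it travel across many networks); the type code 255, the value bytes, and the small set of attributes this toy speaker “knows” are hypothetical choices for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum

# Attribute flag bits defined in RFC 4271, section 4.3.
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10

# Hypothetical: this toy speaker only recognizes the RFC 4271 base attributes
# (ORIGIN, AS_PATH, NEXT_HOP, MULTI_EXIT_DISC, LOCAL_PREF, ATOMIC_AGGREGATE, AGGREGATOR).
KNOWN_TYPE_CODES = {1, 2, 3, 4, 5, 6, 7}


@dataclass(frozen=True)
class PathAttribute:
    flags: int
    type_code: int
    value: bytes


class Action(Enum):
    PROCESS = "process"      # recognized attribute: handle normally
    PROPAGATE = "propagate"  # unknown optional transitive: pass on with Partial bit set
    DISCARD = "discard"      # unknown optional non-transitive: ignore quietly
    NOTIFY = "notify"        # unknown well-known attribute: UPDATE Message Error


def parse_attribute(data: bytes) -> tuple[PathAttribute, int]:
    """Decode one path attribute; return it plus the number of bytes consumed."""
    if len(data) < 2:
        raise ValueError("truncated attribute header")
    flags, type_code = data[0], data[1]
    # The length field is 2 bytes when Extended Length is set, otherwise 1 byte.
    header = 4 if flags & EXTENDED_LENGTH else 3
    if len(data) < header:
        raise ValueError("truncated attribute length")
    length = int.from_bytes(data[2:header], "big")
    value = data[header:header + length]
    if len(value) != length:
        raise ValueError("attribute length exceeds available data")
    return PathAttribute(flags, type_code, value), header + length


def handle_attribute(attr: PathAttribute) -> tuple[Action, PathAttribute | None]:
    """Apply the RFC 4271 rules for recognized and unrecognized attributes."""
    if attr.type_code in KNOWN_TYPE_CODES:
        return Action.PROCESS, attr
    if not attr.flags & OPTIONAL:
        return Action.NOTIFY, None
    if attr.flags & TRANSITIVE:
        # Keep the attribute for propagation, marking it as partially understood.
        return Action.PROPAGATE, replace(attr, flags=attr.flags | PARTIAL)
    return Action.DISCARD, None


# Hypothetical unassigned attribute: optional transitive, type code 255, 4-byte value.
wire = bytes([0xC0, 0xFF, 0x04, 0xDE, 0xAD, 0xBE, 0xEF])
attr, used = parse_attribute(wire)
action, forwarded = handle_attribute(attr)
print(action, hex(forwarded.flags), used)  # Action.PROPAGATE 0xe0 7
```

A router following these rules simply relays the unknown attribute to its neighbors; the session resets described in this article suggest the affected software did not take that path for the researchers’ attribute.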
But on the first day of the experiment, things didn’t go as expected.
“We’ve performed the first announcement in this experiment yesterday, and, despite the announcement being compliant with BGP standards, FRR routers reset their sessions upon receiving it. Upon notice of the problem, we halted the experiments,” said Italo Cunha, a researcher at the Federal University of Minas Gerais in Brazil.
The problem, according to the researcher, was that the BGP attribute they used caused software crashes in routers running FRRouting (FRR), an IP routing protocol suite for Linux and Unix platforms.
FRR developers released a fix on January 9, and after some discussion about the ethics of continuing the experiment, the researchers decided to go ahead with another round of tests this Wednesday, January 23.
FRR routers didn’t act up this time, but problems arose elsewhere in the world. Part of the trouble was that the experiment had only been announced on the mailing list of NANOG, the North American Network Operators Group, so operators in other regions had no warning.
“You caused again a massive prefix spike/flap, and as the internet is not centered around [North America] (shock horror!) a number of operators in Asia and Australia go[t] [a]ffected by your ‘exp[e]r[i]ment’ and had no idea what was happening or why,” said Cooper, a network administrator for PacketGG, a company that provides various internet traffic support services, shortly after the second run of the experiment began.
This time, the culprit was BGP software at ISPs that had not updated to the latest version and therefore couldn’t handle the researchers’ custom BGP attribute.
There are no public statistics on how many networks this second incident affected, but the researchers didn’t wait long to shut down their BGP ROV experiment.
“We have canceled this experiment permanently,” Cunha said, 20 minutes after Cooper’s complaint.
But despite the problems the test caused, every reply that followed Cunha’s cancelation urged the research team to keep digging into the adoption of BGP security features.
The reason is that BGP hijacks have been the internet’s Achilles’ heel for more than two decades, and securing BGP, the protocol that binds the internet together, is a top priority for the entire networking and infosec communities.
“Stopping the experiment is only treating symptoms, the root cause must be addressed: broken software,” said Job Snijders of NTT Communications.
This is also not the first time academics have crashed portions of the internet while testing BGP features. Something similar happened in August 2010, when another experiment crashed Cisco routers across the globe.