Incident report: Unscheduled partial network disruption, 2017-12-04 10:18 AEST
Posted by Nicholas Meredith on 04 December 2017 04:16 PM
Start: 2017-12-04 10:18 AM AEST
Finish: 2017-12-04 10:44 AM AEST
10:18:45 AM AEST: An erroneous ingress packet storm originating from a QLD Peering IX hit our primary Brisbane router.
By 10:20:15 AM, this traffic had overloaded the routing engine process, acting as an inbound denial of service (DoS) and causing multiple BGP neighbour sessions to time out.
10:27 AM: We tweeted an update announcing the identified network disruption at https://twitter.com/HostNetworks.
While sessions were beginning to recover, a second packet storm overloaded the routing engine processes as our NOC was acting to deactivate the source of the flooding.
10:30:02 AM: Stricter packet flood limits were applied so that a repeat occurrence of this kind can no longer cause a denial of service to routing functionality.
10:33:51 AM: NOC staff monitored as affected BGP sessions re-established and routes were re-installed into the FIB.
10:39:36 AM: Several BGP sessions on our primary Brisbane router flapped due to the excess load the routing engine services had been under.
10:39:39 AM: BGP neighbour session states re-stabilised as more sessions continued to be restored.
10:44 AM: Our NOC observed that all disrupted connections had been restored, and we continued to monitor to ensure stability.
10:54 AM: After 10 minutes of route stability, we tweeted an update confirming that normal network connectivity had been restored.
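
For readers unfamiliar with packet flood limits of the kind applied at 10:30:02 AM, the general idea can be sketched as a token-bucket policer. This is an illustration only, not the configuration applied to the router: the class name `TokenBucket`, the helper `simulate`, and the rate and burst values are all hypothetical, and the actual limits and router platform are not stated in this report.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical policer: admits `rate_pps` packets per second on average,
    with bursts of up to `burst` packets. Values are illustrative only."""
    rate_pps: float
    burst: float
    tokens: float = 0.0
    last_seen: float = 0.0

    def __post_init__(self) -> None:
        # Start with a full bucket so a normal short burst is not penalised.
        self.tokens = self.burst

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` (seconds) is admitted."""
        elapsed = max(0.0, now - self.last_seen)
        self.last_seen = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop before it reaches the routing engine


def simulate(bucket: TokenBucket, arrivals: list[float]) -> tuple[int, int]:
    """Feed packet arrival times through the bucket; return (admitted, dropped)."""
    admitted = sum(1 for t in arrivals if bucket.allow(t))
    return admitted, len(arrivals) - admitted


if __name__ == "__main__":
    # A storm of 10,000 packets in one second against a 100 pps limit.
    storm = [i / 10_000 for i in range(10_000)]
    print(simulate(TokenBucket(rate_pps=100, burst=50), storm))
```

With a limit like this, a storm is capped at roughly the burst size plus the permitted rate, while low-rate control traffic such as BGP keepalives would still be admitted.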
Host Networks NOC