BGP dampening – how not to use maths

Hello

Today I had a BGP dampening scenario in which the task was to work out which parameters to use so that a route is reused after 5 minutes.

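By default, each flap adds 1000 to the penalty, and the accumulated penalty is halved every half-life (15 minutes by default). A route is suppressed once its penalty exceeds the suppress limit (default 2000) and is reused once the penalty drops below the reuse threshold (default 750).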

Therefore, two flaps result in a penalty of 2000, which drops to 1000 after one half-life. But what if the reuse threshold is lower than 50% of the penalty: how do we calculate when the route will be reused?

There is a more complex way to do this using a logarithmic formula, but frankly: who will remember it during the exam?
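For the record, here is that formula. It follows from the penalty decaying exponentially. In this notation (mine, not Cisco's), P0 is the penalty when decay starts, H is the half-life, R is the reuse threshold and t_reuse is the time until the route is reused:

```latex
P(t) = P_0 \cdot 2^{-t/H},
\qquad
P(t_{\text{reuse}}) = R
\;\Longrightarrow\;
t_{\text{reuse}} = H \log_2 \frac{P_0}{R}
```

For example, with P0 = 2000, H = 4 minutes and R = 750, this gives 4 × log2(2.67) ≈ 4 × 1.415 ≈ 5.66 minutes, or roughly 5 minutes 40 seconds. A penalty that has already decayed slightly by the time the route is suppressed gives a little less.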

I have a better idea: how about a simple command to show how long until the route is reused?

R3#show ip bgp dampening dampened-paths
BGP table version is 16, local router ID is 150.1.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          From            Reuse    Path
*d 1.1.1.0/24       155.1.13.1      00:00:50 100 i

So now:

  1. Add the command neighbor x.x.x.x advertisement-interval 0 on router A (the advertising router), so that updates are not held back by the advertisement interval.
  2. Use the bgp dampening command on router B (which, by the way, creates a handy log if you are debugging: BGP(0): Created dampening structures with halflife time 4, reuse/suppress 750/2000).
  3. Flap the routes that router A advertises to router B (e.g. shut/no shut the loopbacks).
  4. On router B, use the command above to see how long until the route is reused.
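You can also approximate the procedure above offline. The Python sketch below is a simplified model, not the router's implementation. It assumes that each flap adds 1000, that the penalty decays continuously with the configured half-life, and that the route is suppressed once the penalty exceeds the suppress limit. It also ignores max-suppress-time. `reuse_time` is a hypothetical helper. A real router updates the penalty in discrete steps, so its timers will differ by a few seconds.

```python
import math

FLAP_PENALTY = 1000  # penalty added per flap (Cisco default)


def reuse_time(flap_times, half_life, reuse, suppress, flap_penalty=FLAP_PENALTY):
    """Hypothetical helper: time (seconds from 0) at which a dampened route is
    reused, or None if the flaps never push it over the suppress limit.

    flap_times -- sorted flap timestamps in seconds
    half_life  -- half-life in seconds
    Assumes continuous exponential decay and ignores max-suppress-time.
    """
    penalty = 0.0
    last = None
    suppressed = False
    for t in flap_times:
        if last is not None:
            # Decay the penalty accumulated so far up to this flap.
            penalty *= 2 ** (-(t - last) / half_life)
            if suppressed and penalty < reuse:
                suppressed = False  # route would have been reused in between
        penalty += flap_penalty
        last = t
        if penalty > suppress:  # assumption: suppress once the limit is exceeded
            suppressed = True
    if not suppressed:
        return None
    # After the last flap the penalty only decays: solve penalty * 2**(-dt/H) == reuse.
    return last + half_life * math.log2(penalty / reuse)


# Three flaps two seconds apart, half-life 4 min (240 s), reuse 750, suppress 2000.
print(round(reuse_time([0, 2, 4], half_life=240, reuse=750, suppress=2000)))  # 482
```

A single flap (penalty 1000) stays below the suppress limit, so the helper returns None. Three quick flaps push the penalty to about 2983, and the route is reused roughly 8 minutes later.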

R3#show ip bgp dampening dampened-paths
BGP table version is 32, local router ID is 150.1.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          From            Reuse    Path
*d 1.1.1.0/24       155.1.13.1      00:05:30 100 i
R3#
EvD: accum. penalty decayed to 1971 after 7 second(s)

We can see that with a half-life of 4 minutes, a suppress limit of 2000 and a reuse threshold of 750, the reuse time is 5 minutes 30 seconds.

Obviously, if our goal is, e.g., 4 minutes, then the reuse threshold should be 1000, because one half-life halves the 2000 penalty.

If our goal is, e.g., 6 minutes, how about simply raising the half-life to 6 minutes and setting the reuse threshold to 1000?
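The same reasoning can be turned around to pick parameters for a target reuse time. The sketch below assumes that decay starts from exactly 2000. In practice the penalty is slightly lower by the time the route is suppressed. `reuse_threshold` and `half_life_for` are hypothetical helpers. Times are in minutes, as on the `bgp dampening` command line.

```python
import math


def reuse_threshold(penalty, half_life, target):
    """Reuse limit that `penalty` decays to after `target` (same unit as half_life)."""
    return penalty * 2 ** (-target / half_life)


def half_life_for(penalty, reuse, target):
    """Half-life that makes `penalty` decay to `reuse` after `target`."""
    return target / math.log2(penalty / reuse)


print(reuse_threshold(2000, half_life=4, target=4))          # 1000.0
print(reuse_threshold(2000, half_life=6, target=6))          # 1000.0
print(round(reuse_threshold(2000, half_life=4, target=5)))   # 841
print(round(half_life_for(2000, reuse=750, target=5), 2))    # 3.53
```

For the original 5-minute task, keeping reuse at 750 would need a half-life of about 3.53 minutes. The half-life is configured in whole minutes, so keeping it at 4 and lowering the reuse threshold to about 841 is the more practical option.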

What is the conclusion? There’s no point in using maths if you can just be clever.

But wait! There’s more! Let’s have a look at how the penalty actually decays. Linearly? I don’t think so…

In this example we turned on debugging and flapped the link 3 times on the other router, so the penalty accumulated to 3000.

BGP(0): flapped 3 times since 00:00:04. New penalty is 3000
EvD: accum. penalty 2956, now suppressed with a reuse intervals of 47
EvD: accum. penalty decayed to 2671 after 35 second(s)   !!! 285 points in 35 seconds = 8.1 points/sec
EvD: accum. penalty decayed to 2379 after 40 second(s)   !!! 292 points in 40 seconds = 7.3 points/sec
EvD: accum. penalty decayed to 1971 after 67 second(s)   !!! 408 points in 67 seconds = 6.1 points/sec
EvD: accum. penalty decayed to 1413 after 115 second(s)
EvD: accum. penalty decayed to 1043 after 105 second(s)
EvD: accum. penalty decayed to 864 after 68 second(s)
EvD: accum. penalty decayed to 851 after 6 second(s)
EvD: accum. penalty decayed to 747 after 49 second(s)    !!! 104 points in 49 seconds = 2.1 points/sec
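Replaying the debug output through a pure exponential model makes the non-linearity visible. The sketch assumes that each "after N second(s)" is measured from the previous debug line and that the half-life is 240 seconds. With those assumptions, the decay rate falls from about 8 to about 2 points/sec, and the exponential prediction stays within roughly 1% of each logged value. `replay` is a hypothetical helper.

```python
HALF_LIFE = 240  # seconds: the 4-minute half-life configured in the lab

# (seconds since previous line, penalty reported), copied from the debug output.
# Assumption: each "after N second(s)" is measured from the previous line.
LOG = [(35, 2671), (40, 2379), (67, 1971), (115, 1413),
       (105, 1043), (68, 864), (6, 851), (49, 747)]


def replay(start, log, half_life=HALF_LIFE):
    """For each log line return (interval, points lost, points/sec, logged, predicted)."""
    rows = []
    prev = start
    for dt, logged in log:
        predicted = prev * 2 ** (-dt / half_life)  # exponential decay model
        lost = prev - logged
        rows.append((dt, lost, lost / dt, logged, predicted))
        prev = logged  # restart from the router's own (rounded) value
    return rows


for dt, lost, rate, logged, predicted in replay(2956, LOG):
    print(f"{dt:4d} s  -{lost:4d}  {rate:4.1f} pts/s  logged {logged:5d}  model {predicted:7.1f}")
```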

The half-life duly elapsed after 4 minutes, but the reuse happened only after 483 seconds (just over 8 minutes).

As you can see, after 35+40+67 = 142 seconds we are already down to 1971 points (almost 1000 points lost), BUT

after 35+40+67+115+105+68+6+49 = 485 seconds we still have 747 points: the last 1224 points took 343 seconds to decay.

So clearly the decay is not linear: it is exponential, with the penalty halving every half-life. Why is that? Of course, to give BGP a longer "flap memory", which increases the likelihood that a more reliable path will be kept. In this situation, a single additional flap at about 482 seconds would have been enough to keep this path suppressed, because its reuse would only have occurred at 483 seconds. The other route would therefore still have been used.

What is the second conclusion, then? If the total penalty was 3000 and it dropped to 1500 after 4 minutes, it will NOT reach 750 after 6 minutes, as it would if the decay were linear. It needs a second full half-life, so 750 is reached only after 8 minutes. If you don’t want to use maths to calculate the reuse time, the best idea is clearly to use the show ip bgp dampening dampened-paths command.
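The two cases in that conclusion can be put side by side. In this sketch, `linear_time` and `exponential_time` are hypothetical helpers. The linear case is a counterfactual straight line that, like the real curve, halves the penalty after one half-life. Times are in minutes.

```python
import math


def linear_time(p0, target, half_life):
    """Time to reach `target` if the penalty fell in a straight line that
    (like the real curve) halves p0 after one half-life."""
    rate = p0 / (2 * half_life)  # points lost per unit of time
    return (p0 - target) / rate


def exponential_time(p0, target, half_life):
    """Time to reach `target` when the penalty halves every half-life."""
    return half_life * math.log2(p0 / target)


print(linear_time(3000, 750, half_life=4))       # 6.0 minutes
print(exponential_time(3000, 750, half_life=4))  # 8.0 minutes
```

Because 750 is a quarter of 3000, the exponential curve needs exactly two half-lives to reach it. That matches the roughly 8 minutes observed on the router.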
