Today I got a call from my friend (whom I shall henceforth refer to as Sancho Panda) with the following problem: one of the switches had consistently high CPU usage caused by the HULC DAI process. The software upgrade that I had recommended earlier didn’t help, so it had to be something else.
It turned out to be one of those great design ideas.
The client has a typical star topology consisting of 2960G, 2960S and 2960C switches. Each switch carries about 150 VLANs plus management VLAN 999. The spanning tree mode is RSTP.
You might point out that this is already a bad design, because some of these switches may not support that many spanning tree instances. Yikes. And since the management VLAN is 999, it is one of the VLANs that ends up without a spanning tree instance.
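To make the arithmetic concrete, here is a rough Python sketch of which VLANs fall off the end once a switch runs out of spanning tree instances. It rests on several assumptions that are mine and not from the client's setup: the limit of 128 instances (commonly quoted for the 2960 family, so check your own model and feature set), the VLAN numbering 101-250, and the idea that instances are handed out in ascending VLAN ID order. The helper name `vlans_without_stp` is made up for illustration.

```python
# Hypothetical helper, not a Cisco tool: estimates which VLANs end up
# without a per-VLAN spanning tree instance on a switch.

def vlans_without_stp(vlan_ids, max_instances=128):
    """Return the VLAN IDs left without a spanning tree instance.

    Assumption: instances are handed out in ascending VLAN ID order,
    which matches the reasoning in this post. A real switch may assign
    them in the order the VLANs were created instead.
    """
    ordered = sorted(set(vlan_ids))
    # Everything past the first max_instances VLANs gets no instance.
    return ordered[max_instances:]


# Made-up numbering: 150 user VLANs (101-250) plus management VLAN 999.
vlans = list(range(101, 251)) + [999]
unprotected = vlans_without_stp(vlans)

print(len(unprotected))    # how many VLANs run without spanning tree
print(999 in unprotected)  # whether the management VLAN is among them
```

Under those assumptions, a high-numbered management VLAN is always among the unprotected ones, which is exactly the VLAN you use to log in to the switches.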
Now comes another engineer with a brilliant idea: a star topology is not redundant, so he connects two spokes with a redundant connection.
Unfortunately, the hub of this star topology is a 2960G, which doesn’t support that many spanning tree instances.
The moment we tried to troubleshoot this and remoted into the 2960 hub, we triggered a layer 2 loop and took down the customer site.
The irony of all this is that the only VLAN that loops is the management VLAN: had we not tried to troubleshoot the relatively high (80-90%) CPU usage on a spoke, we wouldn’t have caused an outage of the whole site.
Ouch ouch. Thanks Sancho!