Cisco’s troubleshooting guide for STP loop detection is worth a read:
- Some links sustain a loop; others simply carry the loop traffic. Shutting down the non-sustaining links will not break the loop.
- Loops mostly occur because BPDUs are not received properly. (The document lists two reasons but, IMHO, they boil down to this one.)
- Typical symptoms of a loop are high link utilization on the boxes, protocol session flaps, high CPU utilization, etc.
- Finding the loop and identifying its source involves finding the highly utilized links, understanding the topology, finding the redundant paths in the network, and making sure some of those paths are blocked (that is, checking whether each switch knows the correct STP root; this knowledge may not be consistent across the system if the information has not propagated).
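To make the last two bullets concrete, here is a minimal Python sketch of the two checks: which forwarding links lie on a cycle (the only candidates for sustaining a loop), and whether all switches agree on the root. The switch names, bridge IDs, and data structures are made up for illustration. In practice you would fill them in by hand from something like `show spanning-tree` output, which this sketch does not parse.

```python
from collections import deque


def reachable(links, src, dst):
    """Breadth-first search: is dst reachable from src over the given links?"""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def loop_candidate_links(forwarding_links):
    """Return the forwarding links that lie on at least one cycle.

    A link is on a cycle if its endpoints stay connected after the link
    itself is removed. Links that fail this test can only carry loop
    traffic, so shutting them down will not break the loop.
    """
    result = []
    for i, (a, b) in enumerate(forwarding_links):
        rest = forwarding_links[:i] + forwarding_links[i + 1:]
        if reachable(rest, a, b):
            result.append((a, b))
    return result


def distinct_roots(root_by_switch):
    """Return the set of root bridge IDs the switches believe in.

    More than one entry means the switches disagree about the root,
    which hints that BPDUs are not propagating somewhere.
    """
    return set(root_by_switch.values())


# Hypothetical topology: every link below is currently forwarding.
links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw1"), ("sw3", "sw4")]
print(loop_candidate_links(links))
# [('sw1', 'sw2'), ('sw2', 'sw3'), ('sw3', 'sw1')]

# Hypothetical per-switch view of the root bridge.
roots = {"sw1": "8000.aaaa", "sw2": "8000.aaaa", "sw3": "8000.bbbb"}
print(len(distinct_roots(roots)) > 1)  # True: sw3 disagrees about the root
```

In this made-up example the sw3-sw4 link only carries the loop traffic, while the triangle sw1-sw2-sw3 is where a port that should be blocking is forwarding. The per-link search is quadratic in the worst case, which is fine for a topology you sketch by hand during troubleshooting.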
What’s more, when a forwarding loop happens, it’s not just the network that suffers; hypervisor hosts get hogged too. As Ivan points out, hypervisors keep their NICs in promiscuous mode, so they end up punting every packet to the CPU! Hypervisor hosts typically have their NICs in trunk mode, so BUM traffic from every VLAN hits the hypervisor.
Just going through the troubleshooting guide makes it pretty clear that STP is a mess and is best avoided. No wonder the network world wants to move to L3 fabrics.