Switch egress QoS troubleshooting

Hello

I had an interesting case today where clients were having problems with Lync calls.

On nearly all interfaces in a 2960-X stack I could see a lot of output drops:

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 45345017
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 68085386
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 53105700
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 35941000
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 29047035
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 110869352
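On a full stack it helps to pull these counters out programmatically instead of scrolling. Below is a minimal Python sketch. It assumes you have already saved `show interfaces` output to a string, and it parses only the `Total output drops:` field seen above. It never talks to the switch, and the helper names are my own.

```python
import re

# Matches the "Total output drops: N" field of "show interfaces" output.
DROPS_RE = re.compile(r"Total output drops:\s*(\d+)")


def output_drops(show_output: str) -> list[int]:
    """Return the output-drop counter of every interface, in order of appearance."""
    return [int(m.group(1)) for m in DROPS_RE.finditer(show_output)]


def summarize(counters: list[int]) -> tuple[int, int]:
    """Return (number of interfaces with drops, total drops across them)."""
    nonzero = [c for c in counters if c > 0]
    return len(nonzero), sum(nonzero)


if __name__ == "__main__":
    sample = """
    Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
    Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 45345017
    Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 68085386
    """
    counters = output_drops(sample)
    print(counters)
    print(summarize(counters))
```

Keep in mind that the counters are cumulative since the last `clear counters`. Two snapshots taken a few minutes apart tell you much more than a single reading.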


Where to start:

myswitch#show run int gi3/0/1 | i priority

priority-queue out

OK, so we know queue 1 is the priority queue. Later on we will see which DSCP values are mapped to queue 1.

myswitch# show mls qos int gi3/0/1 queueing
GigabitEthernet3/0/1
Egress Priority Queue : enabled
Shaped queue weights (absolute) : 25 0 0 0
Shared queue weights : 1 30 35 5
The port bandwidth limit : 100 (Operational Bandwidth:100.0)
The port is mapped to qset : 1

This tells us the port is mapped to queue-set 1.

myswitch#show mls qos int gi3/0/1 stat
GigabitEthernet3/0/1 (All statistics are in packets)

dscp: incoming
-------------------------------

0 - 4 : 6334650 0 0 0 0
5 - 9 : 0 0 0 0 0
10 - 14 : 0 0 0 0 0
15 - 19 : 0 0 0 0 0
20 - 24 : 0 0 0 0 50507
25 - 29 : 0 0 0 0 0
30 - 34 : 0 0 0 0 268
35 - 39 : 0 0 0 0 0
40 - 44 : 0 0 0 0 0
45 - 49 : 0 231935 0 0 0
50 - 54 : 0 0 0 0 0
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0
dscp: outgoing
-------------------------------

0 - 4 : 105609708 0 21572 0 35
5 - 9 : 0 15319 0 0 0
10 - 14 : 0 0 0 0 0
15 - 19 : 0 6121238 0 0 0
20 - 24 : 0 0 44985 0 448876
25 - 29 : 0 0 0 0 0
30 - 34 : 0 0 15590 0 16
35 - 39 : 0 0 0 0 0
40 - 44 : 8577 0 0 0 0
45 - 49 : 0 1017369 0 1118867 0
50 - 54 : 0 0 0 0 23
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0

This shows a lot of outgoing traffic marked DSCP 16, 46 and 48. DSCP 46 is fine (EF, voice traffic, which hits the priority queue). DSCP 48 (CS6) is a bit surprising (network control traffic on a host port?). And what is DSCP 16 (CS2)?
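You can find the busiest markings without squinting at the table. Parse the rows into a DSCP-to-packets dictionary and label each value with its standard per-hop-behaviour name (CS, AF, EF). This is a rough sketch with hypothetical function names. Feed it one table only, for example the `dscp: outgoing` rows, because the CoS tables use the same row format.

```python
import re

# One row of the table, e.g. "45 - 49 : 0 1017369 0 1118867 0".
ROW_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*:\s*([\d\s]+)$")


def parse_dscp_table(text: str) -> dict[int, int]:
    """Map each DSCP value to its packet count; the row label gives the first DSCP."""
    counts: dict[int, int] = {}
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue  # headers, dashes, blank lines
        first = int(m.group(1))
        for offset, value in enumerate(m.group(3).split()):
            counts[first + offset] = int(value)
    return counts


def phb_name(dscp: int) -> str:
    """Standard name of a DSCP code point: EF, CSx or AFxy; anything else is 'other'."""
    if dscp == 46:
        return "EF"
    cls, rest = divmod(dscp, 8)
    if rest == 0:
        return f"CS{cls}"
    if 1 <= cls <= 4 and rest in (2, 4, 6):
        return f"AF{cls}{rest // 2}"
    return "other"


if __name__ == "__main__":
    outgoing = """
     0 -  4 : 105609708 0 21572 0 35
    15 - 19 : 0 6121238 0 0 0
    45 - 49 : 0 1017369 0 1118867 0
    """
    counts = parse_dscp_table(outgoing)
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:4]
    for dscp, packets in top:
        print(f"DSCP {dscp:2d} ({phb_name(dscp)}): {packets}")
```

With the outgoing figures above, this lists DSCP 0 (CS0, best effort) first, then 16 (CS2), 48 (CS6) and 46 (EF).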

cos: incoming
-------------------------------

0 - 4 : 15078934 0 0 0 0
5 - 7 : 0 0 0
cos: outgoing
-------------------------------

0 - 4 : 163322627 286 6167824 449053 16722
5 - 7 : 1026344 1119097 893496
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 1041542
queue 1: 6179950 478186 2012921
queue 2: 0 0 169311408
queue 3: 0 0 0

output queues dropped:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 0
queue 1: 0 10 0
queue 2: 0 0 9110533
queue 3: 0 0 0

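Most importantly, we can see that queue 3 has practically all the drops: 9110533 packets at threshold 3, against 10 in queue 2. Yes, queue 3. In this output Cisco rather unhelpfully numbers the queues from 0 to 3, while the configuration numbers them from 1 to 4. So "queue 0" is queue 1, "queue 1" is queue 2, and so on.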
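To avoid the off-by-one trap and put a number on "a lot of drops", here is a small sketch. It converts the display index to the configured queue number and computes drops relative to enqueued packets, using the figures copied from the output above. One assumption: dropped packets are not already included in the "enqueued" counters. I have not verified how the ASIC counts them, so treat the ratio as a rough indicator.

```python
def to_config_queue(display_index: int) -> int:
    """The statistics output numbers egress queues 0-3; the configuration uses 1-4."""
    return display_index + 1


def drop_ratio(enqueued: list[int], dropped: list[int]) -> float:
    """Dropped packets relative to enqueued packets, summed over the three thresholds.
    Assumption: drops are not already included in the 'enqueued' counters."""
    total = sum(enqueued)
    return sum(dropped) / total if total else 0.0


# Copied from the output above: display index -> (enqueued t1..t3, dropped t1..t3)
STATS = {
    0: ([0, 0, 1041542], [0, 0, 0]),
    1: ([6179950, 478186, 2012921], [0, 10, 0]),
    2: ([0, 0, 169311408], [0, 0, 9110533]),
    3: ([0, 0, 0], [0, 0, 0]),
}

if __name__ == "__main__":
    for idx, (enq, drop) in STATS.items():
        print(f"queue {to_config_queue(idx)}: {drop_ratio(enq, drop):.2%} of enqueued dropped")
```

For queue 3 this works out to roughly 5.4%. The other queues are effectively at zero.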


myswitch#show mls qos queue-set
Queueset: 1
Queue : 1 2 3 4
----------------------------------------------
buffers : 15 25 40 20
threshold1: 100 125 100 60
threshold2: 100 125 100 150
reserved : 50 100 100 50
maximum : 200 400 400 200
Queueset: 2
Queue : 1 2 3 4
----------------------------------------------
buffers : 25 25 25 25
threshold1: 100 200 100 100
threshold2: 100 200 100 100
reserved : 50 50 50 50
maximum : 400 400 400 400

So the problem is that traffic marked DSCP 0-7 is being dropped. We're not too worried, because the TCP stack will recover from the drops via retransmissions, but we can fix it easily.

Let’s see how DSCP values are mapped to queues and thresholds.

myswitch#show run | i dscp
mls qos map cos-dscp 0 8 16 24 32 46 48 56
mls qos srr-queue output dscp-map queue 1 threshold 3 32 33 40 41 42 43 44 45
mls qos srr-queue output dscp-map queue 1 threshold 3 46 47
mls qos srr-queue output dscp-map queue 2 threshold 1 16 17 18 19 20 21 22 23
mls qos srr-queue output dscp-map queue 2 threshold 1 26 27 28 29 30 31 34 35
mls qos srr-queue output dscp-map queue 2 threshold 1 36 37 38 39
mls qos srr-queue output dscp-map queue 2 threshold 2 24
mls qos srr-queue output dscp-map queue 2 threshold 3 48 49 50 51 52 53 54 55
mls qos srr-queue output dscp-map queue 2 threshold 3 56 57 58 59 60 61 62 63
mls qos srr-queue output dscp-map queue 3 threshold 3 0 1 2 3 4 5 6 7
mls qos srr-queue output dscp-map queue 4 threshold 1 8 9 11 13 15
mls qos srr-queue output dscp-map queue 4 threshold 2 10 12 14
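Reading this map by eye is error-prone. The sketch below turns the `dscp-map` lines into a lookup table from DSCP to (queue, threshold). It only knows what is in the config you give it. A DSCP value the running config does not mention keeps whatever the platform default is, which the sketch does not model. DSCP 25, for example, is missing from the list above.

```python
import re

# "mls qos srr-queue output dscp-map queue <Q> threshold <T> <dscp> [<dscp> ...]"
MAP_RE = re.compile(
    r"mls qos srr-queue output dscp-map queue (\d) threshold (\d)((?: \d+)+)"
)


def build_dscp_map(config: str) -> dict[int, tuple[int, int]]:
    """Return {dscp: (egress queue, drop threshold)} for the DSCPs listed in config."""
    mapping: dict[int, tuple[int, int]] = {}
    for m in MAP_RE.finditer(config):
        queue, threshold = int(m.group(1)), int(m.group(2))
        for dscp in m.group(3).split():
            mapping[int(dscp)] = (queue, threshold)
    return mapping


if __name__ == "__main__":
    running = """
mls qos srr-queue output dscp-map queue 1 threshold 3 46 47
mls qos srr-queue output dscp-map queue 2 threshold 1 16 17 18 19 20 21 22 23
mls qos srr-queue output dscp-map queue 2 threshold 3 48 49 50 51 52 53 54 55
mls qos srr-queue output dscp-map queue 3 threshold 3 0 1 2 3 4 5 6 7
"""
    dscp_map = build_dscp_map(running)
    for dscp in (0, 16, 46, 48):
        print(dscp, "-> queue %d, threshold %d" % dscp_map[dscp])
```

With these lines, DSCP 0 lands in queue 3 at threshold 3, which is exactly where the drops are.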

myswitch#show run | i threshold

mls qos queue-set output 1 threshold 1 100 100 50 200
mls qos queue-set output 1 threshold 2 125 125 100 400
mls qos queue-set output 1 threshold 3 100 100 100 400
mls qos queue-set output 1 threshold 4 60 150 50 200
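Each of these lines carries four numbers for one queue: drop threshold 1, drop threshold 2, the reserved share and the maximum. All four are percentages of that queue's buffer allocation. Reading them this way reproduces the columns of `show mls qos queue-set` above. Here is a small parsing sketch; the class and function names are my own, not Cisco's.

```python
import re
from dataclasses import dataclass

# mls qos queue-set output <qset> threshold <queue> <thr1> <thr2> <reserved> <maximum>
THR_RE = re.compile(
    r"mls qos queue-set output (\d) threshold (\d) (\d+) (\d+) (\d+) (\d+)"
)


@dataclass
class QueueThresholds:
    drop_threshold1: int  # % of the queue's allocated buffers
    drop_threshold2: int  # % of the queue's allocated buffers
    reserved: int         # % of the allocated buffers guaranteed to the queue
    maximum: int          # how far (in %) the queue may grow by borrowing buffers


def parse_thresholds(config: str) -> dict[tuple[int, int], QueueThresholds]:
    """Return {(queue-set, queue): QueueThresholds} from running-config lines."""
    result: dict[tuple[int, int], QueueThresholds] = {}
    for m in THR_RE.finditer(config):
        qset, queue, t1, t2, res, mx = map(int, m.groups())
        result[(qset, queue)] = QueueThresholds(t1, t2, res, mx)
    return result


if __name__ == "__main__":
    cfg = "mls qos queue-set output 1 threshold 3 100 100 100 400"
    print(parse_thresholds(cfg)[(1, 3)])
```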


So what can we do to fix this?

  • move some DSCP values to queue 1 so that queue 3 doesn't overflow? This is a bad idea: queue 1 is the priority queue, and DSCP 0-7 (best effort) should not be prioritized. For all we know, this could be low-value bulk traffic.

  • increase queue 3's share of the buffers? Nah, there's a better solution.

mls qos queue-set output 1 buffers 10 40 35 15

  • change the scheduler on the interface so that certain queues are serviced more often than others? Again, this might hurt other traffic.

srr-queue bandwidth share 1 30 30 10

The best idea (and the one to try first) is actually changing the drop thresholds:

  • raise the thresholds for queue 3 waaaay up

mls qos queue-set output 1 threshold 3 800 800 100 800
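Why does this help? DSCP 0-7 is mapped to queue 3 at threshold 3. As I read Cisco's documentation, threshold 3 is not configurable. It is the queue-full point, and how full the queue may get is set by the last value on the line, the maximum. A deliberately simplified model of weighted tail drop shows the effect. The model ignores the reserved buffers and the shared common pool: in reality a queue can only grow past its allocation if the pool has free buffers. The 500% burst is a made-up number.

```python
def wtd_drops(depth_pct: float, threshold_id: int, t1: int, t2: int, maximum: int) -> bool:
    """Simplified weighted tail drop: drop the packet if the queue depth (as % of the
    queue's allocated buffers) has reached the limit for the packet's threshold.
    Threshold 3 is the implicit queue-full threshold, i.e. the 'maximum' value."""
    limit = {1: t1, 2: t2, 3: maximum}[threshold_id]
    return depth_pct >= limit


OLD_Q3 = dict(t1=100, t2=100, maximum=400)  # ... threshold 3 100 100 100 400
NEW_Q3 = dict(t1=800, t2=800, maximum=800)  # ... threshold 3 800 800 100 800

if __name__ == "__main__":
    # Hypothetical burst filling queue 3 to 500% of its allocation with DSCP 0
    # traffic (queue 3, threshold 3).
    for name, cfg in (("old", OLD_Q3), ("new", NEW_Q3)):
        print(name, "drop" if wtd_drops(500, 3, **cfg) else "enqueue")
```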


Final conclusion:

If you install an application on your PC that uses a non-standard (non-EF) DSCP value and sends traffic in bursts, AND your port runs at 100 Mbit/s, you might get into trouble with the default QoS configuration on your switch.

After a few days of tweaking, here's the final config:

mls qos queue-set output 1 threshold 1 175 200 60 300
mls qos queue-set output 1 threshold 2 125 125 100 400
mls qos queue-set output 1 threshold 3 800 800 100 800
mls qos queue-set output 1 threshold 4 400 400 50 400
myswitch#show run | i buffer
mls qos queue-set output 1 buffers 15 40 25 20
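Here is a quick sanity check on the `buffers` line. As far as I know, the switch expects the four percentages to total 100. The sketch below assumes that rule. It also shows how the final allocation differs from the one in `show mls qos queue-set` earlier. Both function names are hypothetical.

```python
def check_buffers(alloc: list[int]) -> list[int]:
    """Validate a 'mls qos queue-set output <n> buffers a1 a2 a3 a4' allocation.
    Assumption: one value per egress queue, and the four must add up to 100."""
    if len(alloc) != 4:
        raise ValueError("expected one value per egress queue (4)")
    if sum(alloc) != 100:
        raise ValueError(f"allocations add up to {sum(alloc)}, not 100")
    return alloc


def buffer_delta(old: list[int], new: list[int]) -> dict[int, int]:
    """Per-queue change in percentage points, keyed by configured queue number 1-4."""
    return {q: n - o for q, (o, n) in enumerate(zip(old, new), start=1)}


if __name__ == "__main__":
    before = check_buffers([15, 25, 40, 20])  # from "show mls qos queue-set"
    after = check_buffers([15, 40, 25, 20])   # final config
    print(buffer_delta(before, after))
```

Compared with the starting point, the final allocation moves 15 percentage points from queue 3 to queue 2. Queue 3's extra headroom comes from its raised thresholds rather than from a larger buffer share.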

myswitch#conf t

myswitch(config)#int gi1/1

myswitch(config-if)#srr-queue bandwidth share 1 30 30 10
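It helps to turn the weights into fractions to compare `srr-queue bandwidth share 1 30 30 10` with the `1 30 35 5` shown earlier for gi3/0/1. With `priority-queue out`, queue 1 is serviced ahead of everything else and its share weight is left out of the ratio. Only queues 2-4 split the remaining bandwidth. In shared mode these fractions are guarantees under congestion, not caps: a queue can use bandwidth the others leave idle. A small sketch:

```python
def srr_share(weights: list[int], priority_queue: bool = True) -> dict[int, float]:
    """Fraction of the bandwidth left after the priority queue that each shared
    queue is guaranteed under 'srr-queue bandwidth share w1 w2 w3 w4'.
    With 'priority-queue out', queue 1 is serviced first and w1 is ignored."""
    active = {q: w for q, w in enumerate(weights, start=1)
              if not (priority_queue and q == 1)}
    total = sum(active.values())
    return {q: w / total for q, w in active.items()}


if __name__ == "__main__":
    old = srr_share([1, 30, 35, 5])
    new = srr_share([1, 30, 30, 10])
    for q in old:
        print(f"queue {q}: {old[q]:.1%} -> {new[q]:.1%}")
```

With these weights, queue 3's guaranteed share drops from 50% to about 42.9%. Queue 4's share rises from about 7.1% to about 14.3%, and queue 2 stays at about 42.9%.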
