Featured

become an IT expert with #humanITy

john_whitelbow

My name is Tomek De Wille. I’m the owner of humanITy, a small IT training shop for all those of you who want to change their lives and believe they can.

I used to be a language teacher but in 2009 I had a sudden change of heart. I knew I could do more than teach. I joined Nokia as a technical editor and started learning about networks. Fast forward to 2017 and I’m a senior network engineer, troubleshooting networks all around the globe. Because I still love teaching, I’ve created humanITy – a home for anyone who wants to become an IT expert. We run IT courses on Saturdays and Sundays in the beautiful city of Wroclaw, the very heart of Europe.

For a very brief (and very true) explanation of why we founded humanITy, see this 3-minute YouTube video.

If you speak English and would like to start a successful career in IT, have a look at our site http://www.humanity.pl

Follow our blog for discount codes, information on new and exciting IT courses and much, much more.

Logical thinking and even a little knowledge goes a long way

Hello

I had an interesting problem today. The customer has two lines, MPLS and internet, with DMVPN running on both.

Suddenly the voice quality dropped, and we saw that the tunnel over MPLS was down.

The route to the DMVPN hub goes through the MPLS line, with the IP of the CE router being 172.16.0.1.

However, the ARP entry for 172.16.0.1 was incomplete. Clearly the line was faulty.

A ticket was raised with the telecom but the answer was unexpected: you haven’t been using the line for the last seven days.

I was like: what do you mean we’re not using the line! We’re not using the line because it’s DOWN, that’s what it is… But I began to think about it: in every answer there’s a grain of truth…

So I pinged the broadcast address 172.16.0.255 and VOILA – 172.16.0.2 responded. But what is 172.16.0.2??? Let’s experiment, since the line is down anyway.
I changed the static route to go through the .2 address and of course the line was working again.

It turned out that the telecom had replaced the line a while ago and changed the IP address of the CE router.
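The steps above could look roughly like this on the router. This is only a sketch: the hub address 192.0.2.1 is a made-up placeholder (the real prefix of the static route is not given here), and the broadcast ping assumes the CE subnet is 172.16.0.0/24.

```
! Exec mode: the old CE address shows up with an Incomplete ARP entry
show ip arp 172.16.0.1
! Exec mode: ping the subnet broadcast and watch who replies
ping 172.16.0.255
! Config mode: point the static route at the address that answered
! (192.0.2.1 is a hypothetical DMVPN hub address)
configure terminal
 no ip route 192.0.2.1 255.255.255.255 172.16.0.1
 ip route 192.0.2.1 255.255.255.255 172.16.0.2
end
```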

Now here is my point: I only used logic and one simple ip route command. No big deal, yet nobody else had come up with the idea of looking for other addresses on the subnet. This is great news for beginners in the IT world: keep your mind open, be on the ball, and even limited knowledge can bring great results.

ospf sham-link

Hello

Today another episode of “things that sound complex but are in reality very easy”.

Why create a sham-link?

Normally, prefixes received from your other PE show up as inter-area (IA) routes, because the BGP VPNv4 backbone acts as an OSPF super core and each PE is treated as an ABR between a normal area and that super core. So if you have a backup link between your CEs (just in case), the backup link will always be preferred, because OSPF always prefers intra-area prefixes over inter-area prefixes.

The problem is that the backup link is a private link, so it might be more expensive: something we want to use for emergencies only, not as the primary link.

What we will be doing is creating a standard VPNv4 tunnel (a BGP session through an MPLS core) and then creating another tunnel through it, similar to an OSPF virtual-link. This overlay tunnel is a sham-link. Think of the sham-link as an improved OSPF virtual-link. This is why it’s called a sham-link: it gives a false impression of a true link.

 

Prerequisites

  • an MPLS-enabled core between the loopbacks of the PEs (the command mpls ip on every core interface, or simply mpls ldp autoconfig under the OSPF process on every core router)
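As a rough sketch of the two options in this prerequisite: the interface name and the OSPF process ID below are hypothetical, so adjust them to your core.

```
! Option 1: enable LDP on each core-facing interface
! (GigabitEthernet0/1 is a hypothetical interface name)
interface GigabitEthernet0/1
 mpls ip
!
! Option 2: enable LDP on every interface that runs this OSPF process
! (process ID 1 is a hypothetical value)
router ospf 1
 mpls ldp autoconfig
```

Either way, show mpls ldp neighbor on each core router should then list its LDP peers.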

Step 1

Assuming that you already have MPLS between the /32 loopbacks of the PEs, build a BGP session with the other PE router, effectively creating a tunnel across the provider’s OSPF core. So apart from the normal BGP session, you need to activate the vpnv4 address family. Activating the ipv4 address family is not needed, so you can add the command no bgp default ipv4-unicast to disable it by default.

router bgp 100
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 150.1.4.4 remote-as 100
 neighbor 150.1.4.4 update-source Loopback0
 !
 address-family ipv4
 exit-address-family
 !
 address-family vpnv4
  neighbor 150.1.4.4 activate
  neighbor 150.1.4.4 send-community extended
 exit-address-family

Step 2

Create a new loopback address in the VRF and advertise it into the BGP ipv4 address family of that VRF, just like you would advertise your other addresses to the other PE.

interface Loopback 200
 ip vrf forwarding VPN_A
 ip address 150.1.55.55 255.255.255.255

router bgp 100
 address-family ipv4 vrf VPN_A
  network 150.1.55.55 mask 255.255.255.255

Step 3

Create a sham link with the new loopback of the other PE.

router ospf 100 vrf VPN_A
 area 1 sham-link 150.1.55.55 150.1.66.66 cost 1

Additionally, you might want to make sure that the new loopback is not advertised to the CE.
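One way to keep the sham-link endpoints away from the CE is to filter them when BGP is redistributed into the VRF OSPF process. This is a sketch only: it assumes the PE redistributes BGP into OSPF (not shown in this post), and the prefix-list and route-map names are made up.

```
! Match both sham-link endpoint loopbacks (names below are hypothetical)
ip prefix-list SHAM_ENDPOINTS seq 5 permit 150.1.55.55/32
ip prefix-list SHAM_ENDPOINTS seq 10 permit 150.1.66.66/32
!
! Deny the endpoints, permit every other VPN prefix
route-map BGP_TO_OSPF deny 10
 match ip address prefix-list SHAM_ENDPOINTS
route-map BGP_TO_OSPF permit 20
!
! Assumed redistribution point towards the CE
router ospf 100 vrf VPN_A
 redistribute bgp 100 subnets route-map BGP_TO_OSPF
```

The endpoints still travel between the PEs over BGP, which is what the sham-link needs; they just never show up in the CE’s OSPF database.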

 

Verification command:

show ip ospf sham-links

R6#show ip ospf sham-links
Sham Link OSPF_SL0 to address 150.1.55.55 is up
Area 1 source address 150.1.66.66
Run as demand circuit
DoNotAge LSA allowed. Cost of using 1 State POINT_TO_POINT,
Timer intervals configured, Hello 10, Dead 40, Wait 40,
Hello due in 00:00:00
Adjacency State FULL (Hello suppressed)
Index 2/2, retransmission queue length 0, number of retransmission 0
First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
Last retransmission scan length is 0, maximum is 0
Last retransmission scan time is 0 msec, maximum is 0 msec
You do the same thing on the other PE and voila, it’s done! Now the VPNv4 prefixes learned across the sham-link are also intra-area prefixes, just like your backup link prefixes. You can now use cost to prefer the sham-link prefixes over the backup link prefixes.

R8#show ip route ospf
Codes: L – local, C – connected, S – static, R – RIP, M – mobile, B – BGP
D – EIGRP, EX – EIGRP external, O – OSPF, IA – OSPF inter area
N1 – OSPF NSSA external type 1, N2 – OSPF NSSA external type 2
E1 – OSPF external type 1, E2 – OSPF external type 2
i – IS-IS, su – IS-IS summary, L1 – IS-IS level-1, L2 – IS-IS level-2
ia – IS-IS inter area, * – candidate default, U – per-user static route
o – ODR, P – periodic downloaded static route, H – NHRP, l – LISP
a – application route
+ – replicated route, % – next hop override

Gateway of last resort is not set

150.1.0.0/32 is subnetted, 4 subnets
O 150.1.7.7 [110/11] via 155.1.78.7, 00:32:39, Ethernet0/0.78
155.1.0.0/16 is variably subnetted, 12 subnets, 2 masks
O 155.1.7.0/24 [110/20] via 155.1.78.7, 00:32:39, Ethernet0/0.78
O 155.1.37.0/24 [110/20] via 155.1.78.7, 00:32:39, Ethernet0/0.78
O 155.1.67.0/24 [110/20] via 155.1.78.7, 00:01:44, Ethernet0/0.78
O 155.1.79.0/24 [110/20] via 155.1.78.7, 00:32:39, Ethernet0/0.78
172.16.0.0/16 is variably subnetted, 3 subnets, 2 masks
O 172.16.7.7/32 [110/11] via 155.1.78.7, 00:32:39, Ethernet0/0.78
O E2 192.168.6.0/24 [110/1] via 155.1.58.5, 00:26:42, Ethernet0/0.58

 

Then you can run:

show ip ospf database network

show ip ospf database router

 

to check OSPF’s reasoning (why it prefers the intra-area path through R5 over the backup link through R7). Specifically, if you set the cost on R8, you need to check R8’s router LSA to see what it thinks the cost to reach R7 is. Do the same on the other end.

LS age: 1263
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 172.16.7.7
Advertising Router: 172.16.7.7
LS Seq Number: 8000000C
Checksum: 0x1987
Length: 108
Number of Links: 7

Link connected to: a Stub Network
(Link ID) Network/subnet number: 150.1.7.7
(Link Data) Network Mask: 255.255.255.255
Number of MTID metrics: 0
TOS 0 Metrics: 1

Link connected to: a Stub Network
(Link ID) Network/subnet number: 172.16.7.7
(Link Data) Network Mask: 255.255.255.255
Number of MTID metrics: 0
TOS 0 Metrics: 1

Link connected to: a Stub Network
(Link ID) Network/subnet number: 155.1.79.0
(Link Data) Network Mask: 255.255.255.0
Number of MTID metrics: 0
TOS 0 Metrics: 10

Link connected to: a Transit Network
(Link ID) Designated Router address: 155.1.78.8
(Link Data) Router Interface address: 155.1.78.7
Number of MTID metrics: 0
TOS 0 Metrics: 500
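The metric of 500 on the transit network in this LSA is what a raised OSPF cost on the backup link looks like. A minimal sketch of how that could be set, assuming the backup link on R8 is Ethernet0/0.78 (the interface seen in the routing table above); the matching interface on R7 would need the same command.

```
! On the CE, on the interface facing the backup link
! (Ethernet0/0.78 is taken from the R8 routing table; adjust on R7)
interface Ethernet0/0.78
 ip ospf cost 500
```

Because each router advertises its own outgoing cost in its router LSA, the cost has to be raised on both ends for traffic in both directions to avoid the backup link.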

 

Sancho Panda and the mgmt vlan of death

Hello

Today I got a call from my friend (to whom I shall henceforth be referring as Sancho Panda) with the following problem: one of the switches had a consistently high CPU usage caused by the HULC DAI process. The software upgrade that I had recommended earlier didn’t help, so it had to be something else.

It turned out to be one of those great design ideas.

The client has a typical star topology consisting of 2960g, 2960s and 2960c switches. The switches have about 150 VLANs each, plus management VLAN 999. The spanning tree mode is RSTP.

You might point out that this is already a bad design because some switches may not support this number of spanning tree instances. Yikes. And the management VLAN is 999, so it is one of those VLANs left without a spanning tree instance.

Now comes another engineer with a brilliant idea: a star topology is not redundant, so he connects two spokes with a redundant connection.

Unfortunately, the hub of this star topology is a 2960g, which doesn’t support that many spanning tree instances.
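A quick way to check a switch for this condition could be something like the following. It is only a sketch of two standard show commands, not what we actually ran that day.

```
! Exec mode: how many spanning tree instances are actually running
show spanning-tree summary totals
! Exec mode: does the management VLAN have an instance at all?
show spanning-tree vlan 999
```

If the second command reports that no spanning tree instance exists for the VLAN, any redundant link carrying that VLAN is a loop waiting to happen.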

The moment we tried to troubleshoot this and remoted into the 2960 hub, we caused a layer 2 loop and an outage of the customer site.

The irony of all this is that the only VLAN that can loop is the management VLAN: had we not tried to troubleshoot the relatively high (80-90%) CPU usage on a spoke, we wouldn’t have caused an outage of the whole site.

Ouch ouch. Thanks Sancho!


vmware player 12.5.9 + gns 2.1.5 + csr1000v 16.3.1 ova

Hello

I’ve been playing with csr1000v images in vmware player for a while, and after 5 hours of troubleshooting here’s a word to the wise: do not use the import appliance option with the qcow CSR images because they’re broken. The CSR freezes right at boot time. Apparently this has been a problem since 16.2.1.

Instead, I downloaded the ova file and ran it as a vmware VM in gns3. So far so good.

Before, I was able to run the csr1000v qcow image when using vmware workstation pro. When I uninstalled the pro version, I came across another problem: vmware player doesn’t come with VMware VIX, so you have to install VIX yourself. But then you need to remember that the latest player version, 14.x, needs VIX 1.17 – the release notes for this version are there, but where are the download files???
In the end, I downgraded to version 12.5.9 of the player with VIX 1.15 + ova CSR images.

5 hours of my life.

 

switch egress qos troubleshooting

Hello

I had an interesting case today where clients were having problems with Lync calls.

On nearly all interfaces in a 2960x stack I could see a lot of output drops:

Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 45345017
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 68085386
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 53105700
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 35941000
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 29047035
0 unknown protocol drops
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 110869352
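Output like the listing above is what you get when filtering show interfaces on the word drops. A sketch of two variations; the second filter is my own suggestion, not part of the original capture.

```
! Exec mode: counter lines only, as in the listing above
show interfaces | include drops
! Exec mode: also print the interface name line before each counter line
show interfaces | include line protocol|output drops
```

The second form makes it much easier to tell which port each drop counter belongs to.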

 

How to start:

myswitch#show run int gi3/0/1 | i priority

priority-queue out

OK, we know that queue 1 is a priority queue. Later on we will see which DSCP values are mapped to queue 1.

myswitch# show mls qos int gi3/0/1 queueing
GigabitEthernet3/0/1
Egress Priority Queue : enabled
Shaped queue weights (absolute) : 25 0 0 0
Shared queue weights : 1 30 35 5
The port bandwidth limit : 100 (Operational Bandwidth:100.0)
The port is mapped to qset : 1

We know from this that the port belongs to qset 1.

myswitch#show mls qos int gi3/0/1 stat
GigabitEthernet3/0/1 (All statistics are in packets)

dscp: incoming
——————————-

0 – 4 : 6334650 0 0 0 0
5 – 9 : 0 0 0 0 0
10 – 14 : 0 0 0 0 0
15 – 19 : 0 0 0 0 0
20 – 24 : 0 0 0 0 50507
25 – 29 : 0 0 0 0 0
30 – 34 : 0 0 0 0 268
35 – 39 : 0 0 0 0 0
40 – 44 : 0 0 0 0 0
45 – 49 : 0 231935 0 0 0
50 – 54 : 0 0 0 0 0
55 – 59 : 0 0 0 0 0
60 – 64 : 0 0 0 0
dscp: outgoing
——————————-

0 – 4 : 105609708 0 21572 0 35
5 – 9 : 0 15319 0 0 0
10 – 14 : 0 0 0 0 0
15 – 19 : 0 6121238 0 0 0
20 – 24 : 0 0 44985 0 448876
25 – 29 : 0 0 0 0 0
30 – 34 : 0 0 15590 0 16
35 – 39 : 0 0 0 0 0
40 – 44 : 8577 0 0 0 0
45 – 49 : 0 1017369 0 1118867 0
50 – 54 : 0 0 0 0 23
55 – 59 : 0 0 0 0 0
60 – 64 : 0 0 0 0

We know from this that there is a lot of DSCP 16, 46 and 48 traffic. While 46 is ok (EF voice traffic which hits the priority queue), 48 is a bit surprising (control traffic on a host port?), and what is DSCP 16?

cos: incoming
——————————-

0 – 4 : 15078934 0 0 0 0
5 – 7 : 0 0 0
cos: outgoing
——————————-

0 – 4 : 163322627 286 6167824 449053 16722
5 – 7 : 1026344 1119097 893496
output queues enqueued:
queue: threshold1 threshold2 threshold3
———————————————–
queue 0: 0 0 1041542
queue 1: 6179950 478186 2012921
queue 2: 0 0 169311408
queue 3: 0 0 0

output queues dropped:
queue: threshold1 threshold2 threshold3
———————————————–
queue 0: 0 0 0
queue 1: 0 10 0
queue 2: 0 0 9110533
queue 3: 0 0 0

Most importantly, we can see that queue 3 has all the drops (yes, queue 3, because of how stupidly Cisco numbers the queues in this output: from 0 to 3. So 0 is 1, 1 is 2, etc.).

 

myswitch#show mls qos queue-set
Queueset: 1
Queue : 1 2 3 4
———————————————-
buffers : 15 25 40 20
threshold1: 100 125 100 60
threshold2: 100 125 100 150
reserved : 50 100 100 50
maximum : 200 400 400 200
Queueset: 2
Queue : 1 2 3 4
———————————————-
buffers : 25 25 25 25
threshold1: 100 200 100 100
threshold2: 100 200 100 100
reserved : 50 50 50 50
maximum : 400 400 400 400

 

So the problem is that DSCP 0-7 traffic is being dropped. We’re not really worried, because the TCP stack will take care of the drops via retransmissions, but we can fix it easily.

 

Let’s see how DSCP values are mapped to queues and thresholds.

myswitch#show run | i dscp
mls qos map cos-dscp 0 8 16 24 32 46 48 56
mls qos srr-queue output dscp-map queue 1 threshold 3 32 33 40 41 42 43 44 45
mls qos srr-queue output dscp-map queue 1 threshold 3 46 47
mls qos srr-queue output dscp-map queue 2 threshold 1 16 17 18 19 20 21 22 23
mls qos srr-queue output dscp-map queue 2 threshold 1 26 27 28 29 30 31 34 35
mls qos srr-queue output dscp-map queue 2 threshold 1 36 37 38 39
mls qos srr-queue output dscp-map queue 2 threshold 2 24
mls qos srr-queue output dscp-map queue 2 threshold 3 48 49 50 51 52 53 54 55
mls qos srr-queue output dscp-map queue 2 threshold 3 56 57 58 59 60 61 62 63
mls qos srr-queue output dscp-map queue 3 threshold 3 0 1 2 3 4 5 6 7
mls qos srr-queue output dscp-map queue 4 threshold 1 8 9 11 13 15
mls qos srr-queue output dscp-map queue 4 threshold 2 10 12 14
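Instead of reading the mapping out of the running config, the switch can also print it as a table. A small sketch; I did not capture this output during the case, so none is shown here.

```
! Exec mode: grid of every DSCP value with its egress queue and threshold
show mls qos maps dscp-output-q
```

Each cell in that grid is a queue-threshold pair, so DSCP 0 should appear as 03-03 with the config above (queue 3, threshold 3).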

myswitch#show run | i threshold

mls qos queue-set output 1 threshold 1 100 100 50 200
mls qos queue-set output 1 threshold 2 125 125 100 400
mls qos queue-set output 1 threshold 3 100 100 100 400
mls qos queue-set output 1 threshold 4 60 150 50 200

 

So what can we do to fix this?

  • Move some DSCP values to queue 1 so that queue 3 doesn’t overflow? This is a bad idea because DSCP 0-7 should not be prioritized. This could be scavenger traffic for all we know.

  • Increase the buffer ratio for queue 3? Nah, there’s a better solution.

mls qos queue-set output 1 buffers 10 40 35 15

  • Change the scheduler on the interface so that certain queues are serviced more often than others? Again, we might negatively influence other traffic.

srr-queue bandwidth share 1 30 30 10

The best idea, and the first one to try, is actually changing the drop thresholds:

  • Increase the thresholds for queue 3 waaaay up.

 

mls qos queue-set output 1 threshold 3 800 800 100 800
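For anyone new to this command, here is the same line again with the positions of the values spelled out, as I understand the syntax on this platform. The two show commands at the end are a suggested way to confirm the change, using the port from this case.

```
! Syntax: mls qos queue-set output <qset-id> threshold <queue-id>
!         <drop-threshold1> <drop-threshold2> <reserved> <maximum>
! The values are percentages of the buffer allocated to the queue.
! Before: threshold 3 100 100 100 400
mls qos queue-set output 1 threshold 3 800 800 100 800
!
! Exec mode: confirm the new thresholds for queue-set 1
show mls qos queue-set 1
! Exec mode: watch whether the "output queues dropped" counters keep rising
show mls qos interface gi3/0/1 statistics
```

Since the change is made on the queue-set, it applies to every port mapped to queue-set 1, not just to one interface.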

 

Final conclusion:

If you install on your PC any application that uses a non-standard (non-EF) DSCP value and sends traffic in bursts, AND your port is 100 Mbit, you might get into trouble if you have a default QoS config on your switch.

After a few days of tweaking, here’s the final config.

 

mls qos queue-set output 1 threshold 1 175 200 60 300
mls qos queue-set output 1 threshold 2 125 125 100 400
mls qos queue-set output 1 threshold 3 800 800 100 800
mls qos queue-set output 1 threshold 4 400 400 50 400
myswitch#show run | i buffer
mls qos queue-set output 1 buffers 15 40 25 20

myswitch#conf t

myswitch(config)#int gi1/1

myswitch(config-if)#srr-queue bandwidth share 1 30 30 10

 

 

 

back to labbing, mpls section

Hello

After a nearly 2-month break I’m back to labbing for the CCIE exam. I’d never taken a break like this before, but the reasons for it were manifold.

Firstly, I was just fed up with doing labs every single day. Secondly, I felt I was making progress anyway because of how much stuff I learnt at work. Thirdly, I had to revise the MPLS theory, which I’d never been very strong at. Finally (and most importantly), my little project called “the son” is way more fun than doing labs, and probably any father can relate to that.

So after this little break, I’m back with recharged batteries. My personal deadline for the exam is still January 31, 2019, so let’s get down to business.

 

 

English is not the be all and end all

Hello

Yesterday I spent 5 hours installing hardware for a client in Brazil. And because they couldn’t speak good English, I went the extra mile and used my Spanish. It was great fun, although of course the client was speaking Portuguese, so it was a bit difficult at times. But the important thing is: there will be situations where English is not enough. I strongly encourage anyone to learn at least one extra language because, first, it’s great fun, and second, learning e.g. German will get you a sizeable salary boost because there are a lot of projects being transferred. Do some research, find out which language pays off well, and start learning… now 🙂 Not next Monday, not next month. Do something now.