Every now and then I have a look at how the good old GNS is changing, and I was kind of amused to see that the 2.2 update will bring link awareness, that is: if you don’t connect a cable between two devices, the link will be down/down. Before, any link that you unshut would automatically come up.
Anyways, I’m kind of “between jobs” at the moment as I’m moving to a different team, so I feel like I’m in limbo. Still, if everything goes according to plan in the next 2 months, there are a few courses that I’d like to do, depending on what I’ll be doing in my new team. It’s either CCIE Wireless from networkdojo or CCIE Enterprise from Micronics. Or maybe both.
On a slightly related note, I’ve been reading 12 Rules for Life by Jordan Peterson and I found this passage that I thought was very inspiring. If you have a choice between security and self-improvement, it’s a good idea to choose improvement.
“You are by no means only what you already know. You are also all that which you could know, if you only would. Thus, you should never sacrifice what you could be for what you are. You should never give up the better that resides within for the security you already have—and certainly not when you have already caught a glimpse, an undeniable glimpse, of something beyond.”
I was watching another video with Jordan Peterson today and I was reminded of how important it is to have an authority figure in your life, someone you can look up to, someone that you can listen to every day. Especially now, when the world is changing at a pace never seen before, we are getting a bit lost, and there are many questions that nobody seems to have a good answer to. How do I catch up with the changing technology and expectations at my workplace? What is my role in a family that is so unlike the family I was part of as a child? How do I keep my life balanced and meaningful? Am I smart enough to get ahead in life or in my job?
A decade ago, back when I started learning about IP networks, I was lucky enough to stumble upon Jeremy Cioara, an amazing instructor from CBTnuggets. But he’s not only that: he can create an atmosphere where you believe that you can achieve anything you want, that you are capable of achieving your goals, and that your goals are worth fighting for. Once in a while Jeremy will share with you his vision of the world, or he will tell a story about his wife and kids, how he deals with pressure etc. And even though it’s supposed to be strictly about IT, these extra bits and pieces about “the world of Jeremy” are such a great addition to these courses because then Jeremy becomes a real person, a person that has the same challenges that you have, a person that also deals with the lack of time, with any mistakes he’s made at work etc. And he is able to maintain an unbelievable level of optimism and a certain insouciance that makes you feel that this can indeed be the best job you can get because no problem is insurmountable.
For a few years Jeremy’s words were probably nearly 90% of all the words that I heard every day, as I spent up to 6 hours a day watching CBT. I’m fairly sure that if it hadn’t been for Jeremy, I wouldn’t have achieved that much in such a short period of time. And it would have been much more painful with any other trainer. So if you’re like me, try to browse through a few IT portals with video courses and see if any person in particular comes across as someone that you admire, or at least someone you don’t mind watching for very extended periods of time.
So to all Jeremies, Brians and Jordans out there: cheers! You are the leaders of men in the age of technology and fast-paced change.
Once in a while Cisco releases a so-called Field Notice, and I’ve learnt that it’s a good idea to subscribe to those notifications because they can spare you a lot of trouble.
Usually a field notice means a bug so critical that it is imperative that you upgrade your devices as soon as humanly possible. The device may crash, die, steal your money, attempt to leave the data center and kill your relatives or worse.
You can subscribe to those notifications at https://cway.cisco.com/mynotifications . They will send you a validation email. Once you validate your email account, you will start getting notifications about bugs and field notices.
One example from this week was an FN for the 3650, whose constant memory leaks meant that my SNMP manager kept creating “device lost” tickets, because the device wasn’t responding to SNMP polls.
A more flagrant example (a catastrophe, really) was the time when the clocks failed on ASAs after a certain time…
Today a real no-brainer (especially in hindsight). I had a config where the distribution switch has some VLAN interfaces, with multicast devices connected to access switches downstream. The problem was that devices on different VLANs couldn’t communicate via multicast even though PIM sparse mode was enabled on both VLAN interfaces and the RP was set for all groups for bidir traffic. The cause was really simple: if you add the keyword bidir to the RP address config, make sure that you set the same bidir keyword on the RP itself, otherwise the RP will only act as an RP for non-bidir multicast traffic.
The lesson here is again very simple: divide the problem into digestible chunks:
- what vrf are you looking at?
- is the RP for the same vrf? do you have any RP-address config where you have bidir for some groups only?
- is sparse mode enabled? or bidir?
- what can you see in show ip mroute? S,G entries or only *,G entries? why?
- what can you see in show ip igmp membership tables?
- where is the RP? what can you see on the RP? Does it know that it is the RP, and is PIM enabled on the interface that holds the RP address? Is the same mode enabled (sparse? bidir?)? Is it bidir for all groups or only for some?
A bit of research (especially if you deal only occasionally with multicast) doesn’t hurt. That way you know (and you’re not surprised…) why you only see *,G entries when bidir is enabled.
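The bidir mismatch from this story is the kind of thing you can also spot offline by comparing saved configs. Below is a minimal Python sketch, not a finished tool: it assumes you have the running configs as plain text and that the RP statements follow the usual `ip pim [vrf NAME] rp-address A.B.C.D [acl] [override] [bidir]` shape. The function names and the sample config lines are made up for illustration.

```python
import re

# Matches "ip pim rp-address 10.0.0.1 ..." and "ip pim vrf RED rp-address 10.0.0.1 ..."
RP_LINE = re.compile(
    r"^ip pim(?: vrf (?P<vrf>\S+))? rp-address "
    r"(?P<rp>\d+\.\d+\.\d+\.\d+)(?P<rest>.*)$"
)


def rp_statements(config_text):
    """Return a set of (vrf, rp_address, is_bidir) tuples found in a config."""
    found = set()
    for line in config_text.splitlines():
        match = RP_LINE.match(line.strip())
        if not match:
            continue
        tokens = match.group("rest").split()
        vrf = match.group("vrf") or "default"
        found.add((vrf, match.group("rp"), "bidir" in tokens))
    return found


def bidir_mismatches(router_cfg, rp_cfg):
    """List (vrf, rp) pairs where the two configs disagree on bidir."""
    def bidir_map(cfg):
        result = {}
        for vrf, rp, is_bidir in rp_statements(cfg):
            result.setdefault((vrf, rp), set()).add(is_bidir)
        return result

    router, rp = bidir_map(router_cfg), bidir_map(rp_cfg)
    return sorted(key for key in router.keys() & rp.keys()
                  if router[key] != rp[key])


# Hypothetical sample configs: the distribution switch uses bidir, the RP does not.
dist_switch = "ip pim rp-address 10.0.0.1 bidir\n"
rp_router = "ip pim rp-address 10.0.0.1\n"
print(bidir_mismatches(dist_switch, rp_router))
```

This only compares the presence of the bidir keyword per VRF and RP address. It ignores which groups an access list covers, so the "bidir for some groups only" question from the checklist still has to be checked by hand.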
I’ve noticed that if you can’t do show run on ISE and the server doesn’t respond to SNMP, a quick solution is to disable CDP on your gi interface and everything starts working fine. Funny, innit?
I still remember my first job in the network field. It was for a company called Sonicwall. Most of our phone calls were from really inexperienced admins, and because most of them had PPPoE connections, a large chunk of the tickets could be solved by suggesting that the MTU should be 1492, not 1500.
Now today my colleague had an interesting case where employees couldn’t log in to Skype for Business. The router logs had a lot of max fragments errors, which clearly showed that the router was busy trying to fragment packets but just couldn’t store all the fragments in memory.
Adjusting the TCP MSS on the dialer interface solved the problem. This is also described here. It turns out that Skype logins use TCP and Skype seems to ignore PMTUD.
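For anyone wondering where these numbers come from, here is the arithmetic as a tiny Python sketch. It assumes plain IPv4 and TCP headers without options; the function names are mine. The post doesn’t record which MSS value was actually configured, so treat 1452 as the textbook value for PPPoE rather than as what was used in this case.

```python
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field
IPV4_HEADER = 20     # assumes no IP options
TCP_HEADER = 20      # assumes no TCP options


def pppoe_mtu(ethernet_mtu=ETHERNET_MTU):
    """Largest IP packet that fits in one Ethernet frame once PPPoE is added."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(pppoe_mtu())           # 1492
print(tcp_mss(pppoe_mtu()))  # 1452
```

On plain Ethernet the same formula gives 1460, which is why a host that never learns about the smaller PPPoE MTU keeps sending segments that are 8 bytes too big and have to be fragmented.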
Everyone loves getting those SNMP agent lost alarms twice a day, right? It’s a great way to make your NOC statistics look bad, because nobody has time to pick up the ticket before it gets resolved automatically within 10 minutes.
So I started googling for an answer to this petty problem and couldn’t find anything. On the Cisco software download page, 16.3.5 is deferred, but not 16.3.5b, so this seemed fine. The release notes don’t mention any flagrant bugs either.
On the device the problem itself is kind of weird: every 7 minutes or so I get CPU spikes to 85%, with SNMP, iosd_ipc and dbal processing being the biggest CPU hogs (SNMP 25%, iosd 25%, dbal about 20%), but the SNMP agent lost events occurred only twice a day, not every 7 minutes, so there had to be some other factor that was difficult to track down. I figured that I could find the guilty party by looking up show snmp stats oid to see what table was being retrieved when the CPU was busy. And here was my first surprise: even though it was clear that the AuthManager tables were being retrieved, I couldn’t exclude them from the SNMP view. Exclusions did not work. What a drag. This took me about 2 hours on Friday afternoon, and because I hate it when stuff won’t just work, I couldn’t just go home. Partly because my personality is as broken as IOS code.
About 5.30 p.m. I realized that the IOS was broken beyond repair, figured that since that IOS is 2 years old, the upgrade would probably solve the issue anyway, and went home. It goes without saying that I wasn’t in a good mood. Not being able to google the answer is never good and usually means that your Google search strings are wrong (so probably the CPU was not the problem here). The Cisco Live presentation on troubleshooting IOS XE is a bit crap, though, so it’s not as if I hadn’t tried. There’s a ton of show commands but no clear advice on what is good output and what is bad output.
Today I had some more time and finally found the relevant field notice: https://www.cisco.com/c/en/us/support/docs/field-notices/703/fn70359.html with the memory leak info and a suggestion to upgrade the software.
The conclusions from this are as follows:
- Sometimes neither release notes nor cisco software download pages are updated and you need to look further and further.
- You need to upgrade your software regularly… 16.3.5b was released 2 years ago and has been found to have multiple vulnerabilities anyway.
- That “complete rewrite” of the code from 3.x to 16.x is not exactly a success.
- Spending too much time on researching old software versions may be a waste of time because you need to upgrade anyway.
- Corporate “software upgrade research teams” are crap and cannot be relied upon. I should have received that info from them a long time ago.