A recent install of a new UPS in my home lab required me to power down all of my kit. This always makes me nervous, as it’s not often I shut everything down and there’s always a possibility it might not all come back online, at least not without a bit of tinkering here and there. This time was one of those occasions.
After installing the UPS and powering everything back up, only one device was offline. Except this was my Synology RS815 NAS, which basically underpins the storage strategy for all my home kit: iSCSI storage for some of my VMs, SMTP server, syslog server, IP camera surveillance server, the list goes on.
It would respond to ping and I could even SSH to it, but storage was offline and the Synology web GUI was not responding. After consulting with Google I was convinced this was an issue with the Apache service. I then proceeded to run various commands via the CLI, to no avail. I was starting to contemplate a factory default of the NAS. This was not an appealing prospect at all. I follow the 3-2-1 storage strategy so I had backups of all the data, but it’s the time required to restore everything that was a depressing thought.
I reloaded the NAS (again) and had one last attempt at logging in via the GUI, but this time from a different workstation. I initially received an untrusted certificate error (this workstation did not have my CA certificates installed), which I accepted, and then the session just hung. I’d seen this behaviour a few years back while troubleshooting another issue, and this looked similar.
It stank of an MTU issue
Jumbo frames are enabled on all the devices that support them, including the NAS and my workstations. I lowered the MTU on the interface of my workstation to the default and tried a new session to the NAS web GUI. Boom! It responded, I could log in, and the storage was once again available. This confirmed my suspicion of an MTU issue.
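A quicker way to test for this kind of problem than changing the interface MTU is to ping with the Don't Fragment bit set and a payload sized to fill the MTU. Below is a minimal sketch of that idea, not something I ran at the time. It assumes a Linux workstation with the iputils `ping` (where `-M do` sets Don't Fragment and `-s` sets the payload size); the function names and the `192.0.2.10` address are made up for illustration.

```python
import subprocess

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single IPv4 packet of size `mtu`."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return payload


def build_df_ping(host: str, mtu: int, count: int = 3) -> list:
    """Build a Linux iputils ping command that fills `mtu` with Don't Fragment set."""
    return [
        "ping",
        "-M", "do",                            # prohibit fragmentation (set DF)
        "-c", str(count),                      # number of echo requests
        "-s", str(icmp_payload_for_mtu(mtu)),  # payload bytes
        host,
    ]


def path_carries_mtu(host: str, mtu: int) -> bool:
    """True if full-size, unfragmented pings to `host` get replies."""
    result = subprocess.run(build_df_ping(host, mtu), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    nas = "192.0.2.10"  # hypothetical NAS address, substitute your own
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {'ok' if path_carries_mtu(nas, mtu) else 'FAILED'}")
```

If the 1500 test passes and the 9000 test fails, something in the path is not passing jumbo frames, whatever its config claims.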
I began checking the configs on each device in the path, only to find that jumbo frames were enabled as expected on all of them. So where was the fragmentation taking place? Packet captures at strategic places revealed that a Cisco SG300-20 switch, which connects to the NAS, was the culprit.
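For anyone wondering what "fragmented" looks like in a capture: it comes down to two fields in the IPv4 header, the More Fragments flag and the fragment offset. Wireshark decodes these for you, but here is a rough sketch of the same check in Python, assuming you already have the raw bytes of an IPv4 header from a capture. The `fragment_info` helper is my own illustration, not part of any capture tool.

```python
import struct

DONT_FRAGMENT = 0x4000   # DF flag bit in the flags/offset field
MORE_FRAGMENTS = 0x2000  # MF flag bit in the flags/offset field
OFFSET_MASK = 0x1FFF     # low 13 bits: fragment offset in 8-byte units


def fragment_info(ipv4_header: bytes) -> dict:
    """Decode the fragmentation fields from the first 20 bytes of an IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("need at least 20 bytes of IPv4 header")
    if ipv4_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")

    # Bytes 2-7 hold total length, identification, and flags + fragment offset.
    total_length, identification, flags_offset = struct.unpack("!HHH", ipv4_header[2:8])
    offset_bytes = (flags_offset & OFFSET_MASK) * 8
    more_fragments = bool(flags_offset & MORE_FRAGMENTS)

    return {
        "total_length": total_length,
        "identification": identification,
        "dont_fragment": bool(flags_offset & DONT_FRAGMENT),
        "more_fragments": more_fragments,
        "offset_bytes": offset_bytes,
        # A packet is a fragment if more pieces follow or it is not the first piece.
        "is_fragment": more_fragments or offset_bytes > 0,
    }
```

Packets that share an identification value and have `is_fragment` set are pieces of the same original datagram, so comparing captures on either side of a device shows where the splitting happens.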
I double-checked the config on the switch. The web GUI was reporting that jumbo frames were enabled:
And from the CLI:
SG300-20#sh ports jumbo-frame
Jumbo frames are enabled
Jumbo frames will be enabled after reset
SG300-20#
The switch config reported that jumbo frames were enabled, yet the packet captures were revealing fragmented frames?
Time to reload the switch. Once it had booted back up, I re-enabled jumbo frames on the workstation and retried the web GUI and storage drives. Boom! Everything was back to normal.
Summary
Packet capturing is seen by some as the last resort for troubleshooting. I certainly wish I’d done a capture before going down the wrong and very dark hole. When you’re experiencing a strange issue, don’t believe what a device’s configuration is telling you. Do a capture and look closely at what’s happening on the wire. In the words of the creator of Wireshark, Gerald Combs: “Packets Don’t Lie”.
Failing that, just switch it all off and back on again, a strategy that seems to fix most things 🙂