Yukk. This mouse was retired from the desk of someone in our office who smokes. Bleah. There’s a matching keyboard if anyone wants to buy the set.
Our Toronto office moved on the weekend. We are also getting all the configuration details ready for a switchover to a new VPN infrastructure. We hadn’t planned to go live on the new infrastructure until the end of this week or the beginning of next week, but Thursday last, two days before the Toronto office move, we found out that the old ISP for Toronto would be unable to provide service in the new location, and the only connectivity we would have would be the new infrastructure.
A flurry of router configuration, proxy server construction and general networking hackage ensued. We connected Toronto to our main office via the new VPN. The new VPN infrastructure will have its own Internet gateway, but it isn’t ready yet, and the routers on the new VPN don’t know about the old Internet gateway at the head office, so the only way to get the Toronto users on to the Internet was to use a proxy server that was aware of both the old gateway and the new VPN. I built that as a virtual machine on a machine that was already mostly built, and deployed it at the head office, and over the weekend we got everybody connected from Toronto.
Unfortunately, the server that I deployed the proxy on needed to be sent to the location where our new gateway will be, to become the new production proxy server. I decided to take one of my lab servers from Engineering and re-deploy the virtual machine on it, and then swap it out for the other server so that I could send the other server to the new gateway location.
Both boxes are identical HP ML370 servers. I set up the lab one to be identically configured to the one in the server room. Then I copied the working proxy server over to the lab machine from the live machine. I tested everything in the lab and it all worked fine. Then I waited until after hours for the Toronto office, and took the lab machine next door to the server room. I shut down the live ML370, unplugged its cables, and plugged them into the ML370 from the lab and turned it on. It wouldn’t boot. It powered up but no video signal came out.
I tried switching power cables, keyboard, mouse, monitor, and network cables. Nothing worked. The machine from the lab wouldn’t boot. Then, since the Toronto office was without internet connectivity, I restored all the original cables and put the old box back in, fired it up and saw that it worked fine.
I quit for the day after asking Stuart to look into warranty replacement for the lab machine.
This morning I took the broken lab machine back into the lab, and just for fun I hooked it up to the cables I had used when I configured it the day before. It boots up fine and works just as it should. I almost wish it hadn’t. Problems that have no discernible cause are so hard to figure out.
We’re busy like bees around here getting ready for all kinds of stuff. I’ve been working on getting ready to flip over to our new VPN provider, and helping handle network issues for an office move, screwing around with routers and firewalls, proxy servers and virtual machine hosts.
Dad’s also been back in the hospital for post-operative complications from his heart surgery, which makes things seem even more surreal.
The hanging blade server problem caused by zmd is gone, to be replaced by one where zmd itself hangs. At least it’s an improvement on having the whole server down.
I have been testing a beta patch for the ZLM Linux management agent, to see if it would prevent my VMware GSX server blades from crashing every few days. I installed the patch (zmd7020a) on one blade while leaving the others alone, and then let them run normally, including some VM loads, for several days. Every single blade hung up over Easter weekend, except the one with the patch installed, and everything else on that one blade seems to be working normally, so I am going to try deploying the patch on the rest of the servers. I installed the patch on the rest of the blades this morning by making it into a ZLM bundle and pushing it out via ZenWorks.
Hopefully the hang-up problem is now fixed.
Mack turned the big oh-seven last week. He wanted to go bowling for his birthday so we took him and seven other boys to the St. Albert bowling alley. They had lots of fun, and Mack even got a strike! Then they ate pizza, which the bowling alley provided and which was surprisingly good, and ate cake (which I made, I’m proud to say). It was a good birthday. He also got spoiled by us and his grandparents. There are now several new Gamecube games in our house thanks to various people. He also got a very nice skateboard from Oma and Opa, and we got him protective gear to go with it, including a new helmet, which he already tested by landing completely upside-down on his head during one of the funniest moments of the weekend (don’t get me wrong, I don’t usually laugh at the kids wiping out, but he didn’t hurt himself and he couldn’t have done a more perfect skateboard dismount to a headstand if he tried).
Easter brunch saw 12 people around our table this year, and since it could have easily been fewer, what with both Jenn’s dad and mine having had open heart surgery two weeks apart, we were very grateful to have everyone there. Even Grandma, who will have her 90th birthday later this year, was quite spry and cheerful. Everyone ate decorated eggs and cold-cuts and buns, but Emily stole all the Swiss cheese. It’s nice to have everyone over, and since we always massively clean our house when we put on something like that, we now have a nice clean house to enjoy too.
I re-built eight of the nine blades in the Bladecenter after Brainshare. I wasn’t quite done by the weekend, so I finished that off this week. I got the VMs from our new financial management system running on the newly-rebuilt blades, and configured one of our routers so that our developers could see the VMs from their regular desktops, even though the VMs are in the normally isolated lab network. Then I started working on other stuff.
We are trying to get ready for a major switchover from our existing VPN infrastructure to a new one from a new provider. The provider has had some challenges getting things set up the way we need. We have also had problems getting fibre pulled into the various sites, with contractors shrugging the responsibility back and forth between themselves and the telco provider. I think we’re finally just about ready with that stuff.
Meanwhile, the blades started crashing overnight, so I spent a day or two figuring out what was causing that. It turns out cron would execute zenworks zmd (management agent) to do some maintenance function, and it would cause a kernel panic, locking up the machines. I found a beta patch of the zmd piece, and installed that on one server to see if it would work. I’m waiting to see if it stays up when the other blades crash.
I have been struggling with multipath-tools to get my IBM Bladecenter working properly with my IBM Fibre-channel SAN. I got multipath-tools working manually on each blade, assigned storage to storage groups so that storage partitions on the SAN would only be visible to the appropriate blades, and had everything working including failover when one of the redundant paths to the SAN went away. The only thing I couldn’t figure out was why multipath paths wouldn’t automatically show up after a boot. I had to log in as root and run multipath, which would then make all my paths show up in /dev/disk/by-name, according to my mappings in /etc/multipath.conf. For some reason they wouldn’t show up until I did this step manually.
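A quick way to tell after a reboot whether the mappings came up on their own is to look for the expected names in /dev/disk/by-name. Here is a minimal Python sketch of that check. The alias names (vmstore1, vmstore2) are made up for illustration; the real ones would be whatever aliases are defined in /etc/multipath.conf.

```python
import os


def missing_aliases(expected, directory="/dev/disk/by-name"):
    """Return the expected mapping names that are not present in directory."""
    try:
        present = set(os.listdir(directory))
    except OSError:
        # A missing or unreadable directory means no mappings are visible.
        present = set()
    return sorted(name for name in expected if name not in present)


if __name__ == "__main__":
    # Hypothetical alias names, not taken from the real multipath.conf.
    expected = ["vmstore1", "vmstore2"]
    missing = missing_aliases(expected)
    if missing:
        print("Missing multipath mappings: " + ", ".join(missing))
    else:
        print("All expected multipath mappings are present.")
```

Run right after boot, before touching multipath by hand, this would show whether the paths were set up automatically.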
Finally, I was just working on something unrelated and I had the idea that I may have forgotten to add part of the fibre-channel driver set to the initial ramdisk. I edited /etc/sysconfig/kernel, and sure enough, I had remembered to add the drivers qla2xxx and qla2xxx_conf, but not qla2300. I added this to the initial ramdisk line, ran mkinitrd, and rebooted, and lo and behold, my /dev/disk/by-name directory was automatically populated with my storage mappings. Whew.
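The same check can be scripted so it is easy to repeat on every blade. The sketch below is only an illustration: it assumes the initial ramdisk line is the INITRD_MODULES="..." variable that SUSE keeps in /etc/sysconfig/kernel, and the sample text stands in for the real file contents.

```python
import re


def initrd_modules(sysconfig_text):
    """Return the module names listed on an INITRD_MODULES="..." line."""
    for line in sysconfig_text.splitlines():
        # Commented-out lines start with '#' and so do not match.
        match = re.match(r'\s*INITRD_MODULES\s*=\s*"([^"]*)"', line)
        if match:
            return match.group(1).split()
    return []


def missing_modules(sysconfig_text, required):
    """Return the required modules that are absent from the initrd list."""
    present = set(initrd_modules(sysconfig_text))
    return [name for name in required if name not in present]


# Stand-in for the file contents before the fix (assumed format).
sample = 'INITRD_MODULES="qla2xxx qla2xxx_conf"\n'
print(missing_modules(sample, ["qla2xxx", "qla2xxx_conf", "qla2300"]))
# prints ['qla2300']
```

On a real blade the text would come from reading /etc/sysconfig/kernel, and any module it reports would still need to be added to that line by hand, followed by mkinitrd and a reboot.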
Did you ever decide that you were annoyed with your computer, because of some weird hardware / software glitch and make up your mind to re-install with a different version of Linux? Did you ever spend several hours and a lot of bandwidth to download a DVD iso image on your other computer so you could install that new Linux distribution? Did you ever stick the disc with the carefully burned DVD image into your computer and reboot, only to realize, “Dammit, I don’t have a DVD drive!”
We went to the “Cabane a Sucre” on Friday, which is a sort of celebration of French-Canadian culture, where we eat tourtiere and beans and guzzle maple syrup and fun stuff like that. Unfortunately this year the tourtiere (at least I think it was the tourtiere) was a little off. Friday night Mack was up all night sick. Emily and I got it Saturday morning, and Jenn got it later Saturday after going out early and running 7 km in a relay running race. We were all sick right up to Monday. When Jenn called into the school to let them know the kids wouldn’t be there, the receptionist said “This may sound like a funny question, but were you at the Cabane a Sucre on Friday?” Jenn asked why, and found out that half a dozen families or more that went to the Cabane a Sucre had kids out of school Monday. I’m swearing off tourtiere for a while.