At 8:28 this morning, I moved one of our domains (brainuse.com) to the new DNS service. The day before, I had made sure that all records had a TTL and expiration of 1 hour, so that if something went wrong it wouldn't last for more than 1 hour.
At 9:20 I started checking the new DNS and things were not going well. At 9:30 I was in a panic because a lot of queries were failing. The DNS requests were not in the expected format.
So I started a frantic search for answers: debugging, investigating, reading about a dozen RFCs, and... I found RFC 2671 (EDNS0), which is effectively a breaking change to the DNS infrastructure if your server doesn't support it.
I decided to revert to our old DNS and implement 2671. The implementation is done and tested, and I'll try again in a few minutes. I need some time to catch my breath.
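For readers who have not met RFC 2671 before, here is a minimal sketch of what it changes on the wire. It assumes the unexpected request format was the OPT pseudo-record (type 41) that an EDNS0-aware resolver appends to the additional section of a query; the post itself does not say exactly which part failed. The function names (`build_query`, `edns_payload_size`) are hypothetical and this is not the code running in our DNS service.

```python
import struct

OPT_TYPE = 41  # RR type assigned to the EDNS0 OPT pseudo-record


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name, qtype=1, edns_payload=None, msg_id=0x1234):
    """Build a one-question query; append an OPT record if edns_payload is set."""
    arcount = 1 if edns_payload is not None else 0
    # Header: ID, flags (RD bit set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    msg = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, arcount)
    msg += encode_name(name) + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    if edns_payload is not None:
        # OPT record: root name, TYPE=41, CLASS carries the sender's UDP
        # payload size, TTL carries extended RCODE/flags, RDLEN=0 (no options)
        msg += b"\x00" + struct.pack("!HHIH", OPT_TYPE, edns_payload, 0, 0)
    return msg


def skip_name(msg, pos):
    """Return the offset just past the name starting at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if length & 0xC0 == 0xC0:  # compression pointer is always 2 bytes
            return pos + 2
        pos += 1 + length


def edns_payload_size(msg):
    """Return the advertised UDP payload size, or None if there is no OPT record."""
    _id, _flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    pos = 12
    for _ in range(qd):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    for _ in range(an + ns + ar):
        pos = skip_name(msg, pos)
        rtype, rclass, _ttl, rdlen = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10 + rdlen
        if rtype == OPT_TYPE:
            return rclass
    return None
```

A server written only against the original DNS message format expects a plain query to have an empty additional section, so a parser that rejects or mishandles that extra record will fail on exactly the queries that `edns_payload_size` flags.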
The good thing is that I really minimized the impact on our users. First, because I had set the DNS cache to 1 hour. Second, because I was relatively fast to roll back. And third, because I didn't do a full rollout, just 1 domain, which is, by the way, our least used.
At MSN, I'd say that 80% of the downtime we had during a year was related to network issues. So every time you make changes to the software or hardware of the network infrastructure, you must be prepared for (and expecting) the worst.