Skype is apparently fully functional again and has released an explanation of the problem that attributes the failure to Patch Tuesday. Specifically, the peer-to-peer network failed because of the large number of simultaneous reboots and the resulting relogins to Skype on boot. This explanation raises some questions. Why did it take more than 24 hours for the system to fail after the default 3am reboot on Wednesday (the failure came on Thursday)? And if Patch Tuesday is to blame, why didn't it happen last month?
The more interesting point for me, as a non-Skype user, is that this episode shows several consumer behaviors and their ill effects. Microsoft's automatic updates install by default at 3am, local time to the machine. Very few people change this, even at the enterprise level. For most places, it makes sense: most people are in bed at 3am and nothing is going on. A few 24x7 shops might want to stagger the times a bit to avoid disrupting work. But mostly, users (particularly consumer-grade users) aren't going to touch the defaults on their machines. If only we had operating systems and software packages that shipped hardened by default, many problems would be averted.
The second interesting point is that, if Skype's explanation is true, the vast majority of Skype users have machines that don't require a login on boot. Those machines simply log in automatically as the default user (and I bet almost all have full admin rights) and then log on to Skype (and their other start-on-boot applications).
Neither of these two behaviors is particularly surprising. Consumer-grade users will not have the time, inclination, and/or capability to harden their machines, and you simply can't make them do it either. Systems need to ship hardened by default while still being usable. So, dear reader, how would you fix it?
UPDATE (11:06 CDT 8/20/07)
According to ISC Reader Raul, the VOIPSA list has another theory: that the crash was in fact a malicious DDoS. There is proof-of-concept code that sends malformed URIs to Skype servers, crippling them and allowing the attack to traverse the entire server list. The ultimate result, assuming enough malicious users run it, is a DoS against the entire remaining pool of Skype servers. I'll contact Skype to get their opinion on the PoC...
UPDATE (11:12 CDT 8/20/07)
And for some humor... (courtesy of ISC Reader roseman)
UPDATE (13:10 CDT 8/20/07)
After reviewing many reader comments, various mailing lists and other sources, I'm inclined to agree that Skype's blaming of Patch Tuesday is a line of bull. The PoC that is circulating may or may not work (there is no safe way to test it because the code is proprietary), but there seems to be more going on than they are telling, and many people (including myself) are unconvinced by the story. The Patch Tuesday theory doesn't add up. Why did it take "so long" for the failure to occur? Why not last month? What about this proof-of-concept? Skype just isn't answering the questions that matter.
Consumers can tolerate proprietary code (see Microsoft)... but they don't take well to being snow-jobbed by their vendors.
UPDATE (17:00 8/20/07)
Aug 20th 2007