February 07, 2004
The time to break backwards compatibility is NOW. In Longhorn.
Since my original entry pointing to Michael’s post about "The IE Patch (MS04-004) demystified" I have seen a lot of ridiculous and ludicrous comments in the midst of some great insight. I am only thankful that none of those idiots seem to visit my blog, as I am not sure I would appreciate such dim-witted statements here.
Yes, I’m venting. Mostly because in the midst of Microsoft doing something right as it relates to security, people complain. It wasn’t even a month ago that these same people complained about the IE vulnerabilities... only to find something else to complain about after the recent IE patches. Yesterday on one private mailing list I am on I actually heard people discuss "class action" lawsuits against Microsoft for "loss of profits". Idiots. The moderator of that list sure got a piece of my mind on that one.
But that’s not what this post is about. There are plenty of blog entries and news stories around the world that already point out that RFC 1738 STATES under section 3.3 that the HTTP URL format should NOT include username and password information. Don’t believe me?
The HTTP URL scheme is used to designate Internet resources accessible using HTTP (HyperText Transfer Protocol). The HTTP protocol is specified elsewhere. This specification only describes the syntax of HTTP URLs.
An HTTP URL takes the form: http://<host>:<port>/<path>?<searchpart>
where <host> and <port> are as described in Section 3.1. If :<port> is omitted, the port defaults to 80. No user name or password is allowed. <path> is an HTTP selector, and <searchpart> is a query string. The <path> is optional, as is the <searchpart> and its preceding "?". If neither <path> nor <searchpart> is present, the "/" may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are reserved. The "/" character may be used within HTTP to designate a hierarchical structure.
Quite frankly... it appears that Microsoft was wrong in breaking the original standards in the RFC by adding the support. And they were right when they removed it. Enough said.
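To see why that username/password syntax was such a gift to spoofers, here's a quick sketch (Python used purely for illustration; the behavior follows the generic URL grammar): everything before the "@" is parsed as credentials, not as the host.

```python
from urllib.parse import urlsplit

# A URL crafted to *look* like it points at microsoft.com.
# Under the generic URL grammar, everything before the "@" is
# userinfo (a username), NOT the host -- so the request would
# actually go to evil.example.
deceptive = "http://www.microsoft.com@evil.example/login"
parts = urlsplit(deceptive)

print(parts.hostname)  # evil.example
print(parts.username)  # www.microsoft.com
```

That is exactly the trick the MS04-004 change shuts down: the part users read in the address bar was the "username", while the browser connected to whatever came after the "@".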
Which gets me to the point of this entry.
When Robert Scoble came to see me recently we got into a discussion about Microsoft’s cardinal rule: "Don’t break the build. Don’t break backwards compatibility." He gave an example that if they simply broke existing software with a patch/change, that action could have devastating effects on client retention for Microsoft, and even generate bad press. (Sound familiar, in the recent few days?) Steve Ballmer would not appreciate a call from the CTO of a major corporate client screaming that their entire system is broken due to such a change, and as such the cardinal rule is considered their "Prime Directive"… so to speak. That’s interesting.
Why I found this interesting is that this discussion centered not only on IE, but on Longhorn. Its launch is still a long way off, but this kind of rule SHOULDN'T be part of Longhorn. The time to break backwards compatibility is NOW. In Longhorn.
While that statement sinks in and you prepare to send me a nastygram… let me preface it by saying I know this comment makes me look like an ignorant outside observer. I am. I acknowledge that. I live in a small box, and don’t look at it from an end user’s perspective... but as a computer security software architect.
Microsoft has made great strides as it relates to designing better security into their operating systems. I have been saying over and over on this weblog that we won’t see any of these significant changes until Longhorn. And I still believe that. Mostly because it takes a few years from the time code is written until it is available to the mass market. We won’t see Longhorn server until at least 2006, and I would bet it’s not really ready until 2007. I base that on the fact that in each of the last three release cycles, there has always been a desktop version a year before the server one.
Let’s get back to the topic of this post, as it relates to software development and secure coding. If we look at what Microsoft has been doing as of late, we can see that they have made significant changes to build a foundation for a more secure computing experience:
- They have created better error-reporting software. They have found that the top 20% of their errors make up 80% of the problems. Knowing this and capitalizing on it allows Microsoft to significantly prioritize and reduce the bugs that matter the most.
- They have created better developer tools to help write more secure software, with the release of tools like PREfix, PREfast, AppVerifier and FxCop. Their only problem right now with this is that they AREN’T letting developers know about them!
- They halted product development for a period of time and retrained their developers to code more securely.
- They audited as much product source code as humanly possible and now have a dedicated lead security person for each component of the Windows source code to watch over code quality as it relates to security. Previously they had a clean up crew come in after the fact and try to sanitize the master sources.
- Microsoft has begun to provide more secure defaults when shipping new product. As a clear example we have seen the launch of Windows Server 2003 with a lessened attack surface than previous versions of their server product.
- Microsoft now provides better tools such as the Microsoft Baseline Security Analyzer to analyze and audit patch management as it relates to security bugs in a proactive manner.
- After major security incidents (like MSBlaster and MyDoom) Microsoft has released tools to help respond and fix possible vulnerable and compromised machines. Although these are not timely enough (IMHO), it’s still good to see.
- Microsoft has provided a more definitive patch management cycle to address “patch hell” until their newer products get released that have a significantly lessened attack surface, and have better code quality.
- Microsoft will be providing better integrated firewalling with their Internet Connection Firewall (ICF), to be released with the next service pack of XP. Ok this item isn’t about secure coding… but more about "secure by default" mentality.
- Microsoft is being more open about the entire security process. And not just for PR purposes. More articles, documentation and transparent communication are now available through MSDN, Microsoft employee blogs, and Microsoft’s Security webcasts.
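To put a toy number on that first bullet about error reporting: the prioritization is just a Pareto sort over crash buckets. A sketch with entirely made-up bucket names and counts, only to illustrate why the top 20% dominates:

```python
# Hypothetical crash "buckets" and hit counts (made-up numbers),
# illustrating the 80/20 prioritization that crash-report data enables.
buckets = {
    "bucket_a": 620, "bucket_b": 250, "bucket_c": 60,
    "bucket_d": 30,  "bucket_e": 20,  "bucket_f": 10,
    "bucket_g": 5,   "bucket_h": 3,   "bucket_i": 1, "bucket_j": 1,
}

# Rank buckets by how many crashes they cause, most frequent first.
ranked = sorted(buckets.items(), key=lambda kv: kv[1], reverse=True)

total = sum(buckets.values())           # 1000 crashes in all
top_20pct = ranked[: len(ranked) // 5]  # top 20% of buckets (2 of 10)
share = sum(count for _, count in top_20pct) / total

print(f"top 20% of buckets account for {share:.0%} of crashes")  # 87%
```

Fix the two bugs at the top of that list and the vast majority of user pain disappears. That is why this kind of telemetry matters.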
With all these positive moves there is one thing that is missing. I have arrogantly stated in the past that the NT kernel continues to be brittle, and riddled with insecurities and needs to be replaced. I would like to alter that thinking and say now that it is time that the kernel gets refactored.
This argument comes into play because there was way too much code written and added in an insecure state before Microsoft retrained its teams to think more securely. The line of reasoning that code bloat means less secure software has been around forever and is based on simple mathematics. As more lines of code are written, the number of interactions between components rises combinatorially, exposing the system to more vulnerability and risk. But this is true of all operating systems... and any code. On the secure coding mailing list (SC-L) we have been spending time recently discussing how to maintain better code quality and design more secure software. It’s not easy.
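The "simple mathematics" is easy to sketch: the number of potential pairwise interactions among n components is n(n-1)/2, so the surface you have to reason about grows quadratically even before you count deeper interaction chains.

```python
def pairwise_interactions(n: int) -> int:
    """Potential pairwise interactions among n components: n choose 2."""
    return n * (n - 1) // 2

# Ten times the components buys you roughly a hundred times
# the interactions that have to be reasoned about and tested.
print(pairwise_interactions(10))    # 45
print(pairwise_interactions(100))   # 4950
print(pairwise_interactions(1000))  # 499500
```

The component counts are arbitrary; the shape of the growth is the point.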
But I look back on a great article Joel Spolsky wrote in which he stated that Netscape made the single worst strategic mistake that any software company can make: they decided to rewrite the code from scratch. He was right. It is much more cost effective to refactor code that is working and just needs to be cleaned up. And that is something that from a secure coding perspective is much more difficult to accept. It is a WAY better idea to design it from the start securely, threat model it properly and code it effectively. Bolting on security after the fact is much harder. Grafting secure coding practices onto insecure code isn't always a sane approach… as it would be much more effective to rewrite that code entirely. This is where refactoring comes in. You can rewrite sections of code, and remove "dead weight" as necessary.
This SHOULD be done in Longhorn. Although I am confident that most of the kernel has been rewritten by now over the years… I think that there are entire areas of code that have to be removed, or at the very least, refactored. There are entire subsystems within Windows that simply should be torn out, as they have been replaced with better systems that should be threat modeled, analyzed and refactored. This might/will break backwards compatibility with some software. Some people might not like that. Well... Microsoft could follow what Apple did with OSX, and include VirtualPC for free and allow users to run their legacy software in XP or Windows Server 2003 through a sandboxed virtual machine, allowing them to bridge the gap until the software vendor has time to update their products, or the client finds an alternative.
Let me give you an example. Why is it that there was a Network DDE Escalated Privilege Vulnerability in Windows 2000 a couple of years ago? Why the hell were people still using DDE in software for Windows 2000, when OLE replaced it, then COM replaced OLE, and finally DCOM replaced COM? And guess what... in Longhorn DCOM will be replaced with Indigo! Seems like a PERFECT time to focus on the intricacies of Indigo, design and code it properly (which I would gather they are doing now that they have been properly trained), and provide a clear and clean upgrade path to the new system. Yet I know on Microsoft’s Indigo FAQ they state that Longhorn will still include COM+… but upgraded to include Indigo. *sigh*
There are lots of examples of this within the system. If you think about it for a moment, there are examples ranging from the driver framework to the graphics layer that could be ripped, refactored and replaced. Longhorn is the PERFECT time to do it, and the most logical step forward in the evolution of the server operating system from Microsoft. With Microsoft already giving access to the Longhorn API, there is no excuse for the learning curve of the new Longhorn API systems to be too difficult to tackle for any developer. Further to this, Microsoft has made great strides to simplify many of the APIs and reduce the total amount of code that needs to be written. If we can agree that more lines of code mean more potential vulnerability, we can use simple mathematics to show the risk/return ROI on products being updated to the new system (as it relates to security).
A perfect example is the new Filter Manager that is in Longhorn and is now backported to XP SP2 (and hopefully W2K SP5... done yet Darren? :) ) that is being used for file system filter drivers (FSFD). Filter drivers have been a significant problem in the past for Microsoft. Too many third-party drivers (anti-virus, encryption drivers, etc.) didn’t play nice together and would choke a system. They didn’t scale well, had stability issues, and were all-around ugly when interoperating with other drivers. I know in one case I used to be able to install two separate antivirus drivers and freeze my system! Microsoft hosts "Plugfests" to do interoperability testing to help mitigate these risks… but made a smarter decision and simplified the framework to reduce the actual amount of code you need to write for a FSFD. This forward-thinking maneuver will benefit Microsoft significantly… the security, stability and performance of the third-party code will be increased, as well as its ability to interoperate better. Complex buggy legacy drivers will be a thing of the past… which only helps the Longhorn platform.
Anyways, enough ranting. You get my point. I think a quote I like from Gene Spafford could best sum this up:
"When the code is incorrect, you can’t really talk about security. When the code is faulty, it cannot be safe."
You may now send me your nastygrams. If they are constructive, please post them here. If not… send them to /dev/null.
Posted by SilverStr at February 7, 2004 08:34 PM
Dana, I'm in awe (vent-wise that is).
Just a couple of things... Not really arguments or support, just comments.
1) MS was responsible for the vulnerability that made Blaster possible. The actual infection was the fault of the slow-to-patch customers with poorly protected networks.
2) For some reason, I'm uncomfortable with MS writing any type of firewall. I need to consider it for awhile, but the argument would run along the lines of MS's tendency to embrace-and-extend protocols so that only MS products will work properly with MS firewalls (like what happened with the MS version of Kerberos). If Microsoft would code their security products to work with everyone else's "stuff", they might get more support from the security types. Of course, it'd give the marketing dept. aneurysms though.
I can relate to your uncomfortableness with Microsoft's ICF. They have a credibility and trust problem they need to get over. I think we might be ok on this one though... you CAN turn off the ICF, and use your own. What they are doing correctly here is providing a working firewall with strong policies to help reduce the attack surface of a default installation... allowing you to modify/replace the firewall as needed.
Host-based firewall companies have had years to "educate" the users and profit from this opportunity. As we have seen, it hasn't been very effective. Far too many systems (mostly home users) still don't run a personal firewall, and wouldn't even know WHAT to do with one. By providing safe and useful defaults... Microsoft is able to take advantage of their dominance in the desktop space and fix this gaping hole. I think the adage of "Trust... but Verify" will be in order here.
I would be interested to see some stats on what happened during "Personal Firewall Day" last month... and the "Protect your PC" campaign Microsoft and a few leading vendors have been doing to provide firewalls and antivirus. Are people "getting it"? I'm not sure that they are. By shipping operating systems with more secure defaults (which include a basic firewall)... we all can benefit from it.
You can give MS as much credit as they deserve, but the fact is XP SP2 is still 6 months away, and Longhorn anywhere from 2 years to an eternity away, and I still get around 10 MyDoom emails in 12 hours, which is probably pretty light compared to some people. MS is the master of telling people how wonderful it'll be just around the corner, and yet they never say how wonderful it is right now.
And oh look, another email with the subject "hi". I think I'll click it....
Most of what you refer to has nothing to do with the kernel at all, rather the middleware and higher-level services that have been built over the years and which ISVs are dependent on.
Ya, I used the kernel as an example. This has to be done at all levels, with all the different subsystems in the OS. The point remains the same though. It has to be refactored. Stuff should be pulled that isn't being maintained or hasn't gone through the rigours of Microsoft's new code-quality standard as it relates to security. This is never an easy task... but something that makes the most sense at this point in time. Longhorn is being touted as such a far separation from previous versions of their OS. Why not take advantage of that and actually use this "window of opportunity" to bring everything up to snuff?
Eh? Since when did COM replace OLE? Since when did DCOM replace COM? It goes like this:
OLE is built on COM
DCOM is built on COM
OLE Automation is built on DCOM
They aren't separate versions, they are evolutions and extra layers on top of the same core technologies.
I think you are also mistaken in assuming the kernel has been mostly rewritten several times, or that it's possible to just pull random code from it.
Pray tell, what would you scrap?
USER? That would instantly prevent nearly all existing Win32 applications from running. Customers won't want to run their apps-in-a-box. Some clever hacks might allow you to do an Apple and somehow copy/paste the windows from the emulated box onto the screen, but it's asking for majorly pissed off customers and tons of breakage. Yes, I know you think that's acceptable but consider - remembering its emulation subsystem, Linux would end up being able to run Windows apps better than Windows could!
The registry? Even if you rewrote the registry from scratch, there is no evidence this part of the kernel causes security problems. Taking it out of the kernel would kill performance in some cases.
GDI? Force every Win32 graphics driver in the world to be rewritten (many never would be)? No graphics drivers equals no upgrades, so these people would never benefit from the new security features anyway.
No... that stuff will be sticking with Windows for a long time yet.
"Refactoring" code doesn't make it more secure, it makes it easier to extend and maintain usually. If anything it makes it less secure as you change code that may have been ugly but you knew worked.
You're also mistaken about Netscape and the Mozilla code rewrite. It's common for people who have never seen the old Netscape code to assume that it can't have been that bad, that they could have just refactored a bit and it'd have been great! But no. Really, the Communicator codebase had been taken as far as it could go, and then some more. No amount of refactoring could have ever salvaged that code. It took a long time, but the rewrite happened, and now even Microsoft's own employees use the results.
The final point to remember is that Microsoft are limited in what they can do with Longhorn by:
a) Need for profits/income. Windows and Office upgrades/sales provide a large amount of cash for them. That places an upper limit on how long they can go between releases.
The last one is especially important. If they keep slipping Longhorn by the time it comes out desktop Linux will be beating the snot out of them in the marketplace. Even if Longhorn lives up to all the hype (and Windows releases universally never have) Microsoft still have to make the assumption that people will pay a huge premium for these new features - there's no guarantee of that.
So while it's a nice idea that they'd go in and rewrite or "refactor" as you put it (would amount to the same thing) large chunks of Windows, it won't happen. It would be commercial suicide.
Ok, so bugs in OLE may be exposed in DCOM because it was built on the original framework. I may have the evolution path wrong... but the fact remains that legacy code exists from previous frameworks... exposing more risk than is required.
If you look at some of the attack vectors as of late, including the RPC DCOM vulnerability that killed so many systems... patches for this vulnerability were needed all the way back to NT4. This means this same bug exists in a section of code which is roughly 10 years old. (NT4 shipped in 96... the code was written in 94.) That's not all that shocking... bugs exist in any code base. The issue is that it hasn't been looked at and refactored since Microsoft retrained their developers to address secure programming. This is the point I am trying to get across.
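To make that class of bug concrete (a generic sketch, not the actual RPC/DCOM code): a great many overflows of that era boiled down to trusting an attacker-supplied length field. Here is the missing check, illustrated in Python against a hypothetical length-prefixed message format:

```python
import struct

def parse_message(packet: bytes) -> bytes:
    """Parse a hypothetical length-prefixed message safely.

    The bug class behind many overflows of this era: using an
    attacker-controlled length field without validating it first.
    """
    if len(packet) < 4:
        raise ValueError("truncated header")
    (claimed_len,) = struct.unpack(">I", packet[:4])  # big-endian uint32
    body = packet[4:]
    if claimed_len > len(body):   # the validation old code often skipped
        raise ValueError("length field exceeds actual payload")
    return body[:claimed_len]

# Well-formed message: honest 5-byte length field, 5-byte payload.
print(parse_message(struct.pack(">I", 5) + b"hello"))  # b'hello'

# Malicious message: claims ~4 GB of payload but sends 5 bytes.
try:
    parse_message(struct.pack(">I", 0xFFFFFFFF) + b"hello")
except ValueError as e:
    print("rejected:", e)
```

In C, the unchecked version of that copy walks right off the end of a stack buffer. That one missing comparison is the difference between a parser and a remote-root vulnerability.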
‘Rip and refactor’ doesn't mean to blindly rip it out and remove functionality 'for fun'. A serious look would be needed at what the feature does, and what it impacts. I notice you didn't mention DDE when discussing the IPC timeline. I use this feature because it is such a perfect example. DDE is dead. Has been for a LONG time. Yet I see NEW products still using it to hack out half-vast solutions to problems on platforms now. Why? There are better solutions now, which have undergone code audits and are in active development. Why not capitalize on them? Why not strip DDE completely?
I respectfully disagree on your point that "Refactoring code doesn't make it more secure, it makes it easier to extend and maintain usually.". If done correctly, refactoring code CAN make it more secure, if you know how to apply secure programming principles at the time of rewrite. That would include threat modeling the functionality, learning the impact on the rest of the system and then resulting in a creation of both more manageable code and security tests. On top of that, the developer is forced through Microsoft's new source code management policy to run the code through their tools like prefix/Appverifier to ensure it passes the required tests. I would imagine (and this is just a guess) that they even have the ability to write these new security tests for these tools during the development cycle, making the new fixes also "test" better than their earlier versions.
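A tiny sketch of what I mean (hypothetical code and tests, not Microsoft's actual tooling): the constraints discovered while threat modeling a function become regression tests that run on every build, so the security properties of the refactor can't silently regress later.

```python
def parse_port(value: str) -> int:
    """Refactored input parser: validate before use.

    Constraints from the (hypothetical) threat model: numeric only,
    no shell metacharacters sneaking through, range 1-65535.
    """
    if not value.isdigit():
        raise ValueError("port must be numeric")
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return port

def test_parse_port() -> None:
    """Security regression tests written during the refactor."""
    assert parse_port("8080") == 8080
    for bad in ("-1", "0", "65536", "80; rm -rf /"):
        try:
            parse_port(bad)
        except ValueError:
            continue
        raise AssertionError(f"accepted bad input: {bad!r}")

test_parse_port()
print("all security tests passed")
```

The names and checks are mine, invented for the example; the point is that the refactor and its security tests ship together.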
I can't comment on Netscape’s rewrite past Joel's original comments. I think he clearly explains the growth hurdles that occurred because of the rewrite of Communicator, and the market-penetration loss that Netscape suffered because of it. This very point is why I have changed my thinking toward "refactoring" code instead of scrapping the whole thing. Yet I would like to point out that very FEW people actually used the “new” Netscape 6.0 for some time after the initial release.... until it was refactored in various revisions to fix all the problems that popped up.
There is a fine line between “touching what’s working” and realizing there is a responsibility to rectify insecure and brittle code. Joel had a good comment that “the idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they've been fixed. There's nothing wrong with it.” I agree with him on this right up to the last point. There MAY be something wrong with it, because the evolution of secure coding practices has shown that the mentality applied to the codebase then and now are different. I would imagine that if they were forced to re-evaluate many features, looking at the newer, more stable alternatives that now exist, much would be rewritten or removed.
Your final point about Microsoft’s limitations due to business constraints is an interesting one. I cannot argue that this is a real issue for their company. I would counter it, though, by pointing out that Microsoft has lost a lot of shareholder and PR value due to recent events surrounding their coding practices of the past. Don’t kid yourself: regaining consumer confidence and trust as it relates to security will require definitive action, and Microsoft continues to bear that cost. I have already bulleted some of the positive maneuvers they have taken in this regard. If they don’t do this right and Longhorn as a platform is riddled with holes, Microsoft is going to have a SERIOUS problem maintaining consumer confidence, and will significantly hinder their ability to regain IT manager trust. Attacks on Windows systems are having devastating effects on Microsoft’s customer base in both financial and business loss. Every security researcher, hacker and script-kiddie wanna-be is going to be poking and prodding, looking to make a name for themselves by finding insecurities in the platform. If Longhorn cannot stand up… migration to alternatives (such as OSX server, Linux, etc.) becomes a serious possibility. Why? Because Microsoft has gambled and touted their new practices as the solution to their security nightmare of the past. Meanwhile, platforms like OSX server and Linux continue to gain footholds in that same market.
I don’t think it would be commercial suicide at all to refactor pieces of the system. You might be right that it could be too late… which was my point that if they were going to do it… it would have had to be done now.
Great insights here Mike. I appreciate your comments.
I basically agree with Mike.
Yet, I wish we could see more of a natural progression to reach Longhorn's goals. That is, evolution, not revolution!
Take a look at the various timelines of the Windows OS Products and related technologies: http://www.microsoft.com/windows/WinHistoryIntro.mspx. It's fairly obvious that over the last 4-5 years, new major releases are only coming out at best every other year, and not every year like during the mid-90's.
Don't forget that the main reason for Netscape deciding to "rewrite" and become Mozilla was not really because of unfixable code, but because they wanted to give MS/IE a bloody nose by giving their own source code away. As ESR quotes, "Open Source ... is not magic pixie dust": http://www.catb.org/%7Eesr/writings/cathedral-bazaar/cathedral-bazaar/ar01s13.html
In my opinion, Microsoft would be better off releasing early/often by integrating bits and pieces of Longhorn into service packs for Windows XP/2000/2003. And by open-sourcing some portions (but understandably not all), they can have a good chance of preventing more people from switching to other platforms. And hey, more eyes on the code means higher quality and better security, right? right? If the COM/DDE code had been available years ago, then these vulnerabilities would have been discovered and properly fixed years ago.
You don't have a contact page or an email posted on your blog. Probably a good idea.
Given that you are a security expert, I wondered if you were going to see Eric Bonabeau speak at the O'Reilly Emerging Technology Conference? I think the topic of his talk could be used to evolve the Windows kernel, as well as other subsystems, in the manner you are talking about.
I won't be able to see Eric Bonabeau speak. Very interesting track though. Would love to hear his insights, especially using the Navy war ship issue as an example. (Divide by zero errors stalling a war ship is never a good thing *lol*) Are you going? If you are, please blog about it over at lazycoder. I'll be sure to link to it.
Mike - I have to agree with you on some points. Mainly, Microsoft doesn't care about security. They care about profits, profits, profits! (developers! developers! developers!). I think that Windows could be re-written from scratch and be a better, more efficient, more secure environment.
Yes, it'd require a HUGE amount of work rewriting drivers, and so on, and they'd basically be back where Linux was in 1992 or 1993. IE: starting from scratch. However, due to their weight in the industry, it'd take them less than the 10 years it has taken Linux to get up to speed. I'd be willing to bet that if MS went to the major companies and said "hey, we have a new driver model that'll be in effect in Longhorn in 2 years, here's all the API information and a schedule of our training seminars, provided for free" they'd be able to do it.
But they'd never do it. There was an article a while back linked from somewhere about how the original core of Excel(?) is still in there, even though they planned to re-write it several times, but never did, simply because re-writing from scratch is a big PITA. The deeper reason, I believe, is that it makes no business sense to take a year to re-write a huge app like that, or a huge OS, when you could shoehorn in some better graphics and add some extra features in a few months.
Microsoft, like any big company, is in it for the profit (*cough* I mean, the shareholders, erhm... shareholder profits), and doing a complete rewrite, or ripping out huge amounts of code and removing the functionality that allows 20 year old programs to run, is just not profitable.
Oh, forgot to say, on the other hand, it's not impossible to write a fully new, secure and stable OS with apps, from scratch, as Mr. Torvalds and the folks working on Linux and its apps have shown.
You make the word "profits" sound like a dirty word. It's not. It is part of business. Without it... Microsoft couldn't progress anywhere.
I think it's this very issue that has Microsoft in a corner right now. If they wish to grow the company, they not only need to retain existing customers in an upgrade path, they have to get NEW business. This is hard to do... compound mathematics isn't in their favour. Considering that they want 10% growth every year... you do the math.
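OK, let's actually do the math. With a made-up starting revenue figure (the point is the compounding, not the number), sustaining 10% annual growth looks like this:

```python
# Hypothetical starting revenue in $B -- an illustrative figure,
# not Microsoft's actual financials.
revenue = 30.0

for year in range(1, 11):
    revenue *= 1.10  # 10% compound growth, year over year
    print(f"year {year:2d}: ${revenue:6.1f}B")

# After 10 years of 10% growth, revenue must more than double
# (30 * 1.1**10 is roughly 77.8) -- and in a saturated desktop
# market, all of that has to come from somewhere.
```

Each year's target is bigger in absolute terms than the last, which is exactly why retaining existing clients isn't enough on its own.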
My comments on the refactoring go to retaining existing clients and attracting new ones. If security continues to plague Microsoft products, trust can never be restored in the customer's eyes. However, it's a fine line when investing resources to do this.
Microsoft is in the business of making software. There is nothing wrong with profiting from it... but they also must be responsible to their customers to make sure their products work as intended. With so many Microsoft clients continuing to suffer mounting business and financial loss due to issues on Microsoft's platforms, Microsoft must strive to mitigate these risks... or fear losing customers.
No, nothing wrong with profits, but my point was that profits are what is driving them as a company, NOT making a better and more secure product. Right now they are in a position where they don't have to do these things because they have a monopoly position in the market. They make $x billion a year simply due to people buying new computers. These systems are (from my small exposure to a new computer with Windows pre-installed) vulnerable and susceptible to most of the exploits and worms out there in the wild. However, they still get sold, and MS still makes money off them.
If the people of the world said "no, we're not going to buy any more windows apps or OSs from MS until the security is where it should be" the story would be different. Just because they don't grow by 10% a year doesn't mean that Bill will be buying foodstamps anytime soon of course :)
Personally I'd be in favor of a complete refactoring, or a start from scratch for Windows. Get rid of the old, start right. If people can't use their legacy 16-bit apps, then they can stick with XP, which will run them. Split the company into a "new" and "old" division, one in charge of keeping up with XP security patches and service packs while the other works on a completely new (from the ground up, no legacy support) OS. Educate users on how to use firewalls and firewall devices so that if they "have" to use their old apps on XP years into the future, when the "old" division is disbanded, they have protection. Encourage the vendors of the old apps that people are using to release new products, or point these people to other products that can help them. Tell people that things are going to break. If NewOS's browser is going to screw up FoobarCo's operation and BigCo is going to call Bill and bitch, make sure that FoobarCo is educated and trained ahead of time so that the NewOS version works properly and doesn't break.
But do you think that there's a hope in hell of something like this happening? I don't think so (not a full ground-up rebuild anyway, or throwing away insecure legacy support). Why? If it's not because it's unprofitable, what would the reason be? Is a redo the "right" thing to do? Yup, you and I agree on this :)
I didn't go to the ETech conference. Cory Doctorow has posted his running notes on Dr. Bonabeaus talk at Boing Boing. I'm pointing to them and posting a copy of them at my site as well.
I'd love to hear from some other attendees who went to his talk and get their impressions.
Of particular interest to me is his investigation of "Google Bombing".
"If the COM/DDE code had been available years ago, then these vulnerabilities would have been discovered and properly fixed years ago."
This is a fallacy. FOSS such as sendmail and bind continue to have bugs found in them after many years. Having the source available doesn't mean that highly skilled, motivated, benevolent people will actually take the time to look at it, find the bugs, patch them, test the patches, and release them to the world.
Ever hear of Sardonix? Or why it failed? http://www.securityfocus.com/columnists/218
As for Microsoft
Ooops. Was starting to make another comment but decided against it. Nothing to see here; move along.