November 22, 2005
The Cost of Fixing Bugs and How Irresponsible Disclosure Doesn't Help the Matter
First off, a disclaimer. I am not a Microsoft employee, nor do I ever expect to be one. The views in this post are speculative at best, and downright wrong at worst. I have no real idea of how Microsoft makes its decisions when it comes to its software and security development life cycles, and I base my assumptions on what I know and what I have seen in the industry. With that disclaimer out there, let me start a discussion on what really drives how and when bugs get fixed, and how security considerations come into play.
In case your head is in the sand and you didn't realize it, software companies are BUSINESSES. That's right. They are in business to make MONEY. Don't be shocked by this. Don't dwell on it. Accept it as a fact of life. Hopefully the software companies you work with day to day try to build quality products that solve real pain points for you. If they didn't, you probably wouldn't care much, and would have very little vested in their success, as they aren't doing anything to help you (at this time, anyway).
I could dig deep into the economics of running a software business, and how software is (and always will be) shipped with bugs, but I don't have to. Eric Sink wrote an excellent article called "My life as a Code Economist." In it, Eric makes an interesting point:
The six billion people of the world can be divided into two groups:
1. People who know why every good software company ships products with known bugs.
2. People who don't.
Those of us in group 1 tend to forget what life was like before our youthful optimism was spoiled by reality. Sometimes we encounter a person in group 2, perhaps a new hire on the team or even a customer. They are shocked that any software company would ever ship a product before every last bug is fixed.
I will let you read his excellent blog to learn more about how he came to that conclusion, but would like to pull an important aspect from his post on the matter. There are four questions that a developer needs to ask themselves about every bug they are faced with:
- When this bug happens, how bad is the impact? (Severity)
- How often does this bug happen? (Frequency)
- How much effort would be required to fix this bug? (Cost)
- What is the risk of fixing this bug? (Risk)
Questions One and Two are about the importance of fixing a bug. Questions Three and Four are about the tradeoffs involved in fixing it. You need to consider all four when deciding what the right thing to do is for the customers using that product.
So what does this have to do with security and Microsoft? Well, Questions One and Two are covered off on Eric's site as well as articles such as Hard-assed Bug Fixin' by Joel Spolsky (Joel On Software). One of my favorite quotes from Joel was in his bug fixing article when he pointed out that:
Fixing bugs is only important when the value of having the bug fixed exceeds the cost of the fixing it.
Remember that quote. We will be coming back to it.
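To make Eric's four questions and Joel's rule concrete, here is a minimal Python sketch. The 1-to-5 scales, the multiplication used to combine them, and the `Bug` fields are hypothetical choices made for illustration; they are not how Eric, Joel, or Microsoft actually score bugs.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    # Hypothetical 1-5 scales; none of these numbers come from a real triage process.
    name: str
    severity: int   # Q1: how bad is the impact? (1 = cosmetic, 5 = code execution)
    frequency: int  # Q2: how often does it happen? (1 = rare, 5 = constantly)
    cost: int       # Q3: how much effort to fix? (1 = trivial, 5 = huge)
    risk: int       # Q4: how likely is the fix to break something else? (1 = low, 5 = high)


def value_of_fix(bug: Bug) -> int:
    """Questions One and Two: how important is it to fix this bug?"""
    return bug.severity * bug.frequency


def cost_of_fix(bug: Bug) -> int:
    """Questions Three and Four: what are the tradeoffs of fixing it?"""
    return bug.cost * bug.risk


def worth_fixing_now(bug: Bug) -> bool:
    """Joel's rule of thumb: fix it only when the value exceeds the cost."""
    return value_of_fix(bug) > cost_of_fix(bug)


if __name__ == "__main__":
    dos = Bug("crash on a rare page", severity=2, frequency=2, cost=3, risk=4)
    rce = Bug("same bug, now code execution", severity=5, frequency=4, cost=3, risk=4)
    print(worth_fixing_now(dos))  # False: 4 is not greater than 12
    print(worth_fixing_now(rce))  # True: 20 is greater than 12
```

With these made-up numbers, the same expensive, risky fix waits while the bug looks like a rarely hit crash, and clears the bar once its severity is re-rated as code execution.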
According to Joel, in the early nineties there was a financial reorganization at Microsoft under which each product unit was charged for the full cost of all tech support calls. So the product units started insisting that PSS (Microsoft's tech support) provide lists of Top Ten Bugs regularly. When the development team concentrated on those, product support costs plummeted. So bugs started to be prioritized by the cost impact to the product unit.
But something changed in the last few years. When it came to security, it wasn't about the COST of fixing the bug as much as the criticality and severity of the bug. It seems Microsoft reorganized its Security Bulletin Severity Rating System at the Microsoft Security Response Center to align with how and why bugs got prioritized within the product groups. The more critical a bug was, the higher its priority in the grand scheme of things.
Now let's look at the Microsoft Internet Explorer "window()" Arbitrary Code Execution Vulnerability that is running around on the web. In my last post, entitled "Again with the Irresponsible Disclosure - 0-day IE exploit in the wild", a lot of my readers commented that Microsoft had 6 months to fix the original bug, and that it was OK that some security researchers acted irresponsibly and posted a NEW attack vector, built on this original bug, without informing Microsoft.
Six months is a long time. No doubt about it. At first glance there seems to be no excuse for WHY Microsoft didn't have a fix for the original denial of service bug by now. Unless we consider what goes into fixing a security bug.
Most people who consider themselves technically savvy (and may even be developers) can't fathom how hard it can be to make what looks like a simple change to the code base. Typically, that flawed arrogance comes from never having worked on a code base with millions of lines of code, or never having had to test against as many deployment scenarios as Microsoft does.
Back in June eWeek ran an excellent interview with Microsoft's MSRC program manager, Stephen Toulouse (Personal blog, Company Blog). There was a piece in that article which I think has some bearing on this discussion.
In some cases, particularly when the Internet Explorer browser is involved, the testing process "becomes a significant undertaking," Toulouse said. "It's not easy to test an IE update. There are six or seven supported versions and then we're dealing with all the different languages. Our commitment is to protect all customers in all languages on all supported products at the same time, so it becomes a huge undertaking."
"This is exactly why it can take a long time to ship an IE patch. We're dealing with about 440 different updates that have to be tested. We have to test thoroughly to make sure it doesn't introduce a new problem. We have to make sure it doesn't break the Internet. We have to make sure online banking sites work and third-party applications aren't affected," he added.
Internet Explorer updates are also cumulative, meaning that they address several newly discovered vulnerabilities and all previously released patches, causing even more delays when the new fixes are bundled into older updates.
"This is why it takes so long, but that's not to say that if there's an exploit, we won't accelerate testing and get it out there as fast as we can. But if we find problems in the testing phase, it could trigger a restart and cause even more delays," Toulouse said.
Think about that for a second. A single code change has to go through all that. It could take weeks... even months, depending on the scope and impact of the change. And to top it off, Microsoft is a business that has to weigh everything accordingly. Remember the four questions Eric Sink brought up? Severity and frequency help determine how urgently the fix is needed, while the cost of producing the fix, weighed against the risk of breaking something else, means a lot more investment must go into it than you would originally think. If Microsoft ranked the DoS of the original bug as Low or Moderate, the fix may not see the light of day right away, as other bugs will take precedence.
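One way to picture how a Low rating keeps a bug waiting, and how a later escalation to Critical lets it jump the queue, is a priority queue keyed on the MSRC severity ratings. This is a minimal Python sketch using `heapq` with lazy updates; the bug names are hypothetical, and nothing here describes Microsoft's actual internal process.

```python
import heapq

# MSRC severity ratings, most urgent first.
RANK = {"Critical": 0, "Important": 1, "Moderate": 2, "Low": 3}


class FixQueue:
    """Bugs waiting for a fix, ordered by rating, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: earlier reports first within a rating
        self._rating = {}  # current rating of each queued bug

    def add(self, bug, rating):
        self._rating[bug] = rating
        heapq.heappush(self._heap, (RANK[rating], self._counter, bug))
        self._counter += 1

    def escalate(self, bug, new_rating):
        # Lazy update: push a fresh entry; the stale one is skipped when popped.
        self.add(bug, new_rating)

    def pop(self):
        while self._heap:
            rank, _, bug = heapq.heappop(self._heap)
            current = self._rating.get(bug)
            if current is not None and RANK[current] == rank:
                del self._rating[bug]
                return bug
        raise IndexError("pop from an empty FixQueue")


if __name__ == "__main__":
    q = FixQueue()
    q.add("other bug A", "Important")
    q.add("other bug B", "Moderate")
    q.add("IE window() DoS", "Low")
    q.escalate("IE window() DoS", "Critical")  # the new attack vector changes the rating
    print(q.pop())  # IE window() DoS
```

Without the `escalate` call, the hypothetical DoS bug would come out last, after every Important and Moderate item ahead of it.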
Remember Joel's quote about how fixing bugs is only important when the value of having the bug fixed exceeds the cost of fixing it? Can you imagine how much "cost" is now associated with this bug, now that its severity has probably been elevated to Critical at Microsoft? We can probably expect a fix soon. I bet it has jumped the queue and been reprioritized as something major. Is this evidence that the security researchers were the catalyst by releasing the PoC exploit, proving that responsible disclosure isn't required? I don't believe so, and let me tell you why.
The potential financial impact of this action goes beyond the cost to Microsoft of fixing the bug. Or the cost to Microsoft of testing it. Or the cost to Microsoft of losing customer trust. The real financial impact could end up falling on us, the end users of the product. One reader stated in my last post that "Now it's all out in the open, at least we know how to counter the threat, and perhaps now Microsoft can finally give it the proper due care and attention it deserves." There may be a flaw in that thinking. On average, the impact on real-world businesses could end up being more severe. If a malicious payload using this new attack vector spreads across the Internet, it can have a huge financial impact on businesses as they pay to repair the damage. Labour costs, lost productivity and lost credibility could make the disclosure do more harm than good. I don't see that as a benefit. Do you?
What about the fact that maybe now "Microsoft can finally give [the bug] the proper due care and attention it deserves"? Well, if the researchers had reported the NEW attack vector for this bug to the Microsoft Security Response Center, I can pretty much guarantee (based on Microsoft's history over the last year) that Microsoft would have taken swift action, first to reprioritize the bug and second to mitigate it. But Microsoft didn't get the chance. As a DoS bug, maybe Microsoft felt the value of having the bug fixed did not yet exceed the cost of fixing it. Maybe there is a plethora of more critical bugs that need to be fixed first. I don't know. I don't work there. But I do know this... the potential financial impact NOW, thanks to the UK security group, is much higher than it was a few days ago. Their actions have exposed our environments to real risks. As a whole, we didn't gain from this. Microsoft certainly didn't. But the UK security group did.
Oh wait... wasn't that their intent in the first place?
Since they gain from it... should they also be liable for any damage it causes? Or should Microsoft? Uh oh... now we are getting into liability and the software industry... an entirely different can of worms, and one beyond the scope of this discussion.
So what exactly am I saying? That it's OK that Microsoft didn't fix the original bug in a timely manner? Nope. Far from it. My point is that fixing bugs has a cost for any software business, and for every business that uses software. We need to understand that it doesn't happen in a vacuum. Next time you want to yell at Microsoft (or any software vendor for that matter) for not fixing bugs, ask yourself what the REAL cost is to you if they don't ship you a patch right away. Or the cost to you if they ship you a shoddy one. Consider the current risks to your business because of the flaw, and the financial impact the flaw actually has on you. Now consider irresponsible disclosure, and security researchers releasing exploit code that can assist in the creation of malware before the vendor has a chance to make a patch. How does that HELP you? If anything, it may accelerate the release of a patch. But as we saw earlier in the eWeek interview, that pressure might actually help produce an ineffective patch. And that is no good.
Irresponsible disclosure just doesn't help the situation. We should not be so willing to accept it as being an acceptable practice.
Posted by SilverStr at November 22, 2005 09:16 PM
I am curious - if irresponsible disclosure doesn't help the situation, how does responsible disclosure help it?
Responsible disclosure does a few things. Firstly, it gives the vendor time to research and fix critical flaws in their software and get the fix to their customers after testing can be properly completed. It can also keep vendors on their toes, as security researchers who follow a standard disclosure timeline help keep the vendor honest by holding them to an acceptable timeline. That is, of course, if the vendor acts responsibly and follows disclosure timelines themselves. In a moment I will show how Microsoft isn't the best at this.
Further to this, it allows the credit to still go to the original authors of the report. It shows the industry that the researchers not only know their stuff, but respect the fact that responsible disclosure gives everyone time to brace for the fix/patch.
Responsible disclosure also allows us to measure vendor responses. I like how eEye does it with their Upcoming Advisories. If you check it out, you can see that Microsoft isn't doing so well in this regard. This is where Microsoft should be getting yelled at. Notice how some items are past the normal 60 days and are rated High. A window of exposure exists for these, but the world isn't exploiting these particular vulnerabilities because it doesn't know about them. Now imagine if eEye decided to blast out PoC code for all of these items. Can you imagine the damage to the business infrastructure?
Let's focus on solving problems instead of creating them. We should be screaming at Microsoft for leaving a critical IE flaw they have known about, such as EEYEB-20050505, overdue by 140+ days. Not one that is 2 days old. Remember, they only learned about the new one thanks to the UK security group releasing a PoC exploit for the world to see. Now they have to focus on fixing that issue, and have probably stalled on EEYEB-20050505.
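The arithmetic behind "overdue by 140+ days" is simple enough to sketch. This Python snippet assumes, as a labeled guess, that the digits in an eEye ID such as EEYEB-20050505 are the YYYYMMDD date the flaw was reported, and it uses the 60-day window mentioned above; the helper names are hypothetical.

```python
from datetime import date


def reported_on(advisory_id: str) -> date:
    """Parse the date out of an ID like "EEYEB-20050505".

    Assumption: the digits after the dash are the YYYYMMDD report date.
    """
    stamp = advisory_id.split("-", 1)[1]
    return date(int(stamp[0:4]), int(stamp[4:6]), int(stamp[6:8]))


def days_open(reported: date, today: date) -> int:
    """Whole days between the report and `today`."""
    return (today - reported).days


def days_overdue(reported: date, today: date, window_days: int = 60) -> int:
    """Days past the disclosure window; 0 if still inside it."""
    return max(0, days_open(reported, today) - window_days)


if __name__ == "__main__":
    reported = reported_on("EEYEB-20050505")
    posted = date(2005, 11, 22)  # the date of this post
    print(days_open(reported, posted))     # 201
    print(days_overdue(reported, posted))  # 141
```

Under that assumption, the figures line up with the text: about 200 days open, and roughly 140 days past a 60-day window.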
But you are overlooking one important point!
Microsoft never responded to the original DoS vulnerability; clearly, it was never on the cards to be fixed.
Several security aggregators, including Secunia, rated this vulnerability as low.
By Microsoft's own admission, they do not perceive localised application DoS vulnerabilities as a threat. FACT!
If you read Microsoft's advisory, this issue was originally labelled a stability problem, not a security threat.
Only DoS vulnerabilities that have an adverse effect on the stability of the operating system or service warrant a security bulletin. Fundamentally, this issue was overlooked not only by Microsoft, but by the rest of the security community.
The way I see it, there are three possible scenarios:
1) The security researchers don't tell anyone, nothing gets done about it, and we are all no worse off. Except: what if this vulnerability is already being exploited in the wild? (6 months is a long time.)
2) Microsoft are informed, but we still have to wait another 4-6 months for a patch. Again, what if this vulnerability is already being exploited in the wild?
3) It is all brought out in the open; users know of the risk and how to implement a temporary solution while Microsoft fixes the issue.
This is a no brainer; it has to be option 3.
For me this vulnerability is more than just your regular Full-Disclosure zero-day exploit scenario. In fact, you can't label the posting of the proof of concept a zero-day; it's more like a 180-day.
What Richard said!
I also notice your constant use of *NEW* to describe the code execution attack vector. This is flawed thinking.
It is Microsoft's responsibility to take the *SIX MONTH OLD* warning, investigate the true severity and determine that it's not merely a DoS issue. They've had six months to do that, so this cannot be described as new.
Again, sorry to harp on this, but the irresponsible party here is Microsoft.
"it gives the vendor time to research and fix critical flaws"
Why would you set a timeline for fixing flaws if you want vendors to have more time? Wouldn't you be better off with no disclosure at all?
"It can also keep vendors on their toes"
Are you shooting for perfect software? If not, what level of quality is appropriate? Why should vendors be kept "on their toes" and what other ways exist to do this?
"but the world isn't exploiting these particular vulnerabilities as they don't know about them"
Why do you think the world doesn't know about them? Don't you think it is more rational for anyone who believes this to prefer that nobody ever disclose any vulnerabilities?
Are we mixing up the original DoS with the new code execution ability of the flaw? With the information present at hand back in May, Microsoft and all the security sites rated it as low. According to the rating system Microsoft uses, the original report may fit that category.
To your comment of "Fundamentally, this issue was overlooked not only by Microsoft, but the rest of the security community.", you are correct. To be fair though, they based it on what they knew at the time. The real disservice that occurred here was that during analysis of the report, Microsoft missed the code execution threat. I would assume that the threat model they performed on this piece of IE was not completed to the depth it should have been, which gets us to where we are today.
I still don't agree that it was right to release the exploit code into the wild without even attempting to follow proper disclosure rules. It put us all at more risk than we need to be at.
Maybe my thinking is flawed. You are right that Microsoft should have come to the realization that this bug has a larger damage potential than originally thought. However, we can't think for them. It's not always easy to spot everything. We are all human. They apparently missed this. Their mistake. And one they will have to own up to.
With that said though, my point is that when the bug was thought to be a DoS issue, it was rated as Low. Microsoft never got a chance to deal with the *new* (yes I said new) attack vector which elevates this rating to Critical. If the UK security group wishes to claim that they are a legitimate security firm interested in protecting their current and potential clients, they should have done the responsible thing and informed Microsoft of their findings so they could work on a fix to get everyone protected before exploit code hit the Internet.
This didn't happen, and now people are scrambling to deal with it at extra cost and burden to their business.
"Why would you set a timeline for fixing flaws if you want vendors to have more time? Wouldn't you be better off with no disclosure at all?"
No, I don't think so. We wouldn't be better off. Without knowing the threats we are susceptible to, we cannot build secure systems. I don't disagree with Richard and black_cat that people have a right to know they are at risk. But this has to be balanced against the entire business ecosystem, not just a few people. In other words, we need to weigh what the public needs to know now versus what happens when everyone knows. This is a common dilemma at the national security level, and it's really no different here, just at a different scope. The timeline gives the vendor a window to make a fix and push it out before the rest of the world has to deal with the issue on any major scale.
Let's face it. If we profile the adversaries of our systems, we quickly see that in most cases it's the lazy script kiddie who takes these exploits and launches them to cause significant damage. For fame, glory or what have you, their curious intent hurts us all. Will sophisticated hackers in the underground have heard about this attack vector before the rest of the world, and used it to their advantage? Probably. But are the masses typically at risk to this? Probably not.
Why is that the case? Because without an asset of interest, the professional hacker won't waste his time on you. And if you follow a defence in depth posture, you will probably have mitigated this threat anyway. They aren't going to go nail grandma's machine... there is very little in it for them. And they probably aren't going to go and nail you, unless you have assets they are interested in.
"Are you shooting for perfect software? If not, what level of quality is appropriate? Why should vendors be kept "on their toes" and what other ways exist to do this?"
There is no such thing as perfect software. There will always be flaws. I've said this before, and I will say it again. Security is a property supported by design, operation, and monitoring... of correct code. When the code is incorrect, you can't really talk about security. When the code is faulty, it cannot be safe. The appropriate level of quality is the point at which the code behaves correctly, as intended. And that includes behaving correctly in its error code paths when something doesn't go as expected. I don't know how to keep the vendors on their toes other than by challenging their code through disclosure. Some vendors will ignore it. Others will take it to heart and make the effort to fix the flaws. But by not telling them, how will they ever know?
"Why do you think the world doesn't know about them? Don't you think it is more rational for anyone who believes this to prefer that nobody ever disclose any vulnerabilities?"
Far from it. As I said earlier, I think everyone has a right to know about the vulnerabilities. I am just asking that we act responsibly and give vendors enough time to deploy fixes, and users enough time to apply them. As an example, I would argue that eEye has every right to keep Microsoft on their toes by disclosing more information to the public on issues such as EEYEB-20050505. Although I would refrain from releasing PoC code that could help attackers, I think it would be relevant to let us know WHAT the problem is, and any mitigation strategies that could be used. Microsoft has had 202 days to come out with a fix for that one. Unless someone challenges them on this, they will continue to abuse the fair disclosure rules.
This is very simple, given all the unique circumstances surrounding this issue, the information had to be fully-disclosed. Keeping it under wraps would have helped no one.
Considering this vulnerability has been in the public domain for 6 months, you have to assume it has already been exploited (always look at the worst-case scenario).
Again, reiterating one of my previous points, informing Microsoft first would have prolonged our exposure, perhaps by another 4-6 months.
Far too many people are making the incorrect assumption that it is only now that we are all vulnerable. NOT SO.
We have been vulnerable since the original disclosure of the DoS vulnerability 6 months ago. I'm sure the British researchers are very talented, but you are not telling me that they are the only ones who discovered a way to exploit this problem.
Fundamentally, there is no security through obscurity.
Hi, Dana - thanks for your indulgence in answering my questions. A few more:
"Without knowing the threats that we are susceptible to, we cannot build secure systems."
Don't we already know what threats we are susceptible to? Is each individual vulnerability really that important? If so, how many do we have left to find?
"we need to weigh what the public needs to know now, versus what happens when everyone knows."
Why does the public need to know now? How is this different from the 140-day old vulnerability that eEye is tracking or the x-day old vulnerability that only the black hats are aware of?
"in most cases it's the lazy script kiddie that takes these exploits and launches them to cause significant damage."
It seems like you are suggesting that the only point where the masses are at risk comes after we disclose vulnerabilities ("But are the masses typically at risk to this? Probably not."). If this is the case, shouldn't we be seeking a situation where there is lower risk?
"But by not telling them, how will they ever know?"
Aren't we mostly concerned about incidents? Wouldn't the incidents speak for themselves and operate as a much stronger incentive for vendors?
"I think everyone has a right to know about the vulnerabilities."
Can we know about ALL vulnerabilities? Don't we already know about enough of them? How much is enough? When are we done?
One final question: Do you really need to know about specific vulnerabilities in order to mitigate them?
Again, thanks for the answers.
Two quick things:
1) Open-source projects often involve multiple businesses, and people who are not motivated by profit or anything else easily definable, which makes this discussion relevant only to large companies with proprietary code.
2) I don't think the binary division of the world's population into "People who know why every good software company ships products with known bugs" and those who don't is a fair treatment of the 4 billion poor, repressed and sometimes starving tech-illiterate.
What about rootkits? They have been documented for over a year. What has Microsoft done about it? I mean, aside from putting every single PC on earth at risk because MS doesn't bother to release a rootkit detection and removal tool?
Rootkits have been around for more than a year. Attackers are just now being more aggressive about using them. What has Microsoft done about it? In Vista, you can no longer patch the syscall table the way you can today, making it much more difficult to override the default system calls that rootkits use to their advantage.
Also, one must be careful about the current scope of what a 'rootkit' is. A LOT of the security tools out there now (including some of mine) use these same techniques in their architecture. In other words, we share traits that could make us look like a rootkit. In the kernel, it's far too easy to blur the lines between good and bad intent. Microsoft can't arbitrarily decide whether an AV filter or host IPS is good or bad, and can't simply make judgement calls to remove it. In Vista, they stepped back and said all vendors will now have to find other ways to do this. Where possible, they have tried to provide kernel APIs that give the needed functionality (such as the registry callback routines instead of hooking the registry calls directly), while limiting the impact of malicious code.
Also remember... the code base on which rootkits are prevalent was written years ago. All the REAL security changes going on at Microsoft won't be seen until things like Vista come to market. Windows Defender (aka Microsoft AntiSpyware) is just now getting signatures for known rootkits, which it can now remove. As an example, I believe it can now remove the Sony DRM crap. It takes time. It won't be done overnight.
I'm only an amateur at security testing (while being a professional developer), but I do know that an exploit that crashes a client application might indicate a remote-code-execution vulnerability. For those less aware, here's the reason. Let's assume the vulnerability is actually remote code execution, but the application crashes rather than executes some exploit-provided code. What's happened is that the part of the exploit input that was executed either contained an illegal instruction (some sequence of bytes that the processor does not recognise), accessed some part of memory that was protected against that access (tried to write to read-only memory, or read from or write to an address that wasn't allocated at all), or corrupted some part of internal program state such that a later part of the program then caused one of the former two actions to occur.
So Microsoft - and anyone else - should be considering client-side DoS vulnerabilities seriously, as if it were actually a remote code execution. Therefore I'd expect the patch to actually be written. But it still might not be _released_. Why not release a patch? Any change to code can cause errors, as Eric Sink pointed out. You want to keep your patch in testing as long as possible. If no serious exploit for the vulnerability appears, you simply roll the patch into the next main release of your software (I note that IE 7.0 Beta 1 doesn't seem to be vulnerable to this issue), or you might roll it into a forthcoming cumulative update. When the more serious exploit is reported, you can then release the patch with confidence that it's as well tested as you can make it.
I can't fault M$ for shipping buggy software, for miscategorizing a reported issue as insufficiently dangerous, or for being blindsided when researchers developed a way to escalate the bug into a full-on exploit- and released the code.
However, I can't let them off the hook for allowing the business need to destroy Netscape to override good design. It's clear that many of the nasty IE exploits and bugs are because IE takes a privileged position in the operating system. It's equally clear that IE was put there to forestall the antitrust case. Had it been broken out as a standalone program, capable of running under a different security context, the localization and versioning issues would have been significantly reduced.
M$ screwed over tens of thousands of users- anyone who has ever had a spyware infection loaded in through an activeX control- because they were playing semantic games with the legal system. And in that context, I have a hard time feeling much sympathy for the world's richest software company when the inevitable bill comes due, and M$ has to go through a larger testing regime.
I have an issue with this idea:
>>They aren't going to go nail grandma's machine... there is very little in it for them.
because I'm afraid that's simply wrong, and suggests that contemporary threats aren't being understood.
Grandma's machine is PERFECT, exactly the kind of target most desired -- especially if Grandma is well-off enough to afford a broadband connection. She's not informed or savvy enough to realize her machine's been zombified to distribute spam, and participate in DoS attacks. She probably doesn't even have, let alone regularly run detection software, and wouldn't know what to do about it if she did. The script kiddies may not be too interested in her machine, but for the serious, malicious hacker she's exactly what they want.
She, herself is in no danger. If the hacker has his way, her machine will continue to work quietly and well for her, never giving any hint that it has been suborned, for weeks, months, even years before anyone notices. It's the net as a whole that will suffer.
From that perspective, the severity looks a little different, no?
I really dislike the practice of evaluating a defect's "severity": that's a coder's word, a techie word, and glosses over too much.
Instead, change to the customer's perspective, and use the dual words, "pain", and "effort" -- how much trouble does it create for the customer (even unknowingly) and how hard is it for the customer to work around or do without? (Or, if you like, how much does this encourage the customer to try the competition's product instead?)
If you stop asking "what's the severity?", and ask instead, "how much pain does it cause?", you get a lot closer to understanding the real impact of a defect, and a whole other perspective on what your priorities ought to be.
Tim, you are completely and utterly wrong. IE does _not_ have a privileged position in the operating system. It is purely user-mode code. However, since the vast majority of the code takes the form of published APIs, other teams at Microsoft, and third-party developers, have made use of that code - as Microsoft intended. This does mean that errors in the rendering engine, HTML and URL parsing and HTTP libraries have a greater impact than might have been the case had Microsoft built Internet Explorer (as they did for versions 1.0 and 2.0) as a monolithic program. A vulnerability has the potential to impact many more programs - however, since the libraries are system libraries and therefore not installed side-by-side, Microsoft's single patch should in theory fix the vulnerability in all those programs.
As always when using system libraries, third-party developers are somewhat at the mercy of the system developer when it comes to fixing vulnerabilities - and in resolving compatibility issues with patches.
Don't put the cart before the horse. Internet Explorer 3.0 was componentized because there was plenty of functionality in there that made sense to expose as platform features; I've used WinInet numerous times, mainly for FTP but on a couple of occasions for HTTP. Windows 98's HTML Help made use of IE's rendering engine because Microsoft wanted a richer experience for help, with the ability to use more standard tools - the previous generation of Help used RTF files - that could more easily produce both standalone help and online help on the Web.
Consider two claims:
1) Every security issue should have been disclosed immediately!
2) Every product has unknown number of bugs, some part of them are security issues
Together, they lead to a simple result: we cannot use the Internet anymore...