July 12, 2004

Office 2003/XP Add-in: Remove Hidden Data

If you have some time to blow one day, go to Google and search for some interesting files. During some competitive research one day, I typed something like:

filetype:xls inurl:sales

It ended up giving me sales forecast information from various companies that had misconfigured their servers. I refined the search further with inurl, and was even able to pick up some interesting research in my industry.

Why do I tell you this? Well, although the power of Google made access to the document possible, it was a weakness in the document format that gave me the most competitive intel. Office documents routinely carry extra information that the publisher may not have intended to share. You can see who worked on the document. With tracked changes you can even see what has changed over time. In my case, I was able to get a sales forecasting chart that included demographic details, with comments in particular cells of interest.
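To make the kind of leak described above concrete, here is a minimal Python sketch that lists some of the extra information a document can carry. It rests on assumptions that are not part of the original post: it targets the newer ZIP-based Office formats (.docx, .xlsx, .pptx), not the binary Office 2003/XP formats discussed here, and the function name find_hidden_data and its report keys are hypothetical. The tracked-changes check is a crude byte search that assumes the conventional "w:" namespace prefix, so treat it as a hint and not a guarantee.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in ZIP-based Office packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def find_hidden_data(path_or_file):
    """Return a dict of metadata that may leak from a ZIP-based Office file.

    Hypothetical helper for illustration only; it inspects, it does not clean.
    """
    report = {
        "creator": None,
        "last_modified_by": None,
        "comment_parts": [],
        "has_tracked_changes": False,
    }
    with zipfile.ZipFile(path_or_file) as z:
        names = z.namelist()

        # Author names live in the core properties part, when present.
        if "docProps/core.xml" in names:
            root = ET.fromstring(z.read("docProps/core.xml"))
            report["creator"] = root.findtext("dc:creator", namespaces=NS)
            report["last_modified_by"] = root.findtext(
                "cp:lastModifiedBy", namespaces=NS
            )

        # Comments are stored in their own parts, e.g. xl/comments1.xml.
        report["comment_parts"] = sorted(
            n for n in names if "comments" in n.lower()
        )

        # Heuristic: Word marks tracked insertions/deletions with w:ins / w:del.
        if "word/document.xml" in names:
            body = z.read("word/document.xml")
            report["has_tracked_changes"] = b"<w:ins " in body or b"<w:del " in body
    return report
```

Calling find_hidden_data("forecast.xlsx") on a file of your own (the filename is only an example) before sending it out gives a quick view of what a recipient could read beyond the visible cells.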

Scared? You should be. When you send an Office document electronically, are you sure there are no extra information disclosure issues? Is proprietary information being leaked to competitors who take the time to do this sort of research? This is one of the primary reasons our office policy is to only ship out PDF documents where possible. It removes such risks.

Well, Microsoft acknowledges this potential disclosure issue, and has come out with a solution. You can download the Office 2003/XP Add-in: Remove Hidden Data, which, as the name implies, removes hidden data from Office documents before you distribute them. It works not only on Excel files, but also on Word, PowerPoint, etc.

This is a great tool to have if you are distributing Office documents outside of the office. Consider downloading and installing it today!

Posted by SilverStr at July 12, 2004 07:50 AM
Comments

Good morning Dana.

I am quite surprised no one has commented on your post. Without doubt, the subject of "corporate" data being indexed by searchbots, although interesting, is commonly undervalued.

Lots of admins, for whatever reason, decide to post corporate information (not intended for third-party viewing, or marked with a privacy indicator, etc.) on public sites without taking into account the possibility of such content being indexed by searchbots (robots.txt misconfiguration, rogue links pointing to indexable private content, etc.).
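The robots.txt misconfiguration mentioned above can be checked before a crawler finds it. The following is a small sketch using Python's standard urllib.robotparser; the helper name is_crawlable, the example.com URL, and the /private/ path are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: it parses robots.txt content passed in as a string,
    so no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# What the admin meant to publish.
intended = "User-agent: *\nDisallow: /private/\n"

# What was actually published: a one-letter typo in the path.
typo = "User-agent: *\nDisallow: /privat/\n"

url = "http://example.com/private/forecast.xls"  # made-up URL
blocked_as_intended = not is_crawlable(intended, url)
exposed_by_typo = is_crawlable(typo, url)
```

Keep in mind that robots.txt is only a request to well-behaved crawlers, not access control. Anything that must stay private needs authentication or should not be on a public server at all.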

I think you have hit upon quite an interesting subject, and I hope you can dedicate further investigation to it in your blog :)

Indexed Office documents are but a glimpse of the risks posed by misunderstanding searchbot technology. Let me show you some Google "queries" so everyone can skim the myriad of problems around this:

inurl:tsweb
"corporate presentation" filetype:ppt
"confidential" filetype:pdf
pwd filetype:txt
intitle:log filetype:log site:.com.ar
etc

The underground community has been aware of this for years, and has been using it for more, such as:
- footprinting
- information scavenging

Wild guess: the hacker "Lamo" could get into systems using only information found through Google... (was this the name of the hacker who claimed he only needed a browser and an internet connection to hack an enterprise site?)

Well, I hope this post is an eye opener.

Mario

Posted by: Mario e. Santoyo at July 15, 2004 07:22 AM

Hi!
Very interesting. It inspired me to blog about Office metadata, tracked changes and XML. My blog is in Swedish, so it may be a bit hard to understand. :-) Anyway, thanks for an interesting post!

Lars.

Posted by: Lars Olofsson at July 17, 2004 08:35 PM