The life stories of the documents we create are becoming increasingly important as the scrutiny of industries and governments gathers pace.
Every time you write or edit a document you leave a trail of information revealing what you did and when you did it.
Even if you turn off the change tracking options in popular word processing packages, background tasks keep a minimal log of what happened when.
With the right tools it is possible to extract this data and work out the trail of authors and workers who created a document.
Close scrutiny
The UK Government is just the latest in a long line of organisations that have learned to their cost how much information can be gleaned from innocent-looking files.
Earlier this year it issued a document about Iraq's concealment of weapons of mass destruction that was written using Microsoft Word.
This document was found to be largely based on a journal article by Ibrahim al-Marashi, a postgraduate student at the Monterey Institute of International Studies.
The file yielded much more than this, because every Word document remembers who made the last few revisions to it.
Some of this information can be seen simply by right-clicking to view the properties of the downloaded document in a file listing.
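The properties described above can also be read by a few lines of code. The sketch below is an illustration only, and it rests on an assumption: it reads the newer zipped .docx format, which stores its properties in a part called docProps/core.xml, rather than the older binary Word format of the Iraq document, which would need a different parser. The function names and the sample author and editor names are invented for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(docx_file):
    """Return basic authorship metadata from a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    def text_of(tag):
        node = root.find(tag, NS)
        return node.text if node is not None else None

    return {
        "creator": text_of("dc:creator"),
        "last_modified_by": text_of("cp:lastModifiedBy"),
        "revision": text_of("cp:revision"),
        "created": text_of("dcterms:created"),
        "modified": text_of("dcterms:modified"),
    }


def make_sample_docx():
    """Build an in-memory package holding only made-up metadata (hypothetical names)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties'
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
        ' xmlns:dcterms="http://purl.org/dc/terms/">'
        "<dc:creator>A. Author</dc:creator>"
        "<cp:lastModifiedBy>jsmith</cp:lastModifiedBy>"
        "<cp:revision>4</cp:revision>"
        "<dcterms:created>2003-02-03T09:31:00Z</dcterms:created>"
        "<dcterms:modified>2003-02-03T11:18:00Z</dcterms:modified>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Print each property found in the sample package.
    for key, value in read_core_properties(make_sample_docx()).items():
        print(f"{key}: {value}")
```

Passing the path of a real .docx file to read_core_properties in place of the sample should list the same fields, although any of them may be empty if the authoring software did not record it.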
"This is not 007 territory," said Julian Murfitt, Managing Director of document tracking and management firm Mekon. "It can be achieved with the tools that are available already."
Utility programs can get even more information from Word revision logs.
The log reveals the names of four of the people who prepared the Iraq document for publication and the government Communications Information Centre that some of them work for.
It was this log that Number 10 press chief Alastair Campbell had to explain to the House of Commons Foreign Affairs Select Committee in late June as part of its investigation into the Iraq dossier's history.
Nick Spenceley, founder director of computer forensics firm Inforenz, said the format of text copied into a file and the templates used to style it can also reveal its origin.
The Word version of this document has now been removed from government websites but copies of it are still available elsewhere on the net.
Anyone downloading it can see that the last revision was made by MKhan, the logon identity for Murtaza Khan, a junior press officer.
Dr Glen Rangwala, who discovered that the February Iraq document was copied from Mr al-Marashi's journal article, said the government seems to have learned its lesson and now issues documents in Adobe's Portable Document Format (PDF).
Perhaps coincidentally, he said, many of the older Word format documents on the 10 Downing Street site have now been removed and replaced with PDF versions.
Format fun
David Stevenson, Business Development Manager at Adobe, said PDF files were typically the final version of a document and did not reveal revision histories.
"If you go in to the document description then you will have the basic information," he said. "What you get will depend on the authoring software used to create it."
He said tools in Adobe Acrobat, which is used to create PDFs, can log who worked on a document and ensure that only people who have explicit permission can make changes.
Like other sorts of files, PDFs can be signed with digital certificates that guarantee their origins.
Mr Stevenson said Adobe was working on ways to use web technologies to preserve the structure and format of documents to make them easier to ship around and share.
Mr Murfitt from Mekon said many firms were now looking at installing systems that make it easier to collaborate on documents and that log who did what.
He said banking and legal regulators imposed strict working practices that force firms to record the life histories of documents that result in new products or are involved in court cases.
But, he added, other firms were putting in place document tracking systems to help teams work together.
Often these systems use a single copy of a document that workers comment on, correct or annotate before a final edit.
"That's where collaboration becomes really useful," he said.