Researchers are racing against time to save decades of data on the world's endangered languages from ending up on the digital scrap heap.
Computer scientist and linguist Professor Steven Bird of Melbourne University says most computer files, documents and original digital recordings created more than 10 years ago are now virtually irretrievable.
Linguists are worried because they have been enthusiastic digital pioneers.
Attracted by ever smaller, lighter equipment and vastly improved storage capacity, field researchers have graduated from handwritten notes and wire recordings to laptops, mini-discs, DAT tape and MP3.
"We are sitting between the onset of the digital era and the mass extinction of the world's languages," said Prof Bird.
"The window of opportunity is small and shutting fast."
Languages disappearing
"The problem is we are unable to ensure the digital storage lasts for more than five to 10 years because of problems with new media formats, new binary data formats used by software applications and the possibility that magnetic storage just simply degrades over time," said Professor Bird.
There are a number of initiatives across the world to ensure that endangered languages are saved for future generations.
"Linguists estimate that if we don't do anything, half of the world's languages will disappear in the next 100 years," said Professor Peter Austin of the School of Oriental and African Studies at the University of London.
"There are currently about 6,500 languages in the world, so that's 3,000 languages completely going, lost forever," he told the BBC programme Go Digital.
Professor Bird is involved in the Open Language Archives Community (OLAC), an attempt to create an international network of internet-based digital archives, using tailor-made software designed to be future-proof.
"We're devising ways of storing linguistic information using XML or Extensible Markup Language, which is basically a language for representing data on the web," said Prof Bird.
"XML is an open format that we can be sure will be accessible indefinitely into the future."
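To give a sense of what storing linguistic information in XML might look like, here is a minimal sketch in Python. The element names (entry, headword, gloss), the lang attribute and the placeholder values are assumptions made for illustration; they are not OLAC's actual schema. The point is only that the record is plain, self-describing text that any XML parser can read back.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(headword, gloss, language):
    """Serialise one word-list entry as an XML string (hypothetical layout)."""
    entry = ET.Element("entry", attrib={"lang": language})
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "gloss").text = gloss
    return ET.tostring(entry, encoding="unicode")


def xml_to_entry(xml_text):
    """Parse the XML string back into a plain dictionary."""
    entry = ET.fromstring(xml_text)
    return {
        "language": entry.get("lang"),
        "headword": entry.findtext("headword"),
        "gloss": entry.findtext("gloss"),
    }


# Placeholder values only; no real language data is implied.
xml_text = entry_to_xml("example-word", "example gloss", "xx")
record = xml_to_entry(xml_text)
```

Because the stored form is tagged text rather than an application-specific binary format, reading it back does not depend on the program that wrote it.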
Cultural sensitivities
Researchers across the world see the potential of XML, but are aware of the burden this places on them.
"When you record material in MP3 format now, what will happen in five years' time when a new format comes along?" asked Prof Austin.
"The real challenge for us as archivists is to constantly upgrade the video, audio and image files that we have so that they can be integrated with these new XML documents," he said.
There are problems, however, with using the internet as a storage medium.
Many indigenous communities fear it could lead to unrestricted access to culturally sensitive material, such as sacred stories, which could be abused or exploited, perhaps for commercial gain.
Professor Bird says linguists recognise it is not a good idea to put sensitive material onto the internet without any safeguards.
"We are [looking at] the technologies used in internet banking for secure transfer and control - right at the point this material is first captured."
In theory, a field researcher would enter information about future restrictions as the material is recorded or written down and those safeguards would accompany the recording right through the data chain.
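The idea of restrictions travelling with a recording can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers' system: the recording, access and allow elements, the group names and the file name are all hypothetical, and the sketch covers only the metadata, not the secure transfer or encryption the banking comparison implies.

```python
import xml.etree.ElementTree as ET


def make_record(media_file, allowed_groups):
    """Create a record at capture time with its access rules attached."""
    record = ET.Element("recording", attrib={"file": media_file})
    access = ET.SubElement(record, "access")
    for group in allowed_groups:
        ET.SubElement(access, "allow").text = group
    return record


def copy_for_archive(record):
    """Simulate a later step in the data chain: serialise and re-parse."""
    return ET.fromstring(ET.tostring(record))


def may_access(record, group):
    """Return True only if the group is listed in the record's own rules."""
    allowed = [el.text for el in record.findall("./access/allow")]
    return group in allowed


# Hypothetical file name and group names, for illustration only.
original = make_record("story.wav", ["community-elders"])
archived = copy_for_archive(original)
elders_ok = may_access(archived, "community-elders")
public_ok = may_access(archived, "public")
```

Because the rules sit inside the record itself, a copy made further down the chain carries the same restrictions as the original.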