By Paul Rincon
BBC News Online science staff
An international team of 152 scientists has published a detailed map of more than 21,000 human genes.
The work is seen as a major advance in the efforts to make sense of the genome, the DNA code that guides the building and maintenance of our bodies.
Sequencing of the human genome was officially finished in 2003, but scientists still need to interpret this vast resource of raw information.
The H-Invitational Consortium's work should aid the investigation of disease
The consortium, led by Takashi Gojobori of the Institute of Advanced Industrial Science and Technology in Japan, compiled a database of the genes characterised by research to date.
It carried out an exhaustive analysis of 41,118 existing complementary DNAs (cDNAs): synthetic molecules copied from RNA, which is itself transcribed from genes in the genome.
This allowed the team to validate 21,037 functioning genes and identify 5,155 new gene candidates.
"The gene is a very nebulous concept," co-investigator Anthony J Brookes, of the Karolinska Institute in Stockholm, Sweden, told BBC News Online.
THE DNA MOLECULE
The double-stranded DNA molecule is held together by chemical components called bases
Adenine (A) bonds with thymine (T); cytosine (C) bonds with guanine (G)
Groupings of these letters form the "code of life"; there are about 2.9bn base pairs in the human genome wound into 24 distinct bundles, or chromosomes
Written in the DNA are about 30,000 genes which human cells use as starting templates to make proteins; these sophisticated molecules build and maintain our bodies
"A string of sequence can be used in many different ways to make different RNAs and different proteins. Those can be expressed in different cells in different places at different times. Should you call it all one gene? That is now a problem."
The analysis also shows that about 4% of the human genome sequence is missing or misassembled, say the researchers. Professor Brookes added that the research supported the theory that much of our DNA has no function.
"The genome wasn't designed by a computer programmer, from top to bottom. It keeps evolving all the time. There are bits of the genome and RNA molecules that are probably not doing much. Maybe they did once, but they don't now. Or maybe they're evolving a function."
Elspeth Bruford of the Hugo Gene Nomenclature Committee, at University College London, UK, told BBC News Online: "There are several databases out there that already do this sort of thing. But many work by electronically predicting the genes.
"The main thing that was different about this process was that you had human curation of every single entry.
"Then they tried to cluster [the cDNAs] to find out which were splice variants, and proceeded to identify similarity to known genes and look for encoded domains in proteins to predict their possible functions."
The H-Invitational Database records the different forms of protein encoded by each gene (called splicing isoforms), predictions of the proteins that are manufactured, and the sites in the body where the genes are active.