The Human Genome Is More and Less Than We Expected to Find

Maybe Genes Define Us, But How Do We Define a Gene?
We've all heard of the Human Genome Project. What you might not know is that, after 15 years of work, it turns out biologists might not even agree on what a gene is!

For a long time, biologists have defined genes based on something called the "central dogma". This idea says that a gene is a piece of DNA that codes for a piece of messenger RNA (mRNA) that in turn codes for a protein. Our understanding of this last part is what's changing how we think of genes.

Where did the central dogma come from? Why did biologists define a gene this way? Because in bacteria, this is how it works. The famous biologist Jacques Monod said, "What was true for E. coli would be true for the elephant," meaning that organisms more complex than bacteria (including elephants and us) would use the same system.

Seems simple enough, right? Well, as the Human Genome Project has come to completion, geneticists have had to rethink this. It turns out that our DNA codes for lots of different kinds of RNA, only some of which are used to make proteins. We already knew about a couple of types of non-protein-coding RNA, like rRNA and tRNA (these help make proteins from mRNA). What we didn't count on was the huge number of other RNAs that would be found. Which of these pieces of DNA we should count as genes is up for debate.
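To make the central dogma concrete, here is a minimal Python sketch of the DNA to mRNA to protein flow. It is a toy illustration only: the sequence `toy_gene` is made up, the function names are invented for this example, and the codon table lists just five of the 64 codons in the real genetic code.

```python
# Toy model of the central dogma: DNA -> mRNA -> protein.
# Only a handful of codons are listed; the real genetic code has 64.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": None,   # stop signal
}


def transcribe(coding_dna: str) -> str:
    """Copy the coding strand of DNA into mRNA by swapping T for U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon reached
            break
        protein.append(amino_acid)
    return protein


toy_gene = "ATGTTTGGCGCTTAA"  # made-up sequence, not a real gene
mrna = transcribe(toy_gene)
print(mrna)             # AUGUUUGGCGCUUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Ala']
```

Under the old definition, only DNA that makes it all the way through `translate` would count as a gene. The non-coding RNAs discussed in this article stop after the `transcribe` step.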
The Final Draft?
You may be thinking this is old news. Didn't they announce the completion of the Human Genome Project back in 2001? Actually, the version released in 2001 was only a rough draft. Since then, scientists have been working hard to fill in gaps and do some very important proofreading. They're still not 100% done, but they've done as much as they can with the current technology.

Each time scientists check over the genome sequence and get a better picture of our DNA, the number of genes they say we have goes down. Before our DNA was sequenced, people thought we'd have 100,000 genes. Once we got the rough draft in 2001, the number fell to around 30,000. Now the final draft of the human genome puts the number between 20,000 and 25,000 genes.

Why were the initial estimates so far off? And why does it matter? Before we started to figure out our own DNA, scientists looked at a number of other organisms. The simple bacterium E. coli was found to have about 4,300 genes. The little worm C. elegans, which has only about 1,000 cells, was found to have 19,000 genes. Since we are so much more complex than these organisms, everyone thought that our genome would hold many more genes.

It looks like this isn't the case. It's not that we are less complicated than we thought we were. What scientists got wrong was the assumption that a more complicated organism needs more genes. Instead, it looks like we are more complex because we have other things going on besides genes, at least the way we used to think about them. We have lots of little bits of DNA that code for RNAs that never code for proteins but instead control the RNAs that do.

Non-coding RNAs and Gene Regulation
If we think of building an organism like building a house, we have to think about two things: the list of building materials and the plans for putting them all together. These "plans" are what make one organism more complex than another. Organisms all have the same building materials to start with; it's just how you put them together that makes one a simple bacterium and another a complex human.

For organism building, the plans consist mostly of gene regulation: when and in what cells a gene will be turned on to make a certain protein. Until recently, scientists thought that most of the work of deciding when, where, and how much of a protein gets made was done by other proteins. These other proteins would stick to DNA near a gene and turn that gene up or down.

Now it looks like there is another level of regulation, at least in more complicated organisms. This level uses RNAs to affect how much of a protein gets made. Instead of sticking to the DNA to turn a gene on or off, these RNAs work directly on the mRNA from a gene. They can affect when and where a protein is made by getting rid of the protein-coding mRNA before it has a chance to make the protein. The DNA gets made into mRNA, but then the non-coding RNA comes along and tells the cell, "Nope, not today. Send this mRNA to the junkyard!"

Where do some of these non-coding RNAs come from? Believe it or not, from the middle of genes! In higher organisms like corn, elephants, and us, genes are set up differently than in bacteria. The parts of a gene that code for protein are broken up into lots of pieces. The DNA between the "protein-coding" regions gets turned into RNA too, but is then cut out, or "spliced". The parts of the RNA that code for protein are then stitched back together and sent off to be read. People used to believe that the pieces of non-coding RNA that are spliced out were junk. Now it seems that they may in fact be an added layer essential to regulating gene expression. They're some of the non-coding RNAs that go around sending other RNAs to the junkyard.

So what does this have to do with how a bacterium is so simple and a person is so complex? Let's go back to our example of building a house. When life on Earth first started, there wasn't much in the way of gene regulation. It's like there were no architects. Bacteria and other simple organisms had all the building blocks, and they put them together in a straightforward way. It's like they were building huts.

Around 525 million years ago, organisms suddenly got more complex. This is around the same time that these RNAs that directly regulate genes first appeared on the scene. It would have been like a bunch of architects moving to town. Suddenly the building materials that had always been around could be put together in many different ways. In addition to huts, now there could be houses and even mansions. And just as you would expect that some of the architects who moved in would be world-class and some would just be pretty good, the same thing happened with evolution. The non-coding RNAs formed some plans that were really complicated (like the ones that make us) and some that were not so complicated (the plans that make fish, for example).

Thinking about genetics in this way is more complicated than the DNA to mRNA to protein central dogma that biologists have followed for so long. But it helps to explain what we see in the world. We humans are made of the same building blocks as apes and even fruit flies, but we are clearly different. It has to be because of the way our genes are regulated.

If we consider the DNA that makes non-coding RNAs to be genes too, then what we thought when we first started the Human Genome Project would be true: the more complex an organism is, the more genes it has. Whether the DNA that makes non-coding RNAs should be considered separate genes is a vocabulary issue for biologists to work out.

One thing is very clear though. Now that the Human Genome Project has a complete listing of the building materials that make us what we are, scientists have a lot of work ahead of them in figuring out the plans.
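For readers who like to see things spelled out, the splicing and "junkyard" steps described in this section can be pictured with a toy Python sketch. Everything here is an assumption made for illustration: the sequences are invented, the names `splice` and `surviving_mrnas` are hypothetical, and a regulatory RNA is modeled simply as a stretch of letters that it recognizes in an mRNA. Real regulatory RNAs pair with complementary sequences and work through cellular machinery that this sketch ignores.

```python
# Toy pre-mRNA: (kind, sequence) pairs. All sequences are invented.
pre_mrna = [
    ("exon", "AUGGCU"),        # protein-coding piece
    ("intron", "GUAAGUCCAG"),  # non-coding piece, cut out during splicing
    ("exon", "UUUGGC"),
    ("intron", "GUACUUACAG"),
    ("exon", "UAA"),
]


def splice(segments):
    """Stitch the exons together and return the cut-out introns separately."""
    mature_mrna = "".join(seq for kind, seq in segments if kind == "exon")
    introns = [seq for kind, seq in segments if kind == "intron"]
    return mature_mrna, introns


def surviving_mrnas(mrnas, regulator_targets):
    """Drop any mRNA containing a stretch that a regulatory RNA recognizes."""
    return [m for m in mrnas if not any(t in m for t in regulator_targets)]


mature, introns = splice(pre_mrna)
print(mature)        # AUGGCUUUUGGCUAA
print(len(introns))  # 2

# A second made-up mRNA that the hypothetical regulator does not recognize.
other_mrna = "AUGCCCGGGUAA"
print(surviving_mrnas([mature, other_mrna], ["GCUUUU"]))  # ['AUGCCCGGGUAA']
```

In this picture the list of mRNAs is the building materials, and the regulator targets are part of the plans: change the targets and a different set of proteins gets made from the same genes.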