
How do scientists figure out that a certain gene is involved in a certain trait? Like, how did they find the blue eye gene?

-A curious adult from California

December 16, 2009

That is a great question that isn't easy to answer. I'll focus on the blue eye story as I answer it, but the same approach works for any relatively simple trait.

First off, there isn't really a blue eye gene. Instead, there is a DNA difference in an eye color gene that leads to blue eyes. This might sound like I'm nitpicking but I'm really not... the same eye color gene can cause brown eyes too!

To find this DNA difference, scientists need to compare the DNA of people with and without blue eyes. They then look for DNA that brown-eyed people share and that blue-eyed people don't*. From this they'd find the one difference that leads to blue eyes.

This might sound easy but it isn't. There are lots of other differences besides blue eyes between the two groups. Some DNA differences will lead to blonde hair, dark skin, nearsightedness, etc. And some won't lead to any obvious differences.

What the scientists need to do is sift through all of the DNA differences and find the one that causes blue eyes. This is a lot of DNA to search... literally trillions of bits of data.

Simple Alphabet, Complex Instructions

Your DNA has the instructions for making you. And since you're so complicated, so too are these instructions.

The code that DNA is written in is made up of just four different letters -- A, G, C, and T. Humans have around 6 billion of these letters strung together in a certain order.

What makes each of us unique are small differences scattered throughout these 6 billion letters. On average, we all have a difference every thousand or so letters.

At first this makes finding certain differences sound easy because we are all actually pretty similar. Until you start crunching some numbers, that is.

One in a thousand translates into around 6 million differences between you and me. And the differences don't all happen at the same spots. We share some differences and not others. This means that we have many more than 6 million areas of DNA to look at.
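The arithmetic behind that 6 million is easy to check. Here is a minimal sketch that uses only the round numbers given above; the variable names are just for illustration.

```python
# Round numbers from the text; both are approximations.
GENOME_LETTERS = 6_000_000_000   # letters of DNA per person
LETTERS_PER_DIFFERENCE = 1_000   # roughly one difference every thousand letters

differences_between_two_people = GENOME_LETTERS // LETTERS_PER_DIFFERENCE
print(f"{differences_between_two_people:,}")  # 6,000,000
```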

So this is our job. We need to sift through these millions of differences in thousands of people to find the one that matters for blue eyes. We can't do this by just reading all the trillions of letters involved.

We don't yet have the technology to read all that DNA (although we are getting closer). Nor do we have the computer programs to analyze it. Instead, we need to break off more bite-sized pieces of DNA to look at.

The Book of Life

It is easier to think about DNA if we treat it as a book. Since I have blue eyes, we'll use my DNA as an example.

As I said before, my book is about 6 billion letters long. If we were to type that out single space with Courier font, type size 10, we'd have 1.5 million pages. That's a stack of paper over 500 feet tall. Pretty big book!
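If you'd like to check the size of the book yourself, here is a rough sketch. The 4,000 letters per page is simply what the numbers above imply (6 billion letters spread over 1.5 million pages). The sheet thickness is my own assumption (about 4 thousandths of an inch, typical for copy paper), and I'm assuming single-sided pages.

```python
GENOME_LETTERS = 6_000_000_000
LETTERS_PER_PAGE = 4_000      # implied by the text: 6 billion letters / 1.5 million pages
SHEET_THICKNESS_MILS = 4      # assumption: 4 thousandths of an inch per sheet

pages = GENOME_LETTERS // LETTERS_PER_PAGE
# mils -> inches (divide by 1000) -> feet (divide by 12)
stack_feet = pages * SHEET_THICKNESS_MILS / 1000 / 12

print(f"{pages:,} pages, about {stack_feet:.0f} feet")
```

Slightly thicker paper would push the stack past 500 feet, so treat the result as a ballpark figure.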

Somewhere in there is the DNA responsible for my blue eyes. To find it, we're going to need a few tricks up our sleeves.

One thing we know is that there are certain spots on our DNA that tend to be different more often between people. Think of them as commonly misspelled words like acquire or accidentally.

If we were looking for spelling errors in a really big book, we might first focus on the top 100 misspelled words. We'd look for aquire, accidentelly, etc. and fix them as a first pass. We do the same thing with our DNA.

Most scientists start out looking at anywhere from 500,000 to one million of these spots all at once. Of course, they don't just hit Ctrl+F in a Word document. Instead they use something called a DNA chip to look at these areas of DNA.

A DNA chip is a glass slide with millions of individual DNAs attached to it. A scientist would take my DNA and chop it up and treat it so that it can be seen. My DNA would then be washed over the slide and it would stick to any DNAs that matched.

We would then have a readout of what my DNA looks like at all of these different spots. We'd then do the same with lots of other people with blue eyes and compare them to a bunch of people with brown eyes.
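To give a feel for that comparison, here is a toy sketch. The readouts are made up, there are only four spots instead of a million, and each person gets just one letter per spot (a real chip reports both copies). The helper names are hypothetical. For each spot, it asks how differently the letters are distributed in the two groups.

```python
from collections import Counter

# Made-up chip readouts: one string per person, one letter per spot.
blue_eyed = ["AGTC", "AGTT", "AGCC", "AGTC"]
brown_eyed = ["ACTC", "GCTT", "ACCC", "GCTC"]
NUM_SPOTS = 4

def letter_frequencies(group, spot):
    """Fraction of people in the group carrying each letter at this spot."""
    counts = Counter(person[spot] for person in group)
    return {letter: n / len(group) for letter, n in counts.items()}

def frequency_gap(spot):
    """Largest difference in any letter's frequency between the two groups."""
    blue = letter_frequencies(blue_eyed, spot)
    brown = letter_frequencies(brown_eyed, spot)
    letters = set(blue) | set(brown)
    return max(abs(blue.get(letter, 0) - brown.get(letter, 0)) for letter in letters)

gaps = [frequency_gap(spot) for spot in range(NUM_SPOTS)]
best_spot = max(range(NUM_SPOTS), key=lambda spot: gaps[spot])
print(gaps, best_spot)  # [0.5, 1.0, 0.0, 0.0] 1
```

In this toy data, the second spot (index 1) separates the two groups perfectly, so it would be the place to look more closely. Real studies use proper statistics rather than a raw gap like this one.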

To make things simpler, scientists often concentrate on family members who happen to have different traits. For example, in our case we would look for a family where some siblings had blue eyes and others had brown.

Siblings share around 50% of their DNA because they have the same parents. So if we use them in our study, we can rule out lots of the DNA they have in common. This makes our jobs much easier.

In one of the first studies that looked at blue eyes, researchers looked at over 3000 different people's DNA. At 6 billion letters apiece, that adds up to roughly 18 trillion letters of DNA.

It is very rare in these studies to actually find the key difference on the first pass. Instead, what we'd probably find is a DNA difference near the one we're actually interested in.

Homing in on Blue Eye DNA

Imagine we found a few differences that brown-eyed people tended to share compared to blue-eyed people. The next step would be to look more closely at the DNA around those spots.

This works because DNA tends to be inherited from our parents in blocks. What this means is that a lot of people with blue eyes will share many DNA differences around the one that causes their blue eyes.

Our DNA comes in 46 big chapters called chromosomes. Scientists use their initial DNA chip data to narrow down what part of which chromosome the DNA difference is on. For blue eyes, the scientists focused on a section of chromosome 15.

So they did another DNA chip study and looked at more common DNA differences in this region. They found a few that all brown-eyed people shared and that blue-eyed people didn't have. The next step was to look for any genes that might be nearby.
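The "all brown-eyed people share it and blue-eyed people don't have it" filter can be sketched in a few lines. Everything here is hypothetical: the three spots, the readouts, and the function name. This time each person has two letters per spot, one copy from each parent, because a brown-eyed person only needs one brown copy.

```python
# Made-up readouts: for each person, a pair of letters at each of three spots.
brown_eyed = [
    [("A", "G"), ("C", "C"), ("T", "T")],
    [("A", "A"), ("C", "T"), ("T", "G")],
    [("G", "A"), ("T", "T"), ("T", "T")],
]
blue_eyed = [
    [("G", "G"), ("C", "T"), ("T", "T")],
    [("G", "G"), ("T", "T"), ("G", "T")],
]
NUM_SPOTS = 3

def candidate_letters(spot):
    """Letters every brown-eyed person carries that no blue-eyed person carries."""
    in_every_brown = set.intersection(*(set(person[spot]) for person in brown_eyed))
    in_any_blue = set.union(*(set(person[spot]) for person in blue_eyed))
    return in_every_brown - in_any_blue

candidates = {spot: candidate_letters(spot) for spot in range(NUM_SPOTS)}
hits = [spot for spot, letters in candidates.items() if letters]
print(hits)  # [0]
```

Only the first spot passes: every brown-eyed person here carries at least one A, and no blue-eyed person carries any.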

If chromosomes are chapters, then genes are the instructions for one small part of whatever you're building. They're sort of like those Lego instructions for building a head that then gets attached to a body. Or like a single recipe in a cookbook.

The scientists focused on a gene in this region called OCA2. This gene was known to sometimes be involved in a decrease of pigment (albinism) so it was a good guess. And they got the right gene. But not the right DNA difference.

Scientists need to be careful not to let their preconceived notions skew the interpretation of their data. OCA2 was a logical guess but the actual DNA difference that causes blue eyes appears to be within a nearby gene called HERC2.

This DNA difference actually ends up affecting how OCA2 works. So the scientists weren't hugely off in their guess.

As you can see, finding a DNA difference or a gene for a simple trait is no picnic. Something more complicated like obesity or diabetes is even harder.

Something Harder Still

Finding genes associated with complicated traits like obesity is tricky. Scientists have to wrestle with the fact that many different genes can contribute to obesity. And that the environment can affect the trait too. In these situations, it is very hard to tease out which DNA differences matter and which don't.

This all means that a researcher may not find a lot of people who share the exact same reason for ending up obese. As I said, there is more than one gene involved. Imagine one group of obese people has DNA differences X, Y, and Z that increase their chances for being obese. Another group has A, B, and C. The third group has X, Y, and A. And so on.
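Here is a small sketch of why this is so frustrating, using the three imaginary groups above. The letters are just the placeholder names from the example, not real DNA differences.

```python
from collections import Counter

# The three imaginary groups from the text and the DNA differences each carries.
groups = [
    {"X", "Y", "Z"},
    {"A", "B", "C"},
    {"X", "Y", "A"},
]

# Is there any single DNA difference that every group shares?
shared_by_all = set.intersection(*groups)

# How many groups does each difference show up in?
group_counts = Counter(difference for group in groups for difference in group)

print(shared_by_all)             # set()
print(max(group_counts.values()))  # 2
```

No difference shows up in all three groups, and the most common ones only show up in two. So a simple search for "the DNA that everyone with the trait shares" comes up empty.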

Now imagine that another group is obese because they eat a lot of saturated fats and have a certain DNA change. And another group is obese because they don't exercise as much as their genes tell them to. And so on.

To figure out obesity, scientists need to look at lots and lots of different people. And who knows if they'll ever figure it out. It may be that there are so many variables involved in becoming obese that 6 billion people isn't enough to sort it all out.

* They had to focus on brown eyes at this point because blue eyes are recessive. That means some brown-eyed people can carry one copy of the DNA difference that leads to blue eyes.

How DNA chips (or microarrays) work.

By Dr. Barry Starr, Stanford University

Our instructions are made up of just 4 letters.

Typed out, your instructions would need a stack of paper over 500 feet tall.

Scientists find genes using DNA chip data like this.