Two decades after the draft sequence of the human genome was unveiled to great fanfare, a team of 99 scientists has finally deciphered the whole thing. They have filled in vast gaps and corrected a long list of errors in previous versions, giving us a fresh view of our DNA.
The consortium has posted six papers online in recent weeks describing the complete genome. The scientists say these hard-won data, now under review by scientific journals, will give researchers a deeper understanding of how DNA influences disease risks, and how cells keep it in neatly organized chromosomes rather than in molecular tangles.
For example, researchers have uncovered more than 100 new genes that may be functional, and have identified millions of genetic variations among people. Some of those differences probably play a role in diseases.
For Nicolas Altemose, a postdoctoral researcher at the University of California, Berkeley, who worked on the team, the view of the complete human genome feels a bit like the close-up photos of Pluto from the New Horizons space probe.
“You could see every crater, you could see every color, from something we previously had only the faintest understanding of,” he said. “It’s just an absolute dream come true.”
Experts who were not involved in the project said it would help scientists explore the human genome in more detail. Large parts of the genome that had been left blank are now deciphered so clearly that scientists can begin to study them in earnest.
“The fruits of this sequencing effort are amazing,” said Yukiko Yamashita, a developmental biologist at the Whitehead Institute for Biomedical Research at the Massachusetts Institute of Technology.
A century ago, scientists knew that genes were spread across 23 pairs of chromosomes, but these strange, worm-like microscopic structures remained largely a mystery.
By the late 1970s, scientists had acquired the ability to pinpoint a few individual human genes and decode their sequence. But their tools were so crude that hunting down a single gene could take up an entire career.
Toward the end of the 20th century, an international network of geneticists decided to attempt to sequence all the DNA in our chromosomes. The Human Genome Project was an audacious undertaking, given how much there was to sequence. Scientists knew that the twin strands of DNA in our cells contain about three billion pairs of letters, a text long enough to fill hundreds of books.
When that team began its work, the best technology available could sequence bits of DNA only a few dozen letters, or bases, long. Researchers were left to put them together like the pieces of a giant puzzle. To assemble the puzzle, they looked for pieces with matching ends, meaning they came from overlapping parts of the genome. It took them years to slowly assemble the sequenced pieces into larger swaths.
The White House announced in 2000 that scientists had completed the first draft of the human genome, and details of the project were published the following year. But long stretches of the genome remained unknown, while scientists struggled to figure out where millions of other bases belonged.
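The puzzle-assembly idea described above can be illustrated with a minimal Python sketch. It is a toy greedy approach under simplifying assumptions (error-free fragments, a single strand, no repeats), not the method the Human Genome Project actually used; the function names and the example fragments are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the longest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining pieces stay unjoined
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up fragments whose ends overlap by four bases each.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Fragments with no shared ends are simply left as separate pieces, which loosely mirrors the gaps that remained in early drafts.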
It turned out that the genome was a very difficult puzzle to put together from small pieces. Many of our genes exist as multiple copies that are almost identical to each other. Sometimes different copies do different things. Other copies – known as pseudogenes – are disabled by mutations. A short fragment of DNA from one gene might fit just as well into another.
And genes make up only a small percentage of the genome. The rest can be even more baffling. Much of the genome is made up of virus-like stretches of DNA that exist largely just to make new copies of themselves, which are inserted back into the genome.
In the early 2000s, scientists got a little better at putting together the genome puzzle from its small pieces. They made more fragments, read them more precisely, and developed new computer programs to assemble them into larger chunks of the genome.
From time to time, researchers would unveil the latest, best draft of the human genome – known as the reference genome. Scientists used the reference genome as a guide for their own sequencing efforts. Clinical geneticists, for example, would catalog disease-causing mutations by comparing patients’ genes to the reference genome.
The latest reference genome appeared in 2013. It was much better than the first draft, but it was a long way from complete. Eight percent of it was simply blank.
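A small Python sketch can show why near-identical copies make short fragments ambiguous. The toy sequences below are invented for illustration and stand in for two gene copies that differ by a single base; `find_all` is a hypothetical helper, not part of any sequencing toolkit.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Two made-up gene copies that differ only in their final base.
copy_a = "ACGTTGCAAGGCTA"
copy_b = "ACGTTGCAAGGCTT"
toy_genome = "TTT" + copy_a + "CCCCC" + copy_b + "GGG"

short_read = "GTTGCA"        # lies entirely inside the shared part of both copies
long_read = "GCTACCCCCACG"   # spans the end of copy_a and the unique spacer after it

print(find_all(toy_genome, short_read))  # [5, 24] - fits in two places
print(find_all(toy_genome, long_read))   # [13] - only one place it can go
```

The short fragment fits equally well in both copies, while the longer one reaches into unique sequence and can be placed unambiguously.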
“Basically an entire human chromosome went missing,” said Michael Schatz, a computational biologist at Johns Hopkins University.
In 2019, two scientists – Adam Phillippy, a computational biologist at the National Human Genome Research Institute, and Karen Miga, a geneticist at the University of California, Santa Cruz – founded the Telomere-to-Telomere Consortium to complete the genome.
Dr. Phillippy admitted that part of his motivation for such an audacious project was that the missing gaps bothered him. “They were really bothering me,” he said. “You take a beautiful landscape puzzle, pull out a hundred pieces, and look at it – that’s too baffling for a perfectionist.”
Dr. Phillippy and Dr. Miga put out a call for scientists to join them in finishing the puzzle. They ended up with 99 scientists working directly on the sequencing of the human genome, and dozens more pitching in to make sense of the data. The researchers worked remotely through the pandemic, coordinating their efforts on Slack, a messaging app.
“It was a surprisingly good ant colony,” said Dr. Miga.
The consortium took advantage of new machines that can read stretches of DNA tens of thousands of bases long. The researchers also invented techniques to work out where particularly mysterious repetitive sequences belonged in the genome.
All told, the scientists added or fixed more than 200 million base pairs in the reference genome. They can now say with confidence that the length of the human genome is 3.05 billion base pairs.
Within those new sequences of DNA, the scientists discovered more than 2,000 new genes. Most appear to be disabled by mutations, but 115 of them look like they can produce proteins – the function of which could take scientists years to figure out. The consortium now estimates that the human genome contains 19,969 protein-coding genes.
With a complete genome finally assembled, the researchers could take a better look at the variation in DNA from person to person. They discovered more than two million new places in the genome where people differ. Using the new genome also helped them avoid identifying disease-associated mutations where none are actually present.
“This is a major advance for the field,” said Dr. Midhat Farooqi, director of molecular oncology at Children’s Mercy, a hospital in Kansas City, Mo., who was not involved in the project.
Dr. Farooqi has begun using the genome for his research into rare childhood diseases, aligning DNA from his patients against the newly filled gaps to search for mutations.
However, switching to the new genome can be a challenge for many clinical laboratories. They must transfer all their information about the relationship between genes and diseases to a new map of the genome. “It will be a huge effort, but it will take a few years,” said Dr. Sharon Plon, a medical geneticist at Baylor College of Medicine in Houston.
Dr. Altemose plans to use the complete genome to explore a particularly mysterious region in each chromosome known as the centromere. Instead of storing genes, centromeres anchor the proteins that move chromosomes around a cell as it divides. The centromere region consists of thousands of repeated segments of DNA.
In their first look, Dr. Altemose and his colleagues were amazed at how different centromere regions could be from person to person. That observation suggests that centromeres are rapidly evolving, as mutations insert new pieces of repetitive DNA into the regions or cut other pieces out.
While some of this repetitive DNA may play a role in separating chromosomes, researchers have also found new segments, some of them millions of bases long, that do not appear to be involved. “We don’t know what they are doing,” Dr. Altemose said.
But now that the empty regions of the genome have been filled in, Dr. Altemose and his colleagues can study them more closely. “I’m really excited to see all the things we can discover,” he said.