This federal team put the last few pieces into the human genome puzzle

Tom Temin (@tteminWFED)

August 3, 2023 1:19 pm

Like so many projects, sequencing the human genome got harder the closer the work came to completion. A National Institutes of Health team spent seven years heading up a worldwide consortium assembling the last 8% of the human genetic code. For its work, the team has made the finals of this year’s Partnership for Public Service’s Service to America Medals program, aka “the Sammies.” Federal Drive with Tom Temin spoke with Dr. Adam Phillippy, who is part of the NIH team, which also includes scientists Sergey Koren and Arang Rhie.

Interview Transcript:

Tom Temin Now, we all heard about the human genome mapping and there was a public project. I think Dr. Collins, you know, headed that up years ago. So I guess people assumed it was all done. We had the whole human genome sequence, but apparently not the case.

Adam Phillippy Yeah. In fact, even as recently as a few years ago, I would run into some colleagues on the NIH campus and we would talk to them about the human genome, the human reference genome, and they would be shocked in some cases to find that if you actually opened up the file of that genome and looked literally at the A’s, C’s, G’s and T’s, there were some stretches of millions and millions of the letter N for unknown, and those were the bits of the genome that we wanted to go in and tackle. You might have heard it described back in the early 2000s, when our first draft was released in 2001, as mapping the human genome. And so one way to think of it is that you’ve got the map, but there’s a bunch of terra incognita on there, gaps that are just unknown. Nobody’s seen what’s in there before. And those were the bits that we were very curious about, wanting to figure out what was in that unknown.
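As a rough illustration of the "stretches of the letter N" described above, here is a minimal Python sketch that locates runs of N in a DNA string. The function name `unknown_runs` and the toy sequence are invented for this example; a real reference genome would be read from a multi-gigabyte file rather than a short literal.

```python
import re

def unknown_runs(sequence):
    """Return (start, length) for each run of 'N' in a DNA string."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"N+", sequence)]

# Toy sequence, invented for illustration only (not real genome data).
toy = "ACGTNNNNNNACGTTGCANNNACG"

runs = unknown_runs(toy)
total_unknown = sum(length for _, length in runs)

for start, length in runs:
    print(f"gap at position {start}, {length} unknown bases")
print(f"{total_unknown} of {len(toy)} bases are unknown")
```

In this toy string the gaps are short; in the reference genome Phillippy describes, individual runs were millions of letters long.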

Tom Temin And did the unknown territories relate to some important part of the anatomy, like human intellect, for example, versus the limbic system that every animal has or something like that?

Adam Phillippy Yeah, in fact, they are some of the most important bits of the cell for basic biology, things like cell division. So the centromeres are where the chromosomes come together and get pulled apart during cell division, for anybody that remembers their high school biology. Also the production of the ribosome. So these are the molecular machines that crank out the proteins that are needed for every action of your cell. The genes that encode for some of those components of the ribosomes are contained within these unknown regions of the genome. And the hallmark of these unknown bits is that they were the hardest, and they were the hardest because they’re highly repetitive. And so if you’re thinking of a book, this is like the same phrase repeated over and over and over again many times. And that makes it difficult to reconstruct. If you think about putting a puzzle together and it’s a jigsaw puzzle and you have a hundred copies of, like, the same house or the same person, or sometimes I give the Where’s Waldo example of the same character repeated many, many times, when you pick up a piece and it has Waldo on it, you don’t know which of the Waldos it is. And so figuring out where in the genome that particular copy goes is what makes it difficult. And indeed, it’s replicated in some cases because it is important. You need a lot of these machines to crank out your proteins. And so there are a lot of copies of that particular gene, and that makes it difficult to reconstruct.

Tom Temin And was the project that you did mostly an informatics computational exercise, or were you still looking into cells with electron microscopes?

Adam Phillippy So it is a long term study in the sense that, you know, we put the capstone on the end here, but it’s really building on 20 years of technology development, both in the private and commercial sectors and both on the actual biochemistry of reading DNA and the informatics of processing the outputs. The real critical breakthrough happened within the last ten years or so, so-called long read DNA sequencing. And that means very simply that we can read longer stretches of the genome than we were able to back in the early 2000s. With the Human Genome Project that Francis and others led, we could top out at maybe 500 to 700 individual letters at a time, and then you would have to put all of those pieces together like a giant puzzle. That was the computational challenge. It turns out that that was an impossible computational challenge for some regions of the genome. The pieces were just too small. Within the last ten years, we have new technologies now that can read 10,000, even up to a million characters at a time. And now the puzzle pieces have finally reached a length where they’re big enough. We were able to develop some new algorithmic approaches to go along with those and stitch those very long pieces back together again and get a very complete and accurate view of this genome map.
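The point about puzzle-piece size can be sketched with a toy example in Python. This is an illustration only, not the consortium's method: real assemblers work with error-prone reads and overlap graphs, whereas this sketch uses exact string search. The names `placements`, `genome`, `short_read` and `long_read`, and the sequences themselves, are invented for the example.

```python
def placements(genome, read):
    """Return every start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions

# Toy genome, invented for illustration: two blocks of the same repeat
# separated and flanked by short unique sequences.
repeat = "GATTACA"
genome = "CCGT" + repeat * 3 + "TTAGC" + repeat * 3 + "AAGG"

# A short read that falls entirely inside the repeat fits in many places.
short_read = repeat

# A long read that spans a whole repeat block plus a little unique
# sequence on each side fits in exactly one place.
long_read = genome[2:27]

print(len(placements(genome, short_read)), "possible placements for the short read")
print(len(placements(genome, long_read)), "possible placement for the long read")
```

The short read is the Waldo piece: it matches every copy of the repeat, so nothing says which copy it came from. The long read carries enough surrounding context to be placed unambiguously.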

Tom Temin So the durability of Moore’s Law, you might say, is what enabled this to happen 20 years after the original sequencing.

Adam Phillippy Yeah, and it’s a really cool hand in hand of the computational advancements as well as the biochemistry and the engineering advancements, because I think it’s fair to say that these technologies like nanopore sequencing that we use in this approach wouldn’t even have been possible 20 years ago because you didn’t have machine learning of the type that we have now. And a lot of these machine learning algorithms that you hear about in the press for natural language processing and so forth are used to translate this electrical signal that we get from the nanopores into predictions of the A’s, C’s, G’s and T’s. And so, I call myself a bioinformatician, and that means I straddle this line between computer science and algorithms and the biochemistry and molecular biology. And it’s really gratifying to see those two fields progress over the last 20 years in the way they have. And neither of them would have progressed without the other in this case.

Tom Temin We’re speaking with Dr. Adam Phillippy. He is the head of the Human Genome Informatics section at the National Human Genome Research Institute, part of the NIH. He’s also a finalist in this year’s Service to America Medals program. And is there practical application for this final 8% of the mapping? Can it help medicine advance in some way, or some other area that might be useful, that we could not do before this capstone, as you put it?

Adam Phillippy Yeah, it just makes everything easier and more accurate. And so when we talk about medical diagnostics, if you have your genome done in the clinic and they’re looking for a causative variant of a disease, they’ll map your individual genome and they’ll compare it against a known reference genome and they’ll look for differences. And we call those differences variants. And if you have differences in critical parts of the genome, those rise to the top as a candidate that might be responsible for your disease. And if they can pinpoint exactly what in your genome caused the disease, the hope is you can develop therapies to then treat it. And so imagine you’re looking for this needle in the haystack, but you’re missing 8% of the haystack. If you get very unlucky and that needle is in that 8% that you’re missing, you will never find it. And so the hope of this project is that, for some of the rare diseases that have as yet gone undiscovered in terms of a genetic cause, where we know it’s a disease and we know it’s genetic because you can see the inheritance pattern from your parents and your grandparents and so forth, but we haven’t found the cause, we’ll find some of those causes now in this new 8%. And we’re optimistic about that because, as we discussed earlier, some of this part of the genome really relates to fundamental cellular processes. And so we think that could have significant effects.
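The needle-in-a-haystack point can be sketched in a few lines of Python. This is a simplified illustration, not a clinical pipeline: it assumes the sample and the reference are already lined up letter for letter, which real variant calling has to work out through alignment. The function `call_variants` and all three sequences are invented for the example.

```python
def call_variants(reference, sample):
    """Compare two equal-length DNA strings position by position.

    Returns (variants, hidden): `variants` lists (position, reference base,
    sample base) for each difference, and `hidden` lists positions where
    the reference is 'N', so no comparison is possible.
    """
    variants, hidden = [], []
    for pos, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base == "N":
            hidden.append(pos)
        elif ref_base != sample_base:
            variants.append((pos, ref_base, sample_base))
    return variants, hidden

# Toy sequences, invented for illustration only.
complete_ref = "ACGTACGTACGT"
gapped_ref = "ACGTACGTNNNN"   # same reference with its last stretch unknown
sample = "ACCTACGTACTT"       # differs from the reference at two positions

found_complete, _ = call_variants(complete_ref, sample)
found_gapped, hidden = call_variants(gapped_ref, sample)

print("complete reference finds:", found_complete)
print("gapped reference finds:", found_gapped, "and cannot assess", hidden)
```

Against the gapped reference, the second difference falls in the unknown stretch and is never reported, which is the situation the completed genome is meant to remove.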

Tom Temin And just describe the worldwide consortium aspect of this. The three of you at NIH kind of led this, but it sounds like it was a really vast effort with a lot of coordination around the globe.

Adam Phillippy Yeah, well, first and foremost, you know, the three of us are Sammies finalists by way of being federal employees. There were a few other very essential partners that ended up leading this consortium. In fact, I launched this consortium, you know, around 2017 with Dr. Karen Miga, who’s an assistant professor and co-director of the Genomics Center at University of California, Santa Cruz. And Karen was just an incredible partner throughout this project, and it would not have happened without her partnership and taking this on together. Evan Eichler at the University of Washington was also a key contributor, along with all of our other contributors around the U.S. and the globe. But what was really gratifying about this project is that, unlike the initial Human Genome Project, which was really kind of a top down decision made at governmental levels, “We’re going to finish the human genome. Let’s assign millions and millions of dollars to this project and go,” this was much more of a grassroots, bottom up effort that started with just Karen and I and really no dedicated funding for this project. And we said, “Let’s do it.” And we just started building this coalition of people that were similarly interested in these regions of the genome and, you know, like rolling this small little snowball downhill, it just started picking up steam over the years, and we had some really big successes in, like, 2019, 2020. We finished the first chromosome, which was chromosome X at the time. And that really kind of proved to the community that we had the capability of doing this. And people then just started coming out of the woodwork and joining the consortium as we went, with all sorts of complementary experiences, so that in the end we were able to put together this very nice collection of papers that not just showed the complete genome, but also showed all of the interesting biology that was happening in these unique regions. So it was a very organic growth of the consortium because, you know, when you’re doing great science and making exciting discoveries, you know, everybody wants to be a part of it. And so it was not hard to make friends throughout the course of this consortium.

Tom Temin And was there a single moment when you all realized, by gosh, we’re there? We’ve got it.

Adam Phillippy Yeah. In fact, maybe not the single moment when we were done, but the single moment when we realized that we could be done if we just put a little more work into it. And that was really brought on by my postdoc at the time, another Sergey, Sergey Nurk, who was a visiting postdoc in my lab at the NIH. He brought some early results to me right at the beginning of the pandemic in the spring of 2020, and kind of, I’d like to say laid it on my desk, but it was on a computer screen. He brought his laptop in and showed me these early results. And this is when we had taken all of the latest DNA sequencing technologies and combined them together and showed what we could actually do with the latest and greatest sequencing technologies and some of the methods that Sergey himself had developed. And the puzzle was snapping together for the first time. And there were parts of the genome that we had never seen assemble, which is the word we use for putting this puzzle together. Those parts just snapped together, just like everything made sense. We looked at it and it was that moment where we looked at each other and thought, wow, we have a chance of actually doing this.