The 100,000 Genomes Project has finished

This week I helped write a blog post for The Institute of Cancer Research to mark the completion of the 100,000 Genomes Project. I co-wrote it with a former colleague, Dr Sam Dick, who wrote most of the article.

Read the blog post:

To mark this milestone, I also took to Twitter this week with a lengthy (and admittedly rambling) thread on how far genomics has come as a field. Click on the tweet below to see the full Twitter thread:

Hopping back for another JABBA award


So I was meant to have retired from handing out JABBA awards, which recognise instances of ‘Just Another Bogus Bioinformatics Algorithm’. However, I saw something this week that clearly merits an award.

And so, from a paper recently published in PLoS ONE, I give you:

The three lower-case letters signal that some name wrangling is going on… so let’s see how the authors arrive at this name:

GRASShopPER: GPU overlap GRaph ASSembler using Paired End Reads

That’s how it is described in the paper, so I guess it could also have been called ‘GOGAUPER’? This is another example of a clumsily constructed acronym that could have been avoided altogether.

‘Grasshopper’ is a cool, and catchy, name for any software tool and it doesn’t really need to be retconned into an awkward acronym.

It does, however, give me one new animal for the JABBA menagerie!

The changing landscape of sequencing platforms that underpin genome assembly

From Flickr user itsrick208. CC BY-NC 2.0


In my last blog post I looked at the amazing growth over the last two decades in publications that relate to genome assembly.

In this post, I try to see whether Google Scholar can also shed any light on which sequencing technologies have been used to help understand, and improve, genome assembly.

Here is a rough overview of the major sequencing platforms that have underpinned genome assembly over the years. I’ve focused on time points when there were sequencing instruments that people were actually using, rather than when the technology was first invented or described. This is why I start Sanger sequencing in 1995 with the AB310 sequencer, rather than in 1977 when the method was first published.


Return to Google Scholar

So how can you find publications about genome assembly that use these technologies? Well, here are the Google Scholar searches I used to try to identify relevant publications.

  1. Sanger — "genome assembly"|"de novo assembly" sanger -sanger.ac.uk — I had to exclude the Sanger Institute’s website address, as it appears in many papers that might not be talking about Sanger sequencing per se.
  2. Roche 454 — "genome assembly"|"de novo assembly" 454 (roche|pyrosequencing) — another tricky one, as ‘454’ alone was not a suitable keyword for searching.
  3. Illumina — "genome assembly"|"de novo assembly" (illumina|solexa) — Solexa obviously needs to be included in this search as well.
  4. ABI SOLiD — "genome assembly"|"de novo assembly" "ABI solid"
  5. Ion Torrent — "genome assembly"|"de novo assembly" "ion torrent"
  6. PacBio — "genome assembly"|"de novo assembly" ("PacBio"|"Pacific Biosciences")
  7. Oxford Nanopore Technologies — "genome assembly"|"de novo assembly" "Oxford Nanopore"
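If you want to repeat these searches for each year in turn, it helps to generate them rather than retype them. Below is a minimal Python sketch that attaches the shared assembly phrase to each platform's terms and builds a Scholar search URL restricted to a single year. The names (`PLATFORM_TERMS`, `build_query`, `scholar_url`) are my own, and I'm assuming that Scholar's custom date range is controlled by the `as_ylo` and `as_yhi` URL parameters. Scholar has no official API, so this only builds links to open in a browser; it doesn't fetch any results.

```python
from urllib.parse import urlencode

# Shared phrase used at the start of every search in the list above
BASE = '"genome assembly"|"de novo assembly"'

# Platform-specific terms, transcribed from the searches listed above
PLATFORM_TERMS = {
    "Sanger": "sanger -sanger.ac.uk",
    "Roche 454": "454 (roche|pyrosequencing)",
    "Illumina": "(illumina|solexa)",
    "ABI SOLiD": '"ABI solid"',
    "Ion Torrent": '"ion torrent"',
    "PacBio": '("PacBio"|"Pacific Biosciences")',
    "Oxford Nanopore": '"Oxford Nanopore"',
}


def build_query(platform: str) -> str:
    """Combine the shared assembly phrase with one platform's terms."""
    return f"{BASE} {PLATFORM_TERMS[platform]}"


def scholar_url(platform: str, year: int) -> str:
    """Build a Google Scholar search URL limited to a single year.

    Assumption: Scholar's custom-range parameters are as_ylo / as_yhi.
    """
    params = {"q": build_query(platform), "as_ylo": year, "as_yhi": year}
    # urlencode escapes the quotes and pipes, and turns spaces into '+'
    return "https://scholar.google.com/scholar?" + urlencode(params)


if __name__ == "__main__":
    for name in PLATFORM_TERMS:
        print(name, scholar_url(name, 2017))
```

Keeping the shared phrase in one place means any tweak to the assembly keywords applies to every platform at once, which helps keep the seven searches comparable.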

Now obviously, many of these searches are flawed and are going to miss publications or include false positives. This makes comparing the absolute numbers of publications between technologies potentially misleading. However, it should still be illuminating to look at the trends of how publications for each of these technologies have changed over time.

The results

As in my last graph, I plot the number of publications on a log scale.
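A log scale suits this kind of data because steady exponential growth shows up as a straight line, and the slope of that line gives the growth factor per year. As a rough sketch of the idea (the function `annual_growth` is hypothetical, and its input is simply a dict mapping years to publication counts), you could fit a least-squares line to the log10 counts and turn the slope into a yearly growth factor and a doubling time:

```python
import math


def annual_growth(counts: dict[int, int]) -> tuple[float, float]:
    """Fit log10(count) = a + b * year by ordinary least squares.

    Returns (growth factor per year, doubling time in years).
    """
    # Drop years with no publications: log10(0) is undefined
    points = [(year, math.log10(n)) for year, n in sorted(counts.items()) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two years with non-zero counts")
    x_mean = sum(x for x, _ in points) / len(points)
    y_mean = sum(y for _, y in points) / len(points)
    # Least-squares slope of log10(count) against year
    slope = sum((x - x_mean) * (y - y_mean) for x, y in points) / sum(
        (x - x_mean) ** 2 for x, _ in points
    )
    factor = 10 ** slope  # multiplicative change per year
    # A flat or shrinking series never doubles
    doubling = math.log(2) / math.log(factor) if factor > 1 else math.inf
    return factor, doubling
```

A series that doubles every year gives a factor of about 2 and a doubling time of about 1 year; a shrinking series gets an infinite doubling time rather than a negative one.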


Observations

  1. Publications about genome assembly that mention Sanger sequencing dominate the first decade of this graph before being overtaken by Illumina in 2009.
  2. The growth of publications for Sanger is starting to slow down.
  3. Publications for Roche 454 peaked in 2015 and have started to decline.
  4. Publications concerning Ion Torrent peaked a year later, in 2016.
  5. ABI SOLiD shows the clearest ‘rise and fall’ pattern, with publications about genome assembly now declining for five years.
  6. The rate of growth for PacBio publications has been pretty solid but may have slowed a little in 2017.
  7. Oxford Nanopore, the newest kid on the block in terms of commercially available products, has been enjoying a solid period of exponential growth and looks set to overtake Ion Torrent (and maybe Roche 454) this year.
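Most of these observations come down to three questions about each yearly series: in which year did it peak, in which year did one platform overtake another, and for how many consecutive years has it been falling? Here is a hedged Python sketch of those checks. The function names are hypothetical, and the inputs are assumed to be dicts mapping year to publication count, such as you might collect from searches like the ones listed earlier; I'm not supplying any real counts here.

```python
from typing import Optional


def peak_year(counts: dict[int, int]) -> int:
    """Return the year with the most publications (earliest year wins a tie)."""
    # max() keeps the first maximum it meets, and years are scanned in order
    return max(sorted(counts), key=lambda year: counts[year])


def first_overtake(leader: dict[int, int], challenger: dict[int, int]) -> Optional[int]:
    """Return the first year in which `challenger` moves ahead of `leader`.

    Only years present in both series are compared, and the challenger must
    have been level or behind in some earlier shared year; otherwise None.
    """
    was_behind = False
    for year in sorted(leader.keys() & challenger.keys()):
        if challenger[year] > leader[year]:
            if was_behind:
                return year
        else:
            was_behind = True
    return None


def trailing_decline(counts: dict[int, int]) -> int:
    """Count how many of the most recent year-on-year changes were decreases."""
    years = sorted(counts)
    run = 0
    # Walk backwards through consecutive (previous, current) year pairs
    for prev, curr in zip(reversed(years[:-1]), reversed(years[1:])):
        if counts[curr] < counts[prev]:
            run += 1
        else:
            break
    return run
```

Given how noisy these Scholar counts are, a single year's dip is weak evidence on its own, which is why a check like `trailing_decline` looks at the run of consecutive falls rather than one year.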