Next-generation sequencing must die (part 3) — a tale of two titles

This morning I came across two new papers. Compare and contrast:

  1. De Novo Assembly and Annotation of Salvia splendens Transcriptome Using the Illumina Platform — Ge et al. PLOS ONE
  2. RepARK—de novo creation of repeat libraries from whole-genome NGS reads — Koch et al. Nucleic Acids Research

The former title tells me that the research is based on a specific sequencing technology, whereas the latter seems to suggest that the RepARK tool might work with any 'NGS' data.

Given the wide, and sometimes inappropriate, use of the 'NGS' phrase, it is not always obvious what someone means when they refer to 'NGS reads'. The term could cover anything from 25–30 bp reads from older Illumina sequencing all the way to PacBio reads that may be >15,000 bp (and which contain a high fraction of indels).

Reading the Methods section of the paper, I see that they only used simulated 101 bp reads as well as real Illumina reads (average length = 82 bp). They do point out in the discussion that "long [PacBio] reads may also provide new opportunities for de novo repeat prediction". This is something that I have an interest in because we have previously published work that used PacBio data to find tandem repeats.

In order to find out that they don't have any PacBio data, I had to read the title, abstract, methods, and then scan the rest of the paper. I accept that 'NGS' is a convenient term to use, but it would have been helpful (to me anyway) if at least the abstract could have pinpointed which NGS technologies the paper was using.

Next-generation sequencing must die (part 2) — understanding the generation gap

As a brief follow-up to my previous post, I'd like to clarify that next-generation sequencing may refer to technologies from Illumina, 454, SOLiD, Helicos, Ion Torrent, Complete Genomics, PacBio, or Oxford Nanopore (these links all refer to different papers).

If we want to get more specific, we need to recognize that Complete Genomics is a second generation technology...except when it is a third generation technology. In contrast, we should be clear that Oxford Nanopore is the only example of fourth generation technology...apart from when it is third generation technology. We can at least be sure that Ion Torrent is definitely a second generation platform...unless it's a third generation platform. One paper clarifies this situation by observing that Ion Torrent "sits between" the second and third generation categories.

Further illumination on this subject is provided by the confirmation that PacBio is either a second generation, third generation...or even a "2.5th" generation technology. Likewise, Helicos is also a second generation, third generation, or lies "in between the transition of next-generation sequencing to third generation" sequencing technologies.

So hopefully that's a lot clearer now.

Next-generation sequencing must die!

I hate the phrase next-generation sequencing (NGS) with a passion. Here's why...

The first published use of this phrase (that I can find) is from an article in Drug Discovery Today: Technologies by Thomas Jarvie in 2005. This paper had the succinct title Next generation sequencing technologies, and while this may represent the first time this phrase made it into print, it certainly wouldn't be the last.

Illumina sequencing may be the most obvious technology that springs to mind when people think of NGS, but there is also Pyrosequencing (developed circa 1996), Massively Parallel Signature Sequencing (circa 2000), ABI SOLiD sequencing (circa 2008), and Ion semiconductor sequencing (circa 2010).

Of course we also have single molecule real time sequencing by Pacific Biosciences. They were founded in 2004 but didn't launch their PacBio RS machines until 2010. The current darling of the sequencing world is nanopore technology, something which has been in development since the mid 1990s.

So do we refer to this entire period (from development to finished technology) as the NGS era? If so, then NGS technologies have already been around for almost 20 years. It doesn't strike me as particularly helpful to keep on labeling all of these different technologies with the same name.

Some people have tried to make things clearer by introducing yet more levels of obfuscation. This has led to some of these technologies being referred to as second generation, third generation, fourth generation, next-next-generation, or even next-next-next generation. And of course these definitions are all subjective: one person's third-generation technology is another person's fourth-generation technology.

Other alternatives to NGS such as high-throughput sequencing or long-read sequencing are equally useless because 'high' and 'long' are both relative terms. The output from a high-throughput sequencing platform of 2008 might seem like 'low-throughput' today. The weakness of length-based descriptions is the reason why the 'Short Read Archive' was (thankfully) reborn as the Sequence Read Archive.

So here is my proposed three-step solution to rid the world of this madness:

  1. Don't use any of these terms, ever again.
  2. Just refer to a technology by a name that describes the methodology (e.g. sequencing-by-synthesis) or by the name of a company that has developed a specific product (e.g. Oxford Nanopore).
  3. You could even just use the term 'current sequencing technologies'. As long as your paper/talk/blog/book has a date associated with it, I'm confident people will know what you mean.

Update 12th March: I have added a follow-up post to this one.