This morning I came across two new papers. Compare and contrast:
- De Novo Assembly and Annotation of Salvia splendens Transcriptome Using the Illumina Platform — Ge et al. PLOS ONE
- RepARK—de novo creation of repeat libraries from whole-genome NGS reads — Koch et al. Nucleic Acids Research
The title of the former paper tells me that the research is based on a specific sequencing technology, whereas the title of the latter suggests that the RepARK tool might work with any 'NGS' data.
Given the wide, and sometimes inappropriate, use of the 'NGS' phrase, it is not always obvious what someone means when they refer to 'NGS reads'. The term could cover anything from the 25–30 bp reads of early Illumina sequencing to PacBio reads that may exceed 15,000 bp (and which contain a high proportion of indel errors).
Reading the Methods section of the paper, I see that the authors used only simulated 101 bp reads and real Illumina reads (average length = 82 bp). They do point out in the Discussion that "long [PacBio] reads may also provide new opportunities for de novo repeat prediction". This is something I am interested in, because we have previously published work that used PacBio data to find tandem repeats.
To find out that they didn't use any PacBio data, I had to read the title, abstract, and Methods, and then scan the rest of the paper. I accept that 'NGS' is a convenient term to use, but it would have been helpful (to me, anyway) if at least the abstract had specified which NGS technologies the paper used.