Damn and blast…I can't think of what to name my software

As many people have pointed out on Twitter this week, there is a new preprint on bioRxiv that merits some discussion:

The full name of the test that is the subject of this article is the Bron/Lyon Attention Stability Test. You have to admit that 'BLAST' is a punchy and catchy acronym for a software tool.

It's just a shame that it is also the acronym for another piece of software that you may have come across.

It's a bold move to give your software the same name as another tool that has only been cited at least 135,000 times!

This is not the first, nor will it be the last, example of duplicate names in bioinformatics software, many of which I have written about before.

Finding bogus bioinformatics acronyms sometimes requires a laser-like focus

A new paper has been published in the journal BMC Research Notes:

This name is:

  1. Bogus — the word 'genome' doesn't contribute any letters to 'LASER' and two letters ('S' and 'R') are not derived from the initial letters of words.
  2. Duplicated — there are at least two other bioinformatics tools called LASER (see here and here).
  3. Undiscoverable — you really need to search Google for LASER genome assembly before you see this as a top result.
  4. Ambiguous — large is a very subjective term. The authors imply that LASER is suitable for human genomes. These are larger than some genomes but smaller than others.
  5. Inconsistent — the paper reveals that LASER is built on the code of QUAST (Quality Assessment Tool for Genome Assemblies). This means you end up with the somewhat bizarre documentation for how to run the program called LASER:

The example included with LASER installation can be run as:

./quast.py testdata/contigs1.fasta testdata/contigs2.fasta \
    -R testdata/reference.fasta.gz -G testdata/genes.txt \
    -O test_data/operons.txt

The output of LASER program can be viewed in file: ./quast_results/latest/report.txt

So to run LASER just type 'quast'!

Trying to locate the source of duplicated software names

Thanks to Andrew Su (@andrewsu) and Mick Watson (@BioMickWatson) for alerting me to the following:

The former paper is from 2009, the latter paper is from 2015. Neither paper has anything to do with this 2010 paper which introduced something called the Genome Positioning System (GPS). Most importantly, none of these papers have anything at all to do with GPS (as most people understand the term).

If I run a Google search for 'GPS Bioinformatics', the top hit that I see is for the MSc course in Bioinformatics, part of Brandeis University's Graduate Professional Studies program.

The usual disclaimer applies:

  1. Check existing literature before you name your software (at the very least run a Google search).
  2. Double check the name by adding the word 'bioinformatics' or 'genomics' to the search terms.
  3. Avoid names which wholly or partially contain words or terms that have nothing to do with your software.

The name of this bioinformatics tool merits close inspection

  1. Bogus bioinformatics acronyms = mildly annoying
  2. Names that clash with previously published tools = mildly annoying
  3. Bogus bioinformatics acronyms that clash with previously published tools = very annoying

Step forward a new paper published in the journal Bioinformatics:

How does INSPEcT derive its name?

  • INSPEcT (INference of Synthesis, Processing and dEgradation rates in Time-course analysis)
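If you want to audit a name like this mechanically, here is a rough Python sketch. It is only my approximation of the 'bogus' test, not an official JABBA procedure: it looks at the capital letters in the spelled-out name, flags any that are not the first letter of their word, and lists the words that contribute no capital at all. The function name `audit_acronym` and the `STOP_WORDS` list are made up for this illustration.

```python
import re

# Hypothetical list of short linking words that acronyms routinely skip.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def audit_acronym(full_name):
    """Report how the capital letters in a tool's spelled-out name are sourced.

    Returns a dict with:
      non_initial  -- (letter, word) pairs where the capital is not the
                      first letter of its word
      unused_words -- non-stop-words that contribute no capital at all
    """
    non_initial = []
    unused_words = []
    # Split on anything that is not a letter, so 'Time-course' gives two words.
    for word in re.findall(r"[A-Za-z]+", full_name):
        capitals = [(pos, ch) for pos, ch in enumerate(word) if ch.isupper()]
        if not capitals:
            if word.lower() not in STOP_WORDS:
                unused_words.append(word)
            continue
        for pos, ch in capitals:
            if pos > 0:
                non_initial.append((ch, word))
    return {"non_initial": non_initial, "unused_words": unused_words}


if __name__ == "__main__":
    name = ("INference of Synthesis, Processing and dEgradation rates "
            "in Time-course analysis")
    print(audit_acronym(name))
```

Run on the INSPEcT name as capitalized above, this flags the 'N' of 'INference' and the 'E' of 'dEgradation' as non-initial letters, and lists 'rates', 'course', and 'analysis' as words that contribute nothing. Flagging the 'N' is a stricter reading than I would normally apply by eye, so treat the output as a prompt for closer inspection rather than a verdict.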

Inclusion of the 'E' from 'degradation' and omission of 'R', 'C', or 'A' (from 'Rates', 'Course', and 'Analysis') earns this tool a JABBA award. It also earns a 'Duplications' award because of: