Bad bioinformatics software names revisited

I have recently been sorting through lots of old notes files, including many from my time as a genomics researcher at UC Davis. One of these files was called ‘Strategies for naming bioinformatics software’ and I initially assumed it was a post I had already published on this blog.

However, I couldn’t find it as an actual post and when I did a quick web search, I instead discovered this ‘Bioinformatics lab’ podcast from earlier this year:

I have been out of the field of genomics/bioinformatics for many years now and didn’t know about The Bioinformatics Lab podcast, which describes itself as ‘ramblings on all things bioinformatics’.

The conversation between the hosts (Kevin Libuit and Andrew Page) is good and listening to it brought back lots of memories from the many things I’ve written about on this blog. At the end of the episode, Andrew concludes:

“It’s kind of hard. People should put a bit of effort into it”

100% this! Naming software should definitely not be an afterthought. Andrew goes on:

“Before you do any development on anything, go and choose a really good name and make sure it doesn’t conflict with any trademarks or existing tools, you can Google it easily and it’s not offensive in any language.”

These are the types of things that I have written about extensively on this blog. If you are interested, perhaps start with

Then you can read any one of the nearly forty posts I wrote which handed out ‘JABBA awards’ (JABBA stands for Just Another Bogus Bioinformatics Acronym).

This award series started all the way back in 2013 and the inaugural award went to a tool with the crazy capitalisation of 'BeAtMuSiC'.

There’s also a series of posts on duplicate names in bioinformatics where people haven’t checked whether their software name is stepping on someone else’s toes.

This includes a post about the audacious attempt to name a new piece of bioinformatics software BLAST. There is also a post about the five different tools that are all called ‘SNAP’.

Admittedly, I’ve been out of the loop for so long that there may well be many more SNAPs out there now!

The moral of this blog post is that names are important and very easy to mess up, which could mean that fewer people ever discover your tool in the first place.

Damn and blast…I can't think of what to name my software


As many people have pointed out on Twitter this week, there is a new preprint on bioRxiv that merits some discussion:

The full name of the test that is the subject of this article is the Bron/Lyon Attention Stability Test. You have to admit that 'BLAST' is a punchy and catchy acronym for a software tool.

It's just a shame that it is also an acronym for another piece of software that you may have come across.

It's a bold move to give your software the same name as another tool that has only been cited at least 135,000 times!

This is not the first, nor will it be the last, example of duplicate names in bioinformatics software, many of which I have written about before.

Finding bogus bioinformatics acronyms sometimes requires a laser-like focus


A new paper has been published in the journal BMC Research Notes:

This name is:

  1. Bogus — the word 'genome' doesn't contribute any letters to 'LASER' and two letters ('S' and 'R') are not derived from the initial letters of words.
  2. Duplicated — there are at least two other bioinformatics tools called LASER (see here and here).
  3. Undiscoverable — you really need to search Google for LASER genome assembly before you see this as a top result.
  4. Ambiguous — large is a very subjective term. The authors imply that LASER is suitable for human genomes. These are larger than some genomes but smaller than others.
  5. Inconsistent — the paper reveals that LASER is built on the code of QUAST (Quality Assessment Tool for Genome Assemblies). This means you end up with the somewhat bizarre documentation for how to run the program called LASER:

The example included with LASER installation can be run as:

./quast.py testdata/contigs1.fasta testdata/contigs2.fasta \
    -R testdata/reference.fasta.gz -G testdata/genes.txt \
    -O test_data/operons.txt

The output of LASER program can be viewed in file: ./quast_results/latest/report.txt

So to run LASER just type 'quast'!

Trying to locate the source of duplicated software names

Thanks to Andrew Su (@andrewsu) and Mick Watson (@BioMickWatson) for alerting me to the following:

The former paper is from 2009, the latter paper is from 2015. Neither paper has anything to do with this 2010 paper which introduced something called the Genome Positioning System (GPS). Most importantly, none of these papers have anything at all to do with GPS (as most people understand the term).

If I run a Google search for GPS Bioinformatics the top hit that I see is for the MSc course in Bioinformatics as part of Brandeis University's Graduate Professional Studies program.

The usual disclaimer applies:

  1. Check existing literature before you name your software (at the very least run a Google search).
  2. Double check the name by adding the word 'bioinformatics' or 'genomics' to the search terms.
  3. Avoid names which wholly or partially contain words or terms that have nothing to do with your software.

The name of this bioinformatics tool merits close inspection

  1. Bogus bioinformatics acronyms = mildly annoying
  2. Names that clash with previously published tools = mildly annoying
  3. Bogus bioinformatics acronyms that clash with previously published tools = very annoying

Step forward a new paper published in the journal Bioinformatics:

How does INSPEcT derive its name?

  • INSPEcT (INference of Synthesis, Processing and dEgradation rates in Time-course analysis)

Inclusion of the 'E' from 'degradation' and omission of 'R', 'C', or 'A' (from 'Rates', 'Course', and 'Analysis') earns this tool a JABBA award. It also earns a 'Duplications' award because of:

More duplicate names for bioinformatics software: a tale of two HIPPIES

Thanks to Sara Gosline (@sargoshoe) for bringing this to my attention. Compare and contrast the following:

The former tool, published in 2012 in PLOS ONE, takes its name from 'Human Integrated Protein-Protein Interaction rEference' (it was doing so well until it reached the last letter). The latter tool ('High-throughput Identification Pipeline for Promoter Interacting Enhancer elements') was published in 2014 in the journal Bioinformatics.

Leaving aside the issue of whether these names are worthy of a JABBA award, the issue here is that we have yet another duplicate set of software names for two different bioinformatics tools. The authors of the 2nd paper could, and should, have checked for 'prior art'.

If you are planning to develop a new bioinformatics tool and have thought of a possible name, please take the time to do the following:

  1. Visit http://google.com (or your preferred web search engine)
  2. In the search box type the proposed name of your tool followed by a space
  3. Then add the word 'bioinformatics'
  4. Click search
  5. That's it
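For anyone who would rather script those steps, here is a minimal Python sketch. The function name `build_search_url` and the `qualifier` parameter are my own inventions for illustration, and the sketch assumes the commonly used `https://www.google.com/search?q=...` URL form. It only builds the URL; you still have to open it and read the results yourself.

```python
from urllib.parse import urlencode


def build_search_url(tool_name, qualifier="bioinformatics"):
    """Build a web search URL for a proposed tool name plus a qualifier word.

    Hypothetical helper: the base URL below is an assumption about the
    search engine's query format, not something taken from the post.
    """
    query = f"{tool_name} {qualifier}"
    # urlencode escapes the query and turns the space into '+'
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Try both of the qualifier words suggested on this blog
    for word in ("bioinformatics", "genomics"):
        print(build_search_url("LASER", word))
```

Swapping the qualifier for 'genomics' covers the other search suggested elsewhere on this blog.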

24 carat JABBA awards


Here is a new paper published in the journal PLOSBuzzFeed…sorry, I mean PLOS Computational Biology:

It's a good job that they mention the name of the algorithm ninety-one times in the paper, otherwise you might forget just how bogus the name is. At least DIAMOnD has that lower-case 'n' which means that no-one will confuse it with:

This second DIAMOND paper dates all the way back to November 2014. Where does this DIAMOND get its name?

Double Index AlignMent Of Next-generation sequencing Data

This DIAMOND gets a bonus point for having a website link in the paper which doesn't seem to work.

So DIAMOnD and DIAMOND are both the latest recipients of JABBA awards for giving us Just Another Bogus Bioinformatics Acronym.
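The test I apply when handing out these awards is informal, but the core of it (does every letter of the acronym come from the initial letter of a word in the full name?) can be roughly sketched in Python. This is only an illustration under my own assumptions: the function name `derives_from_initials` is hypothetical, words are split on anything that is not a letter, and words such as 'of' are allowed to contribute nothing.

```python
import re


def derives_from_initials(acronym, full_name):
    """Return True if each letter of the acronym can be matched, in order,
    to the initial letter of a word in the full name.

    Words may be skipped, but no letter may come from the middle of a word.
    """
    # Split on anything non-alphabetic, so 'Next-generation' gives two words
    words = re.findall(r"[A-Za-z]+", full_name)
    initials = iter(word[0].upper() for word in words)
    # 'letter in initials' consumes the iterator, which enforces word order
    return all(letter in initials for letter in acronym.upper())


if __name__ == "__main__":
    examples = [
        ("BLAST", "Bron/Lyon Attention Stability Test"),
        ("DIAMOND", "Double Index AlignMent Of Next-generation sequencing Data"),
        ("HIPPIE", "Human Integrated Protein-Protein Interaction rEference"),
    ]
    for acronym, full_name in examples:
        verdict = "ok" if derives_from_initials(acronym, full_name) else "bogus"
        print(f"{acronym}: {verdict}")
```

A check like this says nothing about whether a name is duplicated or discoverable, which is a separate problem.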

Bogus bioinformatics acronyms…there's a lot of them about

Time for some new JABBA awards to recognize the ongoing series of crimes perpetrated in the name of bioinformatics. Two new examples this week…

 

Exhibit A (h/t @attilacsordas): from arxiv.org we have…

CoMEt derives from 'Combinations of Mutually Exclusive Alterations'. Of course the best way of making it easy for people to find your bioinformatics tool is to give it the same name as an existing tool which does something completely different. So don't be surprised if you search the web for 'CoMEt' only to find a bioinformatics tool called 'CoMet' from 2011 (note the lower-case 'e'!). CoMet is a web server for comparative functional profiling of metagenomes.

 

Exhibit B: from the journal Bioinformatics — the leading provider of bogus bioinformatics acronyms since 1998 — we have…

MUSCLE is derived from 'Multi-platform Unbiased optimization of Spectrometry via Closed-Loop Experimentation'. Multi-platform you say? What platforms would those be? From the paper:

MUSCLE is a stand-alone desktop application and has been tested on Windows XP, 7 and 8

What, no love for Windows Vista?

Of course, it should be obvious to anyone that this bioinformatics tool called MUSCLE should in no way be confused with the other (pre-existing) bioinformatics tool called MUSCLE.

Choosing names for bioinformatics software: it's a snap

Image from flickr user plashingvole


Compare the following published bioinformatics resources:

  1. SNAP: Semi-HMM-based Nucleic Acid Parser (published 2004)
  2. SNAP: Suite of Nucleotide Analysis Programs (published 2005)
  3. SNAP: SNP Annotation And Proxy search (published 2008)
  4. SNAP: Screening for NonAcceptable Polymorphisms (published 2008)
  5. SNAP: Scalable Nucleotide Alignment Program (published 2011)

Every new bioinformatics tool that decides to reuse an existing name, whether wilfully or through ignorance, makes it that little bit harder for people to find one of the other similarly named tools that they might be searching for.

h/t to @byuhobbes for bringing some of these duplicates to my attention.
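Changing the capitalisation of a letter does not stop two names clashing as far as a search engine is concerned. As a rough illustration, here is a small Python sketch that groups tool names which differ only by case. The function name `find_case_insensitive_clashes` is hypothetical, and the list of names is simply a handful of those mentioned on this blog rather than any real registry of tools.

```python
from collections import defaultdict


def find_case_insensitive_clashes(names):
    """Group names that are identical once letter case is ignored.

    Returns a dict mapping the case-folded name to the list of variants,
    keeping only groups with more than one variant.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    # Illustrative list only: names taken from posts on this blog
    tool_names = ["CoMEt", "CoMet", "DIAMOnD", "DIAMOND", "INSPEcT", "LASER"]
    for key, variants in find_case_insensitive_clashes(tool_names).items():
        print(f"{key}: {', '.join(variants)}")
```

Exact duplicates such as the five SNAPs would need each name to be stored alongside its paper, because here identical strings would simply be listed twice.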