Bioinformatics software names: the good, the bad, and the ugly

The Good

Given that I spend so much time criticising bad bioinformatics names, I should probably make more of an effort to flag those occasional names that I actually like! Here are a few:

RNAcentral: an international database of ncRNA sequences

A good reminder that a bioinformatics tool doesn't have to use acronyms or initialisms! The name is easy to remember and makes it fairly obvious what you might expect to find in this database.

KnotProt: a database of proteins with knots and slipknots

A simple, clever, and memorable name. And once again, no acronym!

WormBase and FlyBase

Some personal bias here (I spent four years working at WormBase), but you have to admire the simplicity and elegance of the names. 'WormBase' sort of replaced its predecessor ACeDB (A Caenorhabditis elegans DataBase). I say 'sort of' because ACeDB was the name for both the underlying software (which continued to be used by WormBase) and the specific instance of the database that contained C. elegans data. This led to the somewhat confusing situation (circa 2000) of there being many public ACeDB databases for many different species, only one of which was the actual ACeDB resource with worm data.

The Bad

These are all worthy of a JABBA award:

The human DEPhOsphorylation database DEPOD: a 2015 update

I find it amusing that they couldn't even get the acronym correctly capitalized in the title of the paper. As the abstract confirms, the second 'D' in 'DEPOD' comes from the word 'database', which should therefore be capitalized. So it is another tenuous selection of letters to form the name of the database, but I guess at least the name is unique, and Google searches for 'depod database' don't have any trouble finding this resource.

IMGT®, the international ImMunoGeneTics information system® 25 years on

It's a registered trademark and that little R appears at every mention of the name in the paper. This initialism is the first I've seen where all letters of the short name come from one word in the full name.

DoGSD: the dog and wolf genome SNP database

I have several issues with this:

  1. It's a poor acronym (not explicitly stated in the paper): Dog and wolf Genome Snp Database
  2. The word 'dog' contributes a 'D' to the name, but then you end up with 'DoG' in the final name. It looks odd.
  3. What did the poor wolf do to not get featured in the database name?
  4. The lower-case 'o' means that you can potentially read this as dog-ess-dee or do-gee-ess-dee.
  5. Why focus the name on just two canine species? What if they wanted to add SNPs from jackals or coyotes? Are they going to change the name of the database? They could have just called this something like 'The Canine SNP Database' and avoided all of these problems.

The Ugly

Maybe not JABBA-award winners, but they come with their own problems:

MulRF: a software package for phylogenetic analysis using multi-copy gene trees

Sometimes using the lower-case letter 'L' in any name is just asking for trouble. Depending on the font, it can look like the number 1 or even a pipe character '|'. The second issue concerns the pronounceability of this name. Mull-urf? Mull-ar-eff? It doesn't trip off the tongue.

DupliPHY-Web: a web server for DupliPHY and DupliPHY-ML

This tool is all about looking for gene duplications from a phylogenetic perspective, hence 'Dupli' + 'PHY'. I actually think this is quite a good choice of name, except for the inconsistent, and visually jarring, use of mixed case. Why not just 'Dupliphy'?

ChiTaRS 2.1—an improved database of the chimeric transcripts and RNA-seq data with novel sense–antisense chimeric RNA transcripts

It's not spelt out in detail, but one can assume that 'ChiTaRS' derives from the following letters: CHImeric Transcripts And Rna-Seq data. So it is not a bogus bioinformatics acronym in that respect. But I find it visually unappealing. Mixed capitalization like this never scans well.

DoRiNA 2.0—upgrading the doRiNA database of RNA interactions in post-transcriptional regulation

The paper doesn't explicitly state how the word 'DoRiNA' is formed other than saying:

we have built the database of RNA interactions (doRiNA)

So one can assume that those letters are derived from 'Database Of Rna INterActions'. On the plus side, it is a unique name that is easily searchable with Google. On the negative side, it seems strange to have 'RNA' as part of your database name, only with an additional letter inserted in between.