Does your bioinformatics software pass the 'elevator test'?

The name of your bioinformatics software is important. A good name should be clear, unambiguous, pronounceable, memorable, and meaningful. Sadly, many (most?) names of existing tools do not satisfy all of these criteria. Here is a simple thought experiment that you can use when deciding on a new name for your software; it might help you avoid many common naming problems.

Imagine that you are in an elevator going from the 6th floor of a building to the ground floor. The elevator stops at the 5th floor and a visiting bioinformatics/genomics scholar steps in. This is someone you admire, and someone you would really like to tell about the latest software tool that you've been working on.

They press the button for the 2nd floor. You have maybe 30 seconds to introduce the tool and hopefully make them curious enough to check it out when they next get back to their computer. You say something like:

Hi. I'm a big fan of your work. I wanted to let you know that I've been working on a tool that you might be interested in…it's called 'X'

In this example, we will assume that you may never see this person again and that you don't know when they will have time to look up your software tool. It might be days, so the name has to be something that they will remember. The more meaningful and pronounceable the name, the greater the chance that it will be memorable.

Now, let's consider the names of some recently published bioinformatics tools…do these pass the elevator test? You should always consider how you might have to spell out the name of your software:

  • tmle.npvi — tee-em-el-ee-dot-en-pee-vee-aye
  • EW_dmGWAS — Ee-double-you-underscore-dee-em-gee-was
  • do_x3dna — dee-oh- (or do?) -underscore-ex-three-DNA
  • R3D-2-MSA — ar-three-dee-dash-two-dash-em-ess-ay 
  • Pse-in-One — pee-ess-ee- (or see?) -dash-in-dash-one
  • (PS)2 — open-parentheses-pee-ess-close-parentheses-superscript-two

In these examples you would probably choose to omit details of the dots, dashes, underscores, parentheses, and superscript characters that are part of the name. So you should ask yourself whether you really need to include them in the first place.
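One rough way to spot these characters before you settle on a name is to list everything in it that is not a letter or a digit. The snippet below is only a sketch of that idea: the function name `awkward_characters` is made up for this post, and an empty result says nothing about whether the remaining letters are pronounceable or memorable.

```python
def awkward_characters(name: str) -> list[str]:
    """Return every character in `name` that is not a letter or a digit.

    Hypothetical helper for illustration: these are the characters you
    would have to spell out (or silently drop) when saying the name aloud.
    """
    return [ch for ch in name if not ch.isalnum()]


if __name__ == "__main__":
    # The names from the list above, typed as plain text.
    for tool in ["tmle.npvi", "EW_dmGWAS", "do_x3dna",
                 "R3D-2-MSA", "Pse-in-One", "(PS)2"]:
        print(tool, awkward_characters(tool))
```

Every name in the list above returns at least one character, which is a hint that the punctuation may be worth dropping.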

The bottom line is that it is not enough for the name of your software to be comprehensible when read from a screen or page…it should also sound good!

And the award for needless use of subscript in the name of a bioinformatics tool goes to…

The following paper can be found in the latest issue of Bioinformatics:

MoRFs are molecular recognition features, and the tool that the authors developed to identify them is called:

MoRFCHiBi

So the tool's name includes a subscripted version of 'CHiBi', a name which is taken from the shorthand name for the Center for High-Throughput Biology at the University of British Columbia (this is where the software was presumably developed). The website for MoRFCHiBi goes one step further by describing something called the MoRFChiBi,mc predictor. I'm glad that they felt that some italicized text was just the thing to complement the subscripted, mixed case name.

The subscript seems to serve no useful purpose and just makes the software name harder to read, particularly because it combines a lot of mixed capitalization. It also doesn't help that 'ChiBi' can be read as 'kai-bye' or 'chee-bee'. I'm curious whether the CHiBi will be adding their name as a subscripted suffix to all of their software, or just this one?

More duplicate names for bioinformatics software: a tale of two HIPPIES

Thanks to Sara Gosline (@sargoshoe) for bringing this to my attention. Compare and contrast the following:

The former tool, published in 2012 in PLOS ONE, takes its name from 'Human Integrated Protein-Protein Interaction rEference' (it was doing so well until it reached the last letter). The latter tool ('High-throughput Identification Pipeline for Promoter Interacting Enhancer elements') was published in 2014 in the journal Bioinformatics.

Leaving aside the question of whether these names are worthy of a JABBA award, the problem here is that we have yet another pair of different bioinformatics tools sharing the same name. The authors of the second paper could, and should, have checked for 'prior art'.

If you are planning to develop a new bioinformatics tool and have thought of a possible name, please take the time to do the following:

  1. Visit http://google.com (or your preferred web search engine)
  2. In the search box type the proposed name of your tool followed by a space
  3. Then add the word 'bioinformatics'
  4. Click search
  5. That's it
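
If even that feels like too much typing, the steps above can be sketched as a few lines of Python that build the search link for you. This is only an illustration: the function name `name_check_url` is made up here, and the `https://www.google.com/search?q=...` form is an assumption about how the search engine accepts queries, so swap in whatever your preferred engine uses.

```python
from urllib.parse import urlencode

# Assumed search endpoint; replace with your preferred search engine.
SEARCH_ENDPOINT = "https://www.google.com/search"


def name_check_url(proposed_name: str) -> str:
    """Build a search URL for '<proposed name> bioinformatics'.

    Hypothetical helper: it only constructs the link, it does not run
    the search or interpret the results.
    """
    query = f"{proposed_name} bioinformatics"
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(name_check_url("HIPPIE"))
```

Open the printed link in a browser and skim the first page of results before you commit to the name.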