Another hard-to-pronounce bioinformatics software name

October 21, 2015 by Keith Bradnam

This was from a few months ago, published in the journal Nucleic Acids Research:

CATH FunFHMMer web server: protein functional annotations using functional family assignments

So how do you pronounce 'FunFHMMer'? I can imagine several possibilities:

1. Fun-eff-aitch-em-em-er
2. Fun-eff-aitch-em-mer
3. Fun-eff-hammer
4. Fünf-hammer

Reading the manuscript suggests that 'FunF' stems from 'FunFam(s)' which in turn is derived from 'functional families'. This would suggest that options 1 or 3 above might be the correct way to pronounce this software's name.

The fully expanded description of this web server's name becomes a bit of a mouthful:

Class Architecture Topology Homologous Superfamily Functional Families Hidden Markov Model (maker?)

If you want your bioinformatics software to have a memorable name, it helps if the name is pronounceable

August 12, 2015 by Keith Bradnam

There is a new paper in the journal Bioinformatics:

Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data

The paper describes a new method for implementing a Principal Components Analysis (PCA) of data. That new method has a name. That name has just seven characters. How hard can it be to pronounce?

S4VDPCA: ess-four-vee-dee-pee-cee-ay

It doesn't exactly trip off the tongue and having four 'ee-sounding' letters together (VDPC) doesn't make it easy to remember. When I first came across this paper, I skimmed the article, waited an hour, and then tried to remember the name. I could remember that it included '4', 'V', and 'D', but couldn't remember the order (or that it also included an 'S').

It is by no means essential that bioinformatics tools have easily pronounceable names, but this will help people remember the name of your software. In turn, this makes it easier for people to tell others about your software. I don't imagine that bioinformatics software developers ever want to overhear the following type of conversation:

Bob: "You should use that tool"

Sue: "What tool?"

Bob: "Umm, you know that PCA thingy. The S…something, something…PCA tool"

Sue: "The what?"

Bob: "Run a Google search for Bioinformatics PCA tools, it's probably the top hit."

Sue: <- facepalm ->

Unpronounceable bioinformatics database names

January 21, 2015 by Keith Bradnam

First a quick reminder that an acronym is something that is meant to be pronounced as an entire word (e.g. NATO, AIDS etc.). Sometimes these end up becoming regular, non-capitalized, words (e.g. radar, laser).

In contrast, an initialism is something where the component letters are read out individually (e.g. BBC, CPU). In bioinformatics, there are also names which are part acronym and part initialism (e.g. GWAS…which I have only ever heard pronounced as gee-was).

Most initialisms that we use in everyday life tend to be short (2–4 letters) because this makes them easier to read and to pronounce. As you move past 4 letters, you run the risk of making your initialism unpronounceable and unmemorable.

So here are some recently published bioinformatics tools with names that are a bit cumbersome to repeat. For each one I include how someone might try to pronounce them. Try repeating these names quickly and for an added test, see how many of these names you can remember 5 minutes after you read this:

5 characters

CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics: cee-car-eff-dee-bee? — this assumes that 'Ce' and 'Ca' are not treated separately as two letters…one could argue that if it is not clear how your bioinformatics tool name should be pronounced, then it does not have a good name.
EHFPI: a database and analysis resource of essential host factors for pathogenic infection: ee-aitch-eff-pee-aye
PAIDB v2.0: exploration and analysis of pathogenicity and resistance islands: pee-ay-aye-dee-bee — this is a particularly bad choice of name as it will read to many as 'paid-bee'
rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development: ar-ar-en-dee-bee (the first 3 characters are not easy to say quickly!)
The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome: tee-tee-ess-em-aye

6 characters

DBTMEE: a database of transcriptome in mouse early embryos: dee-bee-tee-em-ee-ee — I accept that maybe this one is just pronounced dee-bee-tee-me, but once again do you really want there to be uncertainty as to how the name of your bioinformatics tool is read by others?
euL1db: the European database of L1HS retrotransposon insertions in humans: ee-you-ell-one-dee-bee
SASBDB, a repository for biological small-angle scattering data: ess-ay-ess-bee-dee-bee
WDSPdb: a database for WD40-repeat proteins: dub-ball-you-dee-ess-pee-dee-bee

7 characters

BCCTBbp: the Breast Cancer Campaign Tissue Bank bioinformatics portal: bee-cee-cee-tee-bee-bee-pee
PFP/ESG: automated protein function prediction servers enhanced with Gene Ontology visualization tool: pee-eff-pee-slash-ee-ess-gee (only 6 characters if you omit the slash I guess)
PHI-DAC: protein homology database through dihedral angle conservation: pee-aitch-aye-dash-dee-ay-cee (shorter if you omit dash and/or pronounce 'DAC' as a word)

And the award goes to…

BioVLAB-MMIA-NGS: microRNA–mRNA integrated analysis using high-throughput sequencing data: this is a 7-letter initialism that comes after a three syllable (non-standard) word, so to pronounce this you have to say bio-vee-lab-em-em-aye-ay-en-gee-ess!!!

Conclusions

If you want people to actually use your bioinformatics tools, then you should aim to give them names that are memorable and pronounceable.

How would you pronounce the name of this bioinformatics tool?

October 22, 2014 by Keith Bradnam

From the latest issue of Bioinformatics we have a new tool that is an R package for the analysis of genome-wide association studies (GWAS). Rather than name the tool, I want you all to first see it exactly as it appears in the journal:

The first character in the name of this software can often be hard to identify, particularly when certain fonts make it look like it could be the letter L or I, or even the number 1.

This is not a name that is worthy of a JABBA award, but it does fall into my category of posts which I call almost JABBA, for software names that have various other issues. The particular issue in this case is that the name is hard to read and therefore hard to pronounce. I feel that the use of lower-case characters makes it more likely that the reader will attempt to pronounce this as a word, rather than read it as an initialism. E.g. maybe you saw this name and read it as 'Lurgpurr', or 'Ergpurr'.

The reason behind the name is not explained in the article, but when you go to the linked software page, all is revealed:

It's a bit odd that one of the five words that appear in this name ('Gaussian') doesn't get mentioned anywhere in the paper. But more importantly, why did they feel the need for using lower-case characters? 'LRGPR' would have been much easier to read and comprehend than the font-dependent 'lrgpr'.

Unpronounceable — why can't people give bioinformatics tools sensible names?

June 13, 2014 by Keith Bradnam

Okay, so many of you know that I have a bit of an issue with bioinformatics tools with names that are formed from very tenuous acronyms or initialisms. I've handed out many JABBA awards for cases of 'Just Another Bogus Bioinformatics Acronym'. But now there is another blight on the landscape of bioinformatics nomenclature…that of unpronounceable names.

If you develop bioinformatics tools, you would hopefully want to promote those tools to others. This could be in a formal publication, or at a conference presentation, or even over a cup of coffee with a colleague. In all of these situations, you would hope that the name of your bioinformatics tool should be memorable. One way of making it memorable is to make it pronounceable. Surely, that's not asking that much? And yet…

GO2MSIG, an automated GO based multi-species gene set generator for gene set enrichment analysis – This is not so hard to pronounce (go-to-em-sig), but it is a little awkward and not very memorable.
AbsCN-seq: a statistical method to estimate tumor purity, ploidy and absolute copy numbers from next-generation sequencing data — I guess this only has one obvious pronunciation (abs-see-en-seq), but again not particularly memorable.
QCGWAS: A flexible R package for automated quality control of genome-wide association results — This sort of works if you separate out the two commonly used initialisms (QC + GWAS), but maybe not everyone will spot this straight away (especially if you are not familiar with GWAS). I still find this a bit of a mouthful to say (cue-see-gee-was).
CMGRN: a web server for constructing multilevel gene regulatory networks using ChIP-seq and gene expression data — The lack of vowels means that this can only ever be pronounced by uttering every consonant separately (see-em-gee-ar-en).
iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition — I don't know where to start with this one! Imagine that you had to spell this out to a journalist over the phone (something that can happen in science!): "The software name? Yes, it's aye (lower-case), en (upper-case), you-see (lower-case), hyphen, pee (upper-case), ess-ee (lower-case), and kay-en-see (upper-case)…hello, are you still there?".
MFSPSSMpred: identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation — Couldn't be simpler really. I look forward to telling my colleagues about em-eff-ess-pee-ess-ess-em-pred.
mRMRe: an R package for parallelized mRMR ensemble feature selection — This is not as long as some of the others, but try saying this five times fast (em-ar-em-ar-ee).
LoQAtE—Localization and Quantitation ATlas of the yeast proteomE. A new tool for multiparametric dissection of single-protein behavior in response to biological perturbations in yeast — I get the feeling that this is meant to be pronounced 'LOCATE', but that's only a guess. Maybe it's really pronounced low-queue-at-ee? It's clumsy, ugly, and also an incredibly tenuous initialism.
HoPaCI-DB: host-Pseudomonas and Coxiella interaction database — This, like many of the above entries, also featured as a JABBA award recipient. This is not as bad an acronym/initialism as others, but it ranks highly for its lack of obvious pronunciation. Is it ho-pa-cee-aye-dee-bee, hop-pah-cee-aye-dee-bee, ho-pa-sigh-dee-bee, or even ho-pack-ee-dee-bee???

There is a lot of bioinformatics software in this world. If you choose to add to this ever growing software catalog, then it will be in your interest to make your software easy to discover and easy to promote. For your own sake, and for the sake of any potential users of your software, I strongly urge you to ask yourself the following five questions:

Is the name memorable?
Does the name have one obvious pronunciation?
Could I easily spell the name out to a journalist over the phone?
Is the name of my database tool free from any needless mixed capitalization?
Have I considered whether my software name is based on such a tenuous acronym or initialism that it will probably end up receiving a JABBA award?
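
A few of these questions can be roughly screened by machine before a name goes anywhere near a manuscript. The Python sketch below is only an illustration, not an established tool: the function name `name_warnings`, the four-letter threshold (borrowed from the rule of thumb about initialisms in the earlier post), and the three checks themselves are all assumptions. It says nothing about memorability or JABBA-worthiness, which still need a human.

```python
VOWELS = set("aeiou")


def name_warnings(name: str) -> list[str]:
    """Return crude, heuristic warnings about a software name (hypothetical helper)."""
    warnings = []
    letters = [c for c in name if c.isalpha()]

    # Assumed rule of thumb: more than four letters and no vowels means the
    # name can only be spelled out one letter at a time.
    has_vowel = any(c.lower() in VOWELS for c in letters)
    if len(letters) > 4 and not has_vowel:
        warnings.append("more than four letters and no vowels")

    # Mixed capitalization: both cases present, and not simply a
    # capitalized word such as "Locate".
    has_upper = any(c.isupper() for c in letters)
    has_lower = any(c.islower() for c in letters)
    if has_upper and has_lower and not name.istitle():
        warnings.append("mixed capitalization")

    # Hyphens, slashes, and similar characters have to be spelled out
    # when reading the name over the phone.
    if any(not c.isalnum() for c in name):
        warnings.append("contains punctuation")

    return warnings


if __name__ == "__main__":
    for candidate in ["CMGRN", "iNuc-PseKNC", "rrnDB", "Locate"]:
        print(candidate, name_warnings(candidate))
```

A name that passes all three checks is not automatically a good name; the checks only catch the most mechanical problems (a vowel-free string of consonants, needless case-switching, and punctuation).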