An automated attempt to identify duplicated software names

September 22, 2015 by Keith Bradnam

From time to time I've been pointing out instances of duplicated software names in bioinformatics. I assume that many people reuse the name of an existing tool simply because they haven't first checked — or checked thoroughly — to see if someone else has already published a piece of software with the same name.

I am not alone in my concern over this issue. Neil Saunders (@neilfws on Twitter) has gone one step further and written some code to try to track down instances of duplicated names. He recently wrote a post on his What You're Doing Is Rather Desperate blog to explain more:

Searching for duplicate resource names in PMC article titles

In the article, he describes how he used a Ruby script to parse the titles of articles downloaded from PubMed Central and then fed the results into an R script to identify duplicated software names. The result is a long list of duplicates (available on GitHub).
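To give a rough idea of what this kind of check involves, here is a minimal Python sketch. It is not Neil's code (his pipeline uses Ruby and R), and it rests on an assumption of mine: that a tool name is whatever short token appears before the first colon in an article title. The names `NAME_PATTERN`, `extract_name`, and `find_duplicate_names` are hypothetical, as are the two 'FAST' titles used at the end.

```python
import re
from collections import defaultdict

# Assumption: a tool name is a short token (2-20 characters) at the start
# of a title, followed by a colon and some descriptive text.
NAME_PATTERN = re.compile(r"^\s*([A-Za-z][\w.+-]{1,19})\s*:\s+\S")


def extract_name(title):
    """Return the candidate tool name from a title, or None if no match."""
    match = NAME_PATTERN.match(title)
    return match.group(1) if match else None


def find_duplicate_names(titles):
    """Map each upper-cased name to its titles, keeping names seen in 2+ distinct titles."""
    by_name = defaultdict(set)
    for title in titles:
        name = extract_name(title)
        if name:
            # Compare case-insensitively so 'Fast' and 'FAST' are grouped.
            by_name[name.upper()].add(title.strip())
    return {name: sorted(found) for name, found in by_name.items() if len(found) > 1}


# Made-up titles for illustration; only the ATGme title is a real one.
example_titles = [
    "FAST: a made-up tool for one task",
    "FAST: another made-up tool for a different task",
    "ATGme: Open-source web application for rare codon identification "
    "and custom DNA sequence optimization",
    "A title that does not name any tool",
]

for name, found in find_duplicate_names(example_titles).items():
    print(name, len(found))
```

A heuristic like this will miss tools that are not named before a colon, and it will flag ordinary words that happen to sit in that position, so any hits still need checking by hand.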

I'm not surprised to learn that generic names like 'FAST' and 'PAIR' appear on this list. However, I was surprised to see that in the same year (2011), two independent publications both decided to name their software 'COMBREX':

BUSCO — the tool that will hopefully replace CEGMA — now has a plant-specific dataset

September 21, 2015 by Keith Bradnam

With the demise of CEGMA I have previously pointed people towards BUSCO. This tool replicates most of what CEGMA did but seems to be much faster and has fewer dependencies. Most importantly, it is also based on a much more up-to-date set of orthologous genes (OrthoDB) compared to the aging KOGs database that CEGMA used.

The full publication of BUSCO appeared today in the journal Bioinformatics. I still haven't tried using the tool, but one criticism that I have seen from others is that there are no plant-specific datasets of conserved genes to use with BUSCO. The developers appear to be aware of this, because the BUSCO website now indicates that a plant dataset is available (though you have to request it).

Not to be confused with this website… →

September 21, 2015 by Keith Bradnam

I did a double-take when I first saw the title of this paper:

ATGme: Open-source web application for rare codon identification and custom DNA sequence optimization

Anatomy of a mainstream science piece →

September 20, 2015 by Keith Bradnam

A great blog post by Ewan Birney that describes the process of writing a commentary piece for the Guardian newspaper, and which also discusses the need for more involvement of scientists in the public discussion of science. I like the concluding remarks:

As practicing scientists, we need to continue laying the groundwork started long ago by many others…engaging consistently and non-judgmentally with our communities and policymakers about our work. There is a real task ahead of us in providing an accessible way for people to digest this information. We should take every opportunity to communicate on every level, from the most basic to state of the art. Only then can society really use the hard-earned information gleaned from genetics appropriately, and for the greater good.