An automated attempt to identify duplicated software names

From time to time I've pointed out instances of duplicated software names in bioinformatics. I assume that many people reuse the name of an existing tool simply because they haven't first checked (or checked thoroughly enough) to see whether someone else has already published a piece of software with the same name.

I am not alone in my concern over this issue. Neil Saunders (@neilfws on Twitter) has gone one step further and written some code to track down instances of duplicated names. He recently wrote a post on his What You're Doing Is Rather Desperate blog to explain more:

In the article, he describes how he used a Ruby script to parse the titles of articles downloaded from PubMed Central and then fed the output into an R script to identify duplicated software names. The result is a long list of duplicates (available on GitHub).
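To give a feel for the general idea, here is a minimal Python sketch of my own. It is not Neil's Ruby or R code, and it rests on a loud assumption: that a software name is the single word before the first colon in an article title (as in 'NAME: a tool for something'). The function names `candidate_name` and `duplicated_names` are hypothetical, as is the `(article_id, title)` input format.

```python
import re
from collections import defaultdict

# Assumption: a software name is one token at the start of the title,
# followed by a colon and then some descriptive text.
NAME_PATTERN = re.compile(r"^\s*([A-Za-z][\w.+-]*)\s*:\s+\S")


def candidate_name(title):
    """Return the word before the first colon, or None if the title
    does not look like 'NAME: description'."""
    match = NAME_PATTERN.match(title)
    return match.group(1) if match else None


def duplicated_names(records):
    """Group article IDs by upper-cased candidate name and keep only the
    names that appear in more than one article.

    records: iterable of (article_id, title) pairs (hypothetical format).
    """
    seen = defaultdict(set)
    for article_id, title in records:
        name = candidate_name(title)
        if name:
            # Upper-casing treats 'Fast' and 'FAST' as the same name.
            seen[name.upper()].add(article_id)
    return {name: sorted(ids) for name, ids in seen.items() if len(ids) > 1}
```

A heuristic this simple will produce false positives, because a word before a colon is not always a software name, so any list it generates would still need checking by eye.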

I'm not surprised to learn that generic names like 'FAST' and 'PAIR' appear on this list. However, I was surprised to see that in the same year (2011), the authors of two independent publications both named their software 'COMBREX':