BioDocker and BioBoxes: the containerization of bioinformatics

Thanks to a post on the BioCode's Notes blog I have discovered that there is a project called BioDocker which aims to generate lots of Docker containers to help make bioinformatics more reproducible by standardizing how bioinformatics software is packaged. From the BioDocker website:

The main purpose of this project is to spread the use of Docker on the Bioinformatics and Computational Biology areas. By using pre-configured containers with different bioinformatic softwares some critical aspects of Bioinformatics like reproducibility are minimized. Here you will find a list of containers with different bioinformatics software and how to use it.

BioDocker was created by Felipe da Veiga Leprevost in 2014, and the associated GitHub repository currently has a dozen or so containers.

When I first read about BioDocker I was confused, because I know that there is also the Bioboxes project which aims to er…make bioinformatics more reproducible by standardizing how bioinformatics software is packaged. From the Bioboxes manifesto:

Software has proliferated in bioinformatics and so have the problems associated with it: missing or unobtainable code, difficult to install dependencies, unreproducible workflows, all with terrible user experiences. We believe a community standard, using software containers, has the opportunity to solve these problems and increase the standard of scientific software as a whole.

I think the aims of these two projects are similar but not identical, and Bioboxes probably has the broader remit. Both projects are aware of each other and it looks like they have had some productive exchanges.

All of this makes me feel that the bioinformatics community seems to be slowly, but steadily, embracing Docker. Any approaches to standardize how we do bioinformatics should be welcomed, but some of us with long memories will recall that we have been in this situation before. Anyone remember the promises of how CORBA and then SOAP were going to increase interoperability in bioinformatics?

The name of this bioinformatics tool merits close inspection

  1. Bogus bioinformatics acronyms = mildly annoying
  2. Names that clash with previously published tools = mildly annoying
  3. Bogus bioinformatics acronyms that clash with previously published tools = very annoying

Step forward a new paper published in the journal Bioinformatics:

How does INSPEcT derive its name?

  • INSPEcT (INference of Synthesis, Processing and dEgradation rates in Time-course analysis)
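For anyone who wants to audit an acronym without squinting, here is a minimal Python sketch that lists which letters an expansion capitalizes and which words get skipped. The helper names (acronym_sources, skipped_words) and the small stopword list are my own inventions for illustration; nothing here comes from the INSPEcT paper beyond the expansion string itself.

```python
import re

# Expansion exactly as quoted above.
EXPANSION = ("INference of Synthesis, Processing and dEgradation "
             "rates in Time-course analysis")


def acronym_sources(expansion):
    """Return (word, letter, position) for every uppercase letter."""
    sources = []
    for word in re.findall(r"[A-Za-z]+", expansion):
        for pos, ch in enumerate(word):
            if ch.isupper():
                sources.append((word, ch, pos))
    return sources


def skipped_words(expansion, stopwords=("of", "and", "in")):
    """Return content words that contribute no uppercase letter.

    The stopword list is an assumption: small joining words are
    routinely ignored when building acronyms, so they are not counted.
    """
    words = re.findall(r"[A-Za-z]+", expansion)
    return [w for w in words
            if w.lower() not in stopwords
            and not any(c.isupper() for c in w)]


sources = acronym_sources(EXPANSION)

# Letters that the expansion itself marks as contributing to the name.
capitals = "".join(ch for _, ch, _ in sources)

# Letters borrowed from somewhere other than the start of a word.
non_initial = [(word, ch) for word, ch, pos in sources if pos > 0]

print(capitals)
print(non_initial)
print(skipped_words(EXPANSION))
```

Note that the sketch splits 'Time-course' into two words, and that it only looks at the capitals in the expansion, so the lowercase 'c' in the tool's name is not accounted for either way.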

Inclusion of the 'E' from 'degradation' and omission of 'R', 'C', or 'A' (from 'Rates', 'Course', and 'Analysis') earns this tool a JABBA award. It also earns a 'Duplications' award because of:

Bioinformatics is just like bench science and should be treated as such

A great post by Richard Edwards on his Cabbages of Doom blog, which includes a list of 8 shocking ways that bioinformatics is just like bench science. Highly recommended reading. His conclusion bears repeating here:

Bioinformatics is science. Full stop. It is no better than other science. It is no worse than other science. People do it right. People do it wrong.

Awkward Bioinformatics Conversations #1

Image from flickr user hades2k

Bob: Hi Sue, it's Bob. Got a favor to ask. Could you load up the UCSC Genome Browser site in your web browser please?

Sue: Hi Bob. So just to clarify…do you want me to load the UCSC Genome Browser homepage, or the UCSC Genome Browser website Genome Browser tool?

Bob: Wait, what?

Sue: UCSC Genome Browser is both the name of the website — as identified in their HTML metadata — and also the name of a tool on that website.

Bob: Er, just go to the main website first.

Sue: Done.

Bob: So it says that it's the UCSC Genome Browser website?

Sue: Yes. And no.

Bob: Huh?

Sue: The homepage identifies itself as the UCSC Genome Bioinformatics site but it also welcomes you to the UCSC Genome Browser website.

Bob: Okay, that sounds like you're looking at the right page then. So can you now please click on the Genome Browser link at the top of the page?

Sue: Which one?

Bob: What? Er, the one in the toolbar I guess.

Sue: Which one?

Bob: I just told you which one.

Sue: No, which toolbar?

Bob: There's more than one?

Sue: There's the horizontal toolbar that mostly contains dropdown menus with links that expose most of the site's functionality…and then there's the vertical toolbar which mostly offers links to items that also exist in sections of the horizontal toolbar.

Bob: But surely there's only one toolbar link which says 'Genome Browser'?

Sue: No, there are two.

Bob: But they go to the same place, right?

Sue: No. The horizontal toolbar link for 'Genome Browser' takes you into the Genome Browser tool with data loaded for the human genome assembly. The vertical toolbar link for 'Genome Browser' takes you to an intermediate page that lets you access the 'Human Genome Browser gateway'. Which one do you want? Bob? Hello???