101 questions with a bioinformatician #23: Todd Harris

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

Todd Harris is a Bioinformatics Consultant and Project Manager at WormBase. I first came to know Todd when I was also working on the WormBase project. As part of the UK operation (based at the Sanger Institute), we would frequently refer to him as 'SuperTodd' for his amazing skills at single-handedly keeping the WormBase website updated and working smoothly.

In looking after the public face of WormBase, Todd has also had to deal with all of the problems that accompany an ever-expanding database. We can both remember the time when WormBase contained data from just one species (Caenorhabditis elegans). Nowadays, Todd has to manage genome data from over 25 different nematode species!

Away from WormBase, Todd has been hitting it out of the park with some of his recent blog posts, and I particularly encourage everyone interested in bioinformatics training to read his recent post, 'It’s time to reboot bioinformatics education'.

You can find out more about Todd by visiting his blog, or by following him on Twitter (@tharris). And now, on to the 101 questions...

001. What's something that you enjoy about current bioinformatics research?

Spending a lot of time on the services and architecture side of bioinformatics, I’m most excited about the democratizing influence and new efficiencies of the cloud.

Bioinformatics projects are often reliant on glacial and, dare I say, surly IT departments. Institutional regulations can slow development and hinder data sharing. The cloud dispenses with these limitations. It has leveled the playing field for contributions from the smallest, primarily-teaching, undergraduate institution to the most remote agricultural school. It's also opened the door for lean startups in the bioinformatics space.

Now you can script your entire architecture and commit it to version control alongside the rest of your code. When you place your data in the cloud, other bioinformaticians can quickly verify your results without lengthy downloads, and without problems caused by differences in architectures or installed software stacks. It’s the ultimate in transparency and reproducibility.

010. What's something that you don't enjoy about current bioinformatics research?

My central complaint about bioinformatics research is that it regularly isn’t conducted with the same scientific rigor expected of wet lab projects. Oftentimes it isn’t hypothesis-driven. Results can be difficult to replicate because of ephemeral data sets, obscure file formats, opaque and inconsistent file names, and insufficient documentation. Many bioinformaticians have never worked at the bench and don’t seem to understand the value of a good lab notebook.

011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Learn the most current visualization tool. Pictures really are worth a thousand words when conveying complex ideas. Your chosen platform will eventually lose favor and fade into obscurity, but learning to think visually will translate to any new language or tool. Also: pay attention to the ergonomics of your workspace. It's good to keep your eyesight, shoulders, arms, and wrists in operable condition for many years.

100. What's your all-time favorite piece of bioinformatics software, and why?

BioPerl. It does everything you need and many things you don't! Oh, and GBrowse, which set a good early example for thorough documentation, customizability and extensibility.

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

I’m more of a motif person. TWANG has a good ring to it.
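
For anyone who doesn't read IUPAC fluently: W stands for A or T, and N stands for any base, so TWANG is a five-base motif with two ambiguous positions. Here is a minimal Python sketch of how such a motif could be expanded into a regular expression and searched for in a sequence. The function names (motif_to_regex, find_motif) and the example sequence are my own inventions for illustration, not anything from Todd or WormBase.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def motif_to_regex(motif):
    """Expand an IUPAC motif into a regex string (hypothetical helper)."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        # Unambiguous codes stay as a literal; ambiguous ones become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(motif, sequence):
    """Return (0-based start, matched text) for every hit, including overlapping ones."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(motif_to_regex("TWANG"))
    # Made-up sequence, purely for illustration.
    print(find_motif("TWANG", "CCTTACGGTAATG"))
```

This only searches the forward strand; a real motif scan would usually check the reverse complement as well.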