101 questions with a bioinformatician #27: Michael Barton

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

Michael Barton is a Bioinformatics Systems Analyst at the Joint Genome Institute (that makes him a JGI BSA?). His work involves developing automated methods for quality-checking sequencing data and evaluating new bioinformatics software. He may introduce himself as Michael, but as his Twitter handle suggests, he is really Mr. Bioinformatics.

His nucleotid.es website is doing amazing things in the field of genome assembly by using Docker containers to try to parcel up genome assembly pipelines. This is enabling the 'continuous, objective and reproducible evaluation of genome assemblers'. Related to this is the bioboxes project — a great name by the way — which may just succeed in revolutionizing how bioinformatics is done. From the bioboxes manifesto:

Software has proliferated in bioinformatics and so have the problems associated with it: missing or unobtainable code, difficult to install dependencies, unreproducible workflows, all with terrible user experiences. We believe a community standard, using software containers, has the opportunity to solve these problems and increase the standard of scientific software as a whole.

You can find out more about Michael by visiting his Bioinformatics Zen blog or by following him on Twitter (@bioinformatics). And now, on to the 101 questions...

001. What's something that you enjoy about current bioinformatics research?

I like that I get to work on interesting problems from time to time. This may seem a trite and easy answer to give. However, after university I worked in some jobs that I disliked intensely, and I wondered where exactly my life had gone wrong to end up in that position. Luckily I was able to do a Masters degree, get into the field of bioinformatics, and change my career. This job can still be tedious at times, like any job, but overall I am always grateful to work on projects that people find interesting and useful, instead of answering phones in a call centre where someone may start yelling at me.

010. What's something that you don't enjoy about current bioinformatics research?

Getting different tools to work together. Take three different bioinformatics tools that should work together as a pipeline, then actually try implementing that pipeline. Then try running it on 10 different datasets from 10 different publications.

As bioinformaticians, we're our own worst enemies when it comes to agreeing on community standards that would let us focus on doing research instead of writing glue code. My opinion is that this won't change until we move away from publications as the metric for an academic's value, and instead start valuing other types of contributions.

011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Work can't be everything. I write this because you can work very hard, get everything right and still fail. At least that's my opinion of academia today, and I think that's because of how little funding there is. In addition, here is some concrete advice that I wish I had known earlier:

  • Learn a functional programming language well enough to write something non-trivial. Functional programming stimulates you to think about solving problems differently, in ways that are often more succinct and robust. Try Haskell or Clojure.
  • Learn linear algebra. Many analyses can be reduced to linear algebra operations, and much biological data can be expressed as matrices. Combining the two can help you simplify problems and reason about them.
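
To make the linear algebra point a little more concrete, here is a minimal sketch in plain Python. It is only an illustration: the genomes, gene names, and helper functions are all made up, and a real analysis would more likely use a numerical library. The idea is that a gene presence/absence table is just a matrix, and multiplying it by its own transpose counts the genes shared by every pair of genomes in one operation.

```python
# Hypothetical presence/absence data: rows are genomes, columns are genes.
# 1 means the gene is present in that genome, 0 means it is absent.
genomes = ["genome1", "genome2", "genome3"]
genes = ["geneA", "geneB", "geneC", "geneD"]
presence = [
    [1, 1, 0, 1],  # genome1
    [1, 0, 0, 1],  # genome2
    [0, 1, 1, 0],  # genome3
]


def transpose(matrix):
    """Swap the rows and columns of a matrix stored as a list of lists."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply two matrices stored as lists of lists."""
    b_columns = transpose(b)
    return [
        [sum(x * y for x, y in zip(row, column)) for column in b_columns]
        for row in a
    ]


# P times its transpose: entry (i, j) is the number of genes that
# genome i and genome j have in common; the diagonal holds the number
# of genes present in each genome.
shared = matmul(presence, transpose(presence))

for name, row in zip(genomes, shared):
    print(name, row)
```

The pairwise comparison that might otherwise be written as nested loops over genomes and genes collapses into a single matrix product, which is the kind of simplification the advice above is pointing at.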

100. What's your all-time favorite piece of bioinformatics software, and why?

Anything written by Hadley Wickham. R is the de facto analysis language for bioinformatics, and Hadley's libraries make R much more enjoyable to use. I highly recommend ggplot2 and dplyr.

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

T — because I drink a couple of pots of tea each day at work, usually Redbush.