101 questions with a bioinformatician #35: Aaron Darling

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

Aaron Darling is an Associate Professor at the ithree institute (where capital letters are in short supply?), which is part of UTS (University of Technology Sydney). His research focuses on developing computational and molecular techniques to characterize the hidden world of microbes. He helped develop the Mauve multiple genome alignment tool and continues to work on this and other software tools. Aaron also has a long-standing interest in poop.

Of course this interest is all part of an ongoing research project, one that is seeking to understand the development of the infant gut microbiome.

You can find out more about Aaron by visiting his lab's website, or by following him on Twitter (@koadman). And now, on to the 101 questions...

001. What's something that you enjoy about current bioinformatics research?

The growing interplay between informatics, molecular biology, and experimental design is very exciting. In the past 10 years many problems that could only have been solved through decades of experimental work have been transformed from experimental problems to data analysis problems. I think this trend will only accelerate as our technology to interface digital computational systems with biological systems continues to improve. And data analysis feeds back to inspire new experimental designs in a feedback loop that's getting ever-shorter. As an informatician I find it especially fun to discover new ways of designing the lab work that solves long-standing data analysis problems.

010. What's something that you don't enjoy about current bioinformatics research?

Data wrangling and data mangling. This is almost certainly a cliché by now, but inconsistently implemented file formats are the bane of bioinformatics. This was apparent to me within weeks of starting in the field, as my first assigned task was to write a sequence file format parsing library for the E. coli genome project team. I often wonder why I didn't run as fast as I could in the opposite direction.

011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Early on I benefited from a nugget of wisdom in Dan Gusfield's sequence analysis book which emphasized the importance of solving biological data analysis problems that are core to the biology, not the technology platform used to measure the biology. For example the general sequence alignment problem vs. short read alignment. Those are the contributions that are going to matter over the long term. I wish I had also appreciated early on that the elegance and simplicity of the solution, and especially the code implementing it, matters just as much.

100. What's your all-time favorite piece of bioinformatics software, and why?

Probably BEAST, because I learned so much about phylogenetic models, MCMC, and software design from using it and coding up modules for it.

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

H, because as a teenager I always wanted to be a G but in reality was everything but.
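
For readers who don't have the IUPAC ambiguity codes memorized, the short sketch below spells out why H fits that answer. The table covers only the DNA letters (U and the gap symbols are left out), and the helper names `bases_for` and `excluded_bases` are made up for this illustration rather than taken from any library.

```python
# IUPAC nucleotide codes for DNA, mapped to the bases each one can stand for.
# U (uracil) and the gap symbols are omitted to keep the table short.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def bases_for(code):
    """Return the set of bases that a single IUPAC code can represent."""
    return set(IUPAC_CODES[code.upper()])


def excluded_bases(code):
    """Return the bases that a single IUPAC code can never represent."""
    return set("ACGT") - bases_for(code)


print(sorted(bases_for("H")))       # the bases H may stand for
print(sorted(excluded_bases("H")))  # the one base H rules out
```

In other words, H means "A, C, or T": anything except G.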