101 questions with a bioinformatician #32: Aaron Quinlan

August 27, 2015 by Keith Bradnam

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

Aaron Quinlan is an Associate Professor of Human Genetics and Biomedical Informatics at the University of Utah and the Associate Director of the USTAR Center for Genetic Discovery.

His research focuses on "developing and applying computational methods towards the understanding of genetic variation in diverse contexts". This work has led to Aaron's involvement in the development of many popular bioinformatics tools, with Bedtools being one of the most well known. I wish he had time to blog more, because then we could all enjoy more writing like this:

Have you ever been incensed by the ridiculous number of chromosome naming and ordering schemes that exist in genomics? If the answer is “no”, then either you are an incredibly patient person, you enjoy unnecessary chaos, or you just haven’t done any detailed analysis of genomics datasets.

You can find out more about Aaron by visiting his lab's website, or by following him on Twitter (@aaronquinlan). And now, on to the 101 questions...

001. What's something that you enjoy about current bioinformatics research?

I come from a creative family and have always enjoyed building things. There is pure joy in having the power to conceive and apply an algorithmic idea that has the potential to improve our understanding of the biology of the genome and the genetic basis of disease.

010. What's something that you don't enjoy about current bioinformatics research?

The fashion.

011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Take every math and statistics course possible and read constantly while you still have the time.

100. What's your all-time favorite piece of bioinformatics software, and why?

Without question, PolyBayes (Marth et al., 1999). I came to computational biology as a former software engineer without substantial training in biology. PolyBayes was the first Bayesian method for polymorphism detection and was written by my Ph.D. mentor, Gabor Marth. I spent much of my first year in graduate school dissecting the PolyBayes code (and the ACE file format!!!) to understand the mathematical and data analysis strategies that were required at the time. That learning process has influenced much of the work I have done since.

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

N, since I constantly feel as though I am doing everything while also doing nothing.

DVD bonus materials

KRB: Because of the relative brevity of this interview, I thought that I would also share a couple of answers that Aaron gave me to some of the questions I also include when asking people to do these interviews (this info sometimes helps me write my introductions):

0111. What is the correct way of describing your current position or title(s)?

Associate Professor of Human Genetics and Biomedical Informatics
Associate Director of the USTAR Center for Genetic Discovery
Sender of the emails and bringer of the donuts.

1001. In 1–2 sentences, describe what your role entails

Basically doing everything I can to not be a bottleneck for the people in my lab.

The Road to Hell is Paved with Bioinformatics Formats →

August 27, 2015 by Keith Bradnam

Another great post by Keith Robison:

If you really want to raise a bioinformaticist's blood pressure, loudly declare your new tool generates output in brand new data formats.

BioDocker and BioBoxes: the containerization of bioinformatics

August 26, 2015 by Keith Bradnam

Thanks to a post on the BioCode's Notes blog I have discovered that there is a project called BioDocker which aims to generate lots of Docker containers to help make bioinformatics more reproducible by standardizing how bioinformatics software is packaged. From the BioDocker website:

The main purpose of this project is to spread the use of Docker on the Bioinformatics and Computational Biology areas. By using pre-configured containers with different bioinformatic softwares some critical aspects of Bioinformatics like reproducibility are minimized. Here you will find a list of containers with different bioinformatics software and how to use it.

BioDocker was created by Felipe da Veiga Leprevost in 2014, and the associated GitHub repository currently has a dozen or so containers.

When I first read about BioDocker I was confused because I know that there is also the Bioboxes project which aims to er…make bioinformatics more reproducible by standardizing how bioinformatics software is packaged. From the Bioboxes manifesto:

Software has proliferated in bioinformatics and so have the problems associated with it: missing or unobtainable code, difficult to install dependencies, unreproducible workflows, all with terrible user experiences. We believe a community standard, using software containers, has the opportunity to solve these problems and increase the standard of scientific software as a whole.

I think the aims of these two projects are similar but not identical, and Bioboxes probably has a broader remit. Both projects are aware of each other and it looks like they have had some productive exchanges.

All of this makes me feel that the bioinformatics community seems to be slowly, but steadily, embracing Docker. Any approaches to standardize how we do bioinformatics should be welcomed, but some of us with long memories will recall that we have been in this situation before. Anyone remember the promises of how CORBA and then SOAP were going to increase interoperability in bioinformatics?

The name of this bioinformatics tool merits close inspection

August 25, 2015 by Keith Bradnam

Bogus bioinformatics acronyms = mildly annoying
Names that clash with previously published tools = mildly annoying
Bogus bioinformatics acronyms that clash with previously published tools = very annoying

Step forward a new paper published in the journal Bioinformatics:

INSPEcT: a computational tool to infer mRNA synthesis, processing and degradation dynamics from RNA- and 4sU-seq time course experiments

How does INSPEcT derive its name?

INSPEcT (INference of Synthesis, Processing and dEgradation rates in Time-course analysis)

Inclusion of the 'E' from 'degradation' and omission of 'R', 'C', or 'A' (from 'Rates', 'Course', and 'Analysis') earns this tool a JABBA award. It also earns a 'Duplications' award because of:

Bioinformatics is just like bench science and should be treated as such →

August 24, 2015 by Keith Bradnam

A great post by Richard Edwards on his Cabbages of Doom blog, which includes a list of 8 shocking ways that bioinformatics is just like bench science. Highly recommended reading. His conclusion bears repeating here:

Bioinformatics is science. Full stop. It is no better than other science. It is no worse than other science. People do it right. People do it wrong.

Awkward Bioinformatics Conversations #1

August 21, 2015 by Keith Bradnam

Bob: Hi Sue, it's Bob. Got a favor to ask. Could you load up the UCSC Genome Browser site in your web browser please?

Sue: Hi Bob. So just to clarify…do you want me to load the UCSC Genome Browser homepage, or the UCSC Genome Browser website Genome Browser tool?

Bob: Wait, what?

Sue: UCSC Genome Browser is both the name of the website — as identified in their HTML metadata — and also the name of a tool on that website.

Bob: Er, just go to the main website first.

Sue: Done.

Bob: So it says that it's the UCSC Genome Browser website?

Sue: Yes. And no.

Bob: Huh?

Sue: The homepage identifies itself as the UCSC Genome Bioinformatics site but it also welcomes you to the UCSC Genome Browser website.

Bob: Okay, that sounds like you're looking at the right page then. So can you now please click on the Genome Browser link at the top of the page?

Sue: Which one?

Bob: What? Er, the one in the toolbar I guess.

Sue: Which one?

Bob: I just told you which one.

Sue: No, which toolbar?

Bob: There's more than one?

Sue: There's the horizontal toolbar that mostly contains dropdown menus with links that expose most of the site's functionality…and then there's the vertical toolbar which mostly offers links to items that also exist in sections of the horizontal toolbar.

Bob: But surely there's only one toolbar link which says 'Genome Browser'?

Sue: No, there are two.

Bob: But they go to the same place, right?

Sue: No. The horizontal toolbar link for 'Genome Browser' takes you into the Genome Browser tool with data loaded for the human genome assembly. The vertical toolbar link for 'Genome Browser' takes you to an intermediate page that lets you access the 'Human Genome Browser gateway'. Which one do you want? Bob? Hello???

Shining a light on more bogus bioinformatics acronyms

August 20, 2015 by Keith Bradnam

Courtesy of an anonymous tip-off…

There is a new bioinformatics tool that was described in a recently published BMC Genomics article. Here is the full name of the tool with any capitalization removed:

automated tool for bacterial genome annotation comparison

So can you guess what acronym/name was extracted from this description?

ATBGAC?
AutoBGA?
BGAC?

No. The JABBA-award-winning name of this tool is as follows:

BEACON: automated tool for Bacterial gEnome Annotation ComparisON

This name really isn't helped by the fact that it is shown as follows in the journal article title (with the G of 'Genome' also capitalized):

BEACON: automated tool for Bacterial GEnome Annotation ComparisON

101 questions with a bioinformatician #31: Morgan Taschuk

August 19, 2015 by Keith Bradnam

Morgan Taschuk is a Senior Manager for Genome Sequence Informatics at the Ontario Institute for Cancer Research (OICR). She manages the production sequence analysis team, which analyses all of the sequence data generated at OICR, resulting in the generation of alignment files, variant calls, QC metrics and other bountiful amounts of sequence data for OICR researchers and collaborators.

She recently wrote a great blog post regarding the (sometimes contentious) issue of Biologists vs Bioinformaticians. Definitely worth a read. Morgan has also recently started to assemble a Twitter list of Women in Bioinformatics, now up to 179 members. I'm sure she would like to make that list even longer, so please let her know of any omissions.

You can find out more about Morgan by visiting her Modern Model Organism blog, or by following her on Twitter (@morgantaschuk). And now, on to the 101 questions...

001. What's something that you enjoy about current bioinformatics research?

There's always something more to learn. I'm spending a lot of time with our genomics lab recently and learning about how lab processes impact our data fascinates me. Bioinformatics skills are usually in demand so I also get to work with a wide variety of people with different questions and problems and have to stretch my brain to apply myself.

010. What's something that you don't enjoy about current bioinformatics research?

Often people write their own scripts or software instead of looking for something that already exists out there. Not only is it wasted effort for very similar results, it sabotages any attempt to standardize across the field. Open-source software is there for everyone to change and improve. Why not build on a foundation instead of digging the hole yourself?

011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Since nobody can tell you what bioinformatics is, it's up to you to define it. I spent a long time fighting with imposter syndrome, not just because I felt inadequate but also because I was called a bioinformatician when I didn't fit the classical model. Nobody fits the classical model these days. Thinking about this question actually inspired me to write a blog post about the difference between bioinformaticians and computational biologists. Judging from the feedback on Twitter and the blog, the problem of defining what a bioinformatician is still really sticks in people's throats.

100. What's your all-time favorite piece of bioinformatics software, and why?

SAMtools. It's an amazing piece of very stable, utilitarian, open source code that forms the backbone of most sequencing pipelines.

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

I struggled the most with this question! Y, because 'pyrimidine' is a pretty word and so Y not.

~crickets~

The names of bioinformatics tools that help study evolution shouldn't feel that they also have to evolve

August 18, 2015 by Keith Bradnam

Thanks to Torsten Seemann for bringing this to my attention…

In 2003 a bioinformatics tool was published. A tool with a thoroughly sensible name and acronym:

BIBI: a Bioinformatics Bacterial Identification Tool

A simple name with a simple, and not-too-bogus, initialism. Bravo. However, a subsequent update to BIBI brought about a change to the name:

^le BIBI:

Where the 'le' refers to 'light edition'. It should be said that most references to this tool drop the superscript notation for 'le'. Let's move forward to the present day and the publication of another version of this tool:

leBIBI ^QBPP: a set of databases and a webtool for automatic phylogenetic analysis of prokaryotic sequences

The full expansion of this new name is as follows:

Light Edition Bioinformatics Bacterial Identification Tool Quick Bioinformatic Phylogeny of Prokaryotes

Quite a mouthful! Bonus points for including 'Bioinformatics' and 'Bioinformatic' as part of the same name, as well as the largely redundant inclusion of 'Bacterial' as well as 'Prokaryotes'.

Generally I find the use of superscript in software names to be largely unnecessary. It can make the tool name harder to read and it is unlikely to be reproduced verbatim by others who mention your software. Starting your software name with a lowercase letter also means that it might appear in uppercase if used to start a sentence (as happens several times in the above paper). Not a terrible problem but it reduces the strength and consistency of your 'bioinformatics brand'.

New JABBA award for Just Another Bogus Bioinformatics Acronym

August 13, 2015 by Keith Bradnam

Here's a new tool that was described recently in the journal Bioinformatics:

GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach

Let's see how the letters in 'GREGOR' are derived:

G = Genomic — a solid start
R = Regulatory — fine
E = Elements — all good so far
'and' — okay, we'll allow a conjunction or two
G = Gwas — hmm, including an acronym/initialism inside another acronym is rarely a good idea
O = Overlap — that's fine, time for the big finish…
R = algoRithm — oh come on!

Congratulations GREGOR — or should I say GREGOA? — you win a JABBA award!
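
For anyone who wants to run the same letter-by-letter inspection on their own tool name, here is a minimal Python sketch. It is not an official JABBA test: the function names `check_acronym` and `is_bogus` are made up for illustration, and the sketch assumes exactly one acronym letter per word once conjunctions such as 'and' are skipped, which happens to fit GREGOR but not every acronym.

```python
def check_acronym(acronym, phrase, stop_words=("and",)):
    """Pair each acronym letter with one word of the expanded name.

    Assumption (illustrative only): one letter per word, after dropping
    the words listed in stop_words. Returns (letter, word, position)
    tuples, where position is the index of the letter within the word
    (0 = initial letter, -1 = letter not found in the word).
    """
    words = [w for w in phrase.split() if w.lower() not in stop_words]
    if len(words) != len(acronym):
        raise ValueError("expected exactly one word per acronym letter")
    report = []
    for letter, word in zip(acronym.upper(), words):
        report.append((letter, word, word.upper().find(letter)))
    return report


def is_bogus(report):
    """True if any acronym letter is not the first letter of its word."""
    return any(position != 0 for _, _, position in report)


# The expansion below is assembled from the letter list in the post.
gregor = check_acronym(
    "GREGOR", "Genomic Regulatory Elements and Gwas Overlap algoRithm"
)
for letter, word, position in gregor:
    verdict = "fine" if position == 0 else "oh come on!"
    print(f"{letter} = {word} (index {position}): {verdict}")
print("JABBA candidate:", is_bogus(gregor))
```

The one-letter-per-word rule keeps the sketch short. Names such as INSPEcT or BEACON, which take several letters from a single word, would need a more flexible matching step.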