101 questions with a bioinformatician #36: Alicia Oshlack

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.


Alicia Oshlack is the Head of Bioinformatics at the Murdoch Childrens Research Institute (they don't like apostrophes) in Melbourne, Australia. Her research focuses on four main project areas: methods for analysing RNA-seq data, epigenomics, clinical genomics data analysis, and cancer genomics.

Before moving into the field of genomics, Alicia had a background in astronomy and her Ph.D. work concerned the structure of radio quasars. Not many bioinformaticians can claim to have published papers on the topic of estimating the mass of black holes!

You can find out more about Alicia by reading her Wikipedia page or by following her on Twitter (@AliciaOshlack). I also encourage you to check out her must-read article for fellow computational biologists: A 10-step guide to party conversation for bioinformaticians. And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

I love the pace at which things are changing in the field. There is always something new to work on and there are so many ways to contribute something useful to the research community. I also really love the balance between collaborative analysis on really interesting biological problems and doing careful methods development.



010. What's something that you don't enjoy about current bioinformatics research?

I get frustrated that I need to spend so much of my time convincing people that bioinformatics is a real scientific research discipline where we have deep scientific training and use our brains to solve scientific problems. Hopefully I will have convinced everyone in Australia soon.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

I did my PhD in astrophysics and I often wonder if I would have been better off doing a more relevant subject but I really appreciate the skills I learnt doing that. Within this I probably would tell myself to put a bit more focus on programming and do statistics instead of applied mathematics.



100. What's your all-time favorite piece of bioinformatics software, and why?

I think limma is amazing. Have you seen the User's Guide? I think it's 145 pages long, and although it was originally developed for microarray analysis more than 12 years ago, it has adapted to the sequencing revolution and is used more than ever now. I believe it is the most widely used Bioconductor analysis package ever.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

I think S = G/C because I'm always a little bit biased.

Beautiful logo redesign as part of the rebranding of Crossref

Crossref — the non-profit organization that helps make academic content easier to find, link, cite and assess — has today announced a rebranding. They will be announcing new names and new logos for all of their products, and the Crossref logo itself gains a beautiful looking new design. So we say 'goodbye' to this:

 

And 'hello' to this lovely logo:

 

The explanation for why they wanted to change the logo makes a lot of sense to me:

We needed an icon to give more flexibility across the web that a word mark cannot do alone. The icon is made up of two interlinked angle brackets familiar to those who work with metadata, and can also act as arrows depicting Metadata In and Metadata Out, two themes under which our services can generally be grouped.

As part of this rebranding, they are formalizing a change from CrossRef to Crossref (with lower case 'R'). Someone had a fun job updating their Wikipedia page:

Wikipedia edit history: CrossRef > Crossref.

Assemble a genome and evaluate the result [Link]

There is a new page on the bioboxes site (such a great name!) which details how bioboxes can be used to assemble a genome and then evaluate the results:

A common task in genomics is to assemble a FASTQ file of reads into a genome assembly and followed by evaluating the quality of this assembly. This recipe will explore using bioboxes to do this task.

A third Assemblathon contest came very close to launching earlier this year (except that it didn't; maybe this will be the subject of a future blog post!), and we had planned to make biobox containers a requisite part of submitting assemblies. If Assemblathon 3 ever gets off the ground, I will feel happier knowing that the bioboxes team is doing so much great work that will make running such a contest easier.

Time to toggle the JABBA-award status of this bioinformatics software name

Give me a B.
Give me an O.
Give me a G.
Give me a U.
Give me an S.

What have you got?

Another BOGUS bioinformatics acronym! This time it is courtesy of the journal BMC Bioinformatics:

I think you can already see why this one is going to win a JABBA award. The name 'TOGGLE' derives from TOolbox for Generic nGs anaLysEs. Using the same strategy, they could also have gone for BOGGLE, BONNY, or even BORINGLY.
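The joke works because TOGGLE's letters are not word initials: they are picked, in order, from anywhere in the full name. The short Python sketch below checks whether a candidate acronym can be built that way. The `can_spell` helper is my own illustration, not something from the paper; it simply tests whether the acronym is a subsequence of the full name's letters.

```python
def can_spell(acronym: str, full_name: str) -> bool:
    """Return True if every letter of `acronym` appears in `full_name`
    in the same order (case-insensitive), i.e. the acronym is a
    subsequence of the full name."""
    letters = iter(full_name.lower())
    # `ch in letters` consumes the iterator up to and including the first
    # match, so each subsequent letter must be found further along the name.
    return all(ch in letters for ch in acronym.lower())


FULL_NAME = "TOolbox for Generic nGs anaLysEs"

for candidate in ["TOGGLE", "BOGGLE", "BONNY", "BORINGLY"]:
    # Each line prints the candidate followed by True.
    print(candidate, can_spell(candidate, FULL_NAME))
```

By contrast, something like `can_spell("GOT", FULL_NAME)` returns False, because no 'o' appears after the first 'g'.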

How to ask for bioinformatics help online

Part two of a two-part series.

In part one I covered where to ask for bioinformatics help. Now it is time to turn to the issue of how you should go about asking for help. Hat tip to reader Venu Thatikonda (@nerd_yie) for pointing me to this 2011 PLOS Computational Biology article that covers similar ground to this blog post. Here are my five main suggestions, with the last one broken down further into nine different tips:

  1. Be polite. Posting a question to an online forum does not mean that you deserve to be answered. If people do answer, they are usually doing so by giving up their own free time to try to help you. Don't berate people for their answers, or insult them in any way.
  2. Be relevant. Choose the right forum in which to ask your question. Sites like SEQanswers have different forums that discuss particular topics, so don't post your PacBio question in the Ion Torrent forum.
  3. Be aware of the rules. Most online forums will have some rules, guidelines, and/or an FAQ which covers general posting etiquette and other things that you should know. It is a good idea to check this before posting on a site for the first time.
  4. Be clever. Search the forum before asking your question; there is a good chance that your question has already been asked (and answered) by others.
  5. Be helpful. The biggest thing you can probably do to get a useful answer is to provide as many useful details as possible. These include:
    1. Type of operating system and version number, e.g. Mac OS X 10.10.5.
    2. Version number/name of software tool(s) you are using, e.g. NCBI BLAST+ v2.2.26, Perl v5.18.2 etc. A good bioinformatics or Unix tool will have a -v, -V, or --version command-line option that will give you this information.
    3. Any error message that you saw. Report the full error message exactly as it appeared.
    4. Where possible, provide steps that would let someone else reproduce the problem (assuming it is reproducible).
    5. Outline the steps that you have tried, if any, to fix the problem. Don't wait for someone to suggest 'quit and restart your terminal' before you reply 'Already tried that'.
    6. A description of what you were expecting to happen. Some perceived errors are not actually errors at all (the software was doing exactly what was asked of it, though this may not be what the user was expecting).
    7. Any other information that could help someone troubleshoot your problem, e.g. a listing of your Unix terminal before and/or after you ran a command which caused a problem.
    8. A snippet of your data that would allow others to reproduce the problem. You may not be able to upload data to the website in question, but small data snippets could be shared via a Dropbox or Google Drive link, or on sites like GitHub Gist.
    9. Attach a screenshot that illustrates the problem. Many forum sites allow you to add image files to a post.
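Details such as your operating system and software versions are easy to gather with a small script before you post. Here is a rough Python sketch. The helper names `tool_version` and `environment_report` and the example tool list are hypothetical. Because tools disagree about the flag (`-v`, `-V`, `--version`, or `-version` in the case of BLAST+), the flag is passed per tool rather than assumed.

```python
import platform
import shutil
import subprocess
import sys


def tool_version(tool: str, flag: str = "--version") -> str:
    """Run `tool flag` and return the first non-blank line it prints.

    Some tools print their version to stderr rather than stdout,
    so stderr is used when stdout is empty.
    """
    if shutil.which(tool) is None:
        return "not found on PATH"
    try:
        result = subprocess.run(
            [tool, flag], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired) as err:
        return f"could not run ({err})"
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "no version output"


def environment_report(tools: dict) -> str:
    """Build a plain-text block (OS, Python, tool versions) for a forum post.

    `tools` maps each tool name to the version flag it expects.
    """
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for tool, flag in tools.items():
        lines.append(f"{tool} ({flag}): {tool_version(tool, flag)}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical tool list; swap in whatever your own pipeline uses.
    print(environment_report({"blastn": "-version", "perl": "-v", "samtools": "--version"}))
```

Paste the printed block into your post as plain text. If a tool reports "not found on PATH", check that you ran the script in the same environment where the problem occurred.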

Any other suggestions?

 

Updates

2015-11-08 09.44: Added link to PLOS Computational Biology article

Gender ratio of speakers at today's Festival of Genomics California conference

The Festival of Genomics California conference starts today. From the speaker lineup I count 132 speakers, with a gender ratio of 72.7% men and 27.3% women. This is a good ratio compared to many (most?) genomics conferences (see Jonathan Eisen's many excellent posts on this subject), and it exceeds the background level of women in senior roles in genome institutes around the world (a figure I previously calculated as 23.6%).

However, because the ratio of women speakers was below my self-imposed target of 33.3%, I declined Front Line Genomics' kind offer of a speaking position and requested that they instead offer my slot to a woman.
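The percentages above can be reproduced from the speaker count. In this Python sketch the number of women (36) is my inference rather than a published figure: it is the only whole number of the 132 speakers that gives 27.3%. The same arithmetic shows how many women a lineup of that size would need to reach a one-third target (using 33.3% instead of exactly one third gives the same answer).

```python
import math

TOTAL_SPEAKERS = 132
WOMEN = 36  # inferred: 36/132 is the only whole-number split that rounds to 27.3%
MEN = TOTAL_SPEAKERS - WOMEN


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"men: {pct(MEN, TOTAL_SPEAKERS)}%, women: {pct(WOMEN, TOTAL_SPEAKERS)}%")

# Smallest number of women that meets a one-third target for a
# lineup of the same size (132 / 3 = 44 exactly).
needed = math.ceil(TOTAL_SPEAKERS / 3)
print(f"women needed for a one-third target: {needed} ({needed - WOMEN} more)")
```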

I think Front Line Genomics are ahead of many conference organizers in addressing gender bias, and I look forward to seeing the final lineup at their upcoming Festival of Genomics London conference.

This post is to serve as a reminder that we, as a community, still need to do much better at addressing gender bias in our field, and that men can actively help this process by refusing to speak or present at conferences which show extreme bias. Preferably, I would like others to adopt my 33.3% target as a minimum ratio that we should be aiming for (this applies both ways, though there doesn't seem to be much likelihood of men feeling underrepresented any time soon).

A timely call to overhaul how scientists publish supplementary material [Link]

Great new editorial piece in BMC Bioinformatics by Mihai Pop and Steven Salzberg that tackles a subject that people probably don't think about too much:

They highlight some of the problems that arise from the growing trend in some journals to publish very short articles accompanied by extremely lengthy supplementary material. They single out a few particularly lopsided papers (including a 6-page article that has 165 pages of supplementary material) and make some solid observations about why this facet of publishing has become a problem. Perhaps most importantly, citations that are buried in supplementary material do not get tracked by citation indices.

They conclude the paper with a proposal:

The ubiquitous use of electronic media in modern scientific publishing provides an opportunity for the better integration of supplementary material with the primary article. Specifically, we propose that supplementary items, irrespective of format, be directly hyper-linked from the text itself. Such references should be to specific sections of the supplementary material rather than the full supplementary text.

Yes, yes, a thousand times yes!

Where to ask for bioinformatics help online

Part one of a two-part series. In part two I tackle the issue of how to ask for help online.

You have many options when seeking bioinformatics help online. Here are eleven possible places to ask for help, loosely arranged by their usefulness (as perceived by me):

  1. SEQanswers — the most popular online forum devoted to bioinformatics?
  2. Biostars — another very popular forum.
  3. Mailing lists — many useful bioinformatics tools have their own mailing lists where you can ask questions and get help from the developers or from other users, e.g. SAMtools and Bioconductor. Also note that resources such as Ensembl have their own mailing lists for developers.
  4. Google Discussion Groups — as well as having very general discussion groups, e.g. Bioinformatics, there are also groups like Tuxedo Tool Users…the perfect place to ask your TopHat or Cufflinks question.
  5. Stack Overflow — more suited for questions related to programming languages or Unix/Linux.
  6. Google — I'm including this here because I have solved countless bioinformatics problems just by searching Google with an error message.
  7. Reddit — try asking in r/bioinformatics or r/genome.
  8. Twitter — this may be more useful if you have enough followers who know something about bioinformatics, but it is potentially a good place to ask a question, though not a great forum for long questions (or replies). Try using the hashtag #askabioinformatician (this was @sjcockell's idea).
  9. Voat — Voat is like reddit's younger, hipster nephew. However, the bioinformatics 'subverse' is not very active.
  10. Research Gate — you may know it better as 'that site that sends me email every day', but some people use this site to ask questions about science. Surprisingly, they have 15 different categories relating to bioinformatics.
  11. LinkedIn — Another generator of too many emails, but they do have discussion groups for bioinformatics geeks and NGS.

Other suggestions welcome.

 

Updates

2015-11-02 09.53: Added twitter at the suggestion of Stephen Turner (@nextgenseek).

A rare example of a simple, fun, non-bogus name for a bioinformatics tool

Recently published in the journal Genome Biology, we have:

I like this name a lot because it is:

  • Memorable
  • Pronounceable
  • Simple, but also clever (combining elements of HiC and 5C)
  • Fun (a play on 'high five')
  • Not an acronym (so not a bogus acronym either)
  • Unique (can't find any other tools with this name)
  • Relevant (the short name has a connection to the data that the tool works with).

Maybe I need to start designing some sort of 'Anti-JABBA' award?

10 years of Open Access at the Wellcome Trust in 10 numbers [Link]

A great summary of how the Wellcome Trust has helped drive big changes in open access publishing. Of the ten numbers that the post uses to summarise the last decade, this one surprised me the most:

20% – the volume of UK-funded research which is freely available at the time of publication
A recent study commissioned by Universities UK found that 20% of articles authored by UK researchers and published in the last two years were freely accessible upon publication. This figure increases to 24% within six months of publication, and 32% within 12 months.

If you had asked me to guess what this number would be, I think I would have been far too optimistic. Even the figure of 32% of articles being free within 12 months seems lower than I would have imagined. Lots of progress still to be made!