Finding bogus bioinformatics acronyms sometimes requires a laser-like focus

December 04, 2015 by Keith Bradnam

A new paper has been published in the journal BMC Research Notes:

LASER: Large genome ASsembly EvaluatoR

This name is:

Bogus — the word 'genome' doesn't contribute any letters to 'LASER' and two letters ('S' and 'R') are not derived from the initial letters of words.
Duplicated — there are at least two other bioinformatics tools called LASER (see here and here).
Undiscoverable — you really need to search Google for LASER genome assembly before you see this as a top result.
Ambiguous — large is a very subjective term. The authors imply that LASER is suitable for human genomes. These are larger than some genomes but smaller than others.
Inconsistent — the paper reveals that LASER is built on the code of QUAST (Quality Assessment Tool for Genome Assemblies). This means you end up with the somewhat bizarre documentation for how to run the program called LASER:

The example included with LASER installation can be run as:

./quast.py testdata/contigs1.fasta testdata/contigs2.fasta \
    -R testdata/reference.fasta.gz -G testdata/genes.txt \
    -O test_data/operons.txt

The output of LASER program can be viewed in file: ./quast_results/latest/report.txt

So to run LASER just type 'quast'!
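The 'Bogus' verdict above can be checked mechanically. What follows is only a minimal sketch, not anything taken from the LASER paper: it assumes that the capital letters in the published expansion mark exactly the letters used to build the name, and the function name audit_acronym is hypothetical.

```python
def audit_acronym(expansion):
    """Split a capitalised expansion into contributing letters and silent words.

    Assumption (not from the paper): the capital letters in the expansion
    are exactly the letters that were used to build the acronym.
    """
    letters = []  # (letter, source word, was it the word's first letter?)
    silent = []   # words that contribute no letters at all
    for word in expansion.split():
        hits = [(ch, word, i == 0) for i, ch in enumerate(word) if ch.isupper()]
        if hits:
            letters.extend(hits)
        else:
            silent.append(word)
    return letters, silent


letters, silent = audit_acronym("Large genome ASsembly EvaluatoR")
acronym = "".join(ch for ch, _, _ in letters)
non_initial = [ch for ch, _, is_initial in letters if not is_initial]
print(acronym, non_initial, silent)
# prints: LASER ['S', 'R'] ['genome']
```

The result matches the complaint: 'genome' is silent, and 'S' and 'R' are borrowed from the middle and end of words.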

Learn my Linux Bootcamp…all from within a web browser window

December 03, 2015 by Keith Bradnam

I awoke yesterday to see a lot of twitter notifications on my phone. Sometimes this happens when I've written a post on this blog, but I hadn't added anything for over a week. Turns out that the activity was triggered by this tweet by Richard Smith-Unna (@blahah404 on twitter):

I made a 'Command-line bootcamp' adventure for learning unix in the browser. https://t.co/IiDmqnCTDf #trendngs15 @kbradnam @denormalize
— ⓪ Rik Smith-Unna (@blahah404) December 2, 2015

As the screenshot below indicates, Richard has worked some amazing black magic to enable a single browser window to contain a fully interactive terminal as well as a file viewer/navigator; all alongside a (slightly modified) version of my original Linux bootcamp material.

This new interactive command-line bootcamp is a wonderful resource and means that the only barrier to learning some simple, but powerful, Linux/Unix commands is the availability of a web browser.

Richard explains a little about how he put all of this together:

The Infrastructure, including adventure-time and docker-browser-server, was built by @maxogden and @mafintosh. The setup of this app was based on the get-dat adventure.

Slides from my exit seminar

November 20, 2015 by Keith Bradnam

This morning I gave my last presentation at UC Davis. My highly informal exit seminar was a great opportunity to reflect on some of the many projects I've been involved with over the last decade here at Davis. Thank you to all who came, and a special thanks to Ian Korf for his kind introduction.

I include the slides below, but note that some of these slides won't make much sense without the narration (and you also get to miss out on two embedded videos). There was some video recorded via the Periscope app, but I found out today that Periscope only keeps video around for 24 hours, so unfortunately if you didn't watch the video when you had the chance it is now lost.

2015-11-21 12.34: Updated to reflect that Periscope video content is no longer available.

JABBA vs Jabba: when is software not really software?

November 18, 2015 by Keith Bradnam

It was only a matter of time, I guess. Today Simon Cockell (@sjcockell) alerted me to a new publication, a book chapter titled:

Jabba: Hybrid Error Correction for Long Sequencing Reads Using Maximal Exact Matches

From the abstract:

Recently, a new hybrid error correcting method has been proposed, where the second generation data is first assembled into a de Bruijn graph, on which the long reads are then aligned. In this context we present Jabba, a hybrid method to correct long third generation reads by mapping them on a corrected de Bruijn graph that was constructed from second generation data

Now as far as I can tell, this Jabba is not an acronym, so we safely avoid the issue of presenting a JABBA award for Jabba. However, one might argue that naming any bioinformatics software 'Jabba' is going to present some problems because this is what happens when you search Google for 'Jabba bioinformatics'.

There is a bigger issue with this paper that I'd like to address though. It is extremely disappointing to read a bioinformatics software paper in the year 2015 and not find any explicit link to the software. The publication includes a link to http://www.ibcn.intec.ugent.be, but only as part of the author details. This web page is for the Internet Based Communication Networks and Services research group at the University of Gent. The page contains no mention of Jabba, nor does their 'Facilities and Tools' page, nor does searching their site for Jabba.

Initially I wondered if this paper is more about the algorithm behind Jabba (equations are provided) and not about an actual software implementation. However, the paper includes results from their Jabba tool in comparison to another piece of software (LoRDEC) and includes details of CPU time and memory requirements. This suggests that the Jabba software exists somewhere.

To me this is an example of 'closed science' and represents a failure of whoever reviewed this article. I will email the authors to find out if the software exists anywhere…it's a crazy idea but maybe they might be interested if people could, you know, use their software.

Update 2015-11-20: I heard back from the authors…the Jabba software is on GitHub.

The five habits of bad bioinformaticians [Link] →

November 18, 2015 by Keith Bradnam

Mick Watson gets a few things off his chest in his latest post:

When ever I see bad bioinformatics, a little bit of me dies inside, because I know there is ultimately no reason for it to have happened

ACGT is now AFCW (Approved for Free Cultural Works): thoughts on switching to a CC-BY license

November 15, 2015 by Keith Bradnam

This website, as well as my personal website and Rescued by Code, licenses material under a Creative Commons license. Specifically, I've been using the Attribution Non-Commercial license, popularly known as CC BY-NC. My joint venture with Abby Yu, The Take-Home Message web comic, has been even more restrictive and has been licensing content under the Attribution Non-Commercial Share-Alike license (CC BY-NC-SA).

This choice of licenses is something that's been on my mind for a while. I've known that I'm not being as open as I could be and maybe this has stemmed from an unwarranted (not to mention unlikely) fear that someone would take all my blog posts and somehow seek to profit from them.

Today I saw a tweet by Rogier Kievit (@rogierK) that has helped me change my mind:

Three posts why CC-BY is better than restrictive licenses https://t.co/C34dWcixpu https://t.co/VnIgnJMXNh https://t.co/QesEriirXm
— rogier kievit (@rogierK) November 13, 2015

I found the third link (something that is now over a decade old) particularly persuasive and accordingly I have switched all of my website licenses to CC-BY. Apparently this means that all of my writings now fall into the category of Free Cultural Works. I am grateful to Abby Yu for agreeing to this change for The Take-Home Message.

This change also means that someone can now use my blog posts to write the definitive book on JABBA-awards…just as long as they give me appropriate attribution.

101 questions with a bioinformatician #36: Alicia Oshlack

November 12, 2015 by Keith Bradnam

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

Alicia Oshlack is the Head of Bioinformatics at the Murdoch Childrens Research Institute (they don't like apostrophes) in Melbourne, Australia. Her research focuses on four main project areas: methods for analysing RNA-seq data, epigenomics, clinical genomics data analysis, and cancer genomics.

Before moving into the field of genomics, Alicia had a background in astronomy and her Ph.D. work concerned the structure of radio quasars. Not many bioinformaticians can claim to have published papers on the topic of estimating the mass of black holes!

You can find out more about Alicia by reading her Wikipedia page or by following her on twitter (@AliciaOshlack). I also encourage you to check out her must-read article for fellow computational biologists: A 10-step guide to party conversation for bioinformaticians. And now, on to the 101 questions...

001. What's something that you enjoy about current bioinformatics research?

I love the pace at which things are changing in the field. There is always something new to work on and there are so many ways to contribute something useful to the research community. I also really love the balance between collaborative analysis on really interesting biological problems and doing careful methods development.

010. What's something that you don't enjoy about current bioinformatics research?

I get frustrated that I need to spend so much of my time convincing people that bioinformatics is a real scientific research discipline where we have deep scientific training and use our brains to solve scientific problems. Hopefully I will have convinced everyone in Australia soon.

011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

I did my PhD in astrophysics and I often wonder if I would have been better off doing a more relevant subject but I really appreciate the skills I learnt doing that. Within this I probably would tell myself to put a bit more focus on programming and do statistics instead of applied mathematics.

100. What's your all-time favorite piece of bioinformatics software, and why?

I think limma is amazing. Have you seen the users guide? I think it's 145 pages long and although it was originally developed for microarray analysis more than 12 years ago it has adapted to the sequencing revolution and is used more than ever now. I believe it is the most widely used bioconductor analysis package ever.

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

I think S = G/C because I'm always a little bit biased.

Beautiful logo redesign as part of the rebranding of Crossref

November 11, 2015 by Keith Bradnam

Crossref — the non-profit organization that helps make academic content easier to find, link, cite and assess — has today announced a rebranding. They will be announcing new names and new logos for all of their products, and the Crossref logo itself gains a beautiful looking new design. So we say 'goodbye' to this:

And 'hello' to this lovely logo:

The explanation for why they wanted to change the logo makes a lot of sense to me:

We needed an icon to give more flexibility across the web that a word mark cannot do alone. The icon is made up of two interlinked angle brackets familiar to those who work with metadata, and can also act as arrows depicting Metadata In and Metadata Out, two themes under which our services can generally be grouped.

As part of this rebranding, they are formalizing a change from CrossRef to Crossref (with lower case 'R'). Someone had a fun job updating their Wikipedia page:

Wikipedia edit history: CrossRef > Crossref.

Assemble a genome and evaluate the result [Link] →

November 11, 2015 by Keith Bradnam

There is a new page on the bioboxes site (such a great name!) which details how bioboxes can be used to assemble a genome and then evaluate the results:

A common task in genomics is to assemble a FASTQ file of reads into a genome assembly and followed by evaluating the quality of this assembly. This recipe will explore using bioboxes to do this task.

A third Assemblathon contest came very close to launching earlier this year…except that it didn't — maybe this will be the subject of a future blog post! — and we planned to make biobox containers a requisite part of submitting assemblies. If Assemblathon 3 ever gets off the ground I feel happier knowing that the bioboxes team is doing so much great work that will make running such a contest easier to manage.

Time to toggle the JABBA-award status of this bioinformatics software name

November 09, 2015 by Keith Bradnam

Give me a B.
Give me a O.
Give me a G.
Give me a U.
Give me a S.

What have you got?

Another BOGUS bioinformatics acronym! This time it is courtesy of the journal BMC Bioinformatics:

TOGGLE: toolbox for generic NGS analyses

I think you can already see why this one is going to win a JABBA award. The name 'TOGGLE' derives from TOolbox for Generic nGs anaLysEs. Using their same strategy, they could have also gone for BOGGLE, BONNY, or even BORINGLY.
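To show that those alternatives really do fall out of the same title, here is a rough sketch of the 'strategy' as I read it: pick any letters you like from the phrase, as long as they stay in order. The function name can_spell is hypothetical, and this is my interpretation of the naming approach, not a rule the TOGGLE authors describe.

```python
def can_spell(name, phrase):
    """Return True if the letters of name appear, in order, within phrase."""
    # Keep only the letters of the phrase, lower-cased, as a one-pass iterator.
    remaining = iter([ch for ch in phrase.lower() if ch.isalpha()])
    # Each 'in' test consumes the iterator up to the first match, so later
    # letters of the name can only match later positions in the phrase.
    return all(ch in remaining for ch in name.lower())


title = "toolbox for generic NGS analyses"
for candidate in ["TOGGLE", "BOGGLE", "BONNY", "BORINGLY", "JABBA"]:
    print(candidate, can_spell(candidate, title))
```

All four names come back True; JABBA does not, but only because the title lacks a 'J'.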