The most haplotyped place on Earth: 'DNA Land' is open for business!

DNA Land has opened! If you are curious about what DNA Land is, here is the concise description offered by the website:

DNA Land is a place where you can learn more about your genome while enabling scientists to make new genetic discoveries for the benefit of humanity. Our goal is to help members to interpret their data and to enable their contribution to research.

At the time I captured the above screenshot, the site boasted '2,483 genomes and counting'. At the time I started writing this piece it had already risen to '2,501' genomes. Erika Check Hayden gives a good overview of DNA Land in a Nature news item: Scientists hope to attract millions to 'DNA.LAND'.

So DNA Land is a place to learn more about your genome, which aims to attract millions of visitors, and where you can also earn badges. Hmm, makes me wonder whether I should wait for DNA World to open…especially if the lines are long.

'The amount of foil needed to wrap five breakfast sandwiches': a new metric for genomics?

Photo by Robyn Lee for seriouseats.com

The journal Genome Research is celebrating its 20th anniversary and has marked the occasion by issuing a number of 'perspective' articles. One of these — A vision for ubiquitous sequencing — includes one of the strangest comparisons that I've ever seen in the field of genomics (or really any field):

Back in 1990, sequencing 1 million nucleotides cost the equivalent of 15 tons of gold (adjusted to 1990 price). At that time, this amount of material was equivalent to the output of all United States gold mines combined over two weeks. Fast-forwarding to the present, sequencing 1 million nucleotides is equivalent to the value of ∼30 g of aluminum. This is approximately the amount of material needed to wrap five breakfast sandwiches at a New York City food cart.

Most people will understand the point that is being made here: sequencing used to be really expensive, whereas now it is very cheap. But is there really a need to explain what 30 grams of aluminum foil amounts to in a more human-friendly unit? And even if such a comparison is deemed necessary, is the use of 'breakfast sandwiches' from New York City food carts the most suitable choice?

Brief thoughts on Karyn Meltz Steinberg's ASHG 2015 talk on genome assembly improvement

I like it when people a) share their slides online and b) share their slides online soon after they give a talk somewhere. This is particularly helpful when you want to quickly catch up on developments from a conference that you couldn't attend. Karyn Meltz Steinberg (@KMS_Meltzy on twitter) ticks both boxes because she posted her #ASHG2015 slides almost as soon as her talk finished. The title of her talk was:

Building a platinum human genome assembly from single haplotype human genomes generated from long molecule sequencing

Her slides — hosted on Slideshare — are embedded below.

What interested me in this talk was the use of sequence maps generated by the BioNano Genomics Irys platform to improve genome assemblies. This technology seems to be growing in popularity, offering an easier (and more powerful?) alternative to 'traditional' optical map solutions. This work is part of the McDonnell Genome Institute's Reference Genomes Improvement project, which includes the following (very laudable) aim:

  • We plan to identify and resolve issues (misassemblies, sequence errors, and gaps) within the current reference GRCh38.

I find it interesting that this project has also defined two levels of genome status:

Gold Genome: A high-quality, highly contiguous representation of the genome with haplotype resolution of critical regions.

Platinum Genome: A contiguous, haplotype-resolved representation of the entire genome.

It is not clear from these definitions whether platinum genomes can still include short regions of unknown bases (Ns). A figure on the Reference Genomes Improvement project page also hints at a 'Silver' status, making me think it is only a matter of time before we see the addition of a credit-card-esque 'diamond' status level: no unknown bases, with full representation of tandem repeat arrays, e.g. centromeres, and priority booking for VIP tickets at major sporting events.

This JABBA-award winning software wants a shot at redemption

A new tool has been described in the journal Bioinformatics:

All of the words that contribute to the name of this acronym are right there in the article's title. But as this is a JABBA-award-worthy name, we don't expect each word to contribute its first letter (or only one letter):

REDEMPTION: REduced Dimension Ensemble Modeling and Parameter estimaTION
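For anyone who wants to see exactly how uneven the letter contributions are, here is a minimal Python sketch. It assumes that the capitalised letters in the expansion above are the ones meant to form the acronym; the helper name `capitals` is my own invention and has nothing to do with the tool itself.

```python
# The expansion exactly as written above; capitals mark the acronym letters.
phrase = "REduced Dimension Ensemble Modeling and Parameter estimaTION"


def capitals(word):
    """Return only the upper-case letters of a word (hypothetical helper)."""
    return "".join(ch for ch in word if ch.isupper())


# Pair each word with the letters it donates to the acronym.
contributions = [(word, capitals(word)) for word in phrase.split()]

# Joining the donated letters in order should rebuild the tool's name.
acronym = "".join(letters for _, letters in contributions)

for word, letters in contributions:
    print(f"{word:<12} -> {letters or '(nothing)'}")
print(acronym)
```

One word donates two letters, another donates its last four, and 'and' donates nothing at all.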

This is certainly far from being the most bogus bioinformatics acronym that I have seen and — as far as I can tell — the name is unique (within the context of bioinformatics). However, I am particularly wary of tools that use a short name which a) has no obvious connection to what the software actually does and b) has potentially emotive associations in other contexts, e.g. religion and/or politics.