BioNano Genomics are holding a webinar on October 12th about 'Hybrid Scaffolds' [Link] →

October 05, 2015 by Keith Bradnam

The Irys platform by BioNano Genomics seems to be a useful tool for assessing the completeness and contiguity of genome assemblies. I noticed today that they are holding a webinar which may be of interest to some of the readers of this blog. From the webinar registration page:

Please join us for our Hybrid Scaffold webinar. We will discuss the applications of the software, which include the building of longer contigs, validation of existing contigs, and acquisition of novel map level information. Experimental design considerations will be reviewed for ideal data integration. We will walk through each stage of Hybrid Scaffold troubleshooting data optimization and pointing out files of interest.

Financial disclaimer: I do not own shares in any biotechnology company.

Take-Home Message #7 tackles the topical subject of 'personalized microbial clouds' [Link] →

October 05, 2015 by Keith Bradnam

Our latest comic presents a slightly different take on the recent news regarding the unique clouds of bacteria that surround us…

We at The Take-Home Message welcome our new (but actually ancient) microbial overlords.

What a difference a day makes: markets react to PacBio's new sequencing platform

October 02, 2015 by Keith Bradnam

Daily change in share price at close of trade on October 1st, 2015:

Update: Earlier version of this figure incorrectly cited a 19% drop in Illumina's share price. My bad: -18.6 was the price change in dollars, not the percentage change.
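To make the distinction in that correction concrete, here is a minimal sketch of how a dollar change and a percentage change differ. The prices are hypothetical and were chosen only so that the dollar change comes out at -18.6; they are not Illumina's actual closing prices, and `percent_change` is an illustrative helper name.

```python
def percent_change(previous_close: float, latest_close: float) -> float:
    """Return the day's change as a percentage of the previous close."""
    if previous_close <= 0:
        raise ValueError("previous_close must be positive")
    return (latest_close - previous_close) / previous_close * 100.0


# Hypothetical prices (an assumption for illustration, not real market data).
previous_close = 200.00
latest_close = 181.40

dollar_change = latest_close - previous_close
print(f"Change in dollars: {dollar_change:.2f}")
print(f"Change in percent: {percent_change(previous_close, latest_close):.2f}%")
```

With these made-up numbers, a fall of 18.6 dollars is a fall of 9.3 percent, which shows how easily the two figures can be confused when the share price is in the low hundreds.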

Financial disclaimer: I do not own shares in any biotechnology company.

How to sequence and assemble a large eukaryote genome with long reads in 2015 [Link] →

October 01, 2015 by Keith Bradnam

If you have any interest in the latest methods of DNA sequencing and/or genome assembly, you really owe it to yourself to be following Lex Nederbragt's excellent In between lines of code blog. Today's post offers some useful advice:

Main advice: bite the bullet and get the budget to get 100x coverage in long PacBio reads. 50-60x is really the minimum.
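As a rough guide to what that advice means in practice, coverage is simply the total number of sequenced bases divided by the genome size. The sketch below assumes a hypothetical 1 Gbp genome; the function names are illustrative and it ignores real-world details such as read length distribution and filtering losses.

```python
def bases_needed(genome_size_bp: int, coverage: float) -> float:
    """Total sequenced bases required to reach a target average coverage."""
    if genome_size_bp <= 0 or coverage < 0:
        raise ValueError("genome size must be positive and coverage non-negative")
    return genome_size_bp * coverage


def average_coverage(total_read_bases: float, genome_size_bp: int) -> float:
    """Average coverage obtained from a given amount of sequence data."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bp


# Hypothetical 1 Gbp genome (an assumption, not a figure from the post).
genome_size = 1_000_000_000

for target in (50, 60, 100):
    gbp = bases_needed(genome_size, target) / 1e9
    print(f"{target}x coverage needs about {gbp:.0f} Gbp of reads")
```

The arithmetic scales linearly, so doubling either the genome size or the target coverage doubles the amount of data (and roughly the budget) required.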

Who is saying what about the new PacBio Sequel system?

October 01, 2015 by Keith Bradnam

The big news from the world of DNA sequencing this week was that Pacific Biosciences has launched a new sequencing platform. The successor to their RS II platform has been named The Sequel System and it will be on display at the upcoming American Society of Human Genetics meeting. The new system promises to sequence a human genome (at 10x coverage) for $3,000.

The early buzz already seems pretty positive, and hopefully this sequel will turn out to be more like The Empire Strikes Back than, say, Highlander II. What follows is a fairly comprehensive roundup of what people have been saying about this new platform. Note that this story has been updated several times since I first wrote it (details of these updates are included at the end of this post):

From PacBio

The Official Sequel System webpage, which includes this Apple-esque video (with CSO Jonas Korlach taking on the Jony Ive role).
Details of PacBio's presentation and workshop at the ASHG 2015 meeting are available, with information about how people can live stream the workshop.
Listen to the webcast (a conference call with questions that took place on the morning of October 1st). See below for details of some of the questions that were asked.

From science news websites

Bio-IT World's take on the news: A Worthy Sequel: PacBio's New Sequencing System.
GenomeWeb have a page up: PacBio Launches Higher-Throughput, Lower-Cost Single-Molecule Sequencing System (free membership required to read).
And GenomeWeb have added a follow-up story: PacBio Hopes to Increase Research User Base With Sequel System; Mount Sinai Among First Customers.
And a third GenomeWeb story: PacBio Preps for Sequel Shipments, Expects Roche to be Largest Customer

'Traditional' news outlets

NBC Bay Area TV news ran a short piece which strangely omits PacBio's name from the title: Menlo Park Company Aims for 'Precision Medicine'

From blogs

I think CoreGenomics may have been the first blog to write something about the Sequel: The new Pacific Biosciences sequencer
The incomparable Mick Watson presents his thoughts in a blog post: What does the PacBio Sequel mean for the future of sequencing?.
Keith Robison has also weighed in with many detailed thoughts on his blog regarding the news: PacBio Sequel: Smaller Box, Bigger Bang.
The Biomusings blog has entered into the discussion: What does SEQUEL mean for human genetics?
From Paul Krzyzanowski's 'The Checkmate Scientist' blog: PacBio's gain would be Illumina's loss in a simple world…

From discussion forums

There is a discussion unfolding on the SEQanswers forums.
And as always there are discussions happening on reddit, see r/bioinformatics and r/biology.

From the world of finance

The Motley Fool take a financial perspective on the news: Why Shares of Pacific Biosciences of California Inc. Soared Today.
More financial insights at Zacks, 24/7 Wall St, and MarketWatch (among many others).

I guess the question that everyone is asking now concerns the possibility of someone making a genome assembly from sequence data using this platform, and then using this tool to produce a better version of the assembly. In this case, would it be a sequel Sequel SEQuel genome assembly?

Questions from the conference call

There were a lot of questions asked in the hour-long conference call. I've transcribed some of them and indicated the time point that you can jump to if you are interested in hearing PacBio's answers to specific questions:

7:40:"Can you give us some thoughts on turnaround time and cost per genome?"
11:20:"Can you talk about the use case beyond your current customer base? How this expands the number of applications?"
15:17:"Can you help us think about some of the major changes that went into the system? Is there still a manifold that moves in three dimensions?"
19:20:"From a user standpoint, are there any changes to site preparations that you would have to make from Sequel vs RS II; any limitations on things like putting it on 2nd/3rd/4th floor?"
22:25:"You've introduced a number of kits with various applications for the RS II, will the Sequel be able to run all of the applications from the beginning, or will it take time to introduce certain applications to the system?"
24:34:"Are there specific customer types that you think are positioned to be more on the earlier side of adoption, such as human sequencing, or microbiology, plant, animal etc.?"
33:20:"Can you give a perspective on what the scalability of this platform looks like comparatively (to the RS II)?"
35:08:"In terms of the metrics you gave around price per human genome, can you help us think about that relative to Illumina? If you take a 30x coverage genome on Illumina, what is the equivalent coverage you would need on the Sequel to get something similar…and how long would that take you to do?"
38:29:"Recognising a lot has been achieved with this launch: different computer architecture, different form factor, new optical systems, higher density, with a smaller footprint. I just want to make sure, there's no compromise in raw accuracy expected relative to the RS II?"
47:46:"Could you describe in layman's terms the benefits of methylation detection for your system?"
50:50:"With your technology relative to other platforms, can you help us understand — if you have these larger pieces of the puzzle if you will — how advantageous that could be after you're done generating data, when you get down to assembling the genome?"
53:16:"I'm curious what percentage of potential customers that looked at the RS II passed given the high price tag? What is the incremental buyer opportunity at the price point of $350,000?"
57:35:"Still trying to understand what percentage of competitive platforms you think you can swap out with the Sequel?"

Updates

2015-10-01 13.46: Added some more sources of news, including questions asked in conference call
2015-10-01 20.04: Added in more conference call details, with time points of different questions.
2015-10-01 20.39: Added Keith Robison's blog post
2015-10-02 06.34: Changed link for Bio-IT World's piece
2015-10-02 09.08: Added more links about PacBio's presentation at ASHG 2015
2015-10-02 09.41: Added link to CoreGenomics post and added disclaimer
2015-10-02 11.54: Added links to Sequel-related discussions on SEQanswers and reddit
2015-10-02 13.28: Added Biomusings and Checkmate Scientist blog posts, and split main part of article into different sections
2015-10-12 09.52: Addition of NBC Bay Area News piece
2015-10-14 16.57: Addition of 2nd GenomeWeb story
2015-10-23 20.02: Addition of 3rd GenomeWeb story

Financial disclaimer: I do not own shares in any biotechnology company.

Question: when is a GitHub repository not a GitHub repository?

September 30, 2015 by Keith Bradnam

Answer: when it doesn't contain any useful code.

Update 2015-10-02 08.58: this post was updated to reflect the addition of code to the metaPORE repository.

A discussion on twitter today revealed something which I find very disappointing:

@froggleston @biomickwatson @mattloose @pathogenomenick @leilaluheshi - Yes. https://t.co/e53GWkNI5s repo empty. I pinged @cychiu98 for info
— Jonathan Jacobs (@bioinformer) September 30, 2015

A new paper by Greninger et al. (Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis) has been published in the journal Genome Medicine. The Methods contains the following line:

We developed a custom bioinformatics pipeline for real-time pathogen identification and visualization from nanopore sequencing data (MetaPORE) (Fig. 1b), available under license from UCSF at [23].

Reference #23 takes you to the metaPORE GitHub repository. At the time I initially wrote this post — and as the screen grab below shows — it contained zero code. Thankfully this has been changed and a set of Python and shell scripts are now available.

[Screen grab of the metaPORE GitHub repository, taken 2015-09-30 at 8:58 AM, showing no code]

Maybe this was just some sort of error in scheduling the release of the paper and the code. However, journals and authors should understand that if a paper (or a pre-print) appears online and points to a code repository (or any other website), the expectation is that people should be able to visit the site in question and download code.

We have a winner!

September 29, 2015 by Keith Bradnam

The randomatic_3000 Perl script has chosen a winner:

I will be reaching out to the winner via twitter, and once I get Vince to sign the book, I will mail it to them. Congratulations!
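For anyone curious how a script like this might work, here is a minimal sketch in Python. It is not the randomatic_3000 script (which was written in Perl and is not shown here); the `pick_winner` function and the entrant handles are hypothetical.

```python
import random
from typing import Optional, Sequence


def pick_winner(entrants: Sequence[str], seed: Optional[int] = None) -> str:
    """Pick one entrant uniformly at random.

    Passing a seed makes the draw repeatable, which is handy for testing;
    leave it as None for a genuinely unpredictable draw.
    """
    if not entrants:
        raise ValueError("there must be at least one entrant")
    rng = random.Random(seed)
    return rng.choice(list(entrants))


# Hypothetical entrant handles, not the real competition entries.
entrants = ["@entrant_one", "@entrant_two", "@entrant_three"]
print("And the winner is:", pick_winner(entrants))
```

Each entry has an equal chance of being drawn, so anyone who entered more than once would need to be de-duplicated first if the rule is one entry per person.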

A top 10 list of 'Useful Bioinformatics Skills'

September 28, 2015 by Keith Bradnam

The deadline for my competition to win a signed copy of Vince Buffalo's excellent Bioinformatics Data Skills book has now passed. There were 65 entries and later this week I will randomly choose a winner. For the competition I simply asked people to tweet an answer to the following question:

Name a useful bioinformatics skill

I thought I would share some of the entries that people tweeted. In reverse order, here are my ten favorite answers. It was difficult choosing which ones made the cut, and there were many other excellent answers. Thanks to everyone who took part! I hope to announce the winner later this week.

10

This skill may not be so easy to acquire…

@ACGT_blog Knowing @vsbuffalo. #acgt
— Paul Smaldino (@psmaldino) September 14, 2015

9

Two people came up with this suggestion…

Useful bioinformatics skill: Patience... #acgt
— David Joly (@idjoly) September 24, 2015

@ACGT_blog @kbradnam a useful bioinformatics data skill: patience #ACGT
— Dave Tang (@davetang31) September 14, 2015

8

I think this answer also applies to 'scripts you wrote yourself several years ago'…

Useful bioinformatics skill: The ability to understand the scripts written by others. #ACGT
— goutham atla (@Geek_y) September 15, 2015

7

Clouded by the Dark Side, your code is.

@acgt_blog A useful bioinformatics data skill: anger management #acgt
— Neil Saunders (@neilfws) September 14, 2015

6

If you ever come up with some useful code snippet, the chances are that you will want to reuse it at some point.

Keep your own oneliners in a online notebook #ACGT
— genomepandit (@genomepandit) September 14, 2015

5

This was the most popular answer in the competition…

Critical bioinformatics skill: VERSION CONTROL. #ACGT https://t.co/qCnHzbGDJt
— Aaron Barnes (@MicroTolo) September 17, 2015

My entry for ‘useful bioinformatics data skill’ #ACGT competition: version control
— Lex Nederbragt (@lexnederbragt) September 14, 2015

Useful bioinformatics skill: version control repository for every project #ACGT
— Katrina Kutchko (@kutchko) September 23, 2015

@ACGT_blog @vsbuffalo a useful bioinformatics data skill is version control! #ACGT
— Jasmine Dumas (@jasdumas) September 14, 2015

4

Yes, yes, a thousand times yes!

Proper code documentation. #ACGT
— will shoemaker (@shoemakah) September 14, 2015

3

If you ever run into any sort of bioinformatics problem, you can probably assume that someone has suffered from the same problem as you, and that someone else has posted a useful answer online.

useful bioinformatics data skill: read manuals and find solutions on BioStars, SEQanswers, and Twitter #ACGT
— copypasteusa (@copypasteusa) September 17, 2015

2

Two closely related answers, so they can both share the number two spot…

Useful bioinformatics skill: trust nothing without testing #ACGT
— Gitanshu Munjal (@grmunjal) September 15, 2015

#ACGT A very useful bioinformatics skill is that never test any program or any code with huge dataset and always use a subset of data
— upendra devisetty (@upendra_35) September 16, 2015

1

And my favorite answer was one by Bastien Chevreux (@BaCh_mira)…

#ACGT Useful BioInf skill: Be skeptical. Data isn't wrong just because it contradicts "basic textbook knowledge". Nature doesn't read books.
— Bastien Chevreux (@BaCh_mira) September 15, 2015

In bioinformatics it can be good to have some healthy skepticism about the tools and data that you use. Not all genome assemblies are perfect (many are far from perfect), not all gene annotations are correct, and not all tools use default values that will work well with your data. Be skeptical!

Maybe one of these answers will be lucky enough to be chosen by the magical 'Perl-script-of-destiny' (that I still need to write). The winner will hopefully be announced in a day or two.

3 important digital things all scientists should have nowadays

September 25, 2015 by Keith Bradnam

Good advice from Michael Koontz (@_mikoontz):

(1/n) A smart guy once told me there are 3 important digital things all scientists should have nowadays (cc @noamross @davisegsa):
— Michael Koontz (@_mikoontz) September 24, 2015

(2/n) 1) A profile page. This could be your own custom website, a ResearchGate page, a Google Scholar profile, etc.
— Michael Koontz (@_mikoontz) September 24, 2015

(3/n) 2) an ORCID. All science products (e.g., blogs, code) should count for you. I like @carlystrasser's take on it: http://t.co/I8TYM2jR01
— Michael Koontz (@_mikoontz) September 24, 2015

(4/4) 3) An academic Twitter account. Stay current! Stay involved! People get jobs via Twitter connections!
— Michael Koontz (@_mikoontz) September 24, 2015

The second item on the list is something which I wrote about recently.

101 questions with a bioinformatician #34: Katie Pollard

September 24, 2015 by Keith Bradnam

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.

Katie Pollard is a Senior Investigator at Gladstone Institutes and a Professor in the Department of Epidemiology and Biostatistics at UC San Francisco. She is also a Faculty supervisor of a bioinformatics core that provides collaborative support for high-throughput biology across the UCSF campus.

Katie's work involves the development of statistical and computational methods for the analysis of large genomic datasets, with a particular interest in genome evolution and identifying sequences that differ significantly between or within species. Her work on the chimpanzee genome has led to lots of coverage by mainstream media, and if you want to know more about this topic, you should definitely watch the What makes us human? talk that she gave at the California Academy of Sciences (video is online here).

You can find out more about Katie by visiting her lab's website. And now, on to the 101 questions...

001. What's something that you enjoy about current bioinformatics research?

Growth in new sources of data, such as from citizen science and electronic medical records, as well as emerging technologies, like single cell imaging and genomics platforms.

010. What's something that you don't enjoy about current bioinformatics research?

Computing in the cloud is promising, but it is still too expensive to store massive data for ongoing active compute and too slow to move data into the cloud and out again for each analysis.

011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Keep taking math classes.

100. What's your all-time favorite piece of bioinformatics software, and why?

The UC Santa Cruz Genome Browser: you cannot underestimate the importance of looking at raw data, and the browser provides a way to visualize a lot of data for every position of the genome. It is easy to check if your assumptions are right or not.

101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

S for strong.
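For readers who have not met these codes before, the sketch below shows how the standard IUPAC ambiguity codes map on to the four DNA bases. It lists only the base and ambiguity codes that I am sure of, so it is not meant to account for all 18 characters mentioned in the question; the `expand` helper is a hypothetical name.

```python
# Standard IUPAC nucleotide codes and the DNA bases each one can represent.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # puRine
    "Y": "CT",    # pYrimidine
    "S": "CG",    # Strong (three hydrogen bonds)
    "W": "AT",    # Weak (two hydrogen bonds)
    "K": "GT",    # Keto
    "M": "AC",    # aMino
    "B": "CGT",   # not A
    "D": "AGT",   # not C
    "H": "ACT",   # not G
    "V": "ACG",   # not T
    "N": "ACGT",  # aNy base
}


def expand(code: str) -> set:
    """Return the set of bases that a single IUPAC code can stand for."""
    try:
        return set(IUPAC_CODES[code.upper()])
    except KeyError:
        raise ValueError(f"unknown IUPAC nucleotide code: {code!r}") from None


print("S can be:", sorted(expand("S")))
```

So an answer of S means either C or G, the two bases that form the stronger, three hydrogen bond base pair.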