Question: when is a GitHub repository not a GitHub repository?

September 30, 2015 by Keith Bradnam

Answer: when it doesn't contain any useful code.

Update 2015-10-02 08.58: this post was updated to reflect the addition of code to the metaPORE repository.

A discussion on Twitter today revealed something that I find very disappointing:

@froggleston @biomickwatson @mattloose @pathogenomenick @leilaluheshi - Yes. https://t.co/e53GWkNI5s repo empty. I pinged @cychiu98 for info
— Jonathan Jacobs (@bioinformer) September 30, 2015

A new paper by Greninger et al. (Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis) has been published in the journal Genome Medicine. The Methods section contains the following line:

We developed a custom bioinformatics pipeline for real-time pathogen identification and visualization from nanopore sequencing data (MetaPORE) (Fig. 1b), available under license from UCSF at [23].

Reference #23 takes you to the metaPORE GitHub repository. When I first wrote this post, it contained zero code, as the screen grab below shows. Thankfully, this has changed, and a set of Python and shell scripts is now available.

[Screenshot of the metaPORE GitHub repository, taken 2015-09-30 at 8:58 AM]

Maybe this was just some sort of error in scheduling the release of the paper and the code. However, journals and authors should understand that if a paper (or a pre-print) appears online and points to a code repository (or any other website), the expectation is that people should be able to visit the site in question and download code.

We have a winner!

September 29, 2015 by Keith Bradnam

The randomatic_3000 Perl script has chosen a winner:

I will be reaching out to the winner via Twitter, and once I get Vince to sign the book, I will mail it to them. Congratulations!

A top 10 list of 'Useful Bioinformatics Skills'

September 28, 2015 by Keith Bradnam

The deadline for my competition to win a signed copy of Vince Buffalo's excellent Bioinformatics Data Skills book has now passed. There were 65 entries and later this week I will randomly choose a winner. For the competition I simply asked people to tweet an answer to the following question:

Name a useful bioinformatics skill

I thought I would share some of the entries that people tweeted. In reverse order, here are my ten favorite answers. It was difficult choosing which ones made the cut, and there were many other excellent answers. Thanks to everyone who took part! I hope to announce the winner later this week.

10

This skill may not be so easy to acquire…

@ACGT_blog Knowing @vsbuffalo. #acgt
— Paul Smaldino (@psmaldino) September 14, 2015

9

Two people came up with this suggestion…

Useful bioinformatics skill: Patience... #acgt
— David Joly (@idjoly) September 24, 2015

@ACGT_blog @kbradnam a useful bioinformatics data skill: patience #ACGT
— Dave Tang (@davetang31) September 14, 2015

8

I think this answer also applies to 'scripts you wrote yourself several years ago'…

Useful bioinformatics skill: The ability to understand the scripts written by others. #ACGT
— goutham atla (@Geek_y) September 15, 2015

7

Clouded by the Dark Side, your code is.

@acgt_blog A useful bioinformatics data skill: anger management #acgt
— Neil Saunders (@neilfws) September 14, 2015

6

If you ever come up with some useful code snippet, the chances are that you will want to reuse it at some point.

Keep your own oneliners in a online notebook #ACGT
— genomepandit (@genomepandit) September 14, 2015

5

This was the most popular answer in the competition…

Critical bioinformatics skill: VERSION CONTROL. #ACGT https://t.co/qCnHzbGDJt
— Aaron Barnes (@MicroTolo) September 17, 2015

My entry for ‘useful bioinformatics data skill’ #ACGT competition: version control
— Lex Nederbragt (@lexnederbragt) September 14, 2015

Useful bioinformatics skill: version control repository for every project #ACGT
— Katrina Kutchko (@kutchko) September 23, 2015

@ACGT_blog @vsbuffalo a useful bioinformatics data skill is version control! #ACGT
— Jasmine Dumas (@jasdumas) September 14, 2015

4

Yes, yes, a thousand times yes!

Proper code documentation. #ACGT
— will shoemaker (@shoemakah) September 14, 2015

3

If you ever run into any sort of bioinformatics problem, you can probably assume that someone has suffered from the same problem as you, and that someone else has posted a useful answer online.

useful bioinformatics data skill: read manuals and find solutions on BioStars, SEQanswers, and Twitter #ACGT
— copypasteusa (@copypasteusa) September 17, 2015

2

Two closely related answers, so they can both share the number two spot…

Useful bioinformatics skill: trust nothing without testing #ACGT
— Gitanshu Munjal (@grmunjal) September 15, 2015

#ACGT A very useful bioinformatics skill is that never test any program or any code with huge dataset and always use a subset of data
— upendra devisetty (@upendra_35) September 16, 2015

1

And my favorite answer was one by Bastien Chevreux (@BaCh_mira)…

#ACGT Useful BioInf skill: Be skeptical. Data isn't wrong just because it contradicts "basic textbook knowledge". Nature doesn't read books.
— Bastien Chevreux (@BaCh_mira) September 15, 2015

In bioinformatics it can be good to have some healthy skepticism about the tools and data that you use. Not all genome assemblies are perfect (many are far from perfect), not all gene annotations are correct, and not all tools use default values that will work well with your data. Be skeptical!

Maybe one of these answers will be lucky enough to be chosen by the magical 'Perl-script-of-destiny' (that I still need to write). The winner will hopefully be announced in a day or two.

3 important digital things all scientists should have nowadays

September 25, 2015 by Keith Bradnam

Good advice from Michael Koontz (@_mikoontz):

(1/n) A smart guy once told me there are 3 important digital things all scientists should have nowadays (cc @noamross @davisegsa):
— Michael Koontz (@_mikoontz) September 24, 2015

(2/n) 1) A profile page. This could be your own custom website, a ResearchGate page, a Google Scholar profile, etc.
— Michael Koontz (@_mikoontz) September 24, 2015

(3/n) 2) an ORCID. All science products (e.g., blogs, code) should count for you. I like @carlystrasser's take on it: http://t.co/I8TYM2jR01
— Michael Koontz (@_mikoontz) September 24, 2015

(4/4) 3) An academic Twitter account. Stay current! Stay involved! People get jobs via Twitter connections!
— Michael Koontz (@_mikoontz) September 24, 2015

The second item on the list is something which I wrote about recently.