If you want your bioinformatics software to have a memorable name, it helps if the name is pronounceable

Image from the She is Geeky blog

There is a new paper in the journal Bioinformatics:

The paper describes a new method for performing Principal Component Analysis (PCA) on data. That new method has a name. That name has just seven characters. How hard can it be to pronounce?

  • S4VDPCA: ess-four-vee-dee-pee-cee-ay

It doesn't exactly trip off the tongue, and having four 'ee-sounding' letters together (VDPC) doesn't make it easy to remember. When I first came across this paper, I skimmed the article, waited an hour, and then tried to recall the name. I could remember that it included '4', 'V', and 'D', but couldn't remember the order (or that it also included an 'S').

It is by no means essential that bioinformatics tools have easily pronounceable names, but a pronounceable name will help people remember your software. In turn, this makes it easier for people to tell others about it. I don't imagine that bioinformatics software developers ever want to overhear the following type of conversation:

Bob: "You should use that tool"

Sue: "What tool?"

Bob: "Umm, you know that PCA thingy. The S…something, something…PCA tool"

Sue: "The what?"

Bob: "Run a Google search for Bioinformatics PCA tools, it's probably the top hit."

Sue: <- facepalm ->

The Francis Crick Institute has signed the Hague Declaration

The Hague Declaration is an important manifesto that aims to provide guidelines for how to "best enable access to facts, data and ideas for knowledge discovery in the Digital Age". Although signatories to the declaration include large scientific research institutes, you can also sign the declaration as an individual. The five main principles of the declaration are summarized as follows:

  1. Intellectual property was not designed to regulate the free flow of facts, data and ideas, but has as a key objective the promotion of research activity
  2. People should have the freedom to analyse and pursue intellectual curiosity without fear of monitoring or repercussions
  3. Licenses and contract terms should not restrict individuals from using facts, data and ideas
  4. Ethics around the use of content mining techniques will need to continue to evolve in response to changing technology
  5. Innovation and commercial research based on the use of facts, data, and ideas should not be restricted by intellectual property law

These principles are obviously of huge relevance to the field of genomics, which seems to be generating tools and data at an ever-increasing rate. So I was happy to read today that the new Francis Crick Institute in London is one of the Declaration's latest signatories:

"The large amounts of data and information that are now becoming available represent an extraordinary resource for researchers. By signing the Hague Declaration the Francis Crick Institute is expressing its support for the idea that researchers should be able to mine such content freely, thereby to advance knowledge and to promote Discovery without Boundaries."

Jim Smith, Director of Research at the Francis Crick Institute

Front Line Genomics interview with Craig Venter includes a question from yours truly

Issue 4 of the Front Line Genomics magazine is now available online. It includes an interview with Craig Venter, who gave a much-anticipated talk at their recent Festival of Genomics conference in Boston. Front Line Genomics kindly allowed some of their previous interviewees (including me) to pose some of the questions. Here's mine:

KRB: What do you see as the limits of synthetic biology? Could we assemble a functional eukaryotic genome, and what are the practical applications of such technology?

JCV: That’s a great question! The limitations will ultimately be more society limitations, ethical limitations, and standards rather than technology. I think a synthetic single eukaryotic cell would be very straightforward to do today. Various groups of scientists have been trying to build the yeast genome. It’s kind of like rebuilding a house one brick at a time, but they’re making a synthetic version of yeast. That’s not quite the same as writing the genetic code and then booting it up as we did, but that’s just because of the limitations on writing the genetic code now.

I think understanding what makes a multicellular organism, and all the regulation associated with that, are so far away from design that we’re going to have to learn a whole lot more biology before we get to that stage of deliberate design. I think about 10% of the genes in our designed synthetic bacterial cell are of unknown function. All we know is that you can’t get life without them. That problem expands tremendously with eukaryotic cells. If you extrapolate to the challenge of interpreting the human genome, we only understand a tiny fraction of the human genome today.