This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.
Kerstin Howe is a Senior Scientific Manager, leading the Genome Reference Informatics group at the Wellcome Trust Sanger Institute. As part of the Genome Reference Consortium (GRC), Kerstin's group is helping ensure that "the human, mouse and zebrafish reference assemblies are biologically relevant by closing gaps, fixing errors and representing complex variation".
This important work entails generating 'long range' information (sequencing and optical mapping) for a variety of genomes and using that information to provide genome analyses, visualise assembly evaluations, and curate assemblies. You may also wish to check out '101 Questions' interviewee #3 (Deanna Church), another key player in the GRC.
Kerstin is not the first '101 Questions' interviewee I know from my time working on the WormBase project. Unlike interviewee #23, though, I did have the pleasure of sharing an office with Kerstin (or WBPerson3103, as she will forever be known in WormBase circles) during my time at the Sanger Institute. It was after leaving WormBase that she became a big fish (of a little fish) in the vertebrate genomics community. And now, on to the 101 questions...
001. What's something that you enjoy about current bioinformatics research?
I'm currently really excited about the evolving long range technologies, like long read sequencing and optical mapping. There are so many promising technologies out there at the moment; ideally I want to try them all (all!) out. Short read sequencing alone unfortunately didn't help much with many of our remaining genome assembly issues, and now the wait for a solution seems nearly over. I'm really looking forward to eventually being able to get unusual genome regions sorted, especially my favourite, the long arm of zebrafish chromosome 4.
010. What's something that you don't enjoy about current bioinformatics research?
With my Genome Reference Consortium hat on, it's somewhat disappointing that all the work we've put into improving genomes still seems to pass many people by. For example, we have put five years of curation into GRCh38 in an immense effort to create the best reference possible, yet it is taking a long time to be taken up by the community. Of course I know that it's extremely costly for groups to move over and that the tool chain can't really deal with multiple representations of the same genomic region yet. But I hope that with the 1000 Genomes project taking it on now, and with the tool chain developments picking up speed, everyone will be encouraged to follow.
011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?
I am originally a lab biologist and came to bioinformatics rather late. If I had known then what I know now, I could have saved myself from years of spreadsheet acrobatics. So I would advise myself to use a decent OS and learn at least Perl/Python and R early on. And try to find some people who know their way around and can help you get better.
100. What's your all-time favorite piece of bioinformatics software, and why?
I'm always happy if I find another Unix command that makes my life easier. Two shortcuts that really did: Ctrl-A and Ctrl-E to jump to the start/end of a command line rather than watching the cursor slowly moving. Other than that, I wrote a little script to edit FASTA files more than a decade ago. It's not sophisticated in any way and no one but me uses it, but it's literally grown on me. With constant additions and amendments over the years, it's perfectly suited for all my sequence extraction/modification needs.
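For readers wondering what a small sequence extraction helper of this kind might look like, here is a minimal Python sketch. It is not Kerstin's script, which is not shown in this interview; the function names (`read_fasta`, `reverse_complement`, `extract`) and the 1-based inclusive coordinate convention are assumptions made purely for illustration.

```python
from typing import Dict, Iterable

# Complement table for unambiguous bases only; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a dict of {sequence name: sequence}.

    The name is taken as the first whitespace-delimited word after '>'.
    """
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(records: Dict[str, str], name: str, start: int, end: int,
            revcomp: bool = False) -> str:
    """Extract bases start..end (1-based, inclusive) from the named sequence."""
    seq = records[name][start - 1:end]
    return reverse_complement(seq) if revcomp else seq
```

With a file on disk, usage would be along the lines of `records = read_fasta(open("assembly.fa"))` followed by `extract(records, "chr4", 1000, 2000)`, where the file name and sequence name are placeholders.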
101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?