101 questions with a bioinformatician #38: Gene Myers

This post is part of a series that interviews some notable bioinformaticians to get their views on various aspects of bioinformatics research. Hopefully these answers will prove useful to others in the field, especially to those who are just starting their bioinformatics careers.


Gene Myers is a Director at the Max-Planck Institute for Molecular Cell Biology and Genetics (MPI-CBG) and the Klaus Tschira Chair of the Center for Systems Biology Dresden (CSBD).

Maybe you've heard of Gene for his pivotal role in developing the Celera genome assembler, which led to genome assemblies for mouse, human, and Drosophila (the first whole-genome shotgun assembly of a multicellular organism). You may also know Gene from his work in helping develop a fairly obscure bioinformatics tool that no-one uses (just the 58,000 citations in Google Scholar).

His current research focuses on developing new methods for microscopy and image analysis; from his research page:

"The overarching goal of our group is to build optical devices, collect molecular reagents, and develop analysis software to monitor in as much detail as possible the concentration and localization of proteins, transcripts, and other entities of interest within a developing cohort of cells for the purpose of [developing] a biophysical understanding of development at the level of cell communication and force generation."

You can find out more about Gene by visiting his research page on the MPI-CBG website or by following him on Twitter (@TheGeneMyers). Finally, if you are interested in genome assembly then you may also want to check out his dazzlerblog ('The Dresden AZZembLER for long read DNA projects'). And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

The underlying technology is always changing and presenting new challenges, and the field is still evolving and becoming more "sophisticated". That is, there are still cool unsolved problems to explore despite the fact that some core aspects of the field, now in its middle-age in my view, are "overworked".



010. What's something that you don't enjoy about current bioinformatics research?

I'm really bored with networks and -omics. Stamp collecting large parts lists seems to have become the norm despite the fact that it rarely leads to much mechanistic insight. Without an understanding of spatial organization and soft-matter physics, most important biological phenomena cannot be explained (e.g. AP axis orientation at the outset of worm embryogenesis).

Additionally, I was disgusted with the short-read DNA sequencers that, while cheap, produce truly miserable reconstructions of novel genomes. Good only for resequencing and digital gene expression/transcriptomics. Thank God for the recent emergence of the long-read machines.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

At age 18 it's not so much about career specifics but about one's general approach to education. For myself, I would have said, "go to class, knucklehead, and learn something from all the great researchers who are your teachers (instead of hanging out in your dorm room reading textbooks)". As general advice to everyone at that stage, I would say: learn mathematics and programming now while your mind is young and supple; you can acquire a large corpus of knowledge about biological processes later.



100. What's your all-time favorite piece of bioinformatics software, and why?

I don't use bioinformatics software, I make it :-) My favorite problem, not yet fully solved in my opinion, is DNA sequence assembly -- it is a combinatorially very rich string problem.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

N — as it encompasses all the rest :-)

101 questions with a bioinformatician #37: Keith Robison

Keith Robison is a Senior Bioinformatics Scientist at a small biotechnology company based in Cambridge, Massachusetts. His employer has an interest in the natural products drug discovery space and as Keith puts it, his own work concerns 'Assembling and analyzing actinomycete genomes to reveal their biosynthetic moxie'.

If you didn't already know — and shame on you if that is the case — Keith writes about developments in sequencing technologies (and other topics) on his Omics! Omics! blog. This is required reading for anyone interested in trying to understand the significance of the regular announcements made by various companies that develop sequencing technologies. In particular, his analysis of news coming out from the annual AGBT conference is typically detailed and insightful.

You can find out more about Keith by reading his aforementioned blog or by following him on Twitter (@OmicsOmicsBlog). A special thanks to Keith for waiting patiently for me to get this interview posted! And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

All sorts of re-thinking how to do things — productive ways to look at old problems. Look at all the exciting improvements in assembly coming from long reads, or alignment-free RNA-Seq and metagenomics. Cool stuff.



010. What's something that you don't enjoy about current bioinformatics research?

Too many papers that report a new program without adequate benchmarking or a clear description of what differentiates the program — is it really different, or just old wine in new bottles?



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Wow. I didn't dabble in bioinformatics until I was 19. I think my advice would be to try out a new programming language every other year — I've gotten a lot of mileage out of a few languages, but even learning a new one (that I subsequently drop) productively influences my programming.



100. What's your all-time favorite piece of bioinformatics software, and why?

My favorite bioinformatics software was the original WWW interface to FlyBase — first: because I wrote it as a lark, second: FlyBase paid me to support it after I showed it off, and third: because it's one of the few programs of mine that ever had an explicit sunset!



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

M — Methionine is good at getting things started (KRB: yes I know, Methionine is not an IUPAC nucleotide character…but that was the given answer to the question).

101 questions with a bioinformatician #36: Alicia Oshlack

Alicia Oshlack is the Head of Bioinformatics at the Murdoch Childrens Research Institute (they don't like apostrophes) in Melbourne, Australia. Her research focuses on four main project areas: methods for analysing RNA-seq data, epigenomics, clinical genomics data analysis, and cancer genomics.

Before moving into the field of genomics, Alicia had a background in astronomy and her Ph.D. work concerned the structure of radio quasars. Not many bioinformaticians can claim to have published papers on the topic of estimating the mass of black holes!

You can find out more about Alicia by reading her Wikipedia page or by following her on Twitter (@AliciaOshlack). I also encourage you to check out her must-read article for fellow computational biologists: A 10-step guide to party conversation for bioinformaticians. And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

I love the pace at which things are changing in the field. There is always something new to work on and there are so many ways to contribute something useful to the research community. I also really love the balance between collaborative analysis on really interesting biological problems and doing careful methods development.



010. What's something that you don't enjoy about current bioinformatics research?

I get frustrated that I need to spend so much of my time convincing people that bioinformatics is a real scientific research discipline where we have deep scientific training and use our brains to solve scientific problems. Hopefully I will have convinced everyone in Australia soon.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

I did my PhD in astrophysics, and I often wonder if I would have been better off studying a more relevant subject, but I really appreciate the skills I learnt doing it. Within that path, I would probably tell myself to put a bit more focus on programming and to study statistics instead of applied mathematics.



100. What's your all-time favorite piece of bioinformatics software, and why?

I think limma is amazing. Have you seen the user's guide? I think it's 145 pages long, and although it was originally developed for microarray analysis more than 12 years ago, it has adapted to the sequencing revolution and is used more than ever now. I believe it is the most widely used Bioconductor analysis package ever.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

I think S = G/C because I'm always a little bit biased.

101 questions with a bioinformatician #35: Aaron Darling

Aaron Darling is an Associate Professor at the ithree institute — where capital letters are in short supply? — which is part of UTS (University of Technology Sydney). His research focuses on developing computational and molecular techniques to characterize the hidden world of microbes. He helped develop the Mauve multiple genome alignment tool and continues to work on this and other software tools. Aaron also has a long-standing interest in poop:

Of course this interest is all part of an ongoing research project, one that is seeking to understand the development of the infant gut microbiome.

You can find out more about Aaron by visiting his lab's website, or by following him on Twitter (@koadman). And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

The growing interplay between informatics, molecular biology, and experimental design is very exciting. In the past 10 years many problems that could only have been solved through decades of experimental work have been transformed from experimental problems to data analysis problems. I think this trend will only accelerate as our technology to interface digital computational systems with biological systems continues to improve. And data analysis feeds back to inspire new experimental designs in a feedback loop that's getting ever shorter. As an informatician I find it especially fun to discover new ways of designing the lab work that solves long-standing data analysis problems.



010. What's something that you don't enjoy about current bioinformatics research?

Data wrangling and data mangling. This is almost certainly a cliché by now, but inconsistently implemented file formats are the bane of bioinformatics. This was apparent to me within weeks of starting in the field, as my first assigned task was to write a sequence file format parsing library for the E. coli genome project team. I often wonder why I didn't run as fast as I could in the opposite direction.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Early on I benefited from a nugget of wisdom in Dan Gusfield's sequence analysis book, which emphasized the importance of solving biological data analysis problems that are core to the biology, not to the technology platform used to measure the biology: for example, the general sequence alignment problem vs. short-read alignment. Those are the contributions that are going to matter over the long term. I wish I had also appreciated early on that the elegance and simplicity of the solution, and especially of the code implementing it, matter just as much.



100. What's your all-time favorite piece of bioinformatics software, and why?

Probably BEAST, because I learned so much about phylogenetic models, MCMC, and software design from using it and coding up modules for it.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

H, because as a teenager I always wanted to be a G but in reality was everything but.

101 questions with a bioinformatician #34: Katie Pollard

Katie Pollard is a Senior Investigator at Gladstone Institutes and a Professor in the Department of Epidemiology and Biostatistics at UC San Francisco. She is also a faculty supervisor of a bioinformatics core that provides collaborative support for high-throughput biology across the UCSF campus.

Katie's work involves the development of statistical and computational methods for the analysis of large genomic datasets, with a particular interest in genome evolution and identifying sequences that differ significantly between or within species. Her work on the chimpanzee genome has led to lots of coverage by mainstream media, and if you want to know more about this topic, you should definitely watch the What makes us human? talk that she gave at the California Academy of Sciences (video is online here).

You can find out more about Katie by visiting her lab's website. And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

Growth in new sources of data, such as from citizen science and electronic medical records, as well as emerging technologies, like single cell imaging and genomics platforms.



010. What's something that you don't enjoy about current bioinformatics research?

Computing in the cloud is promising, but it is still too expensive to store massive data for ongoing active compute and too slow to move data into the cloud and out again for each analysis.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Keep taking math classes.



100. What's your all-time favorite piece of bioinformatics software, and why?

The UC Santa Cruz Genome Browser: you cannot underestimate the importance of looking at raw data, and the browser provides a way to visualize a lot of data for every position of the genome. It is easy to check if your assumptions are right or not.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

S for strong.

101 questions with a bioinformatician #33: Sarah Teichmann

Sarah Teichmann is a Group Leader at the European Bioinformatics Institute and a Senior Group Leader at the Wellcome Trust Sanger Institute — the Genome Campus (at Hinxton, UK) is one of those strange places where you can walk 10 meters and become a different (and more senior) person!

Her research focuses on elucidating the principles of protein structure evolution, higher order protein structure and protein folding. She also has a longstanding interest in understanding gene expression regulation. As part of her work, she is involved with developing and maintaining a number of useful bioinformatics resources including the 3D Complex database.

Sarah was a recent recipient of the prestigious European Molecular Biology Organization (EMBO) Gold Award for her use of 'computational and experimental methods to better understand genomes, proteomes and evolution'. She was also recently interviewed by CrossTalk (the blog of Cell Press): The Unstoppable Sarah Teichmann on Programing, Motherhood, and Protein Complex Assembly. I particularly liked Sarah's general advice to junior scientists:

Follow your heart and work on things you are excited about and enjoy. Life is too short—and academic careers too unpredictable—to settle for anything less. Try to work with people who are reasonable and considerate of others, yet driven and focused, and generous in investing time and resource to projects and careers of lab members and colleagues.

You can find out more about Sarah by visiting her group's website. And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

The data deluge! So much and so many kinds of biological data — ranging from all the versions of next-generation sequencing data to protein structures — it is such a gift. As computational biologists, we are in an unprecedented position to make new discoveries by mining this data, and we’re all having a ball.



010. What's something that you don't enjoy about current bioinformatics research?

I’m thinking hard to come up with something. One issue that has always puzzled me is why mainstream journals don’t recognise the value of pure theoretical and computational biology. The prediction of the structure of the double helix was recognised with a Nobel Prize, and celebrated more than the Franklin/Wilkins crystal structure. Predictions are generally given scant notice, and the experimental validation (often years later) is considered the key achievement. This strikes me as incongruous.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Take programming and computer science seriously, and get some formal training in it.



100. What's your all-time favorite piece of bioinformatics software, and why?

R came after my time as a hands-on researcher (I’m more of a 90s Perl girl) but it seems to have revolutionised how quickly people can implement methods and visualise data. I also like the fact that there are now notebook-style ways of documenting whole workflows in R and Python. This can be included as supplementary material in publications and should help in making analyses easily reproducible by others.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

Please can I choose three? A then U then G codes for "go bioinformatics" ☺

101 questions with a bioinformatician #32: Aaron Quinlan

Aaron Quinlan is an Associate Professor of Human Genetics and Biomedical Informatics at the University of Utah and the Associate Director of the USTAR Center for Genetic Discovery.

His research focuses on "developing and applying computational methods towards the understanding of genetic variation in diverse contexts". This work has led to Aaron's involvement in the development of many popular bioinformatics tools, with Bedtools being one of the most well known. I wish he had time to blog more, because then we could all enjoy more writing like this:

Have you ever been incensed by the ridiculous number of chromosome naming and ordering schemes that exist in genomics? If the answer is “no”, then either you are an incredibly patient person, you enjoy unnecessary chaos, or you just haven’t done any detailed analysis of genomics datasets.

You can find out more about Aaron by visiting his lab's website, or by following him on Twitter (@aaronquinlan). And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

I come from a creative family and have always enjoyed building things. There is pure joy in having the power to conceive and apply an algorithmic idea that has the potential to improve our understanding of the biology of the genome and the genetic basis of disease.



010. What's something that you don't enjoy about current bioinformatics research?

The fashion.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Take every math and statistics course possible and read constantly while you still have the time.



100. What's your all-time favorite piece of bioinformatics software, and why?

Without question, PolyBayes (Marth et al., 1999). I came to computational biology as a former software engineer without substantial training in biology. PolyBayes was the first Bayesian method for polymorphism detection and was written by my Ph.D. mentor, Gabor Marth. I spent much of my first year in graduate school dissecting the PolyBayes code (and the ACE file format!!!) to understand the mathematical and data analysis strategies that were required at the time. That learning process has influenced much of the work I have done since.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

N, since I constantly feel as though I am doing everything while also doing nothing.

 

DVD bonus materials


KRB: Because of the relative brevity of this interview, I thought that I would also share a couple of answers that Aaron gave me to some of the questions I also include when asking people to do these interviews (this info sometimes helps me write my introductions):


0111. What is the correct way of describing your current position or title(s)?

  1. Associate Professor of Human Genetics and Biomedical Informatics
  2. Associate Director of the USTAR Center for Genetic Discovery
  3. Sender of the emails and bringer of the donuts.


1001. In 1–2 sentences, describe what your role entails

Basically doing everything I can to not be a bottleneck for the people in my lab.

101 questions with a bioinformatician #31: Morgan Taschuk

Morgan Taschuk is a Senior Manager for Genome Sequence Informatics at the Ontario Institute for Cancer Research (OICR). She manages the production sequence analysis team, which analyses all of the sequence data generated at OICR and produces alignment files, variant calls, QC metrics, and bountiful amounts of other data for OICR researchers and collaborators.

She recently wrote a great blog post regarding the (sometimes contentious) issue of Biologists vs Bioinformaticians. Definitely worth a read. Morgan has also recently started to assemble a Twitter list of Women in Bioinformatics, now up to 179 members. I'm sure she would like to make that list even longer, so please let her know of any omissions.

You can find out more about Morgan by visiting her Modern Model Organism blog, or by following her on Twitter (@morgantaschuk). And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

There's always something more to learn. I've been spending a lot of time with our genomics lab recently, and learning about how lab processes impact our data fascinates me. Bioinformatics skills are usually in demand, so I also get to work with a wide variety of people with different questions and problems and have to stretch my brain to apply myself.



010. What's something that you don't enjoy about current bioinformatics research?

Often people write their own scripts or software instead of looking for something that already exists out there. Not only is it wasted effort for very similar results, it sabotages any attempt to standardize across the field. Open-source software is there for everyone to change and improve. Why not build on a foundation instead of digging the hole yourself?



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Since nobody can tell you what bioinformatics is, it's up to you to define it. I spent a long time fighting with imposter syndrome, not just because I felt inadequate but also because I was called a bioinformatician when I didn't fit the classical model. Nobody fits the classical model these days. Thinking about this question actually inspired me to write a blog post about the difference between bioinformaticians and computational biologists. Judging from the feedback on Twitter and the blog, the problem of defining what a bioinformatician is still really sticks in people's throats.



100. What's your all-time favorite piece of bioinformatics software, and why?

SAMtools. It's an amazing piece of very stable, utilitarian, open source code that forms the backbone of most sequencing pipelines.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

I struggled the most with this question! Y, because 'pyrimidine' is a pretty word and so Y not.

~crickets~

101 questions with a bioinformatician #30: Vince Buffalo

Vince is a second-year graduate student in the lab of Graham Coop at UC Davis. Before that he earned his bioinformatics 'chops' working in other groups on the UC Davis campus as a bioinformatician and statistical programmer.

I came to know Vince when he was working as part of the Genome Center's Bioinformatics Core Facility; I was immediately impressed, not only by his diverse set of computational skills, but by the way he applied those skills. Put simply, Vince does things the right way. He believes that bioinformatics should be a carefully documented, reproducible science. He also sees the strengths and advantages of using core Unix skills to organize and manage bioinformatics pipelines. These skills will provide a more useful, and lasting, toolbox than if you only ever learn how to use the latest and greatest set of published bioinformatics tools.

Impressively, Vince has recently published a book (Bioinformatics Data Skills, published by O'Reilly). I highly encourage people to buy it, and I'm convinced that it will become an indispensable guide for everyone working in this field. In the book's introduction, he neatly states the problem that I alluded to earlier:

Many biologists starting out in bioinformatics tend to equate “learning bioinformatics” with “learning how to run bioinformatics software.” This is an unfortunate and misinformed idea of what bioinformaticians actually do. This is analogous to thinking “learning molecular biology” is just “learning pipetting.” … the approach of this book is to focus on the skills bioinformaticians use to explore and extract meaning from complex, large bioinformatics datasets.

You can find out more about Vince by visiting his 'digital notebook' website at vincebuffalo.org, or by following him on Twitter (@vsbuffalo). And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

Watching bioinformatics grow to tackle exciting evolutionary questions, especially with non-model organisms. While bioinformatics has clearly revolutionized the human genomics field, I think in the next decade we'll see interesting developments in bioinformatics tailored to problems in complex non-model organism genomics.

I love plants and have worked in plant genomics, and I've seen firsthand that it's very hard. Many bioinformatics tools we used were designed to work with human data, not gigantic polyploid genomes. It will be exciting over the next few years to see how reads grow in length, how new algorithms emerge, and how this will enable more non-model research. As a budding evolutionary biologist, I'm hopeful that these bioinformatics advances will fuel more discoveries in neat species that have traditionally been harder to work with.



010. What's something that you don't enjoy about current bioinformatics research?

A large proportion of a bioinformatician's time is spent tackling unnecessary human-made problems: data is poorly organized, file formats are both poorly specified and poorly followed, and software is often poorly documented or isn't robust to different data. These are neither interesting scientific problems nor fun computational problems — these are frustrating social and community issues. No one wants to tackle these problems for that reason, but at some point we'll have to as a community, to avoid wasting our collective time on these annoyances.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Study more mathematics. I fell in love with statistics before I did math because I quickly saw the beauty in using statistics to understand data. Now I'm working backwards and trying to bolster my maths skills and seeing the beauty in other mathematical fields and really enjoying it. Darwin said "mathematics seems to endow one with something like a new sense" — I'd argue that this is especially true in biology.



100. What's your all-time favorite piece of bioinformatics software, and why?

It's a tie — SAMtools and PSMC. SAMtools is an amazing piece of engineering — from an algorithmic perspective, from a usability perspective, and from a community perspective. If you dig inside the source, everything is so cleverly written and carefully optimized (e.g. the klib library). I've learned a lot of C tricks from reading Heng Li's code.

SAMtools is also extremely well designed from the user perspective — it adopts the Unix philosophy and its subcommand interface is much like Git's. However, SAMtools is not a perfect program; there have been numerous bugs found over the years and some folks attack it for this. But these bugs are quickly patched thanks to active development and an excellent community. I don't work on SAMtools (other than one tiny bug fix) but I enjoy following along via GitHub and reading and learning from the source.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

S — and it's a simple puzzle why this is the letter I chose.

101 questions with a bioinformatician #29: Jane Loveland

Jane Loveland is a Senior Computer Biologist at The Wellcome Trust Sanger Institute where she is involved in a number of key projects relating to genome annotation and training.

As a manager in the HAVANA group (Human and Vertebrate Analysis and Annotation), she helps oversee the valuable work of using manual annotation to provide a reference gene set for the human, mouse, and zebrafish genomes. HAVANA's annotation is made publicly available via the Vega genome browser, and it is in turn merged with the annotation in Ensembl to produce the reference GENCODE gene set.

Jane also leads a team of instructors for Wellcome Trust Advanced Courses which teach workshops all over the world, in particular the Open Door Workshops:

The Open Door Workshop provides an introduction to bioinformatics tools freely available on the internet, focussing primarily on the Human Genome data. The workshops provide hands-on training in the use of public databases and web-based sequence analysis tools, and are taught by experienced instructors.

And now, on to the 101 questions...



001. What's something that you enjoy about current bioinformatics research?

The speed of change. From an annotation view point, we are constantly having to find ways to use new data sources which in turn adds value to the annotation that we produce.

When I’m putting together a manual for a workshop I have to update everything, every time. I have come into bioinformatics from wet lab biochemistry/molecular biology and I once spent an entire week hand-crafting a multiple alignment figure for my thesis. I can do this in a few minutes now.



010. What's something that you don't enjoy about current bioinformatics research?

Everyone assumes that all genome sequences are 'finished' (KRB: I don't!). They may be sequenced but the quality is often pretty poor compared to the sequence that we were producing at the Sanger Institute about a decade ago.

You can’t interpret what’s going on in a genome if the underlying reference sequence is of poor quality. I do a lot of teaching and spend a lot of time explaining this to researchers.



011. If you could go back in time and visit yourself as an 18-year-old, what single piece of advice would you give yourself to help your future bioinformatics career?

Just go for it. Bit of a cliché I know. I had a crippling lack of confidence when I was younger which I think really held me back.



100. What's your all-time favorite piece of bioinformatics software, and why?

For annotation: Blixem. This is an interactive graphical BLAST viewer — old but essential for gene annotation. Means that I can view alignments to the genome at base pair level really quickly and simply.

For workshops: Ensembl. You have to be able to browse a genome.



101. IUPAC describes a set of 18 single-character nucleotide codes that can represent a DNA base: which one best reflects your personality, and why?

Can I have I for inosine? Reminds me of making degenerate primers for PCR. It's a multi-tasker, which is also how I see myself. It's not on the list though (KRB: everyone keeps breaking the rules!).