Fun with TimeTree, but remember to check the sample size

On Friday, I saw a tweet by Richard Smith-Unna (@blahah404) about TimeTree. TimeTree is a great site and can be really useful, but as always in science, you should be careful with some of its statistics. TimeTree uses published estimates of divergence times to list the 'mean' and 'median' times at which two species diverged.

For some species pairs, e.g. cat and dog, there are lots of published studies, so you can perhaps be more confident about the averages:

Each black dot in this figure represents a date from a separate published study. In the sheep/goat comparison that Richard tweeted about, there are only two data points (6.2 and 8.5 million years ago). However, these are close enough together that the headline mean divergence time of 7.3 million years is a reasonable estimate, even though it rests on just two studies.
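A quick way to keep the sample size in view is to report it alongside the average. The sketch below is a minimal illustration using Python's standard `statistics` module; the `summarise` function is a hypothetical helper, not anything TimeTree provides, and the only data are the two sheep/goat dates quoted above. Note that with just two estimates the median is always identical to the mean, so the 'median' figure adds no extra information here.

```python
from statistics import mean, median


def summarise(dates_mya):
    """Hypothetical helper: return sample size, mean, median and range
    for a list of divergence-date estimates (million years ago)."""
    return {
        "n": len(dates_mya),
        "mean": mean(dates_mya),
        "median": median(dates_mya),
        "min": min(dates_mya),
        "max": max(dates_mya),
    }


# The two published sheep/goat estimates mentioned in the post.
sheep_goat = [6.2, 8.5]
s = summarise(sheep_goat)
print(f"n={s['n']}, mean={s['mean']:.2f}, median={s['median']:.2f}, "
      f"range={s['min']}-{s['max']} Mya")
```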

But then you get to species comparisons such as Caenorhabditis elegans vs Caenorhabditis briggsae:

I would hate it if anyone seriously reported this 'average' figure of 51.3 million years without pointing out that it is an average of 1.0 and 101.5. Always be suspicious of averages when you can't see the spread of the data!
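One way to make that suspicion routine is to flag averages automatically when they rest on few estimates or when the estimates are widely spread. The following is a minimal sketch under stated assumptions: `check_spread` is a hypothetical function, and its thresholds (fewer than 3 estimates, or a range wider than 50% of the mean) are arbitrary choices for illustration, not a standard rule. It uses the 1.0 and 101.5 million year values quoted above.

```python
from statistics import mean, median


def check_spread(dates_mya, max_ratio=0.5, min_n=3):
    """Hypothetical sanity check: warn when an average rests on few
    estimates, or when the estimates span a range that is large
    relative to their mean. Thresholds are arbitrary assumptions."""
    avg = mean(dates_mya)
    lo, hi = min(dates_mya), max(dates_mya)
    spread = hi - lo
    warnings = []
    if len(dates_mya) < min_n:
        warnings.append(f"only {len(dates_mya)} estimate(s)")
    if avg > 0 and spread / avg > max_ratio:
        warnings.append(f"range {lo}-{hi} Mya is {spread / avg:.0%} of the mean")
    return avg, median(dates_mya), warnings


# C. elegans vs C. briggsae: the two dates behind the 51.3 Mya 'average'.
avg, med, warnings = check_spread([1.0, 101.5])
print(f"mean={avg:.2f}, median={med:.2f} Mya")
for w in warnings:
    print("warning:", w)
```

Run on the sheep/goat dates (6.2 and 8.5), only the sample-size warning fires, because their range is about 31% of the mean; for the Caenorhabditis pair, both warnings fire.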

Update (2014-04-07 14.48): Richard Smith-Unna has now looked at the references behind the 1.0 and 101.5 million year dates and spotted errors in how they have been reported in TimeTree. See his comment below. This merits another cautionary warning when using data like this: don't assume that data extracted from papers by humans, robots, or software will always be correct.