As more people use N50 as a metric, fewer genomes seem to be 'completed'

If you search Google Scholar for the term genome contig|scaffold|sequence +"N50 size|length" and then filter by year, you can see that the number of papers mentioning N50 length has increased dramatically in recent years:

 Google Scholar results for papers that mention N50 length. 2000–2013.

I'm sure that my search term doesn't capture all mentions of N50, and it probably includes a few false positives as well. It doesn't appear to be mentioned before 2001 at all, and I think that the 2001 Nature human genome paper may have been the first publication to use this metric.

Obviously, part of this growth simply reflects the fact that more people are sequencing genomes (or at least writing about sequenced genomes), and therefore feel the need to include some form of genome assembly metric. A Google Scholar search for "genome sequence|assembly" shows another pattern of growth, but this time with a notable spurt in 2013:

 Google Scholar results for papers that mention genome sequences or assemblies. 2000–2013.

Okay, so more and more people are sequencing genomes. This is good news, but only if those genomes are actually usable. This led me to my next query: how many people refer to their published genome sequence as complete? To find out, I searched Google Scholar for "complete|completed genome sequence|assembly". Again, this is not a perfect search term, and I'm sure it will miss some descriptions of what people consider to be complete genomes. But at the same time it probably filters out all of the 'draft genomes' that have been published. The results are a little depressing:

 Google Scholar results for papers that mention genome sequences or assemblies vs those that make mention of 'completed' genome sequences or assemblies. 2000–2013.

So although there were nearly 90,000 publications last year that mentioned a genome sequence (or assembly), only about 7,500 papers mentioned the C-word. This is a little easier to visualize if you show the number of 'completed' genome publications as a percentage of the number of publications that mention 'genome sequence' (irrespective of completion status):

 Numbers of publications that mention 'completed' genomes as percentage of those that mention genomes. 2000–2013.
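
For anyone who wants to check the arithmetic behind that percentage, here is a minimal Python sketch. It uses only the two rounded 2013 counts quoted above (roughly 90,000 and roughly 7,500), so the result is approximate; the function and variable names are my own for illustration and are not part of any Google Scholar tooling.

```python
def percent_completed(completed: int, total: int) -> float:
    """Return `completed` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * completed / total


# Rounded 2013 counts quoted in the text (approximations, not exact figures)
genome_papers_2013 = 90_000      # papers mentioning "genome sequence|assembly"
completed_papers_2013 = 7_500    # papers mentioning a "complete|completed" genome

pct_2013 = percent_completed(completed_papers_2013, genome_papers_2013)
print(f"2013: about {pct_2013:.1f}% of genome papers mention a 'completed' genome")
```

With those rounded inputs, the figure for 2013 comes out at a little over 8%. Repeating the same calculation for each year's pair of counts would give the series plotted in the chart.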

Maybe journal reviewers are more stringent about not allowing people to use the 'completed' word if the genome isn't really complete (which, depending on your definition of 'complete', may include most genomes)? Or maybe people are just happier these days to sequence something, throw it through an assembler and then publish it, regardless of how incomplete it is?