A new tool to boost the N50 length of your genome assembly

We all know that the most important aspect of any genome assembly is the N50 length of its contigs or scaffolds. Higher N50 lengths are clearly correlated with increases in assembly quality and any good bioinformatician should be looking to maximize the N50 length of any assembly they are making.

I am therefore pleased to announce today the release of a new software tool, N50 Booster!!!, that can help you increase the N50 length of an existing assembly. This tool was written in C for maximum computational efficiency and then reverse-engineered into Perl for maximum obfuscation.

This powerful software is available as a Perl script (n50_booster.pl) that can be downloaded from our lab's website. The only requirement for this script is the FAlite.pm Perl module (also available from our lab's website).

Before I explain how this script works to boost an assembly's N50 length, I will show a real-world example. I ran the script on release WS230 of the Caenorhabditis japonica genome assembly:

$ n50_booster.pl c_japonica.WS230.genomic.fa

Before:
==============
Total assembly size = 166256191 bp
N50 length = 94149 bp

Boosting N50...please wait

After:
==============
Total assembly size = 166256191 bp
N50 length = 104766 bp

Improvement in N50 length = 10617 bp

See file c_japonica.WS230.genomic.fa.n50 for your new (and improved) assembly

As you can see, N50 Booster!!! not only makes a substantial increase to the N50 length of the C. japonica assembly, it does so while preserving the assembly size. No other post-assembly manipulation tool boasts this feature!
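For anyone who wants to check the 'Before' and 'After' numbers on their own assembly, here is a minimal sketch of the standard N50 calculation. It is written in Python rather than Perl, it is not taken from n50_booster.pl, and the function name `n50` and the toy lengths are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all bases in the assembly.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths, invented for illustration (not from the C. japonica assembly).
toy = [80, 70, 50, 40, 30, 20, 10]
print(sum(toy), n50(toy))  # prints "300 70"
```

Note that the calculation depends only on the list of sequence lengths, not on what the sequences actually contain.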

The n50_booster.pl script works by creating a new FASTA file based on the original, named with an added .n50 suffix, and ensuring that the new file has an increased N50 length. The exact mechanism by which N50 Booster!!! works will be evident from an inspection of the code.
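If you would rather not read Perl, here is a purely hypothetical sketch of one way an N50 could be raised while the total size stays fixed: discard the shortest sequences and pad the longest one by the number of bases removed. This is an assumption for illustration only. It is not claimed to be what n50_booster.pl does, and the names `boost` and `drop_fraction`, along with the toy lengths, are invented.

```python
def n50(lengths):
    """Standard N50: the length at which the running total of the
    longest-first sequence lengths reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def boost(lengths, drop_fraction=0.25):
    """Hypothetical booster: drop the shortest sequences and add their
    combined length to the longest one, so the total size is unchanged."""
    ordered = sorted(lengths, reverse=True)
    # Always keep at least one sequence to absorb the removed bases.
    n_drop = min(int(len(ordered) * drop_fraction), len(ordered) - 1)
    if n_drop <= 0:
        return ordered
    kept, dropped = ordered[:-n_drop], ordered[-n_drop:]
    kept[0] += sum(dropped)  # padding, e.g. a run of N characters
    return kept


# Toy lengths, invented for illustration.
toy = [80, 70, 50, 40, 30, 20, 10, 10]
boosted = boost(toy)
print(sum(toy), n50(toy))          # prints "310 50"
print(sum(boosted), n50(boosted))  # prints "310 70"
```

In this toy case the N50 rises from 50 to 70 and the assembly size is untouched, even though no sequence information has been added.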

I am confident that N50 Booster!!! can give your genome assembly a much-needed boost. The resultant increase in N50 length will lead to a far superior assembly, which will increase your chances of a publication in a top-tier journal such as the International Journal of Genome Assembly or even the Journal of International Genome Assembly.

Update (2014-04-08 09.44): I wrote a follow-up post to this one, which goes into more detail about how N50 Booster!!! works and discusses what people could (and should) do to the shortest sequences in their genome assemblies.