Assembly Reconciliation Tools

Data sets

GAGE data set

We used the assemblies produced by the Genome Assembly Gold-Standard Evaluations (GAGE) study, which evaluated multiple assemblers on its own data sets. The assemblies are publicly available from the results tab of the GAGE website; the reads are also available.

Synthetic data set

We used the Saccharomyces cerevisiae strain S288c reference genome (accession GCF_000146045.2) and introduced certain types of misassemblies. To produce synthetic reads, we used ART, a next-generation sequencing read simulator. Single-end reads were generated with the following command:


    ${PATH_TO}/art_bin_VanillaIceCream/art_illumina -sam -i ${PATH_TO}/Saccharomyces_cerevisiae.fasta -l 51 -f 15 -o yeast
    

Paired-end reads were generated with this command:


    ${PATH_TO}/art_bin_VanillaIceCream/art_illumina -sam -i ${PATH_TO}/Saccharomyces_cerevisiae.fasta -l 51 -f 15 -p -m 450 -s 10 -o yeast
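
In the commands above, `-l 51` sets the read length and `-f 15` the fold coverage, so the number of simulated reads should be roughly fold coverage times genome length divided by read length. The sketch below estimates that figure from a FASTA file as a sanity check on the output. The helper names `genome_length` and `expected_reads` are hypothetical and not part of ART, and the result is only an approximation, since ART samples reads per sequence and the exact count can differ slightly.

```shell
#!/bin/sh
# Hypothetical helper: count the bases in a FASTA file
# (all non-header lines, with whitespace and newlines removed).
genome_length() {
    grep -v '^>' "$1" | tr -d '[:space:]' | wc -c | tr -d ' '
}

# Hypothetical helper: approximate read count for a given coverage.
# reads ~= fold * genome_length / read_length (integer division).
expected_reads() {
    fold=$1
    read_len=$2
    genome_len=$3
    echo $(( fold * genome_len / read_len ))
}

# Example with the parameters used above (-f 15, -l 51); the FASTA path
# is whatever was passed to -i:
# expected_reads 15 51 "$(genome_length "${PATH_TO}/Saccharomyces_cerevisiae.fasta")"
```

Comparing this estimate with the number of records in the generated FASTQ file is a quick way to confirm that the simulation ran with the intended coverage.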
	

Synthetic assemblies were generated using RSVSim, a Bioconductor package that simulates structural variations. The flawed synthetic assemblies we used were generated by the following R script.

References:

  1. Salzberg, S. L. et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 22, 557–567 (2012).
  2. Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2012).
  3. Bartenhagen, C. RSVSim: an R/Bioconductor package for the simulation of structural variations. R package version 1.14.0 (2015).