Assembly Reconciliation Tools

Results

Quality Statistics

To obtain quality statistics for the resulting assemblies, we ran QUAST 2.3 with the following command:


    python ${PATH_TO}/quast-2.3/quast.py  -o $output_dir  -L --est-ref-size $genome_size --gage $result_assembly -R $reference
    

Gene Coverage

To estimate gene coverage, first create a BLAST database from the output assembly, then align your genes against that database. This can be done with the following BLAST commands:


    makeblastdb -in $Assembly -out databaseBLAST -dbtype nucl -parse_seqids
    blastn -query $gene_list_file -out output.blast.txt -db databaseBLAST -num_threads $N
    

To obtain the gene sequences for each species, we used the reference genomes and their corresponding annotations.

You may download our gene list files. A Perl script that calculates the total percentage of gene coverage is available from the GitHub repository.
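
As a rough illustration of how a total gene coverage percentage could be derived from the BLAST results, here is a minimal Python sketch. It is not the Perl script mentioned above, and it rests on several assumptions: that `blastn` was run with tabular output (`-outfmt 6`, which the command above does not set), that coverage means the share of gene bases covered by the union of all hits per gene, and that no identity or e-value filter is applied. The function names `read_fasta_lengths`, `covered_bases`, and `gene_coverage_percent` are hypothetical.

```python
from collections import defaultdict


def read_fasta_lengths(path):
    """Return {sequence_id: length} for every record in a FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def covered_bases(intervals):
    """Count bases covered by the union of 1-based, inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent hit: extend the current interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def gene_coverage_percent(gene_lengths, blast_lines):
    """Percentage of all gene bases covered by at least one BLAST hit.

    Assumes the standard 12-column tabular layout (-outfmt 6), where
    column 1 is qseqid and columns 7 and 8 are qstart and qend.
    """
    hits = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        qstart, qend = int(cols[6]), int(cols[7])
        hits[cols[0]].append((min(qstart, qend), max(qstart, qend)))

    total = sum(gene_lengths.values())
    if total == 0:
        return 0.0
    covered = sum(
        min(covered_bases(hits[gene]), length)
        for gene, length in gene_lengths.items()
        if gene in hits
    )
    return 100.0 * covered / total
```

Under these assumptions, the gene lengths would come from `read_fasta_lengths` applied to the gene list file, and the open tabular BLAST output file would be passed as `blast_lines`. A stricter definition (for example, counting a gene as covered only above an identity threshold) would need an additional filter on the hits.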

GAGE Statistics - No Reference

For Bombus_impatiens, we used the E-size statistic provided by GAGE. The script can be downloaded from the GAGE website. We ran it with the following command:


    java GetFastaStats -o -min 500 -genomeSize <Genome Expected Size> $result_assembly
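
For readers unfamiliar with the E-size statistic, the following Python sketch shows how it is defined in GAGE: the sum of squared contig (or scaffold) lengths divided by the genome size, which is the expected length of the sequence containing a randomly chosen base. This is an illustration only, not the GAGE `GetFastaStats` implementation. The function names are hypothetical, and the treatment of the minimum length (sequences of at least 500 bp are kept) is an assumption about what `-min 500` does.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file, in file order."""
    lengths = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                lengths.append(0)
            elif lengths:
                lengths[-1] += len(line)
    return lengths


def e_size(contig_lengths, genome_size, min_length=500):
    """E-size = sum(L_c ** 2) / G over sequences of length >= min_length.

    genome_size (G) is the expected genome size, as passed to -genomeSize.
    The >= comparison for min_length is an assumption of this sketch.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    kept = [length for length in contig_lengths if length >= min_length]
    return sum(length * length for length in kept) / genome_size
```

Because the denominator is the expected genome size rather than the assembly size, the value stays comparable across assemblies of the same genome even when no reference sequence is available.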
    

Synthetic Data Statistics

To assess the correctness of the merged assemblies, we aligned the flawed synthetic inputs and each of the resulting assemblies to the reference, and visualized the alignments as colored bar plots. The pairwise alignments and visualizations were generated by an R script using DECIPHER, an R Bioconductor package.

References:

  1. Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
  2. Wright, E. S. Using DECIPHER v2.0 to analyze big biological sequence data in R. The R Journal 8(1), 352–359 (2016).