Mapping reads to references with PEAR, Bowtie2, and the Samtools idxstats and sort tools

1 2 3 4
1b_EU781828 9359 160810 84317
1b_EU781827 9413 103840 48024
1b_D90208 9413 37970 19835
1b_M58335 9416 36799 19551
1e_KC248194 9422 4518 2222

The above result is from the Samtools idxstats tool. Could you please tell me what the columns mean? Thanks.

My guess: column 1, reference name; column 2, genome size; column 3, mapped reads; column 4, unmapped reads?

Welcome, @huiping

This tool, like many others in Galaxy, is the same one you would use outside of Galaxy. That means the external documentation and usage generally apply here too, especially for the outputs.

How do you know where to find docs and examples?

  1. Internet search for references, publications, specifications.

  2. Scroll down on the tool form to find available references, including link-outs to author resources and publications.

Sometimes an abbreviated Help section is also included as a short reference directly on the tool form. The content usually comes from the original documentation. It is intended as a supplement and usually covers Galaxy-specific details such as data formats. Most of what happens on this forum is connecting all of these resources together.

This is the Help section for the Samtools idxstats tool, which explains the output along with an example. Review the link-outs for full details.

Help

What it does

Runs the samtools idxstats command. It retrieves and prints stats in the index file.

Input is a sorted and indexed BAM file, the output is tabular with four columns (one row per reference sequence plus a final line for unmapped reads):

Column  Description
1       Reference sequence identifier
2       Reference sequence length
3       Number of mapped reads
4       Number of placed but unmapped reads (typically unmapped partners of mapped reads)
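
Outside Galaxy, the same table comes from running samtools directly: the BAM is coordinate-sorted, indexed, and then idxstats reads its counts from that index. The Python sketch below is only an illustration of that order of steps, not the Galaxy wrapper's implementation. It assumes samtools is installed on PATH, and the helper names `idxstats_commands` and `run_idxstats` are hypothetical; it only chains the standard `samtools sort`, `samtools index` and `samtools idxstats` subcommands.

```python
import shutil
import subprocess
from typing import List


def idxstats_commands(input_bam: str, sorted_bam: str) -> List[List[str]]:
    """Hypothetical helper: the samtools calls needed before idxstats can run."""
    return [
        # Coordinate-sort the alignments (idxstats needs a sorted, indexed BAM).
        ["samtools", "sort", "-o", sorted_bam, input_bam],
        # Build the .bai index next to the sorted BAM.
        ["samtools", "index", sorted_bam],
        # Print per-reference rows: name, length, mapped, placed-but-unmapped.
        ["samtools", "idxstats", sorted_bam],
    ]


def run_idxstats(input_bam: str, sorted_bam: str) -> str:
    """Run sort and index, then return the idxstats text (needs samtools on PATH)."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    *prep, stats = idxstats_commands(input_bam, sorted_bam)
    for cmd in prep:
        subprocess.run(cmd, check=True)
    result = subprocess.run(stats, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `print(run_idxstats("bowtie2_output.bam", "bowtie2_output.sorted.bam"))` would print the four-column table for a Bowtie2 BAM; both file names are placeholders.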

Example output from a de novo assembly:

contig_1 170035 98397 0
contig_2 403835 199564 0
contig_3 553102 288189 0
… … … …
contig_603 653 50 0
contig_604 214 6 0
* 0 0 50320
In this example there were 604 contigs, each with one line in the output table, plus the final row (labelled with an asterisk) representing 50320 unmapped reads. In this BAM file, the final column was otherwise zero.
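
To work with this table programmatically (for example, to total columns 3 and 4 across all references), a small parser is enough. The Python sketch below is hypothetical (`parse_idxstats`, `summarize` and `IdxRow` are illustrative names). It splits on whitespace, which is safe because SAM reference names cannot contain spaces, so it accepts both the tab-delimited samtools output and space-separated text pasted from a page, and it handles the `*` row separately as described above.

```python
from typing import Dict, List, NamedTuple, Tuple


class IdxRow(NamedTuple):
    name: str
    length: int
    mapped: int
    placed_unmapped: int


def parse_idxstats(text: str) -> Tuple[List[IdxRow], int]:
    """Split idxstats output into per-reference rows and the '*' unplaced count."""
    rows: List[IdxRow] = []
    unplaced = 0
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split()
        if name == "*":
            # Final row: reads with no reference at all (only column 4 is used).
            unplaced = int(unmapped)
        else:
            rows.append(IdxRow(name, int(length), int(mapped), int(unmapped)))
    return rows, unplaced


def summarize(rows: List[IdxRow], unplaced: int) -> Dict[str, int]:
    """Totals across references, keeping the three read categories apart."""
    return {
        "mapped": sum(r.mapped for r in rows),
        "placed_unmapped": sum(r.placed_unmapped for r in rows),
        "unplaced": unplaced,
        "references_with_hits": sum(1 for r in rows if r.mapped > 0),
    }
```

Feeding it the visible rows of the example above (contig_1 to contig_3, contig_603, contig_604 and the `*` row) gives 586206 mapped reads across 5 references and 50320 unplaced reads.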

The results of samtools idxstats can be visualized with MultiQC.

Peter J.A. Cock (2013), Galaxy wrapper for the samtools idxstats command

Citations

Definition of SAM/BAM format. (n.d.). HTS format specifications

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., & Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079.

Li, H. (2011). Improving SNP discovery by base alignment quality. Bioinformatics, 27(8), 1157–1158.

Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21), 2987–2993.

Danecek. (n.d.). Multiallelic calling model in bcftools (-m). http://samtools.github.io/bcftools/call-m.pdf

Durbin, R. (n.d.). Segregation based metric for variant call QC. http://samtools.github.io/bcftools/rd-SegBias.pdf

Li, H. (n.d.). Mathematical Notes on SAMtools Algorithms. http://www.broadinstitute.org/gatk/media/docs/Samtools.pdf

SAMtools GitHub page. (n.d.). samtools/samtools: Tools (written in C using htslib) for manipulating next-generation sequencing data.

Requirements

  • samtools (Version 1.13)

Thanks, Jennifer.
What do you mean by "unmapped partners of mapped reads"?

Best regards,
Chen

Hi, Jennifer,

Below is the result for the Bowtie2 alignment against multi-FASTA references. What does the number in column 4 mean? Thanks.

|1b_EU781828|9359|160810|84317|
|1b_EU781827|9413|103840|48024|
|1b_D90208|9413|37970|19835|
|1b_M58335|9416|36799|19551|
|1e_KC248194|9422|4518|2222|
|1l_KC248197|9433|2819|731|
|1d_KJ439768|9451|2271|1190|
|1a_AF009606|9646|911|381|
|2m_JF735111|9618|370|158|
|1a_M62321|9401|187|61|
|1a_EF407457|9286|168|64|
|1l_KC248196|9428|13|3|
|1a_HQ850279|9191|10|5|
|2k_AB031663|9488|10|0|
|1a_AJ278830|9610|8|3|
|3i_FJ407092|9433|8|2|
|1n_KJ439781|9447|6|3|
|2k_JX227953|9380|6|1|
|1a_M67463|9416|5|3|
|1i_KJ439772|9421|5|2|
|2j_HM777358|9112|2|1|
|2a_D00944|9589|1|1|
|2u_JF735112|9520|1|0|
|3a_D28917|9454|1|1|
|3e_KJ470618|9458|1|0|
|3i_JX227955|9350|1|0|
|4c_FJ462436|9468|1|1|
|6xe_JX183557|9460|1|1|

|*|0|0|650418|

|1c_AY051292|9440|0|0|
|1c_AY651061|9441|0|0|
|1c_D14853|9487|0|0|
|1g_AM910652|9490|0|0|
|1h_KC248198|9421|0|0|
|1h_KC248199|9420|0|0|
|1j_KJ439773|9413|0|0|
|1k_KJ439774|9446|0|0|
|1l_KC248193|9430|0|0|
|1m_KJ439778|9448|0|0|
|1m_KJ439782|9448|0|0|
|1n_KJ439775|9411|0|0|
|1o_KJ439779|9413|0|0|
|1o_MH885469|9359|0|0|
|2a_AB047639|9678|0|0|
|2a_HQ639944|9392|0|0|
|2b_AB030907|9654|0|0|
|2b_AB661382|9102|0|0|
|2b_AB661388|9102|0|0|
|2b_D10988|9511|0|0|
|2c_D50409|9513|0|0|
|2c_JX227949|9295|0|0|
|2d_JF735114|9508|0|0|
|2e_JF735120|9825|0|0|
|2f_KC844042|9472|0|0|
|2f_KC844050|9470|0|0|
|2i_DQ155561|9370|0|0|
|2j_HM777359|9383|0|0|
|2j_JF735113|9509|0|0|
|2m_JX227967|9383|0|0|
|2q_F666428|9566|0|0|
|2q_F666429|9398|0|0|
|2r_JF735115|9512|0|0|
|2t_KC197238|9616|0|0|
|2v_MW041295|9403|0|0|
|2v_MW041297|9464|0|0|
|2v_MW041298|9419|0|0|
|2v_MW041299|9472|0|0|
|3a_D17763|9456|0|0|
|3a_J714194|9429|0|0|
|3a_X76918|9390|0|0|
|3b_D49374|9444|0|0|
|3b_JQ065709|9383|0|0|
|3d_KJ470619|9458|0|0|
|3g_JF735123|9634|0|0|
|3g_JX227954|9353|0|0|
|3h_JF735121|9579|0|0|
|3h_JF735126|9633|0|0|
|3k_D63821|9450|0|0|
|3k_JF735122|9660|0|0|
|4a_DQ418789|9295|0|0|
|4a_DQ988074|9054|0|0|
|4a_Y11604|9354|0|0|
|4b_FJ462435|9440|0|0|
|4d_DQ418786|9273|0|0|
|4d_EU392172|9299|0|0|
|4d_FJ462437|9088|0|0|
|4f_EF589161|9304|0|0|
|4f_EU392174|9298|0|0|
|4f_EU392175|9297|0|0|
|4g_FJ462432|9435|0|0|
|4g_JX227963|9314|0|0|
|4g_JX227971|9174|0|0|
|4k_EU392171|9281|0|0|
|4k_EU392173|9312|0|0|
|4k_FJ462438|9438|0|0|
|4l_FJ839870|9388|0|0|
|4l_JX227957|9317|0|0|
|4m_FJ462433|9425|0|0|
|4m_JX227972|9305|0|0|
|4n_FJ462441|9435|0|0|
|4n_JX227970|9301|0|0|
|4o_FJ462440|9422|0|0|
|4o_JX227977|9286|0|0|
|4p_FJ462431|9475|0|0|
|4q_FJ462434|9433|0|0|
|4r_FJ462439|9440|0|0|
|4r_JX227976|9319|0|0|
|4s_JF735136|9426|0|0|
|4t_FJ839869|9488|0|0|
|4v_HQ537008|9174|0|0|
|4v_HQ537009|9177|0|0|
|4v_JX227959|9323|0|0|
|4v_JX227960|9270|0|0|
|4w_FJ025855|9242|0|0|
|4w_FJ025856|9271|0|0|
|5a_AF064490|9408|0|0|
|5a_Y13184|9343|0|0|
|6a_AY859526|9355|0|0|
|6a_EU246930|9376|0|0|
|6a_HQ639936|9315|0|0|
|6a_Y12083|9340|0|0|
|6b_D84262|9628|0|0|
|6c_EF424629|9459|0|0|
|6d_D84263|9615|0|0|
|6e_DQ314805|9468|0|0|
|6e_EU246932|9382|0|0|
|6f_DQ835760|9454|0|0|
|6f_EU246936|9370|0|0|
|6g_D63822|9461|0|0|
|6g_DQ314806|9462|0|0|
|6h_D84265|9621|0|0|
|6i_DQ835762|9458|0|0|
|6i_DQ835770|9447|0|0|
|6j_DQ835761|9442|0|0|
|6j_DQ835769|9454|0|0|
|6k_D84264|9601|0|0|
|6l_EF424628|9453|0|0|
|6l_JX183556|9413|0|0|
|6m_DQ835766|9449|0|0|
|6m_DQ835767|9444|0|0|
|6n_DQ278894|9441|0|0|
|6n_DQ835768|9447|0|0|
|6n_EU246938|9361|0|0|
|6o_EF424627|9450|0|0|
|6o_EU246934|9364|0|0|
|6p_EF424626|9453|0|0|
|6q_EF424625|9463|0|0|
|6r_EU408328|9449|0|0|
|6s_EU408329|9473|0|0|
|6t_EF632071|9406|0|0|
|6t_EU246939|9364|0|0|
|6u_EU246940|9367|0|0|
|6v_EU158186|9428|0|0|
|6v_EU798760|9461|0|0|
|6v_EU798761|9429|0|0|
|6w_DQ278892|9448|0|0|
|6w_EU643834|9147|0|0|
|6w_EU643836|9144|0|0|
|6xa_EU408330|9451|0|0|
|6xa_EU408331|9470|0|0|
|6xa_EU408332|9443|0|0|
|6xb_JX183552|9533|0|0|
|6xb_KJ567645|9401|0|0|
|6xc_KJ567651|9442|0|0|
|6xd_KM252789|9432|0|0|
|6xd_KM252790|9435|0|0|
|6xd_KM252791|9434|0|0|
|6xe_KM252792|9401|0|0|
|6xf_KJ567646|9428|0|0|
|6xf_KJ567647|9421|0|0|
|6xg_MH492360|9318|0|0|
|6xg_MH492361|9318|0|0|
|6xg_MH492362|9318|0|0|
|6xh_MG879000|9057|0|0|
|6xi_JX183549|9448|0|0|
|6xi_MZ504973|9361|0|0|
|6xi_MZ504976|9361|0|0|
|6xj_DQ278891|9440|0|0|
|6xj_DQ278893|9430|0|0|
|6xj_MZ171127|9393|0|0|
|6xj_MZ171134|9396|0|0|
|7a_EF108306|9357|0|0|
|7b_KX092342|9382|0|0|
|8a_MH590698|9347|0|0|
|8a_MH590699|9351|0|0|
|8a_MH590700|9547|0|0|
|8a_MH590701|9437|0|0|

To better understand the difference between the number of placed but unmapped reads and the more general count of unmapped reads, you could review the mapping statistics output by Bowtie2.

Both Samtools idxstats and Bowtie2 generate statistics based on the BAM content (flags). The former is a higher-level summary that works on mapped BAM results from most aligners; the latter is a more detailed breakdown with context specific to that tool.
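
On "unmapped partners of mapped reads": in paired-end data, when one mate aligns and the other does not, the SAM specification recommends giving the unaligned mate the same RNAME and POS as its aligned partner while still setting its "segment unmapped" flag bit (0x4). Such a read is placed on a reference without being mapped to it, and idxstats counts it in column 4 of that reference's row; reads with no reference at all (RNAME `*`, typically pairs where neither mate aligned) go to the final `*` row. The Python sketch below illustrates that flag logic on plain SAM text lines. It is a hypothetical helper, not how samtools computes idxstats (samtools reads the counts from the BAM index), and it assumes every record is counted, secondary and supplementary alignments included, so treat its totals as an approximation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

FLAG_UNMAPPED = 0x4  # SAM FLAG bit: "segment unmapped"


def idxstats_like(sam_lines: Iterable[str]) -> Dict[str, List[int]]:
    """Tally [mapped, unmapped] records per RNAME from SAM text lines.

    Hypothetical helper: every record is counted (secondary and
    supplementary included), which only approximates samtools idxstats.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, ...)
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & FLAG_UNMAPPED:
            # Unmapped: "placed" if RNAME names a reference, unplaced if RNAME is '*'.
            counts[rname][1] += 1
        else:
            counts[rname][0] += 1  # aligned to this reference
    return dict(counts)
```

For instance, a pair whose first mate aligns to `1b_EU781828` (flag 73) while its mate does not (flag 133, same RNAME) contributes 1 to column 3 and 1 to column 4 of that reference, whereas a pair where both mates fail (flags 77 and 141, RNAME `*`) adds 2 to the `*` row.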

You could also do some exploration with a tool like Filter BAM and other tools under the SAM/BAM tool group.
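
As a starting point for that exploration, the "unmapped partners of mapped reads" can be picked out by their flags: paired (0x1) and unmapped (0x4) set, mate unmapped (0x8) not set. On the command line, `samtools view -f 5 -F 8 sorted.bam` should select the same records (`-f` requires flag bits, `-F` excludes them; `sorted.bam` is a placeholder name). The Python helpers below are a hypothetical sketch applying the same test to SAM text lines, handy for a quick look before configuring Filter BAM.

```python
from typing import Iterable, Iterator, Tuple

FLAG_PAIRED = 0x1         # template has multiple segments
FLAG_UNMAPPED = 0x4       # this segment is unmapped
FLAG_MATE_UNMAPPED = 0x8  # the next segment (mate) is unmapped


def is_unmapped_partner(flag: int) -> bool:
    """True for a paired read that did not align while its mate did."""
    required = FLAG_PAIRED | FLAG_UNMAPPED
    return (flag & required) == required and not (flag & FLAG_MATE_UNMAPPED)


def unmapped_partners(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, RNAME it was placed on) for each such read."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        if is_unmapped_partner(int(fields[1])):
            yield fields[0], fields[2]
```

Counting these records per RNAME should roughly track column 4 of the idxstats table for each reference, under the same caveats as any flag-based approximation.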