Viewing a .bam file in the console

One thing we do regularly at work is look at aligned sequences of human DNA as generated by what is called “next-generation sequencing”. This data is stored in so-called .bam files, which can get pretty large. For example, the .bam file for an individual whose whole genome is sequenced at 12x coverage is approximately 60 GB.
To view these files, for example to check the alignment or look at the coverage of a specific region, people typically use graphical browsers like IGV or Savant. However, these require you either to run the tool on the server (which means relatively slow X-forwarding over SSH) or to copy the BAM file to your local machine, which also takes a lot of time, especially if you want to look at a single region for a bunch of people.

For jobs like that I’ve found the text-based viewer integrated in SAMtools to be very convenient. Provided the BAM file is sorted and indexed, it’s a matter of running

samtools tview sample.bam /path/to/reference.genome.fasta

after which you get a view like this:

1000821   1000831   1000841   1000851   1000861   1000871   1000881   1000891   1000901
GGCCAGGCAGGGCTTCTGGGTGGAGTTCAAGGTGCATCCTGACCGCTGTCACCTTCAGACTCTGTCCCCTGGGGCTGGGGCAAGTGCCCGATGGGAGCGCA
.....................................................................................................
..........          ......................A.......................T...............G........A........C
...........                                     .....................................................
............                                           ..............................................
..........................................................C...........      .......................A.
...................................................................................        ..........
                                                                                           ..........

Pressing g and then entering 1:23000000 jumps to that position on the given chromosome.
If 1:23000000 doesn’t work, check the header of the BAM file to see how the chromosomes are named (sometimes it is chr1:23000000, for example):

samtools view -H sample.bam
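
The header can be long, so it may help to pull out just the chromosome names. A minimal sketch: list_contigs is a helper name I made up for this post, and it assumes the header has the usual tab-separated @SQ lines with an SN: field.

```shell
# Print the reference sequence names (the SN: field of each @SQ line)
# from a SAM/BAM header read on standard input.
# list_contigs is a hypothetical helper, not part of samtools.
list_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Usage (needs samtools and a real BAM file):
#   samtools view -H sample.bam | list_contigs | head
```

If your samtools version supports the -p option of tview, you can also start at a position directly, e.g. samtools tview -p chr1:23000000 sample.bam /path/to/reference.genome.fasta, which skips the g step.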

In the above example the dots indicate nucleotides that are identical to the reference (shown in the second line); letters indicate reads in which a different base was read. In this example all of them are probably sequencing or alignment errors, because only one discordant read is observed at any position. If you find a column in which most or all reads show the same letter, that position most likely does differ from the reference. Also notice how the various reads are aligned and that in this case the coverage doesn’t seem to be very high.
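
To put a number on that coverage impression, samtools depth prints one line per position (chromosome, position, depth), and those depths can be averaged. A rough sketch, where mean_depth is a hypothetical helper of mine and the region is simply the window shown above:

```shell
# Average the third column (depth) of samtools depth output read on
# standard input; prints NA when there is no input.
# mean_depth is a hypothetical helper, not part of samtools.
mean_depth() {
    awk '{ sum += $3; n++ }
         END { if (n > 0) printf "%.1f\n", sum / n; else print "NA" }'
}

# Usage (needs samtools and an indexed BAM file; -a, where available,
# also reports positions with zero depth so they count in the average):
#   samtools depth -a -r 1:1000821-1000921 sample.bam | mean_depth
```

Without -a, positions that have no reads at all are left out of the output, so the average only covers positions with at least one read.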
