Notes about open source software, computers, other stuff.

Category: Linux (Page 5 of 8)

Viewing a .bam file in the console

One thing we do regularly at work is taking a look at aligned sequences of human DNA as generated by what is called “next-generation sequencing”. This data is stored in so-called .bam files, which can get pretty large. For example, the .bam file for an individual whose whole genome is sequenced at 12x coverage is approximately 60GB.
To view these files, to check the alignment, look at the coverage of a specific region, etc, people typically use graphical browsers like the IGV or Savant. However, these require you to either run the tool on the server (which means relatively slow X-forwarding over SSH) or copying the BAM file to your local machine, which also takes a lot of time, especially if you want to take a look at a single region for a bunch of people.

For jobs like that I’ve found the text-based viewer integrated in SamTools to be very convenient. It’s a matter of running

samtools tview sample.bam /path/to/reference.genome.fasta

after which you get a view like this:

1000821   1000831   1000841   1000851   1000861   1000871   1000881   1000891   1000901
GGCCAGGCAGGGCTTCTGGGTGGAGTTCAAGGTGCATCCTGACCGCTGTCACCTTCAGACTCTGTCCCCTGGGGCTGGGGCAAGTGCCCGATGGGAGCGCA
.....................................................................................................
..........          ......................A.......................T...............G........A........C
...........                                     .....................................................
............                                           ..............................................
..........................................................C...........      .......................A.
...................................................................................        ..........
                                                                                           ..........

Using g followed by 1:23000000 you will jump to the given position on the given chromosome.
If the 1:23000000 doesn’t work, check the header of the BAM file to see how the chromosome is specified (sometimes it is chr1:23000000, for example):

samtools view -H sample.bam

In the above example the dots indicate nucleotides that are identical to the reference (shown in the second line), the positions with letters indicate reads where a different base was read. In this example all of them are probably sequencing or alignment errors because only one discordant read is observed at any position. If you find a column with letters that means this position is indeed different from the reference. Also notice how the various reads are aligned and that in this case the coverage doesn’t seem to be very high.

Related Images:

Setting up (or fixing) an encrypted swap partition

Today I tried to clone my laptop’s harddrive to a new drive (thanks to Lenovo for sending me a replacement since the old drive was showing signs of breaking down). At first I tried dd, but that failed at around 90%, either because the old disk is indeed failing or because something fishy with the USB connection or enclosure in which I put the new disk. So I started gparted to check which partitions were copied OK and which weren’t. It turns out that all partitions were fine, except for my (encrypted) swap partition. gparted didn’t even recognise the partition type (on the original drive!). So after I replaced the harddrive I wanted to recreate the encrypted swap partition. It turn’s out to be easy if you follow the steps outlined in this blog post from Puny Geek. Thanks Puny Geek!

Related Images:

Exit a Bash script if an error occurs

Last week I found out that a Bash script I wrote to do some data QC gave me a false sense of security: a script continues even if one (or more) of the statements in the scripts fails (with an exit status not equal to 0). It turned out that for some of the data sets the QC wasn’t done correctly because I didn’t check the exit status after each step.

My first thought was: oh boy, that means I have to check $? for every step. That means a lot of repetitive code to write! Luckily my colleague came with the answer: add

set -e

at the top of you Bash script and the script will fail if one of its statements fails (for the fine print see the top answer in this is StackOverflow post).

Related Images:

Speeding up grep when looking in large files

In my line of work it is not uncommon to have to find out whether a given term is present in a long list. Say, for example you need to look up whether a set of, say 10, SNPs is present in a (possibly annotated) list of SNPs present on a genotyping array (having for example 240k SNPs).
My first instinct in such cases is to use grep, and it’s a good instinct that has served me well over the years.

Recently we had a case that involved quite some larger files. We needed to see whether a set of genomic positions was present in a genome-wide list of such positions. Of course we split the files up per chromosome, but still this took ~ 24 hours for a chromosome when using

grep -w -f short_list long_file > results

I was convinced this could be done faster and googled a bit, read the grep man page to find out that the -F option of grep ensures that the search string is not seen as a (regexp) pattern, but as fixed. This meant an enormous speed improvement. Instead of having to wait for 24 hours we got the output in under a minute!

I did a quick performance comparison: looking up ten items in a ~415MB file with 247,871 rows and 136 columns took ~2 minutes, 3 seconds with out -F and less than a second with the -F option:

$ time grep -w -f shortlist.txt largefile.tsv > out_withoutF
 
real    2m3.181s
user    2m0.780s
sys     0m2.196s
$ time grep -wF -f shortlist.txt largefile.tsv > out_withF
 
real    0m0.568s
user    0m0.500s
sys     0m0.060s

Related Images:

Fixing the NFS check plugin in Nagios (in Ubuntu)

For some time (probably after an upgrade, I actually don’t remember anymore) we had problems with the NFS check in Nagios on our Ubuntu 12.04 servers. The check would return UNKNOWN: RPC program nfs udp is not running. When running the actual check from the command line:

/usr/lib/nagios/plugins/check_rpc -H '$HOSTADDRESS$' -C nfs -c2,3

the output would be: Can't fork for rpcinfo.
It turns out that the file /usr/lib/nagios/plugins/utils.pm has the wrong path to the rpcinfo binary. Instead of /usr/sbin/rpcinfo it lists /usr/bin/rpcinfo. So, like most of the times, the fix is easy, but pinpointing the exact problem isn’t.

Don’t forget to restart Nagios after changing the path as utils.pm needs to be reloaded.

As Ubuntu is based on Debian, I expect this fix to work there as well. According to this Launchpad bug report this issue was fixed in January in version 1.4.16-1ubuntu1 of the nagios-plugins package, which is not in Ubuntu 12.04.

Related Images:

Importing a git repo into another one (keeping all history)

Today I wanted to create a new git repository that should contain several subdirectories that each were initially stored as separate git repos. Of course I didn’t want to lose the history. Thanks to user ebneter‘s answer at StackOverflow I was able to do so. These are the steps I took:

mkdir new_combined_repo
git init                           # Make empty new 'container' repo (no need to create a subdir at this point yet)
git remote add oldrepo /path/to/oldrepo
git fetch oldrepo
git checkout -b olddir oldrepo/master
mkdir olddir
git mv stuff olddir/stuff          # as necessary
git commit -m "Moved stuff to olddir"
git checkout master                
git merge olddir                   # should add olddir/ to master
git commit
git remote rm oldrepo
git branch -d olddir               # to get rid of the extra branch before pushing
git push                           # if you have a remote, that is

Related Images:

Converting from bzr to git

I’m in the process of moving several of my projects that used Bazaar (bzr) for revision control to Git. Converting a repository from bzr to git is very easy when using the fastimport package. In a Debian-based distribution run the following command to install the package (don’t be fooled by its name, it also contains the fastexport option):

sudo aptitude install bzr-fastimport

The go into the directory that contains your bzr repo and run:

git init
bzr fast-export `pwd` | git fast-import 

You can now check a few things, e.g. running git log to see whether the change log was imported correctly. This is also the moment to move the content of your .bzrignore file to a .gitignore file.

If all is well, let’s clean up:

rm -r .bzr 
git reset HEAD

Thanks to Ron DuPlain for his post here, from which I got most of this info.

Related Images:

Solving “RTNETLINK answers: File exists” when running ifup

On a server with multiple network cards I tried to configure the eth3 interface by editing /etc/network/interfaces (this was an Ubuntu 12.04 machine).

This was the contents of /etc/networking/interfaces:

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
        address xxx.yyy.zzz.mmm
        netmask 255.255.255.0
        gateway xxx.yyy.zzz.1
        dns-nameservers xxx.yyy.zzz.aaa xxx.yyy.zzz.bbb
        dns-search mydomain.nl

auto eth3
iface eth3 inet static
        address 192.168.4.1
        netmask 255.255.255.0
        gateway 192.168.4.1

When I tried to bring the interface up I got an error message:

$ ifup eth3
RTNETLINK answers: File exists
Failed to bring up eth3.

It took me a while to figure it out, but the problem was the gw line in the eth3 entry. Of course you can only have one default gateway in your setup. I missed this because I was also trying to add routes to networks behind the machine on the other end of eth3.
In the end, removing the gw line in the eth3 entry solved the problem.

My final /etc/networking/interfaces looks like this:

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
        address xxx.yyy.zzz.mmm
        netmask 255.255.255.0
        gateway xxx.yyy.zzz.1
        dns-nameservers xxx.yyy.zzz.aaa xxx.yyy.zzz.bbb
        dns-search mydomain.nl

auto eth3
iface eth3 inet static
        address 192.168.4.1
        netmask 255.255.255.0
        post-up /sbin/route add -net 192.168.1.0 netmask 255.255.255.0 gw 192.168.4.250
        post-up /sbin/route add -net 192.168.2.0 netmask 255.255.255.0 gw 192.168.4.250
        post-up /sbin/route add -net 192.168.3.0 netmask 255.255.255.0 gw 192.168.4.250
        post-down /sbin/route del -net 192.168.1.0 netmask 255.255.255.0
        post-down /sbin/route del -net 192.168.2.0 netmask 255.255.255.0
        post-down /sbin/route del -net 192.168.3.0 netmask 255.255.255.0

Update 2013-08-19: Removed network entries as per Ville’s suggestion.

Related Images:

Pairing a device with a Logitech unifying receiver in Linux

My girlfriend’s keyboard and mouse stopped working some time ago. It turned out that her Logitech unifying receiver (a small USB dongle for keyboard and mouse) was a bit broken, only when twisted in a certain way it would work. So, I called Logitech, explained the situation and they offered to send us a replacement for free. Well done Logitech support!

Now, since we both use Linux as our main OS, the question was how to pair the mouse and keyboard with the new receiver. Logitech provides a piece of Windows software, but nothing for Linux. It turns out it’s not that difficult and you can find various little C programmes that do it for you. I tried Travis Reeder’s solution and it worked like a charm on my Ubuntu 12.04 machine.

These are the steps I took.
First I switched off the keybord and the mouse, then ran the following:

$ git clone https://github.com/treeder/logitech_unifier.git
Cloning into 'logitech_unifier'...
remote: Counting objects: 35, done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 35 (delta 11), reused 33 (delta 9)
Unpacking objects: 100% (35/35), done.
$ cd logitech_unifier/
$ ./autopair.sh 
Logitech Unified Reciever unify binary not compiled, attemping compilation
Logitech Unified Reciever unify binary was successfully compiled
Auto-discovering Logitech Unified Reciever
Logitech Unified Reciever found on /dev/hidraw0!
Turn off the device you wish to pair and then press enter
[sudo] password for lennart: 
The receiver is ready to pair a new device.
Switch your device on to pair it.

I ran the autopair.sh script twice, once for the mouse and once for the keyboard.

Thanks Travis!

Related Images:

Using CUA selection mode in Emacs to edit rectangles

Today Planet Emacsen brought me Irreal’s second blog post in a short time on CUA mode in Emacs. So far I’ve always ignored it because as far as I knew CUA mode is about getting the Windows keyboard shortcuts of Ctrl-c, Ctrl-x and Ctrl-v for copying and pasting to Emacs. The thing is, I date back to the DOS era when Shift-Del and Shift-Ins were used for that, so back in my Windows days I never got used to those ‘new’ keyboard shortcut. Now that I’ve been an Emacs user for more than a decade I’m so used to C-w and C-y and I see no reason for having the Windows shortcuts work in Emacs.

Back to Irreal. In his recent blog posts he writes about a subset of cua-mode: cua-selection-mode. The video by Mark Mansour that we writes about says it all (it’s short, so go and watch it!). What cua-selection-mode is all about is rectangle editting. So far I’ve been using the regular Emacs keys for rectangle selection and editing (basically C-space to select a rectangle and C-r-k to cut it, C-r-t to insert text and C-r-y to paste a rectangle). By setting

(cua-selection-mode 1)

in your ~/.emacs file you only enable the rectangle features of CUA mode.

So, for those that didn’t watch the video, what does the rectangle editing mean? It means that you can for example simply insert a list of increasing numbers in a text (this may come in handy in an org-mode table for example), or you can insert the same text in front of and/or behind a selected column of text.

Key combos to remember are:

  • C-return: Start selection
  • return: move the cursor to top-left, top-right, bottom-left and bottom-right corner of the selected rectangle
  • C-?: briefly list the available key combinations (with rectangle selection enabled)
  • M-i: if the selection is a column of numbers increase the numbers (by one)
  • M-n: Insert a number in the column (asks for start value and increment value)
  • C-1 C-w: Kill (cut) the contents of the rectangle to register 1 (you can use number 0–9 for different registers). Using C-1 C-y yanks (pastes) the rectangle at the cursor position.

Related Images:

« Older posts Newer posts »

© 2024 Lennart's weblog

Theme by Anders NorĂ©nUp ↑