Notes about open source software, computers, other stuff.


Speeding up grep when looking in large files

In my line of work it is not uncommon to have to find out whether a given term is present in a long list. For example, you may need to look up whether a set of, say, 10 SNPs is present in a (possibly annotated) list of the SNPs on a genotyping array (with, for example, 240k SNPs).
My first instinct in such cases is to use grep, and it’s a good instinct that has served me well over the years.

Recently we had a case that involved considerably larger files. We needed to see whether a set of genomic positions was present in a genome-wide list of such positions. Of course we split the files up per chromosome, but it still took ~24 hours per chromosome when using

grep -w -f short_list long_file > results

I was convinced this could be done faster, so I googled a bit and read the grep man page. It turns out that the -F option makes grep treat each search string as a fixed string instead of a (regexp) pattern. This meant an enormous speed improvement: instead of having to wait 24 hours we got the output in under a minute!

I did a quick performance comparison: looking up ten items in a ~415MB file with 247,871 rows and 136 columns took ~2 minutes, 3 seconds without -F and less than a second with the -F option:

$ time grep -w -f shortlist.txt largefile.tsv > out_withoutF
 
real    2m3.181s
user    2m0.780s
sys     0m2.196s
$ time grep -wF -f shortlist.txt largefile.tsv > out_withF
 
real    0m0.568s
user    0m0.500s
sys     0m0.060s
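
One thing to keep in mind: -F changes what the search terms mean, not just how fast they are matched. The sketch below uses two small, made-up files (the names and contents are assumptions, purely for illustration) to show that a dot in a search term matches any character in regexp mode, but only a literal dot with -F.

```bash
#!/bin/bash
# Hypothetical example data, created in a temporary directory.
workdir=$(mktemp -d)
printf 'rs123\nchr1.1000\n' > "$workdir/shortlist.txt"
printf 'rs123\tA\tG\nrs1234\tC\tT\nchr1.1000\tA\tC\nchr1x1000\tG\tT\n' \
    > "$workdir/largefile.tsv"

# Regexp mode: the dot in "chr1.1000" matches any character,
# so the line starting with "chr1x1000" is counted as well.
regex_hits=$(grep -w -c -f "$workdir/shortlist.txt" "$workdir/largefile.tsv")

# Fixed-string mode: the dot only matches a literal dot.
fixed_hits=$(grep -wF -c -f "$workdir/shortlist.txt" "$workdir/largefile.tsv")

echo "regexp: $regex_hits lines, fixed: $fixed_hits lines"
rm -r "$workdir"
```

In both modes -w keeps rs123 from matching rs1234. For plain identifiers and positions the fixed-string behaviour is usually what you want anyway.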


Pairing a device with a Logitech unifying receiver in Linux

My girlfriend’s keyboard and mouse stopped working some time ago. It turned out that her Logitech unifying receiver (a small USB dongle for keyboard and mouse) was a bit broken: it would only work when twisted in a certain way. So, I called Logitech, explained the situation and they offered to send us a replacement for free. Well done Logitech support!

Now, since we both use Linux as our main OS, the question was how to pair the mouse and keyboard with the new receiver. Logitech provides a piece of Windows software, but nothing for Linux. It turns out it’s not that difficult and you can find various little C programmes that do it for you. I tried Travis Reeder’s solution and it worked like a charm on my Ubuntu 12.04 machine.

These are the steps I took.
First I switched off the keyboard and the mouse, then ran the following:

$ git clone https://github.com/treeder/logitech_unifier.git
Cloning into 'logitech_unifier'...
remote: Counting objects: 35, done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 35 (delta 11), reused 33 (delta 9)
Unpacking objects: 100% (35/35), done.
$ cd logitech_unifier/
$ ./autopair.sh 
Logitech Unified Reciever unify binary not compiled, attemping compilation
Logitech Unified Reciever unify binary was successfully compiled
Auto-discovering Logitech Unified Reciever
Logitech Unified Reciever found on /dev/hidraw0!
Turn off the device you wish to pair and then press enter
[sudo] password for lennart: 
The receiver is ready to pair a new device.
Switch your device on to pair it.

I ran the autopair.sh script twice, once for the mouse and once for the keyboard.
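
If you are curious which /dev/hidraw node belongs to the receiver, something along the following lines may help. This is only a sketch and not necessarily how autopair.sh does its auto-discovery: the function name find_logitech_hidraw is made up, and it assumes that the kernel exposes a HID_ID line (bus:vendor:product in hexadecimal) in the uevent file of each hidraw device, with 046d being Logitech's USB vendor ID.

```bash
#!/bin/bash
# List the /dev/hidraw* nodes whose HID vendor ID is Logitech's (046d).
# The sysfs root is a parameter only so that the function can be tried
# on a fake directory tree; find_logitech_hidraw is a hypothetical name.
find_logitech_hidraw() {
    local sysfs_root=${1:-/sys/class/hidraw}
    local uevent node
    for uevent in "$sysfs_root"/hidraw*/device/uevent; do
        # Skip unreadable files (and the unexpanded glob if nothing matches)
        [ -r "$uevent" ] || continue
        if grep -qi '^HID_ID=[0-9A-F]*:0000046D:' "$uevent"; then
            node=${uevent%/device/uevent}
            echo "/dev/${node##*/}"
        fi
    done
}

find_logitech_hidraw
```

Note that other Logitech HID devices (a wired mouse, for example) would show up in this list too, since only the vendor ID is checked.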

Thanks Travis!


Using Plugwise adapters with Linux

Yesterday I received a small package I had ordered: the Plugwise Home Start kit. According to the box it is an energy management and control system. The idea is that you insert a sort of adaptor between a power socket and a device and, using the Plugwise Source software, monitor the power usage of the device. Furthermore, you can use the software to create a schedule that turns the device on and off at specific times.

The package contains the following:

  • a USB adapter (called the Stick)
  • a Circle+, the master adaptor that keeps track of the other devices in the network
  • a Circle, a regular member of the Plugwise network

The Circles communicate to each other using the ZigBee protocol in the 2.4GHz range. According to the documentation, the range of each Circle is about 5m.

Unfortunately the Source software only runs on Windows. Luckily some people have already analysed the protocol and written some software to control the Plugwise devices (see links below).

First steps

Plugging the USB dongle in gives the following output in /var/log/syslog:

Nov 19 12:20:37 barabas kernel: [  182.855742] usb 1-1.6.1.1.3: new full speed USB device number 14 using ehci_hcd
Nov 19 12:20:37 barabas mtp-probe: checking bus 1, device 14: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.1/1-1.6.1.1/1-1.6.1.1.3"
Nov 19 12:20:37 barabas mtp-probe: bus: 1, device: 14 was not an MTP device
Nov 19 12:20:37 barabas kernel: [  183.169370] usbcore: registered new interface driver usbserial
Nov 19 12:20:37 barabas kernel: [  183.169389] USB Serial support registered for generic
Nov 19 12:20:37 barabas kernel: [  183.169431] usbcore: registered new interface driver usbserial_generic
Nov 19 12:20:37 barabas kernel: [  183.169434] usbserial: USB Serial Driver core
Nov 19 12:20:37 barabas kernel: [  183.171310] USB Serial support registered for FTDI USB Serial Device
Nov 19 12:20:37 barabas kernel: [  183.171552] ftdi_sio 1-1.6.1.1.3:1.0: FTDI USB Serial Device converter detected
Nov 19 12:20:37 barabas kernel: [  183.171588] usb 1-1.6.1.1.3: Detected FT232RL
Nov 19 12:20:37 barabas kernel: [  183.171591] usb 1-1.6.1.1.3: Number of endpoints 2
Nov 19 12:20:37 barabas kernel: [  183.171595] usb 1-1.6.1.1.3: Endpoint 1 MaxPacketSize 64
Nov 19 12:20:37 barabas kernel: [  183.171598] usb 1-1.6.1.1.3: Endpoint 2 MaxPacketSize 64
Nov 19 12:20:37 barabas kernel: [  183.171602] usb 1-1.6.1.1.3: Setting MaxPacketSize 64
Nov 19 12:20:37 barabas kernel: [  183.171975] usb 1-1.6.1.1.3: FTDI USB Serial Device converter now attached to ttyUSB0
Nov 19 12:20:37 barabas kernel: [  183.171998] usbcore: registered new interface driver ftdi_sio
Nov 19 12:20:37 barabas kernel: [  183.172002] ftdi_sio: v1.6.0:USB FTDI Serial Converters Driver
Nov 19 12:20:37 barabas modem-manager[901]: <info>  (ttyUSB0) opening serial port...
Nov 19 12:20:49 barabas modem-manager[901]: <info>  (ttyUSB0) closing serial port...
Nov 19 12:20:49 barabas modem-manager[901]: <info>  (ttyUSB0) serial port closed
Nov 19 12:20:49 barabas modem-manager[901]: <info>  (ttyUSB0) opening serial port...
Nov 19 12:20:55 barabas modem-manager[901]: <info>  (ttyUSB0) closing serial port...
Nov 19 12:20:55 barabas modem-manager[901]: <info>  (ttyUSB0) serial port closed

lsusb gives:

Bus 001 Device 014: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC

I couldn’t get the pairing to work under Linux (with the PlugwiseOnLinux scripts), even though I corrected the MAC address in the Python code. I then tried it in Windows, where I also failed at first. After resetting the Circle+ and the Circle (removing them from and reinserting them into the power outlet at 3-second intervals, as mentioned in the FAQ on the Plugwise website) I managed to pair the Circles. Looking back, I think I didn’t wait long enough for the pairing to work under Linux. During the trials in Windows I noticed that pairing can take up to about 5 minutes…

Back in Linux I used python-plugwise (see links below) to turn the Circles on and off, e.g. this is how I turn my Circle+ off (note that I am a member of the dialout group, which is needed to communicate with /dev/ttyUSB0):

$ plugwise_util -d /dev/ttyUSB0 -m 000D6F0000B1C117 -s off
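
A quick way to check the group membership mentioned above is sketched below. The helper name has_group is made up for this example, and the usermod hint assumes a Debian/Ubuntu-style setup where the serial device is owned by the dialout group.

```bash
#!/bin/bash
# Succeed if the space-separated group list in $1 contains the group $2.
# has_group is a hypothetical helper, not part of python-plugwise.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if has_group "$(id -nG)" dialout; then
    echo "OK: $(id -un) is a member of dialout"
else
    echo "Not in dialout; try: sudo usermod -aG dialout $(id -un)"
fi
```

After adding yourself to the group you need to log out and back in before the new membership takes effect.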

This is what I want! The only minor downside of python-plugwise is that it depends on the crcmod Python library, which apparently is not packaged for Debian/Ubuntu. So installing it using the Python setup framework, as mentioned in the README, is necessary.

Reading out the current power usage of my Circle works also:

$ plugwise_util -d /dev/ttyUSB0 -m 000D6F0000B85134 -p
power usage: 2.27W

So, now that it works, what am I going to do with the Plugwise modules? I’m going to use them in my backup scripts to switch the power to my external hard drives.
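
A rough sketch of how that could look in a backup script follows. It rests on a few assumptions: that "-s on" is the counterpart of the "-s off" call shown earlier, that the MAC address is the one of the Circle the drive is plugged into, and that circle_switch and DRY_RUN are names I made up. By default the sketch only prints the commands instead of running them.

```bash
#!/bin/bash
PLUGWISE_DEV=/dev/ttyUSB0
CIRCLE_MAC=000D6F0000B85134   # the Circle used in the example above
DRY_RUN=${DRY_RUN:-1}         # 1: only print commands, 0: really run them

# Switch the Circle on or off ($1 = on|off).
# Assumption: "-s on" mirrors the documented "-s off".
circle_switch() {
    local cmd=(plugwise_util -d "$PLUGWISE_DEV" -m "$CIRCLE_MAC" -s "$1")
    if [ "$DRY_RUN" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

circle_switch on
# ... wait for the drive to spin up, mount it, run the backup, unmount ...
circle_switch off
```

Running it with DRY_RUN=0 in the environment would execute the plugwise_util commands for real.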

Making a .deb

I used checkinstall to make a package of python-plugwise. In a working directory, first check out the source code of python-plugwise using mercurial, as mentioned on the web site:

$ hg clone https://bitbucket.org/hadara/python-plugwise

Then run checkinstall and don’t forget to fill in the details correctly. For example, the package name is ‘python’ by default, which you definitely don’t want, since that would overwrite Ubuntu’s default ‘python’ package. Also make sure that you remove the crcmod python library if you installed python-plugwise before, otherwise it won’t get packaged. The output below shows the final values, after I changed them.

$ sudo checkinstall -D python setup.py install
 
checkinstall 1.6.2, Copyright 2009 Felipe Eduardo Sanchez Diaz Duran
	      This software is released under the GNU GPL.
 
 
 
*****************************************
**** Debian package creation selected ***
*****************************************
 
This package will be built according to these values:
 
0 -  Maintainer: [ lennart@karssen.org ]
1 -  Summary: [ python-plugwise is used to control the Plugwise power switches as well as read out information on power usage. ]
2 -  Name:    [ python-plugwise ]
3 -  Version: [ 0.2-hg-20111120 ]
4 -  Release: [ 1 ]
5 -  License: [ GPL ]
6 -  Group:   [ checkinstall ]
7 -  Architecture: [ amd64 ]
8 -  Source location: [ python-plugwise ]
9 -  Alternate source location: [  ]
10 - Requires: [ python ]
11 - Provides: [ python-plugwise ]
12 - Conflicts: [  ]
13 - Replaces: [  ]
 
Enter a number to change any of them or press ENTER to continue:

You can check the contents of the package to make sure the crcmod files are included using dpkg:

$ dpkg --contents python-plugwise_0.2-hg-20111120-1_amd64.deb

An idea for later: make an SNMP module that calls plugwise_util to get the power usage so that I can monitor the power usage of a device using Cacti.
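
A first building block for that idea could be a small filter that reduces the plugwise_util output to a bare number, which is what a monitoring tool typically wants. The sketch assumes the output format is exactly the "power usage: 2.27W" line shown earlier; parse_power is a made-up name.

```bash
#!/bin/bash
# Read plugwise_util -p output on stdin and print only the number of watts.
# Lines that do not look like "power usage: <number>W" are ignored.
parse_power() {
    sed -n 's/^power usage: \([0-9.]*\)W$/\1/p'
}

# Example with the output line from above instead of a real device:
echo "power usage: 2.27W" | parse_power
```

With real hardware the input would come from something like plugwise_util -d /dev/ttyUSB0 -m 000D6F0000B85134 -p piped into parse_power.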

Links


Embedding album art in FLAC files

I recently wanted to add cover art to my collection of FLAC-encoded audio files, so I wrote the following simple script to automate the process. I group my music in one directory per artist, with a subdirectory for each album. Running the script in such a directory, with the file name of the album art image as its argument, embeds the image in the FLAC/Vorbis tag of every FLAC file found.

#!/bin/bash
#
# This script embeds a given image (usually .jpg) as album art in the
# FLAC files in the present directory (and its subdirectories).
#
# Time-stamp: <2011-07-31 20:43:23 (lennart)>
 
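coverart=$1

# Refuse to run without an image file as argument
if [ -z "$coverart" ]; then
    echo "Usage: $0 <image file>" >&2
    exit 1
fi

find . -name "*.flac" -print0 | xargs -0 metaflac --import-picture-from="$coverart"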
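
For a whole collection, a variation on the same idea is to let each album directory carry its own image. The sketch below assumes every album directory contains a file called cover.jpg (a naming convention I made up for the example); embed_covers and the METAFLAC variable are hypothetical names, the latter only there so the command can be swapped out for a dry run.

```bash
#!/bin/bash
# For every cover.jpg below the given directory, embed it in the FLAC
# files that sit in the same directory.
# Assumption: one cover.jpg per album directory.
embed_covers() {
    local root=${1:-.}
    local cover dir
    find "$root" -type f -name cover.jpg -print0 |
    while IFS= read -r -d '' cover; do
        dir=$(dirname "$cover")
        find "$dir" -maxdepth 1 -type f -name '*.flac' \
            -exec "${METAFLAC:-metaflac}" --import-picture-from="$cover" {} +
    done
}

# Usage (from the top of the music collection):
#   embed_covers .
# Dry run that only prints the arguments metaflac would receive:
#   METAFLAC=echo embed_covers .
```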


Using rsync to backup to a remote Synology Diskstation

An updated version of the script can be found here.

I recently bought a NAS, a Synology DiskStation DS211j and stuffed two 1TB disks in it. I configured the disks to be in RAID 1 (mirrored) in case one of them decides to die. I then brought the NAS to a family member’s house and installed it there. Now she uses it to back up her important files (and as a storage tank for music and videos).

The good thing for me is that I can now make off-site backups of my home directories. I configured the DS211j to accept SSH connections so that I can log into it (as user admin or root). I used the web interface to create a directory for my backups (which appeared to be /volume1/BackupLennart after logging in with SSH).

After making a hole in her firewall that allowed me to connect to the DS211j, I created a backup script in /etc/cron.daily with the following contents:

#!/bin/bash
#
# This script makes a backup of my home dirs to a Synology DiskStation at
# another location. I use LVM for my /home, so I make a snapshot first and
# backup from there.
#
# Time-stamp: <2011-02-06 21:30:14 (lennart)>
 
###############################
# Some settings
###############################
 
# LVM options
VG=raidvg01
LV=home
MNTDIR=/mnt/home_rsync_snapshot/
 
# rsync options
DEST=root@remote-machine.example.com:/volume1/BackupLennart/
SRC=${MNTDIR}/*
OPTIONS="-e ssh --delete --progress -azvhHS --numeric-ids --delete-excluded "
EXCLUSIONS="--exclude lost+found --exclude .thumbnails --exclude .gvfs --exclude .cache --exclude Cache"
 
 
 
###############################
# The real work
###############################
 
# Create the LVM snapshot
if [ -d "$MNTDIR" ]; then
    # If the snapshot directory exists, another backup process may be
    # running
    echo "$MNTDIR already exists! Another backup still running?"
    exit 1
else
    # Let's make snapshots
    mkdir -p "$MNTDIR"
    lvcreate -L5G -s -n "snap$LV" "/dev/$VG/$LV"
    mount "/dev/$VG/snap$LV" "$MNTDIR"
fi
 
 
# Do the actual backup
rsync $OPTIONS $EXCLUSIONS $SRC $DEST
 
# Remove the LVM snapshot
if [ -d "$MNTDIR" ]; then
    umount "/dev/$VG/snap$LV"
    lvremove -f "/dev/$VG/snap$LV"
    rmdir "$MNTDIR"
else
    echo "$MNTDIR does not exist!"
    exit 1
fi

Let’s walk through it: in the first section I configure several variables. Since I use LVM on my server, I can use it to make a snapshot of my /home partition. The LVM volume group I use is called ‘raidvg01’. Within that VG my /home partition resides in a logical volume called ‘home’. The variable MNTDIR is the place where I mount the LVM snapshot of ‘home’.

The rsync options are quite straightforward. Check the rsync man page to find out what they mean. Note that I used the --numeric-ids option because the DS211j doesn’t have the same users as my server and this way all ownerships will still be correct if I ever need to restore from this backup.

In the section called “The real work” I first create the MNTDIR directory. Subsequently I create the LVM snapshot and mount it. After this the rsync backup can be run and finally I unmount the snapshot and remove it, followed by the removal of the MNTDIR.
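
The script uses the existence of MNTDIR as a sign that another backup is still running. A slightly more robust variation on that idea is sketched here: mkdir without -p fails when the directory already exists, so checking and creating happen in a single step, and a trap removes the directory again even if something goes wrong in between. The lock path and the function names are made up for the example; this is not what the script above does.

```bash
#!/bin/bash
# Hypothetical lock directory, used only for this demonstration
lockdir=${TMPDIR:-/tmp}/home_rsync_snapshot.demo.$$

# mkdir (without -p) fails if the directory already exists, so the
# "does it exist?" test and the creation cannot be interleaved with
# another process doing the same.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1"
}

if acquire_lock "$lockdir"; then
    # Clean up on every exit path, including errors
    trap 'release_lock "$lockdir"' EXIT
    echo "Lock acquired, the snapshot and rsync steps would run here"
else
    echo "$lockdir already exists! Another backup still running?"
    exit 1
fi
```

In the real script the trap would also have to unmount and remove the LVM snapshot before removing the directory.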

Since the script is placed in /etc/cron.daily it will be executed every day. Since we use SSH to connect to the remote DS211j I set up SSH key access without a password. This Debian howto will tell you how to set that up.

The only thing missing in this setup is that the backups are not stored in an encrypted form on the remote NAS, but for now this is good enough. I can’t wait until the network bandwidth on both sides of this backup connection gets so fast (and affordable) that I can easily sync my music as well. Right now uploads are so slow that I hardly dare to include it. I know that I shouldn’t complain since the Netherlands has one of the highest broadband penetrations in the world, but, hey, don’t you just always want a little more, just like Oliver Twist?


Script to tunnel RDP connections through stepping stone server using SSH

In order to connect to the servers at work we need to connect through a stepping stone host (via SSH). Since some of the servers are MS Windows machines which can be managed via the Remote Desktop Protocol (RDP), this traffic needs to be tunneled over SSH as well.
I wrote the following bash script to automate setting up the tunnel. It sets some default variables and then looks for an available port between 1234 and 1254 (chosen completely arbitrarily) and uses it for the tunnel. Then it uses the rdesktop program to start the RDP connection.

#!/bin/bash
#
# This script makes an ssh tunnel to a stepping stone server
# and uses it to start an rdesktop connection to the machine 
# given as the first argument of the script. 
#
# (C) L.C. Karssen
# $Id: winremote.sh,v 1.14 2010/02/10 13:03:08 lennart Exp $
#
 
# User-configurable variables
ssh_username=your_steppingstone_username_here
steppingstone=steppingstone.your_company.com
rdesktop_username=your_windows_username_here
rdesktop_domain=your_windows_domain_here
rdesktop_opts="-z -g 1024x768 -a 16"
rdesktop_port=3389 # This is the standard MS rdesktop port
 
 
# For ordinary users it should not be necessary to change anything below this line. 
# Some functions:
usage()
{
    cat <<EOF
Usage:
    $program [options] rdesktop_hostname 
 
Make a remote desktop connection through an SSH tunnel.
 
Options: 
    -h, --help                                   print this help message
    -s, --steppingstone steppingstone_hostname   set other stepping stone host
                                                   (default: $steppingstone)
    -t, --timeout sec                            set timeout (default: 1)
    -v, --verbose                                verbose output
     --version                                   print version
 
Note that some customisations need to be made in the first few lines of this 
script (e.g. user names and other defaults)
EOF
}
 
program=`basename $0`
 
# Command line option parsing. Shift all options 
verbose=
timeout=1
 
while [ $# -gt 0 ]
do 
    case $1 in
	-v | --verbose | -d | --debug ) 
	    verbose=true
	    ;;
	--version )
	    echo '$Revision: 1.14 $'
	    exit 0
	    ;;
	-t | --timeout ) 
	    shift
	    timeout="$1"
	    if [ "$timeout" -lt 1 ]; then
	        timeout=1
	    fi
	    if [ $verbose ]; then
	        echo "Timeout set to $timeout"
	    fi
	    ;;
	-s | --steppingstone ) 
	    shift
	    steppingstone="$1"
	    if [ $verbose ]; then
	        echo "Steppingstone server is $steppingstone"
	    fi
	    ;;
	-h | --help ) 
	    usage
	    exit 0
	    ;;
	-*) 
	    echo "$0: invalid option $1" >&2
	    usage
	    exit 1
	    ;;
	*) 
	    break
	    ;;
    esac
    shift
done
 
# Server name (as seen on the steppingstone) that we want to connect to:
rdesktop_server=$1 
 
################### Config done, let's get to work ########################
 
# Simple usage description
if [ "$rdesktop_server" == "" ]; then
    echo "Error: No rdesktop host given" >&2
    usage
    exit 1
fi
 
# Find a free port on the local machine that we can use to connect through
min_port=1234
max_port=1254
used_ports=(`netstat -tan | awk '{print $4}' | grep 127.0.0.1 | awk -F: '{print $2}' | sort -g`)
if [ $verbose ]; then
    echo "Used ports are: ${used_ports[@]}"
fi
 
# In the next line we first print the $used_ports as an array, but with 
# each port on a single line. This is then piped to an awk script that 
# puts all the values in an array and subsequently walks through all ports 
# from $min_port to $max_port in order to find the first port that is not 
# in the array. This port is printed.
local_port=`printf "%i\n" ${used_ports[@]} | \
    awk -v minp=$min_port -v maxp=$max_port \
    '{ array[$1]=1 } END { for (i=minp; i<=maxp; i++) { if (i in array) continue; else { print i; break } } }'`
if [ "$local_port" == "" ]; then
    echo "Error: No ports free! Exiting..." >&2
    exit 2
fi
if [ $verbose ]; then
    echo "Selected port was: $local_port"
fi
 
# Create tunnel:
if [ $verbose ]; then
    echo "Creating SSH tunnel..."
fi
ssh_opts="-f -N -L"
ssh $ssh_opts $local_port:$rdesktop_server:$rdesktop_port \
    $ssh_username@$steppingstone 
 
# Allow the ssh tunnel to be established
sleep $timeout
 
# Abort if tunnel has not been established
pidof_ssh=`pgrep -f "ssh $ssh_opts $local_port"`
if [ "$pidof_ssh" == "" ]; then
    echo "Error: Timeout while establishing tunnel" >&2
    echo "Exiting..."
    exit 3
fi
 
# Make rdesktop connection
if [ $verbose ]; then
    echo "Opening Remote desktop connection to $rdesktop_server..."
fi
rdesktop $rdesktop_opts -u $rdesktop_username -p - \
    -d $rdesktop_domain localhost:$local_port
 
# Clean up tunnel
if [ $verbose ]; then
    echo "Cleaning up SSH tunnel with pid $pidof_ssh and local port $local_port"
fi
kill $pidof_ssh



© 2025 Lennart's weblog
