Lennart's weblog

Open source, computers, Africa and other more (or less) interesting stuff.


DatABEL v0.9-6 has been published on CRAN

This morning version 0.9-6 of the DatABEL R package was published on CRAN. This is only a minor update that consists of a few small changes and one bug fix. See the official announcement for more information.

DatABEL is an R package that allows users to access files with large matrices (of several gigabytes or more in size) in a fast and efficient manner. The package is mainly used for genome-wide association analyses using e.g. ProbABEL or OmicABEL.

GNU manifesto turns 30

The New Yorker has a nice article about the GNU manifesto, which turned thirty earlier this month.
It summarises what led RMS to publish the manifesto and start the Free Software Foundation and also briefly explains the difference between Free Software and Open Source software.

ProbABEL v0.4.4 released

It was quite a long time in the making and then a bunch of other stuff came in between, but I finally managed to release v0.4.4 of ProbABEL!

ProbABEL is a toolset for doing fast, memory (RAM) efficient genome-wide regression tests.

This is a bugfix release, but a major one for those who use the Cox proportional hazards regression module. Thanks to some of our users on the GenABEL forum, a serious bug leading to way too many NaNs in the output was discovered, fixed and tested. This is one of the best examples of community collaboration I have seen in the GenABEL project.

Another bug fixed in this release is one that caused a failed install on MacOS X and FreeBSD. Again a bug reported on the forum by one of our users. Great work!

Uploads to Debian and the Ubuntu PPA are coming ASAP.

Now, let’s get ready for a new feature release, which will include p-value calculation (a long-standing feature request) and major speed-ups (implemented by former colleague Maarten Kooyman). Time to get to work ;-)!

Using rsync to backup a ZFS file system to a remote Synology Diskstation

Some time ago I moved from using LVM to using ZFS on my home server. This meant I also had to change the backup script I used to make backups on a remote Synology Diskstation. Below is the updated script. I also updated it such that it now needs a single command line argument: the hostname of the Diskstation to back up to (because I now have two Diskstations at different locations). If you want to run this script from cron you should set up key-based SSH login (see also here and here).

#!/bin/bash
#
# This script makes a backup of my home dirs to a Synology DiskStation at
# another location. I use ZFS for my /home, so I make a snapshot first and
# backup from there.
#
# This script requires that the first command line argument is the
# host name of the remote backup server (the Synology NAS). It also
# assumes that the location of the backups is the same on each
# remote backup server.
#
# Time-stamp: <2014-10-27 11:35:39 (L.C. Karssen)>
# This script is licensed under the GNU GPLv3.
 
set -u
 
if [ ${#} -lt 1 ]; then
    echo "ERROR: Please specify a host name as first command line option" 1>&2
    exit 1
fi
 
###############################
# Some settings
###############################
# Options for the remote (Synology) backup destination
DESTHOST=$1
DESTUSER=root
DESTPATH=/volume1/Backups/
DEST=${DESTUSER}@${DESTHOST}:${DESTPATH}
 
# Options for the client (the data to be backed up)
# ZFS options
ZFS_POOL=storage
ZFS_DATASET=home
ZFS_SNAPSHOT=rsync_snapshot
SNAPDIR="/home/.zfs/snapshot/$ZFS_SNAPSHOT"
 
# Backup source path. Don't forget to have trailing / otherwise
# rsync's --delete option won't work
SRC=${SNAPDIR}/
 
# rsync options
OPTIONS="--delete -azvhHSP --numeric-ids --stats"
OPTIONS="$OPTIONS --timeout=60 --delete-excluded"
OPTIONS="$OPTIONS --skip-compress=gz/jpg/mp[34]/7z/bz2/ace/avi/deb/gpg/iso/jpeg/lz/lzma/lzo/mov/ogg/png/rar/CR2/JPG/MOV"
EXCLUSIONS="--exclude lost+found --exclude .thumbnails --exclude .gvfs"
EXCLUSIONS="$EXCLUSIONS --exclude .cache --exclude Cache"
EXCLUSIONS="$EXCLUSIONS --exclude .local/share/Trash"
EXCLUSIONS="$EXCLUSIONS --exclude home/lennart/tmp/Downloads/*.iso"
EXCLUSIONS="$EXCLUSIONS --exclude home/lennart/.recycle"
EXCLUSIONS="$EXCLUSIONS --exclude _dev_dvb_adapter0_Philips_TDA10023_DVB*"
 
 
 
###############################
# The real work
###############################
 
# Create the ZFS snapshot
if [ -d "$SNAPDIR" ]; then
    # If the directory exists, another backup process may be running
    echo "Directory $SNAPDIR already exists! Is another backup still running?" 1>&2
    exit 1
else
    # Let's make snapshots
    zfs snapshot "$ZFS_POOL/$ZFS_DATASET@$ZFS_SNAPSHOT"
fi
 
 
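# Do the actual backup ($OPTIONS and $EXCLUSIONS are deliberately left
# unquoted so that they split into separate arguments)
rsync -e 'ssh' $OPTIONS $EXCLUSIONS "$SRC" "$DEST"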
 
# Remove the ZFS snapshot
if [ -d "$SNAPDIR" ]; then
    zfs destroy "$ZFS_POOL/$ZFS_DATASET@$ZFS_SNAPSHOT"
else
    echo "$SNAPDIR does not exist!" 1>&2
    exit 2
fi
 
exit 0
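
A possible refinement, not part of the original script: because $EXCLUSIONS is expanded unquoted, the shell may expand a pattern such as *.iso itself if a matching path happens to exist relative to the current directory. Keeping the options in bash arrays passes every pattern to rsync literally. The sketch below only prints the resulting command; the helper name build_rsync_args and the host name diskstation.example are made up for illustration, and the exclusion list is shortened.

```bash
#!/bin/bash
# Sketch: keep rsync options in bash arrays so that patterns such as
# '*.iso' reach rsync literally instead of being expanded by the shell.
set -u

# Hypothetical helper: fills the global arrays OPTIONS and EXCLUSIONS.
build_rsync_args() {
    OPTIONS=(--delete -azvhHSP --numeric-ids --stats
             --timeout=60 --delete-excluded)
    EXCLUSIONS=()
    local pattern
    for pattern in 'lost+found' '.thumbnails' '.gvfs' '.cache' 'Cache' \
                   '.local/share/Trash' \
                   'home/lennart/tmp/Downloads/*.iso'; do
        EXCLUSIONS+=(--exclude "$pattern")
    done
}

build_rsync_args

# Placeholder source and destination (the host name is an assumption).
SRC=/home/.zfs/snapshot/rsync_snapshot/
DEST=root@diskstation.example:/volume1/Backups/

# Print the command instead of running it; remove 'echo' for a real run.
echo rsync -e ssh "${OPTIONS[@]}" "${EXCLUSIONS[@]}" "$SRC" "$DEST"
```

Quoting the array expansions as "${OPTIONS[@]}" yields exactly one argument per array element, so no word splitting or globbing takes place.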

Richard Stallman’s Free Software TEDx talk

If you’re interested in a short (14 minute) talk on what Free Software is all about, have a look at Richard Stallman’s TEDx Geneva presentation. It’s an excellent introduction by the master himself!


Note: the video above is only half the size of the original because I wanted it to fit the width of this blog.

ProbABEL v0.4.2 released

During the Christmas holidays I released a new version of ProbABEL (v0.4.2). The official release announcement can be found here. ProbABEL is a toolset that allows running GWAS (Genome-Wide Association Studies) in a fast and efficient manner. It implements regression using the linear, logistic or Cox proportional hazards models.

This version is mostly a bug fix release. The most important user-visible change is that the ‘official’ name of the wrapper script that runs a GWAS over a range of chromosomes is now probabel instead of probabel.pl. This change was prompted by my attempts to get ProbABEL packaged in the Debian Linux repositories. One of the warnings that occurred during the package creation process was a Lintian warning that said that scripts with ‘language extensions’ are not allowed. There are several reasons for that, but the one I found most compelling was that the user shouldn’t have to be concerned with the programming/scripting language we used to write it in. Moreover, being ‘agnostic’ in this matter also allows us to rewrite such a script in a different language later.
Of course, we have left the original name in place (via a symlink) in order not to disrupt any current pipelines. If the user runs the script under the old name, a warning appears that urges him/her to start using the new name and states that the old name will be deprecated in the future.
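
The symlink-plus-warning idea can be sketched in a few lines of bash. This is only an illustration of the technique, not the code of the actual wrapper; the function name warn_if_old_name and the wording of the message are made up.

```bash
#!/bin/bash
# Hypothetical sketch: warn when a script is started under its old name.
# Assumed setup (not run here): ln -s probabel probabel.pl

OLD_NAME="probabel.pl"
NEW_NAME="probabel"

warn_if_old_name() {
    # $1 is the path the script was invoked with (normally "$0").
    local invoked_as
    invoked_as=$(basename -- "$1")
    if [ "$invoked_as" = "$OLD_NAME" ]; then
        echo "WARNING: '$OLD_NAME' is the old name of this script;" \
             "please start using '$NEW_NAME'." 1>&2
    fi
}

warn_if_old_name "$0"
# ... the rest of the wrapper would follow here ...
```

Because the check looks only at the name the script was invoked with, both names can point to the same file and existing pipelines keep working.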

In the meantime, ProbABEL v0.4.1 has been accepted in Debian (unstable) and as of today it is also available in Debian ‘testing’. Many thanks to the Debian Med team, who helped me a lot in preparing the .deb package. Note that the package has been split into probabel (architecture-dependent files) and probabel-examples (with architecture-independent files: the examples). See the Debian Package Tracking System page for ProbABEL for more details of the package.

From Debian the package has trickled down to Ubuntu as well (Launchpad page here), so it will be available by default in the next Ubuntu release (14.04, a.k.a. Trusty Tahr).

ProbABEL v0.4.1 released

Last week I released v0.4.1 of ProbABEL, just a few days after releasing v0.4.0, which contained a small, but irritating bug.

This release took quite some time to create, but features quite a few bug fixes, including a major one: for the first time since the filevector format was introduced sometime in 2009/2010, the Cox proportional hazards regression module works with filevector/DatABEL files. This is a major step forward, because up till now we had to maintain two branches of code: trunk, with all the regular updates and improvements, and the old branch that contained the Cox PH module that was only capable of reading text files.

Another notable change is the incorporation of [latex]\chi^2[/latex] values in the output files. At the moment these are based on the LRT (likelihood ratio test), except where that doesn’t make sense (e.g. when using the --mmscore option). The implementation was relatively easy, because part of the code was still there from previous versions; it was disabled, however, because it didn’t deal with missing genotype data. Now it does. Using the LRT is also easier in the case of the 2df (or genotypic) genetic model, where using the Wald test is not straightforward.
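
For reference, this is the textbook form of the likelihood ratio statistic, given as general background rather than as a transcript of ProbABEL's code. The symbols are introduced here for illustration: \ell_0 is the maximised log-likelihood of the model without the SNP term(s), \ell_1 that of the model with them, and k the number of extra parameters.

```latex
\chi^2_{\mathrm{LRT}} = 2\,\bigl(\ell_1 - \ell_0\bigr),
\qquad
\chi^2_{\mathrm{LRT}} \;\overset{\text{approx.}}{\sim}\; \chi^2_k
\quad \text{under the null hypothesis}
```

For a model with a single SNP coefficient k = 1, and for the 2df (genotypic) model k = 2. That is why the LRT handles the 2df case without extra work: both genotype coefficients are tested jointly by a single statistic.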

The third user-visible change was a change in the [code]probabel.pl[/code] script that hides some of the details (e.g. the location of the files with genotype data) of running a regression from the user. Previously, using the -o option meant that the output file name was constructed from the name of the phenotype file, the argument of the -o option and a fixed extension that depends on the model(s) being run. Starting with v0.4.0 this behaviour has changed. If the -o option is specified, its argument is used as the start of the output file name, with only the fixed extension appended to it. This allows users to write output to a different directory than the one where the phenotype file is located.

Packages for Ubuntu Linux (or one of its derivatives and probably also Debian) can be found in the GenABEL PPA (personal package archive). Previously we also released pre-compiled Windows binaries, but I’ve stopped doing that. They were never tested, and I don’t think there is much demand for them: most people who do genome-wide association studies use Linux servers anyway.

Development of ProbABEL (and other members of the GenABEL suite) takes place on this R-forge page. If you are in search of an open source project to contribute to, feel free to contact us!

User support for the GenABEL suite can be found at our forum.

ProbABEL v0.3.0 released

On New Year’s day I released version 0.3.0 of ProbABEL, almost two months after the previous release.

This update contains a few small bug fixes, but the most important feature of this new release is that thanks to the work of Maarten Kooyman we have a four- to five-fold speed increase for the types of GWAS we run at work. In his e-mail to the GenABEL developers list he explains what he did to achieve this. The take-home message is that you should always look for a suitable library for the important tasks of any program you write. The old ProbABEL was based on a self-written matrix class that handled things like matrix multiplication and matrix subsetting. In the new release we make use of the Eigen C++ template library, maintained and developed by people who know much more about fast implementations of linear algebra than we do.

For those of you running Ubuntu Linux (or one of its derivatives and probably also Debian) I have set up the GenABEL PPA (personal package archive) where you can download and install the ProbABEL .deb package and stay up to date with future updates.
ProbABEL is also available for MS Windows, although we don’t have much experience running it on that platform.

Development of ProbABEL (and other members of the GenABEL suite) takes place on this R-forge page. If you are in search of an open source project to contribute to, feel free to contact us!

User support for the GenABEL suite can be found at our forum.

The Raspberry Pi runs ProbABEL

One of the first things I tried on my Raspberry Pi was to compile ProbABEL and see if it runs. Since the Raspberry Pi has an ARM processor I wasn’t sure whether our code was portable to it. Apparently it is! Compiling ProbABEL (r.1027 from SVN) took 30 minutes (single-threaded of course) compared to 34 seconds on my desktop (4 threads on an Intel Core i3 processor), but hey, it worked :-). Surprisingly it also passed all the checks in make check.

Once I hook up some more storage to the device I will try to run ProbABEL on some real data. It will be interesting to see how much time it takes to run a linear regression on e.g. chromosome 22 of HapMap3 imputed data for a few hundred samples…

Will the Raspberry Pi be the next platform for GWAS ;-)?

ProbABEL 0.2.2 released

On November 7th I released version 0.2.2 of ProbABEL, a set of programs that allow scientists (usually geneticists and epidemiologists) to run Genome-wide association studies (GWAS) in a fast and efficient way, even on machines with low amounts of RAM.

ProbABEL is part of the GenABEL suite, which is a set of open source packages for statistical genomics. Its main developer is Yurii Aulchenko, my former supervisor at the Erasmus Medical Centre.

This update contains a few small bug fixes and an update of the probabel.pl wrapper script that enables the use of chunked imputation output files as input. For more detailed changes, check the announcement.
For those of you running Ubuntu Linux (or one of its derivatives and probably also Debian) I have set up the GenABEL PPA (personal package archive) where you can download and install the ProbABEL .deb package and stay up to date with future updates.
ProbABEL is also available for MS Windows, although we don’t have much experience running it on that platform.

Development of ProbABEL (and other members of the GenABEL suite) takes place on this R-forge page. If you are in search of an open source project to contribute to, feel free to contact us!

User support for the GenABEL suite can be found at our forum.


© 2018 Lennart's weblog
