Tuesday, October 8, 2013

Convert VCF files to PLINK binary format

How many times have you needed to convert a VCF file to PLINK binary format? The 1000 Genomes project has a recommended tool (http://www.1000genomes.org/vcf-ped-converter), but it only works when converting small regions. Another easy-to-use tool is SnpSift (http://snpeff.sourceforge.net/SnpSift.html#vcf2tped).

Overall, it really isn't that complicated to convert a VCF file. The idea is to first convert it to tped format and then let plink do the job of converting that to binary format.
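To make the idea concrete, here is a rough sketch of what such a conversion involves. These are not the scripts used in this post: the function names are made up for illustration, and the sketch assumes an uncompressed VCF on standard input with GT as the first FORMAT field. A .tfam file has one row per sample (family ID, individual ID, father, mother, sex, phenotype) and a .tped file has one row per variant (chromosome, variant ID, genetic distance, position, then two alleles per sample, with 0 for a missing allele).

```shell
# Illustrative sketches only; these are NOT the vcf2tfam.sh / vcf2tped.sh scripts.
# Both functions read an uncompressed VCF on standard input.

# One .tfam row per sample, taken from the #CHROM header line.
# Assumption: the sample name doubles as family ID; sex (0) and phenotype (-9) unknown.
vcf_to_tfam_sketch() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i, $i, 0, 0, 0, -9 }'
}

# One .tped row per variant: chromosome, ID, genetic distance (0), position, alleles.
vcf_to_tped_sketch() {
  awk -F'\t' '
    /^#/ { next }                                # skip header lines
    {
      split($4 "," $5, a, ",")                   # a[1] = REF, a[2..] = ALT alleles
      id = ($3 == ".") ? $1 ":" $2 : $3          # fall back to chr:pos when ID is missing
      line = $1 " " id " 0 " $2
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                        # f[1] is the GT field
        m = split(f[1], g, "[/|]")               # allele indices, phased or unphased
        for (j = 1; j <= 2; j++) {
          code = (j <= m) ? g[j] : g[1]          # haploid call: repeat the single allele
          line = line " " (code == "." ? "0" : a[code + 1])
        }
      }
      print line
    }'
}
```

Something like `zcat my.vcf.gz | vcf_to_tped_sketch > my.tped` (with `my.vcf.gz` as a placeholder) would then produce a file that plink can read alongside the matching .tfam. The sketch makes no attempt to handle indels or other special cases, which is one reason to prefer a ready-made script.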

sudo apt-get install plink tabix
wget http://broadinstitute.org/~giulio/vcf2plink/vcf2tfam.sh
wget http://broadinstitute.org/~giulio/vcf2plink/vcf2tped.sh
chmod a+x vcf2tfam.sh vcf2tped.sh

The tools are fairly simple and they are supposed to be flexible. Now, let's suppose your VCF file is bgzip-compressed and indexed with tabix. A good idea might be to split it by chromosome and generate one plink fileset for each chromosome. Here is some code that will do that:

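vcf=my.vcf.gz  # placeholder: path to your bgzip-compressed, tabix-indexed VCF
for chr in {1..22} X Y MT; do
  # sample information comes from the VCF header, genotypes from one chromosome
  tabix -H "$vcf" | ./vcf2tfam.sh /dev/stdin > gpc.chr$chr.tfam
  tabix "$vcf" $chr | ./vcf2tped.sh /dev/stdin > gpc.chr$chr.tped
  plink --tfile gpc.chr$chr --make-bed --out gpc.chr$chr
  # drop intermediate files; -f because .nosex is not always written
  /bin/rm -f gpc.chr$chr.tfam gpc.chr$chr.tped gpc.chr$chr.nosex gpc.chr$chr.log
done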
