PCAtag: Software for Selecting Tagging-SNPs using Principal Component Analysis

PCAtag Download

 

Get PCAtag - This is an alpha version of the PCAtag program. 

The package includes fastPHASE and R. There is no need to download them separately. 

PCAtag Installation Instructions

PCAtag Documentation


Welcome to PCAtag Home Page!

Comprehensively testing the role of candidate genes in association studies depends on selecting informative SNPs.
Specifically, it is important to select tagging-SNPs (tSNPs) that represent a large portion (>90%) of the genetic variation of a gene.
Here we describe a new software tool, PCAtag, that performs tSNP selection using principal component analysis (PCA)
as described in Horne and Camp (2004). The advantage of PCA for tSNP selection is that linkage disequilibrium (LD) groups do not need to be contiguous and can overlap. This flexible framework does not impose oversimplified assumptions on the genetic architecture and likely fits reality much better.

Algorithms

  • Haplotypes are reconstructed with a Bayesian method by interfacing with the software fastPHASE (Scheet and Stephens 2006).
  • Principal component analysis (PCA) with a varimax rotation is performed by interfacing with the FactoMineR add-on package for R.
  • The procedure for determining LD groups and selecting tSNPs extends the two-step PCA method outlined in Horne and Camp (2004) to a multi-step PCA.
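
The PCA-based selection described above can be illustrated with a short Python sketch. This is not PCAtag's own code, which interfaces with fastPHASE and FactoMineR. It is a simplified single-pass version under stated assumptions: the input is a matrix of 0/1 alleles (rows are haplotypes, columns are SNPs), components are kept until they explain 90% of the variance, each SNP is assigned to the one component on which it loads most strongly (the actual method allows overlapping groups), and the SNP with the largest loading in each group is taken as its tSNP. The names varimax and tag_snps and the simulated data are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonally rotate a (SNPs x components) loading matrix toward simple structure."""
    n_rows, n_cols = loadings.shape
    rotation = np.eye(n_cols)
    criterion = 0.0
    for _ in range(max_iter):
        previous = criterion
        rotated = loadings @ rotation
        # Gradient of the varimax criterion; its SVD gives the next rotation.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_rows
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        criterion = s.sum()
        if previous != 0 and criterion / previous < 1 + tol:
            break
    return loadings @ rotation

def tag_snps(alleles, variance_target=0.90):
    """Group SNPs by rotated PCA loadings and pick one tSNP per group (assumed rule)."""
    corr = np.corrcoef(alleles, rowvar=False)          # SNP-by-SNP correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # Keep the smallest number of components reaching the variance target.
    n_keep = min(int(np.searchsorted(explained, variance_target)) + 1, len(eigvals))
    loadings = eigvecs[:, :n_keep] * np.sqrt(np.clip(eigvals[:n_keep], 0.0, None))
    rotated = varimax(loadings)
    groups = np.abs(rotated).argmax(axis=1)            # one group label per SNP
    tsnps = {}
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        tsnps[int(g)] = int(members[np.abs(rotated[members, g]).argmax()])
    return groups, tsnps

# Simulated data: two LD groups whose SNPs are interleaved (non-contiguous).
rng = np.random.default_rng(0)
n = 400
block_a = rng.integers(0, 2, n)
block_b = rng.integers(0, 2, n)
membership = [block_a, block_a, block_b, block_b, block_a, block_b, block_a]
# Each SNP copies its group's allele, with a 1% chance of a flip.
alleles = np.column_stack([np.where(rng.random(n) < 0.01, 1 - b, b) for b in membership])

groups, tsnps = tag_snps(alleles)
print("LD group per SNP:", groups)
print("tSNP per group:", tsnps)
```

Because groups are defined by loadings rather than by position, SNPs 0, 1, 4 and 6 in the simulated data can share a group even though SNPs from the other group lie between them.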

Novel Features

   Genotype Data:

  • One issue in performing the PCA based on haplotype data is that haplotypes are not directly observed and must be estimated.
  • PCAtag has an option to perform the PCA based on genotype data directly.
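
As a rough illustration of what a genotype-based analysis can look like, the sketch below codes unphased genotypes as minor-allele counts (0, 1 or 2) and computes the SNP-by-SNP correlation matrix that a PCA would then decompose. No haplotype estimation is involved. The "A/G" input format, the 0/1/2 coding, and the function names are assumptions made for this example and are not taken from PCAtag.

```python
from collections import Counter

def minor_allele_counts(genotypes):
    """Convert unphased genotypes such as "A/G" to counts (0, 1, 2) of the minor allele."""
    alleles = Counter(a for g in genotypes for a in g.split("/"))
    minor = min(alleles, key=alleles.get)  # least frequent allele at this SNP
    return [g.split("/").count(minor) for g in genotypes]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical genotypes for three SNPs typed in the same six individuals.
snps = [
    ["A/A", "A/G", "G/G", "A/A", "A/G", "A/A"],
    ["C/C", "C/T", "T/T", "C/C", "C/T", "C/C"],
    ["T/T", "T/T", "C/T", "C/C", "T/T", "C/T"],
]
coded = [minor_allele_counts(s) for s in snps]

# Correlation matrix computed directly from genotype counts.
corr = [[pearson(x, y) for y in coded] for x in coded]
for row in corr:
    print([round(r, 3) for r in row])
```

This sketch does not handle missing genotypes or monomorphic SNPs, which would need separate treatment in real data.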

   Phenotype Data:

  • Allele frequencies, haplotype frequencies and LD structure may differ between cases and controls.
  • If phenotype data (or any dichotomous subset criterion) are entered, tagging is performed in the cases and controls separately, as well as together.
  • Knowing about such differences at the tSNP selection stage allows for more powerful subsequent association analyses.
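
The subset logic can be sketched as follows: the same analysis is applied to cases, to controls, and to the combined sample, so that the three results can be compared. This is a minimal illustration, not PCAtag's code. The function run_by_subset, the 1/0 status coding, and the stand-in analysis (allele frequencies from 0/1/2 genotype counts) are all assumptions; in practice the analysis step would be the tSNP selection itself.

```python
def run_by_subset(genotypes, status, analysis):
    """Apply `analysis` to cases (status 1), controls (status 0), and all samples."""
    cases = [g for g, s in zip(genotypes, status) if s == 1]
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    return {
        "cases": analysis(cases),
        "controls": analysis(controls),
        "combined": analysis(genotypes),
    }

def allele_frequencies(rows):
    """Stand-in analysis: frequency of the counted allele at each SNP.

    Each row holds one individual's genotypes coded as allele counts (0, 1, 2).
    """
    n = len(rows)
    return [sum(column) / (2 * n) for column in zip(*rows)]

# Hypothetical data: six individuals, two SNPs, first three individuals are cases.
genotypes = [[0, 1], [1, 2], [2, 2], [0, 0], [0, 1], [1, 0]]
status = [1, 1, 1, 0, 0, 0]

results = run_by_subset(genotypes, status, allele_frequencies)
for subset, freqs in results.items():
    print(subset, [round(f, 3) for f in freqs])
```

Passing the analysis as a function keeps the subsetting separate from the tagging step, so any dichotomous criterion can be substituted for case/control status.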

 

fastPHASE Information and links:

fastPHASE is a program that implements a Bayesian statistical method for reconstructing haplotypes from population genotype data.

 Developer Page 


PCAtag web page - Created on October 20 2007