Welcome to the PCAtag Home Page!
To comprehensively test the role of candidate genes in association
studies, the selection of informative SNPs is paramount.
Specifically, it is important to select tagging SNPs (tSNPs) that
represent a large portion (>90%) of the genetic variation of a gene.
Here we describe a new software tool, PCAtag, that performs tSNP
selection using principal component analysis (PCA) as described in
Horne and Camp (2004). The advantage of PCA for tSNP selection is
that LD groups do not need to be contiguous and can overlap. This
flexible framework does not impose over-simplified assumptions on
the genetic architecture, and likely fits reality much better.
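One way to read the >90% criterion in PCA terms is as a cumulative share of variance explained by the leading components. The sketch below counts how many components are needed to reach that share. The function name `components_needed` and the eigenvalues are hypothetical illustrations and are not taken from PCAtag.

```python
def components_needed(eigenvalues, target=0.90):
    """Return the smallest number of leading components whose eigenvalues
    sum to at least `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


# Hypothetical eigenvalues of a 6-SNP correlation matrix.
example_eigenvalues = [3.1, 1.6, 0.8, 0.3, 0.15, 0.05]
n_components = components_needed(example_eigenvalues)
print(n_components)
```

With these made-up values, the first two components fall short of the target and the third one crosses it.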
Algorithms
- A Bayesian method for reconstructing haplotypes is applied by
interfacing with the software fastPHASE (Stephens et al. 2006).
- Principal component analysis (PCA) with a varimax rotation is
performed by interfacing with the FactoMineR add-on package for R.
- The procedure for determining LD groups and selecting tSNPs follows
the two-step PCA method outlined in Horne and Camp (2004), extended
into a multi-step PCA.
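The grouping idea can be sketched from a rotated loading matrix: a SNP joins every group whose factor it loads on strongly, which is why groups may overlap and need not be contiguous. This is a minimal illustration and not PCAtag's implementation. The threshold of 0.4, the SNP labels, the loading values, and the rule of taking the highest-loading member as the tSNP are all assumptions; the last one stands in for the later PCA steps, whose details are not given here.

```python
def ld_groups(loadings, snp_names, threshold=0.4):
    """Assign SNPs to LD groups from a rotated loading matrix.

    loadings[i][j] is the loading of SNP i on factor j. A SNP joins every
    group whose factor it loads on with |loading| >= threshold, so groups
    may overlap and need not be contiguous.
    """
    n_factors = len(loadings[0])
    groups = []
    for j in range(n_factors):
        members = [name for name, row in zip(snp_names, loadings)
                   if abs(row[j]) >= threshold]
        groups.append(members)
    return groups


def pick_tsnps(loadings, snp_names, groups):
    """Pick one tSNP per non-empty group: the member with the largest
    absolute loading on that group's factor (an assumed, simplified rule)."""
    index = {name: i for i, name in enumerate(snp_names)}
    tsnps = []
    for j, members in enumerate(groups):
        if members:
            best = max(members, key=lambda name: abs(loadings[index[name]][j]))
            tsnps.append(best)
    return tsnps


# Hypothetical SNP labels and rotated loadings on two factors.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
rotated = [
    [0.92, 0.10],
    [0.15, 0.88],
    [0.85, 0.05],
    [0.55, 0.60],
    [0.08, 0.95],
]
groups = ld_groups(rotated, snps)
tags = pick_tsnps(rotated, snps, groups)
print(groups)
print(tags)
```

In this toy input, snp1 and snp3 share a group although snp2 lies between them, and snp4 belongs to both groups.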
Novel Features
Genotype Data:
- One issue in performing the PCA based on haplotype data is that
haplotypes are not directly observed and must be estimated.
- PCAtag has an option to perform the
PCA based on genotype data directly.
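A genotype-based PCA needs a numeric encoding of each genotype. The sketch below assumes an additive coding (the number of copies of one chosen allele, 0, 1 or 2) and computes a Pearson correlation between two coded SNPs, the kind of pairwise input a PCA on a correlation matrix would use. The source does not state how PCAtag encodes genotypes, so the coding, the function names, and the sample genotypes are assumptions.

```python
from math import sqrt


def code_genotype(genotype, counted_allele):
    """Count copies of `counted_allele` in a two-character genotype."""
    return sum(1 for allele in genotype if allele == counted_allele)


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical unphased genotypes for two SNPs in six individuals.
snp_a = ["AA", "AG", "GG", "AG", "AA", "GG"]
snp_b = ["CC", "CT", "TT", "CC", "CC", "TT"]

coded_a = [code_genotype(g, "G") for g in snp_a]
coded_b = [code_genotype(g, "T") for g in snp_b]
print(coded_a, coded_b)
print(pearson(coded_a, coded_b))
```

No phasing step is involved here: the coded values come straight from the observed genotypes.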
Phenotype Data:
- Allele frequencies, haplotype frequencies and LD structure may
differ between cases and controls.
- If phenotype data (or any dichotomous subset criterion) are
entered, tagging will be performed in the cases and controls
separately, as well as together.
- Knowledge of such differences at the tSNP selection stage
will allow for more powerful subsequent association analyses.
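The three analysis sets (cases, controls, and everyone together) can be sketched as a simple split on a dichotomous flag. In this illustration an allele-frequency calculation stands in for the tagging step that would run on each set. The function names and data are hypothetical, and the genotypes are assumed to be coded as allele counts (0, 1 or 2).

```python
def stratify(samples, is_case):
    """Split samples into cases, controls and the combined set."""
    cases = [s for s, flag in zip(samples, is_case) if flag]
    controls = [s for s, flag in zip(samples, is_case) if not flag]
    return {"cases": cases, "controls": controls, "combined": list(samples)}


def allele_frequency(coded_genotypes):
    """Frequency of the counted allele, given 0/1/2 genotype codes."""
    return sum(coded_genotypes) / (2 * len(coded_genotypes))


# Hypothetical coded genotypes for one SNP and a case/control flag.
genotypes = [0, 1, 2, 2, 0, 1, 1, 2]
phenotype = [True, True, True, True, False, False, False, False]

subsets = stratify(genotypes, phenotype)
frequencies = {name: allele_frequency(values)
               for name, values in subsets.items()}
print(frequencies)
```

Comparing the per-set results side by side is what exposes a difference between cases and controls before any association test is run.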
fastPHASE Information and Links:
fastPHASE is a program that implements a Bayesian statistical
method for reconstructing haplotypes from population genotype data.
Developer Page