Analysis Parameters (First Part)
The following table describes all of the required attributes and their values for the root element when using hapConstructor rgen. All attribute values should be enclosed in double quotes (" ").
| Attribute | Att Value | Description |
| rseed | number | Random number generator seed value. Specify rseed="random" to have the program randomly generate a seed value. |
| nsims | number | Number of simulations. |
| top | classname | Use HapMCTopSeparate. |
| drop | classname | Use HapMCDropSeparate. |
| report | classname | Report options. The default is the standard report (rgen_filename.report) with full tables and detail output. Specify report="summary" for an ASCII space-delimited file (rgen_filename.summary) of results including the seed value, specified statistics, corresponding p-values, and 95% confidence intervals for odds ratios for each data file, followed by meta statistics if requested. Specify report="both" to generate both the standard and summary reports. |

The following table describes the sub-element locus and its attributes and values.
| Attribute | Att Value | Description |
| id | number | The locus id number in the data file. |
| marker | name | Allows the user to attach a marker name to the locus id. |
| dist | number | Allows the user to enter a recombination fraction or a distance between a marker and the preceding marker. If the dist value is ≤0.5, it is assumed to be a recombination fraction. If the dist value is >0.5, it is assumed to be the distance in cM between the marker and the preceding marker. |

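The rule for the dist attribute can be restated as a short Python sketch. The function name `interpret_dist` and the returned labels are hypothetical and are not part of the program; only the 0.5 cutoff comes from the table above.

```python
from typing import Tuple


def interpret_dist(dist: float) -> Tuple[str, float]:
    """Classify a locus dist value using the 0.5 cutoff described in the table.

    Values <= 0.5 are read as a recombination fraction; values > 0.5 are
    read as a distance in cM. The labels returned here are illustrative only.
    """
    if dist <= 0.5:
        return ("recombination_fraction", dist)
    return ("centimorgans", dist)


if __name__ == "__main__":
    for value in (0.01, 0.5, 2.3):
        print(value, interpret_dist(value))
```

One consequence of this rule is that a map distance of 0.5 cM or less cannot be entered as cM, because it would be read as a recombination fraction.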
The following table describes the sub-element datafile and its attributes and values.
| Attribute | Att Value | Description |
| studyname | name | Allows the user to attach a study name to the genotype data file. |
| genotypedata | name | The directory path and file name of the genotype data file for analysis. Specify each genotype data file with a separate datafile statement. |

The following table describes the sub-element param and its attributes and values.
| Attribute | Att Value | Variable | Description |
| name | ccstat# | classname | Statistical programs. You can run multiple statistics on the same set of data. Each statistic should have a different ccstat#. |
| name | metastat# | classname | Meta statistics for multiple study data files. Each meta statistic should have a different metastat#. |
| name | dumper | classname | The dumper class for dumping simulated data. |
| name | top-sample | all/founder | Method for calculating the allele frequencies assigned to the pedigree founders for simulation. Two options: all calculates allele frequencies based on all genotyped members in the pedigree data file; founder calculates allele frequencies on genotyped founders only. We recommend the all option if there are a large number of pedigrees and the number of genotyped founders in the resource is limited. |
| name | hapc_threshold | 0.1, 0.05, 0.005, 0.0005 | A single value or a list of values specifying the p-value thresholds by which SNP sets move to the next step. |
| name | hapc_sigtesting | true/false | Option to use the Monte Carlo framework to establish the significance of the models found from the build process using the observed data. If true, the simulated datasets go through the same build process as the observed data, and the p-values generated from all the runs are tracked to establish FDR and empirical p-values. This option is turned off by default. |
| name | hapc_backsets | true/false | Option for testing association with SNP backsets. Backsets are the locus subsets in a set that were not tested in the previous step. This option makes the search more exhaustive and could considerably increase the run time. |
| name | hapc_models | HAdd, HRec, HDom, MG, HG, CGG, CG, MSpecRed | Option for specifying the models to construct for the haplotypes. See the description page for more details about the models. HAdd/HRec/HDom = haplotype additive, recessive, dominant; CGG = composite genotype global; HG = haplotype global; MG = monotype global; CG = composite genotypes (Dom and Rec combinations); MSpecRed = monotype specific reduction (specific haplotypes compared to the rest). |
| name | hapc_check_mostsignificant | true/false | Option for specifying whether the build process stops once the most significant attainable empirical p-value has been obtained from a test. If set to true, the program checks for the most significant p-value result and stops if it is found; otherwise it continues to build. For example, if this option is set to true, 1,000 Monte Carlo simulations are used to establish the empirical p-values for the association tests, and a test at the first step obtains a p-value of 0.001, then the build process does not continue to the second step. The default is true. |
| name | hapc_screen | true/false | Option for specifying whether the haplotype additive diplotype tests are used as a screen before performing haplotype dominant diplotype and haplotype recessive diplotype tests with the haplotypes that passed the screen. The default is false. If this option is set to true, the HAdd model must also be included in the model options. The user is also advised to set the screen test p-value thresholds, which work in a similar fashion to the step threshold values: if a screen test passes the specified threshold, the haplotype dominant and/or recessive models are then considered for the same haplotype used in the haplotype additive test. |
| name | hapc_screenthreshold | list of p-values | If the hapc_screen option is set to true, the user can also specify the threshold values for the screen tests at each step. These determine whether the haplotype additive test result is sufficient to then consider the haplotype dominant and haplotype recessive diplotype tests. These threshold values do not replace the hapc_threshold values and must be specified along with them. If any of the haplotype additive, recessive, or dominant tests passes the overall threshold value for the corresponding level, the loci set moves to the next step for testing. |

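The interaction between hapc_threshold, hapc_screen, hapc_screenthreshold, and hapc_check_mostsignificant can be sketched in Python. This is an illustration of the descriptions above, not the program's code. It assumes that a test "passes" when its p-value is less than or equal to the threshold, that the threshold lists are indexed by step, and that the most significant empirical p-value attainable with nsims simulations is 1/nsims (inferred from the 1,000-simulation example). The function names are hypothetical.

```python
from typing import Dict, List, Optional


def set_advances(step: int,
                 p_values: Dict[str, float],
                 thresholds: List[float],
                 screen_thresholds: Optional[List[float]] = None) -> bool:
    """Decide whether a loci set moves to the next step (0-based step index).

    p_values maps a model name (HAdd, HDom, HRec) to its p-value.
    If screen_thresholds is given, screening is treated as switched on.
    """
    considered = dict(p_values)
    if screen_thresholds is not None:
        # Screening: HDom/HRec are only considered when HAdd passed the screen.
        if p_values["HAdd"] > screen_thresholds[step]:
            considered = {"HAdd": p_values["HAdd"]}
    # Assumption: "pass" means p-value <= the step threshold.
    return any(p <= thresholds[step] for p in considered.values())


def should_stop(p_value: float, nsims: int,
                check_mostsignificant: bool = True) -> bool:
    """Stop building once the smallest attainable empirical p-value is hit.

    Assumption: the smallest attainable empirical p-value is 1/nsims.
    """
    return check_mostsignificant and p_value <= 1.0 / nsims


if __name__ == "__main__":
    steps = [0.1, 0.05, 0.005, 0.0005]
    tests = {"HAdd": 0.3, "HDom": 0.01}
    print(set_advances(0, tests, steps))                # no screening
    print(set_advances(0, tests, steps, [0.2] * 4))     # screening on
    print(should_stop(0.001, 1000))
```

In the usage lines, the same test results advance the set without screening but not with it, because the additive test fails the screen and the dominant result is never considered.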
List of available statistical programs and their class names
| Statistic | Class Name |
| Chi Squared | ChiSquared |
| Chi Squared Trend | ChiSquaredTrend |
| Odds Ratio | OddsRatios |
| CMH Chi Squared (meta) | CMHChiSquared |
| CMH Chi Squared Trend (meta) | CMHChiSqTrend |
| Meta Odds Ratio | MetaOddsRatios |

Subset Analyses (Second Part)
The second part of the .rgen parameter file defines the subset analyses and the models to be analyzed. Users may enter markers to be tested separately (i.e., a single-locus-at-a-time approach, where each marker is assumed to be in linkage equilibrium with other markers), as well as markers to be tested jointly in a composite genotype or haplotype analysis.
cctable has a sub-element col, or column definition. Within a col, the user can optionally assign a weight, wt, to that column; wt is an attribute of col, and its value is a number. Each col has a sub-element g, or allele group, and each g has a sub-element a, or allele definition. An a defines the genetic pattern to be tested in PedGenie at a single locus, and each a corresponds to a locus defined in the sub-element locus. All of the a's are grouped together into a single g, and the g's are grouped together into a single col, which is optionally weighted with wt. If a col contains more than one group g, an "or" regular expression applies across all of the groups tested in that col.
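The nesting of a, g, and col can be illustrated with a small Python sketch. The text states that groups within a col are combined with "or". Requiring every a within a group to match at its own locus is an assumption made here for illustration. The data layout (dictionaries keyed by locus id) and the function names are hypothetical and do not reflect the .rgen syntax.

```python
from typing import Dict, List

# One group (g): locus id -> required genotype pattern, one entry per a.
Group = Dict[int, str]


def group_matches(group: Group, genotypes: Dict[int, str]) -> bool:
    """Assumed semantics: every a in the group must match at its locus."""
    return all(genotypes.get(locus) == pattern
               for locus, pattern in group.items())


def column_matches(groups: List[Group], genotypes: Dict[int, str]) -> bool:
    """Groups within one col are combined with 'or', as described above."""
    return any(group_matches(g, genotypes) for g in groups)


if __name__ == "__main__":
    # A col with two groups over loci 1 and 2.
    col = [{1: "1/2", 2: "2/2"}, {1: "2/2", 2: "2/2"}]
    person = {1: "2/2", 2: "2/2"}
    print(column_matches(col, person))
```

Here the individual fails the first group but satisfies the second, so the individual is counted in the column.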
The following table describes the element cctable and its optional attributes and values.
| Attribute | Att Value | Description |
| loci | number(s) | Allows the user to specify the locus or loci for a subset analysis based on the locus id number. Default is all loci. |
| stats | number(s) | Allows the user to define which statistics to run for a particular subset analysis. The stats number is selected from the list of ccstat#'s. Default is all ccstat. |
| meta | number(s) | Allows the user to define which meta statistics to run for a particular subset analysis. The meta number is selected from the list of metastat#'s. Default is all metastat. |
| model | text | Allows the user to define a model for a subset analysis. The model name is printed in the report for that analysis. |
| type | text | Allows the user to specify the type of analysis, Genotype or Allele, for this subset of data. If type="Allele" is specified, a single allele code should be entered as the variable for the sub-element a, and each a corresponds to a locus. Default is type="Genotype". |

Single locus at a time analysis approach |
HapConstructor begins by considering single locus analyses, then constructs and tests haplotypes based upon the p-values generated. The single locus analyses are constructed in the same way as analyses using PedGenie. One requirement is to use the correct model name for each table built. The model names are: Dom, Rec, Additive, Allele.
| Model | Wt = 0 | Wt = 1 | Wt = 2 |
| Dominant | (1/1) | (1/2), (2/1), or (2/2) | |
| Recessive | (1/1), (1/2), or (2/1) | (2/2) | |
| Additive | (1/1) | (1/2) or (2/1) | (2/2) |

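The weight table can be reproduced by counting copies of allele 2 in a biallelic genotype. The Python sketch below is a restatement of the table for illustration. The function name `genotype_weight` is hypothetical, and the model keys reuse the model names Dom, Rec, and Additive given above.

```python
def genotype_weight(model: str, genotype: str) -> int:
    """Return the column weight for a biallelic genotype such as "(1/2)".

    The weight is derived from the number of copies of allele 2, which
    reproduces the Dominant, Recessive, and Additive rows of the table.
    """
    alleles = genotype.strip("()").split("/")
    copies = alleles.count("2")
    if model == "Dom":
        return min(copies, 1)      # any copy of allele 2 gives weight 1
    if model == "Rec":
        return copies // 2         # only (2/2) gives weight 1
    if model == "Additive":
        return copies              # weight equals the number of copies
    raise ValueError("unsupported model: " + model)


if __name__ == "__main__":
    for model in ("Dom", "Rec", "Additive"):
        row = {g: genotype_weight(model, g)
               for g in ("(1/1)", "(1/2)", "(2/1)", "(2/2)")}
        print(model, row)
```

The Allele model is omitted because the table does not define weights for it.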
The weights may be modified to be any integer value. For programming purposes, (1/.) indicates a genotype with allele 1 and any other value. Thus, for this biallelic model, the code (1/.) will pull (1/1) and (1/2) genotype data. Care must be taken to ensure that this file has no errors. Please see SingleLocus.rgen for the format of this file.
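The (1/.) wildcard can be illustrated with a short Python sketch. It assumes position-by-position matching, in which "." accepts any allele in that position. Whether the program also matches the reversed order, such as (2/1), is not stated here, so this sketch does not match it. The function name `pattern_matches` is hypothetical.

```python
def pattern_matches(pattern: str, genotype: str) -> bool:
    """Match a genotype against a pattern in which "." accepts any allele.

    Assumption: alleles are compared position by position, so "(1/.)"
    matches "(1/1)" and "(1/2)" but not "(2/1)".
    """
    wanted = pattern.strip("()").split("/")
    found = genotype.strip("()").split("/")
    if len(wanted) != len(found):
        return False
    return all(w == "." or w == f for w, f in zip(wanted, found))


if __name__ == "__main__":
    for g in ("(1/1)", "(1/2)", "(2/2)"):
        print(g, pattern_matches("(1/.)", g))
```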