hapConstructor .rgen Parameters Description

This is an XML file that uses a DTD file, ge-rgen.dtd, to describe all of the data for the analysis. All element names in this file start with "ge:". The parameter file has a root element, rgen, with a number of sub-elements and attributes. It can be described in two parts: the first part sets up the analysis parameters, and the second part defines the inheritance models to be analyzed. See the general .rgen parameter file description for all other attribute descriptions.

Analysis Parameters (First Part)

The following table describes all of the required attributes and their values for the root element when using a hapConstructor .rgen file. All attribute values should be enclosed in " ".

Attribute | Att Value | Description
rseed | number | Random number generator seed value. Specify rseed="random" to have the program randomly generate a seed value.
nsims | number | Number of simulations.
top | classname | Use HapMCTopSeparate.
drop | classname | Use HapMCDropSeparate.
report | classname | Report options. The default is the standard report (rgen_filename.report) with full tables and detail output. Specify report="summary" for an ASCII space-delimited file (rgen_filename.summary) of results including the seed value, specified statistics, corresponding p-values, and 95% confidence intervals for odds ratios for each data file, followed by meta statistics if requested. Specify report="both" to generate both the standard and summary reports.

The following table describes the sub-element locus and its attributes and values.

Attribute | Att Value | Description
id | number | The locus id number in the data file.
marker | name | Allows user to attach a marker name to the locus id.
dist | number | Allows user to enter a recombination fraction or a distance between a marker and the preceding marker. If the dist value is ≤0.5, the value is assumed to be a recombination fraction. If the dist value is >0.5, the value is assumed to be the distance in cM between the marker and the preceding marker.
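
The dist rule above can be sketched as a small classifier. This is only an illustration of the documented cutoff, not hapConstructor code; the function name interpret_dist and the returned labels are hypothetical.

```python
def interpret_dist(dist):
    """Classify a locus dist value using the 0.5 cutoff described above.

    Returns a (kind, value) pair. The labels are illustrative only.
    """
    if dist <= 0.5:
        # Values up to and including 0.5 are read as recombination fractions.
        return ("recombination_fraction", dist)
    # Anything larger is read as a distance in cM to the preceding marker.
    return ("cM", dist)


if __name__ == "__main__":
    for value in (0.01, 0.5, 0.51, 12.0):
        print(value, interpret_dist(value))
```

Note that a value such as 0.4 is always read as a recombination fraction under this rule, so a map distance of 0.4 cM cannot be entered directly as dist="0.4".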

The following table describes the sub-element datafile and its attributes and values.

Attribute | Att Value | Description
studyname | name | Allows user to attach a study name to the genotype data file.
genotypedata | name | The directory path and genotype data file name for analysis. Specify each genotype data file with a separate datafile statement.

The following table describes the sub-element param and its attributes and values.

Attribute | Att Value | Variable | Description
name | ccstat# | classname | Statistical programs. You can run multiple statistics on the same set of data. Each statistic should have a different ccstat#.
name | metastat# | classname | Meta statistics for multiple study data files. Each meta statistic should have a different metastat#.
name | dumper | classname | The dumper class for dumping simulated data.
name | top-sample | all/founder | Method for calculating allele frequencies for assignment to the pedigree founders for simulation. Two options: all calculates allele frequencies based on all genotyped members in the pedigree data file; founder calculates allele frequencies on genotyped founders only. We recommend the all option if there are a large number of pedigrees and the number of genotyped founders in the resource is limited.
name | hapc_threshold | 0.1, 0.05, 0.005, 0.0005 | A single value or a list of values specifying the p-value thresholds by which SNP sets move to the next step.
name | hapc_sigtesting | true/false | Option to use the Monte Carlo framework to establish the significance of the models found by the build process on the observed data. If true, the simulated datasets go through the same build process as the observed data, and the p-values generated from all runs are tracked to establish FDR and empirical p-values. This option is turned off by default.
name | hapc_backsets | true/false | Option for testing association with SNP backsets. Backsets are the locus subsets in a set that were not tested in the previous step. This option makes the search more exhaustive and could considerably affect the run time.
name | hapc_models | HAdd, HRec, HDom, MG, HG, CGG, CG, MSpecRed | Option for specifying the models to construct for the haplotypes. See the description page for more details about models. HAdd/HRec/HDom = haplotype additive, recessive, dominant; CGG = composite genotype global; HG = haplotype global; MG = monotype global; CG = composite genotypes (Dom and Rec combinations); MSpecRed = monotype specific reduction (specific haplotypes compared to the rest).
name | hapc_check_mostsignificant | true/false | Option for specifying whether the building process stops once the most significant empirical p-value has been obtained from a test. If set to true, the program checks for the most significant p-value result and stops if it is found; otherwise it continues to build. For example, if this option is set to true, 1,000 Monte Carlo simulations are used to establish the empirical p-values for the association tests, and a test at the first step obtains a p-value of 0.001, then the build process does not continue to the second step. The default is true.
name | hapc_screen | true/false | Option for specifying whether the haplotype additive diplotype tests are used as a screen before performing haplotype dominant diplotype and haplotype recessive diplotype tests with the haplotypes that passed the screen. The default value is false. Note that if this option is set to true, the HAdd model must also be included in the model options. The user is also advised to set the screen test p-value threshold values. These threshold p-values work in a similar fashion to the step threshold values: if a screen test passes the specified threshold value, the haplotype dominant and/or recessive models are then considered for the same haplotype used in the haplotype additive test.
name | hapc_screenthreshold | list of p-values | If the hapc_screen option is set to true, the user can also specify the threshold values for the screen tests at each step. These determine whether the haplotype additive test result is sufficient to then consider the haplotype dominant and haplotype recessive diplotype tests. Note that these threshold values do not replace the hapc_threshold values and need to be specified along with them. If any of the haplotype additive, recessive, or dominant tests passes the overall threshold value for the corresponding level, the loci set moves to the next step for testing.
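
The interplay of hapc_threshold and hapc_check_mostsignificant described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's implementation: it assumes the i-th threshold applies to the i-th build step, that the last value is reused for later steps, that "passing" means p ≤ threshold, and that the most significant empirical p-value with nsims simulations is 1/nsims (consistent with the 0.001 out of 1,000 example). All function names are hypothetical.

```python
def threshold_for_step(thresholds, step):
    """Return the threshold for a 1-based build step.

    Assumption: the i-th value applies to step i, and the last value is
    reused once the list runs out (so a single value covers every step).
    """
    index = min(step, len(thresholds)) - 1
    return thresholds[index]


def is_most_significant(empirical_p, nsims):
    """Assumption: the smallest reachable empirical p-value is 1/nsims."""
    return empirical_p <= 1.0 / nsims


def should_continue(p_value, thresholds, step, nsims, check_mostsignificant=True):
    """Decide whether a SNP set moves on to the next build step."""
    if check_mostsignificant and is_most_significant(p_value, nsims):
        # Nothing more significant can be observed, so building stops.
        return False
    return p_value <= threshold_for_step(thresholds, step)


if __name__ == "__main__":
    thresholds = [0.1, 0.05, 0.005, 0.0005]
    print(should_continue(0.03, thresholds, step=1, nsims=1000))
    print(should_continue(0.001, thresholds, step=1, nsims=1000))
```

With check_mostsignificant=False, the same p-value of 0.001 at step 1 would pass the 0.1 threshold and the set would continue building.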

List of available statistical programs and their class names

Statistic | Class Name
Chi Squared | ChiSquared
Chi Squared Trend | ChiSquaredTrend
Odds Ratio | OddsRatios
CMH Chi Squared (meta) | CMHChiSquared
CMH Chi Squared Trend (meta) | CMHChiSqTrend
Meta Odds Ratio | MetaOddsRatios
 

Subset Analyses (Second Part)

The second part of the .rgen parameter file defines the subset analyses and the models to be analyzed. Users may enter markers to be tested separately (i.e., a single locus at a time approach, where each marker is assumed to be in linkage equilibrium with other markers), as well as testing markers jointly in a composite genotype or haplotype analysis.
The element cctable has a sub-element col, the column definition. Within col, the user can optionally assign a weight through the attribute wt, whose value is a number. col has a further sub-element g, the allele group, and g has a further sub-element a, the allele definition. Each a defines the genetic pattern to be tested in PedGenie at a single locus and corresponds to a locus defined in the sub-element locus. All of the a's are grouped together into a single g, and the g's are grouped together into a single col, which is optionally weighted with wt. If a col contains more than one group g, an "or" regular expression applies across the groups when testing that column.
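
The col / g / a nesting can be pictured with a short sketch. It is an assumption-laden illustration rather than PedGenie's matching code: it assumes each a pattern is applied as a regular expression to a genotype written as "allele/allele" with single-character alleles, that all a's within one g must match their loci, and it uses hypothetical names (group_matches, column_matches). Only the "or" across groups is stated in the text above.

```python
import re


def group_matches(group, genotypes):
    """True if every allele pattern in the group matches its locus.

    group:     dict mapping locus id -> pattern such as "(1/.)"
    genotypes: dict mapping locus id -> genotype string such as "1/2"
    """
    return all(
        re.fullmatch(pattern, genotypes[locus]) is not None
        for locus, pattern in group.items()
    )


def column_matches(groups, genotypes):
    """A column matches when any one of its groups matches ("or")."""
    return any(group_matches(group, genotypes) for group in groups)


if __name__ == "__main__":
    # One column with two groups over loci 1 and 2.
    column = [
        {1: "(1/.)", 2: "(2/2)"},
        {1: "(2/2)", 2: "(1/1)"},
    ]
    print(column_matches(column, {1: "1/2", 2: "2/2"}))
    print(column_matches(column, {1: "2/1", 2: "2/2"}))
```

Under these assumptions the first individual falls into the column through the first group, while the second individual matches neither group.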
The following table describes the element cctable and its optional attributes and values.

Attribute | Att Value | Description
loci | number(s) | Allows user to specify the locus or loci for a subset analysis based on the locus id number. Default is all loci.
stats | number(s) | Allows user to define which statistics to run for a particular subset analysis. The stats number is selected from the list of ccstat#'s. Default is all ccstat.
meta | number(s) | Allows user to define which meta statistics to run for a particular subset analysis. The meta number is selected from the list of metastat#'s. Default is all metastat.
model | text | Allows user to define a model for a subset analysis. The model name will be printed in the report for that analysis.
type | text | Allows user to specify the type of analysis, Genotype or Allele, for this subset of data. If type="Allele" is specified, a single allele code should be entered as the variable for the sub-element a, and each a corresponds to a locus. Default is type="Genotype".
 

Single locus at a time analysis approach

HapConstructor begins by considering single locus analyses, then constructs and tests haplotypes based upon the p-values generated. The single locus analyses are constructed in the same way as analyses using PedGenie. One requirement is to use the correct model name for each table built. The model names are: Dom, Rec, Additive, Allele.

Model | Wt = 0 | Wt = 1 | Wt = 2
Dominant | (1/1) | (1/2), (2/1), or (2/2) | -
Recessive | (1/1), (1/2), or (2/1) | (2/2) | -
Additive | (1/1) | (1/2) or (2/1) | (2/2)

The weights may be modified to be any integer value. For programming purposes, the code (1/.) indicates a genotype of allele 1 paired with any other value. Thus, for this biallelic model, the code (1/.) will pull (1/1) and (1/2) genotype data. Care must be taken to ensure that this file has no errors. Please see SingleLocus.rgen for the format of this file.
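
As a rough illustration of the weight table, the sketch below looks up the column weight of a biallelic genotype under each model. It is not taken from hapConstructor or SingleLocus.rgen: the names MODEL_COLUMNS and weight_for are hypothetical, the keys use the Dom, Rec, and Additive model names, and the codes are assumed to behave as regular expressions applied to a genotype string such as "1/2".

```python
import re

# Column definitions per model: (weight, genotype codes in that column).
MODEL_COLUMNS = {
    "Dom": [(0, ["(1/1)"]), (1, ["(1/2)", "(2/1)", "(2/2)"])],
    "Rec": [(0, ["(1/1)", "(1/2)", "(2/1)"]), (1, ["(2/2)"])],
    "Additive": [(0, ["(1/1)"]), (1, ["(1/2)", "(2/1)"]), (2, ["(2/2)"])],
}


def weight_for(model, genotype):
    """Return the weight of the first column whose codes match, else None."""
    for weight, codes in MODEL_COLUMNS[model]:
        if any(re.fullmatch(code, genotype) for code in codes):
            return weight
    return None  # genotype not covered by any column, e.g. missing data


if __name__ == "__main__":
    for model in MODEL_COLUMNS:
        print(model, [weight_for(model, g) for g in ("1/1", "1/2", "2/2")])
    # Under the regular-expression assumption, (1/.) covers 1/1 and 1/2.
    print([g for g in ("1/1", "1/2", "2/2") if re.fullmatch("(1/.)", g)])
```

Changing a weight in MODEL_COLUMNS corresponds to changing the wt value of the matching column in the parameter file.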
