Updated 06/22/2007

GENOTYPE DATA QUALITY


The Genotyping Group makes recommendations on how to deal with various genotyping quality issues.  Several measures are being taken to ensure that the GAIN data are of high quality.

 

1.  Initial HapMap Samples:  Both genotyping centers (Perlegen Sciences Inc. and the Broad Institute) genotyped the 270 HapMap samples using their own platforms and SNP sets.  Both datasets are publicly available.  These genotype data provide information on the platforms and allow the data produced for each disease study to be integrated with related studies that use other platforms.


            Click Here for the Perlegen HapMap Genotype QC data

 

            Click Here for the Broad HapMap Genotype QC data

 

2.  QA Samples:  For each study, QA samples will be genotyped in addition to the study samples.  These samples will include standard HapMap samples already genotyped for 4 million SNPs, duplicate study samples, and (when available) the parents of some study samples.  These samples will provide information on data quality for each study, including information on the completeness rates of samples heterozygous or homozygous for each SNP, and confirmation of Mendelian inheritance of variants. 
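As an illustration of the Mendelian inheritance check mentioned above, the following is a minimal sketch in Python and not the GAIN pipeline itself. It assumes autosomal, biallelic SNPs, genotypes written as two-character allele strings such as "AG", and None for a missing call; the function names are hypothetical.

```python
def mendel_consistent(father, mother, child):
    """Return True if the child's genotype is compatible with the parents.

    Each genotype is a two-character allele string such as "AG".
    Assumption: autosomal SNP, all three genotypes are called.
    """
    a, b = child
    # The child must receive one allele from each parent, in either order.
    return (a in father and b in mother) or (a in mother and b in father)


def mendel_error_rate(trio_calls):
    """Fraction of testable SNPs in one trio that violate Mendelian inheritance.

    trio_calls is an iterable of (father, mother, child) genotypes, one per SNP.
    SNPs with any missing call (None) are skipped rather than counted as errors.
    Returns None when no SNP could be tested.
    """
    tested = 0
    errors = 0
    for father, mother, child in trio_calls:
        if None in (father, mother, child):
            continue
        tested += 1
        if not mendel_consistent(father, mother, child):
            errors += 1
    return errors / tested if tested else None
```

A high error rate concentrated in one trio would point to a sample problem, while errors concentrated at one SNP across trios would point to a problem with that SNP.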

 

For studies in which all or a substantial number of the samples are mother-father-child trios, the QA samples on each plate of 96 samples depend on the genotyping center.  For studies genotyped by the Broad center, each plate will include one duplicate of a study sample (a different study sample on each plate) and one standard HapMap sample.  For studies genotyped by the Perlegen center, half the plates will have a study duplicate and the other half will have a HapMap sample.  The HapMap sample will be chosen from a standard set of HapMap trios, and may differ among plates.

 

For studies without a substantial number of trios, each plate will include the two parents of a study sample to form a trio (with a different trio on each plate).  In addition, each plate will include one duplicate of a study sample and one standard HapMap sample (Broad), or one study duplicate on half the plates and one standard HapMap sample on the other half (Perlegen).
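The per-plate QA composition described above can be summarized in a short Python sketch. This is only an illustration: the function name and well labels are hypothetical, and the rule that Perlegen plates alternate between a duplicate and a HapMap sample is an assumption, since the text says only that each appears on half the plates.

```python
PLATE_SIZE = 96  # samples per plate, as stated above


def qa_wells(center, trio_study, plate_index):
    """List the QA samples placed on one plate.

    center      -- "Broad" or "Perlegen"
    trio_study  -- True if the study has a substantial number of trios
    plate_index -- 0-based plate number (used only for the Perlegen split)
    """
    wells = []
    if not trio_study:
        # Two parents of one study sample are added to form a trio.
        wells += ["parent", "parent"]
    if center == "Broad":
        wells += ["study duplicate", "HapMap"]
    elif center == "Perlegen":
        # Assumption: alternate plates to get a half-and-half split.
        wells.append("study duplicate" if plate_index % 2 == 0 else "HapMap")
    else:
        raise ValueError("unknown genotyping center: " + center)
    return wells


def study_wells(center, trio_study, plate_index):
    """Number of wells left for ordinary study samples on the plate."""
    return PLATE_SIZE - len(qa_wells(center, trio_study, plate_index))
```

Under these assumptions, a Broad plate in a study without trios carries four QA samples, leaving 92 wells for study samples.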

 

3.  QC for Genotyping:  The samples will be genotyped in a way that maximizes data quality for interpretation of the association results.  For example, case and control samples will be placed on the same plates and genotyped at the same time.  Plate layouts will differ to catch any sample mix-ups.

 

4.  Data Released:  For each study, the data released will include the genotype calls with quality measures, genotype cluster data, and the cel files.  All the genotype data produced will be released.  Data that are considered bad will be flagged, but will still be available.  These bad data are useful to calibrate platforms and calling algorithms, and to search for real phenomena such as Hardy-Weinberg deviations or polymorphic insertions or deletions causing Mendelian inconsistencies.

 

5.  Data QA Pipeline:  When a genotyping vendor has genotyped the samples, it will send the data to NCBI to be put through a data quality assessment pipeline, which will provide information on the completeness and quality of the genotyping data.  This pipeline is being developed by Gonçalo Abecasis and implemented at NCBI.  Any issues that arise after the data are run through the pipeline will be resolved between the study principal investigator, the genotyping vendor, and NCBI.  Once all issues have been resolved, the genotype and phenotype data will be released.

 

6.  Genotype Data Quality Standards: Prior to genotyping a study set of samples, each genotyping vendor will perform a quality check on the DNA samples to ensure that they are suitable for genotyping.  If any DNA samples fail to meet the requirements for quantity, concentration, and quality at this stage, sample replacement will be worked out between the study principal investigator and the genotyping vendor.  When the production genotyping has been done, bad samples will be re-genotyped once.

 

The genotype data should meet and hopefully will exceed these quality standards:

 

Remove samples with fewer than 80% of the SNPs called.

 

Of the > 480k SNPs for Perlegen and the 500k SNPs for Broad, at least 90% will be good:

•  any SNPs out of HW will not count as good (where “out of HW” means more significant than p = 0.001 for 2000 samples, but the p value will be adjusted for larger sample sizes that can produce statistically significant but not meaningful HW deviations);

•  the call rate minimum per SNP = 90% and the average across SNPs > 97%;

•  for HapMap QA samples the average call rates for heterozygotes and homozygotes are both > 97%; and

•  the concordance rate in duplicates is > 99.5%.
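The thresholds above can be expressed as a few small checks. The Python sketch below is one reading of the standards, not the NCBI pipeline. It assumes genotypes are coded as "AA", "AB", "BB", or None for a no-call, and it uses a chi-square goodness-of-fit test for Hardy-Weinberg, which is an assumption because the text does not name a test. The sample-size adjustment of the p value is not modeled, and all function names are hypothetical.

```python
import math

MIN_SAMPLE_CALL_RATE = 0.80  # samples with fewer SNPs called are removed
MIN_SNP_CALL_RATE = 0.90     # minimum call rate per SNP
HWE_P_THRESHOLD = 0.001      # stated for 2000 samples; no adjustment applied here
MIN_CONCORDANCE = 0.995      # duplicate concordance must exceed this


def call_rate(calls):
    """Fraction of non-missing calls in a list (None marks a no-call)."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) p value for deviation from Hardy-Weinberg."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def concordance(calls_a, calls_b):
    """Fraction of SNPs called in both duplicates that have the same genotype."""
    pairs = [(x, y) for x, y in zip(calls_a, calls_b)
             if x is not None and y is not None]
    if not pairs:
        return None
    return sum(x == y for x, y in pairs) / len(pairs)


def snp_is_good(calls):
    """Apply the per-SNP call rate and Hardy-Weinberg criteria to one SNP."""
    counts = [calls.count(g) for g in ("AA", "AB", "BB")]
    return (call_rate(calls) >= MIN_SNP_CALL_RATE
            and hwe_p_value(*counts) >= HWE_P_THRESHOLD)
```

For example, call_rate applied to one sample's calls across all SNPs can be compared with MIN_SAMPLE_CALL_RATE, and the fraction of SNPs for which snp_is_good returns True can be compared with the 90% target. The chi-square approximation is unreliable for rare alleles; an exact test may be preferable there.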

 

 

 




©2003-2008 Foundation for the National Institutes of Health. All Rights Reserved.