Phred quality score



DNA sequencing is a molecular biology technique that involves labeling the DNA with fluorescent dyes, separating the DNA fragments by electrophoresis, and measuring the intensity of the fluorescence to determine the order of the nucleotides in a strand of DNA. When sequencing is performed by electrophoretic techniques, a trace file is obtained. The sequencing machine that generates the trace file infers ("calls") the DNA sequence from the trace.

Automated DNA sequencing techniques have revolutionized the field of molecular biology by generating vast amounts of DNA sequence data. However, high-throughput sequence data is produced at a significantly higher rate than it can be processed, creating a bottleneck. Removing the bottleneck requires both more accurate data processing software and reliable measures of that accuracy. Many software programs have been developed to meet this need. One such program is Phred.

Phred is a base-calling program for automated sequencer traces [1]. In the data sets examined, Phred has been reported to make significantly fewer errors than other methods, averaging 40%-50% fewer errors. Phred quality scores have become widely accepted as a way to characterize the quality of DNA sequences, and can be used to compare the efficacy of different sequencing methods.

History

Phred was originally developed in the late 1990s by Dr. Phil Green and Dr. Brent Ewing for the Human Genome Project, where large amounts of sequence data were processed by automated scripts. Phred is currently the most widely used base-calling program in both academic and commercial DNA sequencing laboratories because of its high base-calling accuracy [2].

Methods

Phred uses a four-phase procedure, as outlined by Ewing et al. [3], to determine a sequence of base calls from the processed DNA sequence trace:

1. Predicted peak locations are determined. This phase assumes that fragments are, on average, relatively evenly spaced in most regions of the gel, and it establishes the correct number of bases and their idealized, evenly spaced locations in regions where the peaks are not well resolved, noisy, or displaced (as in compressions).
2. Observed peaks are identified in the trace.
3. Observed peaks are matched to the predicted peak locations, omitting some peaks and splitting others. Because each observed peak comes from a specific array and is thus associated with 1 of the 4 bases (A, G, T, or C), the ordered list of matched observed peaks determines a base sequence for the trace.
4. The unmatched observed peaks are checked for any peak that appears to represent a base but could not be assigned to a predicted peak in the third phase. If such a peak is found, the corresponding base is inserted into the read sequence.

The entire procedure is rapid, usually taking less than half a second per trace.
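The matching idea behind phases 1 and 3 can be illustrated with a toy sketch. This is not Phred's actual algorithm: the function names, the fixed spacing, the `tolerance` parameter, and the use of "N" for an unfilled location are all assumptions made for illustration, and phase 4 is reduced to simply returning the leftover peaks.

```python
from typing import List, Tuple

Peak = Tuple[float, str]  # (position in the trace, base letter) - hypothetical representation


def predict_locations(start: float, spacing: float, count: int) -> List[float]:
    """Phase 1, simplified: idealized, evenly spaced peak locations."""
    return [start + i * spacing for i in range(count)]


def match_peaks(predicted: List[float], observed: List[Peak],
                tolerance: float) -> Tuple[str, List[Peak]]:
    """Phase 3, simplified: give each predicted location the nearest
    still-unused observed peak within `tolerance`.

    Returns the called sequence and the observed peaks left unmatched
    (the peaks a phase-4 style check would then inspect).
    """
    unused = list(observed)
    calls = []
    for loc in predicted:
        candidates = [p for p in unused if abs(p[0] - loc) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda p: abs(p[0] - loc))
            unused.remove(best)
            calls.append(best[1])
        else:
            calls.append("N")  # no observed peak close enough (assumed placeholder)
    return "".join(calls), unused


predicted = predict_locations(start=10.0, spacing=10.0, count=4)
observed = [(9.5, "A"), (21.0, "C"), (24.0, "G"), (30.2, "T"), (41.0, "G")]
sequence, leftover = match_peaks(predicted, observed, tolerance=3.0)
print(sequence, leftover)
```

In this made-up input, the peak at position 24.0 lies too far from any predicted location and is left unmatched, which is the kind of peak the fourth phase would re-examine.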

Applications

Phred is often used together with Phrap, a program for DNA sequence assembly. Phrap was routinely used in some of the largest sequencing efforts of the Human Genome Project and is currently one of the most widely used DNA sequence assembly programs in the biotech industry. Phrap uses Phred quality scores to determine highly accurate consensus sequences and to estimate the quality of those consensus sequences. Phrap also uses Phred quality scores to estimate whether discrepancies between two overlapping sequences are more likely to arise from random errors or from different copies of a repeated sequence.
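To give a feel for how quality scores can inform a consensus, here is a deliberately simplified sketch. It is not Phrap's algorithm: it assumes the reads are already aligned and of equal length, and the function name `weighted_consensus` and the rule of summing quality scores per base are hypothetical choices for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Read = Tuple[str, List[int]]  # (aligned sequence, per-base quality scores)


def weighted_consensus(reads: List[Read]) -> str:
    """Pick, at each column, the base with the largest summed quality score.

    Assumes all reads are pre-aligned and have the same length.
    """
    length = len(reads[0][0])
    consensus = []
    for i in range(length):
        support: Dict[str, int] = defaultdict(int)
        for seq, quals in reads:
            support[seq[i]] += quals[i]
        consensus.append(max(support, key=support.get))
    return "".join(consensus)


reads = [
    ("ACGT", [30, 30, 10, 30]),
    ("ACTT", [30, 30, 40, 30]),
    ("ACGT", [20, 20, 15, 20]),
]
print(weighted_consensus(reads))
```

In the third column, two reads call G with low scores (10 and 15) and one read calls T with a score of 40, so the quality-weighted vote picks T even though G has the majority of reads.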

Reliability

Phred quality scores are logarithmically related to error probabilities: a quality score Q corresponds to an error probability P through Q = -10 log10(P). For example, if Phred assigns a quality score of 30 to a base, the chance that this base is called incorrectly is 1 in 1000. A common way to summarize the quality of a sequence is to count the bases with a quality score of 20 or above. The high accuracy of Phred quality scores makes them well suited to assessing the quality of sequences.
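Counting bases at or above a score threshold is straightforward to sketch. The function names and the sample scores below are made up for illustration; the default threshold of 20 follows the text above.

```python
from typing import Sequence


def count_at_or_above(qualities: Sequence[int], threshold: int = 20) -> int:
    """Number of bases whose quality score is at least `threshold`."""
    return sum(1 for q in qualities if q >= threshold)


def fraction_at_or_above(qualities: Sequence[int], threshold: int = 20) -> float:
    """Fraction of bases at or above `threshold`; 0.0 for an empty read."""
    if not qualities:
        return 0.0
    return count_at_or_above(qualities, threshold) / len(qualities)


read_qualities = [10, 20, 30, 40, 15]  # hypothetical per-base scores
print(count_at_or_above(read_qualities), fraction_at_or_above(read_qualities))
```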

Table 1: Phred quality scores are logarithmically linked to error probabilities.

Phred quality score    Probability of incorrect base call    Base call accuracy
10                     1 in 10                               90%
20                     1 in 100                              99%
30                     1 in 1,000                            99.9%
40                     1 in 10,000                           99.99%
50                     1 in 100,000                          99.999%
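The rows of Table 1 follow from the standard logarithmic definition Q = -10 log10(P), where P is the probability of an incorrect base call. The short sketch below converts in both directions; the function names are illustrative and not part of Phred itself.

```python
import math


def error_probability(q: float) -> float:
    """Error probability P for a Phred quality score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_score(p: float) -> float:
    """Phred quality score Q for an error probability P: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40, 50):
    p = error_probability(q)
    # Print the score, the "1 in N" odds, and the accuracy as a percentage.
    print(q, f"1 in {round(1 / p)}", f"{(1 - p) * 100:.3f}%")
```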

References

1. Ewing B, Green P. (1998). Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Research 8:186-194.

2. Richterich P. (1998). Estimation of Errors in “Raw” DNA Sequences: A Validation Study. Genome Research 8(3):251-259.

3. Ewing B, Hillier L, Wendl MC, Green P. (1998). Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Research 8:175-185.

 
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Phred_quality_score". A list of authors is available in Wikipedia.