Monday, February 9, 2015

NGS: Base Quality of Sequencing


Abstract: Base quality scores provide important information about sequencing quality.


Base quality is a measure of base calling accuracy, expressed as the Phred quality score (Q score). The Q score is logarithmically related to the base calling error probability P:

Q = -10 * log10(P)

Table 1: Q scores

Phred Quality Score    Error probability (P)     Accuracy (1 - Error)
10                     1/10 = 10%                90%
20                     1/100 = 1%                99%
30                     1/1000 = 0.1%             99.9%
40                     1/10000 = 0.01%           99.99%
50                     1/100000 = 0.001%         99.999%
60                     1/1000000 = 0.0001%       99.9999%
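
The relationship in Table 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function names phred_to_error and error_to_phred are made up for illustration and do not come from any sequencing toolkit.

```python
import math


def phred_to_error(q):
    """Error probability P for a Phred score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score Q for an error probability P: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Print the error and accuracy for the Q scores listed in Table 1
for q in (10, 20, 30, 40, 50, 60):
    p = phred_to_error(q)
    print(f"Q{q}: error = {p:.4%}, accuracy = {1 - p:.4%}")
```

Because the scale is logarithmic, every increase of 10 in Q corresponds to a tenfold drop in the error probability.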

Illumina sequencing by synthesis (SBS) technology uses four fluorescently labelled nucleotides to sequence billions of clusters on the flowcell surface. Base calls are made from signal intensity measurements during each sequencing cycle. A higher Q score indicates a higher probability that the base call is correct, and a lower Q score indicates a higher probability that it is incorrect (Figure 1).



Figure 1. Boxplot of Q scores across 50 sequencing cycles
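
A per-cycle boxplot such as Figure 1 is drawn from a five-number summary of the Q scores at each cycle. The sketch below shows one way to compute that summary, assuming the Q scores are already available as one list per read with one score per cycle. The function name per_cycle_summary and the toy scores are hypothetical and are not the data behind Figure 1.

```python
import statistics


def per_cycle_summary(reads_q):
    """Return (min, Q1, median, Q3, max) of the Q scores at each cycle.

    reads_q is a list of equal-length lists: one list per read,
    holding one Q score per sequencing cycle.
    """
    summaries = []
    for cycle_scores in zip(*reads_q):  # transpose: per-read -> per-cycle
        q1, med, q3 = statistics.quantiles(
            cycle_scores, n=4, method="inclusive"
        )
        summaries.append(
            (min(cycle_scores), q1, med, q3, max(cycle_scores))
        )
    return summaries


# Made-up Q scores for 5 reads x 4 cycles (illustrative only)
toy_reads = [
    [38, 37, 35, 30],
    [40, 38, 34, 28],
    [39, 36, 33, 25],
    [37, 37, 36, 31],
    [40, 39, 32, 20],
]

for cycle, summary in enumerate(per_cycle_summary(toy_reads), start=1):
    print(f"cycle {cycle}: min/Q1/median/Q3/max = {summary}")
```

In the toy data the median drops and the spread widens at the later cycles, which is the kind of pattern a per-cycle boxplot makes visible.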

Many factors influence Q scores, including the quality of the RNA libraries, the GC content of the sequences, and the number of sequencing cycles. A Q score of 30 (one expected error per 1,000 base calls) is usually considered the benchmark for acceptable quality in high-throughput sequencing (Figure 2).




Figure 2. Histogram of Q scores
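
The Q30 benchmark is often summarised as the fraction of bases with a Q score of 30 or higher, alongside a histogram like Figure 2. The sketch below computes both from a flat list of Q scores. The function names q_histogram and fraction_at_least, and the toy scores, are assumptions made for illustration only.

```python
from collections import Counter


def q_histogram(q_scores):
    """Count how many bases have each Q score."""
    return Counter(q_scores)


def fraction_at_least(q_scores, threshold=30):
    """Fraction of bases whose Q score is >= threshold."""
    if not q_scores:
        raise ValueError("q_scores must not be empty")
    return sum(q >= threshold for q in q_scores) / len(q_scores)


# Made-up Q scores for 12 bases (illustrative only)
toy_scores = [38, 37, 35, 30, 40, 38, 34, 28, 39, 36, 33, 25]

# Text histogram: one '#' per base with that Q score
for q, count in sorted(q_histogram(toy_scores).items()):
    print(f"Q{q}: {'#' * count}")

print(f"Fraction of bases >= Q30: {fraction_at_least(toy_scores):.1%}")
```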


Writing date: 2014.05.02, 2015.02.09
