Abstract: Base quality provides important information about sequencing quality.
Base quality is expressed as base-calling accuracy, measured by the Phred quality score (Q score). The Q score is logarithmically related to the probability P that a base call is incorrect:

$$Q = -10 \log_{10} P$$
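Because the relationship is logarithmic, each +10 in Q means a tenfold lower error probability. Below is a minimal sketch that converts in both directions by rearranging the formula to P = 10^(-Q/10). The function names are hypothetical helpers for illustration, not part of any sequencing toolkit.

```python
import math

def phred_to_error(q: float) -> float:
    """Error probability P for a Phred score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Phred score Q for an error probability P, where 0 < P <= 1."""
    if not 0 < p <= 1:
        raise ValueError("P must be in (0, 1]")
    return -10 * math.log10(p)

print(phred_to_error(30))    # about 0.001, i.e. 1 error in 1000 calls
print(error_to_phred(0.01))  # about 20
```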
Table 1: Q scores

| Phred Quality Score | Error probability (P) | Accuracy (1 - P) |
|---|---|---|
| 10 | 1/10 = 10% | 90% |
| 20 | 1/100 = 1% | 99% |
| 30 | 1/1000 = 0.1% | 99.9% |
| 40 | 1/10000 = 0.01% | 99.99% |
| 50 | 1/100000 = 0.001% | 99.999% |
| 60 | 1/1000000 = 0.0001% | 99.9999% |
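Every row of Table 1 follows directly from the formula. Here is a minimal sketch that regenerates the table's rows. The function name `q_table` and its default score list are assumptions chosen to mirror the table.

```python
def q_table(scores=(10, 20, 30, 40, 50, 60)):
    """Return (Q, error probability, accuracy) rows using P = 10 ** (-Q / 10)."""
    rows = []
    for q in scores:
        p = 10 ** (-q / 10)  # invert Q = -10 * log10(P)
        rows.append((q, p, 1 - p))
    return rows

for q, p, acc in q_table():
    # round(1 / p) recovers the "1/10", "1/100", ... form used in Table 1;
    # :g trims trailing zeros so the percentages print compactly
    print(f"{q} | 1/{round(1 / p)} = {p * 100:g}% | {acc * 100:g}%")
```

Printing percentages with `:g` should give rows in the same form as Table 1, without long runs of floating-point digits.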
Illumina sequencing by synthesis (SBS) technology uses four fluorescently labelled nucleotides to sequence billions of clusters on the flow cell surface. Base calls are made from signal intensity measurements during each sequencing cycle. A higher Q score indicates a higher probability that a base call is correct, and a lower Q score indicates a higher probability that it is incorrect (Figure 1).
Figure 1. Boxplot of Q scores across 50 sequencing cycles
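A per-cycle boxplot like Figure 1 is built from the Q scores observed at each base position across all reads. Here is a minimal sketch that computes those per-cycle summaries from FASTQ quality strings. It assumes the Phred+33 encoding used by Illumina 1.8 and later, which the post does not state. The function names and the three toy reads are hypothetical and are not the data behind Figure 1.

```python
import statistics

def decode_phred33(qual: str) -> list[int]:
    """Convert a FASTQ quality string to Q scores, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual]

def per_cycle_summary(quality_strings: list[str]) -> list[dict]:
    """Five-number summary of Q scores at each cycle (base position)."""
    if not quality_strings:
        return []
    decoded = [decode_phred33(q) for q in quality_strings]
    n_cycles = max(len(q) for q in decoded)
    summary = []
    for cycle in range(n_cycles):
        # Reads shorter than this cycle contribute nothing at this position
        scores = [q[cycle] for q in decoded if len(q) > cycle]
        if len(scores) >= 2:
            q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        else:
            q1 = med = q3 = scores[0]
        summary.append({"cycle": cycle + 1, "min": min(scores), "q1": q1,
                        "median": med, "q3": q3, "max": max(scores)})
    return summary

reads = ["IIIIH", "IIHG?", "IHHF5"]  # hypothetical toy quality strings
for row in per_cycle_summary(reads):
    print(row)
```

Quality commonly declines toward later cycles, which the toy reads imitate in their last position.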
Many factors affect Q scores, including the quality of the RNA library, the GC content of the sequences, and the number of sequencing cycles. A Q score of 30 (Q30, i.e. 99.9% base-call accuracy) is commonly considered the benchmark for acceptable quality in high-throughput sequencing (Figure 2).
Figure 2. Histogram of Q scores
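The Q30 benchmark is often reported as the percentage of bases with a Q score of 30 or higher. Here is a minimal sketch that builds a Q-score histogram like Figure 2 and computes that fraction. It assumes Phred+33 quality strings. The helper names and toy reads are hypothetical.

```python
from collections import Counter

def q_histogram(quality_strings, offset=33):
    """Count how many bases have each Q score (Phred+33 offset by default)."""
    counts = Counter()
    for qual in quality_strings:
        counts.update(ord(c) - offset for c in qual)
    return counts

def fraction_at_least(counts, threshold=30):
    """Fraction of bases whose Q score is >= threshold (e.g. the Q30 benchmark)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for q, n in counts.items() if q >= threshold) / total

reads = ["IIIIH", "IIHG?", "IHHF5"]  # hypothetical toy quality strings
hist = q_histogram(reads)
for q in sorted(hist):
    print(f"Q{q:>2} {'#' * hist[q]}")  # text histogram, one '#' per base
print(f"{fraction_at_least(hist, 30):.1%} of bases are at or above Q30")
```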
Writing dates: 2014.05.02, 2015.02.09