Tiezheng Yuan Ph.D.: software: 4C-seq analysis using 3C-analyzer

Abstract: The 'One-to-All' pipeline in 3C-analyzer is used for 4C-seq data analysis.

This pipeline includes four steps: Viewpoints locking, FASTQ (adapter) trimming, Co-localization detection, and Co-localization counting (Figure 1). Click the Run button to start the pipeline once all related parameters have been set.

Figure 1. GUI of 'One-to-All'.

Viewpoints locking
The parameters of the 'Viewpoints locking' module are set in this window (Figure 2). This module lets users quickly obtain the chromosome positions of viewpoints, which are determined by the chromosome positions of the primary enzyme sites you select.

Genome sequences in FASTA: Select the reference genome sequences in FASTA format.

4C-library: In 4C-seq, each 4C library corresponds to one viewpoint, so users select 4C libraries and set their viewpoints one by one.
Upstream/downstream: selects the upstream or downstream sequence flanking the primary enzyme site.
Chromosome: the chromosome of the genomic region of interest.
Start of genomic region: the start position (bp) of the genomic region of interest.
End of genomic region: the end position (bp) of the genomic region of interest.
Length of viewpoints: the length of each viewpoint. Larger values increase the number of co-localized regions counted, but viewpoints should not overlap one another when a genomic region contains multiple viewpoints.

Figure 2. GUI of 'Viewpoints locking' of the module 'One-to-All'.
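The viewpoint locking step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not 3C-analyzer's own code: the helper names (read_fasta, lock_viewpoints, Viewpoint) are hypothetical, coordinates are 0-based and half-open, the primary enzyme site is given as a recognition sequence such as GATC (the DpnII site, used only as an example), and a viewpoint is assumed to be the `length` bases immediately upstream or downstream of each site, excluding the site itself.

```python
from dataclasses import dataclass
from typing import Dict, List


def read_fasta(path: str) -> Dict[str, str]:
    """Load a FASTA file into {sequence name: uppercase sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the chromosome name.
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line.upper())
    return {chrom: "".join(parts) for chrom, parts in chunks.items()}


@dataclass
class Viewpoint:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    seq: str


def lock_viewpoints(genome: Dict[str, str], chrom: str, region_start: int,
                    region_end: int, site: str, length: int,
                    side: str = "downstream") -> List[Viewpoint]:
    """Return one viewpoint per enzyme site lying inside [region_start, region_end)."""
    seq = genome[chrom]
    site = site.upper()
    viewpoints: List[Viewpoint] = []
    pos = seq.find(site, region_start)
    while pos != -1 and pos + len(site) <= region_end:
        if side == "downstream":
            vp_start, vp_end = pos + len(site), pos + len(site) + length
        else:  # "upstream"
            vp_start, vp_end = pos - length, pos
        # Skip viewpoints that would run off either end of the chromosome.
        if vp_start >= 0 and vp_end <= len(seq):
            viewpoints.append(Viewpoint(chrom, vp_start, vp_end, seq[vp_start:vp_end]))
        pos = seq.find(site, pos + 1)
    # Viewpoints come out in genomic order, so checking neighbours finds any overlap.
    for prev, cur in zip(viewpoints, viewpoints[1:]):
        if cur.start < prev.end:
            raise ValueError(
                f"overlapping viewpoints at {chrom}:{cur.start}-{prev.end}; use a shorter length")
    return viewpoints
```

For example, lock_viewpoints(read_fasta("genome.fa"), "chr1", 1000000, 1200000, "GATC", 50) would list the 50-bp viewpoints downstream of every GATC site in that window; the file name and coordinates are placeholders. Raising an error on overlap mirrors the advice above that viewpoints in one region should not overlap.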

3C-analyzer generates a file named site_info.csv, as shown in Figure 3.

Figure 3. Viewpoints required for 4C-seq analysis

Adapter trimming

The parameters of the adapter trimming pipeline are set in this window (Figure 4).

Adapter sequences: The default adapter matches the adapters of Illumina's sequencing kits.
Adapter matching: 3C-analyzer identifies adapter sequences in reads either by an exact match of 8 bases or by a 12-base match allowing at most one mismatched base.
Minimum query length: Reads shorter than the minimum query length after adapter removal are excluded from sequence alignment.

Figure 4. GUI of adapter trimming.
Tip: We observed that fewer than 5 percent of raw reads contain adapter sequences that need trimming, so users may skip this step to speed up the analysis.
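The adapter matching rule can be sketched as follows. It is a minimal illustration, not 3C-analyzer's implementation: we assume that both the 8-base exact match and the 12-base match with at most one mismatch compare against the 5' end of the adapter, that a read is cut at the first position where either rule succeeds, and that the default adapter is AGATCGGAAGAGC, the common prefix of Illumina TruSeq adapters (the post does not give the sequence). The function names and the min_len=20 default are hypothetical.

```python
from typing import Optional

# Assumed default: common 5' prefix of Illumina TruSeq adapters.
ADAPTER = "AGATCGGAAGAGC"


def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter(read: str, adapter: str = ADAPTER, exact_len: int = 8,
                 fuzzy_len: int = 12, max_mismatch: int = 1) -> int:
    """Return the index where the adapter starts in `read`, or -1 if absent."""
    exact_seed = adapter[:exact_len]
    fuzzy_seed = adapter[:fuzzy_len]
    for i in range(len(read) - exact_len + 1):
        # Rule 1: exact match of the first `exact_len` adapter bases.
        if read[i:i + exact_len] == exact_seed:
            return i
        # Rule 2: match of the first `fuzzy_len` bases with <= max_mismatch errors.
        window = read[i:i + fuzzy_len]
        if len(window) == fuzzy_len and mismatches(window, fuzzy_seed) <= max_mismatch:
            return i
    return -1


def trim_read(read: str, adapter: str = ADAPTER, min_len: int = 20) -> Optional[str]:
    """Cut the read at the adapter; return None if it ends up shorter than min_len."""
    pos = find_adapter(read, adapter)
    trimmed = read if pos == -1 else read[:pos]
    return trimmed if len(trimmed) >= min_len else None
```

Returning None for short reads corresponds to the "Minimum query length" option: those reads are simply not passed on to alignment.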

Co-localization detection

This module requires a third-party aligner (Bowtie2 by default). The Bowtie2 options integrated into 3C-analyzer are set in this window (Figure 5).

Figure 5. GUI of genome mapping.
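The post does not list which Bowtie2 options 3C-analyzer exposes, so the following is only a sketch of how a single-end 4C-seq FASTQ file could be handed to Bowtie2 from Python. It uses standard Bowtie2 flags (-x index prefix, -U unpaired reads, -S SAM output, -p threads, and an end-to-end preset such as --sensitive); the function names and default values are hypothetical, and the index is assumed to have been built beforehand with bowtie2-build.

```python
import shutil
import subprocess
from typing import List


def bowtie2_command(index_prefix: str, fastq: str, sam_out: str,
                    threads: int = 4, preset: str = "--sensitive") -> List[str]:
    """Build a Bowtie2 command for single-end (unpaired) 4C-seq reads."""
    return [
        "bowtie2",
        preset,              # e.g. --very-fast, --fast, --sensitive, --very-sensitive
        "-p", str(threads),  # number of alignment threads
        "-x", index_prefix,  # index prefix built with bowtie2-build
        "-U", fastq,         # unpaired reads in FASTQ format
        "-S", sam_out,       # SAM output file
    ]


def run_bowtie2(cmd: List[str]) -> None:
    """Run the command, failing early if the aligner is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before a long alignment run.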

Co-localization counting

Quantification is performed in this step (Figure 6).

Reads background: Any sequence whose read count is below the reads background is marked as not detected and ignored.
Level of probability: The probability is calculated with the exponential distribution density function.

Figure 6. GUI of co-localization counting.
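One possible reading of the counting step is sketched below; it is an assumption, not the documented 3C-analyzer algorithm. Reads are counted per region, regions with fewer reads than the reads background are dropped as not detected, an exponential distribution with density f(x) = λ·exp(-λx) is fitted to the remaining counts by maximum likelihood (λ = 1 / mean count), and each region is scored by the upper-tail probability P(X ≥ x) = exp(-λx), which is compared against the level of probability. The function names and defaults (background=2, level=0.05) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_reads(region_ids: Iterable[str]) -> Counter:
    """Count reads per region, given one region ID per aligned read."""
    return Counter(region_ids)


def call_colocalized(counts: Dict[str, int], background: int = 2,
                     level: float = 0.05) -> Dict[str, Tuple[int, float, bool]]:
    """Return {region: (count, probability, significant)} for detected regions."""
    # Regions below the reads background are treated as not detected and dropped.
    detected = {region: c for region, c in counts.items() if c >= background}
    if not detected:
        return {}
    # Maximum-likelihood rate of an exponential fitted to the detected counts.
    lam = len(detected) / sum(detected.values())
    result = {}
    for region, c in detected.items():
        # Upper-tail probability: integral of lam * exp(-lam * x) from c to infinity.
        p = math.exp(-lam * c)
        result[region] = (c, p, p <= level)
    return result
```

Under this reading, a region with a read count far above the typical detected count gets a small tail probability and is reported as co-localized with the viewpoint.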

Writing date: 2014.11.10, 2015.02.06

Friday, February 6, 2015
