Abstract:
3C-analyzer covers the full analytic workflow, from raw sequencing
data to detection of significant chromatin interactions, and provides
a user-friendly interface for data management and analysis.
Chromosome
conformation capture (3C) technology has been widely used to map
physical proximity between two genomic regions in the nucleus.
Initially reported by Dekker et al. in 2002, the 3C procedure
involves restriction enzyme digestion, inter-/intra-molecular
ligation of cross-linked chromatin, and quantification of ligation
frequencies between two genomic loci by quantitative polymerase chain
reaction (qPCR). The ligation frequency or cross-link frequency
reflects the probability of DNA contact between non-neighbouring
genomic regions and gives insight into chromosome topology.
Characterizing cross-link frequency with conventional 3C requires
prior knowledge of the interacting partners (one vs. one): because
qPCR is low-throughput, only interactions between two pre-selected
genomic loci can be tested. To overcome 3C-qPCR limitations, various 3C-derived
technologies have been developed to explore unknown interactions
across the whole genome, including 4C (chromosome conformation
capture-on-chip and circular chromosome conformation capture), 5C
(chromosome conformation capture carbon copy), Hi-C, Capture-C,
3C-MTS (3C-based multiple target sequencing), T2C (Targeted
Chromatin Capture), and Capture Hi-C. In terms of the ligation events
detectable in a single experiment, Hi-C can theoretically detect all
chromatin interactions (all vs. all), but because the chromatin
interactome is huge, inadequate sequencing depth often results in
loss of resolution or coverage.
4C-seq offers excellent resolution for genome-wide interactions, but
only one specific locus can be screened in a single experiment (one
vs. all). 3C-MTS, Capture-C, T2C and Capture Hi-C were developed more
recently to detect chromatin interactions between many genomic loci
and the rest of the genome (many vs. all). To date, no 3C-based
technology outperforms the others in all aspects of chromatin
interaction detection.
To facilitate 3C-seq data analysis, we developed 3C-analyzer, a
graphical user interface (GUI)-based package. A user manual is
included in the published 3C-analyzer package. Compared with previous
software packages, 3C-analyzer offers two distinguishing features: a
user-friendly experience and high-throughput processing.
User-friendly experience
3C-analyzer processes 3C-seq data in a user-friendly environment.
The package integrates all the pipelines required for 3C-MTS/Capture-C
and 4C-seq data analysis and covers the full 3C-seq workflow: raw
data processing, genome mapping, co-localization detection, and
significance analysis. All analytic work can be operated through
graphical user interfaces (GUIs) in 3C-analyzer. The GUI-based
pipeline is divided into three modules: 'Pre-processing',
'One-to-All', and 'Many-to-All' (Figure 1).
Figure 1: GUI of 3C-analyzer
After 'Pre-processing', users follow different modules depending on the 3C-seq technology used. The 'One-to-All' and 'Many-to-All' modules are used for 4C-seq and 3C-MTS/Capture-C, respectively. Figure 5 shows the layout of the sub-modules 'Lock Viewpoints', 'Trim FASTQ', 'Detect Co-localization', and 'Count Co-localization' in the 'Many-to-All' module. By following the arrows through the pipeline, users can complete all operations required for 3C-MTS data analysis. Another feature of 3C-analyzer is that only a few hands-on steps are required. 3C-analyzer automatically recognizes raw data and establishes the connections between the FASTQ files and the 3C libraries in the sub-module 'Sample management' (Figure 2A). All options related to the locations of viewpoints and the detection of co-localization are set in the sub-module 'Lock Viewpoints' (Figure 2B). 3C-analyzer provides multiple data outputs, including text files of RC and Tscore in CSV format and R data files of all statistical results in RData format.
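As a rough illustration of what linking FASTQ files to 3C libraries can involve, the Python sketch below groups paired-end file names by library. It is not 3C-analyzer's code: the naming convention (`<library>_R1.fastq.gz` / `<library>_R2.fastq.gz`) and the function name `group_fastq_by_library` are assumptions made only for this example.

```python
import re
from collections import defaultdict

# Assumed naming convention (hypothetical, not taken from 3C-analyzer):
#   <library>_R1.fastq[.gz] and <library>_R2.fastq[.gz], or the .fq equivalents
FASTQ_PATTERN = re.compile(
    r"^(?P<library>.+)_R(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$"
)


def group_fastq_by_library(filenames):
    """Map each library name to its R1/R2 FASTQ file names."""
    libraries = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not look like FASTQ reads
        libraries[match["library"]]["R" + match["mate"]] = name
    return dict(libraries)


if __name__ == "__main__":
    files = ["libA_R1.fastq.gz", "libA_R2.fastq.gz", "libB_R1.fq", "notes.txt"]
    for library, mates in group_fastq_by_library(files).items():
        print(library, mates)
```

A library that ends up with only one mate (such as `libB` in the usage example) is easy to flag for the user before any analysis is triggered.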
Figure 2: GUIs of sample management and probe design
High-throughput processing
3C-seq data analysis is usually computation-intensive, but the load varies with the complexity of the 3C-seq libraries, such as the numbers of 3C libraries, GWs and viewpoints, as well as the technology used and the scale of the reference genome (cis-/trans-interactions). 3C-analyzer can perform genome-scale 3C-seq analysis at high throughput. The pipelines impose no limit on the size of the raw data or the number of FASTQ files, as long as they stay within the capacity of the user's hardware. Our testing showed that the entire workflow can easily be set up before triggering 3C-seq data analysis, even with many sequencing datasets. To speed up computation, we applied multi-threading in 3C-analyzer. During parallel processing, 3C-analyzer first splits the raw data into multiple partitions and then uses multiple threads to process these partitions simultaneously. Analysis of the 3C-MTS data showed that computational time was reduced five-fold with 16 CPU cores (4 CPUs) compared with one CPU core. With 16 CPU cores, it took ~8 minutes per million read pairs per one hundred viewpoints on average. However, users are discouraged from increasing the number of threads on one computer without limit, because in our computational environment the I/O capacity of the hardware limited analytic speed beyond 4 cores per CPU at full system load.
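The split-then-process pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not 3C-analyzer's implementation: the per-partition task (counting reads that contain a bait sequence) and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_fastq_records(lines):
    """Group an iterable of FASTQ lines into 4-line records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break
        yield tuple(line.rstrip("\n") for line in record)


def partition(records, n_parts):
    """Deal records round-robin into n_parts lists."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def count_reads_with_bait(part, bait):
    """Hypothetical per-partition task: count reads containing a bait sequence."""
    return sum(1 for _, seq, _, _ in part if bait in seq)


def parallel_count(lines, bait, n_workers=4):
    """Split the records into partitions and process them on a thread pool."""
    parts = partition(read_fastq_records(lines), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(lambda p: count_reads_with_bait(p, bait), parts))


if __name__ == "__main__":
    demo = [
        "@r1", "ACGTAAGCTT", "+", "IIIIIIIIII",
        "@r2", "TTTT", "+", "IIII",
        "@r3", "AAGCTTGG", "+", "IIIIIIII",
    ]
    print(parallel_count(demo, "AAGCTT", n_workers=2))
```

The thread pool mirrors the multi-threading wording of the text. In CPython, purely CPU-bound work may scale better with `ProcessPoolExecutor`, and in either case the useful number of workers is bounded by I/O, consistent with the limit reported above.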
Figure 3: Multi-threading in 3C-analyzer
Writing date: 2014.11.20, 2015.02.06