Friday, February 6, 2015

Software: mRNA-seq analysis using eRNA

Abstract: eRNA presents a bioinformatics pipeline for mRNA-seq analysis.

The mRNA identification pipeline is divided into four steps: reference selection, genome mapping, transcript assembly, and differential expression. In this module, all parameters are set in batch through GUIs, step by step, following the arrows. Except for the selection of RNA samples, which is provided by eRNA, all parameters shown in the GUI windows are the same as those in TopHat and Cufflinks, because eRNA hands the work of mRNA identification over to these tools.

      Reference selection

A FASTA file of genome sequences and a GTF file of genome annotation must both be selected for mRNA identification (Fig. 1).
Fig. 1 GUI of reference selection.
Tips: Names of reference files must be consistent across file formats. For the human genome, for example, the file of genome sequences in FASTA format is human_NCBI_build372.fa, the file of genome annotation in GTF format is human_NCBI_build372.gtf, and the index name for Bowtie alignment is human_NCBI_build372.
Tips: References can be downloaded directly from http://bowtie-bio.sourceforge.net/index.shtml, where all required files are available. Alternatively, you can download references from other sources, such as the NCBI FTP sites, and build the FASTA file and Bowtie index yourself.
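The naming rule from the first tip can be checked before a run starts. The following is only a minimal sketch: the helper names `reference_paths` and `missing_references` and the idea of keeping all references in one directory are assumptions made for illustration, not part of eRNA.

```python
from pathlib import Path


def reference_paths(ref_dir, base_name):
    """Return the names that must share `base_name` (hypothetical helper)."""
    ref_dir = Path(ref_dir)
    return {
        "fasta": ref_dir / f"{base_name}.fa",   # genome sequences
        "gtf": ref_dir / f"{base_name}.gtf",    # genome annotation
        "bowtie_index": ref_dir / base_name,    # index prefix, no extension
    }


def missing_references(ref_dir, base_name):
    """List the FASTA/GTF files that are absent from `ref_dir`.

    The Bowtie index is skipped because it is a prefix shared by
    several index files rather than a single file.
    """
    paths = reference_paths(ref_dir, base_name)
    return [str(p) for key, p in paths.items()
            if key != "bowtie_index" and not p.exists()]
```

Calling `missing_references("refs", "human_NCBI_build372")` would return the paths that still need to be downloaded or renamed; an empty list means the FASTA and GTF names agree with the index name.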

      Genome mapping

The options for reference genome mapping are TopHat options (Fig. 2).
  • Reference genome: Both the genome sequences in FASTA format and the genome annotation in GTF format are required.
  • Number of threads: A TopHat option. Using multiple threads speeds up the analysis.
  • Select sample names: Only the selected samples will be analyzed further.
  • Other options: These are the same as the options in TopHat. See the TopHat web site for details (http://tophat.cbcb.umd.edu/).
Fig. 2 GUI of reference genome mapping.
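For readers who want to see how these GUI options relate to a TopHat call, here is a rough sketch that assembles the command line as an argument list. The function name `tophat_command` and the example paths are hypothetical; how eRNA builds its own call internally is not described here, so treat this as an illustration only. The flags `-p` (threads), `-G` (GTF annotation) and `-o` (output directory) are standard TopHat options.

```python
def tophat_command(index_prefix, gtf_file, reads, out_dir, threads=4):
    """Assemble a TopHat command line as a list (nothing is executed).

    `reads` holds one file for single-end data or two files
    (left, right) for paired-end data.
    """
    return [
        "tophat",
        "-p", str(threads),    # number of threads
        "-G", str(gtf_file),   # genome annotation in GTF format
        "-o", str(out_dir),    # output directory for this sample
        str(index_prefix),     # Bowtie index name (prefix)
        *[str(r) for r in reads],
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where TopHat is installed. Building a list rather than a single string avoids shell quoting problems with file names.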

      Transcripts assembling

The options for transcript assembly are Cufflinks options (Fig. 3).
  • Reference genome: Both reference files are inherited from the reference genome mapping step.
  • Number of threads: A Cufflinks option. Using multiple threads speeds up the analysis.
  • Other options: These are the same as the options in Cufflinks. See the Cufflinks web site for details (http://cufflinks.cbcb.umd.edu/).
Fig. 3 GUI of transcripts assembling.
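A comparable sketch for the assembly step is shown below. It assumes that the input is the `accepted_hits.bam` file that TopHat writes to its output directory, and that the annotation is passed as an assembly guide with `-g`; whether eRNA uses `-g` or `-G` is not stated here, so that choice is an assumption. The function name `cufflinks_command` is hypothetical.

```python
def cufflinks_command(bam_file, gtf_file, out_dir, threads=4):
    """Assemble a Cufflinks command line as a list (nothing is executed)."""
    return [
        "cufflinks",
        "-p", str(threads),    # number of threads
        "-g", str(gtf_file),   # annotation used as an assembly guide (assumed)
        "-o", str(out_dir),    # output directory for this sample
        str(bam_file),         # alignments produced by TopHat
    ]
```

For one sample this might be called as `cufflinks_command("out/s1/accepted_hits.bam", "refs/human_NCBI_build372.gtf", "asm/s1")`.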

      Differential expression

The options for differential expression are Cuffdiff options; Cuffdiff is one of the programs in the Cufflinks package (Fig. 4).
  • Reference genome: Both reference files are inherited from the reference genome mapping step.
  • Number of threads: A Cuffdiff option. Using multiple threads speeds up the analysis.
  • Other options: These are the same as the options in Cuffdiff. See the Cufflinks web site for details (http://cufflinks.cbcb.umd.edu/).


Fig. 4 GUI of differential expression profiling analysis.
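The differential expression step can be sketched in the same way. Cuffdiff takes a transcript annotation followed by one argument per condition, with replicate files joined by commas, and `-L` supplies the condition labels. The function name `cuffdiff_command`, the condition labels, and the file names below are made up for illustration and do not come from eRNA.

```python
def cuffdiff_command(gtf_file, conditions, out_dir, threads=4):
    """Assemble a Cuffdiff command line as a list (nothing is executed).

    `conditions` maps a condition label to the list of alignment
    files (replicates) for that condition, in the order to compare.
    """
    labels = ",".join(conditions)                        # e.g. "control,treated"
    groups = [",".join(str(b) for b in bams)             # replicates joined by commas
              for bams in conditions.values()]
    return [
        "cuffdiff",
        "-p", str(threads),   # number of threads
        "-o", str(out_dir),   # output directory
        "-L", labels,         # condition labels
        str(gtf_file),        # transcript annotation
        *groups,              # one argument per condition
    ]
```

A call such as `cuffdiff_command("refs/human_NCBI_build372.gtf", {"control": ["c1.bam", "c2.bam"], "treated": ["t1.bam"]}, "diff_out")` yields one positional argument per condition, which mirrors how samples are grouped for comparison.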


Writing date: 2013.12.15, 2015.02.06
