Abstract:
Conversion of a data frame between ‘one-to-one’ and ‘one-to-many’ layouts.
Relationships such as gene vs. GO terms, or gene vs. pathways, are
many-to-many: one gene can be annotated with many GO terms, and one GO
term can involve many genes. Converting between the two layouts is common
in GO enrichment analysis. Here, I show how to use the function
melt.list() from the R package reshape and the function ddply() from the
package plyr to achieve such conversions.
First, we get an example of gene~GO terms as a list.
> #list by one gene vs. multiple GO terms
> library(annotate)
> library(hgu95av2.db)
> library(GO.db)
> #three genes
> genes<-c("738_at","40840_at","1004_at")
> geneGO2.list<-lapply(genes, FUN=function(x){ names(get(x, hgu95av2GO)) })
> names(geneGO2.list)<-genes
> geneGO2.list
$`738_at`
 [1] "GO:0006144" "GO:0006195" "GO:0016310" "GO:0016311" "GO:0017144" "GO:0044281" "GO:0055086"
 [8] "GO:0008219" "GO:0016311" "GO:0016311" "GO:0005829" "GO:0050146" "GO:0005515" "GO:0000166"
[15] "GO:0046872" "GO:0008253"
$`40840_at`
 [1] "GO:0002931" "GO:0010849" "GO:0032780" "GO:0046902" "GO:0071243" "GO:0071277" "GO:0090200"
 [8] "GO:0090324" "GO:2000276" "GO:0043066" "GO:0090201" "GO:0070301" "GO:1902445" "GO:2001243"
[15] "GO:0000413" "GO:0006457" "GO:0008637" "GO:0010939" "GO:0070266" "GO:0005753" "GO:0016020"
[22] "GO:0005759" "GO:0005515" "GO:0003755" "GO:0016018"
$`1004_at`
 [1] "GO:0006928" "GO:0032467" "GO:0006935" "GO:0006955" "GO:0007186" "GO:0048535" "GO:0042113"
 [8] "GO:0070098" "GO:0005886" "GO:0005887" "GO:0009897" "GO:0004930" "GO:0005515" "GO:0016494"
Then, we convert the list to a data frame.
#data frame by one gene vs. one GO term from the list type
> library(reshape)
> geneGO<-melt.list(geneGO2.list)
> colnames(geneGO)<-c('GO', 'gene')
> head(geneGO)
          GO   gene
1 GO:0006144 738_at
2 GO:0006195 738_at
3 GO:0016310 738_at
4 GO:0016311 738_at
5 GO:0017144 738_at
6 GO:0044281 738_at
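The same list-to-data-frame step can probably also be done without reshape. Below is a minimal base R sketch using stack(), under stated assumptions: the toy list gene_go_list is a hypothetical stand-in for geneGO2.list, filled with a few of the GO IDs printed above, and stack() is swapped in here for melt.list(); it is not the approach used in this post.

```r
# Hypothetical toy list standing in for geneGO2.list
# (a few GO IDs copied from the output above).
gene_go_list <- list(
  "738_at"  = c("GO:0006144", "GO:0006195", "GO:0005515"),
  "1004_at" = c("GO:0006928", "GO:0005515")
)

# stack() turns a named list of vectors into a two-column data frame:
# 'values' holds the elements, 'ind' holds the list names.
long <- stack(gene_go_list)
colnames(long) <- c("GO", "gene")

# 'ind' comes back as a factor, so convert both columns to character
# to keep later string operations simple.
long$GO   <- as.character(long$GO)
long$gene <- as.character(long$gene)

print(long)
```

Each row of long pairs one GO term with one gene, which is the same one-to-one layout as geneGO.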
Finally, I show how to convert the one-to-one data frame into a
one-to-many data frame, and how to reverse the process.
> #data frame by one gene vs. multiple GO terms
> library(plyr)
> geneGO2<-ddply(geneGO, c('gene'), summarize, GO=paste(GO, collapse=',') )
> head(geneGO2)
      gene
1  1004_at
2 40840_at
3   738_at
  GO
1 GO:0006928,GO:0032467,GO:0006935,GO:0006955,GO:0007186,GO:0048535,GO:0042113,GO:0070098,GO:0005886,GO:0005887,GO:0009897,GO:0004930,GO:0005515,GO:0016494
2 GO:0002931,GO:0010849,GO:0032780,GO:0046902,GO:0071243,GO:0071277,GO:0090200,GO:0090324,GO:2000276,GO:0043066,GO:0090201,GO:0070301,GO:1902445,GO:2001243,GO:0000413,GO:0006457,GO:0008637,GO:0010939,GO:0070266,GO:0005753,GO:0016020,GO:0005759,GO:0005515,GO:0003755,GO:0016018
3 GO:0006144,GO:0006195,GO:0016310,GO:0016311,GO:0017144,GO:0044281,GO:0055086,GO:0008219,GO:0016311,GO:0016311,GO:0005829,GO:0050146,GO:0005515,GO:0000166,GO:0046872,GO:0008253
> #reversed conversion
> geneGO.rev<-ddply(geneGO2, c('gene'), summarize, GO=strsplit(GO, split=',')[[1]] )
> head(geneGO.rev)
     gene         GO
1 1004_at GO:0006928
2 1004_at GO:0032467
3 1004_at GO:0006935
4 1004_at GO:0006955
5 1004_at GO:0007186
6 1004_at GO:0048535
>
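The collapse-and-split round trip above can also be sketched in base R, which may help when plyr is not installed. This is a hedged illustration rather than the method of this post: the data frame long is a hypothetical toy stand-in for geneGO, and aggregate() plus strsplit() are swapped in for the two ddply() calls.

```r
# Hypothetical toy data standing in for geneGO (one gene vs. one GO term per row).
long <- data.frame(
  gene = c("738_at", "738_at", "1004_at", "1004_at"),
  GO   = c("GO:0006144", "GO:0005515", "GO:0006928", "GO:0005515"),
  stringsAsFactors = FALSE
)

# One-to-many: paste all GO terms of each gene into one comma-separated string.
wide <- aggregate(GO ~ gene, data = long, FUN = paste, collapse = ",")

# Reverse: split each string and repeat the gene once per GO term.
parts <- strsplit(wide$GO, split = ",", fixed = TRUE)
back <- data.frame(
  gene = rep(wide$gene, times = lengths(parts)),
  GO   = unlist(parts),
  stringsAsFactors = FALSE
)

print(wide)
print(back)
```

The data frame back holds the same gene and GO pairs as long, although the row order follows the sorted gene names.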
> #data frame by one GO term vs. multiple genes
> GOgene2<-ddply(geneGO, c('GO'), summarize, gene=paste(gene, collapse=',') )
> head(GOgene2)
          GO                    gene
1 GO:0000166                  738_at
2 GO:0005515 738_at,40840_at,1004_at
3 GO:0005829                  738_at
4 GO:0006144                  738_at
5 GO:0006195                  738_at
6 GO:0008219                  738_at
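Note that the list printed earlier shows GO:0016311 three times for 738_at, so collapsing by GO term would repeat that gene in the pasted string. One possible way to drop such repeats before collapsing is sketched below in base R. The toy data frame long is a hypothetical stand-in for geneGO, and whether repeats should be removed at all is an assumption that depends on the downstream analysis.

```r
# Hypothetical toy data standing in for geneGO, including a repeated gene/GO pair.
long <- data.frame(
  gene = c("738_at", "738_at", "738_at", "40840_at", "1004_at"),
  GO   = c("GO:0016311", "GO:0016311", "GO:0005515", "GO:0005515", "GO:0005515"),
  stringsAsFactors = FALSE
)

# Group the genes by GO term, then keep each gene only once per term.
genes_by_go <- split(long$gene, long$GO)
genes_by_go <- lapply(genes_by_go, unique)

# Collapse each group into one comma-separated string (one row per GO term).
GOgene <- data.frame(
  GO   = names(genes_by_go),
  gene = unname(vapply(genes_by_go, paste, character(1), collapse = ",")),
  stringsAsFactors = FALSE
)

print(GOgene)
```

With this toy input, GO:0016311 maps to a single 738_at entry instead of a repeated one.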
Writing date:
20150510