ProteomicsAnnotationHubData
Category: Biomedical technology
Development tool: R
Upload date: 2022-01-06 11:46:42
Uploader: sh-1993
Description: Annotation hub data for proteomics data
File list:
.Rbuildignore (42, 2022-01-06)
DESCRIPTION (1090, 2022-01-06)
NAMESPACE (619, 2022-01-06)
NEWS (1361, 2022-01-06)
R/ (0, 2022-01-06)
R/PAHD.R (5519, 2022-01-06)
R/ProteomicsAnnotationHubData.R (1727, 2022-01-06)
R/utils.R (4392, 2022-01-06)
R/zzz.R (438, 2022-01-06)
inst/ (0, 2022-01-06)
inst/extdata/ (0, 2022-01-06)
inst/extdata/PXD000001.dcf (4004, 2022-01-06)
inst/extdata/PXD000001Fasta.rda (1026, 2022-01-06)
inst/extdata/PXD000001MSnSet.rda (1031, 2022-01-06)
inst/extdata/PXD000001MzID.rda (1040, 2022-01-06)
inst/extdata/PXD000001MzML.rda (1030, 2022-01-06)
inst/scripts/ (0, 2022-01-06)
inst/scripts/PXD000001.R (3324, 2022-01-06)
man/ (0, 2022-01-06)
man/PAHD-class.Rd (479, 2022-01-06)
man/PAHD.Rd (1156, 2022-01-06)
man/ProteomicsAnnotationHubData.Rd (1051, 2022-01-06)
man/identicalRemLoc.Rd (595, 2022-01-06)
man/makeAnnotationHubMetadata.Rd (677, 2022-01-06)
man/readPahdFiles.Rd (815, 2022-01-06)
man/writePahdTemplate.Rd (945, 2022-01-06)
tests/ (0, 2022-01-06)
tests/testthat.R (264, 2022-01-06)
tests/testthat/ (0, 2022-01-06)
tests/testthat/test_PXD000001.R (958, 2022-01-06)
vignettes/ (0, 2022-01-06)
vignettes/ProteomicsAnnotationHubData.Rmd (18024, 2022-01-06)
# `AnnotationHub` for proteomics
This package aims to provide access to mass spectrometry and
proteomics data through the *[AnnotationHub](http://bioconductor.org/packages/AnnotationHub)*
infrastructure.
```r
library("AnnotationHub")
#> Loading required package: BiocGenerics
#> Loading required package: methods
#> Loading required package: parallel
#>
#> Attaching package: 'BiocGenerics'
#>
#> The following objects are masked from 'package:parallel':
#>
#> clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#> clusterExport, clusterMap, parApply, parCapply, parLapply,
#> parLapplyLB, parRapply, parSapply, parSapplyLB
#>
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, xtabs
#>
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, append, as.data.frame, as.vector, cbind,
#> colnames, do.call, duplicated, eval, evalq, Filter, Find, get,
#> grep, grepl, intersect, is.unsorted, lapply, lengths, Map,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, Position, rank, rbind, Reduce, rownames, sapply,
#> setdiff, sort, table, tapply, union, unique, unlist, unsplit
ah <- AnnotationHub()
#> snapshotDate(): 2015-11-19
ah
#> AnnotationHub with 35372 records
#> # snapshotDate(): 2015-11-19
#> # $dataprovider: BroadInstitute, UCSC, Ensembl, ftp://ftp.ncbi.nlm.nih....
#> # $species: Homo sapiens, Mus musculus, Bos taurus, Pan troglodytes, Da...
#> # $rdataclass: GRanges, BigWigFile, FaFile, ChainFile, OrgDb, Inparanoi...
#> # additional mcols(): taxonomyid, genome, description, tags,
#> # sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH2"]]'
#>
#> title
#> AH2 | Ailuropoda_melanoleuca.ailMel1.69.dna.toplevel.fa
#> AH3 | Ailuropoda_melanoleuca.ailMel1.69.dna_rm.toplevel.fa
#> AH4 | Ailuropoda_melanoleuca.ailMel1.69.dna_sm.toplevel.fa
#> AH5 | Ailuropoda_melanoleuca.ailMel1.69.ncrna.fa
#> AH6 | Ailuropoda_melanoleuca.ailMel1.69.pep.all.fa
#> ... ...
#> AH49587 | org.Ce.eg.db.sqlite
#> AH49588 | org.Xl.eg.db.sqlite
#> AH49589 | org.Sc.sgd.db.sqlite
#> AH49590 | org.Dr.eg.db.sqlite
#> AH49591 | org.Pf.plasmo.db.sqlite
```
Let's start by querying the entries that originate from the PRIDE
database:
```r
query(ah, "PRIDE")
#> AnnotationHub with 4 records
#> # snapshotDate(): 2015-11-19
#> # $dataprovider: PRIDE
#> # $species: Erwinia carotovora
#> # $rdataclass: AAStringSet, MSnSet, mzRident, mzRpwiz
#> # additional mcols(): taxonomyid, genome, description, tags,
#> # sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH49006"]]'
#>
#> title
#> AH49006 | PXD000001: Erwinia carotovora and spiked-in protein fasta file
#> AH49007 | PXD000001: Peptide-level quantitation data
#> AH49008 | PXD000001: raw mass spectrometry data
#> AH49009 | PXD000001: MS-GF+ identiciation data
```
Or those of a specific project:
```r
ah[grep("PXD000001", ah$title)]
#> AnnotationHub with 4 records
#> # snapshotDate(): 2015-11-19
#> # $dataprovider: PRIDE
#> # $species: Erwinia carotovora
#> # $rdataclass: AAStringSet, MSnSet, mzRident, mzRpwiz
#> # additional mcols(): taxonomyid, genome, description, tags,
#> # sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH49006"]]'
#>
#> title
#> AH49006 | PXD000001: Erwinia carotovora and spiked-in protein fasta file
#> AH49007 | PXD000001: Peptide-level quantitation data
#> AH49008 | PXD000001: raw mass spectrometry data
#> AH49009 | PXD000001: MS-GF+ identiciation data
```
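The two lookups above can be combined and narrowed to a single data class. The following is a minimal sketch, not part of the package: `find_pride_records()` is a hypothetical helper, and it assumes that `query()` keeps only records matching every supplied pattern and that the hub can be subset with a logical vector built from its `rdataclass` column. The PXD000001 records shown here come from the 2015-11-19 snapshot and may not be present in later ones.

```r
# Sketch only: `find_pride_records()` is a hypothetical helper, not part of
# AnnotationHub or ProteomicsAnnotationHubData.
find_pride_records <- function(hub, project, data_class = NULL) {
  # query() keeps the records whose metadata match all supplied patterns.
  hits <- AnnotationHub::query(hub, c("PRIDE", project))
  if (!is.null(data_class)) {
    # `rdataclass` is one of the metadata columns listed when printing the hub.
    hits <- hits[hits$rdataclass == data_class]
  }
  hits
}

# Usage (needs the AnnotationHub package and network access, so it is
# left commented out):
# ah <- AnnotationHub::AnnotationHub()
# find_pride_records(ah, "PXD000001", data_class = "MSnSet")
```

Taking the hub as an argument keeps the helper free of side effects: nothing is downloaded until it is called with a live `AnnotationHub` object.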
To see the metadata of a specific entry, we pass its AnnotationHub
identifier inside single brackets (`[`):
```r
ah["AH49008"]
#> AnnotationHub with 1 record
#> # snapshotDate(): 2015-11-19
#> # names(): AH49008
#> # $dataprovider: PRIDE
#> # $species: Erwinia carotovora
#> # $rdataclass: mzRpwiz
#> # $title: PXD000001: raw mass spectrometry data
#> # $description: Four human TMT spliked-in proteins in an Erwinia caroto...
#> # $taxonomyid: 554
#> # $genome: NA
#> # $sourcetype: mzML
#> # $sourceurl: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2012/03/PXD0...
#> # $sourcelastmodifieddate: NA
#> # $sourcesize: NA
#> # $tags: Proteomics, TMT6, LTQ Orbitrap Velos, PMID:23692960
#> # retrieve record with 'object[["AH49008"]]'
```
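Single brackets behave here much as they do on a base R list: they return a smaller container of the same kind rather than the element itself, whereas double brackets extract the element. A minimal base R sketch of that distinction follows; the list and its names are hypothetical and only stand in for hub records.

```r
# Hypothetical stand-in for a hub: a named list of two "records".
records <- list(rec_a = "first payload", rec_b = "second payload")

# Single brackets: still a list, now of length one, with its name kept.
sub <- records["rec_b"]
is.list(sub)

# Double brackets: the element itself, here a character vector.
elt <- records[["rec_b"]]
is.character(elt)
```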
To access the actual data, raw mass spectrometry data in this case, we
use double brackets (`[[`):
```r
library("mzR")
#> Loading required package: Rcpp
rw <- ah[["AH49008"]]
#> loading from cache '/home/lg390/.AnnotationHub/55314'
rw
#> Mass Spectrometry file handle.
#> Filename: 55314
#> Number of scans: 7534
```
Here, `rw` is an instance of class `mzRpwiz`, which can be
processed like any other `mzR` file handle.
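As an illustration of such processing, the sketch below summarises an open file handle using the `header()` and `peaks()` accessors from `mzR`. The helper `summarise_raw()` is hypothetical and not part of either package; it assumes that `header()` returns one row per scan with an `msLevel` column and that `peaks()` returns a two-column matrix for a single scan.

```r
# Sketch only: `summarise_raw()` is a hypothetical helper. It assumes `ms`
# is an open mzR file handle, such as the `rw` object retrieved above.
summarise_raw <- function(ms, scan = 1L) {
  hd <- mzR::header(ms)       # scan-level metadata, one row per scan
  pk <- mzR::peaks(ms, scan)  # m/z and intensity values for one scan
  list(
    n_scans   = nrow(hd),
    ms_levels = table(hd$msLevel),  # number of scans at each MS level
    n_peaks   = nrow(pk)            # number of peaks in the chosen scan
  )
}

# Usage (needs mzR and the downloaded file, so it is left commented out):
# summarise_raw(rw)
```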
In the short demonstration above, we had **direct** and
**standardised** access to the raw data, without having to open the
raw files manually or worry about their format. The data was
prepared and converted into **standard Bioconductor data types** for
immediate consumption by the user. The same applies to other
relevant data types such as identification results, fasta files or
protein or peptide quantitation data.
See the `ProteomicsAnnotationHubData` vignette for details on
available data and how to add new proteomics data to AnnotationHub.