ProteomicsAnnotationHubData

Category: Biomedical technology
Development tool: R
File size: 0KB
Downloads: 0
Upload date: 2022-01-06 11:46:42
Uploader: sh-1993
Description: Annotation hub data for proteomics data

File list:
.Rbuildignore (42, 2022-01-06)
DESCRIPTION (1090, 2022-01-06)
NAMESPACE (619, 2022-01-06)
NEWS (1361, 2022-01-06)
R/ (0, 2022-01-06)
R/PAHD.R (5519, 2022-01-06)
R/ProteomicsAnnotationHubData.R (1727, 2022-01-06)
R/utils.R (4392, 2022-01-06)
R/zzz.R (438, 2022-01-06)
inst/ (0, 2022-01-06)
inst/extdata/ (0, 2022-01-06)
inst/extdata/PXD000001.dcf (4004, 2022-01-06)
inst/extdata/PXD000001Fasta.rda (1026, 2022-01-06)
inst/extdata/PXD000001MSnSet.rda (1031, 2022-01-06)
inst/extdata/PXD000001MzID.rda (1040, 2022-01-06)
inst/extdata/PXD000001MzML.rda (1030, 2022-01-06)
inst/scripts/ (0, 2022-01-06)
inst/scripts/PXD000001.R (3324, 2022-01-06)
man/ (0, 2022-01-06)
man/PAHD-class.Rd (479, 2022-01-06)
man/PAHD.Rd (1156, 2022-01-06)
man/ProteomicsAnnotationHubData.Rd (1051, 2022-01-06)
man/identicalRemLoc.Rd (595, 2022-01-06)
man/makeAnnotationHubMetadata.Rd (677, 2022-01-06)
man/readPahdFiles.Rd (815, 2022-01-06)
man/writePahdTemplate.Rd (945, 2022-01-06)
tests/ (0, 2022-01-06)
tests/testthat.R (264, 2022-01-06)
tests/testthat/ (0, 2022-01-06)
tests/testthat/test_PXD000001.R (958, 2022-01-06)
vignettes/ (0, 2022-01-06)
vignettes/ProteomicsAnnotationHubData.Rmd (18024, 2022-01-06)

# `AnnotationHub` for proteomics

The aim of this package is to offer access to mass spectrometry and proteomics data through the *[AnnotationHub](http://bioconductor.org/packages/AnnotationHub)* infrastructure.

```r
library("AnnotationHub")
#> Loading required package: BiocGenerics
#> Loading required package: methods
#> Loading required package: parallel
#>
#> Attaching package: 'BiocGenerics'
#>
#> The following objects are masked from 'package:parallel':
#>
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#>
#> The following objects are masked from 'package:stats':
#>
#>     IQR, mad, xtabs
#>
#> The following objects are masked from 'package:base':
#>
#>     anyDuplicated, append, as.data.frame, as.vector, cbind,
#>     colnames, do.call, duplicated, eval, evalq, Filter, Find, get,
#>     grep, grepl, intersect, is.unsorted, lapply, lengths, Map,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, Position, rank, rbind, Reduce, rownames, sapply,
#>     setdiff, sort, table, tapply, union, unique, unlist, unsplit
ah <- AnnotationHub()
#> snapshotDate(): 2015-11-19
ah
#> AnnotationHub with 35372 records
#> # snapshotDate(): 2015-11-19
#> # $dataprovider: BroadInstitute, UCSC, Ensembl, ftp://ftp.ncbi.nlm.nih....
#> # $species: Homo sapiens, Mus musculus, Bos taurus, Pan troglodytes, Da...
#> # $rdataclass: GRanges, BigWigFile, FaFile, ChainFile, OrgDb, Inparanoi...
#> # additional mcols(): taxonomyid, genome, description, tags,
#> #   sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH2"]]'
#>
#>             title
#>   AH2     | Ailuropoda_melanoleuca.ailMel1.69.dna.toplevel.fa
#>   AH3     | Ailuropoda_melanoleuca.ailMel1.69.dna_rm.toplevel.fa
#>   AH4     | Ailuropoda_melanoleuca.ailMel1.69.dna_sm.toplevel.fa
#>   AH5     | Ailuropoda_melanoleuca.ailMel1.69.ncrna.fa
#>   AH6     | Ailuropoda_melanoleuca.ailMel1.69.pep.all.fa
#>   ...       ...
#>   AH49587 | org.Ce.eg.db.sqlite
#>   AH49588 | org.Xl.eg.db.sqlite
#>   AH49589 | org.Sc.sgd.db.sqlite
#>   AH49590 | org.Dr.eg.db.sqlite
#>   AH49591 | org.Pf.plasmo.db.sqlite
```

Let's start by querying the entries that originate from the PRIDE database:

```r
query(ah, "PRIDE")
#> AnnotationHub with 4 records
#> # snapshotDate(): 2015-11-19
#> # $dataprovider: PRIDE
#> # $species: Erwinia carotovora
#> # $rdataclass: AAStringSet, MSnSet, mzRident, mzRpwiz
#> # additional mcols(): taxonomyid, genome, description, tags,
#> #   sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH49006"]]'
#>
#>             title
#>   AH49006 | PXD000001: Erwinia carotovora and spiked-in protein fasta file
#>   AH49007 | PXD000001: Peptide-level quantitation data
#>   AH49008 | PXD000001: raw mass spectrometry data
#>   AH49009 | PXD000001: MS-GF+ identiciation data
```

Or those of a specific project:

```r
ah[grep("PXD000001", ah$title)]
#> AnnotationHub with 4 records
#> # snapshotDate(): 2015-11-19
#> # $dataprovider: PRIDE
#> # $species: Erwinia carotovora
#> # $rdataclass: AAStringSet, MSnSet, mzRident, mzRpwiz
#> # additional mcols(): taxonomyid, genome, description, tags,
#> #   sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["AH49006"]]'
#>
#>             title
#>   AH49006 | PXD000001: Erwinia carotovora and spiked-in protein fasta file
#>   AH49007 | PXD000001: Peptide-level quantitation data
#>   AH49008 | PXD000001: raw mass spectrometry data
#>   AH49009 | PXD000001: MS-GF+ identiciation data
```

To see the metadata of a specific entry, we use its AnnotationHub identifier inside single brackets (`[`):

```r
ah["AH49008"]
#> AnnotationHub with 1 record
#> # snapshotDate(): 2015-11-19
#> # names(): AH49008
#> # $dataprovider: PRIDE
#> # $species: Erwinia carotovora
#> # $rdataclass: mzRpwiz
#> # $title: PXD000001: raw mass spectrometry data
#> # $description: Four human TMT spliked-in proteins in an Erwinia caroto...
#> # $taxonomyid: 554
#> # $genome: NA
#> # $sourcetype: mzML
#> # $sourceurl: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2012/03/PXD0...
#> # $sourcelastmodifieddate: NA
#> # $sourcesize: NA
#> # $tags: Proteomics, TMT6, LTQ Orbitrap Velos, PMID:23692960
#> # retrieve record with 'object[["AH49008"]]'
```

To access the actual data, raw mass spectrometry data in this case, we use double brackets (`[[`):

```r
library("mzR")
#> Loading required package: Rcpp
rw <- ah[["AH49008"]]
#> loading from cache '/home/lg390/.AnnotationHub/55314'
rw
#> Mass Spectrometry file handle.
#> Filename:  55314
#> Number of scans:  7534
```

In this case, we have an instance of class `mzRpwiz`, which can be processed as expected.

In the short demonstration above, we had **direct** and **standardised** access to the raw data, without needing to open the raw data manually or worry about the file format. The data was prepared and converted into **standard Bioconductor data types** for immediate consumption by the user. This also holds for other relevant data types such as identification results, FASTA files, or protein or peptide quantitation data.

See the `ProteomicsAnnotationHubData` vignette for details on available data and how to add new proteomics data to AnnotationHub.
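As an illustration of what processing the `mzRpwiz` handle retrieved above might look like, here is a minimal sketch. The helper `summarise_raw_handle()` and its `n_peaks` argument are hypothetical and not part of this package; it relies only on `header()` and `peaks()` from `mzR`, and assumes the `AH49008` record is still available in your hub snapshot.

```r
# Hypothetical helper (not part of ProteomicsAnnotationHubData): summarise an
# mzR raw-data handle such as the one returned by ah[["AH49008"]].
summarise_raw_handle <- function(rw, n_peaks = 5L) {
    hd <- mzR::header(rw)             # data.frame with one row per scan
    first_scan <- mzR::peaks(rw, 1L)  # two-column matrix: m/z and intensity
    list(
        n_scans     = nrow(hd),               # total number of scans
        ms_levels   = table(hd$msLevel),      # scan counts per MS level
        first_peaks = head(first_scan, n_peaks)  # first peaks of scan 1
    )
}

# Usage (requires network access and the AnnotationHub and mzR packages):
# library("AnnotationHub")
# ah <- AnnotationHub()
# rw <- ah[["AH49008"]]
# summarise_raw_handle(rw)
```

The functions are called with the `mzR::` prefix so that defining the helper does not itself require the package to be attached; the other PRIDE records listed above can be retrieved with `[[` in the same way.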
