FastIntegration
所属分类:collect
开发工具:R
文件大小:0KB
下载次数:0
上传日期:2023-01-27 09:04:30
上 传 者:
sh-1993
说明: FastIntegrate集成数千个scRNA序列数据集,并输出用于下游分析的批量校正值,
(FastIntegrate integrates thousands of scRNA-seq datasets and outputs batch- corrected values for downstream analysis,)
文件列表:
DESCRIPTION (891, 2023-01-27)
FastIntegration.Rproj (356, 2023-01-27)
NAMESPACE (523, 2023-01-27)
R/ (0, 2023-01-27)
R/CELLiD.R (1606, 2023-01-27)
R/FastFindAnchors.R (9231, 2023-01-27)
R/FastIntegration.R (7915, 2023-01-27)
R/GetDiscoData.R (2610, 2023-01-27)
R/OneStop.R (1808, 2023-01-27)
R/RcppExports.R (329, 2023-01-27)
R/Utility.R (132, 2023-01-27)
src/ (0, 2023-01-27)
src/FastIntegration.so (1759784, 2023-01-27)
src/Makevars (16, 2023-01-27)
src/RcppExports.cpp (1491, 2023-01-27)
src/RcppExports.o (2246784, 2023-01-27)
src/inetgration.cpp (364, 2023-01-27)
src/inetgration.o (1447592, 2023-01-27)
# FastIntegration v1.1.0
FastIntegration provides two main functions:
- A fast and high-capacity version of Seurat Integration
- Access data from [DISCO](https://www.immunesinglecell.org/) database
**Recent update:
Thanks to [Nathan Siemers](https://github.com/NathanSiemers) for reporting [bugs](https://github.com/JinmiaoChenLab/FastIntegration/issues/6) and providing [suggestions](https://github.com/JinmiaoChenLab/FastIntegration/issues/6) on DownloadDiscoData function. We have added resume download functions in the new version.
More vignettes can be found at https://immunesinglecell.org/vignette/docs/DISCO/FastIntegration/cell-type-atlas
## Requirement
FastIntegration requires the following packages:
- [R](https://www.r-project.org/) (\>= 4.0.0)
- [Seurat](https://cran.r-project.org/web/packages/Seurat/index.html) (\>= 4.0.0)
- [SeuratObject](https://cran.r-project.org/web/packages/SeuratObject/index.html) (\>= 4.0.0)
- [data.table](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html)
- [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html)
- [tictoc](https://cran.r-project.org/web/packages/tictoc/index.html)
- [dplyr](https://cran.r-project.org/web/packages/dplyr/index.html)
- [pbmcapply](https://cran.r-project.org/web/packages/pbmcapply/index.html)
- [stringr](https://cran.r-project.org/web/packages/stringr/vignettes/stringr.html)
- [tools](https://www.rdocumentation.org/packages/tools/versions/3.6.2)
We highly recommend you to build R with openblas which will accelerate integration 2-3x times.
Here is the common way to do it:
sudo yum install -y openblas openblas-threads openblas-openmp \# for centos
sudo apt-get install libopenblas-dev \# for debian
./configure --enable-R-shlib --enable-byte-compiled-packages --enable-BLAS-shlib --enable-memory-profiling
--with-blas="-lopenblas"
## Installation
``` r
devtools::install_github("git@github.com:JinmiaoChenLab/FastIntegrate.git")
```
## Usage
### Preprocess
``` r
library(Seurat)
library(pbmcapply)
library(stringr)
rna.list = readRDS("rna_list.rds") # read list of Seurat object, each element in list is a sample
# make all samples have same genes
overlapped.gene = Reduce(intersect, lapply(rna.list, rownames))
for (i in 1:length(rna.list)) {
rna.list[[i]] = subset(rna.list[[i]], features = overlapped.gene)
rna.list[[i]] = NormalizeData(rna.list[[i]])
rna.list[[i]] = FindVariableFeatures(rna.list[[i]])
rna.list[[i]] = RenameCells(rna.list[[i]], new.names = paste0(Cells(rna.list[[i]]), "--", i))
}
```
### Step by step integration
``` r
library(Seurat)
library(pbmcapply)
library(FastIntegration)
# rna.list is the list of seurat object
BuildIntegrationFile(rna.list = rna.list, tmp.dir = "./", nCores = 50)
FastFindAnchors(tmp.dir = "./", nCores = 50)
# close current R session and open a new one to clean the memory (This is important for large data integration)
# In the new session, please just set work directory and do not load any data. Then run the following codes:
library(Seurat)
library(pbmcapply)
library(FastIntegration)
genes = readRDS("FastIntegrationTmp/raw/1.rds")
genes = rownames(genes)
idx = split(1:length(genes), cut(1:length(genes), 20, labels = FALSE))
pbmclapply(
1:20, function(i) {
rna.integrated = FastIntegration(tmp.dir = "./", npcs = 1:30, slot = "data",
features.to.integrate = genes[idx[[i]]])
saveRDS(rna.integrated, paste0("FastIntegrationTmp/inte/inte_", i, ".rds"), compress = F)
}, mc.cores = 20
)
```
### After integration
``` r
##### create Seurat obj with the variable features of integration (For very big dataset) #####
features = readRDS("FastIntegrationTmp/others/features.rds")
rna.data = pbmclapply(
1:20, function(i) {
rna = readRDS(paste0("./FastIntegrationTmp/inte/inte_", i, ".rds"))
rna = rna[intersect(rownames(rna), features),]
return(rna)
}, mc.cores = 20
)
rna.data = do.call(rbind, rna.data)
rna.data = CreateSeuratObject(rna.data)
rna.data = ScaleData(rna.data, features = features)
rna.data = RunPCA(rna.data, features = features, npcs = 50)
rna.data = FindNeighbors(rna.data, dims = 1:50)
rna.data = FindClusters(rna.data, graph.name = "RNA_snn", algorithm = 2)
rna.data = RunUMAP(rna.data, dims = 1:50)
##### select varibale gene based on integrated data (For dataset with less than 100 samples) #####
rna.data = pbmclapply(
1:20, function(i) {
rna = readRDS(paste0("FastIntegrationTmp/inte/inte_", i, ".rds"))
return(rna)
}, mc.cores = 20
)
rna.data = do.call(rbind, rna.data)
rna.data = CreateSeuratObject(rna.data)
rna.data = FindVariableFeatures(rna.data, nfeatures = 2000)
features = VariableFeatures(rna.data)
rna.data = ScaleData(rna.data, features = features)
rna.data = RunPCA(rna.data, features = features)
rna.data = FindNeighbors(rna.data, dims = 1:50)
rna.data = FindClusters(rna.data, resolution = 0.5, algorithm = 2)
rna.data = RunUMAP(rna.data, dims = 1:50)
```
### Download data from DISCO
``` r
##### Filter samples and get metadata #####
# You can filter samples by their different headers: tissue, disease, platform, project.id. sample.id, sample.type
# For each header, you can select multiple items as follows:
meta = FindSampleByMetadata(tissue = c("blood", "kidney"))
##### Download sample #####
# dir is the location where the files are saved
DownloadDiscoData(meta, dir = "./disco") # mostly CD4 T cells (CD3E+CD8A-)
##### Recover counts slot #####
# To reduce file size, we removed counts slot from data. You can recover it as follow:
rna = readRDS("/test/AML0024_3p.rds")
rna = AddCountsSlot(rna)
```
## Usage Scenario
We have apply FastIntegration to [DISCO](http://www.immunesinglecell.org/) database for integrating thousands of samples.
## License
All other code in this repository is licensed under a [GPL-3](https://www.r-project.org/Licenses/GPL-3) license.
近期下载者:
相关文件:
收藏者: