Huanan_market_samples

Category: Biomedical technology
Development tool: TeX
File size: 8515 KB
Downloads: 0
Upload date: 2023-05-13 21:19:51
Uploader: sh-1993
Description: Analysis of SARS-CoV-2 read counts versus metagenomic content for Huanan Seafood Market sequencing data from the Chinese CDC

File list:
LICENSE (1068, 2023-07-03)
Snakefile (24974, 2023-07-03)
config.yaml (6364, 2023-07-03)
data (0, 2023-07-03)
data\CRA010170.xlsx (90585, 2023-07-03)
data\docs_plot_annotations.yaml (13524, 2023-07-03)
data\positive_table.csv (5117, 2023-07-03)
docs (0, 2023-07-03)
docs\crits_christoph_vs_current_run_corr.html (22248, 2023-07-03)
docs\ct_vs_reads.html (73028, 2023-07-03)
docs\genomic_contig_composition.html (4937, 2023-07-03)
docs\index.html (2520, 2023-07-03)
docs\mito_composition.html (473419, 2023-07-03)
docs\overall_corr.html (1242880, 2023-07-03)
docs\per_species_corr_faceted.html (21371785, 2023-07-03)
docs\per_species_corr_single.html (21372506, 2023-07-03)
docs\sars2_aligned.html (94600, 2023-07-03)
docs\sars2_aligned_vertical.html (95111, 2023-07-03)
docs\theil_sen_corr.html (1243228, 2023-07-03)
environment.yml (363, 2023-07-03)
notebooks (0, 2023-07-03)
notebooks\aggregate_all_counts.py.ipynb (17575, 2023-07-03)
notebooks\check_sha512_vs_crits_christoph.py.ipynb (5320, 2023-07-03)
notebooks\make_plots.py.ipynb (58328, 2023-07-03)
notebooks\mitochondrial_genomes_to_retain.py.ipynb (7135, 2023-07-03)
notebooks\rt_qpcr.py.ipynb (9154, 2023-07-03)
paper (0, 2023-07-03)
paper\figures (0, 2023-07-03)
paper\figures\composition.pdf (52997, 2023-07-03)
paper\figures\composition.svg (41015, 2023-07-03)
paper\figures\crits_christoph_vs_current_run_corr.pdf (17767, 2023-07-03)
paper\figures\crits_christoph_vs_current_run_corr.svg (65332, 2023-07-03)
paper\figures\different_corrs (0, 2023-07-03)
paper\figures\different_corrs\correlations.pdf (466009, 2023-07-03)
paper\figures\different_corrs\correlations.svg (865520, 2023-07-03)
paper\figures\different_corrs\human.jpeg (10956, 2023-07-03)
... ...

# Association between SARS-CoV-2 and metagenomic content of samples from the Huanan Seafood Market

This repository contains a fully reproducible computational pipeline for analyzing the SARS-CoV-2 and chordate mitochondrial metagenomic content of the deep sequencing of environmental samples taken from the Huanan Seafood Market by [Liu et al (2023)](https://www.nature.com/articles/s41586-023-06043-2).

The pipeline was created by Jesse Bloom. The analysis and results are described in [Bloom et al, bioRxiv, DOI 10.1101/2023.04.25.538336](https://doi.org/10.1101/2023.04.25.538336).

## Results and plots

Key results are in the [./results/](results) subdirectory. These include:

- [results/metadata/merged_metadata.csv](results/metadata/merged_metadata.csv): metadata about the samples, extracted by processing files provided on the NGDC by [Liu et al (2023)](https://www.nature.com/articles/s41586-023-06043-2).
- [results/crits_christoph_data/check_sha512_vs_crits_christoph.csv](results/crits_christoph_data/check_sha512_vs_crits_christoph.csv): comparison of SHA-512 hashes of the FASTQ files downloaded from the NGDC with those reported in the earlier analysis by [Crits-Christoph et al (2023)](https://zenodo.org/record/7754299#.ZEghB-zMKX0).
- [results/mitochondrial_genomes/retained.csv](results/mitochondrial_genomes/retained.csv): the set of chordate mitochondrial genomes to which reads were aligned for the metagenomic analysis.
- [results/aggregated_counts/sars2_aligned_by_run.csv](results/aggregated_counts/sars2_aligned_by_run.csv): number of aligned SARS-CoV-2 reads for each sequencing run.
- [results/aggregated_counts/sars2_aligned_by_sample.csv](results/aggregated_counts/sars2_aligned_by_sample.csv): number of aligned SARS-CoV-2 reads for each sample. Note that a few samples (like A20) are listed twice, with the *description* column indicating how each listing was sequenced (the sample was sequenced two different ways).
- [results/aggregated_counts/mito_composition_by_run.csv](results/aggregated_counts/mito_composition_by_run.csv): chordate mitochondrial composition for each sequencing run.
- [results/aggregated_counts/mito_composition_by_sample.csv](results/aggregated_counts/mito_composition_by_sample.csv): chordate mitochondrial composition for each sample. Note that a few samples (like A20) are listed twice, with the *description* column indicating how each listing was sequenced (the sample was sequenced two different ways).
- [results/rt_qpcr/rt_qpcr.csv](results/rt_qpcr/rt_qpcr.csv): SARS-CoV-2 content of samples as determined in the current sequencing analysis, alongside the RT-qPCR Ct values reported by [Liu et al (2023)](https://www.nature.com/articles/s41586-023-06043-2).
- [results/plots/susceptible_table.csv](results/plots/susceptible_table.csv): SARS-CoV-2 content of samples with high chordate mitochondrial composition from susceptible species sold live at the market.
- [results/plots/susceptible_mammal_table.csv](results/plots/susceptible_mammal_table.csv): SARS-CoV-2 content of samples with high mammalian mitochondrial composition from susceptible species sold live at the market.
- [results/plots/raccoon_dog_long.csv](results/plots/raccoon_dog_long.csv): SARS-CoV-2 content of all samples, ordered by the percent of the chordate mitochondrial composition from raccoon dogs.
- [results/contigs/counts_and_coverage/processed_counts.csv](results/contigs/counts_and_coverage/processed_counts.csv): results of aligning assembled contigs to full genomes for selected samples and genomes.

Note that the pipeline also produces many other results files (some of which are very large) that are not tracked in this repo.
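
The per-run and per-sample composition tables, and the raccoon-dog ordering, rest on simple bookkeeping over aligned read counts. The following is a minimal plain-Python sketch of that kind of bookkeeping, not the pipeline's actual code. It assumes that per-sample counts are obtained by summing a sample's runs within each sequencing *description*, and that composition means the percentage of chordate mitochondrial reads assigned to each species. The record layout, the helper names (`aggregate_by_sample`, `percent_composition`, `order_by_species`), the species labels, and all counts are hypothetical and are not taken from the results files.

```python
from collections import defaultdict

# Hypothetical per-run records: (run, sample, description, reads aligned to each
# chordate mitochondrial genome). Illustrative values only, not real data.
RUNS = [
    ("run1", "sampleA", "method1", {"raccoon dog": 600, "human": 200}),
    ("run2", "sampleA", "method1", {"raccoon dog": 200, "human": 0}),
    ("run3", "sampleA", "method2", {"raccoon dog": 100, "human": 300}),
    ("run4", "sampleB", "method1", {"human": 50, "pig": 50}),
]


def aggregate_by_sample(runs):
    """Sum per-species read counts over runs sharing a (sample, description) key.

    Keying on the description as well as the sample keeps a sample that was
    sequenced two different ways as two separate listings.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for _run, sample, description, counts in runs:
        for species, n_reads in counts.items():
            totals[(sample, description)][species] += n_reads
    return {key: dict(counts) for key, counts in totals.items()}


def percent_composition(counts):
    """Convert per-species read counts to percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {species: 0.0 for species in counts}
    return {species: 100 * n_reads / total for species, n_reads in counts.items()}


def order_by_species(samples, species):
    """Return (key, percent) pairs sorted by descending percent of `species`."""
    ranked = [
        (key, percent_composition(counts).get(species, 0.0))
        for key, counts in samples.items()
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


for key, pct in order_by_species(aggregate_by_sample(RUNS), "raccoon dog"):
    print(key, f"{pct:.1f}%")
```

Grouping by `(sample, description)` rather than by sample alone mirrors the note that samples such as A20 appear twice in the per-sample tables; samples with no reads for the species of interest sort to the bottom with 0%.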

Interactive plots of the results, created using [Altair](https://altair-viz.github.io/), are rendered from the [./docs/](docs) subdirectory via GitHub Pages at [https://jbloom.github.io/Huanan_market_samples/](https://jbloom.github.io/Huanan_market_samples/).

## Understanding and running the pipeline

The entire analysis can be run in automated fashion using [snakemake](https://snakemake.readthedocs.io/). The pipeline itself is in [Snakefile](Snakefile), and its configuration is specified in [config.yaml](config.yaml).

The pipeline uses the [conda](https://docs.conda.io/) environment in [environment.yml](environment.yml), which specifies the precise versions of all software used. The one exception is the rule `build_contigs` in [Snakefile](Snakefile), which uses an [environment module](https://modules.readthedocs.io/en/latest/) pre-built on the Fred Hutch computing cluster to run [Trinity](https://github.com/trinityrnaseq/trinityrnaseq/wiki) to build contigs. To run this rule, you will need to specify a comparable module for whatever computing system you are using; alternatively, skip the contig building by commenting out the file `results/contigs/counts_and_coverage/processed_counts.csv` as an input to the `all` rule in [Snakefile](Snakefile).

The scripts and Jupyter notebooks used by the pipeline are in [./scripts/](scripts) and [./notebooks/](notebooks), respectively.

Most data used by the pipeline is downloaded by the pipeline itself, but it also takes the following two input files, both found in [./data/](data):

- [data/CRA010170.xlsx](data/CRA010170.xlsx) is the GSA BioProject metadata sheet downloaded from the NGDC GSA page [https://ngdc.cncb.ac.cn/gsa/browse/CRA010170](https://ngdc.cncb.ac.cn/gsa/browse/CRA010170) on March 29, 2023.
- [data/positive_table.csv](data/positive_table.csv) is a version of Supplementary Table 2 from [Liu et al (2023)](https://www.nature.com/articles/s41586-023-06043-2), taken from [this link](https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-023-06043-2/MediaObjects/41586_2023_6043_MOESM4_ESM.docx) (archived [here](https://web.archive.org/web/20230405155400/https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-023-06043-2/MediaObjects/41586_2023_6043_MOESM4_ESM.docx)).

To run the pipeline on the Fred Hutch computing cluster, use the commands in [run_Hutch_cluster.bash](run_Hutch_cluster.bash).
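
Because most input data is downloaded by the pipeline, and the results include a comparison of SHA-512 hashes of the downloaded FASTQ files against those reported by Crits-Christoph et al (2023), it can be useful to spot-check a downloaded file by hand. Below is a minimal sketch using Python's standard-library `hashlib`. It is not the code in `notebooks/check_sha512_vs_crits_christoph.py.ipynb`; the helper names `sha512_of_file` and `compare_hashes`, and the assumption that reported hashes are available as a mapping from file name to hex digest, are hypothetical.

```python
import hashlib
from pathlib import Path


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file.

    The file is read in fixed-size chunks so that large FASTQ files
    do not need to fit in memory.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_hashes(paths, reported):
    """Compare computed SHA-512 digests with reported ones.

    `reported` is a hypothetical mapping of file name -> expected hex digest.
    Returns file name -> True (match), False (mismatch), or None (no reported hash).
    """
    results = {}
    for path in map(Path, paths):
        expected = reported.get(path.name)
        if expected is None:
            results[path.name] = None
        else:
            results[path.name] = sha512_of_file(path) == expected.lower()
    return results
```

Hashing the raw bytes of the file (rather than decompressed reads) is what makes the digest comparable to one computed by anyone else on the same downloaded file; a mismatch indicates the two files are not byte-identical.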
