genetic-data-analysis-rcc-1
Category: Data mining/Data warehousing
Development tool: MATLAB
File size: 34KB
Upload date: 2016-11-17 16:44:07
Uploader: sh-1993
Description: Research Computing Center introduction to analysis and visualization of genetic data, Part 1
File list:
LICENSE (1108, 2016-11-18)
code (0, 2016-11-18)
code\demo.ggplot.R (735, 2016-11-18)
code\geno_pca.m (2183, 2016-11-18)
code\misc.R (254, 2016-11-18)
code\plotadmix.R (1375, 2016-11-18)
code\plotpca.R (3011, 2016-11-18)
code\read.data.R (2176, 2016-11-18)
code\svdk.m (4128, 2016-11-18)
code\traw2mat.m (1775, 2016-11-18)
code\write_mean_genotypes.m (376, 2016-11-18)
code\write_pc_matrix.m (529, 2016-11-18)
code\write_rot_matrix.m (527, 2016-11-18)
conduct.md (1461, 2016-11-18)
data (0, 2016-11-18)
data\omni_samples.20141118.panel (32474, 2016-11-18)
episodes (0, 2016-11-18)
episodes\01-setup.md (3510, 2016-11-18)
episodes\02-pca.md (9854, 2016-11-18)
episodes\03-pca-project.md (5566, 2016-11-18)
episodes\04-admixture.md (5641, 2016-11-18)
extras (0, 2016-11-18)
extras\1kg.md (4325, 2016-11-18)
results (0, 2016-11-18)
# Analysis of Genetic Data, Part 1
**Research Computing Center, University of Chicago**
November 2, 2016
2:00 pm - 4:00 pm
**Instructor:** Peter Carbonetto
**Helper:** Will Graybeal
Register [here](http://training.uchicago.edu/course_detail.cfm?course_id=1714).
## General Information
In this 2-hour workshop, participants will apply simple approaches to
investigate and visualize large-scale genetic data sets, with an
emphasis on practical skills that can be applied to genetics
research. This is intended to be an informal, hands-on workshop,
and no background in genetics is required; anyone with intermediate
computing skills (see "Prerequisites") who is curious about human
genetics and the "genomics revolution" is encouraged to register. Over
the course of the 2 hours, participants will generate insights
directly from "raw" genetic data, and will be able to continue to
explore the data independently using the techniques introduced in
class.
**Level:** Intermediate
**Prerequisites:** This workshop assumes some experience performing
simple tasks in a UNIX-like shell environment, as well as basic
familiarity with R. Participants must be able to log in to the RCC
compute cluster, although experience using the RCC cluster is not
required. *All participants must bring a laptop running Mac, Linux, or
Windows on which they have administrative privileges.*
**Where:** Kathleen A. Zar Room, John Crerar Library, University of
Chicago ([OpenStreetMap](https://www.openstreetmap.org/search?query=john%20crerar%20library#map=18/41.79053/-87.60282)).
**Additional info:** This workshop is an attempt to apply elements of
the
[Software Carpentry approach](http://software-carpentry.org/lessons)
(see also
[this article](http://dx.doi.org/10.12688/f1000research.3-62.v2)) to
interactive instruction in computing and the quantitative sciences. Some of
the materials in this repository are adapted from a
[Stanford workshop](https://github.com/Ancestry/cehg16-workshop) given
in March 2016. For a more in-depth exploration of the concepts and
techniques introduced here, see [John Novembre's](http://jnpopgen.org)
[PopGen workshop](https://github.com/NovembreLab/HGDP_PopStruct_Exercise).
Please also take a look at the [Code of Conduct](conduct.md) and the
[Software License](LICENSE), which applies to all the scripts and code
examples in this repository. All instructional material contained in
this repository is made available under the Creative Commons
Attribution license
([CC BY 4.0](https://creativecommons.org/licenses/by/4.0)).
## Aims
1. Explore the application of numerical techniques for investigating
genetic diversity and population structure from large-scale genotype
data.
2. Understand how large genetic data sets are commonly represented in
computer files.
3. Use command-line tools to manipulate genetic data, and use R to
summarize and visualize the results of a genetic data analysis.
4. Practice using the RCC shell environment (*midway*) for large-scale
computation.
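Aim 2 concerns how genotypes are represented for analysis. A common convention, assumed in this sketch rather than taken from the workshop scripts, records at each SNP the number of copies (0, 1 or 2) of one designated allele. This yields an n x p matrix with one row per sample and one column per SNP. The two-letter genotype strings, the `"NN"`/`"00"` missing-call markers and the function name `encode_genotypes` are hypothetical choices for illustration.

```python
import numpy as np

def encode_genotypes(calls, counted):
    """Encode genotype calls as counts of a designated allele (hypothetical helper).

    calls:   one list per sample, each holding a two-letter genotype string
             per SNP (e.g. "AG"); "NN" or "00" is assumed to mark a missing call.
    counted: one allele letter per SNP; copies of this allele are counted.
    Returns an n x p float array with values 0, 1, 2, or NaN for missing calls.
    """
    n, p = len(calls), len(counted)
    X = np.full((n, p), np.nan)
    for i, sample in enumerate(calls):
        for j, g in enumerate(sample):
            if g in ("NN", "00"):
                continue  # leave missing genotypes as NaN
            X[i, j] = g.count(counted[j])  # 0, 1 or 2 copies
    return X

calls = [["AG", "CC", "TT"],
         ["GG", "CT", "NN"],
         ["AA", "TT", "TC"]]
X = encode_genotypes(calls, counted=["A", "C", "T"])
print(X)  # rows: [1, 2, 2], [0, 1, nan], [2, 0, 1]
```

The choice of which allele to count is arbitrary, but it must be applied consistently across every data set that will be compared.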
## Episodes
| Episode | Concepts |
| --- | --- |
| 1. [Setup](episodes/01-setup.md) | How do I set up my shell environment on *midway* for an analysis of genetic data? |
| 2. [Principal component analysis of genetic data](episodes/02-pca.md) | How do I encode genetic polymorphism data?<br>How do I represent genetic polymorphism data as a matrix?<br>How can I visualize the results of PCA to gain insight into the structure of genetic data? |
| 3. [Making predictions using PCA](episodes/03-pca-project.md) | How do I ensure a consistent encoding of the genotype data?<br>How do I map another genetic data set onto an existing PCA result?<br>What does this mapping tell us (and not tell us) about a sample's ancestral origins? |
| 4. [ADMIXTURE analysis of genetic data](episodes/04-admixture.md) | How do I visualize and interpret the results of running ADMIXTURE on genetic data?<br>How do I use the ADMIXTURE results to make predictions for new samples? |
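Episodes 2 and 3 center on PCA of an encoded genotype matrix and on mapping new samples onto an existing PCA. The workshop uses its own MATLAB and R scripts for this (e.g. `geno_pca.m`, `svdk.m`), whose details are not reproduced here. What follows is an independent NumPy sketch under stated assumptions: biallelic SNPs coded as 0/1/2 counts of a designated allele, missing values (NaN) filled with the SNP mean, and PCs taken from a thin SVD of the column-centered matrix. The function names `harmonize`, `genotype_pca` and `project` are hypothetical.

```python
import numpy as np

def harmonize(Xnew, counted_new, counted_ref):
    """Recode Xnew so each SNP counts the same allele as the reference.

    Assumes biallelic SNPs: counting the other allele turns x into 2 - x.
    NaN (missing) entries stay NaN.
    """
    flip = np.array([a != b for a, b in zip(counted_new, counted_ref)])
    return np.where(flip, 2 - Xnew, Xnew)

def genotype_pca(X, k=2):
    """Top-k PCA of an n x p genotype matrix (0/1/2 counts, NaN = missing).

    Returns sample PC scores (n x k), SNP loadings (p x k), and the
    per-SNP means, which are needed later to project new samples.
    """
    means = np.nanmean(X, axis=0)                  # mean genotype per SNP
    Xc = np.where(np.isnan(X), means, X) - means   # mean-impute, then center
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    rotation = Vt[:k].T                            # p x k loadings
    return Xc @ rotation, rotation, means

def project(Xnew, rotation, means):
    """Map new samples onto an existing PCA using the reference SNP means.

    Xnew must contain the same SNPs, in the same order, encoded against
    the same counted alleles (see harmonize) as the reference data.
    """
    Xc = np.where(np.isnan(Xnew), means, Xnew) - means
    return Xc @ rotation

# Usage with random toy data (not real genotypes).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(20, 50)).astype(float)
pcs, rot, means = genotype_pca(X, k=2)
print(pcs.shape, rot.shape)                        # (20, 2) (50, 2)
print(np.allclose(project(X, rot, means), pcs))    # True
```

Projecting the reference samples onto their own PCA reproduces their scores, which makes a quick sanity check on the encoding and centering steps. New samples are centered with the reference means, not their own, so that they land in the same coordinate system.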
## Extras
[Preparation of the 1000 Genomes genotype data](extras/1kg.md)