ped-sim

Category: Operating system development
Development tool: C++
File size: 0KB
Downloads: 0
Upload date: 2022-12-02 03:11:57
Uploader: sh-1993
Description: Pedigree simulator

File list:
LICENSE (35141, 2022-12-01)
Makefile (1943, 2022-12-01)
Makefile-gsl (1957, 2022-12-01)
bpvcffam.cc (34811, 2022-12-01)
bpvcffam.h (1775, 2022-12-01)
cmdlineopts.cc (11598, 2022-12-01)
cmdlineopts.h (2490, 2022-12-01)
cointerfere.cc (8189, 2022-12-01)
cointerfere.h (1000, 2022-12-01)
datastructs.h (5749, 2022-12-01)
example/ (0, 2022-12-01)
example/cousins-1st_half_to_3rd.def (1739, 2022-12-01)
example/cousins-parent-sex-assign.def (247, 2022-12-01)
example/full-1cousin1.png (57773, 2022-12-01)
example/full_half_1st_2nd_cousins.def (1170, 2022-12-01)
example/once-removed.def (760, 2022-12-01)
example/second_deg.def (1943, 2022-12-01)
fam2def.py (14022, 2022-12-01)
fileorgz.cc (3968, 2022-12-01)
fileorgz.h (602, 2022-12-01)
fixedcos.cc (3110, 2022-12-01)
fixedcos.h (743, 2022-12-01)
geneticmap.cc (3217, 2022-12-01)
geneticmap.h (1937, 2022-12-01)
ibdseg.cc (22657, 2022-12-01)
ibdseg.h (1524, 2022-12-01)
interfere/ (0, 2022-12-01)
interfere/nu_p_campbell.tsv (622, 2022-12-01)
interfere/nu_p_campbell_X.tsv (644, 2022-12-01)
interfere/sex_av_nu_p_campbell.tsv (1118, 2022-12-01)
main.cc (8973, 2022-12-01)
plot-fam.R (1235, 2022-12-01)
readdef.cc (47304, 2022-12-01)
readdef.h (2033, 2022-12-01)
simulate.cc (21717, 2022-12-01)
simulate.h (1652, 2022-12-01)

Pedigree Simulator
==================

Program to simulate pedigree structures. The method can use sex-specific genetic maps and randomly assigns the sex of each parent (or uses user-specified sexes) when using such maps.

Recent updates
--------------

Version 1.4 chooses alleles from the first haplotype if males are input with heterozygous genotypes. Formerly Ped-sim picked an allele at random, but this effectively introduces switch errors into the transmitted haplotypes.

Version 1.3 now supports branch-specific sex assignments in the [def file](https://github.com/williamslab/ped-sim/blob/master/#def-file).

Version 1.2 now supports simulating the X chromosome. See the [map file](https://github.com/williamslab/ped-sim/blob/master/#map-file) section for code to generate a map file that includes X chromosome positions from the [Bhérer et al. (2017)](http://dx.doi.org/10.1038/ncomms14994) map. To simulate genetic data (i.e., output a VCF file) that includes the X chromosome, [specify sexes of the input VCF using `--sexes`](https://github.com/williamslab/ped-sim/blob/master/#specifying-sexes-of-samples-in-the-input-vcf). To change the name Ped-sim considers as the X chromosome [use the `-X` option](https://github.com/williamslab/ped-sim/blob/master/#x-chromosome-name--x-string).

Table of Contents
-----------------

* Pedigree Simulator
   * [Basic usage](https://github.com/williamslab/ped-sim/blob/master/#basic-usage)
   * [Quick start](https://github.com/williamslab/ped-sim/blob/master/#quick-start)
   * [Compiling](https://github.com/williamslab/ped-sim/blob/master/#compiling)
   * [Def file (with examples)](https://github.com/williamslab/ped-sim/blob/master/#def-file)
   * [Map file](https://github.com/williamslab/ped-sim/blob/master/#map-file)
   * [Input VCF file](https://github.com/williamslab/ped-sim/blob/master/#input-vcf-file)
      * [Specifying sexes of samples in the input VCF](https://github.com/williamslab/ped-sim/blob/master/#specifying-sexes-of-samples-in-the-input-vcf)
   * [Crossover model](https://github.com/williamslab/ped-sim/blob/master/#crossover-model)
   * [Output IBD segments file](https://github.com/williamslab/ped-sim/blob/master/#output-ibd-segments-file)
   * [Output VCF file](https://github.com/williamslab/ped-sim/blob/master/#output-vcf-file)
   * [Output log file](https://github.com/williamslab/ped-sim/blob/master/#output-log-file)
   * [Sample ids for simulated individuals](https://github.com/williamslab/ped-sim/blob/master/#samp-ids)
   * [Output fam file](https://github.com/williamslab/ped-sim/blob/master/#output-fam-file)
   * [Output BP file](https://github.com/williamslab/ped-sim/blob/master/#output-bp-file)
   * [Output MRCA file](https://github.com/williamslab/ped-sim/blob/master/#output-mrca-file)
   * [Extra notes: sex-specific maps](https://github.com/williamslab/ped-sim/blob/master/#extra-notes-sex-specific-maps)
   * [Citing Ped-sim](https://github.com/williamslab/ped-sim/blob/master/#citing-ped-sim-and-related-papers)
   * [Other optional arguments](https://github.com/williamslab/ped-sim/blob/master/#other-optional-arguments)
      * [Specifying random seed](https://github.com/williamslab/ped-sim/blob/master/#specifying-random-seed---seed-)
      * [Using specified crossovers](https://github.com/williamslab/ped-sim/blob/master/#using-specified-set-of-crossovers---fixed_co-filename)
      * [Genotyping error rate](https://github.com/williamslab/ped-sim/blob/master/#genotyping-error-rate---err_rate-)
      * [Rate of opposite homozygote errors](https://github.com/williamslab/ped-sim/blob/master/#rate-of-opposite-homozygote-errors---err_hom_rate-)
      * [Missingness rate](https://github.com/williamslab/ped-sim/blob/master/#missingness-rate---miss_rate-)
      * [Pseudo-haploid rate](https://github.com/williamslab/ped-sim/blob/master/#pseudo-haploid-rate---pseudo_hap-)
      * [X chromosome name](https://github.com/williamslab/ped-sim/blob/master/#x-chromosome-name--x-string)
      * [Maintaining phase in output](https://github.com/williamslab/ped-sim/blob/master/#maintaining-phase-in-output---keep_phase)
      * [Listing input sample ids used as founders](https://github.com/williamslab/ped-sim/blob/master/#listing-input-sample-ids-used-as-founders---founder_ids)
      * [Retaining extra input samples](https://github.com/williamslab/ped-sim/blob/master/#retaining-extra-input-samples---retain_extra-)
   * [Extraneous tools](https://github.com/williamslab/ped-sim/blob/master/#extraneous-tools)
      * [Plotting pedigree structures: plot-fam.R](https://github.com/williamslab/ped-sim/blob/master/#plotting-pedigree-structures-plot-famr)
      * [Converting fam to def file: fam2def.py](https://github.com/williamslab/ped-sim/blob/master/#converting-fam-to-def-file-fam2defpy)

------------------------------------------------------

Basic usage:
------------

    ./ped-sim -d <in.def> -m <map file> -o <out_prefix> --intf <filename>

To use a non-interference crossover model, i.e., a Poisson model, use:

    ./ped-sim -d <in.def> -m <map file> -o <out_prefix> --pois

Both of the above produce a file `[out_prefix].seg` containing IBD segments and `[out_prefix].log`, a log of what Ped-sim printed to stdout. With these input options, Ped-sim does not produce genetic data, only IBD segments for artificial (ungenotyped) relatives.

To simulate relatives with genetic data (and using crossover interference modeling), run:

    ./ped-sim -d <in.def> -m <map file> -i <in.vcf/in.vcf.gz> -o <out_prefix> --intf <filename>

This will generate a third output file, `[out_prefix].vcf` (or `[out_prefix].vcf.gz`).

Run `ped-sim` without arguments to see a summary of options. This document gives a detailed description of the input and output files and all options.

### Quick start

To use Ped-sim to simulate from the def file `example/second_deg.def`:

1. Obtain a genetic map. For humans, links and code to generate a sex-specific map in Ped-sim format are [below](https://github.com/williamslab/ped-sim/blob/master/#map-file).

2. Run Ped-sim:

        ./ped-sim -d example/second_deg.def -m refined_mf.simmap \
          -o output --intf interfere/nu_p_campbell.tsv

This uses the genetic map described [below](https://github.com/williamslab/ped-sim/blob/master/#map-file), and [human crossover interference parameters](https://www.nature.com/articles/ncomms7260) stored in `interfere/`.

The `output.seg` file is the primary result of this run and lists the IBD segments the samples share. This example does not include any input genetic data and so does not produce any output genetic data.

------------------------------------------------------

Compiling
---------

Ped-sim requires **either** the [boost](https://www.boost.org/) development libraries _or_ [GSL](https://www.gnu.org/software/gsl/) (see below regarding GSL). With one of these libraries in place (and an update to the Makefile for GSL), most Linux/Unix-based users should simply be able to compile by running

    make

Other systems may require editing of the Makefile or alternate means of compiling.

**GSL**: our analyses suggest that Ped-sim runs a bit faster with GSL (although the component that uses boost/GSL is quite fast regardless).
Use the following to compile with the GSL library:

    cp Makefile-gsl Makefile
    make

------------------------------------------------------

Def file
--------

The def file defines the pedigree structure(s) to be simulated. Comments are allowed on a line by themselves beginning with `#`. Example def files are in the `example/` directory, and [descriptions of the example files are below](https://github.com/williamslab/ped-sim/blob/master/#example-def-file-examplecousins-1st_half_to_3rddef). **The specification below is perhaps best for more advanced users. The examples are a good place to start.**

The first line of a pedigree definition contains four (or five) columns:

    def [name] [#copies] [#generations] <sex of i1>

`[name]` gives the name of the pedigree, which must be unique for each pedigree structure in a given simulation run (i.e., a given def file). The simulator uses this to generate the simulated individuals' sample ids (details in [Sample ids for simulated individuals](https://github.com/williamslab/ped-sim/blob/master/#samp-ids)).

`[#copies]` gives the number of replicate simulations of the given pedigree structure to produce. While the replicates all have the same structure, they will descend from different founders and will have different randomized sex assignments (when using sex-specific maps and assuming `<sex of i1>` is not given), and so are independent.

`[#generations]` indicates the number of generations in the pedigree.

`<sex of i1>` is an optional field giving the sex (F for female, M for male) of the individual with id `i1` (the reproducing individual) in each branch. See [Sample ids for simulated individuals](https://github.com/williamslab/ped-sim/blob/master/#samp-ids).

After this first line, the def file lists simulation details corresponding to various generations in the pedigree. Each such line has the following format:

    [generation#] [#samples_to_print] <#branches> <branch_specifications>

`[generation#]` gives an integer value for the pedigree generation number.
This value can range from 1 (the earliest generation) to the total number of generations included in the pedigree, as listed on the first line of the definition (`[#generations]` just above).

`[#samples_to_print]` indicates how many samples the simulator should print for each branch (see below for a definition of "branch") in the indicated generation; it defaults to 0 so that Ped-sim prints no individuals in generations not explicitly listed. All individuals in a given branch and given generation have the same parents and so are full siblings of one another. Because only one member of each branch can have children, setting this to a value greater than 1 generates data for individuals that do not have any offspring. **To simulate a pedigree in which multiple full siblings each have children, increase the number of branches in the third `<#branches>` field.**

Note that any _founder_ spouses of the person who does have children in a branch are always printed if this field is greater than 0. These spouses do not count in the value of this field: the field gives the number of full siblings to generate for each branch. Also note that if a branch contains a founder individual (such as in generation 1), it will only ever contain one individual (and any spouses of that person): the `[#samples_to_print]` value will only control whether (if it is greater than 0) or not (if it is 0) Ped-sim prints that founder and his/her spouses.

**Note: it is possible to print the members of specific branches (instead of all individuals) in a given generation. See the "no-print" branch specification described below.**

`<#branches>` is an optional field. By default:

* Generation 1 has one branch that contains a founder individual, and generation 2 has two branches that are both children of the founder individual and his/her spouse from generation 1; thus they are full siblings.
* Other than generations 1 and 2, every generation includes the same number of branches as the previous generation.

In consequence, not all generations need an explicit listing in the def file.

For generation 1, multiple branches are allowed, and all such branches contain only founder individuals. For all other generations, if the `branch_specifications` field (described below) is empty, the parents of each branch are as follows:

* If the number of branches is *an integer multiple* n *times the number of branches in the previous generation*, individuals in branch *i* in the previous generation are the parents of branches *n\*(i-1)+1* through *n\*i* in the current generation. (Thus, *if the number of branches is the same*, individuals in each branch *i* in the previous generation are the parents of branch *i* in the current generation.)
* If the number of branches is *less than the number of branches in the previous generation*, individuals in each branch *i* in the previous generation are the parents of branch *i* in the current generation. (Thus some branches in the previous generation do not have children.)
* If the number of branches is *greater than but not divisible by the number of branches in the previous generation*, branches 1 through *n\*p* have parents assigned according to the integer multiple case above; here *p* is the number of branches in the previous generation and *n* is the integer quotient obtained when the number of branches in the current generation is divided by *p*. The remaining branches *n\*p+1* through *n\*p+r* contain founder individuals (as in generation 1), where *r* is the remainder of that division.

The above are defaults, and the parents of a branch can be assigned in the branch specifications.

`<branch_specifications>` is an optional set of one or more fields containing (a) no-print branches, (b) sex assignments, and/or (c) non-default parent assignments for a set of branches. By default, all branches have the same number of individuals printed (given in the `[#samples_to_print]` field), and the sexes are assigned randomly.

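The default parent assignments listed above (the integer-multiple, fewer-branches, and remainder cases) can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this document, not Ped-sim's own code; the function name `default_parents` and its return convention (`None` for a founder branch) are assumptions made for the example.

```python
from typing import Dict, Optional


def default_parents(prev_branches: int, cur_branches: int) -> Dict[int, Optional[int]]:
    """Map each current-generation branch (1-based) to its default parent branch.

    A value of None marks a branch that contains founders.
    Hypothetical helper: it mirrors the default rules in the text only.
    """
    if prev_branches < 1 or cur_branches < 1:
        raise ValueError("branch counts must be positive")

    parents: Dict[int, Optional[int]] = {}

    if cur_branches <= prev_branches:
        # Same or fewer branches: branch i descends from previous branch i.
        for b in range(1, cur_branches + 1):
            parents[b] = b
        return parents

    # More branches than before: n is the integer quotient, r the remainder.
    n, r = divmod(cur_branches, prev_branches)
    for b in range(1, n * prev_branches + 1):
        # Previous branch i is the parent of branches n*(i-1)+1 .. n*i.
        parents[b] = (b - 1) // n + 1
    for b in range(n * prev_branches + 1, n * prev_branches + r + 1):
        # The r leftover branches contain founders, as in generation 1.
        parents[b] = None
    return parents


print(default_parents(2, 5))
# {1: 1, 2: 1, 3: 2, 4: 2, 5: None}
```

With two branches in the previous generation and five in the current one, branches 1 through 4 follow the integer-multiple rule and branch 5 holds a founder. Any branch specifications given on the line would replace these defaults.
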
**No-print branches** have the format:

    [current_branches]n

**Sex assignments** have two possible formats for males and females, respectively:

    [current_branches]sM
    [current_branches]sF

For example, `2sM` says that branch 2 in the current generation should be male and `1,3-5sF` indicates that branches 1, 3, 4, and 5 should be female. (See just below for more detail on `[current_branches]`.)

_Note_: these branch-specific sex assignments override the `<sex of i1>` field that appears on the `def` line (see above).

**Parent assignments** have any of the following formats:

    [current_branches]:
    [current_branches]:[parent_branch1]
    [current_branches]:[parent_branch1]_[parent_branch2]
    [current_branches]:[parent_branch1]_[parent_branch2]^[parent_branch2_generation]

In all three kinds of specification (no-print, sex assignment, and parent assignment), `[current_branches]` contains a range of branches from the current generation that should either not be printed, should have their sex assigned, or whose parents are assigned after the `:` character. This can be a single branch or a comma-separated list of branches such as `1,2,3` or, for a contiguous range, you can use a hyphen as in `1-3`. Any combination of contiguous ranges and comma-separated sets of branches is allowed, such as `2-5,7,9-10`.

For parent assignments:

If no text appears after the `:`, the indicated branches will contain founder individuals. For example, `1-3,5:` specifies that branches 1 through 3 and 5 should contain founders.

If only `[parent_branch1]` is listed, the reproducing parent from that branch in the previous generation has children with a founder spouse. So for example, `1,7:2` indicates that branches 1 and 7 will be the children of an individual from branch 2 in the previous generation and a founder spouse. Because these branches are listed together, they will contain full siblings. To generate these branches as half-sibling children of branch 2, the specification should be `1:2 7:2`.
Here, branch 2 contains the parent of both individuals, but the separate specifications for branches 1 and 7 ensure that this parent has children with two different founder spouses, making the children in the branches half-siblings.

If two parent branches are listed as in `[parent_branch1]_[parent_branch2]`, the two parents are from the indicated branches in the previous generation. Thus, for example, `2,4:1_3` indicates that branches 2 and 4 from the current generation are to be the children of the reproducing individuals in branches 1 and 3 in the previous generation.

To have parents from different generations, the format is `[parent_branch1]_[parent_branch2]^[parent_branch2_generation]`. Here, one parent (the first one listed) is required to be in the previous generation and the second parent comes from some other generation. Because the children are in the current generation, the generation of both parents must be earlier than the current one. As an example, `2:1_3^2` indicates that branch 2 in the current generation has parents from branch 1 in the previous generation and branch 3 from generation 2.

The simulator keeps track of the constraints on the sex of the parents implied by the requested matings and will give an error if it is not possible to assign sexes necessary to have offspring. For example, `1:1_3 2:1_4 3:3_4` is impossible since the reproducing individuals in branches 3 and 4 must be the same sex in order to both have children with the individual in branch 1.

### Example def file: `example/cousins-1st_half_to_3rd.def`

The first def entry in `example/cousins-1st_half_to_3rd.def` is

    def full-1cousin 10 3
    3 1

The first line names the pedigree `full-1cousin`, and calls for 10 replicate pedigrees to be generated. The last column, 3, says that the `full-1cousin` pedigree spans three generations.

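To make the column layout of the `def` line concrete, here is a minimal Python sketch that splits such a line into its fields. It is not how Ped-sim reads def files; the names `DefHeader` and `parse_def_header` are hypothetical, and the sketch assumes whitespace-separated columns with the optional sex field restricted to `F` or `M`, as described in the Def file section.

```python
from typing import NamedTuple, Optional


class DefHeader(NamedTuple):
    """Fields of a `def` line (illustrative container, not part of Ped-sim)."""
    name: str
    copies: int
    generations: int
    sex_of_i1: Optional[str]  # "F", "M", or None when the field is omitted


def parse_def_header(line: str) -> DefHeader:
    """Split a `def [name] [#copies] [#generations] <sex of i1>` line."""
    fields = line.split()
    if len(fields) not in (4, 5) or fields[0] != "def":
        raise ValueError("expected: def [name] [#copies] [#generations] <sex of i1>")

    # The fifth column is optional and gives the sex of the i1 individuals.
    sex = fields[4] if len(fields) == 5 else None
    if sex is not None and sex not in ("F", "M"):
        raise ValueError("sex of i1 must be F or M")

    return DefHeader(fields[1], int(fields[2]), int(fields[3]), sex)


print(parse_def_header("def full-1cousin 10 3"))
# DefHeader(name='full-1cousin', copies=10, generations=3, sex_of_i1=None)
```

For the `full-1cousin` entry this yields the name, 10 copies, and 3 generations, with no sex given for `i1`, so sexes would be assigned randomly.
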
The following is a plot of the `full-1cousin` pedigree, with generations labeled and outlined in red, branches labeled and outlined in blue, and `i1` individuals circled in purple (in generations 1 and 2). Only the individuals in generation 3 are printed, and these individuals' shapes are filled in black; non-printed individuals are unfilled. The sexes of these individuals are random. (Use the [plot-fam.R](https://github.com/williamslab/ped-sim/blob/master/#plotting-pedigree-structures-plot-famr) script to generate black and white portions of this plot for your def file.)

![Pedigree plot of full-1cousin](https://github.com/williamslab/ped-sim/blob/master/example/full-1cousin1.png?raw=true "Pedigree plot of full-1cousin")

This definition does not mention generations 1 and 2 (the line that reads `3 1` refers to generation 3), so those generations have the default number of branches and do not have data printed for the individuals in them. By default, generation 1 has one branch that contains one random individual (the `i1` individual) and all spouses of this person. (For this pedigree, the generation 1 branch contains only one couple.) Generation 2 has the default two branches, with the `i1` individuals in these branches being the children of the branch in generation 1 (strictly speaking, of the couple in that branch). This means that the `i1` individuals in these branches are full siblings of each other.

The def line `3 1` says that in generation 3, 1 sample per branch should be printed, and it does not specify the number of branches in this generation. This means that generation 3 also has the default branch count, which is the same as in the previous generation, or two branches. These branches contain the children of the `i1` individuals in the corresponding branches in the previous generation, so generation 2, branch 1's child is in generation 3, branch 1, and generation 2, branch 2's child is in generation 3, branch 2.
This completes the definition of the pedigree, which will print a pair of first cousins.

The next two definitions are for second and third cousins:

    def full-2cousin 10 4
    4 1

    def full-3cousin 10 5
    5 1

These pedigrees differ from the first cousin pedigree in their names and numbers of generations: 4 and 5 for second and third cousins, respectively. Like the first cousin pedigree, they use the default branch counts for all generations. This means that generation 1 contains one branch, and all other generations have two branches. When successive generations have the same number of branches, branch _i_ in one generation contains the parents of branch _i_ in the next generation. (So branch 2's parents are in the previous generation's branch 2.) The `4 1` and `5 1` lines specify that one sample per branch should be printed in these generations, and lead to the production of the second and third cousins as needed.

These two-line def entries are perhaps the simplest type and generate pairs of full cousins of any distance (determined by the number of generations).

Ped-sim also generates half-cousins, and the def file contains two more entries for printing half-first and half-second cousins. These involve a few more instructions:

    def half-1cousin 10 3
    2 0 2  1:1  2:1
    3 1

This specifies a pedigree with the name `half-1cousin` with 10 replicate copies to be produced and 3 generations in the pedigree. As with the full first cousin case, generation 1 uses the default of one branch. The first part of the generation 2 definition reads `2 0 2`. The 0 indicates that no samples from generation 2 should be printed, and the third column says that this generation has 2 branches. These are in fact the default settings, but are explicitly listed ahead of the second, non-default part of this line. The latter half of the generation 2 definition reads `1:1 2:1`.
Here, `1:1` says that the current generation's branch 1 `i1` individual is the child of the previous generation's (generation 1's) branch 1. Similarly, `2:1` says that the current generation's branch 2 `i1` individual is also a child of the previous generation's branch 1. So both branches in generation 2 are children of the same person, but because the specifications are separated, they are children of two different spouses, and so are half-siblings. In contrast, if this line instead specified `1,2:1`, the branches would be full siblings.

With the two branches in generation 2 containing half-siblings, the remainder of the definition is the same as for full cousins, with `3 1` indicating that in gene ... ...
