如何计算群体中的单倍型频率
大家好,我是邓飞。
昨天写了一篇(单倍型的显著性分析)的博文,里面介绍了为什么GWAS分析后,要进行单倍型的显著性分析,简而言之,如果显著性位点在block中,以block为代表进行利用,可以进行PRS(多基因评分)或者MAS(分子标记辅助选择。
1,数据
数据是plink格式的map和ped格式的基因型数据。
测试数据下载:
有54个位点,310个个体。
2,计算单倍型
使用plink1.9,通过参数--blocks计算单倍型
代码语言:javascript代码运行次数:0运行复制$ plink --file a19 --blocks no-pheno-req
PLINK v1.90b4.6 64-bit (15 Aug 2017) www.cog-genomics/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink.log.
Options in effect:
--blocks no-pheno-req
--file a19
15488 MB RAM detected; reserving 7744 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (54 variants, 310 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
54 variants loaded from .bim file.
310 people (0 males, 0 females, 310 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 310 founders and 0 nonfounders present.
Calculating allele frequencies... done.
54 variants and 310 people pass filters and QC.
Note: No phenotypes present.
--blocks: 8 haploblocks written to plink.blocks .
Extra block details written to plink.blocks.det .
Longest span: 102.68kb.
从日志可以看出,共有8个单倍型,打印结果:
代码语言:javascript代码运行次数:0运行复制]$ cat plink.blocks.det
CHR BP1 BP2 KB NSNPS SNPS
19 26609728 26619327 9.6 4 QTLSaanen19.135_O|QTLSaanen19.137_O|QTLSaanen19.26_O|QTLSaanen19.28_O
19 26647834 26659230 11.397 2 QTLSaanen19.32|GoatD01.005522
19 26818809 26824298 5.49 2 QTLSaanen19.45_O|QTLSaanen19.46
19 27170279 27214437 44.159 5 snp24006-scaffold244048787|QTLSaanen19.49|QTLSaanen19.50_O|QTLSaanen19.51|QTLSaanen19.53_O
19 27220123 27228780 8.658 2 QTLSaanen19.54_O|QTLSaanen19.55
19 27480793 27482976 2.184 2 snp24012-scaffold244-1259949|19_27482976_AF-PAKI
19 27502643 27605322 102.68 15 QTLSaanen19.63_O|QTLSaanen19.64|QTLSaanen19.65_O|snp24013-scaffold244-1309587|QTLSaanen19.67|QTLSaanen19.68|QTLSaanen19.69_O|snp24014-scaffold244-1338180|QTLSaanen19.70_O|QTLSaanen19.71|QTLSaanen19.74|QTLSaanen19.76_O|QTLSaanen19.77_O|QTLSaanen19.78_O|snp24015-scaffold244-1384770
19 27618016 27623534 5.519 3 QTLSaanen19.79|QTLSaanen19.79_O|QTLSaanen19.80
3,计算每个block的频率
使用plink1.07,参数:--hap-freq
代码语言:javascript代码运行次数:0运行复制$ plink1.07 --file a19 --hap plink.blocks --noweb --hap-freq
@----------------------------------------------------------@
| PLINK! | v1.07 | 10/Aug/2009 |
|----------------------------------------------------------|
| (C) 2009 Shaun Purcell, GNU General Public License, v2 |
|----------------------------------------------------------|
| For documentation, citation & bug-report instructions: |
| / |
@----------------------------------------------------------@
Skipping web check... [ --noweb ]
Writing this text to log file [ plink.log ]
Analysis started: Wed Apr 2 07:56:49 2025
Options in effect:
--file a19
--hap plink.blocks
--noweb
--hap-freq
54 (of 54) markers to be included from [ a19.map ]
Warning, found 310 individuals with ambiguous sex codes
These individuals will be set to missing ( or use --allow-no-sex )
Writing list of these individuals to [ plink.nosex ]
310 individuals read from [ a19.ped ]
0 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
0 cases, 0 controls and 310 missing
0 males, 0 females, and 310 of unspecified sex
Before frequency and genotyping pruning, there are 54 SNPs
310 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 1
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 54 SNPs
After filtering, 0 cases, 0 controls and 310 missing
After filtering, 0 males, 0 females, and 310 of unspecified sex
Read 8 haplotypes from [ plink.blocks ]
Estimating haplotype frequencies/phases ( MHF >= 0.01 )
Considering phases P(H|G) >= 0.01
Requiring per individual per haplotype missingness < 0.5
Writing haplotype frequencies to [ plink.frq.hap ]
8 out of 8 haplotypes phased
Analysis finished: Wed Apr 2 07:56:49 2025
查看结果文件:
代码语言:javascript代码运行次数:0运行复制$ cat plink.frq.hap
LOCUS HAPLOTYPE F
H1 ACCC 0.05323
H1 GATC 0.35
H1 GATT 0.5968
H2 GC 0.4821
H2 AT 0.4885
H2 GT 0.01956
H3 CA 0.1339
H3 AG 0.8661
H4 CGGTT 0.1758
H4 TAGCC 0.01867
H4 CAACC 0.0477
H4 TAACC 0.7442
H5 GT 0.1774
H5 AC 0.4613
H5 GC 0.3613
H6 AT 0.1806
H6 GT 0.1435
H6 GC 0.6758
H7 TGCCAATCTTACAAT 0.1871
H7 ATTTGGTTCCCAGGC 0.25
H7 ATTTGGCTCCCAGGC 0.1581
H7 ATTTGGTCTCACAAT 0.04355
H7 ATTTGGTCTTACAAT 0.3323
H7 ATTTGGTTCCACAAT 0.01774
H8 AAG 0.1371
H8 GGT 0.1242
H8 AGT 0.3323
H8 AAT 0.4065
可以看到,H1~H8是八个block,每个block里面包括不同的单倍型以及对应的频率,比如第一个block,包括3个类型:
- ACCC
- GATC
- GATT
对应的频率分别是:0.05323, 0.35和0.5968
把对应的基因型提取出来,确定每个样本的单倍型类型,结合表型数据,就可以进行单倍型间的显著性分析了,进而绘制箱线图、小提琴图等图了。
这种图:
这种图:
或者这种图:
发布者:admin,转转请注明出处:http://www.yc00.com/web/1747982774a4714731.html
评论列表(0条)