LD score regression

LD score regression is a method widely used with GWAS summary statistics to estimate the heritability of a trait (Bulik-Sullivan, Po-Ru Loh, et al. 2015) and the genetic correlation between traits (Bulik-Sullivan, Finucane, et al. 2015). It does not require individual-level genotype data: only the GWAS summary data and LD scores calculated from a reference panel are involved, so it is much faster and uses far less memory than REML or HE regression.
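For reference, the single-trait model behind the method (Bulik-Sullivan, Po-Ru Loh, et al. 2015) relates the expected GWAS χ² statistic of SNP j to its LD score ℓ_j. Here N is the GWAS sample size, M the number of SNPs, h² the SNP heritability, and a a term capturing confounding such as population stratification:

```latex
\mathbb{E}\left[\chi^2_j \mid \ell_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1,
\qquad \ell_j = \sum_{k} r^2_{jk}
```

h² is therefore recovered from the regression slope, and the intercept (1 + N a) is expected to be close to 1 when there is no confounding.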

1. Heritability estimation

# demo.ma: the summary data; demo.ldsc: the pre-computed LD scores
./hiblup --ldreg \
         --sumstat demo.ma \
         --lds demo.ldsc \
         --out demo

As shown above, both the summary data file and the LD scores file must be provided. The format of the summary data and how to calculate LD scores are described in other tutorial chapters (i.e., summary data and LD scores). Never delete SNPs from the LD scores file to make it consistent with the summary data file; leave it as it is, because the total number of SNPs used to calculate the LD scores is crucial for LD score regression.
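To see why the number of SNPs matters, here is a minimal, unweighted sketch of the regression, assuming the model of Bulik-Sullivan et al. in which E[χ²_j] = N h² ℓ_j / M + intercept. The helper `ldsc_h2_ols` and the synthetic data are hypothetical and not part of HIBLUP; the published estimator also uses regression weights and a block jackknife for standard errors, both omitted here.

```python
import random

def ldsc_h2_ols(chi2, ld, n, m):
    """Unweighted LD score regression (simplified sketch).

    Fits chi2_j ~ intercept + slope * l_j by ordinary least squares and
    converts the slope to heritability via h2 = slope * M / N.
    """
    k = len(ld)
    mean_l, mean_c = sum(ld) / k, sum(chi2) / k
    sxy = sum((s - mean_l) * (c - mean_c) for s, c in zip(ld, chi2))
    sxx = sum((s - mean_l) ** 2 for s in ld)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return intercept, slope * m / n

# Noise-free synthetic data (hypothetical numbers, not HIBLUP output).
random.seed(0)
n, m = 50_000, 100_000                       # GWAS sample size N, number of SNPs M
ld = [random.uniform(1, 200) for _ in range(5_000)]
chi2 = [1.05 + n * 0.3 / m * s for s in ld]  # true intercept 1.05, true h2 0.3

intercept, h2 = ldsc_h2_ols(chi2, ld, n, m)
print(round(intercept, 4), round(h2, 4))     # 1.05 0.3

# Using the wrong M rescales h2 proportionally: halving M halves the estimate.
_, h2_wrong = ldsc_h2_ols(chi2, ld, n, m // 2)
print(round(h2_wrong, 4))                    # 0.15
```

Because h² is obtained by multiplying the slope by M/N, any mismatch between M and the SNPs actually used to compute the LD scores biases the estimate by the same factor.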

The estimates are written to a file named “demo.ldsr.h2”; its contents look like this:

Item	Intercept	Intercept_SE	h2	h2_SE	h2_Pval
demo	1.08285	0.011433	0.122826	0.00393122	2.71554e-214

The ‘Intercept’ reflects confounding from population structure: the closer it is to 1, the less stratified the population. ‘h2’ is the estimated heritability of the trait, and ‘h2_Pval’ is the p-value of the chi-square significance test.
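The documentation only says "chi-square test"; a plausible reading is a 1-degree-of-freedom Wald test on (estimate / SE)². The sketch below assumes that reading. Using the h2 and h2_SE values from the table above, and the rG and rG_SE values of the demo1:demo2 pair in the genetic correlation output, it reproduces the reported p-values up to the rounding of the printed estimates. The function name is hypothetical.

```python
import math

def chisq_1df_pval(estimate, se):
    """Upper-tail p-value of the 1-df Wald chi-square statistic (estimate / se)^2."""
    chisq = (estimate / se) ** 2
    # For 1 degree of freedom, P(X > chisq) = P(|Z| > sqrt(chisq)) = erfc(sqrt(chisq / 2)).
    return math.erfc(math.sqrt(chisq / 2))

p_h2 = chisq_1df_pval(0.122826, 0.00393122)  # h2, h2_SE of trait "demo"
p_rg = chisq_1df_pval(0.141956, 0.0403148)   # rG, rG_SE of pair demo1:demo2
print(f"{p_h2:.3e}  {p_rg:.3e}")
```

`math.erfc` is used instead of computing 1 - CDF so that very small p-values such as the one for h2 do not underflow to zero.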

2. Genetic correlation estimation

Genetic correlation estimation works much like heritability estimation. If HIBLUP detects more than one summary data file in the command, it automatically estimates both the heritability of each trait and the genetic correlations between traits:

# demo1.ma demo2.ma demo3.ma: summary data of multiple traits, separated by spaces
# demo.ldsc: the pre-computed LD scores
./hiblup --ldreg \
         --sumstat demo1.ma demo2.ma demo3.ma \
         --lds demo.ldsc \
         --out demo

There is no limit on the number of traits (summary data files). Two files are generated in the working directory: “demo.ldsr.h2” records the estimated heritability of each trait, as described above, and “demo.ldsr.rg” stores the genetic correlation for each pair of traits; its contents look like this:

Item	CovG	CovG_SE	Intercept	Intercept_SE	rG	rG_SE	rG_Pval
demo1:demo2	0.0252414	0.00716841	0.0166108	0.00819384	0.141956	0.0403148	0.000429607
demo1:demo3	0.0744506	0.00892225	0.102821	0.00671784	0.296832	0.0355727	7.15917e-17
demo2:demo3	0.262124	0.0340515	0.315166	0.0112682	0.608769	0.0790827	1.38348e-14

‘CovG’ is the genetic covariance, ‘rG’ is the estimated genetic correlation between the two traits, and ‘rG_Pval’ is the p-value of the chi-square significance test.
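For reference, the cross-trait model (Bulik-Sullivan, Finucane, et al. 2015) regresses the product of the two traits' Z-scores on LD scores. N₁ and N₂ are the two GWAS sample sizes, N_s the number of samples shared by both studies, ρ the phenotypic correlation among those shared samples, ρ_g the genetic covariance, and h²₁, h²₂ the two heritabilities:

```latex
\mathbb{E}\left[z_{1j} z_{2j} \mid \ell_j\right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho N_s}{\sqrt{N_1 N_2}},
\qquad
r_g = \frac{\rho_g}{\sqrt{h^2_1\, h^2_2}}
```

Presumably ‘CovG’ corresponds to ρ_g and ‘Intercept’ to ρN_s/√(N₁N₂), which is close to 0 when the two GWAS share no samples; this mapping of HIBLUP's columns onto the symbols is an assumption.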

Users can also specify the following options for LD score regression:

  • --M: the number of SNPs used in LD score regression. By default, HIBLUP uses the number of SNPs in the LD score file with MAF between 5% and 50%, as suggested by Bulik-Sullivan et al.
  • --chisq-max: the maximum χ² threshold for SNPs used in the first-step estimation of the intercept; the default is 30.
  • --intercept-h2: constrain the intercept to a given constant, instead of estimating it from the data, for heritability estimation.
  • --intercept-gencov: constrain the intercept to a given constant, instead of estimating it from the data, for genetic correlation estimation.
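The options above can be illustrated with a simplified, unweighted sketch modeled on the two-step estimator of the original LDSC software. Whether HIBLUP's steps, weights and MAF boundaries match it exactly is an assumption, and all function names are hypothetical. `default_m` mimics the --M default; `chisq_max` plays the role of --chisq-max; passing a fixed `intercept` plays the role of --intercept-h2.

```python
import random

def default_m(mafs, low=0.05, high=0.5):
    """Hypothetical --M default: count SNPs with low < MAF <= high (boundaries assumed)."""
    return sum(1 for f in mafs if low < f <= high)

def fit_line(x, y):
    """Ordinary least squares y ~ a + b * x; returns (a, b)."""
    k = len(x)
    mx, my = sum(x) / k, sum(y) / k
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def two_step_h2(chi2, ld, n, m, chisq_max=30.0, intercept=None):
    """Simplified, unweighted two-step LD score regression for h2.

    Step 1 (skipped when `intercept` is given, like --intercept-h2):
    estimate the intercept from SNPs with chi2 <= chisq_max (like --chisq-max).
    Step 2: estimate the slope on all SNPs with the intercept held fixed.
    """
    if intercept is None:
        kept = [(s, c) for s, c in zip(ld, chi2) if c <= chisq_max]
        intercept, _ = fit_line([s for s, _ in kept], [c for _, c in kept])
    # Least-squares slope of (chi2 - intercept) ~ slope * ld, with no free intercept.
    slope = sum(s * (c - intercept) for s, c in zip(ld, chi2)) / sum(s * s for s in ld)
    return intercept, slope * m / n   # h2 = slope * M / N

print(default_m([0.01, 0.05, 0.06, 0.2, 0.5]))     # 3

# Noise-free synthetic data (hypothetical): true intercept 1.02, true h2 0.3.
random.seed(1)
n, m = 50_000, 100_000
ld = [random.uniform(1, 300) for _ in range(5_000)]
chi2 = [1.02 + n * 0.3 / m * s for s in ld]         # some SNPs exceed chi2 = 30

est_intercept, est_h2 = two_step_h2(chi2, ld, n, m)
_, h2_fixed = two_step_h2(chi2, ld, n, m, intercept=1.0)
print(round(est_intercept, 4), round(est_h2, 4))   # 1.02 0.3
print(h2_fixed > 0.3)                               # True
```

Fixing the intercept below its true value pushes the excess χ² into the slope, which is why a constrained intercept should only be used when there is good reason to trust that constant.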