
Summary-level BLUP (SBLUP)

The SBLUP model estimates marker (SNP) effects from GWAS or meta-analysis summary statistics, using an LD matrix derived from a reference panel with individual-level data (Robinson et al., 2017). The summary data should be prepared in COJO format, as described here. HIBLUP can run SBLUP in two ways, described below:
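The following is a minimal sketch of what SBLUP computes. It reflects our reading of Robinson et al. (2017) and is not HIBLUP's code. Assume standardized genotypes, with b the marginal GWAS effects on the standardized scale, R the LD correlation matrix, n the GWAS sample size, m the number of SNPs and h2 the heritability. The joint effects then solve (R + lambda*I) beta = b with lambda = m(1 - h2)/(n*h2). This is ridge regression with the BLUP shrinkage implied by h2. The helper names (solve_linear, sblup_effects) and the toy values are hypothetical, and HIBLUP may scale the problem differently.

```python
def solve_linear(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy so the caller's matrix is left untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def sblup_effects(ld, marginal, n_gwas, h2, m_total=None):
    """Hypothetical helper: solve (R + lam * I) beta = b.

    lam = m * (1 - h2) / (n_gwas * h2); m defaults to the number of SNPs
    passed in, but is assumed to be the genome-wide SNP count in practice.
    Returns (beta, lam).
    """
    m = m_total if m_total is not None else len(marginal)
    lam = m * (1.0 - h2) / (n_gwas * h2)
    k = len(ld)
    shrunk = [[ld[i][j] + (lam if i == j else 0.0) for j in range(k)] for i in range(k)]
    return solve_linear(shrunk, marginal), lam
```

With no LD (R = I), each joint effect reduces to b_j / (1 + lambda). A larger m or a smaller h2 increases lambda and therefore shrinks the estimates more strongly toward zero.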

1. Run SBLUP using genotype data

The first way is to provide the reference genotype data directly:

# demo.ma: the summary data in COJO format
# demo: the reference genotype data (demo.bed/.bim/.fam)
./hiblup --sblup \
         --sumstat demo.ma \
         --bfile demo \
         --window-bp 1e6 \
         --h2 0.3234 \
         --threads 16 \
         --out demo

The option --h2 gives the heritability of the trait under analysis. It can be estimated by REML if individual-level data are available, or by LD score regression from the summary data. The option --window-bp sets the size of the non-overlapping windows (default 1 Mb), and with this option the number of SNPs per window varies. Alternatively, --window-num sets a fixed number of SNPs per window, in which case the physical size of each window varies. The option --window-geno treats all SNPs across the entire genome as a single window. If a window contains a large number of SNPs (e.g., more than 10k), adding the --pcg flag is recommended to speed up the computation of SNP effects.
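The three windowing options trade a fixed physical size for a fixed SNP count. The sketch below illustrates that difference on SNPs given as (chromosome, position) pairs. The function names are hypothetical. Two details are our assumptions and not documented HIBLUP behaviour: base-pair windows are aligned at position 0, and windows never span two chromosomes.

```python
from itertools import groupby


def windows_by_bp(snps, window_bp=1_000_000):
    """Non-overlapping windows of fixed physical size (cf. --window-bp).

    snps: list of (chrom, bp) tuples sorted by chromosome, then position.
    Assumption: windows start at bp 0 and never cross chromosomes.
    """
    def key(i):
        chrom, bp = snps[i]
        return chrom, bp // window_bp

    return [list(g) for _, g in groupby(range(len(snps)), key=key)]


def windows_by_count(snps, window_num):
    """Windows with a fixed number of SNPs (cf. --window-num).

    Assumption: counting restarts on each chromosome, so the last window
    of a chromosome may hold fewer than window_num SNPs.
    """
    windows = []
    for _, grp in groupby(range(len(snps)), key=lambda i: snps[i][0]):
        idx = list(grp)
        windows.extend(idx[k:k + window_num] for k in range(0, len(idx), window_num))
    return windows


def whole_genome_window(snps):
    """All SNPs in a single window (cf. --window-geno)."""
    return [list(range(len(snps)))]


snps = [(1, 100), (1, 900_000), (1, 1_200_000), (1, 2_500_000), (2, 50), (2, 60)]
print(windows_by_bp(snps))        # [[0, 1], [2], [3], [4, 5]]
print(windows_by_count(snps, 2))  # [[0, 1], [2, 3], [4, 5]]
```

Each window leads to its own linear system whose dimension equals the number of SNPs in it. This is why a single very large window (or --window-geno) can become expensive to solve directly.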

2. Run SBLUP using pre-computed LD correlation matrix

Instead of genotype data, a pre-computed LD correlation matrix can be used to fit the SBLUP model, which is more straightforward. Although this approach is more computationally efficient and uses less memory than the first one, all SNPs should be in Hardy-Weinberg equilibrium (HWE). Otherwise, the estimated SNP effects will be biased, leading to poor prediction performance.
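A correlation matrix carries no per-SNP variances. One plausible reason for the HWE requirement follows from this, though it is our assumption and not something stated here: those variances would have to be reconstructed from allele frequency as 2p(1-p), which equals the true genotype variance only under HWE. The hypothetical helper below compares the two quantities for a single SNP coded 0/1/2.

```python
def genotype_variance_check(genotypes):
    """Return (p, expected, observed) for one SNP coded as 0/1/2 allele counts.

    expected: 2p(1-p), the genotype variance implied by HWE.
    observed: the population variance of the supplied genotypes.
    """
    n = len(genotypes)
    mean = sum(genotypes) / n
    p = mean / 2.0  # frequency of the counted allele
    expected = 2.0 * p * (1.0 - p)
    observed = sum((g - mean) ** 2 for g in genotypes) / n
    return p, expected, observed


print(genotype_variance_check([0, 1, 1, 2]))  # (0.5, 0.5, 0.5): consistent with HWE
print(genotype_variance_check([0, 0, 2, 2]))  # (0.5, 0.5, 1.0): no heterozygotes
```

In the second case the allele frequency is the same, but the absence of heterozygotes doubles the real variance. A frequency-based reconstruction would therefore mis-scale that SNP.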

# demo.ma: the summary data in COJO format
# demo_ldm: the pre-computed LD correlation matrix
./hiblup --sblup \
         --sumstat demo.ma \
         --ldm demo_ldm \
         --h2 0.3234 \
         --threads 10 \
         --out demo

The option --ldm specifies the LD correlation matrix, which can be computed by HIBLUP from individual-level genotype data (see more details here). As with the first approach, if a window contains a large number of SNPs (e.g., more than 10k), adding the --pcg flag is recommended to speed up the computation of SNP effects.
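Why an iterative solver helps for large windows: a direct solve of a k x k system costs on the order of k^3 operations. Each conjugate-gradient iteration costs only one matrix-vector product. The sketch below is a generic Jacobi-preconditioned conjugate gradient (PCG) for a symmetric positive-definite system such as (R + lambda*I) beta = b. We do not know which preconditioner or stopping rule HIBLUP's --pcg actually uses. The function name, tolerance and example values are illustrative assumptions.

```python
def pcg_solve(a, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive-definite a."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    inv_diag = [1.0 / a[i][i] for i in range(n)]  # M^-1 for M = diag(a)
    x = [0.0] * n
    r = list(b)  # residual b - a @ x, with x starting at zero
    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = list(z)  # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:  # stop once the residual norm is small
            break
        ap = matvec(p)
        alpha = rz / dot(p, ap)  # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]  # new conjugate direction
        rz = rz_new
    return x


# Example: an LD block with a hypothetical shrinkage lam added to the diagonal.
lam = 0.1
R = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
A = [[R[i][j] + (lam if i == j else 0.0) for j in range(3)] for i in range(3)]
beta = pcg_solve(A, [0.02, 0.01, -0.01])
```

The Jacobi preconditioner is the simplest choice. With a correlation matrix plus lambda*I its diagonal is constant, so it acts mainly as a rescaling. Stronger preconditioners would lower the iteration count further.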