Construct relationship matrix (XRM)

HIBLUP can construct XRM from pedigree and genomic information, the output binary file has two types of storage format. For dense matrix, the values at lower triangle of relationship matrix are stored one by one, one element take over 4 bytes, and for sparse matrix, only non-zero values (k) at lower triangle of relationship matrix and its row (i) and column index (j) are stored in order of "ijk" format in binary file, that means one element will take over 12 bytes.

Pedigree based relationship matrix (PRM)

# construct A and its inverse simultaneously
./hiblup --make-xrm 
         --pedigree demo.ped
         --add --add-inv 
         --thread 32 
         --out demo 

Files named demo.PA.bin, demo.PA.id, demo.PAinv.sparse.bin, demo.PAinv.sparse.id will be generated in the work directory.

Sometimes when computing the inverse of XRM, it may encounter an error saying:

 Error: matrix is not invertible,…

that is because that the matrix is not positive-defined, users can specify a small value to --ridge-value by using ridge regression method to address this problem.

Genome based relationship matrix (GRM)

The relationship matrix could also be derived from genome-wide markers. The mathematical formula of constructing a genomic relationship matrix (GRM) is as follows:

\begin{equation}
\begin{aligned}
G &= \frac{ZZ^\top}{tr(ZZ^\top)/n}
\end{aligned}
\end{equation}

where Z is the numeric coded genotype matrix with a dimension of n by m, n is the number of individuals and m is the number of markers, tr is the the trace of the matrix. Here in HIBLUP, total 4 methods have been achieved to obtain matrix Z for additive and dominant genetic effect, the way of coding the genotype for different methods can be referred to the following table:

Table 1. The way of coding genotype for different methods

where p_{A} and p_{a} are the allele frequency of A and a, respectively. p_{AA}, p_{Aa} and p_{aa} are the genotype frequency of AA, Aa, aa, respectively.

Users can easily switch to any of method above by specifying the flag --code-method with corresponding value, e.g. --code-method 2. The reference for different methods can be found at the following papers:

General GRM

The script to construct additive and dominant genomic relationship matrix:

# construct A and D simultaneously
./hiblup --make-xrm --code-method 2
         --bfile demo 
         --add --dom 
         --step 10000 
         --thread 32 
         --out demo

Files named demo.GA.bin, demo.GA.id, demo.GD.bin, demo.GD.id will be generated in the work directory, and HIBLUP will directly compute the inverse of the GRM and output it if --add-inv or --dom-inv is detected in the command.

Weighted GRM

HIBLUP supports to compute the weighted GRM by the following script:

./hiblup --make-xrm --code-method 1
         --bfile demo 
         --add --dom 
         --snp-weight snp.wt.txt 
         --thread 32 
         --out demo 

The file format of SNP weights please refer to here.
Users can assign a file listing the names of SNPs to the flag --extract or --exclude to construct GRM by a subset of SNPs.

Pedigree and Genome based relationship matrix (HRM)

When pedigree and genome information are provided simultaneously, HIBLUP will automatically construct H matrix following the equation below (Aguilar 2010, Christensen 2010):

\begin{equation}
H =
\begin{bmatrix}
A_{11} + A_{12} A_{22}^{-1} (G_w - A_{22}) A_{22}^{-1} A_{21} & A_{12}A_{22}^{-1}G_w \\
G_wA_{22}^{-1}A_{21} & G_w \\
\end{bmatrix}
\end{equation}

where A_{11} and A_{22} are the submatrices of A matrix for non-genotyped and genotyped individuals, respectively; A_{12} and A_{21} are the submatrices of A matrix, which represent the relationships between genotyped and nongenotyped individuals, G_w = (1 - alpha) * G_{adj} + alpha * A_{22}, G_{adj} is adjusted from the original genomic relastionship matrix G by simple linear regression:

\begin{equation}
\begin{aligned}
Avg(diag(G)) * a + b = Avg(diag(A_{22})) \\
Avg(offdiag(G)) * a + b = Avg(offdiag(A_{22}))
\end{aligned}
\end{equation}

Then we have G_{adj} = G * a + b.

./hiblup --make-xrm 
         --pedigree demo.ped 
         --bfile demo 
         --alpha 0.05 
         --add --dom 
         --out demo

The above command will generate additive and dominant relationship matrix simultaneously, the filenames are: demo.HA.bin, demo.HA.id, demo.HD.bin, demo.HD.id.
Users can also specify --snp-weight to construct a weighted GRM for constructing HRM.
Also, HIBLUP can construct the inverse of H matrix by following script:

./hiblup --make-xrm
         --pedigree demo.ped 
         --bfile demo 
         --alpha 0.05 
         --add-inv --dom-inv 
         --out demo

the generated filenames are: demo.HAinv.sparse.bin, demo.HAinv.sparse.id, demo.HDinv.sparse.bin, demo.HD.sparse.id.

XRM for environmental random effect

Except for constructing XRM for genetic random effect, HIBLUP can also construct XRM for environmental random effect, phenotype file should be provided for this case, the command is as following:

./hiblup --make-xrm
         --pheno demo.phe
         --rand 6,7
         --out demo

the generated filenames are: demo.loc.sparse.bin, demo.lco.sparse.id, demo.dam.sparse.bin, demo.dam.sparse.id.

NOTE

(1) All above relationship matrix, except for those whose prefix contain character ".sparse.", can be used by GCTA, LDAK, but additional processes are needed, for example:

awk '{print $1,$1}' demo.GA.id > gcta.grm.id
cp demo.GA.bin gcta.grm.bin
# then use it by GCTA
./gcta64 ... --grm gcta ...

(2) The binary format relationship matrix calculated by GCTA or LDAK can be used directly by HIBLUP without any format transformation, just assign its prefix to flag --xrm.