## User Manual of HIBLUP

### 1.1.What is HIBLUP

HIBLUP (say "Hi!" to BLUP), which is called “天权”in Chinese, is an user-friendly software that provides estimated genetic value of each individual by maximizing the usage of information from pedigree, genome, and phenotype, as well as all process-related functions, such as construction of relationship matrix, estimation of variance components (heritability) of traits and co-variance components (genetic correlation) between traits using various algorithms, solution of mixed model, computation of SNP effects, and implement of genomic mating. Users can easily achieve simple linear model (LM), best linear unbiased predictors (BLUP), pedigree basedBLUP (PBLUP), genomic BLUP (GBLUP), and single-step GBLUP (SSGBLUP) for single trait model, repeated traits model, and multiple traits model. The functions of HIBLUP will keep on being enriched with more functionalities based on users feedback.

### 1.2.Where to download HIBLUP

The latest released executable file of HIBLUP for different platforms can be downloaded from:

https://www.hiblup.com/download.

It is free of charge for academic purposes. For commercial purposes, please contact the development team.

### 1.3.How to cite HIBLUP

The manuscript is on its way to submission.

### 1.4.Contact us

HIBLUP is developed at **Zhao lab** of Huazhong Agricultural University.

Design and Maintenance team:

*Lilin Yin ^{#}, Haohao Zhang^{#}, Zhenshuang Tang, Dong Yin, Yuhua Fu, Xiaohui Yuan, Xinyun Li^{*}, Xiaolei Liu^{*}, Shuhong Zhao^{*}*

Questions, suggestions, and bug reports are welcome and appreciated, we recommend sending your message by filling the form in the following page:

https://www.hiblup.com/contact-us

If it cannot fully express your question or you get a trouble from the above page, please feel free to send us email by:

**Lilin Yin** **(尹立林)**: ylilin@mail.hzau.edu.cn**HaoHao Zhang (张浩浩)**: haohaozhang@whut.edu.cn**Xiaolei Liu (刘小磊)**: xiaoleiliu@mail.hzau.edu.cn

### 2.1.Phenotype file

Different with other software, the phenotype file of HIBLUP should include not only the phenotypic records, but also the covariates, fixed and random effects. The first column must be the individual id, header should be included in the file. Brief view of example phenotype file is as following:

id sex season day weight loc dam tr1 tr2 tr3 Ind1 F Winter 100 2.9 l3 Ind902 73.9776 21.3252 35.7737 Ind2 M Summer 100 1.5 l2 Ind902 94.1994 19.1162 54.2453 Ind3 M Winter 100 1.8 l4 Ind903 88.6698 5.9538 71.0284 Ind4 M Winter 100 1.9 l1 Ind903 87.0458 -2.8179 37.2703 Ind5 M Spring 100 2.2 l3 Ind905 68.8077 2.5357 59.5402 Ind6 F Autumn 100 1.9 l1 Ind905 85.9242 -3.06 48.4822

Any symbol below will be treated as missing when loading the file:

NA Na . - NaN NAN nan na N/A n/a <NA>

The file should be assigned to flag `--pheno`

.

By default, HIBLUP will take the trait at second column for analysis, users can specify a different column by using `--pheno-pos n`

, for example, `--pheno-pos 8`

represents using the 8th column for analysis.

**NOTE**

If there are multiple traits to be used for estimating genetic correlation, please use space as separator, e.g. `--pheno-pos 8 9 10`

.

### 2.2.Covariates, fixed and environmental random effects

For covariates, fixed effect and environmental random effect of a trait, please specify its position at phenotype file to flag `--qcovar`

, `--dcovar`

, `--rand`

respectively.

For single trait, please use comma as separator, for example:`--qcovar 4,5`

: take "*day*" and "*weight*" as covariates`--dcovar 2,3`

: take "*sex*" and "*season*" as fixed effects`--rand 6`

: take "*loc*" as random effect

For multiple traits, if there are some environmental factors needed to be fitted as a model effect, all traits should be specified at this model effect, and please use comma as separator within a trait, and use space among traits, use 0 for a trait if those factors are not considered for it, take fixed effect for an example:

./hiblup ... --dcovar 2,3 0 2,3,6 ...

which means there are 3 traits in analysis, the first trait has two fixed effects: "*sex*" and "*season*"; the second trait has no fixed effect; the third trait has three fixed effects, which are "*sex*", "*season*" and "*loc*".

**IMPORTANT**

`--qcovar`

is used to specify the environmental effects as quantitative covariates, the records for the effects should be in digits, any character is not allowed.

`--dcovar`

is used to specify the environmental effects as discrete covariates, the records for the effects could be in either digits or characters, both are acceptable.

`--rand`

is used to specify the environmental random effects, which come from the recorded environmental factors, the genetic random effects are from either pedigree or genomic information, if pedigree or genotype file is provided, HIBLUP will fit the genetic random effect in the model automatically, users DO NOT need to add the column of individuals id as random effect here.

### 2.3.Pedigree file

Pedigree file is used for deriving numerator relationship matrix and inbreeding coefficient. Three columns in file are required orderly. The first column is the individual id, the second and third columns are its father and mother id respectively. As the same with phenotype file, any symbol of the list above appears in the pedigree file will be treated as missing. An example file for pedigree is as following:

id father mother Ind2 NA NA Ind5 Ind1 NA Ind11 NA Ind2 Ind17 Ind11 Ind5 Ind22 Ind1 Ind5 Ind45 Ind22 Ind2

The file should be assigned to flag `--pedigree`

.

**NOTE**

The order of individual id in the first column can be ranked by birth date, but it is not a compulsive requirement, as HIBLUP has the automatic function of sorting and rearrangement on the input pedigree.

### 2.4.Genotype file

HIBLUP currently only accept the genotype in PLINK binary format, e.g. * demo.fam*,

*and*

`demo.bim`

*, please see more details at PLINK user manual. Users can convert any other format of genotype (e.g.*

`demo.bed`

*VCF*,

*HapMap*,

*PED/MAP*) to binary format by PLINK2 and TASSEL:

# HapMap to VCF by TASSEL ./run_pipeline.pl -SortGenotypeFilePlugin -inputFile demo.hmp.txt -outputFile demo.sort.hmp.txt -fileType Hapmap ./run_pipeline.pl -fork1 -h demo.sort.hmp.txt -export-exportType VCF -runfork1 # VCF to Binary ./plink2 --vcf demo.vcf --make-bed --out demo # PED/MAP to Binary ./plink2 --ped demo.ped --map demo.map --make-bed --out demo

Those binary files should be assigned to flag `--bfile [prefix]`

.

**NOTE**

**(1) **HIBLUP only supports to load one binary file, does not support to load more than one binary files (e.g. whole genome is stored separately by chromosome), please merge them by PLINK prior to using HIBLUP, a rough guidance can be found here.**(2)** HIBLUP will code the genotype A1A1 as 2, A1A2 as 1, and A2A2 as 0, where A1 is the first allele of each marker in * *.bim* file, therefore the estimated effect size is on A1 allele, users should pay attention to it when a process involves marker effect.

**IMPORTANT**

HIBLUP has no function of imputation, any missing genotype in binary file will be treated as heterozygote, that means missings will be forced to be coded as 1 in analysis, we suggest users to implement imputation by other software prior to using HIBLUP if missings exist in genotype.

### 2.5.Relationship matrix file

HIBLUP currently only accept Relationship Matrix (XRM) in binary format, which has the most efficiency of file writing and data loading, and costs the least usage of disk space, e.g. * demo.GA.id* and

*.*

`demo.GA.bin`

The file * *.id* has one column listing the id of all individuals, the file

*store the lower triangle elements of the XRM for dense matrix, and the file*

`*.bin`

*store the row index, column index, and triangle elements of the XRM for sparse matrix.*

`*.sparse.bin`

By default, those files is generated by `--make-xrm`

in HIBLUP (see Construct relationship matrix).

Additionally, we have developed a file format converter for making binary relationship matrix, please refer to corresponding section (see Format conversion of XRM). To use the binary XRM by HIBLUP, just assign its prefix to the option `--xrm [prefix]`

:

./hiblup ... --xrm demo.GA ...

If there are more than one XRMs, please use comma as separator between prefix:

./hiblup ... --xrm demo.GA,demo.GD ...

**IMPORTANT**

HIBLUP has the function of computing the inverse of XRM, which maybe useful for other software (e.g. BLUPF90, DMU, ASReml, ...), but it's useless for HIBLUP, as HIBLUP uses XRM directly to estimate variance components or solve mixed model equation, therefore please do not assign an inverse of relationship matrix to HIBLUP, unless you know what you are doing.

### 2.6.Genomic estimated breeding value file

Genomic estimated breeding value (GEBV) is used for computing additive and dominant SNP effect, the first column should be the sample id, and the rest columns are the additive or dominant GEBVs, an example with three individuals is as following:

id add Ind1 -1.1345 Ind2 0.3245 Ind3 0.0234

The file should be assigned to flag `--gebv`

.

**NOTE**

If `--add`

is specified, additive breeding value should be provided, so does the flag `--dom`

. If both `--add`

and `--dom`

are specified, the breeding value for additive and dominance should be provided in file, the column of additive breeding value should be at the front of dominant breeding value.

### 2.7.SNP weighting file

SNPs weighting file is used for constructing weighted XRM or computing SNP effect, the first column should be the SNP id, and the rest columns are the additive or dominant weights of SNPs, please ensure that all weights should be positive values, an example with three SNPs is as following:

SNPid weight SNP1 42 SNP2 0.3 SNP3 100

The file should be assigned to flag `--snp-weight`

.

**NOTE**

If `--add`

is specified, additive weights should be provided, so does the flag `--dom`

. If both `--add`

and `--dom`

are specified, the weights of all SNPs for additive and dominance should be provided in file, the column of additive weights should be at the front of dominant weights.

### 2.8.SNP effect file

SNPs effect file is used for genomic mating, at least 5 columns are required, an example is as following:

SNPid a1 a2 freq add dom SNP1 A C 0.1243 0.30134 0.00000 SNP2 G T 0.0345 -0.06324 0.00124 SNP3 A G 0.3635 0.15425 -0.00913

The file should be assigned to flag `--score`

.

**NOTE**

As mentioned above, the provided type of SNP effect should be consistent with the flag `--add`

and `--dom`

, and the column of additive SNP effect should be at the front of dominant SNP effect if both of them are provided.

### 2.9.Sample and SNP filter file

HIBLUP can filter the sample and SNPs in analysis for all available functions, the file is easy to prepare, just list the name of individual or the id of SNP in file with no header.

Example file * id.txt*:

Ind16 Ind56 Ind110

if users are supposed to remove those individuals in analysis, just assign it to flag `--remove`

. Contrarily, if it is assigned to flag `--keep`

, only those individuals in file will be used for analysis.

Example file * snp.txt*:

SNP23 SNP198 SNP432

if users are supposed to remove those SNPs in analysis, just assign it to flag `--exclude`

. Contrarily, if it is assigned to flag `--extract`

, only those SNPs in file will be used for analysis.

### 3.Running HIBLUP

**Multi-thread computing****Allele frequency and Genotype frequency****Homozygosity and Heterozygosity****Construct relationship matrix (XRM)****Format conversion of XRM****Calculate inbreeding and relationship coefficient****Principal component analysis (PCA)****LM, BLUP, PBLUP, GBLUP(rrBLUP), SSGBLUP****Fitting interaction for effects****Single trait model****Repeat records of single trait model****Multiple traits model****Solve mixed model equation (MME)****Statistical tests of significance for effects****Reliability and PEV****Compute SNP effect****Genomic mating**

### 3.1.Multi-thread computing

We have achieved most of the procedures in HIBLUP being able to run on multiple threads. Users can easily change the number of threads used for analysis by following:

./hiblup --threads 32

If `--threads`

is not specified, HIBLUP will try to get the maximum thread number from standard OpenMP environment variable, and use all the available resource to accelerate the computational efficiency.

### 3.2.Allele frequency and Genotype frequency

HIBLUP can calculate allele frequency and genotype frequency for markers.

**Allele frequency**

If the flag `--allele-freq`

is detected in the input command, HIBLUP will calculate A1 allele frequency for all SNPs in genotype:

./hiblup --bfile demo --allele-freq --out demo

It will generate a file named `demo.afreq`

, brief view of this file:

SNP a1 a2 freq_a1 snp1 A C 0.2775 snp2 G T 0.4475 snp3 A G 0.1918

Then users can filter SNPs with a specific condition, for example, pick up SNPs at the condition of 0.05 < freq <= 0.1:

awk '$4 > 0.05 && $4 <= 0.1 {print $1}' demo.afreq > freq_filter_snp.txt

**Genotype frequency**

If the flag `--geno-freq`

is detected in the input command, HIBLUP will calculate the genotype frequency of A1A1 and A2A2 for all SNPs in genotype:

./hiblup --bfile demo --geno-freq --out demo

It will generate a file named `demo.gfreq`

, brief view of this file:

SNP a1 a2 freq_a1a1 freq_a2a2 snp1 A C 0.3456 0.1244 snp2 G T 0.2358 0.2345 snp3 A G 0.1395 0.4643

### 3.3.Homozygosity and Heterozygosity

The homozygosity and heterozygosity are defined as the proportion of homozygous and heterozygous of an individual, it can be easily obtained from HIBLUP by using the flag `--homo`

and `--hete`

, respectively.

**Homozygosity**

./hiblup --bfile demo --homo --out demo

A file named `demo.homo`

will be generated as following:

id a1a1 a2a2 IND0701 0.42255 0.201235 IND0702 0.415291 0.204923 IND0703 0.426455 0.199282

**Heterozygosity**

./hiblup --bfile demo --hete --out demo

A file named `demo.hete`

will be generated as following:

id a1a2 IND0701 0.376216 IND0702 0.379786 IND0703 0.374263

### 3.4.Construct relationship matrix (XRM)

HIBLUP can construct XRM from pedigree and genomic information, the output binary file has two types of storage format. For dense matrix, the values at lower triangle of relationship matrix are stored one by one, one element take over 4 bytes, and for sparse matrix, only non-zero values (k) at lower triangle of relationship matrix and its row (i) and column index (j) are stored in order of "ijk" format in binary file, that means one element will take over 12 bytes.

**Pedigree based relationship matrix (PRM)**

# construct A and its inverse simultaneously ./hiblup --make-xrm --pedigree demo.ped --add --add-inv --thread 32 --out demo

Files named `demo.PA.bin`

, `demo.PA.id`

, `demo.PAinv.sparse.bin`

, `demo.PAinv.sparse.id`

will be generated in the work directory.

Sometimes when computing the inverse of XRM, it may encounter an error saying:

Error: matrix is not invertible,…

that is because that the matrix is not positive-defined, users can specify a small value to `--ridge-value`

by using ridge regression method to address this problem.

**Genome based relationship matrix (GRM)**

The relationship matrix could also be derived from genome-wide markers. The mathematical formula of constructing a genomic relationship matrix (GRM) is as follows:

\begin{equation} \begin{aligned} G &= \frac{ZZ^\top}{tr(ZZ^\top)/n} \end{aligned} \end{equation}

where Z is the numeric coded genotype matrix with a dimension of n by m, n is the number of individuals and m is the number of markers, tr is the the trace of the matrix. Here in HIBLUP, total 4 methods have been achieved to obtain matrix Z for additive and dominant genetic effect, the way of coding the genotype for different methods can be referred to the following table:

where p_{A} and p_{a} are the allele frequency of A and a, respectively. p_{AA}, p_{Aa} and p_{aa} are the genotype frequency of AA, Aa, aa, respectively.

Users can easily switch to any of method above by specifying the flag `--code-method`

with corresponding value, e.g. `--code-method 2`

. The reference for different methods can be found at the following papers:

`--code-method 1`

, (default) Su, Guosheng, et al. "Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers."*Plos one*(2012): e45293.`--code-method 2`

, Zeng, Zhao-Bang, Tao Wang, and Wei Zou. "Modeling quantitative trait loci and interpretation of models."*Genetics*169.3 (2005): 1711-1725.`--code-method 3`

, Yang, Jian, et al. "Common SNPs explain a large proportion of the heritability for human height."*Nature genetics*42.7 (2010): 565-569.`--code-method 4`

, Vitezica, Zulma G., et al. "Orthogonal estimates of variances for additive, dominance, and epistatic effects in populations."*Genetics*206.3 (2017): 1297-1307.

**General GRM**

The script to construct additive and dominant genomic relationship matrix:

# construct A and D simultaneously ./hiblup --make-xrm --code-method 2 --bfile demo --add --dom --step 10000 --thread 32 --out demo

Files named * demo.GA.bin*,

*,*

`demo.GA.id`

*,*

`demo.GD.bin`

*will be generated in the work directory, and HIBLUP will directly compute the inverse of the GRM and output it if*

`demo.GD.id`

`--add-inv`

or `--dom-inv`

is detected in the command.**Weighted GRM**

HIBLUP supports to compute the weighted GRM by the following script:

./hiblup --make-xrm --code-method 1 --bfile demo --add --dom --snp-weight snp.wt.txt --thread 32 --out demo

The file format of SNP weights please refer to here.

Users can assign a file listing the names of SNPs to the flag `--extract`

or `--exclude`

to construct GRM by a subset of SNPs.

**Pedigree and Genome based relationship matrix (HRM)**

When pedigree and genome information are provided simultaneously, HIBLUP will automatically construct H matrix following the equation below (*Aguilar 2010, Christensen 2010*):

\begin{equation} H = \begin{bmatrix} A_{11} + A_{12} A_{22}^{-1} (G_w - A_{22}) A_{22}^{-1} A_{21} & A_{12}A_{22}^{-1}G_w \\ G_wA_{22}^{-1}A_{21} & G_w \\ \end{bmatrix} \end{equation}

where A_{11} and A_{22} are the submatrices of A matrix for non-genotyped and genotyped individuals, respectively; A_{12} and A_{21} are the submatrices of A matrix, which represent the relationships between genotyped and nongenotyped individuals, G_w = (1 - alpha) * G_{adj} + alpha * A_{22}, G_{adj} is adjusted from the original genomic relastionship matrix G by simple linear regression:

\begin{equation} \begin{aligned} Avg(diag(G)) * a + b = Avg(diag(A_{22})) \\ Avg(offdiag(G)) * a + b = Avg(offdiag(A_{22})) \end{aligned} \end{equation}

Then we have G_{adj} = G * a + b.

./hiblup --make-xrm --pedigree demo.ped --bfile demo --alpha 0.05 --add --dom --out demo

The above command will generate additive and dominant relationship matrix simultaneously, the filenames are: * demo.HA.bin*,

*,*

`demo.HA.id`

*,*

`demo.HD.bin`

*.*

`demo.HD.id`

Users can also specify

`--snp-weight`

to construct a weighted GRM for constructing HRM.Also, HIBLUP can construct the inverse of H matrix by following script:

./hiblup --make-xrm --pedigree demo.ped --bfile demo --alpha 0.05 --add-inv --dom-inv --out demo

the generated filenames are: * demo.HAinv.sparse.bin*,

*,*

`demo.HAinv.sparse.id`

*,*

`demo.HDinv.sparse.bin`

`demo.HD.sparse.id`

.**XRM for environmental random effect**

Except for constructing XRM for genetic random effect, HIBLUP can also construct XRM for environmental random effect, phenotype file should be provided for this case, the command is as following:

./hiblup --make-xrm --pheno demo.phe --rand 6,7 --out demo

the generated filenames are: * demo.loc.sparse.bin*,

*,*

`demo.lco.sparse.id`

*,*

`demo.dam.sparse.bin`

*.*

`demo.dam.sparse.id`

**NOTE**

**(1)** All above relationship matrix, except for those whose prefix contain character ".sparse.", can be used by GCTA, LDAK, but additional processes are needed, for example:

awk '{print $1,$1}' demo.GA.id > gcta.grm.id cp demo.GA.bin gcta.grm.bin # then use it by GCTA ./gcta64 ... --grm gcta ...

**(2)** The binary format relationship matrix calculated by GCTA or LDAK can be used directly by HIBLUP without any format transformation, just assign its prefix to flag `--xrm`

.

### 3.5.Format conversion of XRM

HIBLUP only accept XRM in binary file format, as some software (e.g. DMU, BLUPF90, ASReml) can not recognize it, moreover, it would be hard for programming beginners to prepare those binary files, we therefore provide a format converter for easily linking HIBLUP with other software.

**Binary to text**

The following command will output the XRM into a square matrix in text format with names of * demo.txt* and

*:*

`demo.id.txt`

./hiblup --trans-xrm --xrm demo.GA --out demo # --triangle

An example for * demo.txt* with three individuals is as follows:

1.55755 1.04847 0.64622 1.04847 1.58144 0.66971 0.64622 0.66971 1.56794

If the flag `--triangle`

is detected, HIBLUP will only write the lower triangle of the matrix into text file with format of <i j k>, where i, j, k are the row, column index, and its corresponding value respectively. For sparse matrix, HIBLUP will ignore zero elements when outputting "ijk" format. An example file with the same three individuals above is shown below:

1 1 1.55755 2 1 1.04847 2 2 1.58144 3 1 0.646224 3 2 0.669715 3 3 1.56794

**Text to binary**

If users have their own prior calculated relationship matrix in text format, HIBLUP can transform it into binary file, which can be used by HIBLUP directly, the script is as following:

./hiblup --trans-xrm --xrm-txt demo.txt --xrm-id demo.id.txt --out demo

Above command requires * demo.txt* in square format and

*. If the file*

`demo.id.txt`

*is in "ijk" format, please remember to add flag*

`demo.txt`

`--triangle`

.**NOTE**

HIBLUP can also convert the relationship matrix calculated by GCTA, LDAK to text format:

./hiblup --trans-xrm --xrm gcta.grm --out hiblup

### 3.6.Calculate inbreeding and relationship coefficient

Inbreeding and relationship coefficients are derived from either pedigree or genotype information, HIBLUP firstly constructs additive numerator matrix, then derive and output the coefficients in file. The flag for inbreeding coefficient and relationship coefficient is `--ibc`

and `--rc`

, respectively.

**Inbreeding coefficients**

# derived from pedigree ./hiblup --ibc --pedigree demo.ped --thread 32 --out demo # or derived from prior calculated PA matrix ./hiblup --ibc --xrm demo.PA --out demo # derived from genotype ./hiblup --ibc --bfile demo --thread 32 --out demo # or derived from prior calculated GA matrix ./hiblup --ibc --xrm demo.GA --out demo

The inbreeding coefficients are saved in file * demo.ibc*:

id ibc IND0702 0.00290048 IND0703 0.00262272 IND0704 0.00262272

**Relationship coefficients**

# derived from pedigree ./hiblup --rc --pedigree demo.ped --thread 32 --out demo # or derived from prior calculated PA matrix ./hiblup --rc --xrm demo.PA --out demo # derived from genotype ./hiblup --rc --bfile demo --thread 32 --out demo # or derived from prior calculated GA matrix ./hiblup --rc --xrm demo.GA --out demo

The relationship coefficients are saved in file * demo.rc*:

id1 id2 rc IND0701 IND0701 1 IND0701 IND0702 0.472652 IND0701 IND0703 0.0721178 IND0701 IND0704 0.0721178

**NOTE**

1. The inbreeding coefficients and relationship coefficients can be calculated at a time, like:

./hiblup --ibc --rc --pedigree demo.ped --out demo ./hiblup --ibc --rc --bfile demo --out demo

2. The genomic inbreeding coefficients and relationship coefficients can be calculated by different methods using the flag `--code-method`

, more details can be seen here.

./hiblup --ibc --rc --bfile demo --code-method 1 --out demo

### 3.7.Principal component analysis

HIBLUP has the function of calculating PCA for a population, the genotype or the relationship matrix should be provided.

**Calculating PCs from genotype**

./hiblup --pca --bfile demo --out demo

Note that user can add flag `--step n`

to control the memory cost, and specify flag `--npc n`

to output several number of PCs instead of all.

**Calculating PCs from XRM**

./hiblup --pca --xrm demo.GA --npc 5 --out demo

There are two files `demo.pc`

and `demo.pcp`

will be generated.

The file `demo.pc`

contains the calculated PCs, overview of the file:

id PC1 PC2 PC3 PC4 PC5 IND0701 0.0413326 -0.00664808 -0.00795809 0.0218009 -0.0156735 IND0702 0.00856316 0.0378125 0.000503819 0.0205049 0.00972465 IND0703 -0.00793242 0.0458023 -0.00438572 -0.0173733 -0.0254701 IND0704 -0.00793248 0.0458023 -0.00438575 -0.0173733 -0.0254701

The file `demo.pcp`

contains the Proportion and Cumulative Proportion of variance of PCs, overview of the file:

Components PC1 PC2 PC3 PC4 PC5 Standard deviation 4.23976 4.10211 3.76024 3.66821 3.60725 Proportion of Variance 0.0224695 0.0210342 0.0176743 0.0168197 0.0162653 Cumulative Proportion 0.0224695 0.0435037 0.0611779 0.0779976 0.094263

### 3.8.LM, BLUP, PBLUP, GBLUP(rrBLUP), SSGBLUP

Users can easily implement simple linear model (LM), BLUP, PBLUP, GBLUP (rrBLUP) and SSGBLUP, HIBLUP can automatically switch to one of them according to the data type provided by users. The following table details the required or non-required flags for different model in HIBLUP:

Model | --qcovar/--dcovcar | --rand | --pedigree | --bfile |
---|---|---|---|---|

LM | √ | × | × | × |

BLUP | O | √ | O | O |

PBLUP | O | O | √ | × |

GBLUP | O | O | × | √ |

SSGBLUP | O | O | √ | √ |

√: required; ×: not required; O: optional.

rrBLUP model is theoretically the same with GBLUP model, users can just add a flag `--snp-effect`

on the base of GBLUP model to obtain the SNP effect for rrBLUP model.

**IMPORTANT**

It should be noted that, the PBLUP model can be pretty efficient owing to the high performance of FSPAK technology on the very sparse 'LHS' of mixed model equation. However, HIBLUP is designed for genome-based genetic evaluation, in which the 'LHS' is no longer sparse, the 'V' matrix based strategy that is achieved in HIBLUP would be more competent for this case. Although users can fit PBLUP model by using HIBLUP, it may be **slower and memory-costing** compared with those of the software, e.g., DMU, BLUPF90. Therefore, we recommend using DMU or BLUPF90 when fitting a model only involves pedigree information.

### 3.9.Fitting interaction for effects

**Environmental interaction (E by E)**

Interaction between recorded environments can be fitted as fixed or random effects, please use colon as separator.

If fitted as fixed effects:

./hiblup ... --dcovar 2,2:3:6

Above command represents that the fixed effects of this trait are "*sex*", the interaction among "*sex*", "*season*", "*loc*".

If fitted as random effects:

./hiblup ... --rand 6,6:7,7

Similarly, the random effect for this trait are "*loc*", the interaction between "*loc*" and "*dam*", and "*dam*".

For multiple trait model, use space for separator, take 3 traits for an example:

./hiblup ... --dcovar 2:3 0 2,3,2:3 --rand 6 7,6:7 0

which means the fixed effect for the first trait is the interaction between "*sex*", "*season*", no fixed effect for the second trait, the fixed effects of the third trait are "*sex*", "*season*", and the interaction between "*sex*", "*season*"; the environmental random effects for the first trait is "*loc*", the second trait are "*dam*"and the interaction between "*loc*" and "*dam*", the third trait has no environmental random effect.

**Genetic interaction (G by G)**

HIBLUP can fit interaction for genetic random effect, for example: additive by additive, additive (A) by dominant (D), dominant by dominant, and so on, the number of genetic random effect at an interaction term is not limited. The interaction of G by G for *n* genetic random effects in HIBLUP can be mathematically formulated as following:

\begin{equation} \begin{aligned} G_{inter} &= \frac{G_1 \bigodot G_2 \bigodot ... \bigodot G_n}{tr(G_1 \bigodot G_2 \bigodot ... \bigodot G_n) / N} \end{aligned} \end{equation}

where N is the number of individuals, \bigodot is the Hadamard product (i.e. element-wise multiplica-tion) of all matrix, and tr() is the trace of the matrix.

The only way to fit G by G is to appoint the prefix of the random effects to flag `--xrm`

, please use colon as separator within an interaction term:

# fit A by D only ./hiblup ... --xrm demo.GA:demo.GD #fit A, D, and A by D ./hiblup ... --xrm demo.GA,demo.GD,demo.GA:demo.GD #fit A by A, A by D and D by D simultaneously ./hiblup ... --xrm demo.GA:demo.GA,demo.GA:demo.GD,demo.GD:demo.GD

**Genetic-Environmental interaction (G by E)**

Genetic-Environmental interaction will be included in the model as random effect, please assign a digital value representing its position at phenotype file for environment factors, and assign prefix of the genetic effects to the flag "`--rand-gxe`

".

For single trait model, please use colon as separator within an interaction term, and use comma as separator for multiple interaction terms, for example:

./hiblup --single-trait ... --rand-gxe 6:demo.GA,6:demo.GD #interaction among multiple effects ./hiblup --single-trait ... --rand-gxe 6:demo.GA:demo.GD ./hiblup --single-trait ... --rand-gxe 6:7:GA:demo

For multiple traits model, please use space among traits, use 0 if there is no G by E interaction to be considered for a trait, take 3 traits for an example:

./hiblup --multi-traits ... --rand-gxe 6:demo.GA 6:demo.GA,7:demo.GA 0

Above command means that the interaction for the first trait is "*loc*" by additive genetic effect; the interactions for the second trait are "*loc*" by additive genetic effect and "*dam*" by additive genetic effect; no interactions are considered for the third trait.

### 3.10.Single trait model

Single trait model aims to estimate variance components and its heritability of all random effects for a trait. Example command of running SSGLBUP for single trait model:

./hiblup --single-trait --pheno demo.phe --pheno-pos 8 --dcovar 2,3 #fixed effect --qcovar 4,5 #covariates --rand 6,7 #non-genetic (environmental) random effect --pedigree demo.ped #genetic random effect --bfile demo #genetic random effect --add --dom --threads 32 --out demo

There are 5 types of algorithms for variance components estimation:**"AI", "EM", "HE", "EMAI", "HI"**, where "**AI**" is AIREML,"**EM**" is EMREML, and "**HE**" represents HE regression, "**HI**" is "HE+AI", which uses the outcome of HE regression as the start values of AIREML, users can choose one of them by flag `--vc-method`

, and change the maximum iteration number for AIREML and EMREML by flag `--ai-maxit`

and `--em-maxit`

respectively. For "HE" regression, if covariates for fixed effects are provided, HIBLUP will first regress the phenotype on these, then perform HE Regression using the residuals, the standard error(SE) of HE is computed by Jackknife resampling strategy at 100 repeats.

However, above command maybe time-consuming when fitting trait by trait, as it repeatedly constructs XRM for each trait. We can construct XRM only once at the beginning, then assign it to `--xrm`

when fitting multiple traits, which can give the same results, but would be much more efficient, the command can be referred as following:

# Step 1: construct XRMs ./hiblup --make-xrm --pedigree demo.ped --bfile demo --add --dom --out demo # Step 2: link XRMs to fit model ./hiblup --single-trait --pheno demo.phe --pheno-pos 8 --dcovar 2,3 --qcovar 4,5 --rand 6,7 --xrm demo.HA,demo.HD --add --dom --threads 32 --out demo

Several files will be generated in the work directory:* demo.vars*: the estimated variance and heritability of all random effects:

Var Var_SE h2 h2_SE loc 12.2127 8.9109 0.1035 0.0682 dam 10.6268 4.8605 0.0901 0.041 HA 59.2007 11.7641 0.5019 0.0807 HD 28.9946 8.9294 0.2458 0.083 e 6.9232 1.6957 0.0587 0.0161

* demo.beta*: the estimated coefficients of all covariates and fixed effects:

Levels Estimation SE mu 32.5285 3.4758 sex_F 6.2933 1.7593 sex_M 26.2353 1.7634 season_Autumn 18.0905 0.9753 season_Spring -1.9168 0.9904 season_Summer 7.8471 1.0066 season_Winter 8.5078 0.9944 day 0.1547 0.0574 bornweight 1.5703 0.4614

* demo.anova*: analysis of variance table of all covariates and fixed effects:

Factors Df SumSq MeanSq F Pr(>F) sex 1 49710.3845 49710.3845 7180.311 <2e-16 season 3 24515.0941 8171.698 1180.344 <2e-16 day 1 925.3352 925.3352 133.658 <2e-16 bornweight 1 437.3432 437.3432 63.171 1.30E-14 e 493 3413.1141 6.9232

* demo.rand*: the estimated environmental random effects, genetic random effects (the column named "PA", "GA", or "HA" is the list of breeding values for phenotypic and non-phenotypic individuals) and residuals.

Users can specify the start values for all unknown variance to `--vc-priors`

for fast convergence, the way of how to input the variance components can be found at here.

### 3.11.Repeat records of single trait model

Running repeat records model is nearly the same with fitting single trait model, the only difference is that repeat records model can fit the permanent environment effect, which can be achieved by assigning individual id at the first column of phenotype file to environmental random effect, take GBLUP model for an example:

./hiblup --single-trait --pheno demo.repeat.phe --pheno-pos 2 --rand 1 --bfile demo --threads 32 --out demo

### 3.12.Multiple traits model

Multiple traits model is used for estimating genetic correlation among traits, there is no limitation of the number of traits used in analysis for HIBLUP, moreover, users can appoint different covariates, fixed and random effect for different traits. PBLUP, GBLUP, SSGBLUP are all available for multiple traits model, HIBLUP can automatically switch to one of them according to the provided file.

Taking GBLUP for an example, the multiple model can be achieved by following command:

./hiblup --multi-trait --pheno demo.phe --pheno-pos 8 9 10 --qcovar 4,5 5 4 --dcovar 2,3 0 2 --rand 6,7 7 0 --xrm demo.GA,demo.GD # same with --bfile demo --add --dom --vc-method AI --ai-maxit 30 --threads 32 --out demo

The way of how to set covariates (`--qcovar`

), fixed (`--dcovar`

) and random (`--rand`

) effect for multiple traits, please refer to here.

Several files will be generated:

* demo.vars*: the variance, heritability and SE of all random effects for all traits:

Item Var Var_SE h2 h2_SE tr1_loc 13.73854 9.9309 0.11739 0.07538 tr1_dam 2.29288 3.18932 0.01959 0.02731 tr1_demo.GA 61.35173 9.57171 0.52423 0.06964 tr1_demo.GD 29.93077 6.27996 0.25575 0.06022 tr1_e 9.71917 1.67088 0.08305 0.01696 tr2_dam 3.42172 3.80424 0.02971 0.03308 tr2_demo.GA 63.59 10.52058 0.55219 0.06017 tr2_demo.GD 38.14332 7.27491 0.33122 0.06696 tr2_e 10.00395 1.63578 0.08687 0.01578 tr3_demo.GA 21.03906 6.94184 0.2203 0.0668 tr3_demo.GD 15.25593 10.56861 0.15974 0.10919 tr3_e 59.20689 9.04084 0.61996 0.09801

* demo.covars*: the co-variance, correlation and SE of all traits for all genetic random effects:

Item COVar COVar_SE r r_SE tr1:tr2_demo.GA 46.6899 8.56852 0.747524 0.171364 tr1:tr3_demo.GA 16.3376 5.96333 0.454749 0.185599 tr2:tr3_demo.GA 12.3489 6.21401 0.337612 0.157671 tr1:tr2_demo.GD 26.9273 4.65452 0.796922 0 tr1:tr3_demo.GD -2.25039 4.95673 -0.105301 0.23413 tr2:tr3_demo.GD 8.23124 5.21494 0.341194 0.259133 tr1:tr2_e 0.24039 1.17428 0.024379 0.119333 tr1:tr3_e -1.10713 2.87487 -0.0461537 0.11985 tr2:tr3_e -4.80127 2.85341 -0.197284 0.128375

* demo.*.anova*: the variance analysis table of covariates, fixed effects for different traits.

*: the estimated coefficients and SE of covariates, fixed effects for different traits.*

`demo.*.beta`

*: the estimated environmental random effects, genetic random effects (the column named "PA", "GA", or "HA" is the list of breeding values for phenotypic and non-phenotypic individuals) and residuals. for different traits.*

`demo.*.rand`

Users can specify the start values for all unknown variance and co-variance to `--vc-priors`

and `--covc-priors`

, the way of how to input the genetic components can be found at here.

**NOTE**

HIBLUP only estimates co-variance for genetic random effect and residual effect, not for environmental random effect, therefore at least one genetic random effect should be included in the model, which can be from pedigree (`--pedigree`

), genotype (`--bfile`

) or XRMs (`--xrm`

). If users insist on estimating correlation for environmental random effects, just make relationship matrix for those random effects by the flag `--make-xrm`

(see the chapter "**XRM for environmental random effect**" here), then assign it to the option `--xrm`

.

### 3.13.Solve mixed model equation

The objective of solving mixed model is to obtain the estimated breeding values of all individuals by using the known genetic parameters. HIBLUP automatically switches to single trait or multiple traits model by the length of specified flag `--pheno-pos`

. Taking GBLUP model for an example, the command for single trait and multiple traits model is as following:

# (1) additive effect for single trait model ./hiblup --mme --pheno demo.phe --pheno-pos 8 --rand 6 --xrm demo.GA # can be replaced by --bfile demo --add --vc-priors v1,v2,v3 --pcg --threads 32 --out demo # (2) additive and dominant effect for multiple traits model ./hiblup --mme --pheno demo.phe --pheno-pos 8 9 10 --rand 6,7 7 0 --bfile demo # can be replaced by --xrm demo.GA,demo.GD --add --dom --vc-priors t1_v1,t1_v2,t1_v3,t1_v4,t1_v5 t2_v1, t2_v2,t2_v3,t2_v4 t3_v1,t1_v2,t1_v3 --covc-priors cov1,cov2,cov3 cov4,cov5,cov6 cov7,cov8,cov9 --pcg --threads 32 --out demo

By default, HIBLUP directly compute the inverse of ** V** matrix, users can try to add flag

`--pcg`

to use *Preconditioned Conjugate Gradients*algorithm to solve the system of linear equation, which is much more faster in certain situation.

**IMPORTANT**

**(1)** `--vc-priors`

contains the variance components of all random effects, including environmental random effects, genetic random effects, and residuals, the provided variance components should also be in this order. **For single trait model**, please use comma as separator, as shown in the first example command:

v1: the variance of the environmental random effect located at the 6th column in phenotype file. v2: the additive genetic variance. v3: the residual variance.

**For multiple traits model**, the variance components are listed trait by trait orderly, and please use comma within a trait, use space among traits, as shown in the second example command:

t1_v1: the variance of the environmental random effect located at the 6th column in phenotype file for the first trait. t1_v2: the variance of the environmental random effect located at the 7th column in phenotype file for the first trait. t1_v3: the variance of additive genetic effect for the first trait. t1_v4: the variance of dominant genetic effect for the first trait. t1_v5: the variance of residuals for the first trait. t2_v1: the variance of the environmental random effect located at the 7th column in phenotype file for the second trait. t2_v2: the variance of additive genetic effect for the second trait. t2_v3: the variance of dominant genetic effect for the second trait. t2_v4: the variance of residuals for the second trait. t3_v1: the variance of additive genetic effect for the third trait. t3_v2: the variance of dominant genetic effect for the third trait. t3_v3: the variance of residuals for the third trait.

(2) `--covc-priors`

contains the co-variance components among traits at all genetic random effects, the co-variance components are listed by genetic random effect orderly, and please use comma within a genetic random effect, use space among genetic random effects, the input order can be referred as following:

where each box represents a genetic random effect, the off-diagonals are the co-variances between traits at this genetic random effect, and the diagonals are its variance at different traits. The number of box is equal to the number of XRM plus 1, where "1" is the residual. Users need to assign values at the lower triangle of each box to HIBLUP:

cov1: the co-variance between the first and second trait at "demo.GA". cov2: the co-variance between the first and third trait at "demo.GA". cov3: the co-variance between the second and third trait at "demo.GA". ...

The order of input co-variances should strictly follow the route in the above figure.

### 3.14.Statistical tests of significance for effects

For fixed effects and covariates, HIBLUP can do statistical test automatically, and list the results in file * *.anova*, the p-value of F test at the last column could be used as a diagnosis of which fixed effects or covariates are statistically significant for a trait.

For random effect (genetic or environmental), there is no automatic function of statistical test for significance, users can achieve it manually by the Likelihood Ratio Test (LRT) following the steps below:

(1) run full model with the random effect of interest, get the loglike value L1 (which can be found from the LOG information of HIBLUP at the last iteration of REML).

(2) run reduced model in which the random effect of interest is dropped off, get the loglike value L0.

(3) calculate the p-value by Chi-square test in R:

> stat <- 2 * (L1 - L0) > pvaue <- pchisq(stat, df = 1, lower.tail = FALSE) > print(pvaue)

### 3.15.Reliability and PEV

For single trait model, multiple traits model and solving mixed model equation, HIBLUP can calculate the reliability and prediction error variance (PEV) of all random effects. The mathematical formula for reliability calculation is as follows:

\begin{equation} \begin{aligned} r^{2} &= 1-\frac{PEV}{\sigma^{2}} \end{aligned} \end{equation}

where \sigma^{2} is the estimated variance from REML analysis for the random effect of interest. The prediction error variance PEV can be obtained from the following equation:

\begin{equation} \begin{aligned} PEV &= C_{ii}*\sigma_{e}^{2} \end{aligned} \end{equation}

where \sigma_e^{2} is the estimated residual variance. C_{ii} is the i_{th} diagonal element of the inverse matrix of left hand side (LHS) of mixed model equation.

By default, HIBLUP does not calculate and output those statistics, as the reliability estimation usually takes longer than whole REML process, thus please add flag `--r2`

only if you really want to get the reliability.

Taking single trait model for an example, the script to obtain the reliability is as follows:

./hiblup --single-trait --pheno demo.phe --pheno-pos 8 --dcovar 2,3 #fixed effect --qcovar 4,5 #covariates --rand 7 #non-genetic (environmental) random effect --bfile demo #genetic random effect --r2 #to calculate the reliability --threads 32 --out demo

Then each random effect, including residual, has several additional columns listing its corresponding standard error, PEV and reliability, overview of the file:

ID dam dam_SE dam_PEV dam_R2 GA GA_SE GA_PEV GA_R2 residuals residuals_SE residuals_PEV residuals_R2 IND1001 0.496645 3.75992 14.137 0.255863 6.66127 4.27597 18.2839 0.774767 -1.45194 3.18313 10.1323 0.55248 IND1002 0.496645 3.75992 14.137 0.255863 6.66127 4.27597 18.2839 0.774767 2.04383 3.17906 10.1064 0.553624 IND1003 0.674833 3.53678 12.5088 0.341566 2.08201 4.62091 21.3528 0.736963 -0.104053 4.14567 17.1866 0.24091 IND1004 0.674833 3.53678 12.5088 0.341566 -0.682836 4.70199 22.1087 0.727651 0.9083 4.17824 17.4577 0.228936

### 3.16.Compute SNP effect

HIBLUP now can compute the additive and dominant effects for all SNPs. There are two ways to compute SNP effects, the first way is to add flag `--snp-effect`

in the command when fitting single trait, multiple traits model, or solving mixed model:

# compute additive SNP effect by SSGBLUP of single trait model ./hiblup --single-trait --pheno demo.phe --pheno-pos 8 --bfile demo --pedigree demo.ped --add --snp-effect --thread 32 --out demo

the second way is to derive SNP effect from prior calculated GEBVs:

# compute dominant SNP effect ./hiblup --snp-effect --gebv demo.gebv.d.txt --bfile demo --dom --thread 32 --out demo

The file format of GEBVs can be found at here. The genotype file specified by `--bfile`

is used to construct GRM and compute SNP effect, for the second way, HIBLUP will construct additive or dominant GRM for individuals, to save time, users can provide prior calculated GRM, HIBLUP will use it directly:

# compute additive and dominant SNP effect ./hiblup --snp-effect --gebv demo.gebv.ad.txt --bfile demo --xrm demo.GA,demo.GD --add --dom --thread 32 --out demo

The SNP effects of all SNPs are stored in file * demo.snpeff*.

**NOTE**

If the GEBVs are calculated by using a weight GRM, please remember to provide the same weights of all SNPs to flag `--snp-weight`

.

### 3.17.Genomic mating

Genomic mating is a technology unionizing all available SNPs to predict the expected genetic performance of progeny for all combinations between males and females. To implement genomic mating, genotype of all individuals and additive or dominant effects of all SNPs should be provided. The format of SNP effect file can be found at here. The command to do genomic mating by HIBLUP is as following:

./hiblup --mating --bfile demo --score demo.snpeff --thread 32 --out demo

Mating results are saved in file * demo.mating*, brief view of this file:

sir dam g Ind704 Ind710 -0.0675817 Ind704 Ind711 1.06056 Ind704 Ind712 2.00997 Ind704 Ind713 -0.0879987 Ind704 Ind714 -0.492576

The last column is the expected genetic performance of progeny for different combinations.

**IMPORTANT**

**(1)** The SNP with the same name between * demo.bim* and

*should have the consistent reference (A1) and alternative (A2) allele.*

`demo.snpeff`

**(2)**All the individuals in

*should be clearly marked with gender (located at the 5th column, 1=male; 2=female; other=unknown), any individual with unknown gender will be deleted in analysis.*

`demo.fam`

### 4.Questions

Q1: What are the symbols (e.g. "C", "F", "R", "GR") in model equation of LOG file?

The meanings of the symbols are below:**C:** quantitative covariates**F:** discrete covariates, also is called as fixed effect**R:** environmental random effect**GR:** genetic random effect**PA:** pedigree based additive relationship matrix**PD:** pedigree based dominant relationship matrix**GA:** genome based additive relationship matrix**GD:** genome based dominant relationship matrix**HA:** pedigree and genome based additive relationship matrix**HD:** pedigree and genome based dominant relationship matrix

Q2: How to decrease the memory cost for analysis?

There are two strategies:**1.** assign lower steps to the option `--step`

when genotype is used for analysis.**2.** add flag `--float-prec`

in the command, it can nearly save half of memory cost, but it should be noted that mathematical operation in float precision may lose the accuracy of estimated parameters for specific cases.

Q3: Can options "--extract", "--exclude", "--remove", "--keep" work for all functions in HIBLUP?

Yes, if the analysis involves individuals and SNPs, all above options can work effectively.

Q4: Can HIBLUP estimate genetic correlation for environmental random effect?

Yes, users can firstly construct XRM for this environmental random effect (see Construct relationship matrix), then assign it to flag `--xrm`

when fitting the multiple traits model.

Q5: Can users assign the relationship matrix in text format to HIBLUP directly?

By now, HIBLUP can only accept relationship matrix in binary format (*.bin and *.id), relationship matrix in text format is unacceptable, users can convert it to binary format by the function `--trans-xrm`

(see Format conversion of XRM).

Q6: How to compute the proportion of phenotypic variance explained (PVE) for significant SNPs by HIBLUP?

It's easy to be achieved by HIBLUP. Firstly, please write the id of significant SNPs into a file * snp.txt* as a column without header, then calculate the variance components by following steps:

#step1: construct GRM1 using the significant SNPs ./hiblup --make-xrm --bfile demo --extract snp.txt --out grm1 --threads 32 # grm1.GA.id, grm1.GA.bin will be generated #step2: construct GRM2 using the SNPs excluding the significant SNPs ./hiblup --make-xrm --bfile demo --exclude snp.txt --out grm2 --threads 32 # grm2.GA.id, grm2.GA.bin will be generated #step3: compute the variance components for GRM1 and GRM2 by single trait model ./hiblup --single-trait --pheno demo.phe --pheno-pos X --xrm grm1.GA,grm2.GA --out vc --threads 32 #step4: calculate the PVE for significant SNPs by following equation: pve = V(grm1) / ( V(grm1)+V(grm2)+V(e) )