User Manual of HIBLUP

1.1.What is HIBLUP #

HIBLUP (say "Hi!" to BLUP), which is called “天权”in Chinese, is an user-friendly software that provides estimated genetic value of each individual by maximizing the usage of information from pedigree, genome, and phenotype, as well as all process-related functions, such as construction of relationship matrix, estimation of variance components (heritability) of traits and co-variance components (genetic correlation) between traits using various algorithms, solution of mixed model, computation of SNP effects, and implement of genomic mating. Users can easily achieve simple linear model (LM), best linear unbiased predictors (BLUP), pedigree basedBLUP (PBLUP), genomic BLUP (GBLUP), and single-step GBLUP (SSGBLUP) for single trait model, repeated traits model, and multiple traits model. The functions of HIBLUP will keep on being enriched with more functionalities based on users feedback.

1.2.Where to download HIBLUP #

The latest released executable file of HIBLUP for different platforms can be downloaded from:

https://www.hiblup.com/download.

It is free of charge for academic purposes. For commercial purposes, please contact the development team.

1.3.How to cite HIBLUP #

The manuscript is on its way to submission.

1.4.Contact us #

HIBLUP is developed at Zhao lab of Huazhong Agricultural University.

Design and Maintenance team: 

Lilin Yin#, Haohao Zhang#, Zhenshuang Tang, Dong Yin, Yuhua Fu, Xiaohui Yuan, Xinyun Li*, Xiaolei Liu*, Shuhong Zhao*

Questions, suggestions, and bug reports are welcome and appreciated, we recommend sending your message by filling the form in the following page:

https://www.hiblup.com/contact-us

If it cannot fully express your question or you get a trouble from the above page, please feel free to send us email by:

Lilin Yin (尹立林): ylilin@mail.hzau.edu.cn
HaoHao Zhang (张浩浩): haohaozhang@whut.edu.cn
Xiaolei Liu (刘小磊): xiaoleiliu@mail.hzau.edu.cn

2.1.Phenotype file #

Different with other software, the phenotype file of HIBLUP should include not only the phenotypic records, but also the covariates, fixed and random effects. The first column must be the individual id, header should be included in the file. Brief view of example phenotype file is as following:

id	sex	season	day	weight	loc	dam	tr1	tr2	tr3
Ind1	F	Winter	100	2.9	l3	Ind902	73.9776	21.3252	35.7737
Ind2	M	Summer	100	1.5	l2	Ind902	94.1994	19.1162	54.2453
Ind3	M	Winter	100	1.8	l4	Ind903	88.6698	5.9538	71.0284
Ind4	M	Winter	100	1.9	l1	Ind903	87.0458	-2.8179	37.2703
Ind5	M	Spring	100	2.2	l3	Ind905	68.8077	2.5357	59.5402
Ind6	F	Autumn	100	1.9	l1	Ind905	85.9242	-3.06	48.4822

Any symbol below will be treated as missing when loading the file:

NA  Na  .  -  NaN  NAN  nan  na  N/A  n/a  <NA>

The file should be assigned to flag --pheno.

By default, HIBLUP will take the trait at second column for analysis, users can specify a different column by using --pheno-pos n, for example, --pheno-pos 8 represents using the 8th column for analysis.

NOTE

If there are multiple traits to be used for estimating genetic correlation, please use space as separator, e.g. --pheno-pos 8 9 10.

2.2.Covariates, fixed and environmental random effects #

For covariates, fixed effect and environmental random effect of a trait, please specify its position at phenotype file to flag --qcovar, --dcovar, --rand respectively.

For single trait, please use comma as separator, for example:
--qcovar 4,5: take "day" and "weight" as covariates
--dcovar 2,3: take "sex" and "season" as fixed effects
--rand 6: take "loc" as random effect

For multiple traits, if there are some environmental factors needed to be fitted as a model effect, all traits should be specified at this model effect, and please use comma as separator within a trait, and use space among traits, use 0 for a trait if those factors are not considered for it, take fixed effect for an example:

./hiblup ... --dcovar 2,3 0 2,3,6 ...

which means there are 3 traits in analysis, the first trait has two fixed effects: "sex" and "season"; the second trait has no fixed effect; the third trait has three fixed effects, which are "sex", "season" and "loc".

IMPORTANT

--qcovar is used to specify the environmental effects as quantitative covariates, the records for the effects should be in digits, any character is not allowed.

--dcovar is used to specify the environmental effects as discrete covariates, the records for the effects could be in either digits or characters, both are acceptable.

--rand is used to specify the environmental random effects, which come from the recorded environmental factors, the genetic random effects are from either pedigree or genomic information, if pedigree or genotype file is provided, HIBLUP will fit the genetic random effect in the model automatically, users DO NOT need to add the column of individuals id as random effect here.

2.3.Pedigree file #

Pedigree file is used for deriving numerator relationship matrix and inbreeding coefficient. Three columns in file are required orderly. The first column is the individual id, the second and third columns are its father and mother id respectively. As the same with phenotype file, any symbol of the list above appears in the pedigree file will be treated as missing. An example file for pedigree is as following:

id	father	mother
Ind2	NA	NA
Ind5	Ind1	NA
Ind11	NA	Ind2
Ind17	Ind11	Ind5
Ind22	Ind1	Ind5
Ind45	Ind22	Ind2

The file should be assigned to flag --pedigree.

NOTE

The order of individual id in the first column can be ranked by birth date, but it is not a compulsive requirement, as HIBLUP has the automatic function of sorting and rearrangement on the input pedigree.

2.4.Genotype file #

HIBLUP currently only accept the genotype in PLINK binary format, e.g. demo.fam, demo.bim and demo.bed, please see more details at PLINK user manual. Users can convert any other format of genotype (e.g. VCF, HapMap, PED/MAP) to binary format by PLINK2 and TASSEL:

# HapMap to VCF by TASSEL
./run_pipeline.pl -SortGenotypeFilePlugin -inputFile demo.hmp.txt
                  -outputFile demo.sort.hmp.txt -fileType Hapmap
./run_pipeline.pl -fork1 -h demo.sort.hmp.txt -export-exportType VCF -runfork1
# VCF to Binary
./plink2 --vcf demo.vcf --make-bed --out demo
# PED/MAP to Binary
./plink2 --ped demo.ped --map demo.map --make-bed --out demo

Those binary files should be assigned to flag --bfile [prefix].

NOTE

(1) HIBLUP only supports to load one binary file, does not support to load more than one binary files (e.g. whole genome is stored separately by chromosome), please merge them by PLINK prior to using HIBLUP, a rough guidance can be found here.
(2) HIBLUP will code the genotype A1A1 as 2, A1A2 as 1, and A2A2 as 0, where A1 is the first allele of each marker in *.bim file, therefore the estimated effect size is on A1 allele, users should pay attention to it when a process involves marker effect.

IMPORTANT

HIBLUP has no function of imputation, any missing genotype in binary file will be treated as heterozygote, that means missings will be forced to be coded as 1 in analysis, we suggest users to implement imputation by other software prior to using HIBLUP if missings exist in genotype.

2.5.Relationship matrix file #

HIBLUP currently only accept Relationship Matrix (XRM) in binary format, which has the most efficiency of file writing and data loading, and costs the least usage of disk space, e.g. demo.GA.id and demo.GA.bin.

The file *.id has one column listing the id of all individuals, the file *.bin store the lower triangle elements of the XRM for dense matrix, and the file *.sparse.bin store the row index, column index, and triangle elements of the XRM for sparse matrix.

By default, those files is generated by --make-xrm in HIBLUP (see Construct relationship matrix).
Additionally, we have developed a file format converter for making binary relationship matrix, please refer to corresponding section (see Format conversion of XRM). To use the binary XRM by HIBLUP, just assign its prefix to the option --xrm [prefix]:

./hiblup ... --xrm demo.GA ...

If there are more than one XRMs, please use comma as separator between prefix:

./hiblup ... --xrm demo.GA,demo.GD ...

IMPORTANT

HIBLUP has the function of computing the inverse of XRM, which maybe useful for other software (e.g. BLUPF90, DMU, ASReml, ...), but it's useless for HIBLUP, as HIBLUP uses XRM directly to estimate variance components or solve mixed model equation, therefore please do not assign an inverse of relationship matrix to HIBLUP, unless you know what you are doing.

2.6.Genomic estimated breeding value file #

Genomic estimated breeding value (GEBV) is used for computing additive and dominant SNP effect, the first column should be the sample id, and the rest columns are the additive or dominant GEBVs, an example with three individuals is as following:

id    add
Ind1  -1.1345
Ind2  0.3245
Ind3  0.0234

The file should be assigned to flag --gebv.

NOTE

If --add is specified, additive breeding value should be provided, so does the flag --dom. If both --add and --dom are specified, the breeding value for additive and dominance should be provided in file, the column of additive breeding value should be at the front of dominant breeding value.

2.7.SNP weighting file #

SNPs weighting file is used for constructing weighted XRM or computing SNP effect, the first column should be the SNP id, and the rest columns are the additive or dominant weights of SNPs, please ensure that all weights should be positive values, an example with three SNPs is as following:

SNPid  weight
SNP1   42
SNP2   0.3
SNP3   100

The file should be assigned to flag --snp-weight.

NOTE

If --add is specified, additive weights should be provided, so does the flag --dom. If both --add and --dom are specified, the weights of all SNPs for additive and dominance should be provided in file, the column of additive weights should be at the front of dominant weights.

2.8.SNP effect file #

SNPs effect file is used for genomic mating, at least 5 columns are required, an example is as following:

SNPid  a1  a2  freq     add       dom
SNP1   A   C   0.1243  0.30134   0.00000
SNP2   G   T   0.0345  -0.06324  0.00124
SNP3   A   G   0.3635  0.15425   -0.00913

The file should be assigned to flag --score.

NOTE

As mentioned above, the provided type of SNP effect should be consistent with the flag --add and --dom, and the column of additive SNP effect should be at the front of dominant SNP effect if both of them are provided.

2.9.Sample and SNP filter file #

HIBLUP can filter the sample and SNPs in analysis for all available functions, the file is easy to prepare, just list the name of individual or the id of SNP in file with no header.

Example file id.txt:

Ind16
Ind56
Ind110

if users are supposed to remove those individuals in analysis, just assign it to flag --remove. Contrarily, if it is assigned to flag --keep, only those individuals in file will be used for analysis.

Example file snp.txt:

SNP23
SNP198
SNP432

if users are supposed to remove those SNPs in analysis, just assign it to flag --exclude. Contrarily, if it is assigned to flag --extract, only those SNPs in file will be used for analysis.

3.1.Multi-thread computing #

We have achieved most of the procedures in HIBLUP being able to run on multiple threads. Users can easily change the number of threads used for analysis by following:

./hiblup --threads 32

If --threads is not specified, HIBLUP will try to get the maximum thread number from standard OpenMP environment variable, and use all the available resource to accelerate the computational efficiency.

3.2.Allele frequency and Genotype frequency #

HIBLUP can calculate allele frequency and genotype frequency for markers.

Allele frequency

If the flag --allele-freq is detected in the input command, HIBLUP will calculate A1 allele frequency for all SNPs in genotype:

./hiblup --bfile demo --allele-freq --out demo

It will generate a file named demo.afreq, brief view of this file:

SNP     a1  a2  freq_a1
snp1    A   C   0.2775
snp2    G   T   0.4475
snp3    A   G   0.1918

Then users can filter SNPs with a specific condition, for example, pick up SNPs at the condition of 0.05 < freq <= 0.1:

awk '$4 > 0.05 && $4 <= 0.1 {print $1}' demo.afreq > freq_filter_snp.txt

Genotype frequency

If the flag --geno-freq is detected in the input command, HIBLUP will calculate the genotype frequency of A1A1 and A2A2 for all SNPs in genotype:

./hiblup --bfile demo --geno-freq --out demo

It will generate a file named demo.gfreq, brief view of this file:

SNP     a1  a2  freq_a1a1  freq_a2a2
snp1    A   C    0.3456     0.1244
snp2    G   T    0.2358     0.2345
snp3    A   G    0.1395     0.4643

3.3.Homozygosity and Heterozygosity #

The homozygosity and heterozygosity are defined as the proportion of homozygous and heterozygous of an individual, it can be easily obtained from HIBLUP by using the flag --homo and --hete, respectively.

Homozygosity

./hiblup --bfile demo --homo --out demo

A file named demo.homo will be generated as following:

id         a1a1      a2a2
IND0701  0.42255   0.201235
IND0702  0.415291  0.204923
IND0703  0.426455  0.199282

Heterozygosity

./hiblup --bfile demo --hete --out demo

A file named demo.hete will be generated as following:

id       a1a2
IND0701  0.376216
IND0702  0.379786
IND0703  0.374263

3.4.Construct relationship matrix (XRM) #

HIBLUP can construct XRM from pedigree and genomic information, the output binary file has two types of storage format. For dense matrix, the values at lower triangle of relationship matrix are stored one by one, one element take over 4 bytes, and for sparse matrix, only non-zero values (k) at lower triangle of relationship matrix and its row (i) and column index (j) are stored in order of "ijk" format in binary file, that means one element will take over 12 bytes.

Pedigree based relationship matrix (PRM)

# construct A and its inverse simultaneously
./hiblup --make-xrm 
         --pedigree demo.ped
         --add --add-inv 
         --thread 32 
         --out demo 

Files named demo.PA.bin, demo.PA.id, demo.PAinv.sparse.bin, demo.PAinv.sparse.id will be generated in the work directory.

Sometimes when computing the inverse of XRM, it may encounter an error saying:

 Error: matrix is not invertible,…

that is because that the matrix is not positive-defined, users can specify a small value to --ridge-value by using ridge regression method to address this problem.

Genome based relationship matrix (GRM)

The relationship matrix could also be derived from genome-wide markers. The mathematical formula of constructing a genomic relationship matrix (GRM) is as follows:

\begin{equation}
\begin{aligned}
G &= \frac{ZZ^\top}{tr(ZZ^\top)/n}
\end{aligned}
\end{equation}

where Z is the numeric coded genotype matrix with a dimension of n by m, n is the number of individuals and m is the number of markers, tr is the the trace of the matrix. Here in HIBLUP, total 4 methods have been achieved to obtain matrix Z for additive and dominant genetic effect, the way of coding the genotype for different methods can be referred to the following table:

Table 1. The way of coding genotype for different methods

where p_{A} and p_{a} are the allele frequency of A and a, respectively. p_{AA}, p_{Aa} and p_{aa} are the genotype frequency of AA, Aa, aa, respectively.

Users can easily switch to any of method above by specifying the flag --code-method with corresponding value, e.g. --code-method 2. The reference for different methods can be found at the following papers:

General GRM

The script to construct additive and dominant genomic relationship matrix:

# construct A and D simultaneously
./hiblup --make-xrm --code-method 2
         --bfile demo 
         --add --dom 
         --step 10000 
         --thread 32 
         --out demo

Files named demo.GA.bin, demo.GA.id, demo.GD.bin, demo.GD.id will be generated in the work directory, and HIBLUP will directly compute the inverse of the GRM and output it if --add-inv or --dom-inv is detected in the command.

Weighted GRM

HIBLUP supports to compute the weighted GRM by the following script:

./hiblup --make-xrm --code-method 1
         --bfile demo 
         --add --dom 
         --snp-weight snp.wt.txt 
         --thread 32 
         --out demo 

The file format of SNP weights please refer to here.
Users can assign a file listing the names of SNPs to the flag --extract or --exclude to construct GRM by a subset of SNPs.

Pedigree and Genome based relationship matrix (HRM)

When pedigree and genome information are provided simultaneously, HIBLUP will automatically construct H matrix following the equation below (Aguilar 2010, Christensen 2010):

\begin{equation}
H =
\begin{bmatrix}
A_{11} + A_{12} A_{22}^{-1} (G_w - A_{22}) A_{22}^{-1} A_{21} & A_{12}A_{22}^{-1}G_w \\
G_wA_{22}^{-1}A_{21} & G_w \\
\end{bmatrix}
\end{equation}

where A_{11} and A_{22} are the submatrices of A matrix for non-genotyped and genotyped individuals, respectively; A_{12} and A_{21} are the submatrices of A matrix, which represent the relationships between genotyped and nongenotyped individuals, G_w = (1 - alpha) * G_{adj} + alpha * A_{22}, G_{adj} is adjusted from the original genomic relastionship matrix G by simple linear regression:

\begin{equation}
\begin{aligned}
Avg(diag(G)) * a + b = Avg(diag(A_{22})) \\
Avg(offdiag(G)) * a + b = Avg(offdiag(A_{22}))
\end{aligned}
\end{equation}

Then we have G_{adj} = G * a + b.

./hiblup --make-xrm 
         --pedigree demo.ped 
         --bfile demo 
         --alpha 0.05 
         --add --dom 
         --out demo

The above command will generate additive and dominant relationship matrix simultaneously, the filenames are: demo.HA.bin, demo.HA.id, demo.HD.bin, demo.HD.id.
Users can also specify --snp-weight to construct a weighted GRM for constructing HRM.
Also, HIBLUP can construct the inverse of H matrix by following script:

./hiblup --make-xrm
         --pedigree demo.ped 
         --bfile demo 
         --alpha 0.05 
         --add-inv --dom-inv 
         --out demo

the generated filenames are: demo.HAinv.sparse.bin, demo.HAinv.sparse.id, demo.HDinv.sparse.bin, demo.HD.sparse.id.

XRM for environmental random effect

Except for constructing XRM for genetic random effect, HIBLUP can also construct XRM for environmental random effect, phenotype file should be provided for this case, the command is as following:

./hiblup --make-xrm
         --pheno demo.phe
         --rand 6,7
         --out demo

the generated filenames are: demo.loc.sparse.bin, demo.lco.sparse.id, demo.dam.sparse.bin, demo.dam.sparse.id.

NOTE

(1) All above relationship matrix, except for those whose prefix contain character ".sparse.", can be used by GCTA, LDAK, but additional processes are needed, for example:

awk '{print $1,$1}' demo.GA.id > gcta.grm.id
cp demo.GA.bin gcta.grm.bin
# then use it by GCTA
./gcta64 ... --grm gcta ...

(2) The binary format relationship matrix calculated by GCTA or LDAK can be used directly by HIBLUP without any format transformation, just assign its prefix to flag --xrm.

3.5.Format conversion of XRM #

HIBLUP only accept XRM in binary file format, as some software (e.g. DMU, BLUPF90, ASReml) can not recognize it, moreover, it would be hard for programming beginners to prepare those binary files, we therefore provide a format converter for easily linking HIBLUP with other software.

Binary to text

The following command will output the XRM into a square matrix in text format with names of demo.txt and demo.id.txt:

./hiblup --trans-xrm 
         --xrm demo.GA 
         --out demo
#        --triangle

An example for demo.txt with three individuals is as follows:

1.55755	1.04847	0.64622
1.04847	1.58144	0.66971
0.64622	0.66971	1.56794

If the flag --triangle is detected, HIBLUP will only write the lower triangle of the matrix into text file with format of <i j k>, where i, j, k are the row, column index, and its corresponding value respectively. For sparse matrix, HIBLUP will ignore zero elements when outputting "ijk" format. An example file with the same three individuals above is shown below:

1	1	1.55755
2	1	1.04847
2	2	1.58144
3	1	0.646224
3	2	0.669715
3	3	1.56794

Text to binary

If users have their own prior calculated relationship matrix in text format, HIBLUP can transform it into binary file, which can be used by HIBLUP directly, the script is as following:

./hiblup --trans-xrm 
         --xrm-txt demo.txt
         --xrm-id demo.id.txt
         --out demo

Above command requires demo.txt in square format and demo.id.txt. If the file demo.txt is in "ijk" format, please remember to add flag --triangle.

NOTE

HIBLUP can also convert the relationship matrix calculated by GCTA, LDAK to text format:

./hiblup --trans-xrm --xrm gcta.grm --out hiblup

3.6.Calculate inbreeding and relationship coefficient #

Inbreeding and relationship coefficients are derived from either pedigree or genotype information, HIBLUP firstly constructs additive numerator matrix, then derive and output the coefficients in file. The flag for inbreeding coefficient and relationship coefficient is --ibc and --rc, respectively.

Inbreeding coefficients

# derived from pedigree
./hiblup --ibc --pedigree demo.ped --thread 32 --out demo
# or derived from prior calculated PA matrix
./hiblup --ibc --xrm demo.PA --out demo

# derived from genotype
./hiblup --ibc --bfile demo --thread 32 --out demo
# or derived from prior calculated GA matrix
./hiblup --ibc --xrm demo.GA --out demo

The inbreeding coefficients are saved in file demo.ibc:

id        ibc
IND0702  0.00290048
IND0703  0.00262272
IND0704  0.00262272

Relationship coefficients

# derived from pedigree
./hiblup --rc --pedigree demo.ped --thread 32 --out demo
# or derived from prior calculated PA matrix
./hiblup --rc --xrm demo.PA --out demo

# derived from genotype
./hiblup --rc --bfile demo --thread 32 --out demo
# or derived from prior calculated GA matrix
./hiblup --rc --xrm demo.GA --out demo

The relationship coefficients are saved in file demo.rc:

id1     id2     rc
IND0701 IND0701 1
IND0701 IND0702 0.472652
IND0701 IND0703 0.0721178
IND0701 IND0704 0.0721178

NOTE

1. The inbreeding coefficients and relationship coefficients can be calculated at a time, like:

./hiblup --ibc --rc --pedigree demo.ped --out demo
./hiblup --ibc --rc --bfile demo --out demo

2. The genomic inbreeding coefficients and relationship coefficients can be calculated by different methods using the flag --code-method, more details can be seen here.

./hiblup --ibc --rc --bfile demo --code-method 1 --out demo

3.7.Principal component analysis #

HIBLUP has the function of calculating PCA for a population, the genotype or the relationship matrix should be provided.

Calculating PCs from genotype

./hiblup --pca --bfile demo --out demo

Note that user can add flag --step n to control the memory cost, and specify flag --npc n to output several number of PCs instead of all.

Calculating PCs from XRM

./hiblup --pca --xrm demo.GA --npc 5 --out demo

There are two files demo.pc and demo.pcp will be generated.

The file demo.pc contains the calculated PCs, overview of the file:

id       PC1     PC2     PC3     PC4     PC5
IND0701  0.0413326       -0.00664808     -0.00795809     0.0218009       -0.0156735
IND0702  0.00856316      0.0378125       0.000503819     0.0205049       0.00972465
IND0703  -0.00793242     0.0458023       -0.00438572     -0.0173733      -0.0254701
IND0704  -0.00793248     0.0458023       -0.00438575     -0.0173733      -0.0254701

The file demo.pcp contains the Proportion and Cumulative Proportion of variance of PCs, overview of the file:

Components      PC1     PC2     PC3     PC4     PC5
Standard deviation      4.23976    4.10211    3.76024    3.66821    3.60725
Proportion of Variance  0.0224695  0.0210342  0.0176743  0.0168197  0.0162653
Cumulative Proportion   0.0224695  0.0435037  0.0611779  0.0779976  0.094263

3.8.LM, BLUP, PBLUP, GBLUP(rrBLUP), SSGBLUP #

Users can easily implement simple linear model (LM), BLUP, PBLUP, GBLUP (rrBLUP) and SSGBLUP, HIBLUP can automatically switch to one of them according to the data type provided by users. The following table details the required or non-required flags for different model in HIBLUP:

Model--qcovar/--dcovcar--rand--pedigree--bfile
LM×××
BLUPOOO
PBLUPOO×
GBLUPOO×
SSGBLUPOO
Table1: Model switch of HIBLUP by the specified flags
√: required; ×: not required; O: optional.

rrBLUP model is theoretically the same with GBLUP model, users can just add a flag --snp-effect on the base of GBLUP model to obtain the SNP effect for rrBLUP model.

IMPORTANT

It should be noted that, the PBLUP model can be pretty efficient owing to the high performance of FSPAK technology on the very sparse 'LHS' of mixed model equation. However, HIBLUP is designed for genome-based genetic evaluation, in which the 'LHS' is no longer sparse, the 'V' matrix based strategy that is achieved in HIBLUP would be more competent for this case. Although users can fit PBLUP model by using HIBLUP, it may be slower and memory-costing compared with those of the software, e.g., DMU, BLUPF90. Therefore, we recommend using DMU or BLUPF90 when fitting a model only involves pedigree information.

3.9.Fitting interaction for effects #

Environmental interaction (E by E)

Interaction between recorded environments can be fitted as fixed or random effects, please use colon as separator.

If fitted as fixed effects:

./hiblup ... --dcovar 2,2:3:6

Above command represents that the fixed effects of this trait are "sex", the interaction among "sex", "season", "loc".

If fitted as random effects:

./hiblup ... --rand 6,6:7,7

Similarly, the random effect for this trait are "loc", the interaction between "loc" and "dam", and "dam".

For multiple trait model, use space for separator, take 3 traits for an example:

./hiblup ... --dcovar 2:3 0 2,3,2:3 --rand 6 7,6:7 0

which means the fixed effect for the first trait is the interaction between "sex", "season", no fixed effect for the second trait, the fixed effects of the third trait are "sex", "season", and the interaction between "sex", "season"; the environmental random effects for the first trait is "loc", the second trait are "dam"and the interaction between "loc" and "dam", the third trait has no environmental random effect.

Genetic interaction (G by G)

HIBLUP can fit interaction for genetic random effect, for example: additive by additive, additive (A) by dominant (D), dominant by dominant, and so on, the number of genetic random effect at an interaction term is not limited. The interaction of G by G for n genetic random effects in HIBLUP can be mathematically formulated as following:

\begin{equation}
\begin{aligned}
G_{inter} &= \frac{G_1 \bigodot G_2 \bigodot ... \bigodot G_n}{tr(G_1 \bigodot G_2 \bigodot ... \bigodot G_n) / N}
\end{aligned}
\end{equation}

where N is the number of individuals, \bigodot is the Hadamard product (i.e. element-wise multiplica-tion) of all matrix, and tr() is the trace of the matrix.

The only way to fit G by G is to appoint the prefix of the random effects to flag --xrm, please use colon as separator within an interaction term:

# fit A by D only
./hiblup ... --xrm demo.GA:demo.GD
#fit A, D, and A by D
./hiblup ... --xrm demo.GA,demo.GD,demo.GA:demo.GD
#fit A by A, A by D and D by D simultaneously
./hiblup ... --xrm demo.GA:demo.GA,demo.GA:demo.GD,demo.GD:demo.GD

Genetic-Environmental interaction (G by E)

Genetic-Environmental interaction will be included in the model as random effect, please assign a digital value representing its position at phenotype file for environment factors, and assign prefix of the genetic effects to the flag "--rand-gxe".

For single trait model, please use colon as separator within an interaction term, and use comma as separator for multiple interaction terms, for example:

./hiblup --single-trait ... --rand-gxe 6:demo.GA,6:demo.GD
#interaction among multiple effects
./hiblup --single-trait ... --rand-gxe 6:demo.GA:demo.GD
./hiblup --single-trait ... --rand-gxe 6:7:GA:demo

For multiple traits model, please use space among traits, use 0 if there is no G by E interaction to be considered for a trait, take 3 traits for an example:

./hiblup --multi-traits ... --rand-gxe 6:demo.GA 6:demo.GA,7:demo.GA 0

Above command means that the interaction for the first trait is "loc" by additive genetic effect; the interactions for the second trait are "loc" by additive genetic effect and "dam" by additive genetic effect; no interactions are considered for the third trait.

3.10.Single trait model #

Single trait model aims to estimate variance components and its heritability of all random effects for a trait. Example command of running SSGLBUP for single trait model:

./hiblup --single-trait
         --pheno demo.phe
         --pheno-pos 8
         --dcovar 2,3       #fixed effect
         --qcovar 4,5       #covariates
         --rand 6,7         #non-genetic (environmental) random effect
         --pedigree demo.ped    #genetic random effect
         --bfile demo           #genetic random effect
         --add --dom 
         --threads 32 
         --out demo 

There are 5 types of algorithms for variance components estimation:
"AI", "EM", "HE", "EMAI", "HI", where "AI" is AIREML,"EM" is EMREML, and "HE" represents HE regression, "HI" is "HE+AI", which uses the outcome of HE regression as the start values of AIREML, users can choose one of them by flag --vc-method, and change the maximum iteration number for AIREML and EMREML by flag --ai-maxit and --em-maxit respectively. For "HE" regression, if covariates for fixed effects are provided, HIBLUP will first regress the phenotype on these, then perform HE Regression using the residuals, the standard error(SE) of HE is computed by Jackknife resampling strategy at 100 repeats.

However, above command maybe time-consuming when fitting trait by trait, as it repeatedly constructs XRM for each trait. We can construct XRM only once at the beginning, then assign it to --xrm when fitting multiple traits, which can give the same results, but would be much more efficient, the command can be referred as following:

# Step 1: construct XRMs
./hiblup --make-xrm
         --pedigree demo.ped
         --bfile demo
         --add --dom
         --out demo

# Step 2: link XRMs to fit model
./hiblup --single-trait 
         --pheno demo.phe 
         --pheno-pos 8 
         --dcovar 2,3 
         --qcovar 4,5 
         --rand 6,7 
         --xrm demo.HA,demo.HD 
         --add --dom 
         --threads 32 
         --out demo 

Several files will be generated in the work directory:
demo.vars: the estimated variance and heritability of all random effects:

	Var	Var_SE	h2	h2_SE
loc	12.2127	8.9109	0.1035	0.0682
dam	10.6268	4.8605	0.0901	0.041
HA	59.2007	11.7641	0.5019	0.0807
HD	28.9946	8.9294	0.2458	0.083
e	6.9232	1.6957	0.0587	0.0161

demo.beta: the estimated coefficients of all covariates and fixed effects:

Levels	Estimation	SE
mu	32.5285	3.4758
sex_F	6.2933	1.7593
sex_M	26.2353	1.7634
season_Autumn	18.0905	0.9753
season_Spring	-1.9168	0.9904
season_Summer	7.8471	1.0066
season_Winter	8.5078	0.9944
day	0.1547	0.0574
bornweight	1.5703	0.4614

demo.anova: analysis of variance table of all covariates and fixed effects:

Factors	Df	SumSq	MeanSq	F	Pr(>F)
sex	1	49710.3845	49710.3845	7180.311	<2e-16
season	3	24515.0941	8171.698	1180.344	<2e-16
day	1	925.3352	925.3352	133.658	<2e-16
bornweight	1	437.3432	437.3432	63.171	1.30E-14
e	493	3413.1141	6.9232		

demo.rand: the estimated environmental random effects, genetic random effects (the column named "PA", "GA", or "HA" is the list of breeding values for phenotypic and non-phenotypic individuals) and residuals.

Users can specify the start values for all unknown variance to --vc-priors for fast convergence, the way of how to input the variance components can be found at here.

3.11.Repeat records of single trait model #

Running repeat records model is nearly the same with fitting single trait model, the only difference is that repeat records model can fit the permanent environment effect, which can be achieved by assigning individual id at the first column of phenotype file to environmental random effect, take GBLUP model for an example:

./hiblup --single-trait 
         --pheno demo.repeat.phe
         --pheno-pos 2
         --rand 1
         --bfile demo
         --threads 32
         --out demo

3.12.Multiple traits model #

Multiple traits model is used for estimating genetic correlation among traits, there is no limitation of the number of traits used in analysis for HIBLUP, moreover, users can appoint different covariates, fixed and random effect for different traits. PBLUP, GBLUP, SSGBLUP are all available for multiple traits model, HIBLUP can automatically switch to one of them according to the provided file.
Taking GBLUP for an example, the multiple model can be achieved by following command:

./hiblup --multi-trait
         --pheno demo.phe
         --pheno-pos 8 9 10
         --qcovar 4,5 5 4
         --dcovar 2,3 0 2
         --rand 6,7 7 0
         --xrm demo.GA,demo.GD  # same with --bfile demo --add --dom
         --vc-method AI
         --ai-maxit 30
         --threads 32
         --out demo

The way of how to set covariates (--qcovar), fixed (--dcovar) and random (--rand) effect for multiple traits, please refer to here.

Several files will be generated:

demo.vars: the variance, heritability and SE of all random effects for all traits:

Item	Var	Var_SE	h2	h2_SE
tr1_loc	13.73854	9.9309	0.11739	0.07538
tr1_dam	2.29288	3.18932	0.01959	0.02731
tr1_demo.GA	61.35173	9.57171	0.52423	0.06964
tr1_demo.GD	29.93077	6.27996	0.25575	0.06022
tr1_e	9.71917	1.67088	0.08305	0.01696
tr2_dam	3.42172	3.80424	0.02971	0.03308
tr2_demo.GA	63.59	10.52058	0.55219	0.06017
tr2_demo.GD	38.14332	7.27491	0.33122	0.06696
tr2_e	10.00395	1.63578	0.08687	0.01578
tr3_demo.GA	21.03906	6.94184	0.2203	0.0668
tr3_demo.GD	15.25593	10.56861	0.15974	0.10919
tr3_e	59.20689	9.04084	0.61996	0.09801

demo.covars: the co-variance, correlation and SE of all traits for all genetic random effects:

Item	COVar	COVar_SE	r	r_SE
tr1:tr2_demo.GA	46.6899	8.56852	0.747524	0.171364
tr1:tr3_demo.GA	16.3376	5.96333	0.454749	0.185599
tr2:tr3_demo.GA	12.3489	6.21401	0.337612	0.157671
tr1:tr2_demo.GD	26.9273	4.65452	0.796922	0
tr1:tr3_demo.GD	-2.25039	4.95673	-0.105301	0.23413
tr2:tr3_demo.GD	8.23124	5.21494	0.341194	0.259133
tr1:tr2_e	0.24039	1.17428	0.024379	0.119333
tr1:tr3_e	-1.10713	2.87487	-0.0461537	0.11985
tr2:tr3_e	-4.80127	2.85341	-0.197284	0.128375

demo.*.anova: the variance analysis table of covariates, fixed effects for different traits.
demo.*.beta: the estimated coefficients and SE of covariates, fixed effects for different traits.
demo.*.rand: the estimated environmental random effects, genetic random effects (the column named "PA", "GA", or "HA" is the list of breeding values for phenotypic and non-phenotypic individuals) and residuals. for different traits.

Users can specify the start values for all unknown variance and co-variance to --vc-priors and --covc-priors, the way of how to input the genetic components can be found at here.

NOTE

HIBLUP only estimates co-variance for genetic random effect and residual effect, not for environmental random effect, therefore at least one genetic random effect should be included in the model, which can be from pedigree (--pedigree), genotype (--bfile) or XRMs (--xrm). If users insist on estimating correlation for environmental random effects, just make relationship matrix for those random effects by the flag --make-xrm (see the chapter "XRM for environmental random effect" here), then assign it to the option --xrm.

3.13.Solve mixed model equation #

The objective of solving mixed model is to obtain the estimated breeding values of all individuals by using the known genetic parameters. HIBLUP automatically switches to single trait or multiple traits model by the length of specified flag --pheno-pos. Taking GBLUP model for an example, the command for single trait and multiple traits model is as following:

# (1) additive effect for single trait model
./hiblup --mme
         --pheno demo.phe
         --pheno-pos 8
         --rand 6
         --xrm demo.GA   # can be replaced by --bfile demo --add
         --vc-priors v1,v2,v3
         --pcg
         --threads 32
         --out demo

# (2) additive and dominant effect for multiple traits model
./hiblup --mme
         --pheno demo.phe
         --pheno-pos 8 9 10
         --rand 6,7 7 0
         --bfile demo   # can be replaced by --xrm demo.GA,demo.GD
         --add --dom
         --vc-priors t1_v1,t1_v2,t1_v3,t1_v4,t1_v5 t2_v1,
                     t2_v2,t2_v3,t2_v4 t3_v1,t1_v2,t1_v3
         --covc-priors cov1,cov2,cov3 cov4,cov5,cov6 cov7,cov8,cov9
         --pcg
         --threads 32
         --out demo

By default, HIBLUP directly compute the inverse of V matrix, users can try to add flag --pcg to use Preconditioned Conjugate Gradients algorithm to solve the system of linear equation, which is much more faster in certain situation.

IMPORTANT

(1) --vc-priors contains the variance components of all random effects, including environmental random effects, genetic random effects, and residuals, the provided variance components should also be in this order.
For single trait model, please use comma as separator, as shown in the first example command:

v1: the variance of the environmental random effect located
    at the 6th column in phenotype file.
v2: the additive genetic variance.
v3: the residual variance.

For multiple traits model, the variance components are listed trait by trait orderly, and please use comma within a trait, use space among traits, as shown in the second example command:

t1_v1: the variance of the environmental random effect located 
       at the 6th column in phenotype file for the first trait.
t1_v2: the variance of the environmental random effect located 
       at the 7th column in phenotype file for the first trait.
t1_v3: the variance of additive genetic effect for the first trait.
t1_v4: the variance of dominant genetic effect for the first trait.
t1_v5: the variance of residuals for the first trait.
t2_v1: the variance of the environmental random effect located 
       at the 7th column in phenotype file for the second trait.
t2_v2: the variance of additive genetic effect for the second trait.
t2_v3: the variance of dominant genetic effect for the second trait.
t2_v4: the variance of residuals for the second trait.
t3_v1: the variance of additive genetic effect for the third trait.
t3_v2: the variance of dominant genetic effect for the third trait.
t3_v3: the variance of residuals for the third trait.

(2) --covc-priors contains the co-variance components among traits at all genetic random effects, the co-variance components are listed by genetic random effect orderly, and please use comma within a genetic random effect, use space among genetic random effects, the input order can be referred as following:

where each box represents a genetic random effect, the off-diagonals are the co-variances between traits at this genetic random effect, and the diagonals are its variance at different traits. The number of box is equal to the number of XRM plus 1, where "1" is the residual. Users need to assign values at the lower triangle of each box to HIBLUP:

cov1: the co-variance between the first and second trait at "demo.GA".
cov2: the co-variance between the first and third trait at "demo.GA".
cov3: the co-variance between the second and third trait at "demo.GA".
...

The order of input co-variances should strictly follow the route in the above figure.

3.14.Statistical tests of significance for effects #

For fixed effects and covariates, HIBLUP can do statistical test automatically, and list the results in file *.anova, the p-value of F test at the last column could be used as a diagnosis of which fixed effects or covariates are statistically significant for a trait.

For random effect (genetic or environmental), there is no automatic function of statistical test for significance, users can achieve it manually by the Likelihood Ratio Test (LRT) following the steps below:

(1) run full model with the random effect of interest, get the loglike value L1 (which can be found from the LOG information of HIBLUP at the last iteration of REML).

(2) run reduced model in which the random effect of interest is dropped off, get the loglike value L0.

(3) calculate the p-value by Chi-square test in R:

> stat <- 2 * (L1 - L0)
> pvaue <- pchisq(stat, df = 1, lower.tail = FALSE)
> print(pvaue)

3.15.Reliability and PEV #

For single trait model, multiple traits model and solving mixed model equation, HIBLUP can calculate the reliability and prediction error variance (PEV) of all random effects. The mathematical formula for reliability calculation is as follows:

\begin{equation}
\begin{aligned}
r^{2} &= 1-\frac{PEV}{\sigma^{2}}
\end{aligned}
\end{equation}

where \sigma^{2} is the estimated variance from REML analysis for the random effect of interest. The prediction error variance PEV can be obtained from the following equation:

\begin{equation}
\begin{aligned}
PEV &= C_{ii}*\sigma_{e}^{2}
\end{aligned}
\end{equation}

where \sigma_e^{2} is the estimated residual variance. C_{ii} is the i_{th} diagonal element of the inverse matrix of left hand side (LHS) of mixed model equation.

By default, HIBLUP does not calculate and output those statistics, as the reliability estimation usually takes longer than whole REML process, thus please add flag --r2 only if you really want to get the reliability.

Taking single trait model for an example, the script to obtain the reliability is as follows:

./hiblup --single-trait
         --pheno demo.phe
         --pheno-pos 8
         --dcovar 2,3       #fixed effect
         --qcovar 4,5       #covariates
         --rand 7           #non-genetic (environmental) random effect
         --bfile demo       #genetic random effect
         --r2               #to calculate the reliability
         --threads 32
         --out demo 

Then each random effect, including residual, has several additional columns listing its corresponding standard error, PEV and reliability, overview of the file:

ID      dam       dam_SE   dam_PEV  dam_R2    GA        GA_SE    GA_PEV   GA_R2     residuals  residuals_SE residuals_PEV residuals_R2
IND1001 0.496645  3.75992  14.137   0.255863  6.66127   4.27597  18.2839  0.774767  -1.45194   3.18313      10.1323       0.55248
IND1002 0.496645  3.75992  14.137   0.255863  6.66127   4.27597  18.2839  0.774767  2.04383    3.17906      10.1064       0.553624
IND1003 0.674833  3.53678  12.5088  0.341566  2.08201   4.62091  21.3528  0.736963  -0.104053  4.14567      17.1866       0.24091
IND1004 0.674833  3.53678  12.5088  0.341566  -0.682836 4.70199  22.1087  0.727651  0.9083     4.17824      17.4577       0.228936

3.16.Compute SNP effect #

HIBLUP now can compute the additive and dominant effects for all SNPs. There are two ways to compute SNP effects, the first way is to add flag --snp-effect in the command when fitting single trait, multiple traits model, or solving mixed model:

# compute additive SNP effect by SSGBLUP of single trait model
./hiblup --single-trait
         --pheno demo.phe
         --pheno-pos 8
         --bfile demo
         --pedigree demo.ped
         --add
         --snp-effect
         --thread 32
         --out demo

the second way is to derive SNP effect from prior calculated GEBVs:

# compute dominant SNP effect
./hiblup --snp-effect
         --gebv demo.gebv.d.txt
         --bfile demo
         --dom
         --thread 32
         --out demo

The file format of GEBVs can be found at here. The genotype file specified by --bfile is used to construct GRM and compute SNP effect, for the second way, HIBLUP will construct additive or dominant GRM for individuals, to save time, users can provide prior calculated GRM, HIBLUP will use it directly:

# compute additive and dominant SNP effect
./hiblup --snp-effect
         --gebv demo.gebv.ad.txt
         --bfile demo
         --xrm demo.GA,demo.GD
         --add --dom
         --thread 32
         --out demo

The SNP effects of all SNPs are stored in file demo.snpeff.

NOTE

If the GEBVs are calculated by using a weight GRM, please remember to provide the same weights of all SNPs to flag --snp-weight.

3.17.Genomic mating #

Genomic mating is a technology unionizing all available SNPs to predict the expected genetic performance of progeny for all combinations between males and females. To implement genomic mating, genotype of all individuals and additive or dominant effects of all SNPs should be provided. The format of SNP effect file can be found at here. The command to do genomic mating by HIBLUP is as following:

./hiblup --mating
         --bfile demo
         --score demo.snpeff
         --thread 32
         --out demo

Mating results are saved in file demo.mating, brief view of this file:

sir 	 dam 	 g
Ind704 	 Ind710 	 -0.0675817
Ind704 	 Ind711 	 1.06056
Ind704 	 Ind712 	 2.00997
Ind704 	 Ind713 	 -0.0879987
Ind704 	 Ind714 	-0.492576

The last column is the expected genetic performance of progeny for different combinations.

IMPORTANT

(1) The SNP with the same name between demo.bim and demo.snpeff should have the consistent reference (A1) and alternative (A2) allele.
(2) All the individuals in demo.fam should be clearly marked with gender (located at the 5th column, 1=male; 2=female; other=unknown), any individual with unknown gender will be deleted in analysis.

4.Questions #

Q1: What are the symbols (e.g. "C", "F", "R", "GR") in model equation of LOG file?

The meanings of the symbols are below:
C: quantitative covariates
F: discrete covariates, also is called as fixed effect
R: environmental random effect
GR: genetic random effect
PA: pedigree based additive relationship matrix
PD: pedigree based dominant relationship matrix
GA: genome based additive relationship matrix
GD: genome based dominant relationship matrix
HA: pedigree and genome based additive relationship matrix
HD: pedigree and genome based dominant relationship matrix

Q2: How to decrease the memory cost for analysis?

There are two strategies:
1. assign lower steps to the option --step when genotype is used for analysis.
2. add flag --float-prec in the command, it can nearly save half of memory cost, but it should be noted that mathematical operation in float precision may lose the accuracy of estimated parameters for specific cases.

Q3: Can options "--extract", "--exclude", "--remove", "--keep" work for all functions in HIBLUP?

Yes, if the analysis involves individuals and SNPs, all above options can work effectively.

Q4: Can HIBLUP estimate genetic correlation for environmental random effect?

Yes, users can firstly construct XRM for this environmental random effect (see Construct relationship matrix), then assign it to flag --xrm when fitting the multiple traits model.

Q5: Can users assign the relationship matrix in text format to HIBLUP directly?

By now, HIBLUP can only accept relationship matrix in binary format (*.bin and *.id), relationship matrix in text format is unacceptable, users can convert it to binary format by the function --trans-xrm (see Format conversion of XRM).

Q6: How to compute the proportion of phenotypic variance explained (PVE) for significant SNPs by HIBLUP?

It's easy to be achieved by HIBLUP. Firstly, please write the id of significant SNPs into a file snp.txt as a column without header, then calculate the variance components by following steps:

#step1: construct GRM1 using the significant SNPs
./hiblup --make-xrm 
         --bfile demo 
         --extract snp.txt 
         --out grm1 
         --threads 32    # grm1.GA.id, grm1.GA.bin will be generated
#step2: construct GRM2 using the SNPs excluding the significant SNPs
./hiblup --make-xrm 
         --bfile demo 
         --exclude snp.txt 
         --out grm2 
         --threads 32    # grm2.GA.id, grm2.GA.bin will be generated
#step3: compute the variance components for GRM1 and GRM2 by single trait model
./hiblup --single-trait 
         --pheno demo.phe 
         --pheno-pos X 
         --xrm grm1.GA,grm2.GA 
         --out vc 
         --threads 32    
#step4: calculate the PVE for significant SNPs by following equation:
         pve = V(grm1) / ( V(grm1)+V(grm2)+V(e) )