Single-step genetic predictions for US crossbred Holstein-Jersey cattle.

Highlights

The number of genotypes of crossbred animals is increasing on US dairy farms.
Including crossbred data in genomic evaluations is possible.
This study analyzed purebred and crossbred data together.
Single-step genomic predictions for crossbred cows were more accurate than predictions based on SNP effects and breed proportions.

The number of crossbred genotypes in the dairy cattle sector has increased, necessitating the inclusion of crossbred animals in genomic evaluations. This study aimed to investigate the feasibility of including crossbred genotypes in multibreed, single-step genomic BLUP (ssGBLUP) evaluations. The Council on Dairy Cattle Breeding provided over 47 million lactation records registered between 2000 and 2021 for purebred Holstein and Jersey cattle and their crosses. A total of 27 million animals were included in the analysis, of which 1.4 million were genotyped. Milk, fat, and protein yields were analyzed in a 3-trait repeatability model using BLUP or ssGBLUP. The two models were validated using prediction bias and accuracy computed for genotyped cows with no records in the truncated dataset and at least one lactation in the complete dataset. The genomic predictions of crossbred genotyped cows were slightly more accurate than those of purebred cows.

Multistep evaluations are still the official route to obtaining genomic predictions for dairy cattle in the United States; they comprise a multibreed best linear unbiased predictor (BLUP) followed by a single-breed estimation of single nucleotide polymorphism (SNP) effects. After estimating single-breed SNP effects, direct genomic values (DGV) are computed for genotyped animals as a sum of SNP effects weighted by the genotype content. Genomic PTA are then calculated as a linear combination of DGV and parent average (PA).
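The two computations described above can be sketched in a few lines. This is a minimal illustration, not the official pipeline: the function names, the toy genotypes and SNP effects, and the weight `w_dgv` are all hypothetical, and genotype centering and scale conversions are omitted for brevity.

```python
import numpy as np

def direct_genomic_values(genotypes, snp_effects):
    """DGV per animal: sum over SNPs of genotype content times SNP effect."""
    genotypes = np.asarray(genotypes, dtype=float)      # animals x SNPs, coded 0/1/2
    snp_effects = np.asarray(snp_effects, dtype=float)  # one effect per SNP
    return genotypes @ snp_effects

def genomic_pta(dgv, parent_average, w_dgv):
    """Linear combination of DGV and PA; w_dgv is a hypothetical weight in [0, 1]."""
    dgv = np.asarray(dgv, dtype=float)
    parent_average = np.asarray(parent_average, dtype=float)
    return w_dgv * dgv + (1.0 - w_dgv) * parent_average

# Toy data (assumed values, for illustration only)
genotypes = np.array([[0, 1, 2],
                      [2, 1, 0]])
snp_effects = np.array([0.5, -0.2, 0.1])

dgv = direct_genomic_values(genotypes, snp_effects)
gpta = genomic_pta(dgv, parent_average=[0.2, 0.4], w_dgv=0.75)
```

In practice the weights on DGV and PA depend on the information available for each animal; a single fixed weight is used here only to keep the sketch short.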

However, routine genomic evaluations for dairy cattle are typically run separately by breed and do not consider crossbreds. Several studies have addressed genetic and genomic predictions for crossbred cattle, for example using breed composition (BC) or breed proportion. In the United States, the number of available genotypes of crossbred cattle increased quickly, reaching 150,000 in 2021. New concepts were proposed in the genomic era: genomic BC (Hulsegge et al., 2013) and breed base representation (BBR; VanRaden and Cooper, 2015). Both methods partition the genotype of a crossbred animal according to the proportion of the genome originating from each breed, and the genomic predictions of the purebreds are usually proportionally combined to evaluate the crossbred animals.
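A proportional combination of this kind might look like the sketch below. The function name and the numbers are hypothetical, and the sketch assumes the purebred predictions have already been placed on a common scale, a step that a real evaluation would have to handle explicitly.

```python
import numpy as np

def blend_by_bbr(bbr, purebred_predictions):
    """Weight each breed-specific prediction by the animal's breed proportion."""
    bbr = np.asarray(bbr, dtype=float)
    preds = np.asarray(purebred_predictions, dtype=float)
    if bbr.shape != preds.shape:
        raise ValueError("need one prediction per breed proportion")
    if np.any(bbr < 0) or not np.isclose(bbr.sum(), 1.0):
        raise ValueError("breed proportions must be non-negative and sum to 1")
    return float(bbr @ preds)

# Hypothetical cow: 75% Holstein, 25% Jersey, with predictions computed
# from Holstein and Jersey SNP effects respectively.
blended = blend_by_bbr([0.75, 0.25], [100.0, 40.0])
```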

Computing SNP effects based on crossbred reference populations in multistep methods could help increase reliabilities, but this option becomes less straightforward when the breed proportion varies within the population and there are no clear boundaries between classes to create proper training sets. A different approach to obtaining genomic predictions for crossbred animals is to include their genotypes in the single-step GBLUP (ssGBLUP) method, which relies on the use of the inverse of a modified relationship matrix (H), combining the numerator relationship matrix (A) and the genomic relationship matrix (G).
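For reference, the inverse of H is commonly written in the single-step literature in the form below. It is given here as background rather than quoted from this article; the subscript 2 denotes genotyped animals, so A22 is the block of A for genotyped animals.

```latex
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix}
```

Only the block for genotyped animals is modified, which is why genotypes of purebreds and crossbreds can be added without changing how ungenotyped animals enter the model.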

Cesarani et al., 2022, conducted a multibreed ssGBLUP evaluation for Ayrshire, Brown Swiss, Guernsey, Holstein, and Jersey cattle. The authors found that reliabilities from the multibreed model were similar to those from single-breed models, which was surprising due to the unbalanced number of genotyped animals within each breed. However, proper modeling of genetic differences among breeds helped to avoid loss of predictive power when using only purebred animals.

As the number of genotyped crossbred animals in US dairy cattle is rapidly increasing, it would make sense to consider them in the evaluation together with their purebred ancestors. Some studies reported increased reliabilities with this approach in dairy cattle using fewer than 10k genotyped individuals in ssGBLUP and fewer than 50k in GBLUP and BayesR. This study aims to expand on the research findings of Cesarani et al., 2022, and include genotypes for crossbreds in a large-scale, joint Holstein-Jersey ssGBLUP evaluation in the United States.

Data used in the official multibreed genomic evaluations for US dairy cattle breeds were provided by the Council on Dairy Cattle Breeding. The analyses considered 305-d milk (MY), fat (FY), and protein (PY) yields for the first 5 lactations recorded from January 1, 2000, to August 2021. All data were preadjusted to have the genetic variance equal across time, breed, and herd and to have the same heritability of 0.20.

Animals were genotyped with 48 different arrays ranging from less than 3k to more than 600k SNPs. Genotypes were imputed, within each breed, to a common set of 79,294 selected SNPs using Findhap v3. Crossbreds were imputed separately, and genotypes for the purebred parents of all breeds were included to improve imputation.

Two evaluation methods were considered: (1) traditional BLUP and (2) ssGBLUP with unknown parent groups (UPG) for A and A22. A total of 16 UPG were considered and defined based on breed (HO or JE), sex, and year of birth. The algorithm for proven and young (APY) was used for ssGBLUP with 45,000 randomly selected animals as the core.
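The APY inverse can be illustrated on a small dense matrix. The sketch below uses the block form usually given for APY, in which only the core block of G is inverted directly and each noncore animal contributes a single diagonal term. The function name and toy sizes are hypothetical; a production implementation would never form the full dense inverse.

```python
import numpy as np

def apy_inverse(G, core_idx):
    """Sparse-structured approximation of G^-1 given a set of core animals."""
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    core = np.asarray(core_idx)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    G_inv = np.zeros((n, n))
    G_inv[np.ix_(core, core)] = Gcc_inv
    if noncore.size == 0:
        return G_inv

    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                  # core x noncore
    # Diagonal of M: g_ii - g_ic Gcc^-1 g_ci for each noncore animal i
    m = np.diag(G)[noncore] - np.einsum("cn,cn->n", Gcn, P)
    m_inv = 1.0 / m

    G_inv[np.ix_(core, core)] += (P * m_inv) @ P.T
    G_inv[np.ix_(core, noncore)] = -P * m_inv
    G_inv[np.ix_(noncore, core)] = (-P * m_inv).T
    G_inv[noncore, noncore] = m_inv                    # diagonal entries only
    return G_inv

# Toy example: 6 animals, 4 of them drawn at random as core (assumed sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 10))
G_toy = X @ X.T / 10 + 0.5 * np.eye(6)
core_toy = rng.choice(6, size=4, replace=False)
G_apy_inv = apy_inverse(G_toy, core_toy)
```

With a single noncore animal the approximation coincides with the exact inverse, which gives a convenient sanity check; with many noncore animals it is an approximation whose quality depends on the size and choice of the core.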

The data were analyzed with a 3-trait repeatability animal model that included herd management, age-parity, inbreeding coefficient, and heterosis as fixed effects; UPG as fixed effect; and herd-sire, animal, and permanent environment as random effects. Heterosis was calculated from the full pedigrees going back as many generations as recorded. For ssGBLUP, all the genotyped animals were used simultaneously in the construction of G, which was blended with 5% of A22 to avoid singularity and include a residual polygenic effect.
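The blending step can be sketched as follows. The article does not state how G was built, so the construction below (centering by observed allele frequencies and scaling by twice the sum of p(1 - p)) is an assumption, as are the function names and toy genotypes; only the 5% weight on A22 comes from the text.

```python
import numpy as np

def genomic_relationship(M):
    """G from 0/1/2 genotypes, centered by observed allele frequencies (assumed method)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def blend_with_a22(G, A22, w_a=0.05):
    """Blend G with the pedigree relationships of genotyped animals."""
    return (1.0 - w_a) * np.asarray(G, dtype=float) + w_a * np.asarray(A22, dtype=float)

# Toy genotypes for 4 animals and 4 SNPs; A22 assumes unrelated, non-inbred animals
M_toy = np.array([[0, 1, 2, 1],
                  [1, 1, 0, 2],
                  [2, 0, 1, 0],
                  [1, 2, 1, 1]])
G_raw = genomic_relationship(M_toy)
G_blended = blend_with_a22(G_raw, np.eye(4), w_a=0.05)

min_eig_raw = np.linalg.eigvalsh(G_raw).min()
min_eig_blended = np.linalg.eigvalsh(G_blended).min()
```

Because the genotypes are centered with frequencies computed from the same animals, the unblended G has a zero eigenvalue and cannot be inverted; blending with A22 lifts the smallest eigenvalue away from zero.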

The study aimed to validate the predictive ability of a genomic model for crossbred cattle using BLUP and single-step genomic BLUP (ssGBLUP). Three sets were created: purebred Holstein (n = 688,985), purebred Jersey (n = 119,743), and CROSS animals (n = 3,235). The CROSS group included only cows because most of the crossbred animals are genotyped to accelerate commercial herd management. Two datasets were considered: complete (with phenotypes recorded from January 2000 to August 2021) and reduced (up to August 2017). Genotyped cows with phenotypes in the complete but not in the reduced dataset were included in the validation set.
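The selection rule for validation animals reduces to simple set operations. In the sketch below the function name and animal IDs are hypothetical; only the rule itself (genotyped, with records in the complete dataset, without records in the reduced dataset) is taken from the text.

```python
def select_validation_cows(genotyped, with_records_complete, with_records_reduced):
    """Genotyped cows that gained their first record after the truncation date."""
    return (set(genotyped) & set(with_records_complete)) - set(with_records_reduced)

# Hypothetical IDs
genotyped_ids = {"cow1", "cow2", "cow3", "cow4"}
complete_ids = {"cow1", "cow2", "cow3", "cow9"}   # records up to August 2021
reduced_ids = {"cow1", "cow9"}                    # records up to August 2017

validation_ids = select_validation_cows(genotyped_ids, complete_ids, reduced_ids)
```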

Average predictive abilities across traits estimated with BLUP were 0.33, 0.30, and 0.26 for HO, JE, and CROSS groups, respectively. As expected, genomic information improved the predictability for all traits and groups. The breeding values estimated in the present paper for purebred HO and JE cows were compared with those estimated in Cesarani et al., 2022, to investigate the impact of including crossbred animals in the analysis. A total of 17.6 million HO and 1.7 million JE animals were shared between the two analyses, and correlations between BV estimated in the two studies ranged from 0.98 (MY for JE) to 1.00. The correlation for young bulls was also larger than 0.99.

In terms of regression coefficients of YADJ on EBV from BLUP, the inclusion of crossbred phenotypes led to poorer results compared with Cesarani et al., 2022. However, values calculated for the two purebreds using ssGBLUP were almost the same with or without the crossbred data, suggesting greater stability of the genomic model.

The average predictive ability and stability computed using BLUP for crossbred animals were lower than for the two purebreds, but the predictive ability computed for MY in the CROSS group was larger than the values for HO (0.30) and JE (0.33). Under ssGBLUP, average values for predictive ability and stability were slightly higher in CROSS than in HO (0.55 and 0.95) and JE (0.50 and 0.93) cows. Predictive abilities consider adjusted phenotypes, which remove fixed effects from the phenotypes. In the present study, using genomic information within the ssGBLUP model could have partially overcome the absence of breed as a fixed effect. Assuming that accuracies are inflated for crossbreds due to incomplete accounting for BC, the inflation can be reduced by better accounting for this effect (Misztal et al., 2022).

The higher accuracies for crossbreds in MY could be explained by the larger phenotypic difference between HO and JE, reflecting a greater genetic difference between the two originating breeds. These breed differences, which can be easily predicted from the genotyped animals, can contribute to larger reliabilities in the crossbred population in a scenario where the genomic predictions of crossbred animals are weighted according to each breed’s DNA proportion (VanRaden et al., 2020).

Higher accuracy reported for crossbred animals is not uncommon in dairy cattle (Winkelman et al., 2015; Khansefid et al., 2020) and other species (Hidalgo et al., 2016). In their study, predictions for crosses were consistently more accurate than for Jersey, except for longevity. Crossbred dairy cattle had higher accuracy when their data were considered in the reference population (Khansefid et al., 2020).

In the present study, the benefits from directly including the genomic information in a single step exceeded any initial disadvantage in pedigree modeling. The average improvement with genomics varied according to the BBR of the crossbred cows: the largest increase was observed for cows with BBR between 75% and 89%. The average improvement using genomics reported by VanRaden et al., 2020, is much lower than the improvements found in the present study.

For dairy cattle, inflation values of 1 ± 0.15 are still acceptable (Tsuruta et al., 2011). According to the Interbull validation, the b1 values estimated with ssGBLUP were all within 1 ± 0.1. The average value was 1.02 ± 0.06, ranging from 0.90 to 1.09, whereas for BLUP, the EBVs were more inflated (0.81 ± 0.09) and had a wider range (0.72–0.91). In ssGBLUP, all validation groups showed unbiased average predictions. The number of genotyped animals considered in the present study was very similar to that in VanRaden et al., 2020, but larger than in other studies.
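The validation statistics discussed here can be computed as in the sketch below. It assumes that predictive ability is the correlation between adjusted phenotypes (YADJ) and EBV, and that b1 is the slope of the regression of YADJ on EBV; the function names and toy values are hypothetical.

```python
import numpy as np

def predictive_ability(yadj, ebv):
    """Correlation between adjusted phenotypes and EBV (assumed definition)."""
    return float(np.corrcoef(yadj, ebv)[0, 1])

def regression_b1(yadj, ebv):
    """Slope of the regression of YADJ on EBV; values below 1 indicate inflated EBV."""
    cov = np.cov(ebv, yadj)
    return float(cov[0, 1] / cov[0, 0])

def within_tolerance(b1, tol=0.15):
    """Check b1 against the 1 +/- tol band cited in the text."""
    return abs(b1 - 1.0) <= tol

# Toy validation data (assumed values)
ebv_toy = np.array([1.0, 2.0, 3.0, 4.0])
yadj_toy = 0.9 * ebv_toy + 2.0

b1_toy = regression_b1(yadj_toy, ebv_toy)
r_toy = predictive_ability(yadj_toy, ebv_toy)
```

Applied to the averages reported above, `within_tolerance(1.02)` holds while `within_tolerance(0.81)` does not, which matches the contrast drawn between ssGBLUP and BLUP.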

The genomic era has revolutionized the process of assigning the proportion of a crossbred individual’s genotype to the originating breeds. However, identifying a specific breed origin for each SNP can be challenging. In this study, genotypes of purebred and crossbred animals were considered together, and G accounted for the relationship among them. Genomic predictions of less numerous breeds and crossbred animals from ssGBLUP could be worsened if there is an imbalanced number of genotypes among breeds.

In the present study, crossbred animals represented less than 1% of the genotyped animals, and most (about 80%) were considered validation animals. However, including crossbred and purebred data in an ssGBLUP model could enhance the prediction of crossbred animals through the H matrix. The impact of including a fixed number of purebred and crossbred animals in the core for APY deserves further investigation.

The genomic setup took about 10 hours, while the EBV computation took around 4 hours. The solving process for ssGBLUP took 3 more hours, resulting in a genomic process carried out in less than one day for these three traits. Further computational improvements could be achieved by indirectly predicting young genotyped animals or using solutions from previous runs.

Crossbred data can be included in multibreed US dairy cattle single-step evaluations without reducing accuracy or increasing inflation of genomic EBV for purebred animals. This evaluation system allows similar gains in accuracy for purebreds and crossbreds, simplifying genetic evaluation pipelines and increasing computing efficiency while delivering predictions for managing commercial crossbred herds.
