2025 Sockeye International submission | The Salmon Prize Project

2025 Sockeye International

Predictions

Ugashik River's Sockeye run

7,151,922

Quesnel's Sockeye run

382,076

Kvichak River's Sockeye run

11,834,985

Stellako's Sockeye run

93,809

Egegik River's Sockeye run

9,267,450

Raft River's Sockeye run

18,563

Igushik River's Sockeye run

2,458,851

Naknek River's Sockeye run

3,793,664

Wood River's Sockeye run

10,861,137

Chilko River's Sockeye run

865,756

All of Columbia River's Sockeye run

215,762

Nushagak River's Sockeye run

11,361,826

Alagnak River's Sockeye run

4,119,876

Stuart River's Late Sockeye run

494,202

Prediction method

Submitted on Jul 08, 2025

Combination of Dynamic Linear Sibling (Cohort) Models and Boosted Regression Trees

Abstract

Two classes of models were used to generate forecast predictions: (1) dynamic linear sibling (or cohort) regression models (DLMs), and (2) boosted regression trees (BRTs). The performance of each class of model was compared at the age class and stock level over the most recent 10-year period. Sibling DLMs were implemented at the stock-age level, with dynamic linear and dynamic log-log versions evaluated for performance. Predictions from the preferred model for each stock-age combination were then summed to give the total stock prediction for 2025. Boosted regression trees were implemented at the stock level, using sibling abundance, lagged spawning abundance, and lagged return abundance, alongside broad-scale ecosystem indicators (PDO, NPGO, etc.). For each stock, either the DLM or the BRT prediction was selected based on overall retrospective performance; in one case (Egegik), an ensemble of the DLM and BRT predictions was used.
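The selection step described above can be sketched in a few lines. This is only an illustration in Python (the submission itself was built in R): the error metric used for the retrospective comparison is not stated, so mean absolute percentage error is assumed here, and the function names and all numbers are hypothetical.

```python
def mape(observed, predicted):
    """Mean absolute percentage error (an ASSUMED metric, not stated in the submission)."""
    return sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


def select_model(observed, retrospective):
    """Return the name of the model with the lowest retrospective error, plus all scores.

    `retrospective` maps a model name to its one-step-ahead forecasts for the
    same years as `observed`.
    """
    scores = {name: mape(observed, preds) for name, preds in retrospective.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical retrospective forecasts for one stock (not real data).
observed = [100.0, 200.0, 400.0]
retrospective = {
    "dlm": [110.0, 190.0, 420.0],
    "brt": [150.0, 260.0, 300.0],
}
best, scores = select_model(observed, retrospective)

# Hypothetical age-class forecasts from the preferred stock-age models,
# summed to a stock total as described in the abstract.
age_class_forecasts = {"1.2": 50.0, "1.3": 30.0, "2.2": 15.0, "2.3": 5.0}
stock_total = sum(age_class_forecasts.values())
```

The same comparison could be run per stock-age combination (linear versus log-log DLM) before the age-class forecasts are summed.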

Supporting Documents

No documents submitted

Prediction Model

Submitted on Jul 08, 2025

Description

Dynamic linear sibling (cohort) regression models were used to predict the 2023 return abundance of the 1.2, 1.3, 2.2, and 2.3 age classes (European age designation). Sibling, or more precisely cohort, regression models use the return abundance of younger members of a cohort, which experienced the same conditions at ocean entry, to predict the return abundance of older members of the same cohort. For example, a high return abundance of the 1.1 age class for a stock in 2020 (2017 cohort) might suggest that the abundance of the 1.2 age class will be higher than average in the following year, 2021, because these fish encountered conditions conducive to high survival at ocean entry in the summer/fall of 2019 and winter of 2020.

In their standard form, these sibling regression models implicitly assume a static relationship between the return abundance of younger and older individuals from the same cohort. That assumption is violated by long-term changes in the average age structure at return, arising from changes in maturation schedule or late-stage marine mortality. Dynamic linear sibling regression models allow these relationships to evolve over time, and generate predictions given the estimated current state of the system.

The specific model structure predicts the abundance of returning fish from stock s of ocean age a in calendar year t, R(s,t,a), as a function of the return abundance of the same stock in the previous year (t-1) at ocean age a-1:

R(s,t,a) = a(t) + b(t)*R(s,t-1,a-1) + e(t)

where a(t) is the value of a dynamic intercept coefficient in year t (not to be confused with the ocean age index a), describing changes in the average production of ocean age a individuals over time, and b(t) is a dynamic slope coefficient describing the relationship between the age classes and any change in that relationship over time. Both a(t) and b(t) are estimated as random walks:

a(t) ~ Normal(a(t-1), sigma_a)
b(t) ~ Normal(b(t-1), sigma_b)

where sigma_a and sigma_b are process variation terms describing the magnitude of average interannual change in a(t) and b(t), and e(t) is the residual variation in the regression relationship:

e(t) ~ Normal(0, sigma_e)

Boosted regression trees were implemented at the stock level within the tidymodels framework in R, using the xgboost implementation. Features used to predict annual total run size (i.e. summed across age classes) in year t included: run sizes in t-1, t-2, t-3, t-4, and t-5; spawning abundance in t-4, t-5, t-6, and t-7; the abundance of "sibling" age classes (1.1, 1.2, 1.3, 2.1, 2.2, 2.3) in t-1; and the Pacific Decadal Oscillation, North Pacific Gyre Oscillation, and Arctic Oscillation indices, each averaged across the months January-May in year t-2. The time lags were chosen to encompass the dominant age classes for each stock.
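For readers who want to see how a regression with random-walk coefficients produces a forecast, the dynamic sibling model R(s,t,a) = a(t) + b(t)*R(s,t-1,a-1) + e(t) can be sketched with a standard Kalman filter. This is an illustrative Python sketch, not the submitted R code. It assumes that sigma_a, sigma_b, and sigma_e are standard deviations and are already known (the submission estimates them, by a method not described here), and that a diffuse prior is placed on the coefficients. The class name, method names, and data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SiblingDLM:
    """Regression y = a(t) + b(t) * x + e(t) with random-walk a(t) and b(t)."""

    sigma_a: float  # process SD of the intercept random walk (assumed known)
    sigma_b: float  # process SD of the slope random walk (assumed known)
    sigma_e: float  # residual SD (assumed known)
    # Diffuse prior on [a, b]: an assumption of this sketch.
    mean: list = field(default_factory=lambda: [0.0, 0.0])
    cov: list = field(default_factory=lambda: [[1e4, 0.0], [0.0, 1e4]])

    def _prior_cov(self):
        # One random-walk step adds process variance to the diagonal.
        (c00, c01), (c10, c11) = self.cov
        return [[c00 + self.sigma_a ** 2, c01], [c10, c11 + self.sigma_b ** 2]]

    def forecast(self, younger):
        """One-step-ahead forecast mean and variance of the older age class."""
        r = self._prior_cov()
        point = self.mean[0] + self.mean[1] * younger
        variance = (r[0][0] + younger * (r[0][1] + r[1][0])
                    + younger ** 2 * r[1][1] + self.sigma_e ** 2)
        return point, variance

    def update(self, younger, older):
        """Kalman update of [a, b] after observing one (younger, older) pair."""
        r = self._prior_cov()
        point, variance = self.forecast(younger)
        # Covariance between the state and the forecast: R * [1, younger]'.
        rf = [r[0][0] + r[0][1] * younger, r[1][0] + r[1][1] * younger]
        gain = [rf[0] / variance, rf[1] / variance]
        error = older - point
        self.mean = [self.mean[0] + gain[0] * error,
                     self.mean[1] + gain[1] * error]
        self.cov = [[r[i][j] - gain[i] * rf[j] for j in range(2)]
                    for i in range(2)]


# Hypothetical (younger sibling return in t-1, older sibling return in t) pairs.
history = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0), (5.0, 11.0)]
model = SiblingDLM(sigma_a=0.01, sigma_b=0.01, sigma_e=0.1)
for younger, older in history:
    model.update(younger, older)

# Forecast the older age class from the most recent younger-sibling return.
point, variance = model.forecast(6.0)
```

A log-log version, as mentioned in the abstract, would apply the same filter to log-transformed abundances and back-transform the forecast.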