Methodology
Transparent, versioned, reproducible. This section explains the statistical models, outcomes, features, validation framework, and governance used for the UPROB and UHZN indices.
Model 1 - Duration Model
Estimates how long it takes a company to reach unicorn status (valuation ≥ $1B).
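$$\text{Duration}_i = \alpha + \beta_1\,\text{City}_i + \beta_2\,\text{Country}_i + \beta_3\,\text{Industry}_i + \beta_4\,\text{Investor}_i + \beta_5\,\log(\text{ATR})_i + \varepsilon_i$$
where:
- $\text{Duration}_i$: duration (months) until unicorn status for company i
- $\log(\text{ATR})_i$: log of average annual funds raised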
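Implemented with survival models that handle right-censoring (companies not yet at $1B when the data is cut): accelerated failure time (AFT), Cox proportional hazards, and gradient-boosted (GBM) survival models.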
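To make the censoring point concrete, here is a minimal sketch of the simplest AFT variant: an exponential AFT with a single binary covariate, whose maximum-likelihood estimates have a closed form. The covariate (a hypothetical top-tier-investor flag standing in for the Investor term), the function names, and the toy records are illustrative assumptions, not the production model, which uses the full covariate set above.

```python
import math

# Hypothetical records: (months observed, reached_unicorn, top_tier_investor flag).
# Companies that have not yet reached $1B are right-censored (reached_unicorn = False).
RECORDS = [
    (24, True, 1), (30, True, 1), (36, False, 1),
    (48, True, 0), (60, False, 0), (72, True, 0), (40, False, 0),
]

def fit_exponential_aft(records):
    """Closed-form MLE for log T = alpha + beta * x + W with x in {0, 1},
    where W has the standard minimum-extreme-value distribution (exponential AFT).
    Each group needs at least one observed event."""
    events = {0: 0, 1: 0}
    exposure = {0: 0.0, 1: 0.0}
    for months, reached, x in records:
        events[x] += int(reached)
        exposure[x] += months  # censored companies still contribute time at risk
    rate0 = events[0] / exposure[0]  # hazard of reaching unicorn status when x = 0
    rate1 = events[1] / exposure[1]  # hazard when x = 1
    alpha = -math.log(rate0)
    beta = math.log(rate0 / rate1)   # exp(beta) is the time ratio, x = 1 vs x = 0
    return alpha, beta

def median_months(alpha, beta, x):
    """Median time to unicorn status implied by the fitted exponential AFT."""
    return math.log(2) * math.exp(alpha + beta * x)

alpha, beta = fit_exponential_aft(RECORDS)
print(f"time ratio exp(beta) = {math.exp(beta):.3f}")
print(f"median months: x=0 {median_months(alpha, beta, 0):.1f}, x=1 {median_months(alpha, beta, 1):.1f}")
```

On these toy records exp(β) = 45/110 ≈ 0.41, so the flagged group's implied median time is about 41% of the other group's. AFT models place the linear predictor on log duration, while the Cox model places it on the log hazard. Fitting an ordinary regression only on companies that already became unicorns would discard the censored rows and bias durations downward.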
Model 2 - Latent Valuation Model
Regression-style model of a latent (unobserved) valuation signal that determines whether a company crosses the $1B threshold: the company is counted as a unicorn when the signal exceeds that threshold.
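$$\text{UnicornValuation}_i^{*} = \beta_0 + \beta_1\,\text{Age}_i + \beta_2\,\text{City}_i + \beta_3\,\text{Country}_i + \beta_4\,\text{Investor}_i + \beta_5\,\text{FundRaised}_i + \varepsilon_i$$
where:
- $\text{Age}_i$: years since founding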
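The threshold logic can be written as a probit-style sketch: if ε_i is assumed Gaussian with scale σ, then P(UnicornValuation_i* ≥ τ) = Φ((μ_i − τ)/σ), where μ_i is the linear predictor. The Gaussian error, the threshold τ, σ, the feature names, and every coefficient value below are hypothetical assumptions for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def unicorn_probability(features, betas, beta0, threshold=0.0, sigma=1.0):
    """P(latent valuation >= threshold) when eps ~ Normal(0, sigma^2)."""
    latent_mean = beta0 + sum(betas[name] * value for name, value in features.items())
    return normal_cdf((latent_mean - threshold) / sigma)

# Hypothetical company; categorical City/Country/Investor enter as 0/1 indicators.
features = {"age_years": 5, "fund_raised_musd": 300, "investor_tier1": 1}
betas = {"age_years": 0.05, "fund_raised_musd": 0.002, "investor_tier1": 0.4}
p = unicorn_probability(features, betas, beta0=-0.5)
print(f"P(crosses threshold) = {p:.3f}")
```

With binary outcomes, σ and τ cannot be identified separately from the intercept and the coefficient scale, so a standard probit fixes σ = 1 and absorbs τ into β₀; the defaults above follow that convention.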
Outcomes & Cohorts
- Event: first valuation ≥ $1B
- UPROB: probability of reaching unicorn status within 12, 24, or 36 months, with right-censoring for companies still below $1B at the data cutoff
- UHZN: median time-to-event from survival analysis, with confidence intervals
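As a cohort-level baseline for both outcomes, a Kaplan-Meier estimate handles right-censoring directly: UPROB at horizon h is 1 − S(h), and the UHZN median is the first event time where S(t) ≤ 0.5. This is a minimal pure-Python sketch on hypothetical data; it is not the per-company model, and it omits the confidence intervals (e.g. Greenwood's formula or a bootstrap) that UHZN reports.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve S(t) for time to unicorn status.
    Returns [(event_time, S(event_time)), ...]; censored rows only shrink the risk set."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_events + n_censored
    return curve

def survival_at(curve, t):
    """Step-function value of S(t)."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s

def uprob(curve, horizon_months):
    """Cohort probability of reaching $1B within the horizon."""
    return 1.0 - survival_at(curve, horizon_months)

def median_time(curve):
    """First event time with S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

months = [10, 14, 20, 20, 26, 30, 34, 40]   # hypothetical cohort
reached = [1, 0, 1, 1, 0, 1, 0, 0]          # 0 = still below $1B (censored)
curve = kaplan_meier(months, reached)
for h in (12, 24, 36):
    print(h, round(uprob(curve, h), 3))
print("median months:", median_time(curve))
```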
Features (Examples)
- Age, HQ city, HQ country, sector
- Funding cadence, round size, cumulative capital raised
- Investor quality, syndicate structure
- Hiring velocity, web traffic growth
- Market competitiveness and signal data
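The funding features above can be derived point-in-time from a company's round history; computing them only from rounds announced on or before an as-of date keeps later information from leaking into training rows. The FundingRound schema, field names, and exact feature definitions are hypothetical; hiring, traffic, and investor-quality signals would need their own sources.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean

@dataclass
class FundingRound:
    """Hypothetical schema for one priced round."""
    announced: date
    amount_musd: float

def months_between(a: date, b: date) -> int:
    """Whole calendar months from a to b (day of month ignored)."""
    return (b.year - a.year) * 12 + (b.month - a.month)

def funding_features(founded: date, rounds: list, as_of: date) -> dict:
    """Point-in-time funding features using only rounds announced on or before as_of."""
    visible = sorted((r for r in rounds if r.announced <= as_of), key=lambda r: r.announced)
    gaps = [months_between(a.announced, b.announced) for a, b in zip(visible, visible[1:])]
    return {
        "age_years": (as_of - founded).days / 365.25,
        "n_rounds": len(visible),
        "cumulative_raised_musd": sum(r.amount_musd for r in visible),
        "mean_round_size_musd": mean(r.amount_musd for r in visible) if visible else 0.0,
        "mean_months_between_rounds": mean(gaps) if gaps else None,  # funding cadence
    }

rounds = [
    FundingRound(date(2018, 6, 15), 2.0),
    FundingRound(date(2019, 9, 1), 10.0),
    FundingRound(date(2021, 3, 10), 40.0),
    FundingRound(date(2023, 5, 1), 100.0),  # after as_of, so excluded below
]
print(funding_features(date(2018, 1, 1), rounds, as_of=date(2022, 1, 1)))
```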
Model Families Used
- UPROB: gradient boosting, logistic regression, and calibrated probability models (isotonic / Platt).
- UHZN: Cox, AFT, and GBM-survival models with uncertainty bands.
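Isotonic calibration can be sketched with the pool-adjacent-violators (PAV) algorithm, which turns raw UPROB scores into a non-decreasing step function of observed unicorn rates. This pure-Python version is illustrative only (function names and toy data are hypothetical); a production pipeline would more likely use a library implementation such as scikit-learn's IsotonicRegression, which interpolates linearly between thresholds instead of using the step lookup below.

```python
from itertools import groupby

def fit_isotonic(scores, labels):
    """Pool adjacent violators. Returns [(upper_score, calibrated_prob), ...]
    with calibrated_prob non-decreasing."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [sum_of_labels, count, upper_score]
    for score, group in groupby(pairs, key=lambda pair: pair[0]):
        ys = [y for _, y in group]  # tied scores start as one block
        blocks.append([float(sum(ys)), len(ys), score])
        # Merge backwards while the previous block's mean exceeds the new one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            y_sum, n, upper = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += n
            blocks[-1][2] = upper
    return [(upper, y_sum / n) for y_sum, n, upper in blocks]

def calibrate(model, score):
    """Step lookup: probability of the first block whose upper bound covers score."""
    for upper, prob in model:
        if score <= upper:
            return prob
    return model[-1][1]

raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
became_unicorn = [0, 1, 0, 0, 1, 1]
model = fit_isotonic(raw_scores, became_unicorn)
print(model)
print(calibrate(model, 0.35))
```

The calibrator should be fit on held-out data rather than the rows used to train the scoring model. Platt scaling replaces the step function with a logistic curve fitted to the scores.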
Validation Framework
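- AUC / PR-AUC: ranking power
- Brier score / ECE (expected calibration error): calibration accuracy
- Concordance index / IBS (integrated Brier score): survival model fit
- Out-of-time (OOT) validation: train on earlier periods, evaluate on later ones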
- Stability & robustness tests across data vintages
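Most of these checks fit in a few lines each. This dependency-free sketch covers ROC AUC (via the pairwise Mann-Whitney definition), the Brier score, an equal-width-bin ECE, Harrell's concordance index, and an out-of-time split. The bin count, function names, and the `as_of` field are assumptions, and the integrated Brier score (which needs censoring weights) is left out.

```python
def roc_auc(labels, scores):
    """P(random positive scores above random negative); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(labels, probs):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def expected_calibration_error(labels, probs, n_bins=10):
    """Weighted mean |avg predicted - observed rate| over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    total = len(labels)
    ece = 0.0
    for b in bins:
        if b:
            avg_pred = sum(p for p, _ in b) / len(b)
            obs_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / total * abs(avg_pred - obs_rate)
    return ece

def harrell_c_index(durations, events, risk):
    """Share of comparable pairs where the company with the higher predicted
    propensity reached $1B first; ties in risk count as half."""
    concordant = 0.0
    comparable = 0
    for i in range(len(durations)):
        if not events[i]:
            continue  # a censored company cannot anchor a comparable pair
        for j in range(len(durations)):
            if durations[i] < durations[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

def out_of_time_split(rows, cutoff, date_key="as_of"):
    """Train on rows dated before the cutoff; evaluate on rows at or after it."""
    train = [r for r in rows if r[date_key] < cutoff]
    test = [r for r in rows if r[date_key] >= cutoff]
    return train, test

labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, probs), brier_score(labels, probs), expected_calibration_error(labels, probs))
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1]))
```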
Governance & Reproducibility
- Every model has a version ID and full changelog
- Each weekly release tied to a specific data vintage snapshot
- Reproducibility records include random seeds, configs, and preprocessing steps
- Fairness & bias evaluations documented
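One way to tie a release to its data vintage, seed, config, and preprocessing steps is to hash a canonical JSON manifest of all of them, so any change yields a different fingerprint. The field names, example values, and this fingerprinting scheme are hypothetical, not the project's actual versioning format.

```python
import hashlib
import json

def release_manifest(model_name, model_version, data_vintage, seed, config, preprocessing_steps):
    """Bundle everything needed to reproduce one weekly release and fingerprint it."""
    manifest = {
        "model": model_name,
        "model_version": model_version,
        "data_vintage": data_vintage,
        "seed": seed,
        "config": config,
        "preprocessing": list(preprocessing_steps),
    }
    # sort_keys gives a canonical encoding, so dict key order does not change the hash.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    manifest["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return manifest

m = release_manifest(
    model_name="UPROB",
    model_version="v0.0-example",     # hypothetical version ID
    data_vintage="snapshot-example",  # hypothetical data vintage label
    seed=42,
    config={"learning_rate": 0.05, "n_estimators": 500},
    preprocessing_steps=["dedupe_rounds", "winsorize_round_size", "one_hot_geo"],
)
print(m["fingerprint"][:12])
```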