Methodology

Transparent, versioned, reproducible. This section explains the statistical models, outcomes, features, validation framework, and governance used for the UPROB and UHZN indices.

Model 1 - Duration Model

Estimates how long a company takes to reach unicorn status (valuation ≥ $1B).

Duration_i = α + β₁·City_i + β₂·Country_i + β₃·Industry_i + β₄·Investor_i + β₅·log(ATR)_i + ε_i

Duration_i: time in months until company i reaches unicorn status
log(ATR)_i: log of the average annual funds raised (ATR) by company i

Implemented with survival models (accelerated failure time and Cox proportional hazards) and gradient-boosted survival models.
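
As a rough illustration, the linear specification above can be evaluated directly once coefficients are available. The sketch below assumes each categorical term (City, Country, Industry, Investor) contributes one additive effect per level, with unseen levels falling back to a baseline of zero. Every coefficient value, category name, and function name here is invented for the example and does not come from the fitted models.

```python
import math

# Hypothetical coefficients, invented purely for illustration.
COEFFS = {
    "alpha": 60.0,                                   # intercept, in months
    "city": {"San Francisco": -6.0, "Berlin": 2.0},  # per-level effects (assumed encoding)
    "country": {"US": -4.0, "DE": 1.0},
    "industry": {"Fintech": -3.0, "Biotech": 5.0},
    "investor": {"tier1": -8.0, "other": 0.0},
    "log_atr": -2.5,                                 # slope on log(ATR)
}


def predicted_duration_months(city, country, industry, investor, atr, coeffs=COEFFS):
    """Evaluate alpha + categorical effects + beta5 * log(ATR), omitting the error term."""
    if atr <= 0:
        raise ValueError("ATR must be positive to take its logarithm")
    return (
        coeffs["alpha"]
        + coeffs["city"].get(city, 0.0)          # unseen levels use the baseline (0.0)
        + coeffs["country"].get(country, 0.0)
        + coeffs["industry"].get(industry, 0.0)
        + coeffs["investor"].get(investor, 0.0)
        + coeffs["log_atr"] * math.log(atr)
    )
```

This evaluates only the point prediction. The survival implementations named above additionally handle companies that have not yet reached the threshold, which a plain linear fit would not.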

Model 2 - Latent Valuation Model

A regression-style model that estimates the latent valuation signal determining whether a company crosses the $1B threshold.

UnicornValuation_i* = β₀ + β₁·Age_i + β₂·City_i + β₃·Country_i + β₄·Investor_i + β₅·FundRaised_i + ε_i

Age_i: years since founding
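
One way to turn a latent valuation into a crossing probability is a probit-style link. The sketch below assumes the error term ε_i is normally distributed with a known standard deviation; that distributional choice, the function names, and the units (valuation in $B) are assumptions made for illustration and are not stated by the methodology.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crossing_probability(linear_predictor, threshold, sigma):
    """P(latent valuation >= threshold) when valuation = linear_predictor + N(0, sigma^2).

    linear_predictor: the deterministic part of the latent model (beta0 + ... ), in $B
    threshold: the unicorn threshold in the same units (1.0 for $1B)
    sigma: assumed standard deviation of the error term
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return normal_cdf((linear_predictor - threshold) / sigma)


# A company whose predicted latent valuation sits exactly on the threshold
# has a 50% crossing probability under this assumption.
p_on_threshold = crossing_probability(1.0, 1.0, 0.5)
```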

Outcomes & Cohorts

  • Event: first valuation ≥ $1B
  • UPROB: probability of reaching unicorn status within 12, 24, or 36 months, accounting for right-censoring
  • UHZN: median time to event estimated by survival analysis, with confidence intervals
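
To make the two outcomes concrete, the sketch below uses a Kaplan-Meier estimator, a standard non-parametric way to handle right-censored durations. It is not the production model (the model families are listed further down), and the function names are hypothetical. Confidence intervals are omitted for brevity.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    durations: months observed for each company
    events: True if the company reached unicorn status, False if right-censored
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Process every observation tied at time t together.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


def event_probability_by(curve, horizon):
    """UPROB-style quantity: 1 - S(horizon)."""
    s = 1.0
    for t, surv in curve:
        if t > horizon:
            break
        s = surv
    return 1.0 - s


def median_time(curve):
    """UHZN-style quantity: first time at which S(t) <= 0.5, or None if never reached."""
    for t, surv in curve:
        if surv <= 0.5:
            return t
    return None


# Toy data: six companies, two of them censored (not yet unicorns when last observed).
curve = kaplan_meier([6, 12, 12, 18, 24, 30], [True, True, False, True, False, True])
median_months = median_time(curve)              # 18
p_within_12 = event_probability_by(curve, 12)   # about 0.333
```

Censored companies stay in the at-risk count until their last observed month, which is what keeps the estimate from being biased toward short durations.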

Features (Examples)

  • Age, HQ city, HQ country, sector
  • Funding cadence, round size, cumulative capital raised
  • Investor quality, syndicate structure
  • Hiring velocity, web traffic growth
  • Market competitiveness and signal data

Model Families Used

  • UPROB: gradient boosting and logistic regression, with probability calibration (isotonic regression / Platt scaling).
  • UHZN: Cox, AFT, and GBM-survival models with uncertainty bands.
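
For readers unfamiliar with isotonic calibration, the sketch below implements the pool-adjacent-violators algorithm in plain Python. It maps raw model scores to calibrated probabilities that are non-decreasing in the score. The function names are hypothetical, and the sketch assumes distinct scores (tied scores would need to be pooled first); a production pipeline would more likely rely on an established library.

```python
from bisect import bisect_right


def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function to (score, label) pairs.

    Returns (sorted_scores, fitted_values), aligned element by element.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum_of_labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return [x for x, _ in pairs], fitted


def isotonic_predict(sorted_scores, fitted, score):
    """Look up the calibrated value of the nearest training score at or below `score`."""
    idx = bisect_right(sorted_scores, score) - 1
    return fitted[max(idx, 0)]  # scores below the training range use the first value


# Toy example: raw scores with one out-of-order label get pooled.
xs, fitted = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
calibrated = isotonic_predict(xs, fitted, 0.35)  # 1/3, the pooled rate for 0.2 to 0.4
```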

Validation Framework

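  • AUC / PR: ranking power (ROC and precision-recall curves)
  • Brier / ECE: calibration accuracy (Brier score and expected calibration error)
  • Concordance / IBS: survival model fit (concordance index and integrated Brier score)
  • Out-of-time (OOT) validation: train on earlier periods, test on later ones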
  • Stability & robustness tests across data vintages
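
Three of the metrics above are simple enough to write out. The sketch below gives plain-Python versions of the Brier score, expected calibration error with equal-width bins, and Harrell's concordance index for right-censored data. The function names and the choice of 10 bins are assumptions for illustration; AUC, PR, and IBS are not covered here.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and observed rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_p = sum(p for p, _ in b) / len(b)
        rate = sum(y for _, y in b) / len(b)
        ece += len(b) / total * abs(mean_p - rate)
    return ece


def concordance_index(durations, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the higher-risk company
    reaches the event sooner. Ties in risk count as half-concordant."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored company cannot anchor a comparable pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A lower Brier score and ECE indicate better calibration, while a concordance index of 0.5 corresponds to random ordering and 1.0 to perfect ordering.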

Governance & Reproducibility

  • Every model has a version ID and full changelog
  • Each weekly release is tied to a specific data vintage snapshot
  • Reproducibility artifacts include seeds, configs, and preprocessing steps
  • Fairness & bias evaluations documented
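
One possible way to bind a release to its inputs is a small manifest that records the model version, data vintage, seed, and a hash of the configuration. The sketch below is illustrative only: the field names, the example version and vintage strings, and the use of SHA-256 over canonical JSON are assumptions, not a description of the actual release tooling.

```python
import hashlib
import json


def config_fingerprint(config):
    """SHA-256 of a canonical JSON serialization, so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_release_manifest(model_version, data_vintage, seed, config):
    """Collect what is needed to rerun a release (all field names are hypothetical)."""
    return {
        "model_version": model_version,
        "data_vintage": data_vintage,
        "seed": seed,
        "config_sha256": config_fingerprint(config),
    }


# Example values are placeholders invented for this sketch.
manifest = build_release_manifest(
    model_version="example-1.0.0",
    data_vintage="example-vintage",
    seed=42,
    config={"horizons_months": [12, 24, 36], "calibration": "isotonic"},
)
```

Two runs that produce the same manifest can be compared output for output; a differing hash flags a configuration change that should appear in the changelog.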