“16.5k stars, AAAI 2026 paper — but an open issue alleges its benchmark numbers come from a data leakage bug.”
Kronos pre-trains a Transformer on 12 billion K-line records from 45 global exchanges, making it the first open-source foundation model built specifically for financial candlestick data. It tokenizes OHLCV price data into discrete tokens and predicts future candles autoregressively; per the paper, it improves RankIC by 93% over the best general-purpose time series foundation model (TSFM). However, an open GitHub issue (#227) alleges that the finetuning pipeline leaks future information through per-window normalization, and r/quant users report predictions that diverge wildly from the input price ranges.
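The leakage allegation describes a general failure mode, so here is a minimal sketch of it, assuming the bug takes the common form where z-score statistics are computed over the whole training window (the context plus the bars to be predicted). This is not Kronos's code; `normalize_leaky`, `normalize_causal`, and the window sizes are hypothetical.

```python
import numpy as np

CONTEXT_LEN = 64   # bars the model sees (illustrative value)
HORIZON = 16       # bars the model must predict (illustrative value)

def normalize_leaky(window: np.ndarray, context_len: int) -> np.ndarray:
    """Z-score using stats from the WHOLE window, future bars included (the leak)."""
    mean, std = window.mean(), window.std() + 1e-8
    return ((window - mean) / std)[:context_len]

def normalize_causal(window: np.ndarray, context_len: int) -> np.ndarray:
    """Z-score using stats from the context bars only."""
    context = window[:context_len]
    mean, std = context.mean(), context.std() + 1e-8
    return (context - mean) / std

# Two windows with an identical past and opposite futures.
rng = np.random.default_rng(0)
past = 100 + np.cumsum(rng.normal(0.0, 1.0, CONTEXT_LEN))
ramp = np.linspace(1.0, 20.0, HORIZON)
window_up = np.concatenate([past, past[-1] + ramp])
window_down = np.concatenate([past, past[-1] - ramp])

leaky_up = normalize_leaky(window_up, CONTEXT_LEN)
leaky_down = normalize_leaky(window_down, CONTEXT_LEN)
causal_up = normalize_causal(window_up, CONTEXT_LEN)
causal_down = normalize_causal(window_down, CONTEXT_LEN)

print(np.allclose(leaky_up, leaky_down))    # False: the input already encodes the future
print(np.allclose(causal_up, causal_down))  # True: identical past, identical input
```

When two samples with the same history yield different model inputs, a model can learn to read the future from the normalization statistics, which could inflate backtest metrics such as RankIC. The usual remedy is to compute statistics from the context alone and reuse them to denormalize the predictions.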
You know that feeling when you apply a general-purpose time series model to financial data and it misses the noise patterns, the regime changes, and the cross-asset dynamics that set markets apart? Existing TSFMs like TimesFM and Chronos treat every time series the same way, whether it comes from weather, server metrics, or stock prices. Yet financial candlestick data has distinctive characteristics (OHLCV structure, high noise, non-stationarity) that general models handle poorly. Kronos targets this gap with a finance-specific tokenizer and pre-training on 12B+ K-line records.
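"OHLCV structure" is easy to make concrete: the five fields of a bar are coupled by hard constraints, so a model that forecasts each column as an independent series can emit bars that cannot exist. A minimal validity check might look like this; the `Bar` class and `structural_violations` helper are illustrative assumptions, not part of Kronos.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    """One OHLCV candlestick."""
    open: float
    high: float
    low: float
    close: float
    volume: float

def structural_violations(bar: Bar) -> list[str]:
    """Return the OHLCV consistency rules this bar breaks (empty list = valid)."""
    problems = []
    if bar.high < max(bar.open, bar.close):
        problems.append("high below open/close")
    if bar.low > min(bar.open, bar.close):
        problems.append("low above open/close")
    if bar.volume < 0:
        problems.append("negative volume")
    return problems

# A plausible bar, and one a per-column forecaster could emit: its high undershoots its close.
print(structural_violations(Bar(10.0, 12.0, 9.0, 11.0, 1_000)))  # []
print(structural_violations(Bar(10.0, 10.5, 9.0, 11.0, 1_000)))  # ['high below open/close']
```

The same check doubles as a cheap sanity filter for any model's generated candles before they feed into a backtest.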
Think of it as a language model that reads candlestick bars instead of words. Step 1: A hierarchical tokenizer converts each OHLCV bar (Open, High, Low, Close, Volume) into discrete tokens that preserve price dynamics and trading activity. Step 2: A decoder-only Transformer (open-sourced checkpoints range from 4.1M to 102.3M parameters) is pre-trained on the 12B+ tokenized K-line records with next-token prediction, the same objective GPT uses. At inference, you supply historical OHLCV data and a range of future timestamps; the model then generates candles autoregressively, using temperature and nucleus (top-p) sampling to produce probabilistic outputs.
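The pipeline can be sketched in miniature. The quantizer below is a deliberately simple, hypothetical stand-in (uniform bins split into a coarse bucket and a fine offset), not Kronos's actual tokenizer; it only shows how normalized continuous values become a small vocabulary of discrete ids. The sampling half is standard temperature plus nucleus (top-p) sampling, and `sticky_model` is a dummy in place of the Transformer. All names and bin counts are assumptions.

```python
import numpy as np

# Toy two-level (coarse, fine) quantizer. Hypothetical stand-in, NOT the Kronos tokenizer.
COARSE_BINS = 16
FINE_BINS = 16
VOCAB = COARSE_BINS * FINE_BINS
CLIP = 4.0  # z-scores are clipped to [-CLIP, CLIP] before binning

def tokenize(z: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map z-scored values to (coarse, fine) token ids."""
    u = (np.clip(z, -CLIP, CLIP) + CLIP) / (2 * CLIP)      # rescale to [0, 1]
    idx = np.minimum((u * VOCAB).astype(int), VOCAB - 1)  # flat bin index
    return idx // FINE_BINS, idx % FINE_BINS              # coarse bucket, offset within it

def detokenize(coarse: np.ndarray, fine: np.ndarray) -> np.ndarray:
    """Invert tokenize() up to quantization error (returns bin centers)."""
    idx = coarse * FINE_BINS + fine
    return (idx + 0.5) / VOCAB * (2 * CLIP) - CLIP

def sample_top_p(logits: np.ndarray, temperature: float, top_p: float,
                 rng: np.random.Generator) -> int:
    """Temperature + nucleus (top-p) sampling over one logit vector."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                                # most likely first
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]                                          # smallest set with mass >= top_p
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

def generate(next_logits, context: list[int], steps: int,
             temperature: float = 1.0, top_p: float = 0.9, seed: int = 0) -> list[int]:
    """Autoregressive loop: each sampled token is appended and fed back in."""
    rng = np.random.default_rng(seed)
    tokens = list(context)
    for _ in range(steps):
        tokens.append(sample_top_p(next_logits(tokens), temperature, top_p, rng))
    return tokens[len(context):]

# Round trip: quantization error is at most half a bin, i.e. CLIP / VOCAB.
z = np.linspace(-3.9, 3.9, 50)
coarse, fine = tokenize(z)
print(np.abs(detokenize(coarse, fine) - z).max() <= CLIP / VOCAB)  # True

# Stand-in "model" (hypothetical) that strongly favors repeating the last token.
def sticky_model(tokens: list[int]) -> np.ndarray:
    logits = np.zeros(VOCAB)
    logits[tokens[-1]] = 10.0
    return logits

print(generate(sticky_model, context=[37, 42], steps=3, top_p=0.5))  # [42, 42, 42]
```

Lower temperature sharpens the distribution and a smaller `top_p` truncates its tail, so repeated runs with different seeds yield a spread of plausible paths rather than a single point forecast; that spread is what makes the outputs probabilistic.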
If you're a quant researcher or ML engineer building price forecasting, volatility modeling, or synthetic data pipelines for financial markets, this is directly relevant. Also useful if you study tokenizer design for non-language domains. Not useful if you need cross-asset portfolio signals in a single forward pass, or if you want a production-ready trading system — the authors themselves call it...
Worth exploring as a research artifact and educational reference for finance-specific tokenizer design. The AAAI 2026 acceptance gives it academic credibility. However, tread carefully: the data leakage allegation in issue #227 is unresolved, users report broken predictions in issue #229, the repo has no formal releases, no maintainer activity since January 2026, and 152 open issues. Treat it as experimental — study the tokenizer architecture and paper, but do not rely on its benchmark claims or use it for real trading until the leakage issue is resolved.