Tech Products · Advanced · 2 min read · Apr 13, 2026 · Updated Apr 15, 2026

Kronos: Finance Foundation Model with a Data Leakage Bug Open

“16.5k stars, AAAI 2026 paper — but an open issue alleges its benchmark numbers come from a data leakage bug.”

Source · github.com

“I am skeptical of how this would perform vs basic strats like buy & hold, or replacing this signal with a moving average. Instead they compared it against a collection of other models like GARCH which are not meant for generating trading signals.” (LowBetaBeaver, r/quant)
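
The two baselines named in the quote are cheap to compute, which is part of why their absence stands out. Below is a minimal sketch, not taken from the Kronos repo or paper: `buy_and_hold_return` and `moving_average_return` are hypothetical helper names, the 3-bar window is an arbitrary choice, and costs, slippage, and shorting are ignored. The moving-average signal at bar t uses only closes up to and including t and earns the return of the following bar, so it cannot look ahead.

```python
def buy_and_hold_return(closes):
    """Total return from buying at the first close and holding to the last."""
    return closes[-1] / closes[0] - 1.0


def moving_average_return(closes, window=3):
    """Total return of a long-or-flat rule: hold the next bar only when the
    current close is above its simple moving average (SMA)."""
    equity = 1.0
    for t in range(window - 1, len(closes) - 1):
        # The signal is built from bars t-window+1 .. t, never from bar t+1.
        sma = sum(closes[t - window + 1 : t + 1]) / window
        if closes[t] > sma:
            # Position is taken at the close of bar t and earns the next bar.
            equity *= closes[t + 1] / closes[t]
    return equity - 1.0


if __name__ == "__main__":
    closes = [100.0, 101.0, 102.0, 103.0, 104.0]  # made-up prices
    print("buy & hold:", buy_and_hold_return(closes))
    print("SMA(3) rule:", moving_average_return(closes, window=3))
```

Any forecast-driven strategy can be scored on the same closes and compared against these two numbers before it is compared against other models.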

You know that feeling when you apply a general-purpose time series model to financial data and it misses the noise patterns, the regime changes, and the cross-asset dynamics that make markets unique? Existing time series foundation models (TSFMs) like TimesFM and Chronos treat all time series the same, whether weather, server metrics, or stock prices. Financial candlestick data has unique characteristics (OHLCV structure, high noise, non-stationarity) that general models handle poorly. Kronos targets this gap with a finance-specific tokenizer and pre-training on 12B+ K-line records.

ai · finance · time-series · foundation-model · python · pytorch · quantitative-finance

Think of it like a language model, but instead of words it reads candlestick bars.
Step 1: A hierarchical tokenizer converts each OHLCV bar (Open, High, Low, Close, Volume) into discrete tokens that preserve price dynamics and trade activity.
Step 2: A decoder-only Transformer (4.1M to 102.3M params, open-sourced) is pre-trained on 12B+ tokenized K-line records using next-token prediction, the same objective as GPT.
Step 3: At inference, you give it historical OHLCV data and a future timestamp range, and it generates forecasted candles autoregressively, using temperature and nucleus sampling to produce probabilistic outputs.
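
The sampling at inference can be illustrated generically. The following is a minimal sketch of temperature scaling plus nucleus (top-p) filtering over a vector of next-token logits, wrapped in an autoregressive loop. It is not Kronos's code: `sample_top_p`, `generate`, and the `next_logits` callback are hypothetical names, and the callback stands in for whatever model produces logits from the tokens so far.

```python
import math
import random


def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=None):
    """Draw one token id using temperature scaling and nucleus filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample inside the nucleus; choices() renormalises the weights.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


def generate(next_logits, context, steps, **sampling):
    """Autoregressive loop: sample a token, append it, feed it back in."""
    tokens = list(context)
    for _ in range(steps):
        tokens.append(sample_top_p(next_logits(tokens), **sampling))
    return tokens[len(context):]
```

Lower temperature sharpens the distribution and a smaller `top_p` trims the unlikely tail, so both make sampled forecasts less diverse. Running `generate` several times from the same context yields different paths, which is what makes the output probabilistic.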

01. Finance-specific tokenizer: converts continuous OHLCV into hierarchical discrete tokens that preserve price dynamics, unlike general TSFMs that flatten all data types into one representation
02. Zero-shot forecasting: generate predictions on any market without retraining, using probabilistic sampling (temperature + nucleus) for confidence intervals
03. Multi-task support: handles price forecasting, volatility prediction (9% lower MAE per paper), and synthetic K-line generation (22% better fidelity) from a single model
04. Qlib finetuning pipeline: adapt the model to your specific market or strategy using Microsoft Qlib for data prep, with multi-GPU training via torchrun
05. Multiple model sizes: choose between 4.1M (mini, 2048 context), 24.7M (small), and 102.3M (base) params depending on your compute budget and latency needs
06. Batch prediction: forecast multiple assets simultaneously via predict_batch with GPU parallelism
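
The "confidence intervals" in item 02 come from drawing many sampled forecast paths and summarising them per step. Here is a minimal sketch of that summary, assuming you already have a list of sampled close-price paths of equal length. `quantile` and `forecast_band` are hypothetical helpers, not part of Kronos, and the resulting band reflects only the spread of the model's own samples, not a calibrated statistical interval.

```python
def quantile(sorted_vals, q):
    """Empirical quantile with linear interpolation between neighbours."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def forecast_band(paths, lower=0.05, upper=0.95):
    """Per-step (lower, median, upper) band across sampled forecast paths."""
    if not paths:
        raise ValueError("need at least one sampled path")
    band = []
    for t in range(len(paths[0])):
        # Collect every path's value at step t and sort for quantile lookup.
        column = sorted(path[t] for path in paths)
        band.append((
            quantile(column, lower),
            quantile(column, 0.5),
            quantile(column, upper),
        ))
    return band


if __name__ == "__main__":
    # Three made-up sampled paths over a two-step horizon.
    sampled = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for step, (low, mid, high) in enumerate(forecast_band(sampled, 0.0, 1.0)):
        print(step, low, mid, high)
```

More sampled paths give a smoother band at a higher inference cost, which is the trade-off to weigh against the model size choices in item 05.
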
Who it’s for

If you're a quant researcher or ML engineer building price forecasting, volatility modeling, or synthetic data pipelines for financial markets, this is directly relevant. Also useful if you study tokenizer design for non-language domains. Not useful if you need cross-asset portfolio signals in a single forward pass, or if you want a production-ready trading system — the authors themselves call it 'a simplified example and not a production-ready quantitative trading system.'

Worth exploring

Worth exploring as a research artifact and educational reference for finance-specific tokenizer design. The AAAI 2026 acceptance gives it academic credibility. However, tread carefully: the data leakage allegation in issue #227 is unresolved, users report broken predictions in issue #229, the repo has no formal releases, no maintainer activity since January 2026, and 152 open issues. Treat it as experimental — study the tokenizer architecture and paper, but do not rely on its benchmark claims or use it for real trading until the leakage issue is resolved.
