Feed it video, get 70k brain predictions: Meta's TRIBE v2
Snaplyze Digest
R&D · advanced · 2 min read · Apr 11, 2026 · Updated Apr 15, 2026

“Meta's AI predicts your brain's response to any video — and 37 teams already built apps on it.”

In Short

Meta released a model that predicts how 70,000 brain regions respond to any video, audio, or text — zero-shot, with no retraining needed for new people. It feeds content through LLaMA 3.2, V-JEPA2, and Wav2Vec-BERT, then maps their outputs onto a cortical surface with ~20,000 vertices. The community built 37+ apps on it in two weeks, but the CC-BY-NC-4.0 license blocks commercial use and the paper isn't on arXiv yet.

ai · neuroscience · open-source · python · fmri
Why It Matters
The practical pain point this digest is really about.

You know that feeling when you need to know whether content actually engages people — not clicks, not dwell time, but genuine cognitive attention? fMRI studies cost thousands per session, take hours, and give you data from maybe a dozen people. Neuroscience has been stuck studying brain responses one modality, one brain region, one small experiment at a time. TRIBE v2 gives you a computational shortcut: predict the brain response instead of measuring it.

How It Works
The mechanism, architecture, or workflow behind it.

Think of it like a weather forecast for your brain: instead of predicting rain, it predicts neural activity. You feed in a video (or audio, or text). Three pretrained AI models — one for language (LLaMA 3.2-3B), one for video (V-JEPA2), one for audio (Wav2Vec-BERT 2.0) — each extract features from the content. A Transformer called FmriEncoder then maps those features onto a standard brain surface (the fsaverage5 mesh, ~20,000 vertices, ~70,000 voxels). The predictions are shifted by 5 seconds to account for hemodynamic lag — the delay between neural firing and the blood-oxygen signal fMRI actually measures.
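
To make the shape of that pipeline concrete, here is a minimal PyTorch sketch under stated assumptions: the feature dimensions, layer counts, the FmriEncoderSketch class, the fusion-by-summation choice, and the repetition time are all illustrative placeholders, not Meta's actual architecture or code.

```python
import torch
import torch.nn as nn

# Minimal sketch of a TRIBE-style trimodal encoder. Everything here is an
# illustrative assumption: dimensions, depth, and names are NOT Meta's code.
# In the real pipeline, the input features would come from LLaMA 3.2-3B,
# V-JEPA2, and Wav2Vec-BERT 2.0 respectively.

N_VERTICES = 70_000        # prediction targets across the cortical surface
TR_SECONDS = 1.5           # fMRI repetition time (assumed, dataset-dependent)
HEMODYNAMIC_LAG_S = 5.0    # delay between neural firing and the BOLD signal

class FmriEncoderSketch(nn.Module):
    """Transformer that maps fused multimodal features to per-vertex activity."""

    def __init__(self, d_text=3072, d_video=1024, d_audio=1024, d_model=768):
        super().__init__()
        # Project each modality into a shared space, then fuse by summation.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(d_text, d_model),
            "video": nn.Linear(d_video, d_model),
            "audio": nn.Linear(d_audio, d_model),
        })
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, N_VERTICES)

    def forward(self, feats: dict[str, torch.Tensor]) -> torch.Tensor:
        # feats[m] has shape (batch, time, d_m), one sequence per modality.
        fused = sum(self.proj[m](x) for m, x in feats.items())
        return self.head(self.encoder(fused))   # (batch, time, N_VERTICES)

def align_to_bold(pred: torch.Tensor) -> torch.Tensor:
    """Shift predictions in time so they line up with the delayed BOLD signal."""
    shift = round(HEMODYNAMIC_LAG_S / TR_SECONDS)
    return torch.roll(pred, shifts=shift, dims=1)

# Toy usage: 100 time steps of pre-extracted features for one clip.
feats = {
    "text":  torch.randn(1, 100, 3072),   # stand-in for LLaMA 3.2-3B features
    "video": torch.randn(1, 100, 1024),   # stand-in for V-JEPA2 features
    "audio": torch.randn(1, 100, 1024),   # stand-in for Wav2Vec-BERT features
}
pred = align_to_bold(FmriEncoderSketch()(feats))  # (1, 100, 70000)
```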

Key Takeaways
7 fast bullets that make the core value obvious.
  • Zero-shot brain prediction — feed any content, get predictions for people the model has never seen, in languages it wasn't explicitly trained on, without retraining
  • 70,000 voxel resolution — 70x jump from TRIBE v1's 1,000 voxels, covering the full cortical surface on the standard fsaverage5 mesh
  • Trimodal input — handles video, audio, and text simultaneously, or any single modality; text gets auto-converted to speech with word-level timing
  • Open weights on HuggingFace — 90,599 downloads in the first month, with a Colab notebook that gets you from zero to predictions in minutes
  • Tribev2 mini variant — a smaller model added March 30 for faster inference when you don't need full resolution
  • Standard neuroimaging output — predictions land on the fsaverage5 mesh and work directly with Nilearn and PyVista for visualization (see the sketch after this list)
  • 2-3x improvement over prior methods — Meta's claim for both movie and audiobook brain response prediction benchmarks
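
To illustrate the output format, here is a short Nilearn example that plots a placeholder per-vertex prediction on the fsaverage5 surface. The random `pred` array and the left-then-right hemisphere ordering are assumptions standing in for real TRIBE v2 output.

```python
import numpy as np
from nilearn import datasets, plotting

# Fetch the fsaverage5 surface meshes (10,242 vertices per hemisphere).
fsaverage = datasets.fetch_surf_fsaverage(mesh="fsaverage5")
n_per_hemi = 10_242

# Placeholder prediction: random values, one per vertex, left then right
# (the ordering is an assumption for illustration, not TRIBE v2's format).
pred = np.random.randn(2 * n_per_hemi)

plotting.plot_surf_stat_map(
    fsaverage.infl_left,            # inflated left-hemisphere mesh
    pred[:n_per_hemi],              # left-hemisphere values
    hemi="left",
    bg_map=fsaverage.sulc_left,     # sulcal depth as background shading
    title="Predicted activity (left hemisphere)",
    colorbar=True,
)
plotting.show()
```
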
Should You Care?
Audience fit, decision signal, and the original source in one place.

Who It Is For

If you work in computational neuroscience, neuroAI, or multimodal ML research and want a pretrained brain encoding model instead of training your own from scratch, this is for you. Also relevant if you build content analysis tools and want a 'neural engagement' signal — but the CC-BY-NC-4.0 license means you'll need a separate license for anything commercial. Not useful if you need subject-specific predictions.

Worth Exploring?

Absolutely worth exploring if you're in research or non-commercial tooling. The Colab notebook gets you to a working demo in under 10 minutes with zero local setup. But treat it as experimental — 5 commits, no releases, no arXiv paper yet, and BrainVista (Feb 2026) already beats it on some long-horizon benchmarks. The ecosystem is moving fast: 14 open PRs suggest active community patching, but the non-commercial license gates the most interesting applications.
