“"No refusals or safety hedging — dataset teaches capability, not alignment." — angrygiraffe, dataset card (huggingface.co/datasets/angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k, verified 2026-05-24)”
You know that feeling when you want to fine-tune an open-source model to reason more carefully but generating high-quality chain-of-thought training data costs hundreds or thousands of dollars in API calls and weeks of prompt engineering? Building a diverse SFT corpus covering coding, science, humanities, law, and creative writing — with coherent deliberation in every assistant turn — from scratch is a months-long project. This dataset drops 8,706 ready-to-use examples covering 28 categories, formatted for SFT trainers with <think> blocks pre-included, at zero generation cost to you.
The dataset ships as four JSONL files you download from HuggingFace. Each JSON object has a category tag, a model tag (claude-opus-4-6 or claude-opus-4-7), and a messages array. Every assistant message opens with a <think>...</think> block — typically 150–500 words of deliberation — followed by the actual response. You load it with HuggingFace's datasets library and point your SFT trainer (TRL, Unsloth, or Axolotl) at it with train_on_responses_only enabled so gradients flow through the <think> block and response but not through user turns. The system prompts are domain-specific expert personas (5,814 unique prompts) rather than generic boilerplate, which pushes fine-tuned models toward domain-calibrated depth.
If you are fine-tuning an open-source LLM — Qwen, Llama, Mistral — and need a broad SFT corpus with synthetic chain-of-thought across diverse domains, this covers 28 categories without generation cost. You need familiarity with HuggingFace datasets, an SFT training framework (TRL, Unsloth, Axolotl), and GPU access. This is not for you if you need legally clear training data for commercial deployment, production-grade quality assurance, or real chain-of-thought distillation — for the last goal, lordx64's dataset uses actual extended-thinking API traces and is structurally stronger.
Worth exploring for experimental fine-tuning where broad domain coverage matters more than verified quality — 4,445 monthly downloads and 6+ downstream models signal active community use. Treat it as prototype material: no human review, no benchmark results, no reproducibility info, and a real legal risk under Anthropic's ToS for commercial use. If genuine reasoning distillation is the goal, lordx64's dataset (real API traces, 8,124 samples) is the structurally sounder choice.
Deep-dive insight, Easy and Pro modes, plus action playbooks — the full breakdown is one tap away.