“Train agent skills like you train neural networks - GitHub README”
You know that feeling when your agent keeps making the same domain-specific mistakes, but fine-tuning the model is too expensive or unavailable? You can hand-edit prompts, but that turns into guesswork once the agent has tool calls, files, and test feedback. SkillOpt tackles that problem by treating the skill text itself as the thing you train. The tradeoff: you need scored tasks and a validation split before the loop has anything trustworthy to learn from.
Think of it like editing a recipe after each cooking test, but you only keep edits that make the next test meal better. SkillOpt runs a frozen target model on a batch of tasks with the current skill text. A separate optimizer model reads successes and failures, proposes add, delete, or replace edits, and clips the edit count with a text-based learning-rate budget. A held-out selection split decides whether the edited skill becomes the new current skill. The final artifact is a `best_skill.md` file that you can inspect and reuse.
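The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not SkillOpt's actual API: `Task`, `Edit`, `run_task`, `propose_edits`, and `optimize_skill` are hypothetical names, the "frozen target model" is replaced by a keyword check, and the "optimizer model" is a rule that adds one line per failed task. Only the shape of the loop (roll out, propose edits, clip to a budget, keep the candidate only if the held-out score improves) comes from the description.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    id: str
    needs: str  # toy stand-in: the hint the skill must contain for this task to pass


@dataclass(frozen=True)
class Edit:
    op: str         # "add", "delete", or "replace"
    index: int      # line index the edit targets
    text: str = ""  # new line text for add/replace


def apply_edits(skill: Sequence[str], edits: Sequence[Edit]) -> List[str]:
    lines = list(skill)
    # Apply from the highest index down so lower indexes stay valid.
    for e in sorted(edits, key=lambda e: e.index, reverse=True):
        if e.op == "add":
            lines.insert(e.index, e.text)
        elif e.op == "delete" and 0 <= e.index < len(lines):
            del lines[e.index]
        elif e.op == "replace" and 0 <= e.index < len(lines):
            lines[e.index] = e.text
    return lines


def run_task(skill: Sequence[str], task: Task) -> bool:
    # Assumption: stands in for "frozen target model + verifier".
    return any(task.needs in line for line in skill)


def score(skill: Sequence[str], tasks: Sequence[Task]) -> float:
    return sum(run_task(skill, t) for t in tasks) / len(tasks)


def propose_edits(skill: Sequence[str], batch: Sequence[Task]) -> List[Edit]:
    # Assumption: stands in for the separate optimizer model reading failures.
    failures = [t for t in batch if not run_task(skill, t)]
    return [Edit("add", len(skill), f"Always check {t.needs}.") for t in failures]


def optimize_skill(
    skill: Sequence[str],
    train_batches: Sequence[Sequence[Task]],
    selection_tasks: Sequence[Task],
    edit_budget: int = 1,
) -> Tuple[List[str], float]:
    best = list(skill)
    best_score = score(best, selection_tasks)
    for batch in train_batches:
        edits = propose_edits(best, batch)[:edit_budget]  # clip to the edit budget
        candidate = apply_edits(best, edits)
        candidate_score = score(candidate, selection_tasks)
        if candidate_score > best_score:  # held-out split gates acceptance
            best, best_score = candidate, candidate_score
    return best, best_score


train_batches = [
    [Task("t1", "units"), Task("t2", "dates")],
    [Task("t3", "currency")],
    [Task("t4", "tone")],  # no selection task needs this, so its edit is rejected
]
selection_tasks = [
    Task("s1", "units"), Task("s2", "dates"),
    Task("s3", "currency"), Task("s4", "rounding"),
]

best, best_score = optimize_skill(["Answer concisely."], train_batches, selection_tasks)
print("\n".join(best))  # the text that would be saved as best_skill.md
print(best_score)
```

In this toy run the third batch proposes a line that does not raise the selection score, so it is discarded. That is the "only keep edits that make the next test meal better" rule in practice. The budget of one edit per step also means the `dates` failure in the first batch is never fixed, which illustrates how a small budget slows learning in exchange for smaller, easier-to-audit changes.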
If you build agents with repeatable scored tasks, SkillOpt gives you a disciplined way to improve skill text. It fits QA, spreadsheet, document, math, and embodied task settings where you can score rollouts. Skip it for subjective workflows unless you already have a reliable evaluator.
Treat this as experimental research code with an active repo, not a production-proven dependency. The paper reports strong results, but open GitHub issues still ask for details on the data splits, clarification of what the baselines mean, and disclosure of the baseline optimizer so the comparison can be judged as fair. Explore it if you can tolerate reproduction work and you have verifier-backed tasks.