Amazon Introduces Agentic Fine-Tuning in SageMaker, Supporting Llama, Qwen, DeepSeek, and Nova Models
Amazon Web Services (AWS) has launched agentic fine-tuning for Amazon SageMaker, a new capability designed to simplify the process of customizing large language models (LLMs) for complex, multi-step tasks. This feature, now available in public preview, targets developers and data scientists building AI agents capable of reasoning, planning, and executing actions autonomously. By automating the creation of high-quality synthetic training data, agentic fine-tuning reduces the manual effort required to prepare datasets, enabling faster iteration and deployment of production-ready agents.
Understanding Agentic Fine-Tuning
Traditional fine-tuning of LLMs often relies on human-annotated datasets, which can be time-consuming and costly to produce, especially for agentic workflows involving tool use, multi-turn interactions, and decision-making. Agentic fine-tuning addresses this challenge through a self-improvement loop powered by the model itself. The process begins with a base LLM generating synthetic reasoning traces—detailed step-by-step explanations of how to solve complex problems. These traces serve as training data for subsequent fine-tuning rounds.
In practice, users select a base model from SageMaker JumpStart, a hub for pre-trained models. They then specify a set of challenging tasks or prompts, such as mathematical reasoning problems or coding challenges. The base model produces diverse reasoning paths, including both correct solutions and deliberately flawed attempts that improve robustness. This synthetic data is filtered for quality, with the base model acting as a judge, so that only the most effective traces are kept. The refined dataset then trains a fine-tuned model optimized for agentic performance.
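Conceptually, this is a generate-score-filter cycle. The sketch below is illustrative only: the model.complete helper stands in for an actual model invocation (for example, against a deployed SageMaker endpoint), and the prompts, threshold, and trace schema are assumptions rather than documented SageMaker internals.

```python
import json

def generate_traces(model, task: str, n: int = 8) -> list[str]:
    """Sample n diverse reasoning traces for one task (hypothetical helper).

    In practice this would call the base model, e.g. through a SageMaker
    inference endpoint, with a high sampling temperature for diversity.
    """
    return [model.complete(f"Solve step by step:\n{task}", temperature=1.0)
            for _ in range(n)]

def judge_trace(model, task: str, trace: str) -> float:
    """Score a candidate trace from 0 to 1, using the base model as a judge."""
    verdict = model.complete(
        f"Task:\n{task}\n\nCandidate solution:\n{trace}\n\n"
        "Rate the correctness of this solution from 0 to 1. Answer with a number."
    )
    return float(verdict.strip())

def build_training_set(model, tasks: list[str], threshold: float = 0.8) -> list[dict]:
    """Keep only the traces the judge rates above the quality threshold."""
    dataset = []
    for task in tasks:
        for trace in generate_traces(model, task):
            if judge_trace(model, task, trace) >= threshold:
                dataset.append({"prompt": task, "completion": trace})
    return dataset

# The filtered records become the fine-tuning corpus for the next round:
# with open("synthetic_traces.jsonl", "w") as f:
#     for record in build_training_set(base_model, tasks):
#         f.write(json.dumps(record) + "\n")
```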
This iterative approach mirrors self-improvement techniques such as Stanford's STaR (Self-Taught Reasoner) and DeepMind's ReST (Reinforced Self-Training), in which models learn from both their successes and failures. AWS emphasizes that agentic fine-tuning excels in scenarios requiring long-context reasoning and tool integration, such as software development assistants or autonomous research agents.
Supported Models and Accessibility
Agentic fine-tuning supports a curated selection of frontier open-weight models, ensuring compatibility with high-performing LLMs:
- Meta Llama 3.1 405B: Known for its strong reasoning capabilities across benchmarks.
- Alibaba Qwen2.5 72B Instruct: Excels in multilingual tasks and instruction following.
- DeepSeek v3 0324: A mixture-of-experts model optimized for coding and math.
- Amazon Nova Micro, Lite, Pro, and Premier: AWS’s own family of foundation models, tailored for agentic applications with built-in safety features.
These models are accessible directly through SageMaker JumpStart, where users can launch fine-tuning jobs with minimal configuration. The feature integrates seamlessly with SageMaker's managed infrastructure, handling distributed training across GPU instances such as ml.p5.48xlarge, powered by eight NVIDIA H100 Tensor Core GPUs. Pricing follows standard SageMaker JumpStart rates, billed per training hour according to the instance type used.
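Programmatically, a job like this can be launched with the SageMaker Python SDK's JumpStartEstimator. The model ID below is a placeholder rather than a confirmed identifier for the agentic fine-tuning cards, and the S3 paths are illustrative:

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Placeholder model ID: check the JumpStart catalog for the actual
# agentic fine-tuning model card identifiers.
estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-1-405b",  # hypothetical ID
    instance_type="ml.p5.48xlarge",
    instance_count=1,
)

# The training channel points at the JSONL task file described below.
# accept_eula is needed for gated models such as Llama (recent SDK versions).
estimator.fit({"training": "s3://my-bucket/agentic-tasks/"}, accept_eula=True)

# Deploy the fine-tuned model to a real-time inference endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.p5.48xlarge",
)
```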
To get started, users navigate to the SageMaker JumpStart catalog, select an agentic fine-tuning model card, and provide a JSONL file of tasks. SageMaker automates data synthesis, evaluation, and model checkpointing. Fine-tuned models can be deployed instantly via SageMaker Inference Endpoints or exported for use elsewhere.
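The task file is plain JSONL, one task per line. The field names in this sketch are an assumed schema for illustration; check the model card for the exact format expected:

```python
import json

# Hypothetical task records; the field names expected by the agentic
# fine-tuning cards may differ from this assumed schema.
tasks = [
    {"task": "Write a Python function that returns the nth Fibonacci number.",
     "category": "coding"},
    {"task": "A train travels 120 km in 1.5 hours. What is its average speed?",
     "category": "math"},
]

with open("agentic-tasks.jsonl", "w") as f:
    for record in tasks:
        f.write(json.dumps(record) + "\n")
```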
Key Benefits and Performance Gains
AWS reports significant improvements on agentic benchmarks after fine-tuning. For instance, Llama 3.1 405B saw gains of up to 20% on benchmarks such as MATH-500 and LiveCodeBench, while Qwen2.5 72B improved by 15-25% across similar evaluations. DeepSeek v3 achieved state-of-the-art results on coding challenges post-fine-tuning.
The synthetic data generation scales efficiently: a single ml.p5.48xlarge instance can produce thousands of traces per hour, far outpacing manual annotation. This democratizes agent development, allowing teams without extensive labeling resources to compete with larger players. Additionally, the process incorporates safety alignment, filtering out harmful traces during synthesis.
Agentic fine-tuning also supports custom tool integration. Users can define function-calling schemas, enabling agents to interact with APIs, databases, or external services. This positions SageMaker as a comprehensive platform for building production agents, comparable to offerings from Anthropic or OpenAI but with greater control over open models.
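A function-calling schema typically declares a tool's name, purpose, and typed parameters as JSON Schema. The definition below follows the convention shared by most function-calling APIs; the exact envelope SageMaker expects is an assumption here:

```python
# A JSON-Schema-style tool definition, as used by most function-calling
# APIs. The exact wrapper format SageMaker expects may differ.
weather_tool = {
    "name": "get_weather",  # hypothetical tool
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}
```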
Integration with SageMaker Ecosystem
The feature leverages SageMaker’s broader toolkit for end-to-end agent workflows. Fine-tuned models pair naturally with Amazon Bedrock Agents for orchestration or SageMaker Canvas for no-code experimentation. Developers can use SageMaker Pipelines for automated retraining loops and SageMaker Model Monitor for ongoing performance tracking.
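A minimal retraining loop in SageMaker Pipelines might wrap the estimator from the earlier sketch in a single training step. The wiring below follows the standard PipelineSession pattern from the SageMaker Python SDK; the model ID, bucket, and role ARN are placeholders:

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep
from sagemaker.jumpstart.estimator import JumpStartEstimator

session = PipelineSession()

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-1-405b",  # hypothetical ID
    instance_type="ml.p5.48xlarge",
    sagemaker_session=session,
)

# Under a PipelineSession, fit() returns step arguments
# instead of starting a training job immediately.
step_train = TrainingStep(
    name="AgenticFineTune",
    step_args=estimator.fit({"training": "s3://my-bucket/agentic-tasks/"}),
)

pipeline = Pipeline(name="agentic-retraining", steps=[step_train])
pipeline.upsert(role_arn="arn:aws:iam::123456789012:role/SageMakerRole")  # placeholder
pipeline.start()
```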
For evaluation, AWS provides built-in metrics like pass@k (the probability that at least one of k sampled attempts succeeds) and agent trajectory analysis. Users can compare base versus fine-tuned models on custom benchmarks, ensuring measurable uplift.
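pass@k is conventionally computed with the unbiased estimator introduced alongside the HumanEval benchmark (Chen et al., 2021): draw n samples per task, count the c that pass, and estimate pass@k = 1 - C(n-c, k)/C(n, k), averaged over tasks. How SageMaker computes its built-in metric is not specified, but a minimal reference implementation looks like this:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated for the task
    c: number of samples that passed
    k: sample budget being evaluated
    """
    if n - c < k:
        # Fewer than k failures exist, so every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per task, 12 correct, evaluated at k = 10.
print(round(pass_at_k(200, 12, 10), 4))
```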
Availability and Next Steps
Agentic fine-tuning is available in public preview across all AWS regions where SageMaker JumpStart operates. Early adopters include enterprises in finance, healthcare, and software engineering, leveraging it for domain-specific agents. AWS plans to expand model support and introduce features like multi-modal fine-tuning in future updates.
This launch underscores AWS’s push into agentic AI, bridging the gap between raw model capabilities and deployable applications. By making synthetic data generation a first-class citizen in SageMaker, Amazon empowers builders to create more intelligent, autonomous systems efficiently.
What are your thoughts on this? I’d love to hear about your own experiences in the comments below.