SageMaker Training Plans now allow extended capacity commitments without workload changes
Amazon SageMaker Training Plans can now be extended so that AI workloads keep running without interruption or reconfiguration.
Amazon SageMaker has announced an enhancement to its Training Plans feature, which now allows users to extend their existing GPU capacity commitments without requiring any reconfiguration of their workloads. This development ensures that AI workloads can continue seamlessly even if they take longer than originally expected.
SageMaker Training Plans let users reserve GPU capacity for a specified time frame, for clusters of up to 64 instances. With the latest update, users can extend a plan in 1-day increments up to 14 days, or in 7-day increments up to 182 days (26 weeks). Extensions can be purchased through the SageMaker console or through the API.
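To make the increment rules concrete, here is a minimal Python sketch that checks whether a requested extension length fits them. It is not part of the SageMaker API: the function and constant names are hypothetical, and it assumes the announcement means any whole number of days from 1 to 14, or any multiple of 7 days up to 182.

```python
# Illustrative only: these names are hypothetical, not SageMaker API identifiers.
# Assumed reading of the announcement: 1-day steps up to 14 days,
# or 7-day steps up to 182 days (26 weeks).
DAILY_MAX_DAYS = 14
WEEKLY_STEP_DAYS = 7
WEEKLY_MAX_DAYS = 182


def is_valid_extension(days: int) -> bool:
    """Return True if `days` matches one of the two stated increment rules."""
    # Reject non-integers (including bools) and non-positive values.
    if isinstance(days, bool) or not isinstance(days, int) or days < 1:
        return False
    # Rule 1: any whole number of days up to the daily maximum.
    if days <= DAILY_MAX_DAYS:
        return True
    # Rule 2: whole weeks only, up to the weekly maximum.
    return days % WEEKLY_STEP_DAYS == 0 and days <= WEEKLY_MAX_DAYS


def valid_extensions() -> list[int]:
    """List every extension length (in days) allowed under the assumed rules."""
    return [d for d in range(1, WEEKLY_MAX_DAYS + 1) if is_valid_extension(d)]
```

Under this reading, a 10-day or 21-day extension would be accepted, while a 15-day extension would not, because it exceeds the daily limit and is not a whole number of weeks. The actual validation is performed by SageMaker, so the User Guide remains the authority on which lengths are accepted.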
Once an extension is purchased, the workload continues without interruption and needs no manual reconfiguration. The feature is meant to help users keep training costs and schedules aligned with their timelines and budgets.
After users create and purchase a training plan, SageMaker automatically provisions the necessary infrastructure and runs the AI workloads on the reserved compute resources, with no manual intervention required.
For instance availability by AWS Region, see the SageMaker AI pricing page. For details on how to extend a training plan, see the Amazon SageMaker Training Plans User Guide.