HummingLM
Training a 10B Parameter Foundation Model on AWS Trainium
Overview
HummingLM is a 10B parameter foundation model that transforms hummed melodies into studio-quality songs. Built on AWS Trainium, it cuts training costs by 54% compared to equivalent GPU infrastructure while training roughly 2x faster.
The model powers Splash Music's platform, which has generated over 750 million streams globally. It represents a breakthrough in generative audio, enabling anyone to create professional music from simple vocal inputs.
This work was published in the Explainable AI in Medicine (EAIM) Workshop at AAAI 2026, demonstrating novel approaches to neural synthesis and model interpretability.
Technical Highlights
AWS Trainium Architecture
Custom training pipeline optimized for Trainium chips, leveraging the NeuronX Distributed library for efficient scaling across multiple nodes.
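The production pipeline itself isn't reproduced here, but the minimal sketch below shows the shape of a training step on Trainium through the torch-xla interface that the AWS Neuron SDK builds on; the model, batch data, and hyperparameters are placeholders rather than HummingLM's actual configuration.

```python
# Minimal sketch of a training step on a Trainium device via torch-xla
# (the interface the AWS Neuron SDK builds on). The model, data, and
# hyperparameters are placeholders, not HummingLM's actual pipeline.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to a NeuronCore on a Trainium instance

model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    # Dummy batch standing in for melody/audio features.
    x = torch.randn(8, 512, device=device)
    y = torch.randn(8, 512, device=device)

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    # In a multi-worker launch, this also all-reduces gradients across
    # replicas before applying the update.
    xm.optimizer_step(optimizer)
    xm.mark_step()  # materialize the lazily recorded XLA graph
```

Scaling this loop across nodes is typically a matter of launching one such worker per NeuronCore (e.g. with torchrun) and layering on the NeuronX Distributed parallelism utilities; those details are omitted here.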
Neural Synthesis
Novel architecture combining transformer-based melody understanding with diffusion-based audio generation for high-fidelity output.
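The published architecture is not reproduced here; the sketch below only illustrates the general shape of such a two-stage design, with a transformer encoder over melody tokens whose outputs condition a diffusion-style denoiser via cross-attention. All module names, dimensions, and the noise schedule are illustrative assumptions.

```python
# Illustrative sketch: a transformer melody encoder conditioning a
# diffusion-style audio denoiser via cross-attention. Names, dimensions,
# and wiring are assumptions for illustration, not HummingLM's design.
import torch
import torch.nn as nn

class MelodyEncoder(nn.Module):
    """Encodes a hummed-melody token sequence into conditioning vectors."""
    def __init__(self, vocab_size=1024, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, melody_tokens):                    # (B, T_mel)
        return self.encoder(self.embed(melody_tokens))   # (B, T_mel, d_model)

class AudioDenoiser(nn.Module):
    """Predicts the noise added to audio latents, conditioned on the melody encoding."""
    def __init__(self, latent_dim=128, d_model=512, n_heads=8):
        super().__init__()
        self.in_proj = nn.Linear(latent_dim, d_model)
        self.time_mlp = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out_proj = nn.Linear(d_model, latent_dim)

    def forward(self, noisy_latents, t, melody_ctx):
        # noisy_latents: (B, T_aud, latent_dim), t: (B,), melody_ctx: (B, T_mel, d_model)
        h = self.in_proj(noisy_latents) + self.time_mlp(t[:, None].float())[:, None, :]
        h, _ = self.cross_attn(h, melody_ctx, melody_ctx)  # audio latents attend to melody
        return self.out_proj(h)

# One training step of the standard epsilon-prediction diffusion objective.
encoder, denoiser = MelodyEncoder(), AudioDenoiser()
melody = torch.randint(0, 1024, (2, 64))
latents = torch.randn(2, 256, 128)
t = torch.randint(0, 1000, (2,))
noise = torch.randn_like(latents)
alpha = (1.0 - t.float() / 1000).view(-1, 1, 1)          # toy noise schedule
noisy = alpha.sqrt() * latents + (1 - alpha).sqrt() * noise
loss = nn.functional.mse_loss(denoiser(noisy, t, encoder(melody)), noise)
```

Cross-attention from the audio latents onto the melody encoding is one common way to inject melodic structure into a conditional denoiser; HummingLM's actual conditioning mechanism may differ.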
Cost Optimization
Achieved a 54% infrastructure cost reduction through efficient batch processing, mixed-precision training, and optimized data pipelines.
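Two of the levers mentioned above translate directly into a training loop: bf16 mixed precision and gradient accumulation for large effective batches. The sketch below is device-agnostic and uses placeholder sizes; on torch-xla/Neuron setups, bf16 is often enabled globally (e.g. via the XLA_USE_BF16=1 environment variable) rather than per-op.

```python
# Sketch of bf16 mixed precision plus gradient accumulation, two common
# cost levers for large-model training. Model and sizes are placeholders,
# not HummingLM's configuration.
import torch
import torch.nn as nn

model = nn.Linear(512, 512).to(torch.bfloat16)       # bf16 weights halve memory and bandwidth
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum_steps = 8                                       # effective batch = 8 micro-batches

optimizer.zero_grad()
for micro_step in range(accum_steps):
    x = torch.randn(4, 512, dtype=torch.bfloat16)     # placeholder micro-batch
    y = torch.randn(4, 512, dtype=torch.bfloat16)
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # average over micro-batches
    loss.backward()                                   # gradients accumulate in-place
optimizer.step()                                      # one optimizer update per large batch
optimizer.zero_grad()
```

Accumulation keeps per-device memory flat while preserving the large-batch optimization behavior, which is one way to keep accelerator utilization high without over-provisioning.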
Production Scale
Deployed on Amazon SageMaker HyperPod for inference, serving millions of requests globally with sub-second latency.
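From the client side, calling a model behind a SageMaker real-time endpoint is a single runtime request; the sketch below uses boto3 with a hypothetical endpoint name and payload schema, not Splash Music's actual API.

```python
# Illustrative client call to a SageMaker real-time endpoint using boto3.
# The endpoint name and payload schema are hypothetical.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {"melody_audio_url": "s3://example-bucket/hum.wav", "style": "pop"}  # hypothetical schema

response = runtime.invoke_endpoint(
    EndpointName="humminglm-endpoint",   # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
result = json.loads(response["Body"].read())
print(result)
```

The serving container, autoscaling, and global routing do the heavy lifting server-side; only the client call is shown here.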