Applied Scientist – Agentic AI | AWS Agentic AI Team
Job ID: 10430758
Agentic AI
Join AWS and Help Build the Future of Agentic AI
Amazon Web Services (AWS), the world's leading cloud computing platform, is seeking a highly talented Applied Scientist – Agentic AI to join its cutting-edge research team focused on developing next-generation intelligent automation systems.
This is an exciting opportunity for researchers and AI scientists who are passionate about advancing the frontier of Artificial Intelligence, autonomous agents, multimodal reasoning, reinforcement learning, and intelligent decision-making systems.
As part of the AWS Agentic AI team, you will conduct groundbreaking research, collaborate with world-class engineers and scientists, and develop innovative AI models that solve complex real-world challenges at global scale.
About the AWS Agentic AI Team
The AWS Agentic AI organization is focused on building advanced AI systems capable of autonomous reasoning, planning, decision-making, and execution.
The team develops intelligent AI agents that can:
- Understand complex environments.
- Reason across multiple data sources.
- Interact with APIs and external systems.
- Plan and execute tasks autonomously.
- Learn from interactions and feedback.
- Improve operational efficiency through intelligent automation.
By leveraging Amazon's vast computational resources, large-scale datasets, and cloud infrastructure, the team is pushing the boundaries of what intelligent AI systems can achieve.
Job Overview
As an Applied Scientist specializing in Agentic AI, you will conduct advanced research and develop innovative machine learning models that contribute directly to AWS's next generation of intelligent automation solutions.
You will collaborate with scientists, engineers, product teams, and business stakeholders to transform cutting-edge research into practical AI applications that create measurable value for customers worldwide.
This role offers the unique opportunity to publish research, contribute to scientific advancements, and see your innovations deployed at global scale.
Key Responsibilities
Advanced AI Research
- Conduct original research in emerging Artificial Intelligence domains.
- Design and develop innovative approaches to intelligent automation.
- Explore new methods for autonomous reasoning and decision-making.
- Advance the state of the art in Agentic AI systems.
Model Development & Experimentation
- Build, train, and evaluate advanced machine learning models.
- Develop solutions for complex sequential decision-making challenges.
- Design intelligent systems capable of autonomous task execution.
- Optimize AI models for scalability, performance, and real-world deployment.
Cross-Functional Collaboration
- Work closely with:
  - Research Scientists
  - Applied Scientists
  - Software Engineers
  - Product Managers
  - AWS Business Teams
- Translate research findings into production-ready solutions.
- Collaborate across disciplines to solve high-impact customer problems.
Scientific Innovation
- Publish research papers in leading peer-reviewed conferences and journals.
- Contribute to patents and intellectual property development.
- Participate in scientific workshops and industry research communities.
- Share technical knowledge and research findings with internal teams.
Production Impact
- Transform advanced AI research into practical customer solutions.
- Develop technologies that improve intelligent automation capabilities.
- Leverage AWS's large-scale infrastructure and computational resources.
- Create AI systems that deliver real-world business value.
Research Areas of Interest
AWS is particularly interested in candidates with expertise in one or more of the following fields:
Autonomous Agents
- Agentic AI Systems
- Intelligent Task Execution
- Autonomous Decision-Making
- Multi-Agent Systems
API Orchestration
- Tool Use and Function Calling
- AI Workflow Automation
- Intelligent API Coordination
Planning & Reasoning
- Strategic Planning Algorithms
- Goal-Oriented AI Systems
- Hierarchical Planning
- Cognitive Architectures
Large Multimodal Models
- Vision-Language Models (VLMs)
- Multimodal Foundation Models
- Cross-Modal Reasoning
- Visual Understanding Systems
Reinforcement Learning (RL)
- Reinforcement Learning Algorithms
- Sequential Decision-Making
- Policy Optimization
- Learning-Based Control Systems
Basic Qualifications
To be considered for this position, candidates should meet the following requirements:
Education
One of the following:
- PhD in Computer Science, Machine Learning, Artificial Intelligence, Computer Engineering, or a related technical field
OR
- Master's Degree in a relevant discipline with 3+ years of professional experience in Computer Science, Machine Learning, Artificial Intelligence, or a related field
Research Experience
- Demonstrated research contributions through:
  - Publications in top-tier peer-reviewed conferences
  - Journal publications
  - Patents
  - Significant scientific achievements
Programming Skills
Experience in one or more of the following programming languages:
- Python
- Java
- C++
- Related programming languages
Technical Expertise
Experience in one or more of the following areas:
- Algorithms
- Data Structures
- Parsing Techniques
- Numerical Optimization
- Data Mining
- Distributed Computing
- Parallel Computing
- High-Performance Computing (HPC)
Preferred Qualifications
Candidates with the following experience will stand out:
Software Development Experience
- Professional software engineering experience.
- Building scalable production systems.
- Collaborating within large engineering organizations.
Operating Systems
- Experience working in Unix/Linux environments.
- Familiarity with Linux-based AI development workflows.
Advanced AI Research
Experience in:
- Generative AI
- Large Language Models (LLMs)
- Vision-Language Models (VLMs)
- Autonomous AI Agents
- Reinforcement Learning
- Intelligent Planning Systems
Why Join AWS Agentic AI?
By joining AWS Agentic AI, you will:
- Work on some of the world's most advanced Artificial Intelligence challenges.
- Access massive computational infrastructure and datasets.
- Collaborate with leading AI researchers, engineers, and innovators.
- Develop technologies that impact millions of customers globally.
- Publish research and contribute to scientific advancement.
- Transform cutting-edge ideas into real-world AI products.
- Help shape the future of autonomous intelligent systems.
Inclusive Workplace & Equal Opportunity
AWS is committed to building a diverse and inclusive workplace where employees from all backgrounds can thrive.
The company provides equal employment opportunities and supports candidates requiring workplace accommodations during the recruitment, hiring, or onboarding process.
AWS believes innovation is driven by diverse perspectives and is dedicated to creating an environment where every employee can contribute their best work.
Advance the Future of Intelligent Automation
If you are passionate about Artificial Intelligence research, Agentic AI, Reinforcement Learning, Multimodal Models, and intelligent autonomous systems, this role offers an exceptional opportunity to contribute to technologies that are redefining the future of AI.
Apply today and join AWS in building the next generation of Agentic AI solutions.
Department
Client Solutions
About the Company
We are an innovative technology services organization focused on helping businesses design, implement, and scale advanced Artificial Intelligence solutions. Our expertise spans Generative AI, Large Language Models (LLMs), AI-powered automation, predictive analytics, and custom AI integrations. By combining technical excellence with strategic client collaboration, we deliver practical AI solutions that drive measurable business outcomes across multiple industries.
AI Product Manager Job Overview
We are looking for a highly motivated AI Product Manager to lead the successful delivery of AI products and solutions across our client portfolio. In this role, you will act as the bridge between business stakeholders, sales teams, and technical experts, ensuring AI initiatives move efficiently from concept to production.
The ideal candidate will work closely with the Business Development team during the sales process, helping evaluate feasibility, define project scope, and ensure client expectations align with delivery capabilities. You will oversee AI implementations ranging from ChatGPT Enterprise adoption programs to custom AI agents and API-based solutions.
This position is ideal for professionals who excel in product strategy, stakeholder communication, AI solution delivery, and cross-functional collaboration.
Key Responsibilities
1. Business Development Collaboration & Solution Scoping
- Partner with the sales team throughout the client acquisition process.
- Evaluate technical feasibility and recommend realistic project structures.
- Participate in strategic discussions regarding AI implementation opportunities.
- Assess business requirements, data availability, risks, timelines, and deployment readiness.
- Identify when specialized engineering expertise is required and coordinate technical resources accordingly.
- Ensure all proposed AI solutions are practical, achievable, and aligned with delivery capabilities.
2. AI Use Case Discovery & Prioritization
- Conduct structured discovery sessions with client stakeholders.
- Identify valuable AI use cases across departments and business functions.
- Analyze workflows, operational challenges, and available data sources.
- Develop and maintain prioritized AI opportunity backlogs.
- Evaluate opportunities based on feasibility, business value, and data readiness.
- Provide clear recommendations on which AI initiatives should move forward and which require additional preparation.
3. Product Definition & End-to-End Delivery
- Manage the complete AI product lifecycle from planning through production deployment.
- Define product requirements, acceptance criteria, prompts, and AI agent architectures.
- Translate business objectives into actionable technical specifications.
- Create comprehensive product documentation covering inputs, outputs, constraints, and success metrics.
- Identify potential risks related to compliance, scalability, edge cases, and data quality.
- Collaborate closely with engineering teams during iterative development cycles.
- Support prompt engineering and AI agent optimization efforts.
4. Success Measurement & Performance Optimization
- Define measurable success metrics for every AI deployment.
- Align AI performance goals with client business objectives.
- Collaborate with subject matter experts to establish domain-specific evaluation frameworks.
- Monitor adoption rates, business impact, and product performance after launch.
- Identify opportunities for continuous improvement and optimization.
5. Stakeholder Communication & Client Management
- Act as the primary connection between business leaders and technical teams.
- Facilitate client workshops and strategic planning sessions.
- Communicate complex AI concepts in a clear and understandable manner.
- Manage stakeholder expectations regarding capabilities, limitations, and trade-offs.
- Prepare executive-level reports highlighting project progress, risks, and business impact.
- Ensure alignment across all project participants throughout delivery.
6. AI Center of Excellence Development
- Contribute to the creation of AI governance frameworks and best-practice methodologies.
- Develop repeatable processes for AI prioritization, implementation, and production readiness.
- Support enterprise AI transformation initiatives.
- Stay informed on emerging AI technologies, including:
  - Large Language Models (LLMs)
  - AI Agents
  - Vector Databases
  - AI Monitoring Tools
  - Orchestration Frameworks
- Apply industry insights to strengthen client AI strategies.
Required Qualifications
To be successful in this role, candidates should possess:
- 3+ years of experience in Product Management, Technical Program Management, or a related field.
- Direct exposure to Artificial Intelligence (AI) or Machine Learning (ML) products.
- Strong understanding of modern AI systems, including the capabilities and limitations of LLMs and AI agents.
- Ability to participate confidently in technical discussions; a software engineering background is not required.
- Excellent written and verbal communication skills.
- Experience balancing multiple client engagements or projects simultaneously.
- Strong analytical thinking and problem-solving abilities.
- Ability to assess feasibility and make informed product decisions.
Preferred Qualifications
Candidates with the following experience will be highly valued:
- Hands-on experience with:
  - ChatGPT Enterprise
  - OpenAI API
  - Anthropic
  - Similar Large Language Model platforms
- Familiarity with:
  - Prompt Engineering
  - AI Agent Design
  - LangChain
  - LlamaIndex
  - AI Orchestration Frameworks
- Consulting or client-facing delivery experience.
- Understanding of enterprise data infrastructure.
- Knowledge of compliance requirements and AI governance frameworks.
Compensation & Benefits
We offer a competitive and rewarding compensation package, including:
- Competitive base salary
- Performance-based incentives linked to client outcomes
- Equity opportunities in a rapidly growing AI services company
- Exposure to diverse industries and enterprise AI initiatives
- Opportunities to work on cutting-edge AI products and real-world use cases
- Optional travel opportunities for on-site client engagements
- Collaborative and mission-driven work environment
- Continuous learning and professional growth opportunities in the AI industry
Why Join Us?
Join a forward-thinking team dedicated to transforming how organizations leverage Artificial Intelligence. As an AI Product Manager, you'll play a critical role in delivering innovative AI solutions that create measurable impact while working alongside experts passionate about the future of AI technology.
Apply today and help shape the next generation of enterprise AI solutions.
Job ID: P-78
Help Build the Infrastructure Behind the Future of AI
At Databricks, we are committed to helping data and AI teams solve some of the world's most challenging problems—from advancing healthcare innovation to enabling next-generation transportation technologies. Our mission is powered by building and operating the industry's leading Data Intelligence Platform, enabling organizations to transform data into actionable insights and AI-driven outcomes.
We are seeking a Senior Software Engineer (Backend) – AI/ML Environments to join our growing ML/AI Environments team. This is a unique opportunity to build foundational AI infrastructure that empowers researchers, data scientists, and machine learning engineers to create, train, and deploy AI models efficiently and at scale.
As part of one of Databricks' fastest-growing businesses, Mosaic AI, you'll help build the systems powering the next generation of enterprise AI.
About the ML/AI Environments Team
The ML/AI Environments team develops the infrastructure that enables AI researchers and engineers to create customized training and serving environments for machine learning workloads.
The team's mission is to make AI development:
- Easy to configure
- Reliable to operate
- Reproducible across environments
- Scalable for enterprise workloads
Working at the intersection of AI infrastructure, developer productivity, and cloud-native systems, you'll collaborate closely with research, product, and engineering teams to deliver capabilities that directly impact thousands of Databricks customers.
Your Impact
As a Senior Backend Engineer, you'll play a critical role in shaping how machine learning practitioners build and interact with AI applications on the Databricks platform.
Your work will influence everything from environment creation and dependency management to performance optimization and operational visibility.
Key Responsibilities
Build AI Environment Infrastructure
- Build the infrastructure that enables ML and AI users to configure training and serving environments easily, reliably, and reproducibly.
- Develop systems that simplify environment setup while ensuring consistency across machine learning workflows.
- Support scalable AI development experiences across a wide range of use cases.
Partner Across AI Infrastructure Teams
- Collaborate with other AI infrastructure teams to deliver features that enhance customer productivity.
- Contribute to platform-wide improvements that increase efficiency and reliability.
- Work on initiatives that help customers maximize the value of the Databricks platform.
Examples include:
- Improving virtual environment setup performance for short-duration training and data processing jobs.
- Enhancing observability capabilities to simplify debugging and failure analysis.
- Optimizing infrastructure workflows for AI development and deployment.
Drive Customer-Focused Innovation
- Engage directly with key customers and product managers.
- Help identify product improvements and new feature opportunities.
- Translate customer feedback into scalable engineering solutions.
Shape the Future of AI Development
- Influence how developers, machine learning engineers, and data scientists build and interact with AI on Databricks.
- Contribute to platform experiences that accelerate AI adoption across organizations.
- Help define the next generation of AI infrastructure tooling.
What We're Looking For
Backend & Infrastructure Engineering Experience
- 5+ years of experience in backend engineering, infrastructure engineering, or related systems-focused software development roles.
- Proven experience building production-grade systems and platform infrastructure.
Programming Expertise
Strong programming skills in one or more of the following languages:
- Python
- Scala
- Java
Distributed Systems Knowledge
Experience working with:
- Distributed systems
- Scalable APIs
- Cloud-native infrastructure
- High-performance backend services
Platform Engineering Fundamentals
Familiarity with:
- Service-oriented architecture (SOA)
- Deployment pipelines
- System observability
- Infrastructure reliability practices
Product Ownership Mindset
- Strong product and ownership mentality.
- Focus on building the right solution rather than simply delivering technical implementations.
- Ability to balance customer needs with engineering excellence.
Dependency Management & Environment Technologies
Strong understanding of dependency management technologies, including:
- Virtual environments
- Containerization technologies
Experience supporting reproducible development and deployment workflows is highly valued.
Why This Role Matters
The future of AI depends on infrastructure that is reliable, scalable, and easy to use.
In this role, you'll help build the foundation that enables machine learning engineers, data scientists, AI researchers, and enterprise development teams to develop, train, and serve AI applications more effectively.
Your contributions will directly impact the AI development lifecycle across thousands of organizations worldwide.
Compensation & Pay Transparency
Databricks is committed to fair and equitable compensation practices.
Local Pay Range
$166,000 — $220,000 USD
Compensation is determined based on several factors, including:
- Relevant technical expertise
- Professional experience
- Certifications and specialized training
- Geographic location
- Job-related skills
The total compensation package may also include:
- Annual performance bonuses
- Equity awards
- Comprehensive employee benefits
Databricks anticipates utilizing the full salary range based on candidate qualifications and experience.
About Databricks
Databricks is the Data and AI company trusted by more than 10,000 organizations worldwide.
Leading enterprises—including Comcast, Condé Nast, Grammarly, and more than 50% of Fortune 500 companies—rely on the Databricks Data Intelligence Platform to unify data, analytics, and artificial intelligence.
Databricks was founded by the original creators of:
- Apache Spark™
- Delta Lake
- MLflow
- Lakehouse Architecture
Headquartered in San Francisco, Databricks continues to lead innovation across data engineering, machine learning, analytics, and AI infrastructure.
Benefits
Databricks offers a comprehensive benefits package designed to support employees throughout every stage of their career and personal journey. Benefits and perks may vary by region and location.
Diversity, Equity & Inclusion
Databricks is committed to fostering a diverse and inclusive workplace where every individual has the opportunity to succeed.
Employment decisions are made without regard to age, race, ethnicity, religion, disability, family status, gender identity, sexual orientation, veteran status, political affiliation, socio-economic background, or any other protected characteristic.
Compliance
If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.
Apply for Senior Software Engineer (Backend) – AI/ML Environments Jobs at Databricks
If you're passionate about backend engineering, distributed systems, cloud infrastructure, AI platform development, and building tools that empower the next generation of machine learning innovation, this is an opportunity to make a meaningful impact at one of the world's leading Data and AI companies.
Location: San Francisco, California
Job ID: P-984
Build the Infrastructure Powering the Next Generation of Generative AI
Are you passionate about building large-scale machine learning platforms that enable the development, training, evaluation, and deployment of cutting-edge Generative AI applications?
Databricks Mosaic AI is seeking a Senior Machine Learning Engineer – GenAI Platform to help shape the future of enterprise AI development. This role offers the opportunity to work across the entire machine learning lifecycle while building the foundational infrastructure that powers next-generation Generative AI products.
From distributed systems and GPU orchestration to user-facing platform experiences, you'll play a critical role in developing technology that enables organizations to securely build and deploy custom AI models at scale.
About Mosaic AI
Founded in late 2020 by a team of machine learning engineers and researchers, Mosaic AI was created with a vision to make advanced AI development more accessible, secure, and customizable for enterprises.
The Mosaic AI platform enables organizations to:
- Fine-tune custom AI models
- Train large-scale machine learning systems
- Deploy Generative AI applications securely
- Maintain complete ownership and control of their data and models
- Operate across all major cloud providers
In 2023, Mosaic AI introduced pretrained transformer models that established a new benchmark for commercially usable open-source Large Language Models (LLMs), achieving more than 3 million downloads worldwide.
Since joining Databricks in July 2023, Mosaic AI has continued its mission of helping organizations unlock the full value of artificial intelligence while maintaining security, flexibility, and scalability.
The Opportunity
The Generative AI landscape is evolving rapidly, and Databricks Mosaic AI is building the platform that powers every stage of the ML development lifecycle.
As a Senior Machine Learning Engineer, you'll help develop customer-facing capabilities supporting:
- Data Generation
- Model Training
- Model Evaluation
- Model Serving
- AI Agent Development
This role is ideal for engineers who enjoy combining product thinking with deep technical expertise and who thrive when building systems that directly impact customers.
What You'll Be Building
You will contribute to a platform that powers Generative AI use cases across the complete machine learning workflow.
Your work may span:
Customer-Facing Product Experiences
Designing interfaces and platform capabilities that simplify AI development for customers.
Distributed Backend Systems
Building scalable infrastructure capable of supporting large-scale AI workloads.
GPU-Oriented Infrastructure
Developing low-level systems that efficiently orchestrate compute resources for model training and serving.
End-to-End ML Lifecycle Platforms
Creating reliable systems that support data preparation, training, evaluation, deployment, and monitoring.
Key Responsibilities
Drive End-to-End Product Development
- Play a key role in the end-to-end design and implementation of the Generative AI platform.
- Build solutions that support training and serving use cases for Generative AI models.
- Contribute to both frontend product experiences and backend infrastructure.
Collaborate with Customers & Researchers
- Work closely with customers to understand real-world AI challenges.
- Partner with internal machine learning researchers to identify platform improvements and future opportunities.
- Translate customer feedback into scalable product capabilities.
Own the Full Product Lifecycle
- Demonstrate strong end-to-end ownership across design, development, deployment, and maintenance.
- Convert product requirements into intuitive user experiences and scalable distributed systems.
- Lead implementation efforts from concept to production.
Build Core Platform Infrastructure
- Design and develop foundational systems supporting customer-facing AI products.
- Ensure infrastructure remains scalable as customer workloads continue to grow.
- Contribute to platform reliability, performance, and operational efficiency.
Ensure Enterprise-Grade Reliability
- Maintain the security, reliability, and scalability of backend systems.
- Build resilient distributed architectures capable of supporting mission-critical AI workloads.
- Continuously improve operational excellence across the platform.
Required Qualifications
Software Engineering Experience
- 4+ years of hands-on programming experience.
- Proficiency in at least one modern programming language, such as:
  - Python
  - Scala
  - Go
  - C++
Distributed Systems Expertise
- Strong understanding of distributed systems design principles.
- Experience building and operating large-scale distributed applications.
- Ability to design highly scalable backend infrastructure.
Machine Learning Platform Development
Experience building ML platform systems that support each stage of the machine learning model development lifecycle:
- Data Preparation
- Model Training
- Model Evaluation
- Model Serving
Additional Preferred Experience
- Direct experience developing machine learning models is considered a plus but is not required.
Product Ownership & System Design
- Strong sense of end-to-end product ownership.
- Ability to balance robust system design with excellent product usability.
- Experience translating customer needs into technical solutions.
Communication & Collaboration
- Excellent communication and stakeholder management skills.
- Ability to explain complex technical concepts to both technical and non-technical audiences.
- Comfortable working with cross-functional internal and external teams.
Growth Mindset
We value engineers who:
- Are curious about every aspect of the company's success.
- Enjoy learning new technologies.
- Take ownership beyond their immediate responsibilities.
- Continuously seek opportunities to improve systems, products, and customer experiences.
Why Join Databricks Mosaic AI?
Joining Databricks Mosaic AI means helping build one of the industry's most advanced Generative AI platforms while working alongside world-class researchers, engineers, and AI practitioners.
You'll have the opportunity to:
- Work on cutting-edge Generative AI technologies.
- Build infrastructure supporting enterprise-scale AI applications.
- Influence product direction and platform strategy.
- Solve challenging distributed systems problems.
- Contribute directly to the future of AI development and deployment.
Compensation & Pay Transparency
Databricks is committed to fair and equitable compensation practices.
Local Pay Range
$166,000 — $225,000 USD
Compensation packages are determined based on several factors, including:
- Job-related skills
- Relevant experience
- Certifications and training
- Geographic location
The total rewards package may also include:
- Annual performance bonuses
- Equity awards
- Comprehensive employee benefits
Databricks anticipates utilizing the full salary range based on candidate qualifications and experience.
About Databricks
Databricks is the Data and AI company trusted by more than 10,000 organizations worldwide.
Industry leaders including Comcast, Condé Nast, Grammarly, and more than 50% of Fortune 500 companies rely on the Databricks Data Intelligence Platform to unify data, analytics, and artificial intelligence.
Founded by the original creators of:
- Apache Spark™
- Delta Lake
- MLflow
- Lakehouse Architecture
Databricks continues to lead innovation across data engineering, analytics, machine learning, and Generative AI.
Headquartered in San Francisco, the company operates globally with offices around the world.
Benefits
Databricks offers comprehensive benefits and employee programs designed to support personal well-being, professional growth, and long-term career success. Benefits may vary by region.
Diversity, Equity & Inclusion
Databricks is committed to creating an inclusive workplace where everyone can excel.
Employment opportunities are offered without regard to age, race, ethnicity, disability, religion, gender identity, sexual orientation, veteran status, socio-economic background, political affiliation, marital status, or any other protected characteristic.
Compliance
If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.
Apply for Senior Machine Learning Engineer – GenAI Platform Jobs at Databricks
If you're excited about building scalable Generative AI platforms, distributed machine learning systems, AI infrastructure, and next-generation ML development tools, this is your opportunity to join a team shaping the future of enterprise AI.
Location: Mountain View, California | San Francisco, California
Lead the Future of Large-Scale AI Training Infrastructure
At Databricks, we empower organizations to solve some of the world's most complex challenges through data and artificial intelligence. From advancing medical research to enabling next-generation transportation systems, our mission is driven by building and operating the industry's leading Data Intelligence Platform.
We are looking for a Senior Engineering Manager, AI Runtime (AIR) to lead one of the most strategic engineering organizations within Databricks. This role offers the opportunity to define the future of managed GPU training infrastructure and AI model development at scale while leading a world-class engineering team.
About AI Runtime (AIR)
The AI Runtime (AIR) team powers enterprise-scale training and fine-tuning of deep learning and Large Language Models (LLMs) through on-demand GPU infrastructure.
Organizations rely on AIR to train cutting-edge models across a variety of use cases, including:
- Foundation Models
- Large Language Models (LLMs)
- Transformer-Based Architectures
- Drug Discovery Models
- Custom Deep Learning Systems
- Enterprise AI Applications
AIR provides customers with the infrastructure needed to build and operate state-of-the-art frontier AI models efficiently and reliably.
Your Leadership Opportunity
As a Senior Engineering Manager, you will oversee both the customer-facing product experience and the foundational infrastructure behind AI Runtime.
You will guide strategic investments across managed GPU training, distributed systems, scalability, reliability, and platform innovation while partnering closely with product, research, infrastructure, and platform teams.
This role combines technical leadership, product strategy, organizational development, and customer impact at one of the most exciting intersections of AI and cloud infrastructure.
Key Responsibilities
Lead and Scale a High-Performing Engineering Team
- Lead, mentor, and grow a high-performing engineering team responsible for the Custom Training product and its foundational infrastructure.
- Oversee distributed training orchestration, cluster lifecycle management, fault tolerance, and training efficiency initiatives.
- Foster a culture of technical excellence, innovation, and customer obsession.
Define the AI Runtime Vision
- Define and own the product and technical roadmap for AIR.
- Balance customer experience, platform functionality, scalability, and foundational infrastructure investments.
- Drive long-term strategic direction for managed GPU training capabilities.
Deliver End-to-End Product Innovation
- Collaborate closely with product, research, platform, and infrastructure teams, as well as customers.
- Drive projects from ideation and prioritization through launch and ongoing operations.
- Ensure successful execution of complex cross-functional initiatives.
Drive Architecture for GPU Training at Scale
- Lead architectural decisions for large-scale managed GPU training systems.
- Design solutions that support growing customer workloads and emerging AI technologies.
- Ensure extensibility, performance, and operational excellence across the platform.
Champion Customer Success
- Engage directly with customers to understand challenges and opportunities.
- Advocate for customer needs within engineering decision-making processes.
- Translate technical investments into measurable product outcomes.
Strengthen Reliability & Observability
- Build observability frameworks for long-running distributed training jobs.
- Define checkpointing strategies, operational runbooks, and failure recovery mechanisms.
- Improve resilience for multi-node training environments.
Build Exceptional Teams
- Partner closely with recruiting teams.
- Attract, hire, and develop top engineering talent.
- Create an environment that supports growth, innovation, and leadership development.
Required Qualifications
Professional Experience
- 8+ years of software engineering experience.
- 3+ years of engineering management experience.
- Proven track record building and operating managed GPU training infrastructure at scale (hundreds to thousands of GPUs).
Distributed Training Expertise
Deep familiarity with:
- PyTorch
- DeepSpeed
- Composer
- Megatron-LM
Experience with parallelism strategies including:
- Fully Sharded Data Parallel (FSDP)
- Tensor Parallelism
- Pipeline Parallelism
Training Reliability & Resilience
Experience implementing:
- Checkpointing systems
- Elastic training architectures
- Automated failure recovery for long-running AI training jobs
GPU Performance Optimization
Strong understanding of:
- NCCL
- GPU interconnect topologies
- Memory optimization techniques
- Large-scale distributed GPU environments
Platform & Product Leadership
- Experience building platform products with clearly defined Service Level Agreements (SLAs).
- Proven ownership of customer experience beyond backend infrastructure responsibilities.
- Ability to align technical execution with business and customer outcomes.
Cross-Functional Leadership
- Strong leadership across platform, product, and research organizations.
- Demonstrated success delivering complex initiatives in ambiguous environments.
- Ability to influence stakeholders across multiple teams and functions.
Communication & Collaboration
- Excellent collaboration, communication, and stakeholder management skills.
- Ability to effectively partner with engineering, product, infrastructure, and research teams.
Education
- BS/MS in Computer Science, Electrical Engineering, or related technical field.
Compensation & Pay Transparency
Databricks is committed to fair and equitable compensation practices.
Local Pay Range
$228,600 – $314,250 USD
Actual compensation packages are determined based on factors including:
- Relevant experience
- Technical expertise
- Certifications and training
- Job-related skills
- Geographic location
In addition to base compensation, eligible employees may receive:
- Annual performance bonuses
- Equity awards
- Comprehensive employee benefits
Databricks anticipates utilizing the full salary range based on candidate qualifications and experience.
Why Join Databricks?
At Databricks, we are building state-of-the-art AI solutions that redefine how users interact with data and our products.
As part of the AI Runtime organization, you'll:
- Lead mission-critical AI infrastructure initiatives.
- Work on cutting-edge GPU training systems powering frontier AI models.
- Influence the future of enterprise AI and Large Language Model development.
- Collaborate with world-class researchers, engineers, and product leaders.
- Solve some of the most challenging distributed systems problems in the industry.
If you're passionate about scaling AI infrastructure and leading exceptional engineering teams, we'd love to hear from you.
About Databricks
Databricks is the Data and AI company trusted by more than 10,000 organizations worldwide.
Leading organizations including Comcast, Condé Nast, Grammarly, and over 50% of Fortune 500 companies rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics, and AI.
Databricks was founded by the original creators of:
- Apache Spark™
- Delta Lake
- MLflow
- Lakehouse Architecture
Headquartered in San Francisco, Databricks continues to drive innovation across data, analytics, and artificial intelligence.
Benefits
Databricks offers comprehensive employee benefits and perks designed to support health, wellness, financial security, and professional growth. Benefits may vary by region and location.
Diversity, Equity & Inclusion
Databricks is committed to building an inclusive workplace where every employee can thrive.
Employment decisions are made without regard to age, race, ethnicity, disability, gender identity, sexual orientation, religion, family status, veteran status, socio-economic background, political affiliation, or any other protected characteristic.
Compliance
If access to export-controlled technology or source code is required for performance of job duties, it is within Employer's discretion whether to apply for a U.S. government license for such positions, and Employer may decline to proceed with an applicant on this basis alone.
Apply for Senior Engineering Manager, AI Runtime Jobs at Databricks
Join a team building the infrastructure behind the world's most advanced AI models and help shape the future of enterprise-scale machine learning and GPU training platforms.
Senior Applied AI Engineer
Location: Belgrade, Serbia
Build the Future of AI-Powered Products at Databricks
Databricks is seeking a talented Senior Applied AI Engineer to join its innovative Applied ML/AI team. This role offers the opportunity to work on advanced machine learning systems that improve the usability, intelligence, and efficiency of Databricks products, including AutoML and other customer-facing solutions.
As a Senior Applied AI Engineer, you will leverage machine learning algorithms, optimization techniques, deep learning models, and statistical methods to solve complex business challenges. Your work will directly influence how organizations maximize the value of their data while benefiting from highly scalable and cost-efficient AI-powered products.
About the Role
The Applied ML/AI team at Databricks focuses on building intelligent solutions that enhance product performance and customer experiences. The team develops and deploys machine learning technologies spanning:
- Classification
- Regression
- Forecasting
- Recommendation Systems
- Deep Learning Models
- Foundation Models
- Feature Augmentation
- Auto-Tuning Systems
From traditional statistical approaches to state-of-the-art AI architectures, the team tackles some of the most challenging machine learning problems faced by modern businesses.
This position provides the opportunity to work on high-impact initiatives that help Databricks customers unlock greater value from their data and AI investments.
Key Responsibilities
Develop Advanced AI & Machine Learning Solutions
- Build features and develop end-to-end systems within a small team of experienced engineers and data scientists.
- Drive the development and deployment of state-of-the-art ML/AI models and systems that enhance Databricks products, services, and infrastructure.
- Apply machine learning and optimization algorithms to improve AutoML and other customer-facing products.
Influence Product & Technology Strategy
- Shape the future direction of Databricks’ applied machine learning investments.
- Collaborate with engineering and product teams across the organization to identify and deliver impactful AI-driven solutions.
Design Scalable ML Infrastructure
- Architect and implement robust machine learning infrastructure.
- Develop scalable model training and serving systems.
- Support seamless deployment and integration of AI/ML models into production environments.
Innovate in Forecasting & Modeling
- Explore and develop novel machine learning techniques in forecasting.
- Advance modeling capabilities through experimentation with statistical, deep learning, and foundation model approaches.
Contribute to the AI Community
- Present research, innovations, and technical insights at industry conferences.
- Participate in open-source initiatives that strengthen Databricks’ leadership within the AI and machine learning ecosystem.
Required Qualifications
Professional Experience
- 2–8 years of machine learning engineering experience in high-growth, fast-paced technology companies.
- Experience developing AI and machine learning systems at scale in production environments.
- Proven ability to build and deploy machine learning solutions that deliver measurable business impact.
Technical Expertise
- Strong understanding of computer systems and statistical methodologies.
- Demonstrated success in ML modeling beyond standard out-of-the-box library usage.
- Experience working with advanced machine learning algorithms and optimization techniques.
- Strong software engineering and coding skills.
- Familiarity with testing, code reviews, deployment workflows, and software development best practices.
Analytical & Mathematical Skills
- Broad knowledge of mathematical modeling or a strong willingness to expand expertise beyond traditional machine learning methodologies.
- Ability to solve complex technical problems using data-driven approaches.
Preferred Qualifications
- Experience deploying, scaling, and monitoring machine learning models in production environments.
- Understanding of infrastructure challenges associated with training models and serving predictions in Tier 0 environments.
What Makes This Opportunity Unique?
Working at Databricks means contributing to products used by thousands of organizations worldwide while solving some of the most challenging problems in artificial intelligence and machine learning.
You will have the opportunity to:
- Work on cutting-edge AI and machine learning technologies.
- Develop large-scale production AI systems.
- Collaborate with leading AI researchers, engineers, and data scientists.
- Influence the future of AutoML and intelligent product experiences.
- Build solutions that directly impact customer success and business outcomes.
Why Join Databricks?
At Databricks, we are building state-of-the-art AI solutions that redefine how users interact with data and our products. You’ll have the opportunity to shape the future of AI-driven products at Databricks, work with cutting-edge models, and collaborate with a world-class team of AI and ML experts.
If you're excited about pushing the boundaries of AI in real-world applications, we’d love to hear from you!
About Databricks
Databricks is the Data and AI company trusted by more than 10,000 organizations globally. Industry leaders including Comcast, Condé Nast, Grammarly, and more than 50% of Fortune 500 companies rely on the Databricks Data Intelligence Platform to unify data, analytics, and artificial intelligence.
Headquartered in San Francisco with offices worldwide, Databricks was founded by the original creators of:
- Lakehouse Architecture
- Apache Spark™
- Delta Lake
- MLflow
The company continues to lead innovation across data engineering, analytics, and artificial intelligence.
Benefits & Perks
Databricks offers a comprehensive benefits package designed to support employees both personally and professionally. Benefits and perks may vary by region and location.
Diversity, Equity & Inclusion
Databricks is committed to fostering a diverse, equitable, and inclusive workplace where every employee can thrive and contribute their best work.
The company maintains inclusive hiring practices and provides equal employment opportunities regardless of age, race, ethnicity, disability, gender identity, sexual orientation, religion, marital status, veteran status, socio-economic background, political affiliation, or any other protected characteristic.
Compliance Notice
Some positions may require access to export-controlled technology or source code. Where applicable, employment may be subject to obtaining required government authorizations. Databricks reserves the discretion to determine eligibility based on applicable legal and regulatory requirements.
Apply for Senior Applied AI Engineer Jobs at Databricks
If you are passionate about machine learning, artificial intelligence, forecasting, optimization algorithms, scalable ML systems, and building production-grade AI solutions, this is your opportunity to join one of the world's leading Data and AI companies and make a lasting impact.
Staff Engineer – Applied AI Search & Discovery
Location: Bengaluru, India
Join Databricks and Shape the Future of AI-Powered Search
Are you passionate about building large-scale AI systems that transform how users discover and interact with data? Databricks is looking for an experienced Staff Engineer – Applied AI Search & Discovery to lead the development of next-generation search technologies powered by Machine Learning (ML), Natural Language Processing (NLP), and Large Language Models (LLMs).
As part of the Applied AI team, you will work on innovative search and discovery solutions that help thousands of organizations efficiently find critical assets across the Databricks ecosystem. This is a unique opportunity to influence the future of AI-driven search experiences at one of the world's leading Data and AI companies.
About the Role
The Applied AI team at Databricks is dedicated to advancing intelligent search and discovery capabilities across the platform. Databricks customers generate vast amounts of content, including data tables, notebooks, dashboards, SQL queries, machine learning models, pipelines, data rooms, and other digital assets. Many enterprise customers manage hundreds of millions of assets, making effective search a mission-critical capability.
As a Staff Engineer, you will play a key role in improving search quality, enhancing ranking algorithms, advancing query understanding, expanding asset coverage, and building scalable evaluation frameworks. Your expertise will directly impact how users discover information and accelerate productivity across the Databricks platform.
Key Responsibilities
Lead AI-Powered Search Innovation
- Design, develop, and deploy advanced machine learning-based search relevance systems.
- Build scalable search and discovery solutions integrated into Databricks products and services.
- Drive innovation in search ranking, retrieval systems, and semantic search technologies.
Develop Intelligent ML and NLP Pipelines
- Create automated machine learning workflows for search optimization.
- Implement data preprocessing, query understanding, query rewriting, ranking, retrieval, and evaluation pipelines.
- Enable rapid experimentation and continuous model improvement.
Apply Large Language Models (LLMs)
- Leverage cutting-edge LLM technologies to enhance search relevance and user experience.
- Improve semantic understanding of user intent and content relationships.
- Develop intelligent retrieval strategies for large-scale enterprise environments.
Collaborate Across Teams
- Partner with Product Managers, AI Researchers, Data Scientists, and Engineering teams.
- Contribute to technology-driven product roadmaps and business initiatives.
- Drive strategic decisions that improve search and discovery experiences for customers.
Build Search Evaluation Frameworks
- Develop robust offline and online evaluation methodologies.
- Measure ranking effectiveness and search quality improvements.
- Establish metrics and experimentation frameworks to guide product development.
Required Qualifications
Education
- Bachelor's degree in Computer Science, Artificial Intelligence, Machine Learning, or a related technical field.
- Master's degree or PhD preferred.
Experience
- 10+ years of experience building and deploying large-scale search relevance systems.
- Proven track record in production environments or high-impact research organizations.
- Hands-on experience applying Large Language Models (LLMs) to search and retrieval challenges.
- Strong expertise in machine learning, information retrieval, and AI-driven search technologies.
Technical Skills
Experience in one or more of the following areas:
- Query Understanding
- Natural Language Processing (NLP)
- Text Mining
- Recommendation Systems
- Personalization Algorithms
- Search & Discovery Platforms
- Conversational AI
- Semantic Search
- Information Retrieval
- Machine Learning Engineering
Additional Qualifications
- Strong foundation in computer science fundamentals, algorithms, and distributed systems.
- Experience contributing to widely adopted open-source software projects.
- Excellent problem-solving, communication, and technical leadership skills.
Why Join Databricks?
At Databricks, you'll work on some of the most challenging and impactful AI problems in the industry. Our team is building state-of-the-art technologies that redefine how organizations interact with data, analytics, and artificial intelligence.
What You'll Gain
- Opportunity to build cutting-edge AI and search technologies.
- Work alongside world-class AI, ML, and data engineering experts.
- Influence products used by thousands of organizations worldwide.
- Solve large-scale search and discovery challenges involving millions of assets.
- Accelerate your career in one of the fastest-growing AI companies globally.
If you're excited about pushing the boundaries of AI, Machine Learning, Search Relevance, and Large Language Models, we encourage you to apply and become part of our mission.
About Databricks
Databricks is the Data and AI company trusted by more than 10,000 organizations worldwide, including leading enterprises and over half of the Fortune 500. The Databricks Data Intelligence Platform helps organizations unify data, analytics, and artificial intelligence to drive innovation and business growth.
Headquartered in San Francisco, Databricks was founded by the original creators of Apache Spark™, Delta Lake, MLflow, and the Lakehouse architecture. Today, Databricks continues to lead the industry in modern data and AI solutions.
Employee Benefits
Databricks offers a comprehensive and competitive benefits package designed to support employees' health, financial well-being, professional development, and work-life balance. Benefits may vary by location and region.
Diversity, Equity & Inclusion
Databricks is committed to creating an inclusive workplace where individuals from all backgrounds can thrive. We celebrate diversity and ensure equal employment opportunities for all qualified applicants regardless of race, ethnicity, gender identity, sexual orientation, disability, age, religion, veteran status, or any other protected characteristic.
Equal Opportunity Employer
Databricks is proud to be an Equal Opportunity Employer and is committed to fair and inclusive hiring practices throughout the recruitment process.
Compliance Notice
Certain positions may require access to export-controlled technology or source code. Where applicable, employment may be subject to obtaining necessary government authorizations and approvals. Databricks reserves the right to determine eligibility based on applicable legal and regulatory requirements.
Take the next step in your career and help build the future of AI-powered search, machine learning innovation, and intelligent data discovery at Databricks.
