AI Engineer – Multimodal Intelligence — Apple Human Intelligence Team | Sunnyvale, CA

Category: Artificial Intelligence | Computer Vision | Multimodal LLMs | Agentic AI

Employment Type: Full-Time

Location: Sunnyvale, California, United States

Posted: May 29, 2026

Role Number: 200665673-3956

Applications: Accepted on an ongoing basis


About the Role

Apple’s VCV organisation — a centralised applied research and engineering group driving real-time, on-device Computer Vision and Machine Perception across every Apple product — is hiring an AI Engineer for Multimodal Intelligence to join its Human Intelligence team.

This team sits at a rare intersection: deep research meets shipping product. You will work across the full stack of multimodal LLMs — from data collection and curation through modelling, evaluation, and deployment — while directly influencing Apple’s sensor and silicon roadmap in partnership with hardware, software, and ML teams.

If you’re excited about the frontier of foundation models, multimodal LLMs, and agentic AI systems, and want your research to directly shape the next generation of Apple products, this is your opportunity.


What You Will Be Doing

Multimodal LLM Development: You will develop, train, and fine-tune multimodal LLMs spanning image, video, text, and audio modalities, owning the full pipeline from data curation through deployment.

Encoder & Generative Model Design: You will design and build video/audio encoders, tokenisers, and generative models that power multimodal understanding and generation across Apple’s product ecosystem.

Agentic AI Systems: You will design and implement agentic AI systems capable of reliable reasoning, enabling natural, proactive, and personalised human interactions across Apple devices.

End-to-End ML System Architecture: You will architect ML systems that transition seamlessly from research prototypes to production-grade technologies at scale, ensuring research breakthroughs actually ship.

Cross-Functional Hardware Collaboration: You will collaborate closely with hardware, software, and ML teams to influence sensor and silicon roadmaps, helping deliver pioneering on-device AI experiences.

Code Quality & Engineering Rigour: You will critically evaluate and improve ML codebases, ensuring correctness, efficiency, and long-term maintainability across the team’s research and production code.

Research Direction & Innovation: You will actively contribute to the team’s research roadmap, identifying opportunities for innovation in multimodal and agentic AI that directly shape future Apple product features.


Minimum Qualifications

  • Education: Master’s degree (or equivalent practical experience) in Computer Science, Computer Vision, Machine Learning, or a related technical field
  • 3+ years of relevant academic or industry experience in Machine Learning, Computer Vision, or Artificial Intelligence
  • Demonstrated deep learning experience with multimodal systems (vision, language, video, etc.)
  • Proficiency in Python and a modern deep learning framework such as PyTorch or JAX
  • Experience with foundation models (language or multimodal), including training, fine-tuning, and deployment
  • Direct experience developing, training, and fine-tuning multimodal LLMs
  • Strong foundations in optimisation, probability, and linear algebra as applied to ML and computer vision

Preferred Qualifications

Candidates with the following will be highly competitive:

  • PhD (or equivalent practical experience) in Computer Science, Machine Learning, Computer Vision, or a related AI-focused field
  • Demonstrated expertise in training and fine-tuning multimodal LLMs at scale, and developing industry-scale agentic products
  • Proven technical leadership — architecting complex ML systems and leading projects from conception through product deployment
  • Experience applying foundation models to build autonomous or semi-autonomous agents, including planning, task decomposition, and multi-step reasoning
  • Strong publication record at top-tier venues such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, or COLM
  • Experience with large-scale distributed training and model parallelism
  • Strong communication skills with the ability to present research findings to both technical and non-technical audiences

Compensation & Benefits

  • Base Salary Range: $147,400 – $272,100 USD (dependent on skills, qualifications, experience, and location)
  • Eligibility for discretionary restricted stock unit (RSU) awards
  • Ability to purchase Apple stock at a discount through the Employee Stock Purchase Plan (ESPP)
  • Comprehensive medical and dental coverage
  • Retirement benefits
  • Discounts on Apple products and access to free Apple services
  • Education reimbursement for formal learning related to career advancement at Apple, including tuition support
  • Potential eligibility for discretionary bonuses, commission payments, and relocation assistance

Note: Apple benefit, compensation, and employee stock programs are subject to eligibility requirements and the terms of the applicable plan or program.


About Apple

The Human Intelligence team operates in one of the most exciting eras of AI, advancing the state of the art in Computer Vision and Machine Learning across every aspect of multimodal LLMs, from data to deployment.

Apple is an equal opportunity employer committed to inclusion and diversity, and does not discriminate on the basis of race, colour, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or any other legally protected characteristic. Apple participates in E-Verify in locations where required by law, is a drug-free workplace, and will not discriminate or retaliate against applicants who discuss their compensation.

Job Category: AI Engineer
