We're looking for a Solution Architect – Kernels and Performance, Core ML to join our team in Lisbon, Portugal, on a hybrid working model. In this role, you will drive low-level performance engineering of AI workloads, optimizing both model training and inference across accelerator architectures such as TPUs and GPUs. You will work on cutting-edge ML models, toolchains and frameworks, enabling scalable, efficient deployment of AI solutions in production. This position combines deep system-level engineering with architectural leadership, directly impacting next-generation AI performance.
Responsibilities
- Design and optimize high-performance kernels in low-level kernel languages such as Pallas, Mosaic and Triton for TPU and GPU architectures
- Architect infrastructure such as benchmarking suites, autotuning frameworks and performance analysis tools to support kernel development and testing
- Develop regression testing strategies and comprehensive documentation to maintain quality and facilitate adoption across developer communities
- Collaborate with ML researchers, framework developers (JAX, PyTorch) and compiler engineers (XLA) to address performance bottlenecks and implement effective solutions
- Track advancements in hardware architectures, compiler technologies and AI models to identify optimization opportunities and guide roadmap decisions
- Advocate best practices for integrating optimized kernels into open-source libraries and production systems
Requirements
- Bachelor’s degree in Computer Science or equivalent practical experience
- 12+ years of overall industry experience in software engineering or related fields
- Minimum 5 years of experience in C++ or Python development
- At least 3 years of experience testing, maintaining or launching software products
- Minimum 1 year of experience in software design and architecture