Software engineer, GPU inference (listing removed)
Job description
Soniox is pushing the boundaries of real-time speech AI, and we're looking for an engineer to help us scale the world's most advanced language models across a low-latency, high-throughput, production-grade inference stack.
In this role, you'll work at the intersection of deep learning, systems engineering, and performance optimization, helping us squeeze every FLOP out of our GPUs, reduce latency to the millisecond, and keep our systems running at global scale.
What we require from candidates
In this role, you will:
- Work closely with researchers, engineers, and product teams to bring cutting-edge AI models into real-world production.
- Architect and optimize our inference infrastructure to deliver low-latency, high-reliability performance across thousands of concurrent requests.
- Identify and eliminate system bottlenecks, improving throughput and GPU utilization across the fleet.
- Introduce and implement tools and techniques to monitor, debug, and improve model inference at scale.
- Tune our VM fleet to maximize compute, memory, and network efficiency down to the last GPU cycle.
- Support advanced research workflows by building robust, scalable systems that enable rapid experimentation.
You might thrive in this role if you:
- Have a strong intuition for optimizing modern ML architectures for inference performance.
- Are deeply familiar with PyTorch, CUDA, NCCL, and GPU internals, or excited to become an expert quickly.
- Understand HPC fundamentals and have worked with technologies like InfiniBand, NVLink, or MPI.
- Have experience building and scaling distributed systems in production, ideally performance-critical ones.
- Have rebuilt or refactored systems due to 10x+ scale increases and know what to watch out for.
- Are a self-starter who thrives in fast-moving environments and finds clarity amidst ambiguity.
- Care about reliability, simplicity, and performance, and take ownership from design to deployment.
- Have at least 5 years of professional software engineering experience.
What we offer
- The chance to work on foundational AI that redefines how humans and machines communicate.
- Global impact: your work will touch millions (and soon billions) of people across languages and cultures.
- End-to-end ownership in a lean, engineering-driven team with no bureaucracy.
- Collaboration with world-class talent in research, engineering, and product.
- A fast-growing startup environment where you shape both the technology and the company's future.
- Competitive compensation with equity ownership.
- Flexible work setup with emphasis on in-person collaboration.
- Regular team events, offsites, and a strong learning-driven culture.
Job classification
- Location:
- Ljubljana
- Salary:
- €3000 - €6000 gross per month, plus equity and performance bonus
- Working hours:
- regular employment
Required skills
- PyTorch, CUDA, NCCL, and GPU internals.
- advanced level