

Role Overview
We are looking for an ML Engineer with deep expertise in developing voice AI systems using PyTorch, Hugging Face libraries, and leading open-source frameworks such as SpeechBrain or ESPnet. You will play a critical role in architecting, training, and deploying models for speech recognition and text-to-speech, and in aligning LLMs using Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO).
In this position, you’ll be part of a tight-knit team of researchers and engineers who value creativity, ownership, and an open-source mindset. You will have a direct impact on the company’s core voice AI products, shaping the future of voice-driven applications.
Key Responsibilities
1. Model Development & Training
- Design, train, and optimize deep learning models for speech recognition and text-to-speech (TTS) using frameworks like SpeechBrain, ESPnet, PyTorch, and Hugging Face.
- Experiment with DPO/PPO for LLM alignment tasks, including data collection strategies and hyperparameter tuning.
2. Research & Prototyping
- Explore state-of-the-art voice AI solutions, reading and implementing recent research papers to keep our technology at the forefront of the industry.
- Rapidly prototype ideas and effectively communicate findings to the team.
3. Deployment & Integration
- Collaborate with backend engineers to integrate trained models into production environments.
- Ensure models run efficiently at scale, with low latency and high accuracy.
4. Open Source & Collaboration
- Maintain a strong open-source track record on GitHub by continuously improving internal projects and contributing back to the broader community.
- Mentor team members on best practices in modern deep learning and open-source community engagement.
5. Startup Mindset
- Work in a dynamic, agile environment with high autonomy and a bias for action.
- Adapt to rapidly changing requirements and contribute to a culture of innovation.
Qualifications
- Education: Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Mathematics, or a related field (equivalent experience considered).
- Technical Skills:
  - 1+ years of hands-on experience with PyTorch and Hugging Face libraries.
  - Proven track record working with SpeechBrain or ESPnet for speech recognition or TTS.
  - Experience with LLM alignment methods such as DPO (preference optimization) and PPO (reinforcement learning).
  - Familiarity with GPU-based training, model optimization, and distributed computing.
- Open Source Contributions:
  - A strong GitHub portfolio showing active involvement in relevant ML/voice AI projects.
  - Evidence of published or open-source contributions, merged PRs, or community collaborations.
- Startup Experience:
  - Minimum of 1-2 years in a startup environment; comfortable with rapid prototyping, shifting priorities, and lean product cycles.
- Soft Skills:
  - Excellent communication skills; able to articulate complex technical concepts to non-technical stakeholders.
  - Collaborative mindset and passion for solving challenging problems with minimal supervision.
What We Offer
- Competitive salary and meaningful equity in a high-growth startup.
- Opportunity to work remotely (depending on role/location) with flexible working hours.
- A learning budget to attend conferences, workshops, or courses for upskilling.
- A chance to help shape the future of voice AI in a rapidly evolving industry.
- A collaborative and empowering work culture focused on innovation and excellence.
Thanks,
Srinath Tankasala
