Senior MLOps/DevOps Engineer – ML Platform – Cork, Ireland

Company

Qualcomm

Job description

Job Description:

Company: QT Technologies Ireland Limited

Job Area: Engineering Group > Systems Engineering

General Summary:

About The Role

Qualcomm offers flexible work options tailored to our employees' needs. These include a combination of working from home and working in our brand-new, state-of-the-art office in Penrose Dock, Cork.

Well-being and work-life balance are fundamental to Qualcomm as an employer. We recognise and understand that employees have missed spending quality time with loved ones and extended family. As such, Qualcomm's Cork policy allows our employees to combine short-term remote working with annual leave.

We are seeking a highly skilled and experienced Senior MLOps Engineer to join our team and contribute to the development and maintenance of our ML platform. As a Senior MLOps Engineer, you will be responsible for architecting, deploying, and optimizing the ML platform that supports the training of machine learning models on NVIDIA DGX clusters and the Kubernetes platform, using technologies such as Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana.

You will work closely with cross-functional teams, including data scientists, software engineers, and infrastructure specialists, to ensure the smooth operation and scalability of our ML infrastructure. Your expertise in MLOps and knowledge of GPU clusters will be vital in enabling efficient training and deployment of ML models.

Responsibilities will include:

  • Architect, develop, and maintain the ML platform to support training and inference of ML models.
  • Design and implement scalable and reliable infrastructure solutions for NVIDIA DGX clusters.
  • Collaborate with data scientists and software engineers to define requirements and ensure seamless integration of ML workflows into the platform.
  • Optimize the platform’s performance and scalability, considering factors such as GPU resource utilization, data ingestion, model training, and deployment.
  • Monitor and troubleshoot system performance, identifying and resolving issues to ensure the availability and reliability of the ML platform.
  • Implement and maintain CI/CD pipelines for automated model training, evaluation, and deployment using technologies such as ArgoCD and Argo Workflows.
  • Implement and maintain a monitoring stack using Prometheus and Grafana to ensure the health and performance of the ML platform.
  • Stay updated with the latest advancements in MLOps, distributed computing, and GPU acceleration technologies, and proactively propose improvements to enhance the ML platform.
  • Provide technical guidance and mentorship to junior team members.

What we are looking for:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Proven experience as an MLOps Engineer or similar role, with a focus on large-scale ML infrastructure and GPU clusters.
  • Strong expertise in configuring and optimizing NVIDIA DGX clusters for deep learning workloads.
  • Proficient in using the Kubernetes platform, including technologies such as Helm, ArgoCD, Argo Workflows, Prometheus, and Grafana.
  • Solid programming skills in languages like Python, and experience with relevant ML frameworks (e.g., TensorFlow, PyTorch).
  • In-depth understanding of distributed computing, parallel computing, and GPU acceleration techniques.
  • Familiarity with containerization technologies such as Docker and orchestration tools.
  • Experience with CI/CD pipelines and automation tools for ML workflows (e.g., Jenkins, GitHub, ArgoCD).
  • Strong problem-solving skills and the ability to troubleshoot complex technical issues.
  • Excellent communication and collaboration skills to work effectively within a cross-functional team.

We would love to see:

  • Experience with training and deploying models for Automated Driving.
  • Knowledge of ML model optimization techniques and memory management on GPUs.
  • Familiarity with ML-specific data storage and retrieval systems.
  • Understanding of security and compliance requirements in ML infrastructure.

What’s on Offer

Apart from working in an open, relaxed and collaborative space, you will enjoy:

  • Salary, stock and performance-related bonus
  • Maternity/Paternity Leave
  • Employee stock purchase scheme
  • Matching pension scheme
  • Education Assistance
  • Relocation and immigration support (if needed)
  • Life, Medical, Income and Travel Insurance
  • Subsidised memberships for physical and mental well-being
  • Bicycle purchase scheme
  • Employee-run clubs, including running, football, chess, badminton and many more

Minimum Qualifications:

  • Bachelor’s degree in Engineering, Information Systems, Computer Science, or related field and 2+ years of Systems Engineering or related work experience.
OR
  • Master’s degree in Engineering, Information Systems, Computer Science, or related field and 1+ year of Systems Engineering or related work experience.
OR
  • PhD in Engineering, Information Systems, Computer Science, or related field.

*References to a particular number of years of experience are for indicative purposes only. Applications from candidates with equivalent experience will be considered, provided that the candidate can demonstrate an ability to fulfill the principal duties of the role and possesses the required competencies.

Although this role has some expected minor physical activity, this should not deter otherwise qualified applicants from applying. If you are an individual with a physical or mental disability and need an accommodation during the application/hiring process, please call Qualcomm’s toll-free number for assistance. Qualcomm will provide reasonable accommodations, upon request, to support individuals with disabilities as part of our ongoing efforts to create an accessible workplace.

Qualcomm is an equal opportunity employer and supports workforce diversity.

Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.

To all Staffing and Recruiting Agencies: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.

If you would like more information about this role, please contact .

Expected salary

Location

Cork

Job date

Thu, 02 May 2024 00:28:07 GMT

To help us track our recruitment efforts, please indicate in your email/cover letter where you saw this job posting (jobsnear.org).
