Job Title: OpenShift & OpenStack Implementation Engineer (GPU & Non-GPU)
Duration: 12 months (extendable)
Location: UAE
Job Brief
We are seeking a highly skilled OpenShift & OpenStack Implementation Engineer with expertise in both GPU and non-GPU environments. The role involves designing, deploying, and optimizing OpenShift container platforms and integrating OpenStack components for high-performance, scalable enterprise solutions. The ideal candidate will have extensive hands-on experience with Azure integration, data center environments, third-party firewalls, and enterprise-grade DR and security frameworks.
This position plays a critical role in ensuring resilient, compliant, and optimized infrastructure deployments within strict project timelines.
Key Responsibilities
1. Platform Design & Deployment
Architect and implement production-grade OpenShift clusters (GPU and non-GPU) using UPI, IPI, and the bare-metal Agent-based Installer (ABI).
Deploy OpenShift Virtualization, provision VMs, and integrate them with existing infrastructure.
Implement OpenStack with Cinder integration for external storage, Commvault backup, and Neutron networking integrated with third-party firewalls.
2. GPU Integration & Optimization
Configure GPU nodes with NVIDIA, AMD, or Intel drivers.
Tune GPU utilization for AI/ML workloads and prevent performance degradation.
Deploy GPU Operators and integrations (e.g., ROCm).
3. Infrastructure Resilience & DR
Design and implement Disaster Recovery strategies meeting defined RTO/RPO SLAs.
Execute DR drills with minimal downtime and no data loss.
Configure cluster high availability and validate against failure scenarios.
4. Security, Compliance & Monitoring
Implement CIS compliance, RBAC, encryption, and LDAP/AD integration.
Configure logging and log forwarding (LokiStack to SIEM).
Deploy HashiCorp Vault for secrets management and configure ODF encryption (with and without KMS).
5. Backup & Restore Operations
Implement backup strategies for the platform (etcd), applications (OADP), and ACM/ACS with MinIO object storage.
Validate that restore processes complete within the agreed SLA.
6. Integration & Automation
Deploy OpenShift Operators (Quay, ODF, Commvault, Cinder, and firewall integrations).
Implement automation scripts using Ansible and Terraform for cloud/on-prem deployment.
Maintain Quay Enterprise registries for production and DR clusters.
7. Cross-Functional Collaboration
Work closely with DevOps, Network & Security, and Managed Services teams.
Coordinate with vendors for advanced OS/software support.
8. Documentation & Knowledge Transfer
Produce High-Level Designs (HLD), Low-Level Designs (LLD), and test documents.
Maintain complete project documentation and recovery playbooks.