Job Title: Cloud Administrator – Linux, OpenShift, Kubernetes, IaC
Duration: 12 months (Contract, renewable)
Location: Abu Dhabi
Job Brief:
We are seeking a highly skilled Cloud Administrator with expertise in Linux, OpenShift, Kubernetes, and Infrastructure as Code (IaC). The role involves architecting, deploying, and optimizing container platforms (including GPU and non-GPU environments), implementing OpenStack services, and ensuring high availability, security, and resilience of enterprise cloud infrastructure. The successful candidate will also drive automation using Terraform and Helm, support backup and disaster recovery, and collaborate with cross-functional teams to ensure stable and scalable service delivery.
Key Responsibilities:
Design, deploy, and manage OpenShift clusters (GPU and non-GPU) in enterprise environments.
Implement OpenShift Virtualization and support VM provisioning with full integration into existing infrastructure.
Deploy OpenStack with components such as Cinder storage and Neutron networking, integrated with CommVault backup and third-party firewalls.
Configure and optimize GPU nodes (NVIDIA, AMD, Intel) for AI/ML workloads.
Define, automate, and manage infrastructure using Terraform, Helm, and other IaC tools.
Implement and test disaster recovery strategies, ensuring RTO and RPO within SLAs.
Ensure cluster resilience, high availability, and fault tolerance across environments.
Execute and validate backup/restore operations (etcd, Velero/OADP, MinIO).
Implement security and compliance best practices, including CIS benchmarks, RBAC, encryption, and AD/LDAP integration.
Monitor performance and optimize system health dashboards for real-time insights.
Troubleshoot and resolve incidents, conduct root cause analysis, and implement preventive measures.
Produce detailed documentation including LLDs, test documents, and recovery playbooks.
Collaborate closely with DevOps, infrastructure, and security teams for successful project delivery.
Mandatory Skills:
10+ years of IT experience with strong Unix/Linux administration expertise.
5+ years of hands-on experience with OpenShift and OpenStack implementation.
Strong knowledge of Kubernetes cluster management, pods, containers, and upgrades.
Expertise in Infrastructure as Code (Terraform, Helm, Ansible).
Experience in GPU operator deployment and integration for accelerated workloads.
Proficiency in backup and recovery tools (Velero/OADP, etcd, MinIO).
Strong troubleshooting skills in networking, performance profiling, and storage.
Hands-on experience with Red Hat, CentOS, Ubuntu, and SUSE Linux.
Preferred Skills:
Certification in RHEL/SUSE/OEL (mandatory); certifications in OpenShift, OpenStack, or Terraform are preferred.
Experience with CommVault, ODF, Quay Enterprise, and third-party firewall integrations.
Knowledge of ODF encryption, DR implementation, and SIEM integration for logging.
Familiarity with automation tooling (Ansible, Terraform) and onsite datacenter operations.
Background in ITIL methodologies, SAN/storage technologies, and enterprise backup solutions.
Strong documentation and knowledge-sharing skills with the ability to produce LLDs and operational playbooks.