For a client of TeamLease Digital
Design, build, and maintain HPC clusters and compute environments.
Install and configure servers, storage systems, networking, and accelerators (GPU/FPGA).
Plan scalable architectures for compute-intensive workloads.
Manage Linux-based HPC systems.
Monitor cluster health, node availability, hardware performance, and uptime.
Perform patching, upgrades, and preventive maintenance.
Configure and manage schedulers such as Slurm, PBS, LSF, and Torque.
Optimize queue policies, fair-share scheduling, and resource utilization.
Troubleshoot failed jobs and scheduling issues.
Tune CPU, memory, storage, and network performance.
Analyze bottlenecks in parallel applications.
Improve MPI/OpenMP/CUDA job efficiency.
Administer parallel and distributed file systems such as Lustre, GPFS, BeeGFS, and NFS.
Ensure high-speed data access and storage reliability.
Manage backups, quotas, and data lifecycle.
Configure high-speed interconnects such as InfiniBand, RoCE, and Ethernet.
Diagnose latency, bandwidth, and node communication issues.
Job Details
Employment Type: Contractual