Job Description

1. HPC Infrastructure Design & Deployment

  • Design, build, and maintain HPC clusters and compute environments.

  • Install and configure servers, storage systems, networking, and accelerators (GPU/FPGA).

  • Plan scalable architectures for compute-intensive workloads.

2. Cluster Administration

  • Manage Linux-based HPC systems.

  • Monitor cluster health, node availability, hardware performance, and uptime.

  • Perform patching, upgrades, and preventive maintenance.

3. Job Scheduling & Resource Management

  • Configure and manage schedulers such as Slurm, PBS, LSF, and Torque.

  • Optimize queue policies, fair-share scheduling, and resource utilization.

  • Troubleshoot failed jobs and scheduling issues.

4. Performance Optimization

  • Tune CPU, memory, storage, and network performance.

  • Analyze bottlenecks in parallel applications.

  • Improve MPI/OpenMP/CUDA job efficiency.

5. Storage & File Systems Management

  • Administer parallel and distributed file systems such as Lustre, GPFS, BeeGFS, and NFS.

  • Ensure high-speed data access and storage reliability.

  • Manage backups, quotas, and data lifecycle.

6. Networking & Interconnects

  • Configure high-speed interconnects such as InfiniBand, RoCE, and Ethernet.

  • Diagnose latency, bandwidth, and node communication issues.

Job Details

Employment Type: Contractual
Education: Graduate
Job Id: 5097384
State: Maharashtra
Country: India

About Company

For a Client of TeamLease Digital
