HPC Systems Administrator
Position Summary
We are seeking an HPC Systems Administrator to join our growing organization. Reporting to the Director of the Center for Research Computing, the HPC Systems Administrator works with the HPC team to perform specialized functions for systems installation, management, problem-solving, and solution design, and serves as the primary backup for the lead HPC Systems Engineer. Additional technical functions include the implementation and support of HPC research environments, including databases, containers, HPC and hybrid/cloud compute and storage services, and security and access controls. The incumbent will participate on the HPC Systems and User-Facing team to identify and solve, both proactively and reactively, operational and software problems on our HPC systems, and will collaborate with Rice Information Security to properly secure the environment and any related information services, whether cloud-based or on-premises.
Additionally, while this is primarily a systems-facing role, the incumbent may participate in the training of scholars and students on campus in the use of the HPC and research computing facilities to support research, education, and outreach to industrial and governmental partners.
The ideal candidate has experience managing HPC systems in research environments and the ability to collaborate with colleagues across the Rice IT organization to provide best-in-class HPC services.
Workplace Requirements
This position is an on-site (in-person) role. A hybrid work arrangement may be considered after the probationary period. Per Rice policy 440, work arrangements may be subject to change.
Hiring Range
This is a full-time, benefits-eligible position, and the proposed salary range is $80,000 to $92,500 annually, depending on qualifications and experience. *Exempt (salaried) positions under FLSA are not eligible for overtime.*
Special Instructions to Applicants
Applicants should attach a resume and cover letter in PDF format to the Supporting Documents section of the application.
Minimum Requirements
- Bachelor's degree
- In lieu of the education requirement, additional related experience above and beyond what is required, on an equivalent year-for-year basis, may be substituted.
- 2+ years of hands-on Linux system administration building and operating HPC clusters.
- In lieu of the experience requirement, additional related education above and beyond what is required, on an equivalent year-for-year basis, may be substituted.
Skills
- Managing Linux clusters in production-oriented research/HPC environments (Slurm / Open OnDemand; RunAI)
- Managing container environments for HPC services/workflows (Docker/Kubernetes)
- Scripting and automation (Python, Bash, Ansible)
- Managing HPC networking (InfiniBand, Omni-Path, NDR)
- Managing and monitoring shared HPC resources
- Working well independently and as a team member
- Supporting and documenting HPC environments
Preferences
- Experience supporting accelerator (GPU) ecosystems for AI/ML and scientific workloads
- Experience with automated management of Linux HPC clusters (Warewulf, Terraform)
- Experience building and managing containers with Docker, Podman, and/or Kubernetes
- Experience working with secure systems for regulated data
- Experience integrating on-premises HPC with public cloud services (GCP, AWS, Azure) to migrate or burst workloads while managing cost/performance tradeoffs
Essential Functions
- Manage day-to-day reliability and performance of HPC services
- Build and maintain automation for predictable operations and rapid recovery
- Enable HPC and containerized workflows aligned with research needs
- Provide advanced support and documentation of HPC services and systems
- Configure and manage system and network security
- Manage installation and maintenance of HPC hardware and operating systems
- Troubleshoot issues with HPC systems and services
- Monitor and handle incoming service requests and trouble tickets
- Respond to security vulnerabilities, incidents, and outages in a timely manner
- Monitor resource usage to identify enhancements to system capabilities and performance
- Recommend upgrades according to growth statistics and disk space forecasts
- Evaluate new technologies and integrate new systems into the computing environment
- Document infrastructure for users, support and consulting personnel, and developers
- Occasional after-hours or weekend work may be requested for critical incidents or emergency situations
- Perform all other duties as assigned