HPC Systems Engineer
University Overview
The University of Pennsylvania, the largest private employer in Philadelphia, is a world-renowned leader in education, research, and innovation. This historic Ivy League school consistently ranks among the top 10 universities in the annual U.S. News & World Report survey. Penn has 12 highly regarded schools that provide opportunities for undergraduate, graduate, and continuing education, all influenced by Penn's distinctive interdisciplinary approach to scholarship and learning. As an employer, Penn has been ranked nationally on many occasions; most recently, Forbes named Penn one of America's Best Large Employers in 2023.
Penn offers a unique working environment within the city of Philadelphia. The University is situated on a beautiful urban campus, with easy access to a range of educational, cultural, and recreational activities. With its historical significance and landmarks, lively cultural offerings, and wide variety of atmospheres, Philadelphia is the perfect place to call home for work and play.
The University offers a competitive benefits package that includes excellent healthcare and tuition benefits for employees and their families, generous retirement benefits, a wide variety of professional development opportunities, supportive work and family benefits, a wealth of health and wellness programs and resources, and much more.
Posted Job Title
HPC Systems Engineer
Job Profile Title
Systems Administrator Senior
Job Description Summary
The Penn Advanced Research Computing Center (PARCC) core facility is seeking a highly qualified and motivated High Performance Computing (HPC) Systems Engineer to join the team. PARCC's main cluster (Betty) delivers HPC, data-intensive science, and Artificial Intelligence (AI) resources to researchers at the University of Pennsylvania. The HPC Systems Engineer contributes to the strategic planning, design, testing, organization, and implementation of cutting-edge technology projects for the facility, and leads the systems team.
Job Responsibilities
- Collaborate with senior staff to design, plan, test, and implement advanced hardware solutions for HPC-AI environments.
- Deploy and configure physical hardware using HPC deployment tools and orchestration frameworks (e.g., Ansible).
- Ensure high availability and minimal downtime of HPC resources to meet the needs of the research community.
- Optimize, monitor, and troubleshoot HPC file systems for performance and reliability.
- Conduct system benchmarking and develop automated testing to ensure a robust and efficient HPC infrastructure.
- Maintain job scheduling systems and enforce storage allocation policies to ensure equitable use of shared resources.
- Administer and configure the Slurm scheduler in alignment with institutional research policies.
- Participate in planning sessions related to network and security operations; collaborate with the university's central networking group (ISC).
- Apply HPC networking configurations and security protocols to optimize resource utilization and protection.
- Maintain a secure, stable, and evolving system/software environment to support dynamic research requirements.
- Implement and manage data security controls, including user- and group-based access.
- Operate comprehensive monitoring systems for rapid issue detection and long-term performance analysis.
- Automate user account lifecycle processes, including creation, maintenance, and removal.
- Install and maintain HPC tools that facilitate operational processes, such as ColdFront.
- Manage hardware and software inventory in coordination with vendors.
- Collaborate with other groups to maintain and share up-to-date information about all assets.
- Provide technical guidance on new projects involving HPC-AI computing within the institution.
- Develop custom tools as needed, and contribute relevant innovations to open-source communities when appropriate.
- Evaluate, implement, and test emerging technologies with potential benefits for the HPC-AI research community.
- Continuously assess emerging tools and technologies for integration into current and future HPC cluster environments.
- Actively mentor and support the training of new and existing staff under the incumbent's supervision.
- Participate in departmental and university-sponsored training programs to enhance knowledge and skills; supervisor-approved commercial training may be substituted where appropriate.
Qualifications
- Bachelor's degree and 3-5 years of experience as a systems engineer at an academic institution or equivalent combination of education and experience
- Expertise in InfiniBand networking
- Experience configuring job and resource management applications (Slurm)
- Experience deploying HPC portals (Open OnDemand, Cryo-EM, ColdFront)
- Familiarity with scientific software deployment (Spack, EasyBuild)
- Expertise with cluster management software (xCAT, BCM)
- Experience deploying, troubleshooting, and maintaining file systems
Application Requirement
A Cover Letter and Resume/CV are required to be considered for this position. Please upload your Cover Letter where it asks you to upload your Resume/CV; multiple documents are allowed.
Job Location - City, State
Philadelphia, Pennsylvania
Department / School
Provost's Center
Pay Range
$83,500.00 - $110,000.00 Annual Rate
Salary offers are made based on the candidate's qualifications, experience, skills, and education as they directly relate to the requirements of the position, and in alignment with salary ranges based on external market data for the job's level. Internal organization and peer data at Penn are also considered.