Principal Storage Architect & Team Lead (Research & HPC Data Platforms)
The Opportunity
Stanford University operates one of the most sophisticated academic storage ecosystems in the world, with more than 100PB of aggregate capacity and more than 5 billion files. We are seeking a world-class technical leader to oversee our primary research storage platforms. These services span the gamut from the 15PB flash-based Lustre scratch filesystem on our Sherlock HPC cluster to archival storage on the Elm HSM platform.
Why Stanford?
You aren't just managing a storage cluster; you are architecting a data and storage ecosystem that supports Nobel-caliber research across all disciplines.
Primary Responsibilities
- Technical Leadership: Lead and mentor a specialized team of systems engineers, balancing high-level architectural design with hands-on operations and escalation support.
- Tiered Storage Architecture: Oversee the integration of Lustre HSM on the Elm platform, managing data movement policies between parallel filesystems and MinIO object storage.
- Platform Ownership: Drive the scaling, reliability, security, compliance, operations, and lifecycle management of our primary research computing storage platforms, including those that handle high-risk data.
- Performance Engineering: Tune I/O for large-scale High-Performance Computing and AI workloads.
- Community Stewardship: Represent Stanford within the Lustre community and other key community groups, contributing to the upstream roadmap and maintaining a vendor-neutral storage strategy.
Required Qualifications
- Education: Bachelor's degree and ten years of increasingly technical work experience, or a combination of education and relevant experience.
- Expertise at Scale: 10+ years of hands-on experience architecting, building, and managing Lustre and ZFS or similar filesystems at the 20PB+ scale.
- Management: Proven experience leading technical teams in a High-Performance Computing (HPC) or Research Computing environment.
- Object Storage & HSM: Deep technical fluency in MinIO and Lustre HSM (copytools, policy engines like RobinHood) or similar tools.
- Kernel & Network Mastery: Expert-level knowledge of the Linux kernel and large-scale InfiniBand/Ethernet fabric tuning.
- "Hands-On" Requirement: Must be comfortable in the "weeds" and capable of debugging issues such as kernel panics, LNet congestion, and metadata bottlenecks alongside the team.
Physical Requirements*:
- Constantly perform desk-based computer tasks.
- Frequently sit, grasp lightly/fine manipulation.
- Occasionally stand/walk, writing by hand.
- Rarely use a telephone, lift/carry/push/pull objects that weigh up to 10 pounds.
* Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of the job.
Working Conditions:
- May work extended hours, evenings, and weekends.
Work Standards:
- Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.
- Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for safety; communicates safety concerns; uses and promotes safe behaviors based on training and lessons learned.
- Subject to and expected to comply with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in Stanford's Administrative Guide, http://adminguide.stanford.edu.