RNL Research Computing Center
About the Department
The University of Chicago Research Computing Center (RCC), a unit in the Office of Research and National Laboratories (RNL), provides high-end research computing resources to researchers at the University of Chicago. It is dedicated to enabling research by providing access to centrally managed High Performance Computing (HPC), storage, and visualization resources. These resources include hardware, software, high-level scientific and technical user support, and the education and training required to help researchers make full use of modern HPC technology and local and national supercomputing resources.
The Office of Research and National Laboratories oversees the conduct of sponsored research, research program development, multi-institutional research institutes, national laboratory board, and contract management functions. RNL supports the development and coordination of research-related communications and educational programs at The University of Chicago. RNL oversees the management of two Department of Energy contracts for Argonne National Laboratory and Fermi National Accelerator Laboratory. When combined with the Lab R&D budgets, the office oversees approximately $1.4 billion in sponsored research. RNL works closely with individual scholars, departments, and divisions to encourage, seed, and coalesce research across the University, Argonne, and Fermilab campuses.
The University of Chicago is seeking a highly qualified Senior HPC System Administrator to join the system and operation team that builds and manages RCC HPC systems and facility operations. The individual in this position will be involved in the procurement and management of HPC hardware and software.
The job designs automated, scalable, and rapidly deployable solutions to infrastructure development and server configuration. Works independently to install, configure, and maintain operating systems. Uses best practices and systems knowledge to monitor and alert systems, utility software, and firewalls. Guides maintenance for production servers as well as Windows and Linux servers.
- Installs, configures, and maintains large computer clusters/servers and software.
- Oversees day-to-day operations of the systems including systems administration, monitoring and storage performance up to and including network components. Manages the system's network switch, parallel file system and HPC software stack and tools.
- Configures the scheduling and queuing system.
- Diagnoses and resolves system operational problems quickly and effectively. Coordinates with vendors to resolve hardware and software problems. Assists users with access and other help desk ticket requests or issues.
- Uses scripting/programming skills to enable system-level automation, problem detection, security maintenance and patch management.
- Builds and deploys open source software and software from vendors/partners.
- Provides reliable and efficient backups/restores for all managed systems.
- Documents system administration procedures for routine and complex tasks.
- Maintains and monitors the security of the HPC systems and servers.
- Plans and installs necessary patches and upgrades for servers and their associated storage, network, communications, and peripheral sub-systems. Installs and maintains an appropriate level of intrusion detection, monitoring, and auditing software as required.
- Tracks compliance and maintains documentation for hardware, software, and service inventories for management reports.
- Performs other related work as needed.
Minimum requirements include a college or university degree in a related field.
Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.
Certifications: ---
Preferred Qualifications
Technical Skills or Knowledge:
- Installing, configuring, and maintaining job management tools (such as SLURM, Moab, TORQUE, PBS, etc.).
- Configuring, installing and troubleshooting MPI and OpenMP.
- Operating system deployment tools (e.g. xCAT, Rocks).
- Configuring, administering, and supporting network storage subsystems (e.g. IBM, NetApp, DataDirect Networks, LSI, etc.).
- Hands-on experience with at least one distributed file system (Spectrum Scale/GPFS, Lustre, BeeGFS, Gluster, IBRIX, PVFS, etc.).
- Direct experience working with InfiniBand (must at least be able to demonstrate a working knowledge of InfiniBand concepts, OFED layers, and subnet managers).
- Configuring, installing, tuning and maintaining scientific application software on large-scale systems.
- Experience supporting HPC compilers and libraries.
- Experience with systems automation tools such as Ansible or Puppet.
- Configuring, installing, maintaining and/or using performance monitoring and optimization tools.
- Work well with faculty and researchers.
- Identify and gain expertise in appropriate new technologies and/or software tools.
- Function as part of an interactive team while demonstrating self-initiative to achieve project goals and the Research Computing Center's mission.
- Strong analytical skills and problem-solving ability.
- Resume/CV (required)
- Cover Letter (preferred)
When applying, the document(s) MUST be uploaded via the My Experience page, in the section titled Application Documents of the application.
Job Family: Information Technology
Role Impact: Individual Contributor
FLSA Status:
Monthly
Scheduled Weekly Hours:
Yes
Requires Compliance with University Covid-19 Vaccination Requirement: Yes
Drug Test Required: No
Health Screen Required: No
Motor Vehicle Record Inquiry Required:
Employees must comply with the University's COVID-19 vaccination requirements. More information about the requirements can be found on the University of Chicago Vaccination GoForward.
The University of Chicago is an Affirmative Action/Equal Opportunity/Disabled/Veterans Employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national or ethnic origin, age, status as an individual with a disability, protected veteran status, genetic information, or other protected classes under the law. For additional information please see the University's Notice of Nondiscrimination.
Staff job seekers in need of a reasonable accommodation to complete the application process should call 773-702-5800 or submit a request via the Applicant Inquiry Form.
We seek a diverse pool of applicants who wish to join an academic community that places the highest value on rigorous inquiry and encourages a diversity of perspectives, experiences, groups of individuals, and ideas to inform and stimulate intellectual challenge, engagement, and exchange.
All offers of employment are contingent upon a background check that includes a review of conviction history. A conviction does not automatically preclude University employment. Rather, the University considers conviction information on a case-by-case basis and assesses the nature of the offense, the circumstances surrounding it, the proximity in time of the conviction, and its relevance to the position.
The University of Chicago's Annual Security & Fire Safety Report (Report) provides information about University offices and programs that provide safety support, crime and fire statistics, emergency response and communications plans, and other policies and information. The Report can be accessed online at: http://securityreport.uchicago.edu. Paper copies of the Report are available, upon request, from the University of Chicago Police Department, 850 E. 61st Street, Chicago, IL 60637.
This job has expired.