Johns Hopkins, founded in 1876, is America's first research university and home to nine world-class academic divisions working together as one university.

The Johns Hopkins Data Science and AI Institute (DSAI) is a new pan-institutional initiative at Johns Hopkins to advance artificial intelligence and its applications, in part through investments in the software engineering, data science, and machine learning space. DSAI is focused on revolutionizing discovery by advancing artificial intelligence that evolves collaboratively with human intelligence, combining the strengths of each for the betterment of society and the world in which we live. DSAI will bring together the mathematical, computational, and ethical foundations of AI with the domains of Health & Medicine, Scientific Discovery, Engineered Systems, Security & Safety, and People, Policy & Governance.

DSAI seeks a Research Software Engineer - Clinical NLP Specialty with a strong academic background and relevant experience in industry or academia focused on designing and building state-of-the-art clinical NLP systems. This position supports research initiatives in the development and novel application of NLP and large language models to extract insights from unstructured clinical text using techniques such as named entity recognition (NER), negation detection, structured data extraction, diagnosis prediction, risk stratification, temporal reasoning and phenotyping. The successful candidate will play a critical role in designing, implementing, rigorously evaluating, deploying and maintaining robust and scalable NLP pipelines and models to extract meaningful information from unstructured clinical text in secure environments, with the goal of enabling high-impact solutions across a range of biomedical domains. Experience with large language models (fine-tuning, prompt engineering, model evaluation, and adapting foundation models for domain-specific clinical tasks) is desirable, particularly in contexts that demand privacy, robustness, and interpretability. The clinical NLP Research Software Engineer (RSE) will work closely with clinicians, informatics researchers, data scientists and other RSEs to ensure NLP systems meet application goals with methodological rigor and scientific reproducibility.

Specific Duties & Responsibilities

The successful candidate will participate in ground-breaking research projects that need advanced software solutions requiring expertise in software engineering not commonly found in scientific collaborations.
The projects will require development of state-of-the-art clinical NLP solutions using the latest deep learning libraries trained on state-of-the-art hardware in secure healthcare computing environments.
Projects will involve analysis of massive data sets either in the cloud or on premises.
Projects will require development of novel NLP software pipelines for processing of unstructured clinical notes.
Some projects may require deep engagement, possibly leading to co-authorship on scientific publications, while others may involve a more casual consulting engagement.
Projects may require software solutions developed from scratch or refactoring of existing solutions to conform to industry standards (quality, efficiency, reusability, robustness, portability, documentation, etc.).
It is a high-level goal of DSAI to translate the efforts from individual projects into frameworks and template patterns for sustainable scientific infrastructure benefiting future projects.

Special knowledge, skills, and abilities

Strong NLP, LLM, machine learning and deep learning skills.
Practical experience building NLP models and pipelines in a secure, HIPAA-compliant healthcare environment.
Expert-level knowledge of multiple modern NLP and LLM libraries and models.
Hands-on experience adapting and fine-tuning large language models for domain-specific clinical applications, with attention to data efficiency, interpretability, and reproducibility.
Demonstrated expertise in prompt engineering, evaluation, and benchmarking of large language models, including applying responsible AI principles in clinical or sensitive-data contexts.
Expert-level knowledge of the Python programming language.
Familiarity with or willingness to learn C++ or other languages as may be needed.
Familiarity with software containerization technologies such as Docker and Singularity.
Familiarity with the Databricks platform.
Fluency in the Linux operating system and related tools.
Familiarity with modern software engineering best practices, such as Git source control, peer code review, test-driven development, build automation and continuous integration / continuous delivery.
Familiarity with cloud development and deployment.
Demonstrated leadership and self-direction.
Willingness to teach others both informally and in short course format.
Willingness to continually learn new tools and techniques as needed.
Excellent verbal and written communication.

Minimum Qualifications

Master's degree in a quantitative discipline such as computer science, engineering, physics or bioinformatics, with strong scientific computing and/or mathematics background.
Three years' experience working in software development on large clinical NLP projects in industry or academia.
Additional education may substitute for required experience, and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula.

Preferred Qualifications

PhD in a quantitative discipline.
Five (5) years’ experience as above in clinical NLP.
Experience in CUDA GPU programming.
Experience authoring open-source Python packages on PyPI.
Experience in open-source project governance.
Experience in open-source community adoption initiatives.

Research Software Engineer – Clinical NLP (Data Science & AI Institute)

Johns Hopkins University

Johns Hopkins University, Baltimore, MD, USA