Mary Wood

Data scientist with a passion for health research

Phase Genomics, Seattle, WA

Senior Data Scientist, January 2024 – February 2025
Data Scientist III, January 2023 – December 2023
Data Scientist II, July 2022 – December 2022
Data Scientist, July 2021 – June 2022

  • Developed and optimized machine learning models (PyTorch, Keras, Hugging Face) to predict cancer-associated structural variants from proximity ligation sequencing data, improving the mean average precision for detecting chromosomal translocations and inversions by 15%
  • Led the development of machine learning and dataset creation toolkits, enhancing team efficiency by building reusable utilities for MLflow logging, exploratory data analysis, and model evaluation
  • Streamlined cloud-based workflows using AWS Lambda and Terraform, automating variant prediction and reporting processes
  • Spearheaded efforts to improve single nucleotide variant (SNV) detection by retraining Google’s DeepVariant on proximity ligation sequencing data, increasing precision by more than 10% over the original model
  • Mentored junior data scientists, helping to define project goals, resolve project challenges, and review code to maintain standards

Bioinformatics Research Analyst, December 2020 – June 2021

  • Assembled, scaffolded, and phased genomes for diverse species, and conducted bioinformatics analyses to identify genomic interactions
  • Consulted with customers to optimize experimental design and data interpretation

Portland VA Research Foundation, Portland, OR

Research Associate, January 2018 – December 2020

I was the primary developer and data analyst on the team of an early-career Principal Investigator, which gave me the opportunity to take the lead on study design, independently learn analysis techniques for new projects, and coordinate with graduate students to support their work alongside my own.

  • Developed and benchmarked open-source, extensively unit-tested, pip-installable Python software for neoepitope prediction, and mentored student interns in contributing new features to the package
  • Trained random forest and support vector machine models using scikit-learn to aid in the prediction of immunotherapy outcomes by characterizing proteasomal cleavage sites based on protein sequence context
  • Analyzed large-scale sequencing datasets from cancer patients to identify predictors of response to immunotherapy, demonstrating the limitations of traditional metrics such as tumor mutational burden

Oregon Health and Science University, Portland, OR

Computational Biology Intern, January 2017 – December 2017

  • Developed a neoepitope prediction method that improved upon existing methods by incorporating phasing of variants in shared haplotypes
  • Identified population-level distributions of neoepitopes in genomic data from The Cancer Genome Atlas and evaluated criteria for improved prioritization of predicted neoepitopes
  • Used and adapted a variety of bioinformatics software for sequence alignment, variant calling, HLA type prediction, neoepitope prediction, and more

University of Oregon, Eugene, OR

Graduate Teaching Fellow, Department of Biology, January 2015 – December 2016

  • Led two weekly laboratory sections of 16–32 students each for undergraduate biology courses
  • Met with students in office hours and by appointment to provide guidance on assignments and improve understanding of course material
  • Evaluated and provided constructive feedback on student papers, presentations, tests, and lab reports

Undergraduate Research Assistant, January 2012 – December 2014

  • Consulted with Principal Investigators and graduate students to maintain essential operations of the laboratory
  • Mentored new employees on laboratory regulations and organized sessions to work on collaborative projects
  • Presented at several laboratory meetings each year on scientific topics of interest, such as vector-borne diseases