Selected publications

For a full list, please visit Google Scholar.

Automated detection of glaucoma with interpretable machine learning using clinical data and multi-modal retinal images

Glaucoma, the leading cause of irreversible blindness worldwide, is a disease that damages the optic nerve. Current machine learning (ML) approaches for glaucoma detection rely on features such as retinal thickness maps; however, the high rate of segmentation errors when creating these maps increases the likelihood of faulty diagnoses. This paper proposes a new, comprehensive, and more accurate ML-based approach for population-level glaucoma screening. Our contributions include: (1) a multi-modal model built upon a large data set that includes demographic, systemic and ocular data as well as raw image data taken from color fundus photos (CFPs) and macular Optical Coherence Tomography (OCT) scans, (2) model interpretation to identify and explain data features that lead to accurate model performance, and (3) model validation via comparison of model output with clinician interpretation of CFPs. We also validated the model on a cohort that was not diagnosed with glaucoma at the time of imaging but eventually received a glaucoma diagnosis. Results show that our model is highly accurate (AUC 0.97) and interpretable. It validated biological features known to be related to the disease, such as age, intraocular pressure and optic disc morphology. Our model also points to previously unknown or disputed features, such as pulmonary capacity and retinal outer layers.

Parmita Mehta, Christine A Petersen, Joanne C Wen, Michael R Banitt, Philip P Chen, Karine D Bojikian, Catherine Egan, Su-In Lee, Magdalena Balazinska, Aaron Y Lee & Ariel Rokem (last two authors contributed equally)

American Journal of Ophthalmology

Evaluating the reliability of human brain white matter tractometry

The validity of research results depends on the reliability of analysis methods. In recent years, there have been concerns about the validity of research that uses diffusion-weighted MRI (dMRI) to understand human brain white matter connections in vivo, in part based on the reliability of the analysis methods used in this field. We defined and assessed three dimensions of reliability in dMRI-based tractometry, an analysis technique that assesses the physical properties of white matter pathways: (1) reproducibility, (2) test-retest reliability and (3) robustness. To facilitate reproducibility, we provide software that automates tractometry. In measurements from the Human Connectome Project, as well as clinical-grade measurements, we find that tractometry has high test-retest reliability that is comparable to most standardized clinical assessment tools. We find that tractometry is also robust, showing high reliability with different choices of analysis algorithms. Taken together, our results suggest that tractometry is a reliable approach to analysis of white matter connections. The overall approach taken here both demonstrates the specific trustworthiness of tractometry analysis and outlines what researchers can do to demonstrate the reliability of computational analysis pipelines in neuroimaging.

John Kruper, Jason Yeatman, Adam Richie-Halford, David Bloom, Mareike Grotheer, Sendy Caffarra, Gregory Kiar, Iliana Karipidis, Ethan Roy & Ariel Rokem

Aperture (in press)

Multidimensional analysis and detection of informative features in diffusion MRI measurements of human white matter

The white matter contains long-range connections between different brain regions and the organization of these connections holds important implications for brain function in health and disease. Tractometry uses diffusion-weighted magnetic resonance imaging (dMRI) data to quantify tissue properties along the trajectories of these connections. In the present work, we developed a method based on the sparse group lasso (SGL) that takes into account tissue properties measured along all of the bundles, and selects informative features by enforcing sparsity, not only at the level of individual bundles, but also across the entire set of bundles and all of the measured tissue properties. The sparsity penalties for each of these constraints are identified using a nested cross-validation scheme that guards against over-fitting and simultaneously identifies the correct level of sparsity. SGL makes it possible to leverage the multivariate relationship between diffusion properties measured along multiple bundles to make accurate predictions of subject characteristics while simultaneously discovering the most relevant features of the white matter for the characteristic of interest.

Adam Richie-Halford, Jason Yeatman, Noah Simon & Ariel Rokem

PLoS Computational Biology: 17(6): e1009136
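
For readers unfamiliar with the sparse group lasso mentioned in the abstract above, its objective is commonly written in the following general form. This is the textbook formulation, not necessarily the exact parametrization used in the paper, and all symbols are assumptions introduced here for illustration: $n$ subjects, $G$ predefined groups of features (for tractometry, plausibly one group per bundle and tissue property), $X^{(\ell)}$ and $\beta^{(\ell)}$ the columns and coefficients of group $\ell$, $p_\ell$ the size of that group, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the balance between the two penalties.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n} \left\| y - \sum_{\ell=1}^{G} X^{(\ell)} \beta^{(\ell)} \right\|_2^2
  + (1 - \alpha)\,\lambda \sum_{\ell=1}^{G} \sqrt{p_\ell}\, \left\| \beta^{(\ell)} \right\|_2
  + \alpha\,\lambda\, \left\| \beta \right\|_1
```

The group term (the sum of unsquared $\ell_2$ norms) can set entire groups to zero, while the $\ell_1$ term zeroes individual coefficients within the groups that survive. In this notation, $\lambda$ and $\alpha$ are the kind of hyperparameters that a nested cross-validation scheme would select.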

Groupyr: Sparse Group Lasso in Python

Groupyr is a scikit-learn compatible implementation of the sparse group lasso linear model. It is intended for high-dimensional supervised learning problems where related covariates can be assigned to predefined groups.

Adam Richie-Halford, Manjari Narayan, Jason Yeatman, Noah Simon & Ariel Rokem

Journal of Open Source Software, 6(58), 3024
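
To make the idea of penalizing predefined groups of covariates concrete, here is a minimal pure-Python sketch of the proximal operator of the sparse group lasso penalty: elementwise soft-thresholding followed by group-wise shrinkage. It is an illustration only and not Groupyr's API or implementation; the function name `sgl_prox` and the parameters `lam` and `alpha` are hypothetical, and a step size of 1 is assumed.

```python
import math


def sgl_prox(beta, groups, lam, alpha):
    """Illustrative proximal step for the sparse group lasso penalty.

    beta   : list of floats, the coefficient vector
    groups : list of lists of indices (assumed non-overlapping)
    lam    : overall penalty strength (step size assumed to be 1)
    alpha  : mixing weight; 1.0 is pure lasso, 0.0 is pure group lasso
    """
    out = list(beta)
    for idx in groups:
        # Step 1: elementwise soft-thresholding (the lasso part).
        soft = []
        for i in idx:
            magnitude = abs(beta[i]) - alpha * lam
            soft.append(math.copysign(magnitude, beta[i]) if magnitude > 0 else 0.0)

        # Step 2: shrink the whole group toward zero (the group lasso part).
        norm = math.sqrt(sum(v * v for v in soft))
        threshold = (1.0 - alpha) * lam * math.sqrt(len(idx))
        scale = max(1.0 - threshold / norm, 0.0) if norm > 0 else 0.0

        for i, v in zip(idx, soft):
            out[i] = scale * v
    return out


# Hypothetical coefficients: one strong group and one weak group.
coefficients = [3.0, -0.2, 0.1, 0.05]
groups = [[0, 1], [2, 3]]
shrunk = sgl_prox(coefficients, groups, lam=0.5, alpha=0.5)
print(shrunk)
```

In this toy input the second group is removed entirely, and within the first group only the large coefficient survives (shrunk to roughly 2.396), which shows sparsity acting both across groups and within a group.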

Combining citizen science and deep learning to amplify expertise in neuroimaging

Combining citizen science and deep learning can generalize and scale expert decision making; this is particularly important in disciplines where specialized, automated tools do not yet exist. In Braindr, expert-labeled data were amplified by citizen scientists through a simple web interface. A deep learning algorithm was then trained to predict data quality, based on citizen scientist labels. Deep learning performed as well as specialized algorithms for quality control (AUC = 0.99).

Anisha Keshavan, Jason Yeatman & Ariel Rokem

Frontiers in Neuroinformatics, 13: 29

Cloudknot: A Python library to run your existing code on AWS Batch

In the quest to minimize time-to-first-result, many computational scientists are turning to cloud-based distributed computing with commercial vendors like Amazon to run their computational workloads. Yet cloud computing remains inaccessible to many researchers. Cloudknot takes as input a Python function, Dockerizes it for use in an Amazon ECS instance, and creates all the necessary AWS Batch constituent resources to submit jobs. You can then use cloudknot to submit and view jobs for a range of inputs.

Adam Richie-Halford & Ariel Rokem

Proceedings of the 17th Python in Science Conference (2018): 8 - 14

A browser-based tool for visualization and analysis of diffusion MRI data

Human neuroscience research faces several challenges with regard to reproducibility. While scientists are generally aware that data sharing is important, it is not always clear how to share data in a manner that allows other labs to understand and reproduce published findings. Here we report a new open source tool, AFQ-Browser, that builds an interactive website as a companion to a diffusion MRI study. Because AFQ-Browser is portable (it runs in any web browser), it can facilitate transparency and data sharing. Moreover, by leveraging new web-visualization technologies to create linked views between different dimensions of the dataset (anatomy, diffusion metrics, subject metadata), AFQ-Browser facilitates exploratory data analysis, fueling new discoveries based on previously published datasets. In an era where Big Data is playing an increasingly prominent role in scientific discovery, so will browser-based tools for exploring high-dimensional datasets, communicating scientific discoveries, aggregating data across labs, and publishing data alongside manuscripts.

Jason Yeatman, Adam Richie-Halford, Josh Smith, Anisha Keshavan & Ariel Rokem

Nature Communications: 9, Article number: 940

Hack weeks as a model for data science education and collaboration

As scientific disciplines grapple with more datasets of rapidly increasing complexity and size, new approaches are urgently required to introduce new statistical and computational tools into research communities and improve the cross-disciplinary exchange of ideas. In this paper, we introduce a type of scientific workshop, called a hack week, which allows for fast dissemination of new methodologies into scientific communities and fosters exchange and collaboration within and between disciplines. We present implementations of this concept in astronomy, neuroscience, and geoscience and show that hack weeks produce positive learning outcomes, foster lasting collaborations, yield scientific results, and promote positive attitudes toward open science.

Daniela Huppenkothen, Anthony Arendt, David W. Hogg, Karthik Ram, Jacob T. VanderPlas & Ariel Rokem

PNAS, 115: 8872-8877

A model of ganglion axon pathways accounts for percepts elicited by retinal implants

Degenerative retinal diseases such as retinitis pigmentosa and macular degeneration cause irreversible vision loss in more than 10 million people worldwide. Retinal prostheses, now implanted in over 250 patients worldwide, electrically stimulate surviving cells in order to evoke neuronal responses that are interpreted by the brain as visual percepts (‘phosphenes’). However, instead of seeing focal spots of light, current implant users perceive highly distorted phosphenes that vary in shape both across subjects and electrodes. We characterized these distortions by asking users of the Argus retinal prosthesis system (Second Sight Medical Products Inc.) to draw electrically elicited percepts on a touchscreen. Using ophthalmic fundus imaging and computational modeling, we show that elicited percepts can be accurately predicted by the topographic organization of optic nerve fiber bundles in each subject’s retina, successfully replicating visual percepts ranging from ‘blobs’ to oriented ‘streaks’ and ‘wedges’ depending on the retinal location of the stimulating electrode. This provides the first evidence that activation of passing axon fibers accounts for the rich repertoire of phosphene shape commonly reported in psychophysical experiments, which can severely distort the quality of the generated visual experience. Overall our findings argue for more detailed modeling of biological detail across neural engineering applications.

Michael Beyeler, Devyani Nanduri, James D Weiland, Ariel Rokem, Geoffrey M Boynton & Ione Fine

Scientific Reports, 9:9199

`pulse2percept`: A Python-based simulation framework for bionic vision

By 2020 roughly 20 million people worldwide will suffer from photoreceptor diseases such as retinitis pigmentosa and age-related macular degeneration, and a variety of retinal sight restoration technologies are being developed to target these diseases. One technology, analogous to cochlear implants, uses a grid of electrodes to stimulate remaining retinal cells. Two brands of retinal prostheses are currently approved for implantation in patients with late-stage photoreceptor disease. Clinical experience with these implants has made it apparent that the vision restored by these devices differs substantially from normal sight. To better understand the outcomes of this technology, we developed pulse2percept, an open-source Python implementation of a computational model that predicts the perceptual experience of retinal prosthesis patients across a wide range of implant configurations. A modular and extensible user interface exposes the different building blocks of the software, making it easy for users to simulate novel implants, stimuli, and retinal models. We hope that this library will contribute substantially to the field of medicine by providing a tool to accelerate the development of visual prostheses.

Michael Beyeler, Geoffrey M. Boynton, Ione Fine & Ariel Rokem

Proceedings of the 16th Python in Science Conference (2017): 81 - 88