Information is Beautiful - Silver
Why are So Many Babies Born Around 8am
It is common to aggregate data over time when striving for simplicity, but sometimes the story is revealed in the details: minutes, hours, days, and weeks.
Visualizing Data for Analysis & Decision Making
As a data visualization and data strategy consultant, I change how people (literally) look at their data, resulting in new discoveries, deeper understanding, and better decision-making.
This includes advising data analysts, scientists, product designers, and software engineers, collaborating with data viz teams, and prototyping data visualization solutions to data analysis challenges. I've created visualizations for Google Research, Scientific American, National Geographic, Stanford, Yale, UC Berkeley, CDC, Santa Clara public health department, ACLU, Tailscale, Kickstarter, and even the Dalai Lama. My perspective in data visualization is grounded in my work forecasting/analyzing revenue data and advising executive stakeholders as a Senior Financial Analyst at Google.
Activation Atlas
The Activation Atlas helps us see through the eyes of a model: not just how it responds to a specific input, but a map of what is "seeable" at each layer. This work was led by Shan Carter; I contributed to the interactives and .
Metagenomics
Originally created for researchers at UC Berkeley to study the genetics of micro-ecosystems like the gut of an infant, the inside of a nuclear reactor, or between a dolphin's teeth, this visualization was also embraced by the art & design community.
My favorite part was the bold colors. They met a scientific need: despite being close numerically, the values 0 and 1 were shown in black and bright colors, respectively, because in the scientific context "none" and "at least 1" were conceptual opposites.
This talk introduces the three simple, flexible "tools" of data visualization: making color meaningful, using small multiples, and making order matter. Furthermore, it shows *why* these tools are so powerful at revealing information that might have been visible, but not noticeable or attention-grabbing. Most importantly, it demonstrates how practitioners can use these principles to bring their own domain expertise or understanding of the question into their visualizations, and to make their implicit assumptions explicit.
In this talk you'll learn why your commute is worse in November than June, why so many babies are born around 8:30am, and why it's almost always a terrible idea to look at monthly data.
Published on Stack Overflow's blog, this article shows how "by aggregating our data in an effort to simplify it, we lose the signal and the context we need to make sense of what we’re seeing."
Simpson's Paradox is an extreme version of a notoriously confusing effect in which aggregate numbers seem to contradict more detailed statistics. A "comet chart" is a special type of scatterplot that reveals "changes in mix". In addition to showing the metric of interest on the y-axis, it also shows the size of each subgroup on the x-axis. It's immediately obvious if comets are all "streaking together", going all different directions, or if there is some correlation between where the comets are spatially and how they are changing.
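As a minimal sketch of the mix effect described above (the subgroup names, sizes, and rates are made-up numbers, and the tail/head naming for each comet is an assumption for illustration, not the published implementation), the following Python shows every subgroup's rate rising while the aggregate rate falls, because the mix shifts toward the lower-rate subgroup:

```python
def aggregate_rate(groups):
    """Size-weighted average rate across subgroups.

    `groups` maps a subgroup name to a (size, rate) pair.
    """
    total_size = sum(size for size, _ in groups.values())
    return sum(size * rate for size, rate in groups.values()) / total_size


# Hypothetical data: (subgroup size, metric of interest) in two periods.
before = {"X": (900, 0.50), "Y": (100, 0.10)}
after = {"X": (100, 0.55), "Y": (900, 0.15)}

# One "comet" per subgroup: the tail sits at the earlier (size, rate) point
# and the head at the later one, so x encodes size and y encodes the metric.
comets = {name: {"tail": before[name], "head": after[name]} for name in before}

# Each subgroup improved on its own...
every_group_improved = all(
    comet["head"][1] > comet["tail"][1] for comet in comets.values()
)

# ...yet the aggregate got worse, because most of the weight moved to "Y".
aggregate_before = aggregate_rate(before)
aggregate_after = aggregate_rate(after)

print(every_group_improved, aggregate_before, aggregate_after)
```

Plotting each comet as a line from tail to head would make the shift in subgroup size visible next to the change in the metric, which is the information the aggregate number hides.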
Visualizing Mix Effects and Simpson's Paradox, in collaboration with Martin Wattenberg
Comet Chart & Gerrymandering: Slides
Telling the story of using math to create shapes that convey feelings.
While some data visualization specialists are focused on effective communication and storytelling with data, much of my work is aimed at helping domain experts better use data to inform decisions or make new discoveries. For example, as a member of the Google Applied Science team I contributed to research on drug discovery, materials science, automating insight from biological images, fusion, improving scientific computing with machine learning, and more. While I sometimes consulted on scientific communication, my focus was not on communicating results. Rather, it was on working with the scientists during the discovery and analysis process: advising, prototyping, performing viz-based analysis, and building analysis tools. Success was defined as visualizations changing the scientific decisions (which model is chosen, what experiment to perform, how to change experimental conditions, etc.), directly leading to a discovery, or using scientific data to inform a "real-world" decision.
Over the course of three years, I worked closely with scientists and ML experts as they built models to create new methods in the effort to find new anti-malarial drugs.
In the scientific context, authorship is reserved for contributions to the scientific results. So, my inclusion as a contributing author indicates the critical role visualization played in this work.
At Observable, I led this project to illustrate how we can literally change how we look at our data to more effectively analyze data heavily influenced by hour-of-day, day-of-week, and season-of-year patterns. This work was done in collaboration with Ian Johnson and Mike Freeman. Through a series of 6 "stories" we showcase techniques that can be applied to any dataset with values measured each hour or day.
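One small way to see why day-of-week patterns matter when data is rolled up by month is to count how many weekdays each month contains. The sketch below is an illustration under that assumption, not code from the project; the helper name `weekday_counts` is hypothetical and it uses only Python's standard `calendar` module:

```python
import calendar


def weekday_counts(year, month):
    """Return (weekdays, weekend_days) for the given month."""
    weekdays = 0
    weekend_days = 0
    # itermonthdays2 yields (day_of_month, weekday) pairs; day_of_month is 0
    # for padding days that belong to the neighboring months.
    for day, weekday in calendar.Calendar().itermonthdays2(year, month):
        if day == 0:
            continue
        if weekday < 5:  # Monday=0 ... Friday=4
            weekdays += 1
        else:
            weekend_days += 1
    return weekdays, weekend_days


february = weekday_counts(2021, 2)
march = weekday_counts(2021, 3)
print(february, march)
```

For a metric that is mostly generated on weekdays, a month with more weekdays would show a higher monthly total even if nothing about daily behavior changed, which is one reason to look at the data by day or hour rather than by month.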
When infected with a disease, we poop fragments of the viral RNA. Sampling wastewater and isolating/sequencing the extracted genes therefore provides unique insight into community-level transmission of infectious diseases, including Covid-19. Because "everybody contributes" their poop to the sewage system, tracking viral fragments in wastewater provides data that doesn't rely on individual people going to doctors to get official tests.
This LA Times article demonstrates the impact that this data has already had on the decisions of public health officials and of doctors choosing which drugs to use to treat patients based on which variant is predominant.
Other pathogens: Monkeypox, RSV, Influenza, etc
Comparing variants by location
My work is not just about specific charts, but also about identifying new best practices and changing how researchers think about using viz more effectively in their own work.
"That was probably one of the most valuable hours that I spent in my entire time at [Google] X, completely shifted my thinking on how I do figures and what is important. Zan's skillset is unique in my experience."
- Sylvia Smullin - Physicist and ML Researcher
The design emphasized the importance of the two features that made this dataset special: quantitative accuracy and gene expression assayed by cell type.
This visualization played a role in the analysis leading to A multiregional proteomic survey of the postnatal human brain, published in Nature Neuroscience.
My first attempt was a beautiful, interactive 3D surface. Unfortunately, it was useless for analyzing the data. Instead, these 48 line charts show 10x more data and focus attention on the most important attributes of the data. Most importantly, the tool informed key research decisions and led to discovering an unexpected scientific phenomenon.
My contributions led to my being an author on scientific research published in Proceedings of the National Academy of Sciences (PNAS), the world's second most cited scientific journal.
Discovery of complex oxides via automated experiments and data science
A common problem in ML is understanding the interconnected relationships between 3 to 20 potential features in a dataset to identify what combinations of features most affect a "score" variable. My key data visualization insight was identifying that if we discretized the continuous score function into "bad, meh, good, better, and best" categories we could much more effectively use color and small multiples to scan the space of pairwise interactions. This was especially important in this study where the ML algorithms informed growing conditions for algae, so measuring the effectiveness of a particular set of parameters was costly and time-consuming because it required actually growing the algae in the lab.
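The discretization idea above can be sketched in a few lines of Python. This is an illustration only: the cutoff values and feature names are hypothetical, and the original study may have chosen its category boundaries differently.

```python
from bisect import bisect_right
from itertools import combinations

# The five ordered categories named in the text.
LABELS = ["bad", "meh", "good", "better", "best"]


def discretize(score, cutoffs):
    """Map a continuous score to one of the five labels.

    `cutoffs` must hold four ascending thresholds; a score equal to a
    threshold falls into the higher category.
    """
    return LABELS[bisect_right(cutoffs, score)]


# Hypothetical thresholds and feature names, for illustration only.
cutoffs = [0.2, 0.4, 0.6, 0.8]
features = ["light", "temperature", "ph"]

# One small-multiple panel per pair of features; within each panel, points
# would be colored by their discretized score category.
panels = list(combinations(features, 2))

categories = [discretize(score, cutoffs) for score in (0.1, 0.5, 0.95)]
print(panels, categories)
```

With only five categories, each one can get a clearly distinguishable color, so the same color scale can be reused across every pairwise panel and scanned quickly.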
While my primary role on the Applied Sciences team was creating visualizations for analysis & discovery, I also advised researchers across the team on how to more effectively communicate their results.
The "curse of knowledge" bias is especially challenging in transitioning from analysis to communication: once you know what's important about the data in the chart, it's hard to imagine that someone else looking at the same chart wouldn't see the same thing.
Optimization of Molecules via Deep Reinforcement Learning
A Bayesian experimental autonomous researcher for mechanical design.
Machine learning on DNA-encoded libraries: A new paradigm for hit-finding
Quantum Optimization with a Novel Gibbs Objective Function and Ansatz Architecture Search
Investigating Quantum Approximate Optimization Algorithms under Bang-bang Protocols
In March of 2020, I founded and led the Data Viz Hub: a group of Googlers with data viz skills lending their expertise to projects across Google related to Covid-19. For example, our group contributed to the Google Search and News teams' visualizations of covid statistics.
Additionally, Adam Pearce and I collaborated to create visualizations and thought questions to contextualize the data in Google's Community Mobility Reports.
I developed practice exercises for the data visualization chapter of this college-level textbook: Statistics: Unlocking the Power of Data.
These jittering lines still make me smile, even long after I first explored animation as a student at the School for Poetic Computation.
Using Voronoi polygons to capture click events in Weather Lines sparked my curiosity. The result was an "explorable" that reveals the relationship among points, the bisecting lines that form triangles, circles, and the resulting polygons.
Everyone who's watched West Wing knows that Greenland is not as big as it seems in Mercator projections. But, do you have a good intuition for the relative sizes of Saudi Arabia vs. Alaska or Europe vs. Antarctica?
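A point's Voronoi polygon is, by definition, the region of the plane closer to that point than to any other. That is why the polygons work for capturing clicks: asking which polygon contains a click gives the same answer as asking which point is nearest. The sketch below shows that nearest-point lookup in plain Python; it is not the code behind Weather Lines, and the function name and coordinates are hypothetical.

```python
import math


def nearest_point_index(points, click):
    """Return the index of the point closest to `click`.

    This is the point whose Voronoi polygon contains the click location.
    """
    return min(range(len(points)), key=lambda i: math.dist(points[i], click))


# Hypothetical screen coordinates for three data points and one click.
points = [(0, 0), (10, 0), (5, 8)]
click = (6, 1)

hit = nearest_point_index(points, click)
print(hit, points[hit])
```

Precomputing the polygons does the same job up front, so that each polygon can simply act as an enlarged click target for its point.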
This set of charts explores the relationship between perceptual uniformity and perceptual distance for 11 common colormaps, and it illustrates an alternative way to achieve uniformity by "straightening" the colormap.
Long ago I shared two rainbow colored "tiny tools" that I'd created to make it easier to choose formatting strings, and they ended up being surprisingly helpful to many people.
I lost a bet. It is colder in January than June in San Francisco, every single hour of the day.
Young babies love looking at high contrast cards, and the designs make viz-loving new parents smile.
PS - for V2, I would like to print on sturdier materials. Contact me if you know how to order custom prints on materials a baby can put in their mouth.
Keynote speaker for Google's global quantitative user experience conference
SciPy Conference: Data Visualization for Scientific Analysis, featured on the Storytelling with Data podcast Right Place, Right Graph
Joint Statistical Meetings (JSM): Recent Advances in Information Visualization panel
University of Washington's eSciences Seminar Series - Distinguished Speaker
Moss Landing Marine Labs Seminar Series: Same Data, Different Forms
Panel Discussion with Kerry Rodden and Kai Chang on Creating Custom, but Generalizable, Charts
Maryland Institute College of Art's graduate program in Data Visualization
AI for ALL
USF Digital Literacy Course
TechChange Technology for Data Visualization
UC Davis SIAM Mathematics in Industry Speaker Series
Outlier Conf: Led discussion section on domain-specific charts
Pro bono consulting for ACLU.
Intro to Data Visualization for CS teachers in the SF Public School system.
Guest teacher at Lowell High School, Lick-Wilmerding High School, Burton High School, AP Giannini Middle School, and BB&N Lower School.
Drawing with Data Workshop
Bay Area D3 Meetups including "Math to D3" talk.
Stamen Practitioners Series: A Conversation with Zan Armstrong
Jon Schwabish's Policy Viz Podcast
Scott Murray's Interactive Data Visualization for the Web
Ian Johnson's: How do you learn D3.js
You can reach me by email.