How We Built a COVID-19 Research Dashboard That Doctors Actually Used

This is the story of a system that was built twice. The first time, it replaced a failing research tool with a modern progressive analysis engine. The second time — extended overnight — it became a public dashboard that won an international data challenge and was used by frontline doctors during a pandemic.

What Existed Before

Primer AI had a product called Science. It ingested academic paper abstracts from PubMed, arXiv, and Crossref into Elasticsearch, then ran analysis tasks through a linear processing pipeline. The queue mechanism was built on top of Elasticsearch itself — a homegrown solution that was fragile and would fail frequently under load. The pipeline was rigid: tasks ran sequentially, and if one step failed, everything stalled.

John Bohannon, Primer's head of data science, tasked John Jansen — who was consulting at Primer as an engineer — with rebuilding the entire system.

The Rebuild

The vision, driven by Sean Gourley, was progressive loading: a user enters a query, and results appear incrementally as each analysis completes. Search results first, then entity extraction, then topics, then trends — each panel populating independently. This required throwing out the linear pipeline entirely and building something fundamentally different.

Jansen kept Elasticsearch for document storage, a job it did well, but replaced everything else. The homegrown ES-based queue was ripped out in favour of RQ (Redis Queue), giving proper distributed task execution with failure handling, retry logic, and the ability to fan out multiple analysis tasks in parallel rather than running them sequentially.
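A minimal sketch of the fan-out-with-retries idea, not the production code. It swaps RQ and Redis for Python's standard-library thread pool so it runs anywhere. The task names (search, entities, topics) and the helpers run_with_retry and fan_out are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical analysis tasks. In the system described above these would be
# RQ jobs executed by workers; plain functions stand in for them here.
def search(query):
    return f"search results for {query!r}"

def entities(query):
    return f"entities for {query!r}"

def topics(query):
    return f"topics for {query!r}"

def run_with_retry(task, query, max_attempts=3):
    """Run one task, retrying on failure, and report the outcome as a dict."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = task(query)
            return {"task": task.__name__, "ok": True,
                    "result": result, "attempts": attempt}
        except Exception as exc:  # a real worker would catch narrower errors
            last_error = exc
    return {"task": task.__name__, "ok": False,
            "error": str(last_error), "attempts": max_attempts}

def fan_out(query, tasks, max_attempts=3):
    """Submit every task at once; a failing task does not block the others."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = [pool.submit(run_with_retry, t, query, max_attempts)
                   for t in tasks]
        outcomes = [f.result() for f in futures]
    return {outcome["task"]: outcome for outcome in outcomes}

if __name__ == "__main__":
    for name, outcome in fan_out("coronavirus", [search, entities, topics]).items():
        print(name, outcome["ok"], outcome["attempts"])
```

The point of the design is isolation: each task succeeds or fails on its own, so one broken step reports an error instead of stalling the whole query the way the linear pipeline did.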

For the progressive delivery, Jansen adopted Turbo Frames and the broader Hotwire framework from the Rails ecosystem: server-rendered HTML fragments delivered independently to different parts of the page as each background task completed. This was an unconventional choice for a Python application, but it worked beautifully: the backend could push partial results to the frontend without requiring a JavaScript framework, WebSocket complexity, or polling.

The combination created a generic, scalable on-demand analysis pipeline. A query comes in and its tasks fan out to RQ workers in parallel. Each worker runs its analysis and renders a Turbo Frame, and the frames stream back to the page as they complete. The same pipeline could handle any kind of text analysis by swapping out the task implementations. It was a platform, not a one-off tool.
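The flow above can be sketched in a few lines, under some loud assumptions. A standard-library thread pool again stands in for RQ workers, and the helpers render_frame, stream_frames, and slow_task are hypothetical. How each fragment reaches the browser is left out, since the article does not describe the transport. The only Hotwire detail relied on is that a turbo-frame element is matched to its placeholder on the page by id.

```python
import html
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def render_frame(panel_id, body):
    """Wrap a result in a <turbo-frame> whose id matches a page placeholder."""
    return (f'<turbo-frame id="{html.escape(panel_id)}">'
            f'{html.escape(body)}</turbo-frame>')

def stream_frames(query, panels):
    """Yield one rendered frame per panel, in completion order."""
    with ThreadPoolExecutor(max_workers=max(1, len(panels))) as pool:
        futures = {pool.submit(task, query): panel_id
                   for panel_id, task in panels.items()}
        for future in as_completed(futures):
            yield render_frame(futures[future], future.result())

def slow_task(delay, label):
    """Build a hypothetical analysis task that takes `delay` seconds."""
    def task(query):
        time.sleep(delay)
        return f"{label} for {query}"
    return task

if __name__ == "__main__":
    panels = {
        "trends": slow_task(0.3, "trends"),          # slowest panel
        "search": slow_task(0.0, "search results"),  # fastest panel
    }
    for frame in stream_frames("coronavirus", panels):
        print(frame)  # the search frame is printed before the trends frame
```

Because frames are yielded as tasks finish rather than in the order they were submitted, a fast panel never waits for a slow one, which is the behaviour the progressive-loading design was after.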

The system still worked exclusively on abstracts rather than full paper content. This was pragmatic — abstracts were freely available, consistently structured, and contained enough signal for useful analysis without the complexity of PDF extraction and paywall navigation.

Then COVID Hit

In early 2020, the coronavirus pandemic created an unprecedented flood of scientific research. Thousands of papers were being published weekly, and nobody could keep up. Primer already had a rebuilt system that could analyse scientific literature beautifully on demand. The question was whether it could scale from a tool used by a handful of researchers to a public dashboard serving thousands of concurrent users.

The answer needed to be yes, and it needed to be yes by tomorrow.

The Overnight Extension

The on-demand model, where every user query triggers fresh analysis, would collapse under public-scale traffic. Jansen made the critical architectural decisions under extreme time pressure.

An aggressive caching layer was added so that pre-computed analysis results could be served instantly rather than computed on every request. Scheduled and repeating saved searches were implemented — the system would automatically re-run key COVID-19 queries on a schedule, updating the cached results as new papers were ingested from PubMed, arXiv, and Crossref. Papers were categorised into research areas aligned with the White House call to action to the AI community — patient care, forecasting, therapeutics, epidemiology.
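A minimal sketch of how caching and scheduled saved searches fit together, assuming an in-memory dict as the cache and a caller that supplies the current time. The names SavedSearch, refresh_due, and serve are hypothetical, and the fallback to on-demand analysis on a cache miss is an assumption rather than something the article states.

```python
from dataclasses import dataclass

@dataclass
class SavedSearch:
    """A query that should be re-run every `interval_seconds`."""
    query: str
    interval_seconds: float
    next_run: float = 0.0  # due immediately on the first pass

def refresh_due(searches, cache, analyse, now):
    """Re-run each saved search that is due; return the refreshed queries."""
    refreshed = []
    for saved in searches:
        if now >= saved.next_run:
            cache[saved.query] = analyse(saved.query)
            saved.next_run = now + saved.interval_seconds
            refreshed.append(saved.query)
    return refreshed

def serve(query, cache, analyse):
    """Serve the cached result; compute on demand only on a miss (assumed)."""
    if query not in cache:
        cache[query] = analyse(query)
    return cache[query]

if __name__ == "__main__":
    def analyse(query):  # hypothetical stand-in for the analysis pipeline
        return f"analysis of {query}"

    cache = {}
    searches = [SavedSearch("covid-19 therapeutics", interval_seconds=3600)]
    refresh_due(searches, cache, analyse, now=0.0)
    print(serve("covid-19 therapeutics", cache, analyse))
```

With this split, the expensive analysis runs on the scheduler's clock rather than on every page view, so traffic spikes hit the cache instead of the workers.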

The progressive loading architecture survived the transition intact. Different analysis panels on the dashboard updated independently as fresh data flowed through the pipeline. Social media and news signals were layered on top, so users could see not just what was being published, but what was being discussed.

The critical insight: because the rebuild had created a generic analysis platform rather than a rigid single-purpose tool, extending it to handle scheduled queries and caching was a matter of days, not months. The architecture absorbed the new requirements without being rewritten.

The Result

The COVID-19 Primer dashboard won 1st Prize in the Professional Track of the CGDV "Flattening the Curve: COVID-19 Data Challenge". An emergency medicine doctor at UC San Francisco described it as "actually a really amazing way to cut through the noise" and used it to inform hospital treatment protocols. The platform was featured in Axios.

And the dashboard is still live at covid19primer.com — a system built under pressure in 2020, still serving in 2026.

What This Tells You About Architecture

The lesson here isn't about COVID, or NLP, or even Primer. It's about the difference between replacing a system and extending one.

The original Science product had a linear pipeline with a homegrown queue. When requirements changed, it couldn't adapt — it could only break. The rebuild created a generic platform with proper separation of concerns: document storage in Elasticsearch, task execution in RQ, progressive delivery via Hotwire. When the crisis hit, that architecture absorbed the new requirements — caching, scheduling, web-scale traffic — without fundamental restructuring.

This is what Dreamware means when we talk about building for the problem, not the moment. Good architecture anticipates change not by predicting the future, but by being structured enough to absorb it.

What This Means for NZ Businesses

Most NZ businesses don't need to build pandemic-scale dashboards. But the underlying principles apply at every scale. If your data processing uses a homegrown queue held together with tape and hope, it will fail when you need it most. If your systems can't handle a sudden spike in demand, they're not production-ready. If adding a new feature requires rewriting the pipeline, your architecture is too rigid.

The difference between a system that survives contact with reality and one that doesn't is almost always in the architecture decisions. Dreamware brings the experience of having made those decisions under the most extreme conditions — and getting them right.