Every year, millions of new scientific papers are published—far more than any one person can read.
For scientists, the challenge isn’t access to knowledge but the overwhelming task of finding, interpreting, and connecting it. Breakthroughs happen at the edge of what’s known, yet researchers spend most of their time just finding the edges instead of pushing past them.
Consensus(opens in a new window), a research assistant used by more than 8 million people, was built to change that. Founded by Christian Salem and Eric Olson, the platform searches, reads, and synthesizes peer-reviewed literature across more than 220 million papers. Its newest capability, Scholar Agent, is a multi-agent system built on GPT‑5 and the Responses API. It mirrors how researchers actually work, helping them get from question to conclusion in minutes instead of weeks.
But the goal isn’t just faster research—it’s a faster path to discovery. “Science advances when it’s more accessible,” Salem says. “Our job is to give researchers everywhere the ability to find, trust, and act on evidence.”
From search engine to agentic assistant
The first version of Consensus worked like a vertical search engine for science: it indexed academic papers, retrieved relevant results, and generated summaries grounded in citations. But search alone wasn’t enough.
“Research isn’t just finding papers,” Salem says. “It’s interpreting results, comparing findings, and connecting ideas. The more time scientists spend searching, reading, and interpreting past knowledge for the right study, the less time they have to discover and createdo real research.”
So the team began re-architecting Consensus around a new concept: a multi-agent system called “Scholar Agent” that works the way a human researcher does.
Built on GPT‑5 and the Responses API, the system now runs a coordinated workflow of agents:
- Planning Agent breaks down the user’s question and decides which actions to take next
- Search Agent combs Consensus’s paper index, a user’s private library, and the citation graph
- Reading Agent interprets papers individually or in batches
- Analysis Agent synthesizes results, determines structure and visuals, and composes the final output
Each agent has a narrow scope, which keeps reasoning precise and minimizes hallucinations. The architecture also allows Consensus to decide when not to answer; if no relevant studies meet its quality threshold, the assistant simply says so.
“By dividing the workflow across agents, we reduce error and make the system far more disciplined,” Salem says. “No one agent has too much responsibility, which turns out to be key for reliability.”
This approach is what the team calls context engineering: assembling the right evidence before generation begins. Every answer comes with a “research context pack”—a structured bundle of papers, metadata, and key findings that trace back to original studies.
“We don’t want researchers wasting time double-checking every claim,” Salem says. “If the system can’t ground an answer in real evidence, it won’t make one up.”
Building with Responses API
Consensus migrated from Chat Completions to the Responses API to support its multi-agent routing. The switch improved both reliability and cost efficiency, giving the team finer control over sub-agent calls. With GPT‑5 long-context reasoning and reliable tool-calling, the choice was clear.
Early evaluations confirmed the bet: GPT‑5 outperformed GPT‑4.1, Sonnet 4, and Gemini 2.5 Pro on tool-calling accuracy and planning stability. That allowed the Consensus team to focus less on prompt gymnastics and more on building agent behaviors that map directly to research workflows.
A consumer bet in a world of institutions
From the beginning, Consensus approached the market differently than expected. Rather than selling through institutions, the team focused on the people doing the research itself: students, faculty, and clinicians who needed answers today. That direct-to-researcher focus shaped both the product’s design and its rapid growth.
“Everyone said you can’t go direct-to-consumer in academia, but AI has changed that,” Salem says. “People don’t wait for approval anymore—they use what works.”
That decision shaped the product’s tone and growth curve. Consensus feels more like a modern consumer app than a traditional academic tool: fast onboarding, intuitive design, conversational interface. Adoption spread through word of mouth across campuses and labs.
Graduate students and PhD candidates became the first power users, followed by faculty and private researchers. Then came clinicians, who began using Consensus to surface the latest evidence in their fields.
“We didn’t set out to build for doctors,” Salem says. “But they need the same thing researchers do: fast access to reliable evidence.”
The company recently signed the Mayo Clinic’s medical library and just launched ‘Medical Mode,’ a new feature designed for practitioners searching for clinical evidence.
In the past year, Consensus has expanded rapidly, growing to more than 8 million researchers worldwide and increasing revenue by 8x.
That growth hasn’t changed the product’s priorities. Every feature still revolves around verifiable, low-hallucination answers. The team has invested heavily in evaluation pipelines that test accuracy, citation traceability, and stylistic consistency across agents.
Consensus’s architecture is intentionally modular, designed so new agents can slot in as models expand and improve—agents that replicate experiments, generate figures, or run statistical analyses.
“We’re building the assistant researchers actually need in a rapidly changing world,” Salem says. “The models keep getting better, the system grows with them, and science moves faster.”

