I really like the distinction between DeepSearch and DeepResearch proposed in this piece by Han Xiao: https://jina.ai/news/a-practical-guide-to-implementing-deeps...
> DeepSearch runs through an iterative loop of searching, reading, and reasoning until it finds the optimal answer. [...]
> DeepResearch builds upon DeepSearch by adding a structured framework for generating long research reports
Given these definitions, I think DeepSearch is the more valuable and interesting pattern. It's effectively RAG built using tools in a loop, which is much more likely to answer questions effectively than more traditional RAG where there is only one attempt to find relevant documents to include in a single prompt to an LLM.
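To make that concrete, here's a minimal sketch of the kind of loop I mean - the two helpers are hypothetical placeholders, not any particular library's API:

    # Minimal sketch of "RAG with tools in a loop". Both helpers are
    # hypothetical placeholders - wire in whatever model and search
    # tool you actually use.

    def llm_call(prompt: str) -> str:
        """Placeholder: send prompt to your model, return its text."""
        raise NotImplementedError

    def web_search(query: str) -> str:
        """Placeholder: run a search, return a digest of the results."""
        raise NotImplementedError

    def deep_search(question: str, max_iterations: int = 5) -> str:
        notes: list[str] = []
        for _ in range(max_iterations):
            # Let the model decide: search again, or answer from what it has.
            action = llm_call(
                f"Question: {question}\nNotes so far: {notes}\n"
                "Reply 'SEARCH: <query>' to keep looking, or "
                "'ANSWER: <answer>' if the notes are sufficient."
            )
            if action.startswith("ANSWER:"):
                return action.removeprefix("ANSWER:").strip()
            notes.append(web_search(action.removeprefix("SEARCH:").strip()))
        return llm_call(f"Give a best-effort answer to {question!r} from: {notes}")

The point is that a failed search just feeds back into the next iteration, instead of sinking the whole answer the way a single-shot RAG prompt does.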
DeepResearch is a cosmetic enhancement that wraps the results in a "report" - it looks impressive but IMO is much more likely to lead to inaccurate or misleading results.
More notes here: https://simonwillison.net/2025/Mar/4/deepsearch-deepresearch...
> DeepResearch is a cosmetic enhancement that wraps the results in a "report" - it looks impressive but IMO is much more likely to lead to inaccurate or misleading results.
I think that if done well, deep research can be more than that. At a minimum, I would say that before "deep search" you'd need some calls to an LLM to figure out what to look for, which places would be best to look in (i.e. sources, trust, etc.), how to tabulate the data gathered, and so on. Just as deep search is "RAG with tools in a loop", so can (and should) deep research be.
Think of the analogy of using aider to go straight to code versus using it to first /architect and then code - but for any task that lends itself to (re)searching. At least it would catch useless tangents faster.
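A rough sketch of that planning pass, building on the hypothetical llm_call / deep_search helpers from the sketch upthread:

    # Sketch: an /architect-style planning pass before the search loop,
    # reusing the hypothetical llm_call / deep_search helpers upthread.

    def plan_then_search(topic: str) -> dict[str, str]:
        # First figure out what to look for and which sources to trust.
        plan = llm_call(
            f"Topic: {topic}\n"
            "List the sub-questions to answer and, for each, the kinds of "
            "sources worth trusting. One 'sub-question | sources' pair per line."
        )
        findings: dict[str, str] = {}
        for line in plan.splitlines():
            if "|" not in line:
                continue  # skip anything that isn't a well-formed pair
            sub_question, sources = (part.strip() for part in line.split("|", 1))
            # Run the bounded search loop per sub-question, scoped to sources.
            findings[sub_question] = deep_search(f"{sub_question} (prefer: {sources})")
        return findings  # tabulated results, ready for a final synthesis call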
At the end of the day, what's fascinating about LLM-based agents is that you can almost always add another layer of abstraction on top. No matter what you build, you can always come at it from another angle. That's really cool imo, and something Hassabis has hinted at lately in some podcasts.
Right - I'm finding the flawed Deep Research tools useful already, but what I really want is much more control over the sources of information they use.
Sadly I think that’s why non-open-source commercial deep (re)search implementations are going to be largely useless. Even if you’re using a customized endpoint for search like Kagi, the sources are mostly garbage, and no one except maybe Google Books has the resources and legal cover to expand that deep search into books, which are much better sources.
Exactly - like my whole codebase, or repositories of proprietary documents
So I’ve started a thing with Jina, and the first effort I am doing is setting the “tone”, meaning I’m building a project template that will keep the bots focused.
I think that one part of the deep loop needs to be a check-in on expectations and goals…
So instead of throwing it one deep task: I find that bots work better with small iterative chunks of objectives..
I haven’t formulated it completely yet, but as an example I’ve been working extensively with Cursor’s whole Anthropic-abstraction AI-as-a-service:
So many folks suffer from the “generating…” quagmire;
And I found that telling the bot to “break any response into smaller chunks to avoid context limitations” works incredibly well…
So when my scaffold is complete the goal is to use Fabric Patterns for nursery assignments to the deep bots.. whereby they constantly check in.
Prior to “deep” things, I found this to work really well by telling the bots to obsessively maintain a development_diary.md and .json tracking of actions. (Even so, their memory is super small, and I envisioned a multi-layer of agents where the initial agent’s actions feed the context of the agents who follow along, so you have a waterfall of context between agents and avoid context loss on super deep iterative research…
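Roughly the shape I mean, as a sketch (llm_call is the same kind of hypothetical model-call placeholder as in the sketch upthread; the diary file name is just my convention):

    # Sketch of the context "waterfall": each agent appends to a running
    # diary, and the next agent only ever sees a compressed summary of it.
    # llm_call is a hypothetical placeholder; the diary list stands in
    # for development_diary.md.

    def run_agent_chain(task: str, objectives: list[str]) -> str:
        diary: list[str] = []
        carry = task
        for objective in objectives:
            result = llm_call(f"Context so far: {carry}\nYour objective: {objective}")
            diary.append(result)
            # Compress before handing off, so downstream agents inherit a
            # summary instead of the whole transcript (avoiding context loss).
            carry = llm_call(f"Summarize for the next agent: {result}")
        return carry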
(I’ll type out something more salient when I have a KVM…
(But I hope that doesn’t sound stupid)
What are fabric patterns?
Basically agentic personas or modi operandi
You tell the agent “grok this persona to accomplish the task”
I think the person who wrote that probably meant these, specifically: https://github.com/danielmiessler/fabric/tree/main/patterns
Yes, as I am the one who wrote that.
> DeepResearch is a cosmetic enhancement that wraps the results in a "report" - it looks impressive but IMO is much more likely to lead to inaccurate or misleading results.
Yup, I got the same impression reading this article - and the Jina one, too. Like with langchain and agents, people are making chained function calls or a loop sound like it is the second coming, or a Nobel prize-worthy discovery. It's not - it's obvious. It's just expensive to get to work reliably and productize.
It is obvious now, but I will defend langchain which is catching a stray here.
Folks associated with that project were some of the first people to talk about patterns like this publicly online. With GPT-3-Davinci it was not at all obvious that chaining these calls would work well, to most people, and I think the community and team around langchain and associated projects did a pretty good job of helping to popularize some of the patterns such as they are now obvious.
That said, I agree with your overall impression.
I thought DeepResearch has the AI driving the process because it's been trained to do so, whereas DeepSearch is something like langchain + prompt engineering?
There are three different launched commercial products called "Deep Research" right now - from Google Gemini, OpenAI and Perplexity. There are also several open source projects that use the name "Deep Research".
DeepResearch (note the absence of the space character) is the name that Han Xiao proposes for the general pattern of generating a research-style report after running multiple searches.
You might implement that pattern using prompt engineering or using some custom trained model or through other means. If the eventual output looks like a report and it ran multiple searches along the way it fits Han's "DeepResearch" definition.
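For example, a prompt-engineering-only version of the pattern could be as simple as this sketch (llm_call and deep_search are the hypothetical placeholders from my sketch upthread):

    # Sketch of the DeepResearch pattern via prompt engineering alone:
    # outline first, DeepSearch per section, then one consolidation pass.

    def deep_research_report(question: str) -> str:
        # 1. Generate a table of contents for the report.
        toc = llm_call(
            f"Draft a table of contents for a research report on: {question}\n"
            "One section title per line."
        )
        sections = []
        for title in filter(None, map(str.strip, toc.splitlines())):
            # 2. Feed a section-specific research question into DeepSearch.
            body = deep_search(f"{question} - section: {title}")
            sections.append(f"## {title}\n\n{body}")
        # 3. Consolidate all sections in a single prompt for coherence.
        return llm_call("Rewrite into one coherent report:\n\n" + "\n\n".join(sections))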
Why is the AI industry so follow-the-leader in naming stuff? I've used at least three AI services called "Copilot".
Maybe for the same reason JavaScript is named after Java and looks like Java (instead of being Scheme, which it almost was, twice)? That is, purposeful name collision with an existing tool/buzzword that's very popular with non-technical management and corporate executives.
I mean, these companies have enough issues with branding and explaining the differences between AI products, even those they provide themselves (GPT-4, o1, o3-mini, etc. - do most OpenAI users know the differences between them, or what each one specialises in?).
I guess they'll take any opportunity to follow the leader if they worry they're at risk of similar branding issues too.
> DeepResearch is a cosmetic enhancement that wraps the results in a "report"
No, that's not what Xiao said here. Here's the relevant quote:
> It often begins by creating a table of contents, then systematically applies DeepSearch to each required section – from introduction through related work and methodology, all the way to the conclusion. Each section is generated by feeding specific research questions into the DeepSearch. The final phase involves consolidating all sections into a single prompt to improve the overall narrative coherence.
(I also recommend that you stare very hard at the diagrams.)
Let me paraphrase what Xiao is saying here:
A DeepSearch is a primitive — it does mostly the same thing a regular LLM query does, but with a lot of trained-in thinking and searching work, to ensure that it is producing a rigorous answer to your question. Which is great: it means that DeepSearch is more likely to say "I don't know" than to hallucinate an answer. (This is extremely important as a building block; an agent needs to know when a query has failed so it can try again / try something else.)
However, DeepSearch alone still "hallucinates" in one particular way: it "hallucinates understanding" of the topic, thinking that it already has a complete mental toolkit of concepts needed to solve your problem. It will never say "solving this sub-problem seems to require inventing a new tool" and so "branch off" to another recursed DeepSearch to determine how to do that. Instead, it'll try to solve your problem with the toolkit it has — and if that toolkit is insufficient, it will simply fail.
Which, again, is great in some ways. It means that a single DeepSearch will do a (semi-)bounded amount of work. Which means that the costs of each marginal additional DeepSearch call are predictable.
But it also means that you can't ask DeepSearch itself to:
• come up with a mathematical proof of something, where any useful proof strategy will implicitly require inventing new math concepts to use as tools in solving the problem.
• do investigative journalism that involves "chasing leads" down a digraph of paths; evaluating what those leads have to say; and using that info to determine new leads.
• "code me a Facebook clone" — and have it understand that doing so involves iteratively/recursively building out a software architecture composed of many modules — where it won't be able to see the need for many of those modules at "design time", but will only "discover" the need to write them once it gets to implementation time of dependent modules and realizes that to achieve some goal, it must call into some code / entire library that doesn't exist yet. (And then make a buy-vs-build decision on writing that code vs pulling in a dependency... which requires researching the space of available packages in the ecosystem, and how well they solve the problem... and so on.)
A DeepResearch model, meanwhile, is a model that looks at a question, and says "is this a leaf question that can be answered directly — or is this a question that needs to be broken down and tackled by parts, perhaps with some of the parts themselves being unknowns until earlier parts are solved?"
A DeepResearch model does a lot of top-level work — probably using DeepSearch! — to test the "leaf-ness" of your question; and to break down non-leaf questions into a "battle plan" for solving the problem. It then attempts solutions to these component problems — not by calling DeepSearch, but by recursively calling itself (where that forked child will call DeepSearch if the sub-problem is leaf-y, or break down the sub-problem further if not.)
A DeepResearch model will then take the derived solutions for prerequisite problems into account in the solution space for the problems that depend on them. (A DeepResearch model may also be trained to notice when it's "worked into a corner" by coming up with early-phase solutions that make later phases impossible, and to backtrack and solve the earlier phases differently, now with in-context knowledge of the constraints of the later phases.)
Once a DeepResearch model finds a successful solution to all subproblems, it takes the hierarchy of thinking/searching logs it generated in the process, and strips out all the dead-ends and backtracking, to present a comprehensible linear "success path." (Probably it does this as the end-step of each recursive self-call, before returning to self, to minimize the amount of data returned.)
Note how this last reporting step isn't "generating a report" for human consumption; it's a DeepResearch call "generating a report" for its parent DeepResearch call to consume. That's special sauce. (And if you think about it, the top-level call to this whole thing is probably going to use a non-DeepResearch model at the end to rephrase the top-level DeepResearch result from a machine-readable recurse-result report into a human-readable report. It might even use a DeepSearch model to do that!)
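In sketch form, the recursive shape being described is roughly this (every helper here is hypothetical - in a real DeepResearch model, all of it is folded into trained behaviour rather than written as code):

    # Sketch of the recursive DeepResearch shape described above.

    def is_leaf(question: str) -> bool:
        """Placeholder: probe (e.g. via DeepSearch) whether this is directly answerable."""
        raise NotImplementedError

    def break_down(question: str) -> list[str]:
        """Placeholder: produce the 'battle plan' of sub-questions."""
        raise NotImplementedError

    def consolidate(question: str, reports: list[str]) -> str:
        """Placeholder: strip dead ends, return a linear 'success path' report."""
        raise NotImplementedError

    def deep_research(question: str) -> str:
        if is_leaf(question):             # leaf: answer it directly
            return deep_search(question)  # the bounded primitive from upthread
        reports = [deep_research(sub) for sub in break_down(question)]
        # The report returned here is for the *parent* call to consume,
        # not (yet) for a human reader.
        return consolidate(question, reports)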
---
Bonus tangent:
Despite DeepSearch + DeepResearch using a scientific-research metaphor, I think an enlightening comparison is with intelligence agencies.
DeepSearch alone does what an individual intelligence analyst does. You hand them an individually-actionable question; they run through a "branching, but vaguely bounded in time" process of thinking and searching, generating a thinking log in the process, eventually arriving at a conclusion; they hand you back an answer to your question, with a lot of citations — or they "throw an exception" and tell you that the facts available to the agency cannot support a conclusion at this time.
Meanwhile, DeepResearch does what an intelligence agency as a whole does:
1. You send the agency a high-level strategic Request For Information;
2. the agency puts together a workgroup composed of people with trained-in expertise in breaking down problems (Intelligence Managers), and subject-matter experts with a wide-ranging gestalt picture of the problem space (Senior Intelligence Analysts), and tasks them with breaking down the problem into sub-problems;
3. some of these sub-problems are actionable — they can be assigned directly for research by a ground-level analyst; some of these sub-problems have prerequisite work that must be done to gather intelligence in the field; and some of these sub-problems are unknown unknowns — missing parts of the map that cannot be "planned into" until other sub-problems are resolved.
4. from there, the problem gets "scheduled" — in parallel, (the first batch of) individually-actionable questions get sent to analysts, and any field missions to gather pre-requisite intelligence are kicked off for planning (involving spawning new sub-workgroups!)
5. the top-level workgroup persists after their first meeting, asynchronously observing the reports from actionable questions; scheduling newly-actionable questions to analysts once field data comes in to be chewed on; and exploring newly-legible parts of the map to outline further sub-problems.
6. If this scheduling process runs out of work to schedule, it's either because the top-level question is now answerable, or because the process has worked itself into a corner. In the former case, a final summary reporting step is kicked off, usually assigned to a senior analyst. In the latter case, the workgroup reconvenes to figure out how to backtrack out of the corner and pursue alternate avenues. (Note that, if they have the time, they'll probably make "if this strategy produces results that are unworkable in a later step" plans for every possible step in their original plan, in advance, so that the "scheduling engine" of analyst assignments and fieldwork need never run dry waiting for the workgroup to come up with a new plan.)
You're right, Han didn't define DeepResearch as "a cosmetic enhancement". I quoted his one-sentence definition:
> DeepResearch builds upon DeepSearch by adding a structured framework for generating long research reports.
But then called it "a cosmetic enhancement" really to be slightly dismissive of it - I'm a skeptic of the report format because I think the way it's presented makes the information look more solid than it actually is. My complaint is at the aesthetic level, not relating to the (impressive) way the report synthesis is engineered.
So yeah, I'm being inaccurate and a bit catty about it.
Your explanation is much closer to what Han described, and much more useful than mine.
First, SimonW, I devour everything you write, appreciate you most in the AI community, and recommend you from 0 all the way to 1!!!
Thank you.
-
Second, thank you for bringing up Jina - I recently discovered it and immediately began building a Thing based on it:
I want to use its functions to ferret out all the entanglements from the WEF Leadership roster, similar to the NGO fractal connections - I’m doing that with every WEF member, through to Congress.
I would truly wish to work with you on such, if so inclined..
I prefer to build “dossiers” rather than reports, represented in JSON schemas
I’m on mobile so will provide more details when at machine…
Looping through a dossier of connections is much more thoughtful than a “report” imo.
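For example, one dossier entry might look roughly like this (the field names are just my working convention, and the entities are made up):

    # Sketch of one dossier entry as a Python dict; the schema and the
    # data are made up for illustration.
    dossier_entry = {
        "entity": "Example Person",
        "roles": ["Board member, Example Org"],
        "connections": [
            {
                "to": "Another Entity",
                "kind": "funding",
                "source_url": "https://example.com/citation",
            },
        ],
        "last_verified": "2025-03-04",
    }
    # Because each connection carries its own citation, a loop can
    # re-verify individual edges instead of regenerating a flat report.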
I need to see you on someone’s podcast, else you and I should make one!
Thanks! I've done quite a few podcast things recently, they're tagged on my blog: https://simonwillison.net/tags/podcast-appearances/
Dub, yeah I saw that but haven’t listened yet.
What I want is a “podcast” with audience participation..
The Lex Fridman DeepSeek episode was so awesome, but I have so many questions and I get exceedingly frustrated when Lex doesn’t ask what may seem obv to us HNers…
-
Back on topic:
Reports are flat; dossiers are malleable.
As I mentioned, my goal is fractal visuals (in minamGL) of the true entanglements from the WEF out.
Much like Mike Benz on USAID - using Jina deep research, extraction etc. will pull back the veil on the truth of the globalist agenda seeking control, and will reveal true relationships, loyalties and connections.
It’s been running through my head for decades, and I finally feel that Jina is a tool that can start to reveal what I and so many others can plainly see but can’t verify.
I'm an idiot...
anyway - I just watched it.
Fantastic.
---
What are your thoughts on API overload - we will need bots upon bots to keep track of the APIs...
Meaning... are we to establish an Internet Protocol ~~ AI Protocol addressing schema - whereby every API henceforth has an IPv6 addy?
and instead of "domain names" for sale -- you are purchasing and registering an IPv6-->API.MY.SERVICE
I watched them...
So amazed at how aligned we are.