FANHEE

Deep Research: Which One Is Strongest?

Author: LI Tian

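Reasoning and deep thinking (Deep Research) in large language models have been very hot lately. Some investors have asked me: if OpenAI's models are already this capable, why build agents at all? Rather than answer right away, I have a question of my own. Among the commercial deep-thinking offerings, such as Grok, ChatGPT o1 Pro, and Gemini 2.0 Flash Thinking, which one is the strongest? What is the standard for judging a deep-thinking model as strong? And is deep thinking a property of the model itself, or of the program built around it?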

Reading Resource:
https://www.youtube.com/watch?v=CF9IDZoznQY

Reference:

Deep Research Head2Head: Brutally Honest 5-Dimension Test

Q1: Research Report

GIVEN the following prompt:

  • Compare reasoning model API providers:

    1. DeepSeek R1 official API
    2. OpenRouter DeepSeek R1 API
    3. Gemini 2.0 flash thinking API
    4. o3-mini API
    5. Perplexity reasoner API

    Compare their output format, whether they output thinking tokens, and whether they can stream them. You must include the API output format or relevant code snippet in your answer. Also link relevant document.
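The crux of this prompt is how each provider exposes thinking tokens in a stream. As far as we understand the providers' documentation, there are two broad styles: DeepSeek's official API (model `deepseek-reasoner`) sends reasoning in a separate `reasoning_content` field of each streamed delta, while Perplexity's reasoning models inline it in the normal `content` between `<think>` and `</think>` tags. The sketch below separates the two streams on the client. The `ReasoningSplit` class is hypothetical; it assumes each chunk has already been parsed into a dict shaped like an OpenAI-style `delta` and that a tag is never split across two chunks. Check each provider's current docs before relying on these field names.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningSplit:
    """Hypothetical helper: routes streamed text into 'thinking' vs 'answer'."""
    thinking: list[str] = field(default_factory=list)
    answer: list[str] = field(default_factory=list)
    in_think: bool = False  # True while inside an inline <think>...</think> span

    def feed_delta(self, delta: dict) -> None:
        # Style A (assumed DeepSeek R1 official API): reasoning arrives in a
        # separate `reasoning_content` field; `content` may be None meanwhile.
        if delta.get("reasoning_content"):
            self.thinking.append(delta["reasoning_content"])
        # Style B (assumed Perplexity reasoning models): reasoning is inlined
        # into `content` between <think> and </think> tags.
        self._feed_inline(delta.get("content") or "")

    def _feed_inline(self, text: str) -> None:
        # Assumption: a tag is never split across two chunks.
        while text:
            tag = "</think>" if self.in_think else "<think>"
            idx = text.find(tag)
            target = self.thinking if self.in_think else self.answer
            if idx == -1:
                target.append(text)
                return
            target.append(text[:idx])
            text = text[idx + len(tag):]
            self.in_think = not self.in_think  # crossed a tag boundary

    @property
    def thinking_text(self) -> str:
        return "".join(self.thinking)

    @property
    def answer_text(self) -> str:
        return "".join(self.answer)


if __name__ == "__main__":
    # Style A: separate reasoning field, then the answer.
    a = ReasoningSplit()
    for d in [{"reasoning_content": "User wants 2+2. ", "content": None},
              {"reasoning_content": "That is 4."},
              {"content": "The answer is 4."}]:
        a.feed_delta(d)
    print(a.thinking_text)  # User wants 2+2. That is 4.
    print(a.answer_text)    # The answer is 4.

    # Style B: inline <think> tags spread over several chunks.
    b = ReasoningSplit()
    for d in [{"content": "<think>Check docs"}, {"content": " first.</think>"},
              {"content": "Done."}]:
        b.feed_delta(d)
    print(b.thinking_text)  # Check docs first.
    print(b.answer_text)    # Done.
```

Keeping both styles behind one `feed_delta` call lets a comparison harness treat every provider the same way; a provider that returns no reasoning text at all simply leaves `thinking_text` empty.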

Conclusion

[Images 1–4]

Q2: Technical Question

GIVEN the following prompt:

  • Survey the best Client-side JS library for hybrid search (full text and vector search combined), rank them by popularity, reliability, memory footprint, and ease of use. Present any performance benchmarks, release activity and references to documentation or community support that provide insight into their overall stability and adoption.

    TEST the Deep Research performance of ChatGPT, Gemini, and Perplexity on this prompt.
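To make "hybrid search" concrete: a hybrid engine runs a full-text ranking and a vector-similarity ranking, then fuses the two lists. The sketch below uses Reciprocal Rank Fusion (RRF), one common fusion method, with the conventional constant k = 60. It is written in Python rather than JS for brevity, and several pieces are assumptions: the toy corpus and its 3-d embeddings are made up, the keyword scorer is a crude term-overlap count standing in for BM25, and none of this claims to be how any particular client-side library works internally.

```python
import math

# Hypothetical toy corpus: doc id -> (text, precomputed embedding).
# The 3-d vectors are invented for illustration; real libraries get them from a model.
DOCS = {
    "a": ("vector search in the browser", [0.9, 0.1, 0.0]),
    "b": ("full text search with BM25", [0.1, 0.9, 0.0]),
    "c": ("hybrid vector search library", [0.3, 0.7, 0.5]),
}


def keyword_ranking(query, docs):
    """Rank docs by the number of distinct query terms they contain
    (a crude stand-in for BM25). Docs with no matching term are dropped."""
    terms = set(query.lower().split())
    scores = {doc_id: len(terms & set(text.lower().split()))
              for doc_id, (text, _) in docs.items()}
    hits = [d for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def vector_ranking(query_vec, docs):
    """Rank all docs by cosine similarity to the query embedding."""
    sims = {doc_id: cosine(query_vec, vec) for doc_id, (_, vec) in docs.items()}
    return sorted(sims, key=lambda d: (-sims[d], d))


def reciprocal_rank_fusion(rankings, k=60):
    """RRF: score(d) = sum over lists of 1 / (k + rank), with 1-based ranks."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query, query_vec, docs, k=60):
    return reciprocal_rank_fusion(
        [keyword_ranking(query, docs), vector_ranking(query_vec, docs)], k=k)


if __name__ == "__main__":
    q = "browser vector search"
    q_vec = [0.2, 0.8, 0.3]  # hypothetical query embedding
    print(keyword_ranking(q, DOCS))       # ['a', 'c', 'b']
    print(vector_ranking(q_vec, DOCS))    # ['c', 'b', 'a']
    print(hybrid_search(q, q_vec, DOCS))  # ['c', 'a', 'b']
```

RRF only uses ranks, so it needs no tuning to put BM25 scores and cosine similarities on a common scale. Here doc `c` wins overall because it places well in both lists, even though it is not the keyword leader.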

Conclusion

[Images 5–8]

Deep Research Benchmarks

[Images 9–12]

References

  1. Agent interview by Andrew Ng
  2. Hugging Face smolagents: introduction blog post
  3. Hugging Face blog: Open Deep Research, https://huggingface.co/blog/open-deep-research