FANHEE

Who has the best deep research agent? Reading Summary

Authors
  • LI Tian

I have questions such as: 'Grok, ChatGPT o1 Pro, Gemini 2.0 Flash Thinking: which one is better? What are the criteria for evaluation? Is Deep Research at model companies the same as Deep Research in the industry?'

Reading Resource:
https://www.youtube.com/watch?v=CF9IDZoznQY

Reference:

Deep Research Head2Head: Brutally Honest 5-Dimension Test

Q1: Research Report

GIVEN the following prompt

  • Compare reasoning model API providers:

    1. DeepSeek R1 official API
    2. OpenRouter DeepSeek R1 API
    3. Gemini 2.0 flash thinking API
    4. o3-mini API
    5. Perplexity reasoner API

    Compare their output format, whether they output thinking tokens, and whether they can stream them. You must include the API output format or relevant code snippet in your answer. Also link relevant document.
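
To make the prompt's question concrete, here is a minimal Python sketch of what "streaming thinking tokens" can look like on the client side. It assumes a provider that streams chunks where reasoning text and answer text arrive in separate fields. The field names `reasoning_content` and `content` are assumptions for illustration, and the stream is simulated locally; check each provider's documentation for the real format.

```python
from typing import Iterable, Tuple


def split_stream(chunks: Iterable[dict]) -> Tuple[str, str]:
    """Accumulate thinking text and answer text from streamed chunk dicts.

    Assumption: each chunk may carry a 'reasoning_content' field (thinking
    tokens) and/or a 'content' field (final answer tokens). These names are
    hypothetical here and differ between providers.
    """
    thinking, answer = [], []
    for chunk in chunks:
        if chunk.get("reasoning_content"):
            thinking.append(chunk["reasoning_content"])
        if chunk.get("content"):
            answer.append(chunk["content"])
    return "".join(thinking), "".join(answer)


# Simulated stream standing in for a real API response (no network call).
fake_stream = [
    {"reasoning_content": "Compare "},
    {"reasoning_content": "both options."},
    {"content": "Option A "},
    {"content": "is faster."},
]

thinking_text, answer_text = split_stream(fake_stream)
print("thinking:", thinking_text)
print("answer:", answer_text)
```

A provider that does not expose thinking tokens would simply never populate the first field, so the same loop returns an empty thinking string.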

Conclusion

image 1
image 2
image 3
image 4

Q2: Technical Question

GIVEN the following prompt,

  • Survey the best Client-side JS library for hybrid search (full text and vector search combined), rank them by popularity, reliability, memory footprint, and ease of use. Present any performance benchmarks, release activity and references to documentation or community support that provide insight into their overall stability and adoption.

TEST the performance of ChatGPT, Gemini, and Perplexity Deep Research.
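
The ranking the prompt asks for can be read as a weighted score over its four criteria. The Python sketch below shows one way to do that. The library names, scores, and equal weights are all made-up placeholders, not results from any of the tested tools.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    popularity: float        # all scores normalized to 0..1, higher is better
    reliability: float
    memory_footprint: float  # higher means a smaller footprint
    ease_of_use: float


# Hypothetical equal weights; adjust to taste.
WEIGHTS = {
    "popularity": 0.25,
    "reliability": 0.25,
    "memory_footprint": 0.25,
    "ease_of_use": 0.25,
}


def score(c: Candidate) -> float:
    """Weighted sum of the four criteria named in the prompt."""
    return sum(w * getattr(c, field) for field, w in WEIGHTS.items())


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates sorted from best to worst score."""
    return sorted(candidates, key=score, reverse=True)


# Placeholder data, not real measurements.
candidates = [
    Candidate("lib_a", 0.9, 0.8, 0.5, 0.7),
    Candidate("lib_b", 0.6, 0.9, 0.9, 0.8),
]

for c in rank(candidates):
    print(c.name, round(score(c), 3))
```

Comparing how each Deep Research tool justifies its own ranking against an explicit scheme like this makes it easier to see which reports actually weigh all four criteria.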

Conclusion

image 5
image 6
image 7
image 8

Benchmark of Deep Research

image 9
image 10
image 11
image 12

Reference

  1. Agent interview by Andrew Ng
  2. HuggingFace Smolagents - introduction blog post
  3. https://huggingface.co/blog/open-deep-research