How to Scrape and Compare Multiple Competitors in One Claude Conversation
Running competitive research is one of the most valuable things you can do as a founder, marketer, or product person. It tells you where your competitors are spending their marketing effort, which customer segments they're hunting, and how they're positioning their product offerings. When done well, it's a goldmine of actionable information.
The catch is that it's slow. The old process looked something like this:
- Open each competitor's site
- Find their sitemap
- Paste the URLs into a spreadsheet
- Filter to the pages you actually care about
- Scrape each one by hand, or pay for a tool that does it
- Drop the results into a doc
- Build a comparison table
All of this can now happen in a single Claude conversation, thanks to the new URL extraction tool (extract_urls) in the Simplescraper MCP server. Point it at any website and you get the full URL list back in seconds, ready for further scraping or analysis. The new process looks like this:
- Pull every URL from a competitor's site
- See what each site invests in
- Scrape the pages that matter for a deeper read
- Compare side by side
- Ask the open questions
All in one back-and-forth. No spreadsheets, no scripts, no copy-paste.
To show the full benefits of this workflow, we'll run a real teardown across five AI video tools that are well known in indie circles. All of them serve creators, and all of them take very different angles:
- Revid.ai - viral short-form, automation-first
- AutoShorts.ai - faceless YouTube and TikTok
- Submagic - captions and transitions for creators
- OpusClip - long-form to clips, podcasters and YouTubers
- Captions.ai - AI presenters and dubbing for creators
You'll need a Claude Desktop, Cursor, or Codex install with the Simplescraper MCP added (instructions at the end) and a free Simplescraper account. The URL Extractor tool is free to use.
Let's get started.
Step 1: Pull Every URL From All Five Sites
The first prompt:
"Extract all the URLs from revid.ai, autoshorts.ai, submagic.co, opus.pro and captions.ai."
Claude reads that, picks the right MCP tool (extract_urls), and calls it five times - once per domain. Each call finds the site's sitemap and returns the full URL list.
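This article doesn't describe how extract_urls works internally, so treat the following as an illustration of the general technique only. Sites usually list their sitemaps in robots.txt. A sitemap is either a `<urlset>` of page URLs or a `<sitemapindex>` that points at more sitemaps. All function names below are hypothetical. The network fetch is injected so the walk can be tested offline.

```javascript
// Illustrative sketch of sitemap discovery and parsing. These helpers are hypothetical
// and are not part of the Simplescraper API.

// Sites usually advertise sitemaps in robots.txt via "Sitemap: <url>" lines;
// /sitemap.xml is the common fallback when none are listed.
function sitemapsFromRobots(robotsTxt, origin) {
  const listed = robotsTxt
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*sitemap:\s*(\S+)/i))
    .filter(Boolean)
    .map((m) => m[1]);
  return listed.length > 0 ? listed : [new URL('/sitemap.xml', origin).href];
}

// A sitemap is either a <urlset> of page URLs or a <sitemapindex> of child sitemaps.
// Regex extraction keeps this dependency-free; production code should use an XML parser.
function parseSitemap(xml) {
  const type = /<sitemapindex[\s>]/i.test(xml) ? 'index' : 'urlset';
  const locs = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match;
  while ((match = locPattern.exec(xml)) !== null) {
    locs.push(decodeXmlEntities(match[1]));
  }
  return { type, locs };
}

// <loc> values are XML-escaped; &amp; is decoded last so "&amp;lt;" becomes "&lt;", not "<".
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Follows a sitemap index down to its page URLs. fetchText is injected
// (e.g. url => fetch(url).then(r => r.text())) so the walk can run without network access.
async function collectUrls(sitemapUrl, fetchText, visited = new Set()) {
  if (visited.has(sitemapUrl)) return [];
  visited.add(sitemapUrl);
  const { type, locs } = parseSitemap(await fetchText(sitemapUrl));
  if (type === 'urlset') return locs;
  const urls = [];
  for (const child of locs) {
    urls.push(...(await collectUrls(child, fetchText, visited)));
  }
  return urls;
}
```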
[VIDEO PLACEHOLDER: Claude Desktop running the first prompt, showing the five extract_urls MCP calls completing in sequence and the URL counts coming back]
The response Claude hands you back:
| Site | URLs in sitemap |
|---|---|
| revid.ai | 2,374 |
| autoshorts.ai | 23 |
| submagic.co | 20,219 |
| opus.pro | 6,801 |
| captions.ai | 138 |
The numbers give you the scale. What each site actually invests in lives in the URL paths themselves.
Step 2: See What Each Site Invests In
Next, let's group each site's URLs by their top-level path.
"For each site, group the URLs by their top-level path (e.g. /blog/, /tools/, /pricing/) and ignore localized mirrors like /de/ or /es-es/. Show me the top 8 path prefixes per site."
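The grouping rule in that prompt is easy to reproduce in code, but there's one trap. A generic "two-letter first segment" test for locales would also swallow real sections like /vs/. The sketch below uses an explicit locale allowlist instead. The list and the helper names are assumptions that you'd adapt per site.

```javascript
// Hypothetical locale allowlist: extend it to match the locales a site actually uses.
// An allowlist is safer than a generic /^[a-z]{2}$/ test, which would also treat
// two-letter sections such as /vs/ as locale prefixes.
const LOCALE_PREFIXES = new Set([
  'de', 'es', 'fr', 'it', 'pt', 'ja', 'ko', 'zh', 'nl', 'pl', 'tr', 'ru',
  'es-es', 'pt-br', 'zh-cn', 'zh-tw',
]);

// Splits a URL into its locale (null for the default/English version) and the
// path segments that remain once the locale prefix is removed.
function splitLocale(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  const first = (segments[0] || '').toLowerCase();
  if (LOCALE_PREFIXES.has(first)) {
    return { locale: first, segments: segments.slice(1) };
  }
  return { locale: null, segments };
}

// Counts URLs per top-level section (e.g. "/blog"), folding localized copies into
// their English section, and returns the `limit` largest sections as [section, count].
function topSections(urls, limit = 8) {
  const counts = new Map();
  for (const url of urls) {
    const { segments } = splitLocale(url);
    const section = segments.length > 0 ? `/${segments[0]}` : '/';
    counts.set(section, (counts.get(section) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// English vs localized split, plus the number of distinct locale prefixes seen.
function localeSummary(urls) {
  const locales = new Set();
  let english = 0;
  for (const url of urls) {
    const { locale } = splitLocale(url);
    if (locale === null) english += 1;
    else locales.add(locale);
  }
  return { total: urls.length, english, localized: urls.length - english, locales: locales.size };
}
```

Run against each site's URL list, `topSections` and `localeSummary` should produce counts comparable to the per-site breakdowns that follow, provided the allowlist covers the locales that site uses.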
[VIDEO PLACEHOLDER: Claude returning the per-site investment breakdown as a table]
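What comes back (the per-path counts fold localized copies into their English path, which is why a section like revid.ai's /tools can exceed the site's English URL count):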
revid.ai
- 2,374 total URLs
- 344 English, 2,030 across 7 locales
| Path | URLs |
|---|---|
| /tools | 1,209 |
| /make | 1,048 |
| /features | 40 |
| /tiktok-video-finder | 34 |
| /pricing | 8 |
autoshorts.ai
- 23 total URLs
- No localization
| Path | URLs |
|---|---|
| /blog | 10 |
| /alternative | 6 |
| /login, /register, /contact, /faq | 1 each |
submagic.co
- 20,219 total URLs
- 506 English, 19,713 across 19 locales
| Path | URLs |
|---|---|
| /vs | 10,098 |
| /blog | 4,220 |
| /auto-subtitle-generator | 736 |
| /apps | 665 |
| /alternative | 665 |
| /alternatives | 544 |
| /use-cases | 456 |
| /tools | 416 |
opus.pro
- 6,801 total URLs
- 2,811 English, 3,990 across 4 locales
| Path | URLs |
|---|---|
| /agent | 4,079 |
| /blog | 1,068 |
| /tools | 684 |
| /research | 232 |
| /alternative | 155 |
| /business | 55 |
| /demo | 55 |
| /hub | 38 |
captions.ai
- 138 total URLs
- No localization
| Path | URLs |
|---|---|
| /blog | 54 |
| /solutions | 29 |
| /tools | 24 |
| /styles | 20 |
| /features | 7 |
A few observations:
- Submagic and OpusClip are betting on programmatic comparison content at scale. Submagic has 10,098 /vs/ pages, 665 /alternative/ pages, and 544 /alternatives/ pages (yes, both spellings, as separate sections). That's 11,300+ URLs whose entire job is to capture "X vs Y" and "X alternative" search traffic. OpusClip runs a smaller version of the same play with 155 /alternative/ pages.
- revid.ai and OpusClip are running programmatic landing-page farms. Revid has 2,257 URLs under /tools/ and /make/. Opus has 4,079 under /agent/. The naming dresses it up differently ("tools" vs "agent workflows"), but the play is identical.
- AutoShorts.ai isn't really competing on SEO: 10 blog posts, 6 alternative pages, no localization. That's it.
- Captions.ai is the curated outlier. 138 URLs, every section meaningfully populated, no localization, no programmatic farms. The sitemap reads like a hand-built marketing site.
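The comparison-content figure in the first observation is a sum over a few section prefixes, which makes it easy to recompute and track over time. Here is a sketch. The set of sections counted as comparison content is an assumption based on what showed up in this teardown. The input is the per-section counts from the Step 2 tables.

```javascript
// Sections treated as comparison content. This pattern is an assumption drawn from the
// sections seen in this teardown; extend it (e.g. with "/compare") for your own market.
const COMPARISON_SECTION = /^\/(vs|alternatives?)$/;

// Given per-section URL counts and a site's total URL count, returns which sections
// matched, how many URLs they hold, and what share of the whole site that is.
function comparisonFootprint(sectionCounts, totalUrls, pattern = COMPARISON_SECTION) {
  const matched = Object.entries(sectionCounts).filter(([section]) => pattern.test(section));
  const count = matched.reduce((sum, [, n]) => sum + n, 0);
  return {
    sections: matched.map(([section]) => section),
    count,
    share: totalUrls > 0 ? count / totalUrls : 0,
  };
}

// Submagic's section counts, copied from the Step 2 table.
const submagic = {
  '/vs': 10098, '/blog': 4220, '/auto-subtitle-generator': 736, '/apps': 665,
  '/alternative': 665, '/alternatives': 544, '/use-cases': 456, '/tools': 416,
};
const footprint = comparisonFootprint(submagic, 20219);
console.log(footprint.count, `${(footprint.share * 100).toFixed(1)}%`); // prints: 11307 55.9%
```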
Step 3: Scrape the Five Most Recent Blog Posts From Each
Now we pull real content. The Simplescraper MCP includes extract_markdown, which fetches one page and returns its body as clean Markdown - the right tool when the goal is "give me the post so I can analyze it."
"Scrape the five most recent blog posts from each company. I want the title, publish date, and a one-paragraph summary of each."
Claude picks the five most recent URLs per company, then calls extract_markdown for each one. Revid doesn't publish a blog, so it's skipped - 20 scrapes total.
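"Five most recent" is only reliable when the sitemap carries `<lastmod>` dates. Without them, sitemap order is the best available proxy, and it isn't guaranteed to be chronological. Below is a sketch of that selection. It assumes URL entries arrive with an optional lastmod field; the entry shape and the function name are hypothetical.

```javascript
// Entry shape is an assumption: { url, lastmod }, where lastmod is the sitemap's
// optional <lastmod> value (ISO 8601) or undefined when the sitemap omits it.
function mostRecent(entries, n = 5, section = '/blog/') {
  // Keep posts inside the section, excluding the section index page itself.
  // Localized mirrors such as /de/blog/... fail the startsWith check and drop out.
  const posts = entries.filter((entry) => {
    const path = new URL(entry.url).pathname;
    return path.startsWith(section) && path.length > section.length;
  });
  const hasDate = (entry) => Boolean(entry.lastmod) && !Number.isNaN(Date.parse(entry.lastmod));
  const dated = posts
    .filter(hasDate)
    .sort((a, b) => Date.parse(b.lastmod) - Date.parse(a.lastmod)); // newest first
  // Undated posts keep their sitemap order and only fill the remaining slots.
  const undated = posts.filter((entry) => !hasDate(entry));
  return [...dated, ...undated].slice(0, n).map((entry) => entry.url);
}
```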
[VIDEO PLACEHOLDER: scraping progress as the 20 jobs run inside Claude, then the markdown content landing back]
Simplescraper's cloud browsers load each page, render any JavaScript, and return clean Markdown. Claude holds every post that comes back in context.
A couple of the sites, the ones running on Webflow behind Cloudflare, tend to block the headless browser, so some scrapes return errors. That's a signal in itself; we'll surface it in the next step.
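If you script this yourself, one cheap mitigation for that kind of blocking is to fall back to a plain HTTP request. Server-rendered pages often return their full HTML without a browser. The sketch assumes Node 18+, for the global fetch and AbortSignal.timeout. `headlessScrape` stands in for whatever browser-based scraper you use. Note that the fallback returns raw HTML, not Markdown.

```javascript
// Tries the browser-based scraper first; if it throws (e.g. a navigation timeout),
// falls back to a plain GET with a timeout. headlessScrape is a placeholder for your
// own scraper; fetchImpl defaults to the global fetch and can be swapped out in tests.
async function scrapeWithFallback(url, headlessScrape, { timeoutMs = 5000, fetchImpl = globalThis.fetch } = {}) {
  try {
    return { source: 'headless', body: await headlessScrape(url) };
  } catch (headlessError) {
    const reason = headlessError instanceof Error ? headlessError.message : String(headlessError);
    const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    if (!response.ok) {
      throw new Error(`headless scrape failed (${reason}); plain fetch returned HTTP ${response.status}`);
    }
    // Raw HTML, not Markdown: convert or parse it before comparing it with other posts.
    return { source: 'plain', body: await response.text() };
  }
}
```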
This is the part that's genuinely awkward to do without MCP. The old way: write a script, loop the API calls, store the results, batch them, handle retries. The new way: wait 30 to 60 seconds and Claude has the full content ready to compare.
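For comparison, here's roughly what that "old way" loop involves:

- a concurrency cap
- retries with exponential backoff
- per-URL error capture, so one failure doesn't sink the batch

This is a sketch. `scrapeOne` is a placeholder for whatever fetches a page, and the default limits and delays are arbitrary.

```javascript
// Retries an async task with exponential backoff: waits baseDelayMs, then 2x, 4x, ...
async function withRetries(task, { retries = 2, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Scrapes every URL with at most `concurrency` requests in flight. Each URL gets its
// own result entry, so one failing page doesn't abort the batch.
async function scrapeAll(urls, scrapeOne, { concurrency = 3, ...retryOptions } = {}) {
  const results = new Array(urls.length);
  let next = 0; // shared cursor; safe because JS runs one worker step at a time
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      const url = urls[index];
      try {
        const value = await withRetries(() => scrapeOne(url), retryOptions);
        results[index] = { url, ok: true, value };
      } catch (err) {
        results[index] = { url, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }
  const workerCount = Math.min(concurrency, urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```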
Step 4: Build the Report
The payoff. Claude is now sitting on 29,555 URLs and the content of every blog post that scraped successfully (12 of the 20 in this run). Ask for the report.
"Build me a competitor research report from everything we have. Include a comparison table with total URLs per site, top sections each is investing in, and the share that's localized. Then give me five signals I should act on."
[VIDEO PLACEHOLDER: Claude generating the report]
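Outside Claude, the comparison-table part of that prompt is a small function over the Step 2 numbers. The sketch below adds an explicit localized-share column, computed as localized URLs divided by total URLs. The `stats` shape is an assumption, and the figures are copied from the Step 2 summaries.

```javascript
// Renders per-site stats as a Markdown table, largest site first, with a "Localized"
// column computed as (total - english) / total. The stats shape is an assumption.
function renderComparison(stats) {
  const header = '| Site | Total URLs | English | Locales | Localized |\n|---|---|---|---|---|';
  const rows = [...stats]
    .sort((a, b) => b.total - a.total)
    .map((s) => {
      const share = s.total > 0 ? ((s.total - s.english) / s.total) * 100 : 0;
      return `| ${s.site} | ${s.total.toLocaleString('en-US')} | ${s.english.toLocaleString('en-US')} | ${s.locales} | ${share.toFixed(1)}% |`;
    });
  return [header, ...rows].join('\n');
}

const stats = [
  { site: 'Submagic', total: 20219, english: 506, locales: 19 },
  { site: 'OpusClip', total: 6801, english: 2811, locales: 4 },
  { site: 'revid.ai', total: 2374, english: 344, locales: 7 },
  { site: 'Captions.ai', total: 138, english: 138, locales: 0 },
  { site: 'AutoShorts.ai', total: 23, english: 23, locales: 0 },
];
console.log(renderComparison(stats)); // first data row: | Submagic | 20,219 | 506 | 19 | 97.5% |
```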
In Claude.ai you can ask for it as a rendered HTML artifact and open it in a browser; in Claude Desktop or Cursor you get the same content as Markdown. Either way, here's what it produced for this run.
Scale and investment
| Site | Total URLs | English | Locales | Top 3 sections |
|---|---|---|---|---|
| Submagic | 20,219 | 506 | 19 | /vs 10,098, /blog 4,220, /auto-subtitle-generator 736 |
| OpusClip | 6,801 | 2,811 | 4 | /agent 4,079, /blog 1,068, /tools 684 |
| revid.ai | 2,374 | 344 | 7 | /tools 1,209, /make 1,048, /features 40 |
| Captions.ai | 138 | 138 | 0 | /blog 54, /solutions 29, /tools 24 |
| AutoShorts.ai | 23 | 23 | 0 | /blog 10, /alternative 6 |
Five signals
- Three of five compete on programmatic SEO at industrial scale. Submagic runs 10,098 /vs/ pages plus 1,209 /alternative/ and /alternatives/ pages, OpusClip runs 4,079 /agent/workflows/ pages targeting celebrity and brand names, and Revid runs 2,257 /tools/ and /make/ niche-targeted landers. The naming differs (alternatives, agents, tools), but the play is identical.
- Localization tracks SEO commitment. Submagic translates into 19 locales, Revid into 7, OpusClip into 4. AutoShorts and Captions don't localize at all. The three localizers are the three running programmatic SEO; the two non-localizers compete on product positioning.
- Captions.ai recently rebranded to Mirage and raised $175M. One of the scraped blog posts (March 24, 2026) announces the financing. The sitemap structure - curated, no localization, no programmatic farms - reflects a B2B-leaning strategy that the funding round supports.
- AutoShorts.ai publishes posts without dates. Five posts scraped, zero with a publish date in any extractable form. Total URLs: 23. The "no content channel" outlier.
- Webflow + Cloudflare sites resist headless-Chrome scraping. All 5 Submagic posts and 3 of 5 OpusClip posts failed (browser navigation timeout). The same URLs return full SSR HTML to plain curl in under a second. Worth knowing as a real product limitation when promising "scrape any site."
That's the comparison from one prompt. You can drill into anything that catches your eye - "show me 10 random URLs from /agent/" or "who is Submagic comparing themselves against in the 10,098 /vs/ pages?" - same context, more depth.
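You can also answer the second drill-down question in code, provided the /vs/ slugs follow an `a-vs-b` pattern. That slug format is an assumption, so check a sample of real URLs first. The sketch counts how often each name appears opposite the site's own brand. Localized copies are counted too unless you filter them out beforehand.

```javascript
// Counts the names that appear opposite `brand` in /vs/ URLs shaped like
// /vs/<a>-vs-<b> (slug format is an assumption; verify it against real URLs).
// Pages comparing two other products are skipped.
function vsOpponents(urls, brand) {
  const counts = new Map();
  for (const url of urls) {
    const match = new URL(url).pathname.match(/\/vs\/([^/]+?)-vs-([^/]+?)\/?$/);
    if (!match) continue;
    const [, left, right] = match;
    const opponent = left === brand ? right : right === brand ? left : null;
    if (opponent) counts.set(opponent, (counts.get(opponent) || 0) + 1);
  }
  // Most frequently compared names first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```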
Step 5: Apply It to Your Own Market
The chain is the same regardless of category. SaaS, e-commerce, dev tools, agencies, creator economy, niche marketplaces - pick your five competitors, install the Simplescraper MCP, run the prompts.
Five things to look for when you do:
- Where is each competitor's URL volume concentrated? A site with 10,000+ pages in one section is making a deliberate bet. Find out what.
- Who localizes, and into what? International SEO is a signal of where they think growth is. Match it, skip it, or pick the locales they missed.
- What programmatic patterns are running? /alternatives/, /vs/, /tools/[niche], /agent/, /[city] - if anyone in your space is doing this, it's the lowest-effort SEO bet, and someone has already proven it works.
- What's their blog cadence and what topics dominate? Posts per month, top recurring terms, recent shifts. That tells you what they're optimizing for.
- What's the gap? Sections everyone has except one. Locales no one localizes into. Topics nobody owns. The gaps are where you can move first.
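The last question on that list is the easiest to automate. The sketch below flags sections that every competitor has except one. The function name and input shape are hypothetical. The sample data uses only the top sections listed in Step 2, so smaller sections are ignored.

```javascript
// Flags sections that every site except one has: the "everyone has it but X" gaps.
// Input shape (hypothetical): { siteName: ['/blog', '/tools', ...] }.
function missingFromOne(sectionsBySite) {
  const sites = Object.keys(sectionsBySite);
  // With two or fewer sites, "all but one" would just mean "unique to one site".
  if (sites.length < 3) return [];
  const owners = new Map(); // section -> Set of sites that have it
  for (const site of sites) {
    for (const section of new Set(sectionsBySite[site])) {
      if (!owners.has(section)) owners.set(section, new Set());
      owners.get(section).add(site);
    }
  }
  const gaps = [];
  for (const [section, have] of owners) {
    if (have.size === sites.length - 1) {
      gaps.push({ section, missing: sites.find((site) => !have.has(site)) });
    }
  }
  return gaps;
}

// Top sections per site from the Step 2 tables (partial data: smaller sections are omitted).
const topSectionsBySite = {
  'revid.ai': ['/tools', '/make', '/features', '/tiktok-video-finder', '/pricing'],
  'AutoShorts.ai': ['/blog', '/alternative', '/login', '/register', '/contact', '/faq'],
  'Submagic': ['/vs', '/blog', '/auto-subtitle-generator', '/apps', '/alternative', '/alternatives', '/use-cases', '/tools'],
  'OpusClip': ['/agent', '/blog', '/tools', '/research', '/alternative', '/business', '/demo', '/hub'],
  'Captions.ai': ['/blog', '/solutions', '/tools', '/styles', '/features'],
};
console.log(missingFromOne(topSectionsBySite));
```

On this partial data it flags /tools (missing from AutoShorts.ai) and /blog (missing from revid.ai), which lines up with Revid having no blog to scrape in Step 3.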
In fifteen minutes you'll know more about how your competitors are positioning, where they're spending marketing effort, and what they're betting on than most of them know about themselves.
The Same Workflow Without MCP, via API
If you're building competitor monitoring into your own product or running it on a schedule, the chain is two API endpoints.
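```javascript
// Runs as an ES module (top-level await) on Node 18+, where fetch is built in.
const API_KEY = 'YOUR_API_KEY';
const RECIPE_ID = 'YOUR_BLOG_SCRAPE_RECIPE_ID';

// Step 1: extract URLs from one competitor
const response = await fetch('https://api.simplescraper.io/v1/extract-urls', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ domain: 'https://submagic.co' })
});
if (!response.ok) throw new Error(`extract-urls failed with HTTP ${response.status}`);
const { data } = await response.json();

// Step 2: filter (no API call - just JS). Keep /blog/ URLs and drop localized mirrors
// such as /de/... or /es-es/... (note: this also drops any two-letter top-level section).
// Sitemap order isn't guaranteed to be chronological, so these are the first five
// blog URLs, not necessarily the newest.
const blogUrls = data.urls
  .filter(url => /\/blog\//.test(url) && !/^https?:\/\/[^\/]+\/[a-z]{2}(-[a-z]{2})?\//.test(url))
  .slice(0, 5);

// Step 3: scrape each via your scrape recipe and keep the results
const results = [];
for (const url of blogUrls) {
  const run = await fetch(`https://api.simplescraper.io/v1/recipe/${RECIPE_ID}/run`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url })
  });
  results.push({ url, ok: run.ok, result: run.ok ? await run.json() : null });
}
```

Loop that across each domain and you've got the same dataset Claude was holding in its context. Pipe it to Notion, Airtable, or a database, and you've got a competitor monitor you can run on a cron.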
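The monitoring part mostly comes down to diffing each run's URL list against the previous one. The sketch assumes you persist each run's `data.urls` array somewhere, such as a file, a database, or Airtable. Storage and scheduling are left out, and the helper names are hypothetical.

```javascript
// Compares two URL snapshots of the same site (e.g. the previous and the current
// extract-urls run) and reports which URLs were added and which were removed.
function diffSnapshots(previousUrls, currentUrls) {
  const before = new Set(previousUrls);
  const after = new Set(currentUrls);
  return {
    added: [...after].filter((url) => !before.has(url)),
    removed: [...before].filter((url) => !after.has(url)),
  };
}

// Groups newly added URLs by top-level section so a spike (say, hundreds of new
// /vs/ pages) stands out in a daily digest.
function addedBySection(added) {
  const counts = {};
  for (const url of added) {
    const section = '/' + (new URL(url).pathname.split('/').filter(Boolean)[0] || '');
    counts[section] = (counts[section] || 0) + 1;
  }
  return counts;
}
```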
URL extraction stays free regardless of how you call it. Scraping uses credits per page. Full API docs at simplescraper.io/docs/extract-urls-api.
Install the Simplescraper MCP
Three quick install paths, depending on which client you use.
Claude Desktop. Open Settings → Developer → Edit Config, then add Simplescraper to your claude_desktop_config.json:
```json
{
  "mcpServers": {
    "simplescraper": {
      "command": "npx",
      "args": ["-y", "@simplescraper/mcp"],
      "env": { "SIMPLESCRAPER_API_KEY": "YOUR_API_KEY" }
    }
  }
}
```

Restart Claude Desktop. The Simplescraper tools show up in the MCP picker automatically.
Cursor. Add the same config block to your Cursor MCP settings (Settings → MCP → Add new server).
Codex. [PLACEHOLDER: confirm exact Codex MCP install command - link to docs.]
Grab your API key from the Simplescraper account page. The free tier covers URL extraction in full. Paid tiers unlock more credits for the scraping step.
And that's the chain. Three minutes to install, five prompts to run the workflow, no code unless you want to write some.
Want the URL Extractor on its own? See the URL Extractor API docs. Want the full MCP server setup? Read the Simplescraper MCP guide.