Scrape Any Website

Your employees can read the full content of any public webpage. Just give them a URL and they'll extract clean, readable text.

TL;DR

Every employee can read the full content of any public webpage. Paste a URL and they extract clean, readable text with no ads or clutter. No setup required.

How It Works

The employee fetches the page, strips boilerplate (headers, footers, ads, navigation), and returns clean text you can work with.

What It Can Do

Capability	Example
Read articles	"Read this blog post and summarize the key points"
Extract data	"Pull the pricing from this page"
Research competitors	"Read their features page and compare to ours"
Process documentation	"Read this API doc and write me a quick-start guide"
Digest long content	"Read this 5,000-word report and give me the top 3 takeaways"

How to Set It Up

Nothing to do. Web Scraper is enabled for every employee at hire.

You can ask the employee to manage their own tools, or do it manually:

1. Select the employee
2. Click the Tools tab
3. Find "Web Scraper" in the Actions section

Action	What it does	How
Enable / Disable	Controls whether the employee can use this tool	Toggle the switch
Tool Rules	Custom instructions that guide how the employee uses this specific tool, e.g. "always extract tables as markdown" or "skip navigation content"	Expand the tool, then write your rules in the text field
Delete	Permanently removes the tool from the employee	Click the delete button

Tips & Tricks

Give the direct URL. Not a search results page or redirect link
Ask for specific extraction. "Read this page and pull out only the pricing table" works better than "read this"
Combine with search. "Search for X, then read the top 3 results" lets the employee use both tools together
Long pages get truncated. If you need content past the ~30,000-character limit, ask the employee to focus on a specific section

Behind the Scenes


Powered by	Trafilatura
How it works	A 3-step extraction pipeline. Each step only runs if the previous one fails

Step	Method	Handles
1. Trafilatura	Lightweight Python fetch	~90% of pages: articles, docs, blogs, news
2. Browser User-Agent	Fetch with browser headers	Sites that block bots but don't require JavaScript
3. Browser Controller	Real browser rendering	JavaScript-heavy SPAs (requires Desktop companion app)

Most pages resolve on step 1 in under a second.
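
The fallback order in the table above can be sketched roughly as follows. This is a minimal illustration, not the product's actual implementation: `run_pipeline`, the step function names, and the `BROWSER_UA` string are hypothetical, and step 3 is a stub because Browser Controller's interface is not described here. The only Trafilatura calls used are `trafilatura.fetch_url` and `trafilatura.extract`.

```python
import urllib.request
from typing import Callable, Optional, Sequence, Tuple

Step = Callable[[str], Optional[str]]

# Hypothetical browser-like header for step 2; the real value is not documented.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"


def fetch_with_trafilatura(url: str) -> Optional[str]:
    """Step 1: plain fetch and text extraction with Trafilatura."""
    import trafilatura  # third-party: pip install trafilatura

    html = trafilatura.fetch_url(url)
    return trafilatura.extract(html) if html else None


def fetch_with_browser_headers(url: str) -> Optional[str]:
    """Step 2: refetch with a browser User-Agent, then extract the text."""
    import trafilatura

    request = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with urllib.request.urlopen(request, timeout=15) as response:
        html = response.read().decode("utf-8", errors="replace")
    return trafilatura.extract(html)


def fetch_with_browser_controller(url: str) -> Optional[str]:
    """Step 3: placeholder only; real rendering needs the Desktop companion app."""
    return None


def run_pipeline(
    url: str, steps: Sequence[Tuple[str, Step]]
) -> Optional[Tuple[str, str]]:
    """Try each step in order and stop at the first one that returns text."""
    for name, step in steps:
        try:
            text = step(url)
        except Exception:
            text = None  # any error counts as "this step failed"
        if text:
            return name, text
    return None  # every step failed: the page could not be read


DEFAULT_STEPS = [
    ("trafilatura", fetch_with_trafilatura),
    ("browser-user-agent", fetch_with_browser_headers),
    ("browser-controller", fetch_with_browser_controller),
]
```

Calling `run_pipeline("https://example.com", DEFAULT_STEPS)` would return the name of the step that succeeded along with the extracted text, or `None` if no step produced any text.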

Web Scraper vs Web Search vs Browser Controller

	Web Scraper	Web Search	Browser Controller
Purpose	Read full content from a specific URL	Find information via search engine	Control a real browser. Navigate, click, fill forms
Input	A URL you already have	A question or topic	Instructions like "go to this site and..."
Output	Full page text, clean, no ads or nav	Titles, snippets, and source links	Screenshots, extracted data, completed actions
Best for	"Read this article for me"	"What's happening with X?"	"Log into this site and download the report"
When to use	You know the exact page	You don't know where to look	You need to interact with a page
Requires	Nothing, built in	Nothing, built in	Desktop companion app

The employee often combines these automatically, searching first to find URLs, then reading the best results in full.

What It Costs


Cost	Runtime credits based on processing time; most scrapes finish quickly
Rate limits	None from our side, but the target website may block rapid consecutive requests
Truncation	Pages longer than ~30,000 characters (~8,000 tokens) are truncated
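
To make the truncation cap concrete, here is a small sketch of cutting text at a character limit. It assumes a hard cut at exactly 30,000 characters and a ratio of about 3.75 characters per token (the ratio implied by 30,000 characters being roughly 8,000 tokens). The function names are hypothetical, and the real scraper may cut at a different boundary.

```python
from typing import Tuple

CHAR_LIMIT = 30_000     # approximate cap from the table above
CHARS_PER_TOKEN = 3.75  # assumed ratio: 30,000 chars / 8,000 tokens


def truncate(text: str, limit: int = CHAR_LIMIT) -> Tuple[str, bool]:
    """Return the text cut to `limit` characters and whether it was cut."""
    if len(text) <= limit:
        return text, False
    return text[:limit], True


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count alone."""
    return round(len(text) / CHARS_PER_TOKEN)
```

A page of 40,000 characters would come back as its first 30,000 characters with the flag set to `True`, which is the point at which asking for a specific section becomes useful.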

Is It Safe

Public pages only. The scraper can only access publicly available content. No login-protected or paywalled content is accessible
Results in your chat. Scraped content is summarized in the employee's response, which is saved in your conversation history like any other message
Respects robots.txt. The scraper follows standard web crawling conventions
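
As an illustration of what a robots.txt check involves, the sketch below uses Python's standard `urllib.robotparser` module. How the scraper performs this check internally is not documented here; `is_allowed` is a hypothetical helper, and the rules in the usage note are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the contents of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules without a network call
    return parser.can_fetch(user_agent, url)
```

With the rules `User-agent: *` and `Disallow: /private/`, a URL such as `https://example.com/blog/post` is allowed, while `https://example.com/private/report` is not.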

Good to Know

Clean text only. The scraper strips all HTML, scripts, styles, and navigation. You get readable text, not raw markup
No login-protected content. The scraper can only access publicly available pages. For authenticated content, use Browser Controller with the Desktop companion app
JavaScript-heavy sites. For single-page apps that require JavaScript to render, the scraper falls back to Browser Controller if the Desktop companion app is connected. Without it, the employee will let you know the page couldn't be read

Frequently Asked Questions

Q: Can the employee read PDFs from a URL? A: The scraper is optimized for HTML web pages. For PDFs, upload the file directly to the employee's chat. They can read uploaded documents natively.

Q: What happens with pages behind a login? A: The scraper can only access public pages. For authenticated content, use Browser Controller. It uses your actual browser session, so any site you're logged into is accessible.

Q: Does the employee cache scraped pages? A: No. Every scrape fetches the live page, so you always get current content.

Q: Can I scrape multiple pages at once? A: Yes. Give the employee a list of URLs and they'll read each one. They may do this automatically when combining Web Search and Web Scraper.
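
Reading a list of URLs one after another might look like the sketch below. It pauses between requests because target sites may block rapid consecutive requests. `read_many`, the `reader` callable, and the one-second default delay are all assumptions made for illustration, not documented behavior.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def read_many(
    urls: Iterable[str],
    reader: Callable[[str], Optional[str]],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Read each URL in turn, pausing between requests to stay polite."""
    results: Dict[str, Optional[str]] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first request
        try:
            results[url] = reader(url)
        except Exception:
            results[url] = None  # keep going if one page fails
    return results
```

Here `reader` stands in for whatever single-page function is available; a failure on one URL is recorded as `None` and does not stop the rest of the batch.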

Q: Why is the content truncated? A: Pages are capped at ~30,000 characters (~8,000 tokens) to keep responses fast and costs predictable. If you need the full page, ask the employee to focus on a specific section or split the request.