Sistava

Scrape Any Website

Your employees can read the full content of any public webpage. Just give them a URL and they'll extract clean, readable text.

TL;DR

Every employee can read the full content of any public webpage. Paste a URL and they extract clean, readable text with no ads or clutter. No setup required.

How It Works

The employee fetches the page, strips boilerplate (headers, footers, ads, navigation), and returns clean text you can work with.
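
To make the "strip boilerplate" step concrete, here is a minimal sketch using only the Python standard library. It is not how Web Scraper or Trafilatura work internally. The `SKIP_TAGS` set, the `TextExtractor` class, and the `extract_clean_text` helper are all hypothetical names chosen for illustration, and real extractors use far richer heuristics than a fixed tag list.

```python
from html.parser import HTMLParser

# Tags whose contents count as boilerplate in this sketch.
# This list is an assumption, not the product's actual rule set.
SKIP_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a boilerplate element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_clean_text(html: str) -> str:
    """Return the text outside boilerplate elements, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><body><nav>Home | About</nav>"
    "<article><h1>Title</h1><p>Body text.</p></article>"
    "<footer>Copyright Example</footer></body></html>"
)
print(extract_clean_text(sample))  # prints "Title" and "Body text." on two lines
```

The navigation and footer text are dropped because they sit inside skipped tags, while the article heading and paragraph survive.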

What It Can Do

Capability | Example
Read articles | "Read this blog post and summarize the key points"
Extract data | "Pull the pricing from this page"
Research competitors | "Read their features page and compare to ours"
Process documentation | "Read this API doc and write me a quick-start guide"
Digest long content | "Read this 5,000-word report and give me the top 3 takeaways"

How to Set It Up

Nothing to do. Web Scraper is enabled for every employee at hire.

You can ask the employee to manage their own tools, or do it manually:

  1. Select the employee
  2. Click the Tools tab
  3. Find "Web Scraper" in the Actions section

Action | What it does | How
Enable / Disable | Controls whether the employee can use this tool | Toggle the switch
Tool Rules | Custom instructions that guide how the employee uses this specific tool, e.g. "always extract tables as markdown" or "skip navigation content" | Expand the tool, then write your rules in the text field
Delete | Permanently removes the tool from the employee | Click the delete button

Tips & Tricks

Behind the Scenes

Powered by | Trafilatura
How it works | A 3-step extraction pipeline. Each step only runs if the previous one fails.

Step | Method | Handles
1. Trafilatura | Lightweight Python fetch | ~90% of pages: articles, docs, blogs, news
2. Browser User-Agent | Fetch with browser headers | Sites that block bots but don't require JavaScript
3. Browser Controller | Real browser rendering | JavaScript-heavy SPAs (requires Desktop companion app)

Most pages resolve on step 1 in under a second.
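
The "each step only runs if the previous one fails" behavior can be sketched as a simple fallback chain. This is an illustration under assumptions, not the product's code. `scrape_with_fallback` and the three stand-in step functions are hypothetical, and the stand-ins return canned results instead of touching the network, Trafilatura, or a browser.

```python
from typing import Callable, Optional, Sequence, Tuple

# A step is (name, function). The function takes a URL and returns extracted
# text, or None when it found nothing usable. All names here are hypothetical.
Step = Tuple[str, Callable[[str], Optional[str]]]


def scrape_with_fallback(url: str, steps: Sequence[Step]) -> Tuple[str, str]:
    """Try each step in order and return (step_name, text) from the first success."""
    errors = []
    for name, fetch in steps:
        try:
            text = fetch(url)
        except Exception as exc:  # a failed step falls through to the next one
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: no content")
    raise RuntimeError("all steps failed: " + "; ".join(errors))


def lightweight_fetch(url: str) -> Optional[str]:
    return None  # stand-in for step 1: pretend the site blocked the plain request


def browser_header_fetch(url: str) -> Optional[str]:
    return "Article text from " + url  # stand-in for step 2


def browser_render(url: str) -> Optional[str]:
    raise RuntimeError("Desktop companion app not connected")  # stand-in for step 3


pipeline = [
    ("trafilatura", lightweight_fetch),
    ("browser-user-agent", browser_header_fetch),
    ("browser-controller", browser_render),
]
print(scrape_with_fallback("https://example.com/post", pipeline))
```

In this run step 1 returns nothing, step 2 succeeds, and step 3 is never called, which matches the idea that the heavier steps are used only when the lighter ones fail.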

Web Scraper vs Web Search vs Browser Controller

Tool | Web Scraper | Web Search | Browser Controller
Purpose | Read full content from a specific URL | Find information via search engine | Control a real browser. Navigate, click, fill forms
Input | A URL you already have | A question or topic | Instructions like "go to this site and..."
Output | Full page text, clean, no ads or nav | Titles, snippets, and source links | Screenshots, extracted data, completed actions
Best for | "Read this article for me" | "What's happening with X?" | "Log into this site and download the report"
When to use | You know the exact page | You don't know where to look | You need to interact with a page
Requires | Nothing, built in | Nothing, built in | Desktop companion app

The employee often combines these automatically, searching first to find URLs, then reading the best results in full.

What It Costs

Cost | Runtime credits based on processing time, typically very fast
Rate limits | None from our side, but the target website may block rapid consecutive requests
Truncation | Pages longer than 30,000 characters (8,000 tokens) are truncated
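
As a rough illustration of the truncation rule, the sketch below caps text at 30,000 characters and shows how a longer page could be read in several parts. It assumes a plain character count and a hard cut at the limit. The page does not say where the product actually cuts, and `truncate_page` and `split_into_chunks` are hypothetical helpers.

```python
from typing import List, Tuple

MAX_CHARS = 30_000  # cap stated in this document


def truncate_page(text: str, limit: int = MAX_CHARS) -> Tuple[str, bool]:
    """Return (text, was_truncated), cutting at the limit when the text is longer."""
    if len(text) <= limit:
        return text, False
    return text[:limit], True


def split_into_chunks(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into consecutive pieces of at most `limit` characters each."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


page = "x" * 65_000
kept, was_truncated = truncate_page(page)
print(len(kept), was_truncated)
print([len(chunk) for chunk in split_into_chunks(page)])
```

A 65,000-character page would lose more than half its content to a single cap, but fits in three pieces, which is the idea behind asking the employee to focus on one section at a time.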

Is It Safe

Good to Know

Frequently Asked Questions

Q: Can the employee read PDFs from a URL?
A: The scraper is optimized for HTML web pages. For PDFs, upload the file directly to the employee's chat. They can read uploaded documents natively.

Q: What happens with pages behind a login?
A: The scraper can only access public pages. For authenticated content, use Browser Controller. It uses your actual browser session, so any site you're logged into is accessible.

Q: Does the employee cache scraped pages?
A: No. Every scrape fetches the live page, so you always get current content.

Q: Can I scrape multiple pages at once?
A: Yes. Give the employee a list of URLs and they'll read each one. They may do this automatically when combining Web Search and Web Scraper.

Q: Why is the content truncated?
A: Pages are capped at 30,000 characters (8,000 tokens) to keep responses fast and costs predictable. If you need the full page, ask the employee to focus on a specific section or split the request.