ScrapingBee ScrapeUrlTool

This versatile tool fetches web pages or files and supports features such as JavaScript scenarios, data extraction, screenshots, AI queries, and AI extraction.

Overview

Integration details

Class	Package	Serializable	JS support	Package latest
ScrapeUrlTool	langchain-scrapingbee	✅	❌

Setup

pip install -U langchain-scrapingbee

Credentials

Configure credentials by setting the following environment variable:

SCRAPINGBEE_API_KEY

import getpass
import os

# if not os.environ.get("SCRAPINGBEE_API_KEY"):
#     os.environ["SCRAPINGBEE_API_KEY"] = getpass.getpass("SCRAPINGBEE API key:\n")

Instantiation

All of the ScrapingBee tools require only the API key at instantiation. If it is not set as an environment variable, you can provide it directly here.

Here we show how to instantiate the ScrapingBee tool:

from langchain_scrapingbee import ScrapeUrlTool

scrape_tool = ScrapeUrlTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))
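
The fallback described above (use a key passed directly, otherwise read the environment variable) can be sketched as a small helper. This is a minimal sketch: `resolve_scrapingbee_key` is a hypothetical name introduced here for illustration and is not part of langchain-scrapingbee.

```python
import os


def resolve_scrapingbee_key(explicit_key=None, env=None):
    """Hypothetical helper: prefer an explicit key, else fall back to the environment."""
    # Allow a custom mapping to be injected (handy for tests); default to os.environ.
    env = os.environ if env is None else env
    key = explicit_key or env.get("SCRAPINGBEE_API_KEY")
    if not key:
        raise ValueError(
            "No ScrapingBee API key found: pass one explicitly "
            "or set SCRAPINGBEE_API_KEY."
        )
    return key


# Usage (requires langchain-scrapingbee to be installed):
# from langchain_scrapingbee import ScrapeUrlTool
# scrape_tool = ScrapeUrlTool(api_key=resolve_scrapingbee_key())
```

Failing early with a clear message is a design choice here; it avoids discovering a missing key only when the first request is sent.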

Invocation

Invoke directly with args

This tool accepts url (string) and params (dictionary) as arguments. The url argument is required; the params argument is optional and lets you customise the request. For example, to disable JavaScript rendering, pass the following as params:

{'render_js': False}

For a complete list of acceptable parameters, please visit the HTML API documentation.

scrape_tool.invoke({"url": "http://httpbin.org/html"})

scrape_tool.invoke(
    {
        "url": "https://treaties.un.org/doc/publication/ctc/uncharter.pdf",
        "params": {"render_js": False},
    }
)
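
Since url is required and params is optional, the input dictionary can be assembled by a small function that checks the URL and omits params when nothing is customised. The following is a rough sketch only: `build_scrape_input` is a hypothetical helper, not part of langchain-scrapingbee, and it does not validate parameter names against the HTML API.

```python
from urllib.parse import urlparse


def build_scrape_input(url, **params):
    """Hypothetical helper: build the dict passed to scrape_tool.invoke."""
    parsed = urlparse(url)
    # Reject relative or non-HTTP URLs before spending an API call on them.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got: {url!r}")
    payload = {"url": url}
    if params:  # params is optional, so leave it out when nothing was requested
        payload["params"] = dict(params)
    return payload


html_request = build_scrape_input("http://httpbin.org/html")
pdf_request = build_scrape_input(
    "https://treaties.un.org/doc/publication/ctc/uncharter.pdf",
    render_js=False,  # the same option used in the example above
)

# With a configured tool, the payloads would be used as:
# scrape_tool.invoke(html_request)
# scrape_tool.invoke(pdf_request)
```

The two payloads built here mirror the two invocations shown above.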

Use within an agent

import os
from langchain_scrapingbee import ScrapeUrlTool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent

if not os.environ.get("GOOGLE_API_KEY") or not os.environ.get("SCRAPINGBEE_API_KEY"):
    raise ValueError(
        "Google and ScrapingBee API keys must be set in environment variables."
    )

llm = ChatGoogleGenerativeAI(temperature=0, model="gemini-2.5-flash")
scrapingbee_api_key = os.environ.get("SCRAPINGBEE_API_KEY")

scrape_tool = ScrapeUrlTool(api_key=scrapingbee_api_key)

agent = create_react_agent(llm, [scrape_tool])

user_input = "Capture the full page screenshot of https://www.langchain.com/"

# Stream the agent's output step-by-step
for step in agent.stream(
    {"messages": user_input},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()


API reference

For detailed documentation of ScrapingBee's HTML API features and configurations, head to the ScrapingBee API reference.

Related

Tool conceptual guide
Tool how-to guides
