
ScrapingBee ScrapeUrlTool

ScrapeUrlTool fetches web pages or files through ScrapingBee. It supports features such as JavaScript scenarios, data extraction, screenshots, AI queries, and AI extraction.

Overview

Integration details

Class | Package | Serializable | JS support | Package latest
ScrapeUrlTool | langchain-scrapingbee | ✅ | ❌ | PyPI - Version

Setup

pip install -U langchain-scrapingbee

Credentials

Configure credentials by setting the following environment variable:

  • SCRAPINGBEE_API_KEY

import getpass
import os

# Uncomment to be prompted for the key when it is not already set:
# if not os.environ.get("SCRAPINGBEE_API_KEY"):
#     os.environ["SCRAPINGBEE_API_KEY"] = getpass.getpass("SCRAPINGBEE API key:\n")
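
In a non-interactive script, where prompting with getpass is not an option, it can help to fail early when the key is missing. The following is a minimal sketch; `require_env` is a hypothetical helper written for this example and is not part of langchain-scrapingbee.

```python
import os


def require_env(name, env=None):
    """Return the value of environment variable `name`, or raise if it is unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running this script.")
    return value


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"SCRAPINGBEE_API_KEY": "demo-key"}
print(require_env("SCRAPINGBEE_API_KEY", demo_env))  # demo-key
```

In real use, `require_env("SCRAPINGBEE_API_KEY")` would read from the process environment and raise before any tool is created.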

Instantiation

All ScrapingBee tools require only the API key at instantiation. If the key is not set as an environment variable, you can pass it directly here.

Here we show how to instantiate the ScrapingBee tool:

from langchain_scrapingbee import ScrapeUrlTool

scrape_tool = ScrapeUrlTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))

Invocation

Invoke directly with args

This tool accepts url (string) and params (dictionary) as arguments. The url argument is required; the params argument is optional and customises the request. For example, to disable JavaScript rendering, pass the following as params:

{'render_js': False}

For a complete list of acceptable parameters, please visit the HTML API documentation.

scrape_tool.invoke({"url": "http://httpbin.org/html"})

scrape_tool.invoke(
    {
        "url": "https://treaties.un.org/doc/publication/ctc/uncharter.pdf",
        "params": {"render_js": False},
    }
)
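
The input shape described above (a required url plus an optional params dictionary) can be assembled with a small helper. This is only a sketch: `build_scrape_input` is a hypothetical function written for this example, and the URL check is an added assumption rather than something the tool is documented to require. The block runs without an API key because the actual `invoke` call is left as a comment.

```python
from urllib.parse import urlparse


def build_scrape_input(url, **params):
    """Build the dict passed to scrape_tool.invoke().

    `url` is required; any keyword arguments become the optional `params` dict.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got: {url!r}")

    tool_input = {"url": url}
    if params:  # omit the key entirely when no options are given
        tool_input["params"] = dict(params)
    return tool_input


plain = build_scrape_input("http://httpbin.org/html")
no_js = build_scrape_input(
    "https://treaties.un.org/doc/publication/ctc/uncharter.pdf",
    render_js=False,
)

print(plain)
print(no_js)

# With a configured tool, the result would be passed straight through:
# scrape_tool.invoke(no_js)
```

Keeping the payload construction separate from the network call makes it easy to inspect or log exactly what will be sent before spending API credits.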

Use within an agent

import os

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_scrapingbee import ScrapeUrlTool
from langgraph.prebuilt import create_react_agent

# Both keys are needed: one for the Gemini model, one for ScrapingBee.
if not os.environ.get("GOOGLE_API_KEY") or not os.environ.get("SCRAPINGBEE_API_KEY"):
    raise ValueError(
        "Google and ScrapingBee API keys must be set in environment variables."
    )

llm = ChatGoogleGenerativeAI(temperature=0, model="gemini-2.5-flash")
scrapingbee_api_key = os.environ.get("SCRAPINGBEE_API_KEY")

scrape_tool = ScrapeUrlTool(api_key=scrapingbee_api_key)

agent = create_react_agent(llm, [scrape_tool])

user_input = "Capture the full page screenshot of https://www.langchain.com/"

# Stream the agent's output step-by-step
for step in agent.stream(
    {"messages": user_input},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

API Reference: create_react_agent
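
When only the agent's final answer is needed rather than every intermediate step, the streamed states can be reduced to the text of the last message. The sketch below assumes each streamed state is a dict with a "messages" list whose items expose a `content` attribute that is either a string or a list of text blocks. `final_text` is a hypothetical helper, and the stand-in objects only mimic that shape so the block runs without API keys.

```python
from types import SimpleNamespace


def final_text(steps):
    """Return the text of the last message in the last streamed state."""
    last_state = None
    for last_state in steps:
        pass  # exhaust the stream, keeping only the final state
    if last_state is None or not last_state["messages"]:
        return ""

    content = last_state["messages"][-1].content
    if isinstance(content, str):
        return content

    # Otherwise treat content as a list of strings or {"type": "text"} blocks.
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


# Stand-in states; a real run would pass
# agent.stream({"messages": user_input}, stream_mode="values") instead.
fake_steps = [
    {"messages": [SimpleNamespace(content="Capture a screenshot")]},
    {
        "messages": [
            SimpleNamespace(content="Capture a screenshot"),
            SimpleNamespace(content=[{"type": "text", "text": "Done."}]),
        ]
    },
]
print(final_text(fake_steps))  # Done.
```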

API reference

For detailed documentation of ScrapingBee's HTML API features and configuration options, head to the ScrapingBee API reference.