Scrapingbee ScrapeUrlTool
This is a versatile tool that can fetch web pages or files, with features like javascript scenario, data extraction, screenshot, ai query, ai extraction and so on.
Overviewโ
Integration detailsโ
Class | Package | Serializable | JS support | Package latest |
---|---|---|---|---|
ScrapeUrlTool | langchain-scrapingbee | โ | โ |
Setupโ
pip install -U langchain-scrapingbee
Credentialsโ
You should configure credentials by setting the following environment variables:
- SCRAPINGBEE_API_KEY
import getpass
import os
# if not os.environ.get("SCRAPINGBEE_API_KEY"):
# os.environ["SCRAPINGBEE_API_KEY"] = getpass.getpass("SCRAPINGBEE API key:\n")
Instantiationโ
All of the ScrapingBee tools only require the API Key during instantiation. If not set up in environment vairable, you can provide it directly here.
Here we show how to instantiate an instance of the Scrapingbee tools:
from langchain_scrapingbee import ScrapeUrlTool
scrape_tool = ScrapeUrlTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))
Invocationโ
Invoke directly with argsโ
This tool accepts url
(string) and params
(dictionary) as argument. The url
argument is necessary, and the params
argument is optional. You can use params
argument to customise the request. For example, to disable JavaScript Rendering, you can use the following as params
:
{'render_js': False}
For a complete list of acceptable parameters, please visit the HTML API documentation.
scrape_tool.invoke({"url": "http://httpbin.org/html"})
scrape_tool.invoke(
{
"url": "https://treaties.un.org/doc/publication/ctc/uncharter.pdf",
"params": {"render_js": False},
}
)
Use within an agentโ
import os
from langchain_scrapingbee import ScrapeUrlTool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent
if not os.environ.get("GOOGLE_API_KEY") or not os.environ.get("SCRAPINGBEE_API_KEY"):
raise ValueError(
"Google and ScrapingBee API keys must be set in environment variables."
)
llm = ChatGoogleGenerativeAI(temperature=0, model="gemini-2.5-flash")
scrapingbee_api_key = os.environ.get("SCRAPINGBEE_API_KEY")
scrape_tool = ScrapeUrlTool(api_key=os.environ.get("SCRAPINGBEE_API_KEY"))
agent = create_react_agent(llm, [scrape_tool])
user_input = "Capture the full page screenshot of https://www.langchain.com/"
# Stream the agent's output step-by-step
for step in agent.stream(
{"messages": user_input},
stream_mode="values",
):
step["messages"][-1].pretty_print()
API referenceโ
For detailed documentation of ScrapingBee's HTML API features and configurations head to the API reference:
Relatedโ
- Tool conceptual guide
- Tool how-to guides