Enabling Caching
Caching saves the results of a scraper run. When you run the scraper again with the same inputs, Botasaurus serves the cached result instead of running the scraper again, saving both time and compute resources.
How to Enable Caching​
To enable caching for all scrapers, add Server.enableCache()
to src/scraper/backend/server.ts
file:
import { Server } from "botasaurus-server/server"
Server.enableCache();
How Caching Works​
Botasaurus uses input data to determine whether to use a cached result:
- When you run a scraper, Botasaurus creates a unique "cache key" from the
data
object - If the same
data
values are used in a future run, the cached result is returned instantly - If the
data
values change, the scraper runs again
Example: If you scrape a product page with data: { productId: "123" }
, the result is cached. Running the scraper again with the same productId
returns the cached result without re-running the scraper.
Excluding Fields from the Cache Key​
Some inputs should not affect caching decisions. For example:
- API keys
- Session cookies
- Proxy URLs
These values may change between runs without affecting the scraper's results. Including them in the cache key would cause unnecessary re-runs.
To exclude these fields from caching, mark an input control with isMetadata: true
:
// In your inputs/myScraper.js
.text("api_key", {
label: "API Key",
isMetadata: true,
})
When isMetadata
is set to true
:
- The value moves from the
data
object to themetadata
object - Botasaurus excludes
metadata
when creating cache keys
You can access metadata values in your scraper like this:
const myScraper = task({
name: "myScraper",
run: ({ data, metadata }) => {
// The api_key is now in metadata and won't affect caching
const apiKey = metadata["api_key"];
// ... your scraper logic
}
})
This approach ensures that only essential, data-defining inputs determine caching behavior, making your cache more effective.
Bypassing Cache Control​
When caching is enabled, users see a "Use cache" checkbox in the input form:
This checkbox:
- Defaults to
true
(using cached results if available) - Allows users to bypass the cache for a specific run by unchecking it