To cache responses with the requests library in Python, you'll need to use an additional caching library that works alongside requests. One popular choice is requests-cache, which provides a simple and effective way to cache responses automatically with minimal setup.
Here's how you can use requests-cache:
- Install
requests-cacheusingpip:
pip install requests-cache
- Use
requests-cachein your code to enable caching for yourrequestscalls:
import requests
import requests_cache
# Install the cache; 'cache_name' is the name of the SQLite database file
requests_cache.install_cache('cache_name', expire_after=180) # Cache for 180 seconds
# Make a request
response = requests.get('https://example.com')
# Check if this request was fetched from the cache
print(response.from_cache)
In the code above, expire_after parameter defines the number of seconds you want to keep the response in the cache. You can also set it to None for an infinite cache or use a timedelta for more complex time intervals.
Here are some additional options you can use with requests-cache:
- Backends: By default,
requests-cacheuses a SQLite backend, but you can also use others like Redis, MongoDB, or a simple in-memory store. - Session: Instead of installing the cache globally, you can create a custom session:
session = requests_cache.CachedSession('cache_name', expire_after=180)
response = session.get('https://example.com')
- Conditions: You can customize which requests are cached and which are not based on conditions:
def my_custom_cache_condition(request):
# Only cache requests to a specific domain
return 'example.com' in request.url
session = requests_cache.CachedSession('cache_name', expire_after=180)
session.cache.allowable_codes = (200,)
session.cache.allowable_methods = ('GET', 'POST')
session.cache.allowable_conditions = my_custom_cache_condition
response = session.get('https://example.com')
- Clearing the cache: You can clear the cache for specific URLs or the entire cache:
# Clear the cache for a specific URL
requests_cache.clear('https://example.com')
# Clear the entire cache
requests_cache.clear()
- Serialization: You can choose the serialization method for your cache (like JSON or BSON) by specifying the
serializerparameter when creating aCachedSession.
Remember to always respect the terms of service of the websites you're scraping and the legal implications of web scraping and caching.
For more advanced caching strategies, you might consider implementing your own caching logic using Python's built-in libraries like pickle for serialization and shelve for a simple persistent storage. However, for most use cases, requests-cache should suffice.