Rate Limiting & The Token Bucket System
Last Updated: 2026-01-24
To ensure platform stability and fair usage for all developers, the Abba Baba API uses a Token Bucket system for rate limiting. This page explains how it works and how your agent should interact with it gracefully.
What is the Token Bucket System?
Think of a bucket that is constantly being filled with water (tokens) at a steady rate. Every time you make an API call, you take some water out of the bucket. If the bucket is empty, you must wait for it to refill before you can take more.
This system is defined by three key concepts:
- Token Cost: Every API call has a "cost" in tokens, which is deducted from your bucket. Simple calls cost less, while complex ones cost more.
- Bucket Capacity: The maximum number of tokens your bucket can hold. This allows your agent to save up tokens to handle sudden bursts of requests.
- Refill Rate: The number of tokens added back to your bucket every second. This determines your agent's sustained request throughput over time.
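The three concepts above can be modeled in a few lines of code. This is an illustrative sketch of the token-bucket algorithm itself, not the platform's actual implementation; the class and field names are assumptions chosen for clarity.

```python
import time


class TokenBucket:
    """Minimal token-bucket model: capacity caps bursts, refill_rate
    caps sustained throughput. Illustrative only."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added back per second
        self.tokens = capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Add tokens for the elapsed time, but never exceed capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now

    def try_consume(self, cost: float) -> bool:
        """Deduct `cost` tokens if available; return False if the bucket
        cannot cover the call's cost (i.e. the caller would get a 429)."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `TokenBucket(capacity=10, refill_rate=1)`, a burst of ten 1-token calls succeeds immediately, but an eleventh call fails until the bucket has had time to refill, which is exactly the burst-versus-sustained-rate trade-off described above.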
The capacity and refill rates for each plan are listed on our Pricing Page.
Handling 429 Too Many Requests Errors
If your agent makes requests faster than its refill rate and empties its token bucket, you will receive an HTTP 429 Too Many Requests error.
Your agent must be designed to handle these errors gracefully. The best practice is to implement a retry with exponential backoff strategy.
Example: Exponential Backoff in Python
This code demonstrates how to detect a 429 response and automatically wait before retrying the request.
import requests
import os
import time

def search_with_backoff(query: str, max_retries: int = 5):
    api_key = os.getenv("ABA_API_KEY")
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    payload = {"query": query, "limit": 1}
    url = "https://api.abbababa.com/v1/search"

    # Start with a 1-second delay, which will increase exponentially
    delay = 1
    for attempt in range(max_retries):
        try:
            response = requests.post(url, headers=headers, json=payload, timeout=10)

            # If we hit the rate limit, sleep and retry with a doubled delay
            if response.status_code == 429:
                print(f"Attempt {attempt + 1}: Rate limit exceeded. Waiting for {delay} seconds...")
                time.sleep(delay)
                delay *= 2  # Double the delay for the next retry
                continue

            response.raise_for_status()  # Raise exceptions for other bad responses (4xx/5xx)

            # The request succeeded; return the parsed response body
            print(f"Attempt {attempt + 1}: Success!")
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
            # For other errors (timeouts, connection failures), stop retrying;
            # you might choose a different retry strategy for these
            break

    print("Max retries exceeded. Request failed.")
    return None

if __name__ == "__main__":
    search_with_backoff("find me a great gift")
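A common refinement of the backoff loop above is to add random jitter: instead of sleeping for exactly the doubled delay, sleep for a random duration up to that delay, so that many clients rate-limited at the same moment don't all retry in lockstep. This is a general sketch of "full jitter" backoff under an assumed 60-second cap, not a policy prescribed by the API.

```python
import random


def jittered_delay(base: float, attempt: int, cap: float = 60.0) -> float:
    """Full-jitter backoff: a random sleep in [0, min(cap, base * 2**attempt)].

    `base` is the initial delay in seconds, `attempt` is the zero-based
    retry count, and `cap` (an illustrative value) bounds the worst case.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

In the example above, you would replace `time.sleep(delay)` with `time.sleep(jittered_delay(1, attempt))` and drop the manual `delay *= 2` bookkeeping.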
Do Not Hammer the API!
Continuously sending requests after receiving a 429 error without a backoff delay is considered abusive behavior and may result in temporary suspension of your API key.
Checking Your Rate Limit Status in Headers
Every API response includes headers that provide real-time information about your agent's current rate limit status. Your agent can use these headers to intelligently manage its request frequency.
- X-RateLimit-Limit: The total capacity of your token bucket.
- X-RateLimit-Remaining: The number of tokens currently remaining in your bucket.
- X-RateLimit-Reset: The approximate time, in UTC seconds, until your bucket is fully refilled.
By inspecting the X-RateLimit-Remaining header, your agent can proactively slow down its requests as it approaches its limit, avoiding 429 errors altogether.
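For example, an agent can compute a pause after each response based on these headers. The threshold and pacing policy below are illustrative assumptions, not rules prescribed by the API; the helper simply spreads the remaining refill window across the tokens left when the bucket runs low.

```python
def pause_if_low(headers: dict, min_remaining: int = 10) -> float:
    """Return how many seconds to pause based on rate-limit headers.

    `min_remaining` is a hypothetical threshold: below it, the agent
    slows down; above it, no pause is needed. Header values are assumed
    to parse as integers, per the header list above.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", min_remaining))
    reset = int(headers.get("X-RateLimit-Reset", 0))
    if remaining < min_remaining and reset > 0:
        # Spread the refill window across the tokens that remain
        return reset / max(remaining, 1)
    return 0.0
```

After each call, the agent would `time.sleep(pause_if_low(response.headers))`; with only 2 tokens left and 10 seconds until refill, it pauses 5 seconds between requests instead of running into a 429.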