
Algorithms

| Algorithm | Accuracy | Speed | Burst handling | Complexity | Use case |
|---|---|---|---|---|---|
| sliding_window (default) | High | Fast | Smooth | Medium | General APIs |
| fixed_window | Medium | Fastest | 2× at boundary | Low | High-throughput coarse limits |
| token_bucket | High | Fast | Absorbs bursts | Medium | Bursty traffic |
| leaky_bucket | High | Fast | Flattens bursts | Medium | Traffic shaping |

Diagrams source: GeeksForGeeks — Rate Limiting Algorithms


Choosing an algorithm

Traffic pattern

The most important factor. Ask: is your traffic steady or bursty?

| Traffic type | Recommended |
|---|---|
| Steady, predictable flow | fixed_window or sliding_window |
| Occasional bursts that should be absorbed | token_bucket |
| Bursts that must be flattened to a constant rate | leaky_bucket |
| Fluctuating traffic with strict accuracy requirements | sliding_window |

Implementation complexity vs. control

  • fixed_window is the simplest to reason about. Use it when the boundary burst caveat is acceptable and you need maximum throughput.
  • sliding_window adds weighted counting to eliminate boundary bursts with minimal extra cost — a good default for most APIs.
  • token_bucket and leaky_bucket require tracking a continuous state (token count / water level) and are better suited when you need fine-grained burst control.

Performance and scalability

All four algorithms in Limitra use atomic Redis Lua scripts, so they scale horizontally without race conditions. For very high-throughput systems where every millisecond matters, prefer fixed_window: its check is a single Redis INCR plus EXPIRE, the cheapest operation of the four.
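Limitra's Lua scripts aren't reproduced here, but the INCR-plus-EXPIRE pattern that makes fixed_window so cheap can be sketched with an in-memory stand-in (an illustration of the pattern only; MemoryStore and fixed_window_allow are hypothetical names, not Limitra's implementation):

```python
import time

class MemoryStore:
    """Tiny in-memory stand-in for the two Redis commands involved."""
    def __init__(self):
        self.data = {}     # key -> counter
        self.expiry = {}   # key -> expiry deadline

    def _evict(self, key):
        if key in self.expiry and time.monotonic() >= self.expiry[key]:
            self.data.pop(key, None)
            self.expiry.pop(key, None)

    def incr(self, key):
        self._evict(key)
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def expire(self, key, seconds):
        # Set the TTL only on the first increment of the window.
        self.expiry.setdefault(key, time.monotonic() + seconds)

def fixed_window_allow(store, key, limit, window):
    count = store.incr(key)
    store.expire(key, window)
    return count <= limit

store = MemoryStore()
allowed = [fixed_window_allow(store, "user:alice", 3, 60) for _ in range(5)]
print(allowed)  # [True, True, True, False, False]
```

Against real Redis the increment and the TTL assignment run inside one Lua script, which is what makes the check atomic under concurrent clients.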

Handling bursts

  • token_bucket — best for absorbing bursts. Clients can send a burst up to the full bucket capacity instantly, then are throttled to the fill rate. Ideal for APIs where occasional spikes are expected and acceptable.
  • leaky_bucket — best for flattening bursts. No matter how fast requests arrive, the output rate stays constant. Use when the downstream system cannot tolerate spikes.
  • sliding_window — handles fluctuating traffic accurately without allowing the 2× boundary burst. A good middle ground when bursts should be dampened but not fully flattened.
  • Hybrid — combine algorithms when you need both steady-state control and burst tolerance. For example, use token_bucket per user with a global fixed_window to cap total throughput:
from limitra import LimitraConfig, rate_limit

LimitraConfig(redis_url="redis://localhost:6379", project="my-service")

# Per-user token bucket: absorbs short bursts (20 per second),
# throttles sustained use (500 per hour). Layer a global
# fixed_window limiter on top to cap total throughput across users.
@rate_limit(limits=[(20, 1), (500, 3600)], algorithm="token_bucket", key="user_id")
def api(user_id: str):
    return {"data": "..."}

Sliding window

The window slides continuously with time. On each incoming request, Limitra computes a weighted count: the full count from the current slot plus a fraction of the previous slot proportional to how much of it still overlaps the window. This prevents the boundary burst problem of fixed windows.

[Diagram: Sliding Window Algorithm]
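The weighted count described above reduces to a two-line formula. A minimal sketch (sliding_window_count is a hypothetical helper for illustration, not part of Limitra's API):

```python
def sliding_window_count(prev_count, curr_count, elapsed_in_slot, window):
    """Full current-slot count plus the fraction of the previous
    slot that still overlaps the sliding window."""
    prev_weight = (window - elapsed_in_slot) / window
    return curr_count + prev_count * prev_weight

# 30 s into a 60 s slot: half of the previous slot still counts
print(sliding_window_count(prev_count=80, curr_count=30,
                           elapsed_in_slot=30, window=60))  # 70.0
```

A request is allowed when this weighted count stays below the configured limit, so a burst straddling a slot boundary is still counted against the window.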

When to use

Default choice for most APIs. Best when traffic fluctuates and you need accurate rate control without boundary bursts. Slightly more complex than fixed window, but the accuracy gain is usually worth it.

from limitra import LimitraConfig, rate_limit

LimitraConfig(redis_url="redis://localhost:6379", project="my-service")

@rate_limit(requests=100, window=60, algorithm="sliding_window", key="user_id")
def api(user_id: str):
    return {"data": "..."}

Fixed window

Time is divided into discrete slots of equal length. Each slot has its own counter, reset to zero at the start of the next slot.

[Diagram: Fixed Window Algorithm]

Boundary burst

A client can send the full requests allowance at the very end of one slot and the same allowance again at the very start of the next — effectively 2× the limit in a short period.
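The effect is easy to reproduce with a toy fixed-window counter (a self-contained simulation, not Limitra code; the names are illustrative):

```python
window, limit = 60, 100  # 100 requests per 60 s fixed window
hits = {}                # slot index -> counter

def allow(ts):
    slot = ts // window
    hits[slot] = hits.get(slot, 0) + 1
    return hits[slot] <= limit

# 100 requests at t=59 (end of slot 0), 100 more at t=60 (start of slot 1)
burst = [allow(59) for _ in range(100)] + [allow(60) for _ in range(100)]
print(sum(burst))  # 200 requests accepted within one second
```

The two bursts land in different slots, so each passes its own fresh counter; sliding_window closes exactly this gap.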

When to use

Best for high-throughput systems where simplicity and speed matter more than boundary precision — e.g. background jobs, bulk processing, or coarse global limits. The boundary burst is rarely a problem in practice for internal services.

from limitra import LimitraConfig, rate_limit

LimitraConfig(redis_url="redis://localhost:6379", project="my-service")

@rate_limit(requests=1000, window=60, algorithm="fixed_window", key="user_id")
def high_throughput_api(user_id: str):
    return {"data": "..."}

Token bucket

A bucket holds up to requests tokens (the configured limit). Tokens are added continuously at requests / window per second, up to the bucket capacity. Each request consumes one token; when the bucket is empty, the request is rejected.

[Diagram: Token Bucket Algorithm]

This allows short bursts (up to the full bucket capacity) while enforcing a long-term average rate equal to the fill rate.
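A minimal in-memory sketch of this refill-and-consume cycle (illustrative only; make_token_bucket is a hypothetical name and Limitra's Redis-backed version differs):

```python
def make_token_bucket(capacity, fill_rate):
    state = {"tokens": float(capacity), "last": 0.0}

    def allow(now):
        # Refill in proportion to elapsed time, capped at capacity.
        state["tokens"] = min(capacity,
                              state["tokens"] + (now - state["last"]) * fill_rate)
        state["last"] = now
        if state["tokens"] >= 1:
            state["tokens"] -= 1
            return True
        return False

    return allow

# 20-token bucket refilled at 20/60 tokens per second
allow = make_token_bucket(capacity=20, fill_rate=20 / 60)
burst = [allow(0.0) for _ in range(25)]
print(sum(burst))   # 20: the burst drains the bucket, the rest are rejected
print(allow(6.0))   # True: six seconds refill roughly two tokens
```

Note how the long-term average is fixed by the fill rate while the instantaneous burst is bounded only by the capacity.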

When to use

Best when clients have bursty but otherwise well-behaved traffic — e.g. a mobile app that sends several requests in quick succession, then goes quiet. The burst is absorbed without rejection; sustained overuse is throttled.

from limitra import LimitraConfig, rate_limit

LimitraConfig(redis_url="redis://localhost:6379", project="my-service")

# Allows a burst of up to 20 requests, then throttles to 20/60 req/s
@rate_limit(requests=20, window=60, algorithm="token_bucket", key="user_id")
def bursty_api(user_id: str):
    return {"data": "..."}

Leaky bucket

Requests fill a bucket like water into a container. The bucket drains at a fixed rate of requests / window per second regardless of input. If the bucket is full, incoming requests are dropped.

[Diagram: Leaky Bucket Algorithm]

Unlike token bucket, the output rate is always constant — bursts are absorbed but never result in a faster drain.
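The drain-at-a-fixed-rate behaviour can be sketched the same way (again an in-memory illustration with hypothetical names, not Limitra's implementation):

```python
def make_leaky_bucket(capacity, leak_rate):
    state = {"level": 0.0, "last": 0.0}

    def allow(now):
        # Drain at the fixed rate, no matter how fast requests arrive.
        state["level"] = max(0.0, state["level"] - (now - state["last"]) * leak_rate)
        state["last"] = now
        if state["level"] + 1 <= capacity:
            state["level"] += 1
            return True
        return False

    return allow

# 5-slot bucket draining one request per second
allow = make_leaky_bucket(capacity=5, leak_rate=1.0)
burst = [allow(0.0) for _ in range(8)]
print(sum(burst))   # 5: the bucket fills and the overflow is dropped
print(allow(2.0))   # True: two seconds drained two slots
```

Unlike the token bucket above, accumulated idle time never buys a faster output rate; it only frees space in the bucket.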

When to use

Best for traffic shaping in front of a downstream service that cannot handle spikes — e.g. a payment processor, a third-party API with strict rate limits, or a database that performs poorly under bursty load. Guarantees a smooth, predictable request stream regardless of what comes in.

from limitra import LimitraConfig, rate_limit

LimitraConfig(redis_url="redis://localhost:6379", project="my-service")

# Smooths traffic to a constant 100/60 req/s regardless of bursts
@rate_limit(requests=100, window=60, algorithm="leaky_bucket", key="user_id")
def smooth_api(user_id: str):
    return {"data": "..."}

Direct use

Use the limiter classes directly without the decorator when you need lower-level control.

import redis
from limitra import SlidingWindowRateLimiter

client = redis.from_url("redis://localhost:6379")

limiter = SlidingWindowRateLimiter(
    capacity=100,
    fill_rate=1/60,
    scope="user",
    backend="redis",
    redis_client=client,
    project="my-svc",
)

user_id = "alice"

if limiter.allow_request(user_id):
    print("Request allowed")
else:
    status = limiter.get_status(user_id)
    wait = limiter.get_wait_time(user_id)
    print(f"Rate limited — {status['count']}/{status['capacity']} used, retry in {wait:.1f}s")

get_usage() returns a normalised dict across all algorithms:

limiter.get_usage("alice")
# {"count": 42, "limit": 100, "remaining": 58, "limited": False}

Available classes: TokenBucketLimiter, LeakyBucketLimiter, FixedWindowRateLimiter, SlidingWindowRateLimiter.