How Claude Code Rate Limits Work

Updated March 2026

Claude Code doesn't have an obvious rate limit dashboard. When you hit the limit, you get a "please wait" message and your session pauses. There's no countdown, no percentage, no warning before it happens. This guide explains how the limits actually work so you can plan around them.

Two rate limit windows

Anthropic enforces two separate rate limit windows on Claude Code usage:

5-hour burst window — limits how much you can use in a short period. This is the one most people hit. It resets on a rolling basis: usage from 5 hours ago drops off as time passes.
7-day sustained window — limits total usage over a week. Harder to hit, but if you're a heavy user on a lower tier, this can cap you before the week is up.

Both windows are rolling, not fixed. There's no "reset at midnight"; instead, your oldest usage continuously falls out of the window as time passes. For example, tokens you used 4 hours and 50 minutes ago will drop out of the 5-hour window in 10 minutes.
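
To make the rolling behavior concrete, here is a minimal sketch of a rolling-window counter. The class name, the event format, and the token figures are illustrative assumptions; Anthropic does not publish how its accounting is implemented.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingWindow:
    """Sum of token usage over a trailing time window (illustrative only)."""

    def __init__(self, length: timedelta):
        self.length = length
        self.events = deque()  # (timestamp, tokens), oldest first

    def record(self, when: datetime, tokens: int) -> None:
        # Events are assumed to arrive in chronological order.
        self.events.append((when, tokens))

    def usage(self, now: datetime) -> int:
        # Drop events that have aged out of the window, then sum the rest.
        cutoff = now - self.length
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)


now = datetime(2026, 3, 1, 12, 0)
window = RollingWindow(timedelta(hours=5))
window.record(now - timedelta(hours=4, minutes=50), 40_000)  # heavy early turn
window.record(now - timedelta(hours=1), 10_000)

usage_now = window.usage(now)                            # both events counted
usage_later = window.usage(now + timedelta(minutes=10))  # oldest has expired
```

In this example the window holds 50,000 tokens at noon and 10,000 ten minutes later, with no fixed reset involved.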

What counts toward the limit

Rate limits are based on token throughput — the total number of tokens processed in your requests and responses. This includes:

Input tokens — your messages, file contents Claude reads, tool results
Output tokens — Claude's responses, code it writes, tool calls it makes
Cache tokens — tokens read from or written to the prompt cache

A single complex coding task can burn through a large chunk of your limit. For example, if Claude reads several large files, reasons about them, and then writes a multi-file refactor, that can add up to 50,000+ tokens in one turn.

Token usage is also tied to how you manage your context window. A bloated context means more input tokens per turn, which burns through your rate limit faster. Keeping context lean is one of the best ways to stretch your limit.
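
The effect of context size can be shown with simple arithmetic. The sketch below assumes, for illustration only, that the full context is sent as input on every turn and that caching gives no discount; the function name and the per-turn figures are made up.

```python
def cumulative_input_tokens(base_context: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens over a session where context grows every turn.

    Assumes the full context is sent as input each turn (no caching discount).
    """
    total = 0
    context = base_context
    for _ in range(turns):
        total += context           # this turn re-sends everything so far
        context += added_per_turn  # the turn's new content joins the context
    return total


lean = cumulative_input_tokens(base_context=5_000, added_per_turn=2_000, turns=20)
bloated = cumulative_input_tokens(base_context=60_000, added_per_turn=2_000, turns=20)
```

Under these assumptions the lean session sends 480,000 input tokens over 20 turns and the bloated one sends 1,580,000, even though the work done per turn is identical.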

Limits by plan tier

The exact token limits aren't published by Anthropic, but the relative capacity differs significantly between plans:

Plan	Price	Relative capacity	Best for
Pro	$20/mo	Base (1x)	Light to moderate daily usage
Max 5x	$100/mo	~5x Pro	Full-time development, multiple projects
Max 20x	$200/mo	~20x Pro	Heavy usage, teams, all-day sessions

Pro users typically hit the 5-hour window after 2-4 hours of active coding, depending on task complexity. Max 5x users can usually work a full day without issues. Max 20x is designed for users who'd otherwise be on the API.

What happens when you hit the limit

When you reach either limit, Claude Code pauses your session. You'll see a message indicating you've been rate limited. The behavior:

Your current session stays alive — you don't lose context
You can't send new messages until capacity frees up
The wait depends on how soon enough of your older usage ages out of the window. It could be minutes or hours
There's no countdown or ETA shown by Claude Code itself
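
Because the wait is driven by when old usage ages out, it can be estimated if you know your usage history and your limit. The sketch below is only an illustration: the 100,000-token limit is a hypothetical figure (real limits are not published), the event list is invented, and it assumes you are blocked whenever usage is at or above the limit.

```python
from datetime import datetime, timedelta


def time_until_unblocked(events, limit, window, now):
    """Return how long until rolling usage drops below `limit`.

    events: list of (timestamp, tokens) tuples, in any order.
    limit:  hypothetical token cap for the window (not a published figure).
    """
    cutoff = now - window
    live = sorted((t, n) for t, n in events if t > cutoff)
    usage = sum(n for _, n in live)
    if usage < limit:
        return timedelta(0)  # not blocked at all
    for timestamp, tokens in live:
        usage -= tokens  # the oldest remaining event is the next to expire
        if usage < limit:
            return (timestamp + window) - now
    return None  # only reachable if limit is zero or negative


now = datetime(2026, 3, 1, 16, 0)
events = [
    (now - timedelta(hours=4, minutes=30), 30_000),
    (now - timedelta(hours=3), 50_000),
    (now - timedelta(hours=1), 40_000),
]
wait = time_until_unblocked(events, limit=100_000, window=timedelta(hours=5), now=now)
```

Here usage is 120,000 tokens, and the 30,000-token event from 4.5 hours ago expires in 30 minutes, which is when usage would fall back under the assumed cap.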

Why the lack of visibility is a problem

Without knowing how close you are to the limit, you can't make informed decisions. Should you start a large refactoring task at 4pm, or will you get blocked halfway through? Is it worth switching to a simpler task to conserve capacity? Claude Code doesn't give you the information to answer these questions.

How to pace yourself

Until you have tooling to track your usage, here are practical strategies:

Be intentional about what Claude reads

Every file read costs tokens. Instead of letting Claude explore your codebase freely, point it at specific files. Use @file references deliberately rather than asking broad questions that trigger many file reads.

Batch related tasks

Switching between unrelated tasks means Claude needs to re-read context for each one. Grouping related changes together is more token-efficient.

Start fresh sessions for new topics

Long sessions accumulate stale context that is sent back to Claude as input on every turn. A fresh session with a focused prompt is cheaper than continuing a bloated one.

Use /compact before you're forced to

Running /compact manually when your context is getting large lets you control what gets summarized. Waiting for auto-compaction means Claude decides what to keep. See our context health guide for more on managing your context window.

Monitor your usage

This is exactly why AI Battery exists. It parses Claude Code's local log files to show you real-time utilization percentages, reset countdowns, and time-to-limit predictions for both the 5-hour and 7-day windows.

How AI Battery tracks rate limits

AI Battery reads Claude Code's local JSONL conversation logs — the same files stored on your machine at ~/.claude/projects/. It extracts token counts and timestamps from each conversation turn, then calculates:

Current utilization — what percentage of your 5-hour and 7-day windows you've used
Reset countdown — when the oldest usage in each window will expire
Time-to-limit — at your current pace, when you'll hit the limit
Per-model breakdown — token usage split by model (Opus, Sonnet, Haiku)
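
As a rough illustration of the log-parsing step, the sketch below pulls a timestamp and token counts out of JSONL lines. The field names (`timestamp`, `message.usage`, `input_tokens`, and so on) are assumptions about the log schema, which may differ or change between Claude Code versions, and this is not AI Battery's actual implementation.

```python
import json
from datetime import datetime

# Assumed field names; verify against your own log files.
TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def parse_turns(lines):
    """Yield (timestamp, {field: count}) for each line that carries usage data."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        stamp = record.get("timestamp")
        if not isinstance(message, dict) or not stamp:
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        yield when, {field: int(usage.get(field) or 0) for field in TOKEN_FIELDS}


# Hypothetical sample lines standing in for a real log file.
sample = [
    '{"timestamp": "2026-03-01T12:00:00Z", "message": {"usage": {"input_tokens": 1200, "output_tokens": 300}}}',
    '{"type": "summary"}',
    "not json",
]
turns = list(parse_turns(sample))
```

Only the first sample line yields a turn; lines without usage data or with invalid JSON are skipped rather than treated as errors, since log files may be appended to while they are being read.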

Everything runs locally. AI Battery never makes network requests — it just reads log files on your machine. No account required, no telemetry, no data leaves your computer.

Understanding the indicators

AI Battery uses a traffic-light system for both rate limits and context health:

Green (under 60%) — plenty of headroom. Work freely.
Orange (60-80%) — approaching the limit. Consider pacing or wrapping up large tasks.
Red (over 80%) — close to being rate limited. Finish what you're doing and let usage decay.
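
The bands above amount to a simple threshold check. In this sketch the function name is hypothetical, and treating exactly 80% as orange is an assumption based on red being described as "over 80%".

```python
def indicator(utilization_pct: float) -> str:
    """Map a utilization percentage to the color bands described above."""
    if utilization_pct < 60:
        return "green"
    if utilization_pct <= 80:  # assumption: exactly 80% still counts as orange
        return "orange"
    return "red"


colors = [indicator(p) for p in (35, 60, 80, 92)]
```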

Common questions

Do rate limits reset at a specific time?

No. Both windows are rolling. Usage older than 5 hours (or 7 days) continuously drops out of its window. There's no midnight reset or fixed schedule.

Does the model I use matter?

Yes. Different models have different token costs. Opus uses more capacity per token than Sonnet or Haiku. Claude Code can select models automatically, and heavier tasks tend to go to more capable (and more expensive) models.

Can I check my limits without AI Battery?

Not easily. Claude Code doesn't expose usage metrics. You could manually parse the JSONL logs, but that's what AI Battery automates for you.

Do cached tokens count toward rate limits?

Cache read tokens count at a reduced rate. Cache write tokens count at full rate. This means subsequent turns in the same conversation are somewhat cheaper than the first turn, because prior context gets cached.
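
A weighted sum shows why later turns can be cheaper. The weights below are placeholders: the actual accounting is not published, the 0.25 factor for cache reads is an arbitrary stand-in for "reduced rate", and the key names and token counts are invented for the example.

```python
# Placeholder weights; the real values are not published.
WEIGHTS = {
    "input_tokens": 1.0,
    "output_tokens": 1.0,
    "cache_write_tokens": 1.0,  # "full rate"
    "cache_read_tokens": 0.25,  # "reduced rate"; 0.25 is an arbitrary stand-in
}


def weighted_cost(usage: dict) -> float:
    """Weighted token cost of one turn under the placeholder weights."""
    return sum(weight * usage.get(name, 0) for name, weight in WEIGHTS.items())


# First turn writes 20,000 tokens of context to the cache.
first_turn = {"input_tokens": 2_000, "output_tokens": 500, "cache_write_tokens": 20_000}
# A later turn reads the same 20,000 tokens back from the cache.
later_turn = {"input_tokens": 2_000, "output_tokens": 500, "cache_read_tokens": 20_000}

first_cost = weighted_cost(first_turn)
later_cost = weighted_cost(later_turn)
```

With these stand-in weights the first turn costs 22,500 weighted tokens and the later turn 7,500; the real ratio depends on figures that are not public.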

What if I have multiple Claude Code sessions open?

All sessions share the same rate limit pool for your account. Opening multiple sessions doesn't give you more capacity — it just splits your limit across them. AI Battery tracks all sessions and shows your aggregate usage.
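
Since the pool is shared, any usage estimate has to merge events from every session before applying the window. A minimal sketch, with hypothetical session labels and token figures:

```python
from datetime import datetime, timedelta


def aggregate_usage(sessions, window, now):
    """Total tokens across all sessions within the trailing window.

    sessions: dict mapping a session label to a list of (timestamp, tokens).
    """
    cutoff = now - window
    return sum(
        tokens
        for events in sessions.values()
        for when, tokens in events
        if when > cutoff
    )


now = datetime(2026, 3, 1, 15, 0)
sessions = {
    "frontend": [(now - timedelta(hours=1), 30_000)],
    "backend": [
        (now - timedelta(hours=2), 45_000),
        (now - timedelta(hours=6), 80_000),  # already outside the 5-hour window
    ],
}
five_hour_total = aggregate_usage(sessions, timedelta(hours=5), now)
seven_day_total = aggregate_usage(sessions, timedelta(days=7), now)
```

The 5-hour total here is 75,000 tokens across both sessions, while the 7-day total also includes the older 80,000-token event.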