Retry Policy from HTTP Responses


This recipe extends the Basic Example to show how to extract retry information from HTTP response headers and make it available to Temporal's retry mechanisms.

HTTP response codes and headers on API calls have implications for retry behavior. For example, an HTTP 404 Not Found generally represents an application-level error that should not be retried. By contrast, a 500 Internal Server Error is typically transient, so it should be retried. Servers can also set the Retry-After header to tell the client when to retry.

This recipe introduces a utility function that processes the HTTP response and populates a Temporal ApplicationError to provide inputs to the retry mechanism. Temporal combines this information with other configuration, such as timeouts and exponential backoff, to implement the complete retry policy.
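To make "combines this information with exponential backoff" concrete, here is a simplified, hypothetical model of how the next retry interval could be chosen. It is not Temporal's implementation. It assumes that a server-provided delay simply replaces the computed backoff for that attempt, and its default values mirror Temporal's documented default retry policy (1 s initial interval, backoff coefficient 2.0, maximum interval 100 times the initial interval). The function name `next_interval` is our own.

```python
from datetime import timedelta
from typing import Optional


def next_interval(
    attempt: int,
    next_retry_delay: Optional[timedelta] = None,
    initial_interval: timedelta = timedelta(seconds=1),
    backoff_coefficient: float = 2.0,
    maximum_interval: timedelta = timedelta(seconds=100),
) -> timedelta:
    """Hypothetical model: wait before retry number `attempt` (1-based)."""
    if next_retry_delay is not None:
        # The error carried an explicit delay (e.g. from Retry-After): use it as-is.
        return next_retry_delay
    # Otherwise use exponential backoff, capped at maximum_interval.
    backoff = initial_interval * (backoff_coefficient ** (attempt - 1))
    return min(backoff, maximum_interval)
```

In this model, attempt 3 waits 4 s unless the error supplies its own delay, and from attempt 8 onward the wait is capped at 100 s.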

Generate Temporal ApplicationErrors from HTTP responses

We introduce a utility function that takes an httpx.Response object and returns a Temporal ApplicationError with two key fields populated: non_retryable and next_retry_delay.

The non_retryable flag is determined by categorizing the HTTP status code. The non-standard X-Should-Retry HTTP response header, when present, overrides the status code.

Example Retryable Status Codes:

408 Request Timeout → Retry because the server timed out waiting for the request, which can have many causes
409 Conflict → Retry when resource is temporarily locked or in use
429 Too Many Requests → Retry after rate limit cooldown (respect Retry-After header when available)
500 Internal Server Error → Retry for temporary server issues
502 Bad Gateway → Retry when upstream server is temporarily unavailable
503 Service Unavailable → Retry when service is temporarily overloaded
504 Gateway Timeout → Retry when upstream server times out

Example Non-Retryable Status Codes:

400 Bad Request → Do not retry - fix request format/parameters
401 Unauthorized → Do not retry - provide valid authentication
403 Forbidden → Do not retry - insufficient permissions
404 Not Found → Do not retry - resource does not exist
422 Unprocessable Entity → Do not retry - fix request validation errors
Other 4xx Client Errors → Do not retry - client-side issues need fixing
2xx Success → Do not expect to see this - call succeeded
3xx Redirects → Do not expect to see this - typically handled by httpx (with follow_redirects=True)
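The two lists above can be condensed into a small lookup. The stdlib-only sketch below is a hypothetical illustration (the names `classify` and `RETRYABLE_4XX` are not part of the recipe). It assumes `headers` is a plain mapping with lower-case keys; httpx's `Headers` is case-insensitive, a plain dict is not.

```python
from typing import Mapping, Optional, Tuple

# 4xx codes treated as retryable in the lists above; all other 4xx are not.
RETRYABLE_4XX = {408, 409, 429}


def classify(status_code: int, headers: Optional[Mapping[str, str]] = None) -> Tuple[bool, str]:
    """Sketch: return (should_retry, reason) for an HTTP error response."""
    # An explicit server instruction wins over the status code.
    override = (headers or {}).get("x-should-retry")
    if override == "true":
        return True, "x-should-retry=true"
    if override == "false":
        return False, "x-should-retry=false"
    if status_code in RETRYABLE_4XX:
        return True, f"retryable client status {status_code}"
    if status_code >= 500:
        return True, f"server error {status_code}"
    return False, f"non-retryable status {status_code}"
```

For example, `classify(404)` reports a non-retryable error, while `classify(404, {"x-should-retry": "true"})` retries because the header overrides the status code.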

If the error is retryable and if the Retry-After header is present, we parse it to set the retry delay.
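The Retry-After header takes one of two forms: a number of seconds (for example `120`) or an HTTP-date (for example `Wed, 21 Oct 2015 07:28:00 GMT`). The sketch below handles both with the standard library's `email.utils`. The name `retry_after_seconds` and the explicit `now` parameter (a Unix timestamp, passed in so the result is deterministic) are our own assumptions. Unlike the recipe's code, it ignores the non-standard `retry-after-ms` header.

```python
import email.utils
from typing import Optional


def retry_after_seconds(value: Optional[str], now: float) -> Optional[float]:
    """Sketch: convert a Retry-After value into seconds to wait, relative to `now`."""
    if value is None:
        return None
    try:
        # delay-seconds form, e.g. "120" (floats tolerated)
        return float(value)
    except ValueError:
        pass
    # HTTP-date form; parsedate_tz returns None if the string is not a date.
    parsed = email.utils.parsedate_tz(value)
    if parsed is None:
        return None
    # mktime_tz converts the parsed date (with its UTC offset) to a Unix timestamp.
    return float(email.utils.mktime_tz(parsed) - now)
```

A date-based value can come out negative if the date has already passed, which is why the recipe range-checks the parsed delay before using it.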

This implementation duplicates logic present in the OpenAI Python API Library, where it is part of the code generated by Stainless. Duplicating the logic makes sense because it is not accessible via the public library interface and because it applies to HTTP APIs in general, not just the OpenAI API.

File: util/http_retry.py

import email.utils
import time
from datetime import timedelta
from temporalio.exceptions import ApplicationError
from temporalio import workflow
from typing import Optional, Tuple

with workflow.unsafe.imports_passed_through():
    from httpx import Response, Headers


# Adapted from the OpenAI Python client (https://github.com/openai/openai-python/blob/main/src/openai/_base_client.py)
# which is generated by the Stainless SDK Generator.
def _parse_retry_after_header(response_headers: Optional[Headers] = None) -> float | None:
    """Returns a float of the number of seconds (not milliseconds) to wait before retrying, or None if unspecified.

    About the Retry-After header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After
    See also https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After#syntax
    """
    if response_headers is None:
        return None

    # First, try the non-standard `retry-after-ms` header for milliseconds,
    # which is more precise than integer-seconds `retry-after`
    try:
        retry_ms_header = response_headers.get("retry-after-ms", None)
        return float(retry_ms_header) / 1000
    except (TypeError, ValueError):
        pass

    # Next, try parsing `retry-after` header as seconds (allowing nonstandard floats).
    retry_header = response_headers.get("retry-after")
    try:
        # note: the spec indicates that this should only ever be an integer
        # but if someone sends a float there's no reason for us to not respect it
        return float(retry_header)
    except (TypeError, ValueError):
        pass

    # Last, try parsing `retry-after` as a date.
    retry_date_tuple = email.utils.parsedate_tz(retry_header)
    if retry_date_tuple is None:
        return None

    retry_date = email.utils.mktime_tz(retry_date_tuple)
    return float(retry_date - time.time())

def _should_retry(response: Response) -> Tuple[bool, str]:
    """Decide whether a failed response should be retried; returns (should_retry, reason)."""
    # Note: this is not a standard header
    should_retry_header = response.headers.get("x-should-retry")

    # If the server explicitly says whether or not to retry, obey.
    if should_retry_header == "true":
        return True, f"Server requested retry via x-should-retry=true header (HTTP {response.status_code})"
    if should_retry_header == "false":
        return False, f"Server prevented retry via x-should-retry=false header (HTTP {response.status_code})"

    # Retry on request timeouts.
    if response.status_code == 408:
        return True, f"HTTP request timeout ({response.status_code}), will retry with backoff"

    # Retry on lock timeouts.
    if response.status_code == 409:
        return True, f"HTTP conflict/lock timeout ({response.status_code}), will retry with backoff"

    # Retry on rate limits.
    if response.status_code == 429:
        return True, f"HTTP rate limit exceeded ({response.status_code}), will retry with backoff"

    # Retry internal errors.
    if response.status_code >= 500:
        return True, f"HTTP server error ({response.status_code}), will retry with backoff"

    return False, f"HTTP client error ({response.status_code}), not retrying - check your request"


def http_response_to_application_error(response: Response) -> ApplicationError:
    """Transform HTTP response into Temporal ApplicationError for retry handling.

    This function implements generic HTTP retry logic based on status codes and headers.

    Args:
        response: The httpx.Response from a failed HTTP request

    Returns:
        ApplicationError: Always returns an ApplicationError configured for Temporal's retry system:
            - non_retryable: False for retryable errors, True for non-retryable
            - next_retry_delay: Server-provided delay hint (if valid)

    Note:
        This function returns the ApplicationError rather than raising it, so the
        caller decides where to raise it (typically in an Activity's except block).
    """
    should_retry, retry_message = _should_retry(response)
    if should_retry:
        # Calculate the retry delay only when retrying
        retry_after = _parse_retry_after_header(response.headers)
        # Make sure that the retry delay is in a reasonable range (0, 60] seconds;
        # otherwise leave it unset so Temporal's own backoff applies.
        next_retry_delay: Optional[timedelta] = None
        if retry_after is not None and 0 < retry_after <= 60:
            next_retry_delay = timedelta(seconds=retry_after)

        # Add delay info for rate limits
        if response.status_code == 429 and next_retry_delay is not None:
            retry_message = f"HTTP rate limit exceeded (429) (server requested {next_retry_delay.total_seconds():.1f}s delay), will retry with backoff"

        return ApplicationError(
            retry_message,
            non_retryable=False,
            next_retry_delay=next_retry_delay,
        )
    else:
        return ApplicationError(
            retry_message,
            non_retryable=True,
            next_retry_delay=None,
        )
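The range check in http_response_to_application_error discards server hints that are zero, negative (for example an HTTP-date already in the past), or longer than 60 seconds. In those cases next_retry_delay stays None and Temporal falls back to the retry policy's own backoff. The hypothetical helper below isolates that rule so it can be tested on its own; the name `usable_delay` and the constant are illustrative, with the 60-second bound taken from the recipe.

```python
from datetime import timedelta
from typing import Optional

MAX_SERVER_DELAY_SECONDS = 60  # same upper bound as the recipe's range check


def usable_delay(seconds: Optional[float]) -> Optional[timedelta]:
    """Sketch: turn a parsed Retry-After value into a next_retry_delay, or None."""
    if seconds is None:
        return None
    # Only accept delays in (0, 60]; anything else is ignored so the
    # retry policy's exponential backoff is used instead.
    if 0 < seconds <= MAX_SERVER_DELAY_SECONDS:
        return timedelta(seconds=seconds)
    return None
```

Capping the delay keeps a misconfigured or hostile server from stalling an Activity for an arbitrarily long time between attempts.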

Raise the exception from the Activity

When an API call fails with an HTTP error status, the OpenAI client raises an APIStatusError exception whose response attribute holds the underlying httpx.Response object. We pass that object to the http_response_to_application_error function defined above and raise the resulting Temporal ApplicationError, which carries the retry information to Temporal.
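The catch, translate, and raise pattern does not depend on OpenAI. The sketch below uses hypothetical stand-in classes (`FakeResponse`, `FakeAPIStatusError`, `FakeApplicationError`; these are not the real openai or temporalio types) so it runs with only the standard library, and it uses a deliberately simplified retryability rule. The detail worth noticing is `raise ... from e`, which records the original exception as `__cause__` on the new one.

```python
from dataclasses import dataclass, field
from typing import Dict


# Hypothetical stand-ins, NOT the openai or temporalio classes.
@dataclass
class FakeResponse:
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)


class FakeAPIStatusError(Exception):
    def __init__(self, response: FakeResponse):
        super().__init__(f"HTTP {response.status_code}")
        self.response = response


class FakeApplicationError(Exception):
    def __init__(self, message: str, non_retryable: bool):
        super().__init__(message)
        self.non_retryable = non_retryable


def to_application_error(response: FakeResponse) -> FakeApplicationError:
    # Simplified rule for the sketch: only 429 and 5xx are retryable.
    retryable = response.status_code == 429 or response.status_code >= 500
    return FakeApplicationError(f"HTTP {response.status_code}", non_retryable=not retryable)


def call_api(status_code: int) -> None:
    try:
        # Simulate a client library raising on an HTTP error status.
        raise FakeAPIStatusError(FakeResponse(status_code))
    except FakeAPIStatusError as e:
        # `from e` keeps the original exception attached as __cause__.
        raise to_application_error(e.response) from e
```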

File: activities/openai_responses.py

from temporalio import activity
from openai import AsyncOpenAI
from openai.types.responses import Response
from dataclasses import dataclass
from util.http_retry import http_response_to_application_error
from openai import APIStatusError

# Temporal best practice: Create a data structure to hold the request parameters.
@dataclass
class OpenAIResponsesRequest:
    model: str
    instructions: str
    input: str

@activity.defn
async def create(request: OpenAIResponsesRequest) -> Response:
    # Temporal best practice: Disable retry logic in OpenAI API client library.
    client = AsyncOpenAI(max_retries=0)

    try:
        resp = await client.responses.create(
            model=request.model,
            instructions=request.instructions,
            input=request.input,
            timeout=15,
        )
        return resp
    except APIStatusError as e:
        raise http_response_to_application_error(e.response) from e

Running

Start the Temporal Dev Server:

temporal server start-dev

Run the worker:

uv run python -m worker

Start execution:

uv run python -m start_workflow
