AiGateway

Source: src/Cloudflare/AiGateway/AiGateway.ts

A Cloudflare AI Gateway for observability, caching, rate limiting, and governance across AI provider requests.

AI Gateway gives your application a stable gateway ID and an account-scoped endpoint for routing model requests through Cloudflare. Once bound to a Worker, aiGateway.model({...}) returns an effect/unstable/ai LanguageModel Layer, so you call the standard generateText / streamText APIs in a provider-agnostic way while the gateway handles caching, rate limiting, retries, and a unified request log.

Basic gateway

const gateway = yield* Cloudflare.AiGateway("Gateway");

Gateway with caching and rate limiting

const gateway = yield* Cloudflare.AiGateway("Gateway", {
  id: "my-gateway",
  cacheTtl: 300,
  cacheInvalidateOnUpdate: true,
  rateLimitingInterval: 60,
  rateLimitingLimit: 100,
  rateLimitingTechnique: "sliding",
});
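
To make the rate limiting props concrete, here is a minimal sketch of a sliding-window limiter in plain TypeScript. It is an illustration of the general technique only, not Cloudflare's implementation. The class name SlidingWindowLimiter and its method are hypothetical, and the sketch assumes rateLimitingInterval is measured in seconds and rateLimitingLimit is a request count.

```typescript
// Hypothetical illustration of a sliding-window limiter; not the gateway's actual code.
class SlidingWindowLimiter {
  private readonly intervalMs: number;
  private readonly limit: number;
  private timestamps: number[] = [];

  // intervalSeconds plays the role of rateLimitingInterval (assumed to be seconds),
  // limit plays the role of rateLimitingLimit.
  constructor(intervalSeconds: number, limit: number) {
    this.intervalMs = intervalSeconds * 1000;
    this.limit = limit;
  }

  /** Returns true if a request arriving at `nowMs` is allowed. */
  tryAcquire(nowMs: number): boolean {
    const windowStart = nowMs - this.intervalMs;
    // Keep only the requests that are still inside the trailing window.
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}

// Two requests per 60 seconds, driven with explicit timestamps.
const limiter = new SlidingWindowLimiter(60, 2);
const results = [
  limiter.tryAcquire(0), // allowed
  limiter.tryAcquire(30_000), // allowed
  limiter.tryAcquire(59_000), // rejected: two requests in the trailing 60 s
  limiter.tryAcquire(61_000), // allowed: the request at t=0 has left the window
];
```

A fixed-window limiter would instead reset its counter at interval boundaries, which permits short bursts around each boundary; the sliding variant always counts over the trailing interval.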

Gateway with log collection

const gateway = yield* Cloudflare.AiGateway("Gateway", {
  collectLogs: true,
  logManagement: 10000,
  logManagementStrategy: "STOP_INSERTING",
});
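
The two logManagementStrategy values used on this page, "STOP_INSERTING" and "DELETE_OLDEST", can be pictured as two policies for a bounded log. The sketch below is an assumption based on the option names, with logManagement read as the maximum number of stored entries; BoundedLog is a hypothetical class for illustration, not part of the library.

```typescript
type Strategy = "STOP_INSERTING" | "DELETE_OLDEST";

// Hypothetical bounded log that models the two strategies by name.
class BoundedLog<T> {
  private entries: T[] = [];
  private readonly capacity: number;
  private readonly strategy: Strategy;

  constructor(capacity: number, strategy: Strategy) {
    this.capacity = capacity;
    this.strategy = strategy;
  }

  /** Returns true if the entry was stored. */
  insert(entry: T): boolean {
    if (this.capacity <= 0) return false;
    if (this.entries.length < this.capacity) {
      this.entries.push(entry);
      return true;
    }
    // The log is full: either refuse the new entry or evict the oldest one.
    if (this.strategy === "STOP_INSERTING") return false;
    this.entries.shift();
    this.entries.push(entry);
    return true;
  }

  snapshot(): readonly T[] {
    return [...this.entries];
  }
}

const stop = new BoundedLog<string>(2, "STOP_INSERTING");
const rotate = new BoundedLog<string>(2, "DELETE_OLDEST");
for (const id of ["a", "b", "c"]) {
  stop.insert(id);
  rotate.insert(id);
}
const stopKept = stop.snapshot(); // ["a", "b"]: the newest entry was dropped
const rotateKept = rotate.snapshot(); // ["b", "c"]: the oldest entry was evicted
```

Under this reading, "STOP_INSERTING" preserves the earliest requests once the limit is reached, while "DELETE_OLDEST" keeps a rolling view of the most recent ones.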

AiGateway.bind(gateway) returns a typed, Effect-native client during the Worker’s Init phase. Provide Cloudflare.AiGatewayBindingLive once at the bottom of the Init layer chain so every bind(...) resolves at runtime.

import * as Cloudflare from "alchemy/Cloudflare";
import * as Effect from "effect/Effect";
import { Gateway } from "./AiGateway.ts";

export default class Api extends Cloudflare.Worker<Api>()(
  "Api",
  { main: import.meta.filename },
  Effect.gen(function* () {
    const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);
    return {
      fetch: Effect.gen(function* () {
        // …routes
      }),
    };
  }).pipe(Effect.provide(Cloudflare.AiGatewayBindingLive)),
) {}

Call aiGateway.model({...}) with a Workers AI model id. It returns a Layer<LanguageModel, never, RuntimeContext> directly — no API key and no Layer.unwrap, since the binding handles auth and the gateway URL. Build it in the Init phase; construction is pure.

const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);
const languageModel = aiGateway.model({
  client: aiGateway,
  model: "@cf/meta/llama-3.1-8b-instruct",
  parameters: { temperature: 0.7, maxTokens: 1024 },
});

Provide the languageModel layer to the handler and call LanguageModel.generateText like any other Effect. Effect.orDie collapses AiError to a defect (a 500); use Effect.catchTag("AiError", …) for typed handling instead.

import { LanguageModel } from "effect/unstable/ai";
import * as HttpServerResponse from "effect/unstable/http/HttpServerResponse";

fetch: Effect.gen(function* () {
  const response = yield* LanguageModel.generateText({
    prompt: "Say hello.",
  }).pipe(Effect.orDie);
  return yield* HttpServerResponse.json({
    text: response.text,
    usage: {
      inputTokens: response.usage.inputTokens.total,
      outputTokens: response.usage.outputTokens.total,
    },
  });
}).pipe(Effect.provide(languageModel));

LanguageModel.streamText returns a Stream of typed response parts. Stream.provide(languageModel) keeps the model available for the whole stream lifetime; pipe through Sse.encode for an SSE response.

import { LanguageModel } from "effect/unstable/ai";
import * as Stream from "effect/Stream";
import * as Sse from "effect/unstable/encoding/Sse";
import * as HttpServerResponse from "effect/unstable/http/HttpServerResponse";

const stream = LanguageModel.streamText({ prompt }).pipe(
  Stream.provide(languageModel),
  Sse.encode,
);
return HttpServerResponse.stream(stream, {
  headers: {
    "content-type": "text/event-stream",
    "cache-control": "no-cache",
    "x-accel-buffering": "no",
  },
});
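
For readers unfamiliar with the text/event-stream format that the response above declares, the following sketch shows how a single event is framed on the wire according to the server-sent events format in the HTML standard. It does not reproduce the output of Sse.encode, which is not shown on this page; encodeSseEvent, the "text-delta" event name, and the JSON payload shape are all hypothetical.

```typescript
// Hypothetical event shape for illustration; real response parts may differ.
interface SseEvent {
  event?: string;
  id?: string;
  data: string;
}

/** Frames one event as text/event-stream: field lines followed by a blank line. */
function encodeSseEvent(e: SseEvent): string {
  const lines: string[] = [];
  if (e.event !== undefined) lines.push(`event: ${e.event}`);
  if (e.id !== undefined) lines.push(`id: ${e.id}`);
  // A multi-line payload is sent as one `data:` field per line.
  for (const line of e.data.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // The blank line terminates the event and triggers dispatch on the client.
  return lines.join("\n") + "\n\n";
}

const frame = encodeSseEvent({
  event: "text-delta",
  data: JSON.stringify({ delta: "Hello" }),
});
const multiLine = encodeSseEvent({ data: "first\nsecond" });
```

The "cache-control" and "x-accel-buffering" headers in the response above matter for the same reason: each frame is only useful if intermediaries forward it immediately instead of buffering the stream.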

Every prop maps to an in-place update — no replacement, no downtime.

export const Gateway = Cloudflare.AiGateway("Gateway", {
  id: "prod-gateway",
  cacheTtl: 300,
  cacheInvalidateOnUpdate: true,
  rateLimitingInterval: 60,
  rateLimitingLimit: 100,
  rateLimitingTechnique: "sliding",
  collectLogs: true,
  logManagement: 100_000,
  logManagementStrategy: "DELETE_OLDEST",
  authentication: true,
});