AiGateway
Source: src/Cloudflare/AiGateway/AiGateway.ts
A Cloudflare AI Gateway for observability, caching, rate limiting, and governance across AI provider requests.
AI Gateway gives your application a stable gateway ID and an account-scoped
endpoint that routes model requests through Cloudflare. Once bound to a
Worker, aiGateway.model({...}) returns an effect/unstable/ai
LanguageModel Layer, so you call the standard generateText / streamText
APIs in a provider-agnostic way while the gateway handles caching, rate
limiting, retries, and a unified request log.
Creating a Gateway
Basic gateway

    const gateway = yield* Cloudflare.AiGateway("Gateway");

Gateway with caching and rate limiting

    const gateway = yield* Cloudflare.AiGateway("Gateway", {
      id: "my-gateway",
      cacheTtl: 300,
      cacheInvalidateOnUpdate: true,
      rateLimitingInterval: 60,
      rateLimitingLimit: 100,
      rateLimitingTechnique: "sliding",
    });

Logging

    const gateway = yield* Cloudflare.AiGateway("Gateway", {
      collectLogs: true,
      logManagement: 10000,
      logManagementStrategy: "STOP_INSERTING",
    });

Binding into a Worker
AiGateway.bind(gateway) returns a typed, Effect-native client during the
Worker’s Init phase. Provide Cloudflare.AiGatewayBindingLive once at the
bottom of the Init layer chain so every bind(...) resolves at runtime.

    import * as Cloudflare from "alchemy/Cloudflare";
    import * as Effect from "effect/Effect";
    import { Gateway } from "./AiGateway.ts";

    export default class Api extends Cloudflare.Worker<Api>()(
      "Api",
      { main: import.meta.filename },
      Effect.gen(function* () {
        const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);

        return {
          fetch: Effect.gen(function* () {
            // …routes
          }),
        };
      }).pipe(Effect.provide(Cloudflare.AiGatewayBindingLive)),
    ) {}

Building a LanguageModel
Call aiGateway.model({...}) with a Workers AI model id. It returns a
Layer<LanguageModel, never, RuntimeContext> directly — no API key and no
Layer.unwrap, since the binding handles auth and the gateway URL. Build it
in the Init phase; construction is pure.

    const aiGateway = yield* Cloudflare.AiGateway.bind(Gateway);

    const languageModel = aiGateway.model({
      client: aiGateway,
      model: "@cf/meta/llama-3.1-8b-instruct",
      parameters: { temperature: 0.7, maxTokens: 1024 },
    });

Generating Text
Provide the languageModel layer to the handler and call
LanguageModel.generateText like any other Effect. Effect.orDie collapses
AiError to a defect (a 500); use Effect.catchTag("AiError", …) for typed
handling instead.

    import { LanguageModel } from "effect/unstable/ai";
    import * as HttpServerResponse from "effect/unstable/http/HttpServerResponse";

    fetch: Effect.gen(function* () {
      const response = yield* LanguageModel.generateText({
        prompt: "Say hello.",
      }).pipe(Effect.orDie);
      return yield* HttpServerResponse.json({
        text: response.text,
        usage: {
          inputTokens: response.usage.inputTokens.total,
          outputTokens: response.usage.outputTokens.total,
        },
      });
    }).pipe(Effect.provide(languageModel));

Streaming Text
LanguageModel.streamText returns a Stream of typed response parts.
Stream.provide(languageModel) keeps the model available for the whole
stream lifetime; pipe through Sse.encode for an SSE response.
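For orientation, an SSE response body is plain text: each event is a run of "field: value" lines ended by a blank line. The sketch below hand-rolls that framing to show what a client receives. It is an illustration only and makes no claim about how Sse.encode is implemented; the toSseFrame helper, the "text-delta" event name, and the sample chunks are all hypothetical.

```typescript
// A hand-rolled illustration of Server-Sent Events framing.
// Assumption: this is NOT Sse.encode; it only demonstrates the wire format.

/** Format one SSE event. `event` is an optional event name. */
function toSseFrame(data: string, event?: string): string {
  const lines: string[] = [];
  if (event !== undefined) {
    lines.push(`event: ${event}`);
  }
  // Each line of the payload gets its own "data:" field.
  for (const line of data.split("\n")) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}

// Hypothetical text chunks standing in for streamed model output.
const frames = ["Hel", "lo"].map((chunk) => toSseFrame(chunk, "text-delta"));
console.log(frames.join(""));
```

In the Worker itself there is no need to write this by hand; the handler that follows relies on Sse.encode and only sets the response headers.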

    import { LanguageModel } from "effect/unstable/ai";
    import * as Stream from "effect/Stream";
    import * as Sse from "effect/unstable/encoding/Sse";
    import * as HttpServerResponse from "effect/unstable/http/HttpServerResponse";

    const stream = LanguageModel.streamText({ prompt }).pipe(
      Stream.provide(languageModel),
      Sse.encode,
    );
    return HttpServerResponse.stream(stream, {
      headers: {
        "content-type": "text/event-stream",
        "cache-control": "no-cache",
        "x-accel-buffering": "no",
      },
    });

Tuning the Gateway
Every prop maps to an in-place update: no replacement, no downtime.

    export const Gateway = Cloudflare.AiGateway("Gateway", {
      id: "prod-gateway",
      cacheTtl: 300,
      cacheInvalidateOnUpdate: true,
      rateLimitingInterval: 60,
      rateLimitingLimit: 100,
      rateLimitingTechnique: "sliding",
      collectLogs: true,
      logManagement: 100_000,
      logManagementStrategy: "DELETE_OLDEST",
      authentication: true,
    });
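
To build intuition for how rateLimitingInterval, rateLimitingLimit, and the "sliding" technique fit together, here is a minimal sketch of a generic sliding-window limiter. It assumes the interval is in seconds and that "sliding" means a trailing window measured back from each request; the gateway's actual algorithm is not described on this page, and the SlidingWindowLimiter class is hypothetical.

```typescript
// Generic sliding-window rate limiter (illustrative only).
// Assumption: not Cloudflare's implementation; interval is treated as seconds.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly intervalSeconds: number;
  // Timestamps (in seconds) of accepted requests, oldest first.
  private readonly accepted: number[] = [];

  constructor(limit: number, intervalSeconds: number) {
    this.limit = limit;
    this.intervalSeconds = intervalSeconds;
  }

  /** Returns true if a request arriving at `nowSeconds` is allowed. */
  tryAcquire(nowSeconds: number): boolean {
    const windowStart = nowSeconds - this.intervalSeconds;
    // Drop accepted requests that have fallen out of the trailing window.
    for (;;) {
      const oldest = this.accepted[0];
      if (oldest === undefined || oldest > windowStart) break;
      this.accepted.shift();
    }
    if (this.accepted.length >= this.limit) {
      return false; // over the limit; rejected requests are not recorded
    }
    this.accepted.push(nowSeconds);
    return true;
  }
}

// Mirrors the shape of the config above with a smaller limit: 2 requests per 60 seconds.
const limiter = new SlidingWindowLimiter(2, 60);
const decisions = [0, 10, 20, 60, 61].map((t) => limiter.tryAcquire(t));
console.log(decisions);
```

Unlike a fixed window, which resets its counter at each interval boundary, this approach frees capacity one request at a time as older requests age out of the window.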