SDK Quick Start

Wrap your LLM client with the SuperPenguin Python or TypeScript SDK to get per-request cost tracking, customer attribution, and spend analytics.

Install the SDK & get your key

Install the Python SDK with pip install superpenguin or the TypeScript SDK with npm install @superpenguin/js. Then go to API Keys and create an API key. The key starts with sp-. Copy it right away, because it's only shown once. In TypeScript apps, keep SP_API_KEY on the server only.
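One way to keep the key out of source control is to read it from the SP_API_KEY environment variable at startup and fail fast if it is missing. This is a minimal standard-library sketch. The helper name load_sp_api_key is hypothetical, and the prefix check only mirrors the sp- format described above. Per the Troubleshooting table, setting SP_API_KEY is also an alternative to passing api_key to sp.init().

```python
import os
from typing import Mapping


def load_sp_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: read the SuperPenguin key from SP_API_KEY and fail fast."""
    key = env.get("SP_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SP_API_KEY is not set")
    if not key.startswith("sp-"):
        # Keys are issued with an sp- prefix; anything else is probably the wrong secret.
        raise RuntimeError("SP_API_KEY does not start with 'sp-'")
    return key


# With the SDK installed:
#   import superpenguin as sp
#   sp.init(api_key=load_sp_api_key())
```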

Wrap your client and attribute costs

Initialize the SDK with your sp- key, then call sp.wrap() on your native provider client (Python or TypeScript), or use trackGenerateText() / trackStreamText() for the Vercel AI SDK. Pass metadata (or spMetadata) so every request is attributed to a customer, feature, or prompt. No base URL changes are needed. One row is logged per call with the appropriate billing unit: tokens for LLMs, audio_seconds for Deepgram, characters for ElevenLabs.

Python (OpenAI)

import superpenguin as sp
from openai import OpenAI

sp.init(api_key="sp-...")

client = sp.wrap(OpenAI(), metadata={
    "customer_id":    "cust_acme_123",
    "feature":        "doc_summary",
    "team":           "product",
    "environment":    "production",
    "prompt_key":     "summarize-article",
    "prompt_version": "1",
})

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
)

Metadata Fields

Field	Type	Purpose
`prompt_key`	string	Identifies the prompt. Appears on the Prompts analytics page.
`prompt_version`	string | number	Version identifier of the prompt (e.g., "1", "beta", "2.1")
`customer_id`	string	End-customer or account consuming the AI call
`feature`	string	Product feature name (e.g., search, support_agent)
`team`	string	Internal team owning the feature
`environment`	string	production, staging, dev, etc.
`*`	string	Any other key is stored as a custom tag

All fields are optional. Start with none and add them incrementally.
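Because every field is optional, a convenient pattern is to build the metadata dict in your own code and include only the keys you actually know for a request. The helper below is a hypothetical application-side sketch, not part of the SDK. Extra keyword arguments become custom tags, matching the `*` row of the table.

```python
from typing import Any, Dict, Optional, Union


def build_metadata(
    customer_id: Optional[str] = None,
    feature: Optional[str] = None,
    team: Optional[str] = None,
    environment: Optional[str] = None,
    prompt_key: Optional[str] = None,
    prompt_version: Optional[Union[str, int, float]] = None,
    **custom_tags: str,
) -> Dict[str, Any]:
    """Hypothetical helper: return a metadata dict containing only the fields that are set."""
    fields: Dict[str, Any] = {
        "customer_id": customer_id,
        "feature": feature,
        "team": team,
        "environment": environment,
        "prompt_key": prompt_key,
        "prompt_version": prompt_version,
    }
    fields.update(custom_tags)  # any other key is sent as a custom tag
    # Drop unset fields so you can add them incrementally.
    return {key: value for key, value in fields.items() if value is not None}


# e.g. client = sp.wrap(OpenAI(), metadata=build_metadata(feature="doc_summary"))
```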

Per-call overrides (optional)

Need to log a different value on every request (e.g., one call for customer A, the next for customer B)? customer_id and every other field accept any value you compute at request time. Three patterns are supported: per-call kwargs (extra_body, or metadata= for LiteLLM), the sp.metadata() context manager, and the @sp.trace decorator. Precedence is per-call kwargs > sp.metadata() > @sp.trace > wrap-time defaults, so you can layer them.

Override support per provider

Provider	`extra_body`	`metadata=` kwarg	`sp.metadata()`	`@sp.trace`
OpenAI	yes	n/a	yes	yes
Anthropic	yes (0.3.1+)	n/a	yes	yes
Gemini (google-genai)	not in SDK	n/a	yes	yes
AWS Bedrock (boto3)	not in SDK	n/a	yes	yes
LiteLLM	n/a	yes	yes	yes
Deepgram	not in SDK	n/a	yes	yes
ElevenLabs	not in SDK	n/a	yes	yes

“not in SDK” means the provider's own Python library (Google's genai, Deepgram's deepgram-sdk, ElevenLabs' elevenlabs, or Bedrock's strictly validated Converse request) doesn't accept that argument, so passing it would raise TypeError or be rejected by the provider. Use sp.metadata() or @sp.trace for those providers. sp.metadata() is the recommended default: it works on every provider and never touches the HTTP body, so there's nothing for the provider's validator to reject.

Python

# sp.metadata() is the recommended per-call override. It works on every
# provider (OpenAI, Anthropic, Gemini, AWS Bedrock, Deepgram, ElevenLabs,
# LiteLLM) and never touches the HTTP body, so there's nothing for a strict
# provider validator (like Anthropic's) to reject. It layers on top of any
# active @sp.trace metadata; per-call extra_body still wins on top.

def summarize_for(customer_id: str, text: str) -> str:
    with sp.metadata({"customer_id": customer_id, "prompt_version": "2"}):
        response = client.chat.completions.create(
            model="gpt-5.4",
            messages=[{"role": "user", "content": text}],
        )
        return response.choices[0].message.content

summarize_for("cust_acme_123", "First doc...")     # row 1, customer A
summarize_for("cust_widgets_456", "Second doc...") # row 2, customer B

How does this actually work?

sp.metadata(...) and @sp.trace both push metadata onto a contextvars.ContextVar that the wrapper reads when emitting each row. They never touch the HTTP body, so they work for every provider, including Gemini, AWS Bedrock, Deepgram, and ElevenLabs, whose SDKs (or strictly validated request bodies) don't accept extra_body. sp.metadata() is a context manager (block scope); @sp.trace is a decorator (function scope). Both reset on exit, including when an exception is raised.
For OpenAI and Anthropic, sp_metadata is read out of extra_body and stripped from the kwargs before the request leaves the SDK, so the provider never sees it. Other extra_body keys (user, anthropic_beta, etc.) flow through unchanged.
For LiteLLM, metadata= is a first-class kwarg on litellm.completion(), and the wrapper reads it from kwargs.
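The context-variable mechanism can be sketched with the standard library alone. This is not the SDK's source: _trace_layer, _block_layer, metadata, trace, and current_metadata are hypothetical names. The sketch uses two context variables so that block-scoped metadata outranks function-scoped metadata regardless of nesting order, matching the documented precedence, and it shows that both layers reset on exit, even when an exception propagates.

```python
import contextvars
import functools
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Mapping

# Separate layers so block-scoped values always beat function-scoped ones.
# default=None avoids sharing one mutable dict between contexts.
_trace_layer = contextvars.ContextVar("sp_trace_layer", default=None)
_block_layer = contextvars.ContextVar("sp_block_layer", default=None)


@contextmanager
def _push(var: contextvars.ContextVar, values: Mapping[str, Any]) -> Iterator[None]:
    # Copy-on-write: build a new dict instead of mutating the enclosing layer.
    token = var.set({**(var.get() or {}), **values})
    try:
        yield
    finally:
        var.reset(token)  # runs on normal exit and when an exception propagates


def metadata(values: Mapping[str, Any]):
    """Block-scoped layer (context manager), analogous to sp.metadata()."""
    return _push(_block_layer, values)


def trace(values: Mapping[str, Any]) -> Callable[[Callable], Callable]:
    """Function-scoped layer (decorator), analogous to @sp.trace."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with _push(_trace_layer, values):
                return func(*args, **kwargs)
        return wrapper
    return decorator


def current_metadata() -> Dict[str, Any]:
    """What a wrapper would read when emitting a row: trace layer first, block layer on top."""
    return {**(_trace_layer.get() or {}), **(_block_layer.get() or {})}
```

Because each layer is reset with the token returned by set(), concurrent asyncio tasks or threads each see their own values rather than a shared global.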

Final precedence on every emitted row: sp.wrap(metadata=...) < @sp.trace(metadata=...) < sp.metadata(...) < extra_body / metadata=. Per-key, last-write-wins. Omitted keys fall through, so you only override the fields you want to change.
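Per-key, last-write-wins merging across these layers is a chain of dict merges. In this hypothetical sketch, resolve_row_metadata is an illustrative name, and it assumes the per-call override travels as extra_body={"sp_metadata": {...}}, which is inferred from the description above rather than shown in this guide. It also models the stripping step: sp_metadata is removed before the request is forwarded, while other extra_body keys pass through.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def resolve_row_metadata(
    wrap_defaults: Mapping[str, Any],
    trace_meta: Mapping[str, Any],
    block_meta: Mapping[str, Any],
    request_kwargs: Mapping[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (row metadata, kwargs to forward to the provider)."""
    forwarded = dict(request_kwargs)
    per_call: Dict[str, Any] = {}
    extra_body: Optional[Dict[str, Any]] = forwarded.get("extra_body")
    if extra_body and "sp_metadata" in extra_body:
        extra_body = dict(extra_body)  # copy so the caller's dict is not mutated
        per_call = dict(extra_body.pop("sp_metadata"))
        if extra_body:
            forwarded["extra_body"] = extra_body  # e.g. user, anthropic_beta
        else:
            del forwarded["extra_body"]  # sketch choice: drop an empty extra_body
    # Later layers win per key; keys a layer omits fall through.
    row = {**wrap_defaults, **trace_meta, **block_meta, **per_call}
    return row, forwarded
```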

View your data

Go to the Attribution page. You'll see:

KPI Cards

Total SDK spend, request count, avg cost per request

Breakdown Tabs

Slice by model, provider, model provider, customer, feature, team, or environment

Drilldowns

Click any row to see nested attribution (e.g., models per customer)

Recent Requests

Individual request log with tokens, cost, latency, metadata

Switch to the Reconciliation tab to compare SDK-estimated costs against your actual provider bills.

Batch API jobs (async, ~50% off)

Native provider Batch APIs (OpenAI, Anthropic, Google Gemini Developer API) are asynchronous and file-based. Usage is only known after a job completes, so sp.wrap() / wrap() does not auto-capture them. Instead, call a track_*_batch / track*Batch helper once the job reaches its terminal success state.

Python

import superpenguin as sp
from openai import OpenAI

sp.init(api_key="sp-...")
client = OpenAI()

batch = client.batches.retrieve("batch_abc")
# Track only once the job reaches its terminal success state.
if batch.status == "completed":
    count = sp.track_openai_batch(
        client,
        batch,
        metadata={"feature": "nightly-eval"},
    )

Each succeeded request line emits one batch=true row backdated to the batch completion time, with a deterministic idempotency key (batch_id:custom_id). Retries are safe: duplicate keys dedupe on the server.
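Retry safety follows from the key being deterministic: re-tracking the same batch produces the same batch_id:custom_id keys, so a store that keys rows by idempotency key keeps a single copy. In the sketch below, RowStore is a hypothetical in-memory stand-in for the server, and the row shape is simplified for illustration.

```python
from typing import Dict, Iterable, List


def idempotency_key(batch_id: str, custom_id: str) -> str:
    """Deterministic key in the documented batch_id:custom_id form."""
    return f"{batch_id}:{custom_id}"


def batch_rows(batch_id: str, succeeded_custom_ids: Iterable[str]) -> List[dict]:
    """One batch=true row per succeeded request line (simplified row shape)."""
    return [
        {"idempotency_key": idempotency_key(batch_id, cid), "batch": True}
        for cid in succeeded_custom_ids
    ]


class RowStore:
    """Hypothetical stand-in for the server: the first write for a key wins."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def ingest(self, rows: Iterable[dict]) -> int:
        """Store rows, skipping duplicate keys; return how many were new."""
        added = 0
        for row in rows:
            key = row["idempotency_key"]
            if key not in self.rows:
                self.rows[key] = row
                added += 1
        return added
```

Calling ingest() twice with the same rows adds them once; the second call returns 0.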

Platform compatibility

Platform	Python	TypeScript	Required state
OpenAI	`track_openai_batch`	`trackOpenAIBatch`	`completed`
Azure OpenAI	same as OpenAI	`trackOpenAIBatch`	`completed`
Anthropic	`track_anthropic_batch`	`trackAnthropicBatch`	`ended`
Google Gemini (Developer API / AI Studio)	`track_google_batch`	`trackGoogleBatch`	`succeeded`
AWS Bedrock batch	not yet	not yet	deferred
Vertex AI batch	not yet	not yet	deferred
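Since each helper should run only after the job reaches the value in the Required state column, a small guard keeps polling code honest. The mapping copies that column. The platform labels and the ready_to_track name are hypothetical, and the sketch assumes you have already reduced the provider's status to the lowercase value shown in the table (for OpenAI, batch.status).

```python
# Terminal success state per platform, copied from the "Required state" column.
REQUIRED_STATE = {
    "openai": "completed",
    "azure_openai": "completed",
    "anthropic": "ended",
    "google_gemini": "succeeded",
}


def ready_to_track(platform: str, status: str) -> bool:
    """Hypothetical guard: True only when the batch is in its terminal success state."""
    required = REQUIRED_STATE.get(platform)
    if required is None:
        # AWS Bedrock and Vertex AI batch tracking are not supported yet.
        raise ValueError(f"batch tracking not supported for {platform!r}")
    return status.lower() == required
```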

The server prices batch=true rows from explicit rate_variants.batch catalog legs (no runtime multiplier). Each batch event also carries base_url (the provider endpoint hostname) so the dashboard can infer which platform billed the job (e.g. Azure vs first-party OpenAI).

Regional and tier pricing

Some providers charge different token rates by AWS region, deployment type (Azure data zone / Bedrock cross-region), or service tier (OpenAI flex / priority / batch). The SDK forwards these as first-class event fields so the server can pick the right rate card. They are not attribution tags: reserved keys are stripped from custom_tags.

Precedence: per-call value > wrap-level override > auto-capture > omitted (the server then uses the model's default reference price). Every field is optional.
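The fallback chain can be sketched as a first-non-empty lookup across the three sources, plus a filter that keeps these fields out of custom tags. The function names are hypothetical, and the reserved-key set here is an assumption limited to the three fields in the table that follows.

```python
from typing import Any, Dict, Mapping, Optional

# Assumed reserved set for this sketch: only the three pricing fields documented here.
RESERVED_PRICING_KEYS = frozenset({"region", "deployment_type", "service_tier"})


def resolve_pricing_field(
    name: str,
    per_call: Mapping[str, Any],
    wrap_level: Mapping[str, Any],
    auto_captured: Mapping[str, Any],
) -> Optional[Any]:
    """per-call > wrap-level > auto-capture; None means omitted (server default price)."""
    for layer in (per_call, wrap_level, auto_captured):
        value = layer.get(name)
        if value:  # missing or empty values fall through to the next source
            return value
    return None


def strip_reserved_keys(tags: Mapping[str, Any]) -> Dict[str, Any]:
    """Return custom tags without the reserved pricing keys."""
    return {k: v for k, v in tags.items() if k not in RESERVED_PRICING_KEYS}
```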

Field	Auto-captured?	Notes
`region`	Bedrock (boto3 / AWS SDK)	From `meta.region_name` or the `bedrock-runtime.<region>.amazonaws.com` endpoint host. Override via wrap kwargs or per-call bag (OpenAI-compatible providers).
`deployment_type`	No	Wrap-level or per-call. Use for Azure data-zone routing or Bedrock global / geo cross-region inference.
`service_tier`	OpenAI (response)	Echoed as `flex`, `priority`, etc. Batch jobs use `batch=true` instead.
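The endpoint-host fallback for region can be illustrated with a regular expression over the hostname. This is a sketch, not the SDK's parser, and region_from_endpoint is a hypothetical name. It only recognizes the bedrock-runtime.<region>.amazonaws.com form from the table; other forms such as FIPS or VPC endpoint hosts return None here. With boto3 you would normally read client.meta.region_name first.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Matches the documented bedrock-runtime.<region>.amazonaws.com host form.
_BEDROCK_HOST = re.compile(r"^bedrock-runtime\.([a-z0-9-]+)\.amazonaws\.com$")


def region_from_endpoint(endpoint_url: str) -> Optional[str]:
    """Return the AWS region encoded in a Bedrock runtime endpoint URL, or None."""
    host = urlparse(endpoint_url).hostname or ""  # hostname is lowercased, port dropped
    match = _BEDROCK_HOST.match(host)
    return match.group(1) if match else None
```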

For AWS Bedrock, the SDK sends cost_usd_micros: 0; the server always prices Bedrock rows with the live aws_bedrock catalog (including per-region variants when present). Converse has no per-call metadata passthrough, so set deployment_type with wrap kwargs and attribute requests with sp.metadata() / @sp.trace.

Python (AWS Bedrock)

# AWS Bedrock: region is auto-captured from the boto3 client.
# deployment_type is wrap-level only (Converse has no per-call metadata bag).
import boto3
import superpenguin as sp

sp.init(api_key="sp-...")

client = sp.wrap(
    boto3.client("bedrock-runtime", region_name="ap-northeast-1"),
    metadata={"feature": "doc_summary"},
    deployment_type="global",  # optional: global / geo / in-region Nova profiles
)

client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
# Row carries region="ap-northeast-1" and cost is priced server-side.

Cost Estimation

The SDK includes a built-in pricing table for automatic cost estimation. Models with known pricing:

Model Prefix	Provider
`gpt-*`, `o3-*`, `o4-*`	OpenAI
`claude-*`	Anthropic
`gemini-*`	Google
`grok-*`	xAI
Any `bedrock-runtime` `modelId`	AWS Bedrock (priced server-side at the `aws_bedrock` rate; the SDK sends usage only)
`deepgram/nova-*`	Deepgram (STT, per audio-second)
`elevenlabs/eleven_*`	ElevenLabs (TTS, per character)

Unknown models still get tracked (tokens, latency, metadata). Cost shows as $0 until pricing is added.
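Prefix-based lookup with a $0 fallback can be sketched as follows. The rates are placeholders invented for illustration, not real prices, and the layout (per-million-token rates for LLMs, per-unit rates for audio-seconds and characters) is an assumption about how such a table could be organized. The longest matching prefix wins, so a more specific entry could override a broader one.

```python
from typing import Dict, Iterable, Tuple

# Placeholder USD rates, invented for illustration only (NOT real prices).
# Token models: (input per 1M tokens, output per 1M tokens).
TOKEN_RATES: Dict[str, Tuple[float, float]] = {
    "gpt-": (1.00, 4.00),
    "claude-": (3.00, 15.00),
}
# Unit-billed models: per audio-second (Deepgram) or per character (ElevenLabs).
UNIT_RATES: Dict[str, float] = {
    "deepgram/nova-": 0.0001,
    "elevenlabs/eleven_": 0.00005,
}


def _longest_prefix(model: str, prefixes: Iterable[str]) -> str:
    matches = [p for p in prefixes if model.startswith(p)]
    return max(matches, key=len) if matches else ""


def estimate_cost(model: str, input_tokens: int = 0, output_tokens: int = 0, units: float = 0.0) -> float:
    """Estimated USD cost; unknown models return 0.0 but are still tracked."""
    prefix = _longest_prefix(model, TOKEN_RATES)
    if prefix:
        rate_in, rate_out = TOKEN_RATES[prefix]
        return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000
    prefix = _longest_prefix(model, UNIT_RATES)
    if prefix:
        return units * UNIT_RATES[prefix]
    return 0.0
```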

Content capture (prompts & completions)

By default the SDK sends only cost metadata, never prompt or completion text. If your org opts in from Settings (Pro or Enterprise), the SDK then captures a sampled subset of prompts and completions automatically. There is nothing to call to turn it on. Samples are text-only (images, audio, and binary parts are stripped), redacted by default, encrypted at rest, and deletable by the org owner from Settings.

Grouping multi-turn conversations. Set two metadata keys so captured rows can be stitched back together: session_id identifies one conversation, and turn_id identifies one turn. Each model call is its own row, so a turn that makes a tool call (model → tool → model) spans two rows: reuse the same turn_id across those calls to group them. If you omit turn_id, each row gets a fresh random one and will not group.
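Once rows carry these keys, stitching a conversation back together amounts to grouping by (session_id, turn_id). The sketch below works on plain dicts standing in for captured rows, and the function names are hypothetical. It also mimics the documented fallback, showing why an omitted turn_id leaves the two calls of a tool round-trip in separate groups.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Tuple


def with_turn_id(row: dict) -> dict:
    """Mimic the documented fallback: a row without turn_id gets a fresh random one."""
    if "turn_id" not in row:
        row = {**row, "turn_id": uuid.uuid4().hex}
    return row


def group_turns(rows: List[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Group model-call rows into turns keyed by (session_id, turn_id), preserving order."""
    turns: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for row in map(with_turn_id, rows):
        turns[(row["session_id"], row["turn_id"])].append(row)
    return dict(turns)
```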

Python

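import uuid

import superpenguin as sp
from openai import OpenAI

sp.init(api_key="sp-...")
client = sp.wrap(OpenAI())

conversation_id = "conv_123"  # your stable ID for this conversation
turn_id = str(uuid.uuid4())   # one ID per turn; reuse it for every call in the turn
# msgs / msgs_with_tool_result: your chat history before and after the tool result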

# Capture is OFF until your org opts in from Settings (Pro/Enterprise). Once on,
# sampled prompts + completions are captured automatically and encrypted at rest.
# Group them with two metadata keys:
#   session_id  -> one conversation
#   turn_id     -> one turn (reuse it across a tool round-trip to group calls)
with sp.metadata({
    "session_id":  conversation_id,
    "turn_id":     turn_id,
    "customer_id": "cust_acme_123",
    "prompt_key":  "support-reply",
}):
    first = client.chat.completions.create(model="gpt-4o", messages=msgs)
    # ...the model requests a tool; you run it locally...
    final = client.chat.completions.create(model="gpt-4o", messages=msgs_with_tool_result)

# Optional: narrow capture from the SDK side. It can only REDUCE capture,
# never enable it (the Settings opt-in is always required).
sp.configure_content_capture(
    allow=True,           # False vetoes ALL capture from this process
    record_prompt=True,   # False drops inputs, keeps outputs
    record_outcome=True,  # False drops outputs, keeps inputs
)

configure_content_capture() / configureContentCapture() can only narrow what is captured: veto capture entirely with allow=False, or keep only one side with record_prompt / record_outcome. It can never enable capture; the Settings opt-in is always required.
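The "can only narrow" rule can be modeled as a logical AND: the org's Settings opt-in, the SDK's allow flag, and per-call sampling must all hold, and record_prompt / record_outcome then select which side is kept. This is a hypothetical model of the documented behavior, not the SDK's code. Sampling is passed in as a boolean because the sampling rate isn't documented here, and redaction and non-text stripping are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CaptureSettings:
    """Mirrors the three configure_content_capture() flags."""
    allow: bool = True
    record_prompt: bool = True
    record_outcome: bool = True


def captured_parts(
    org_opted_in: bool, sampled: bool, sdk: CaptureSettings, prompt: str, completion: str
) -> Optional[Dict[str, str]]:
    """Return the text that would be captured for one call, or None if nothing is."""
    if not (org_opted_in and sdk.allow and sampled):
        return None  # SDK flags can never override a missing org opt-in
    parts: Dict[str, str] = {}
    if sdk.record_prompt:
        parts["prompt"] = prompt
    if sdk.record_outcome:
        parts["completion"] = completion
    return parts or None
```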

Privacy

Not captured by default

Prompt content, response content, images, audio bytes, transcript text, tool arguments, and function results. The SDK only sends cost-relevant metadata unless your organization explicitly opts in from Settings (Pro or Enterprise), the SDK allows capture, and the call is sampled.

What is always logged: provider, model, token counts (or audio-seconds / characters / events for voice rows), estimated cost, latency, status, and the metadata fields you set when wrapping your client or per call.

Opt-in content capture (off by default) is text-only: images, audio, and binary parts are stripped before send. Samples are redacted by default, encrypted at rest, deletable from Settings, and never shown in the dashboard. From the SDK you can veto all capture for a process with configure_content_capture(allow=False) (see the Content capture section above).

Spend Alerts

Get notified in Slack, email, or Discord when your AI spend crosses a threshold. Alerts require a Growth plan or above.

Connect a notification channel

1. Go to Integrations and pick a channel: Slack, Email, or Discord.
2. Slack: click Add to Slack and authorize SuperPenguin in your workspace (or paste an Incoming Webhook URL). Email: add the recipient addresses. Discord: paste a channel webhook URL.
3. Choose a default destination for alert notifications.
4. Click Send test to verify. You should see a confirmation message in the channel.

You can enable more than one channel. Each alert fans out to every connected channel.

Create alert rules

Go to Alerts and create rules. Each rule fires once per period (month or day) to avoid noise.

Alert Types

Type	Threshold	Fires when
Monthly budget	Dollar amount	Calendar-month spend exceeds the threshold (once per month)
Daily spike	Percentage (> 100%)	Today's spend exceeds yesterday's by the given percentage (once per day)
Daily amount	Dollar amount	Today's estimated spend exceeds the threshold (once per day)

Scoping & channel routing

Each rule can optionally be scoped to a specific provider, model, project, or API key, or left org-wide. You can also route individual rules to different destinations instead of the default channel.

Deduplication

Each rule fires at most once per period: once per calendar month for monthly budget alerts, once per day for daily alerts. You won't be spammed.
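The once-per-period rule can be implemented by remembering which (rule, period) pairs have already fired, where the period key is the calendar month for monthly budgets and the calendar day for daily alerts. The sketch below covers the two dollar-threshold types only. The class and field names are hypothetical, and it treats "exceeds" as strictly greater than the threshold.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Set, Tuple


@dataclass(frozen=True)
class AlertRule:
    rule_id: str
    kind: str  # "monthly_budget" or "daily_amount"
    threshold_usd: float


def period_key(kind: str, day: date) -> str:
    """Calendar month for monthly rules, calendar day for daily rules."""
    return day.strftime("%Y-%m") if kind == "monthly_budget" else day.isoformat()


@dataclass
class AlertEvaluator:
    fired: Set[Tuple[str, str]] = field(default_factory=set)

    def should_fire(self, rule: AlertRule, day: date, month_spend: float, day_spend: float) -> bool:
        """True the first time a rule crosses its threshold within a period."""
        spend = month_spend if rule.kind == "monthly_budget" else day_spend
        key = (rule.rule_id, period_key(rule.kind, day))
        if spend > rule.threshold_usd and key not in self.fired:
            self.fired.add(key)
            return True
        return False
```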

Troubleshooting

Problem	Fix
`sp.init() has not been called`	Call `sp.init(api_key="sp-...")` or set `SP_API_KEY` env var
`Unsupported client type`	`sp.wrap()` supports OpenAI, AsyncOpenAI, Anthropic, AsyncAnthropic, `google.genai.Client` (AI Studio + Vertex AI), Deepgram, ElevenLabs, and `boto3`/`aioboto3` `bedrock-runtime` clients
Attribution page is empty	Data appears within seconds. Try refreshing.
Cost shows as $0	Model may not be in the pricing table yet. Tokens and latency still track correctly. For Bedrock, also confirm the client region matches a catalog variant (region is auto-captured from the boto3 / AWS SDK client).
Bedrock cost looks too low / high	Check the captured `region` and wrap-level `deployment_type`. Cross-region Nova and non-default AWS regions use different rate cards.