SDK Quick Start

Wrap your LLM client with the SuperPenguin Python or TypeScript SDK to get per-request cost tracking, customer attribution, and spend analytics.

Step 1: Install the SDK & get your key

Install the Python SDK with pip install superpenguin or the TypeScript SDK with npm install @superpenguin/js, then go to SDK Keys and create an SDK API key. The key starts with sp_ and is shown only once, so copy it right away. For TypeScript apps, keep SP_API_KEY server-side only.
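One way to keep the key out of source code is to read it from the SP_API_KEY environment variable and check the sp_ prefix before initializing. The sketch below is an illustration, not part of the SDK: load_sp_key is a hypothetical helper, and the sp.init() call is left as a comment so the snippet runs without the package installed.

```python
import os
from typing import Mapping, Optional


def load_sp_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the SDK key from SP_API_KEY, or raise if it is missing or malformed.

    Hypothetical helper for illustration; it is not provided by the SDK.
    """
    env = os.environ if env is None else env
    key = env.get("SP_API_KEY", "").strip()
    if not key.startswith("sp_"):
        raise RuntimeError("SP_API_KEY is missing or does not start with 'sp_'")
    return key


# Typical use (requires `pip install superpenguin`):
#   import superpenguin as sp
#   sp.init(api_key=load_sp_key())
```

Failing fast here surfaces a missing or mistyped key at startup rather than on the first tracked request.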

Step 2: Wrap your client and attribute costs

Initialize the SDK with your sp_ key, then call sp.wrap() on a native provider client (Python or TypeScript), or use trackGenerateText() / trackStreamText() with the Vercel AI SDK. Pass metadata / spMetadata so every request is attributed to a customer, feature, or prompt. No base URL changes are needed. Each call logs one row with the appropriate billing unit (tokens for LLMs, audio_seconds for Deepgram, characters for ElevenLabs).

Python (OpenAI)

import superpenguin as sp
from openai import OpenAI

sp.init(api_key="sp_...")

client = sp.wrap(OpenAI(), metadata={
    "customer_id":    "cust_acme_123",
    "feature":        "doc_summary",
    "team":           "product",
    "environment":    "production",
    "prompt_key":     "summarize-article",
    "prompt_version": "1",
})

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
)

Metadata Fields

| Field | Type | Purpose |
| --- | --- | --- |
| prompt_key | string | Identifies the prompt. Appears on the Prompts analytics page. |
| prompt_version | string or number | Version identifier of the prompt (e.g. "1", "beta", "2.1") |
| customer_id | string | End-customer or account consuming the AI call |
| feature | string | Product feature name (e.g., search, support_agent) |
| team | string | Internal team owning the feature |
| environment | string | production, staging, dev, etc. |
| * | string | Any other key is stored as a custom tag |

All fields are optional. Start with none and add them incrementally.
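Because every field is optional, it can help to build the metadata dict in one place and leave out anything you have not set yet. The following is a minimal sketch under that assumption; build_metadata is a hypothetical helper (not an SDK function), and the region tag is an invented example of a custom key.

```python
from typing import Any, Dict, Optional


def build_metadata(
    customer_id: Optional[str] = None,
    feature: Optional[str] = None,
    team: Optional[str] = None,
    environment: Optional[str] = None,
    prompt_key: Optional[str] = None,
    prompt_version: Optional[Any] = None,
    **custom_tags: str,
) -> Dict[str, Any]:
    """Collect the documented fields plus custom tags, skipping anything unset."""
    fields: Dict[str, Any] = {
        "customer_id": customer_id,
        "feature": feature,
        "team": team,
        "environment": environment,
        "prompt_key": prompt_key,
        "prompt_version": prompt_version,
    }
    # Any extra keyword becomes a custom tag alongside the documented fields.
    fields.update(custom_tags)
    return {key: value for key, value in fields.items() if value is not None}


metadata = build_metadata(customer_id="cust_acme_123", feature="doc_summary", region="eu")
print(metadata)
# {'customer_id': 'cust_acme_123', 'feature': 'doc_summary', 'region': 'eu'}
```

The resulting dict can be passed as the metadata argument shown in the wrap example.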

Per-call overrides (optional)

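To log a different value on each request (for example, one call for customer A and the next for customer B), compute it at request time: customer_id and every other field accept any value. Three patterns are supported: a per-call argument (extra_body or metadata=, depending on the provider), the sp.metadata() context manager, and the @sp.trace decorator. Precedence is per-call argument > sp.metadata() > @sp.trace > wrap-time defaults, so you can layer them.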

Override support per provider

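| Provider | extra_body | metadata= kwarg | sp.metadata() | @sp.trace |
| --- | --- | --- | --- | --- |
| OpenAI | yes | n/a | yes | yes |
| Anthropic | yes (0.3.1+) | n/a | yes | yes |
| Gemini (google-genai) | not in SDK | n/a | yes | yes |
| LiteLLM | n/a | yes | yes | yes |
| Deepgram | not in SDK | n/a | yes | yes |
| ElevenLabs | not in SDK | n/a | yes | yes |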

“not in SDK” means the provider's own Python library (Google's genai, Deepgram's deepgram-sdk, ElevenLabs' elevenlabs) doesn't accept that argument, so passing it would raise TypeError. Use sp.metadata() or @sp.trace for those providers. sp.metadata() is the recommended default: it works on every provider and never touches the HTTP body, so there's nothing for the provider's validator to reject.

Python

# Assumes `sp` is initialized and `client` is the wrapped client from the
# OpenAI example above.
#
# sp.metadata() is the recommended per-call override. It works on every
# provider (OpenAI, Anthropic, Gemini, Deepgram, ElevenLabs, LiteLLM) and
# never touches the HTTP body, so there's nothing for a strict provider
# validator (like Anthropic's) to reject. It layers on top of any active
# @sp.trace metadata; per-call extra_body still wins on top.
def summarize_for(customer_id: str, text: str) -> str:
    with sp.metadata({"customer_id": customer_id, "prompt_version": "2"}):
        response = client.chat.completions.create(
            model="gpt-5.4",
            messages=[{"role": "user", "content": text}],
        )
        return response.choices[0].message.content

summarize_for("cust_acme_123", "First doc...")     # row 1, customer A
summarize_for("cust_widgets_456", "Second doc...") # row 2, customer B

How does this actually work?

  • sp.metadata(...) and @sp.trace both push metadata onto a contextvars.ContextVar that the wrapper reads when emitting each row. They never touch the HTTP body, so they work for every provider, including Gemini, Deepgram, and ElevenLabs, whose SDKs don't accept extra_body. sp.metadata() is a context manager (block scope); @sp.trace is a decorator (function scope). Both reset on exit, including when an exception is raised.
  • OpenAI & Anthropic extra_body is read out and stripped from the kwargs before the request leaves the SDK, so the provider never sees sp_metadata. Other extra_body keys (user, anthropic_beta, etc.) flow through unchanged.
  • LiteLLM exposes metadata= as a first-class kwarg on litellm.completion(). The wrapper reads it from kwargs.
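The context-variable mechanism in the first bullet can be illustrated with the standard library alone. This is a toy model, not SuperPenguin's source: toy_metadata and emit_row are hypothetical names standing in for sp.metadata() and the wrapper's row emission.

```python
import contextvars
from contextlib import contextmanager
from typing import Dict, Iterator

# Holds the metadata that applies to the current execution context.
_current = contextvars.ContextVar("toy_metadata", default=None)


@contextmanager
def toy_metadata(extra: Dict[str, str]) -> Iterator[None]:
    """Layer `extra` over the active metadata for the duration of the block."""
    merged = {**(_current.get() or {}), **extra}
    token = _current.set(merged)
    try:
        yield
    finally:
        # Runs on normal exit and on exception, restoring the previous value.
        _current.reset(token)


def emit_row() -> Dict[str, str]:
    """Stand-in for the wrapper reading metadata when it logs a row."""
    return dict(_current.get() or {})


with toy_metadata({"customer_id": "cust_acme_123"}):
    with toy_metadata({"prompt_version": "2"}):
        inner = emit_row()   # both keys are visible here
    outer = emit_row()       # prompt_version has been reset
after = emit_row()           # nothing is active outside the blocks
```

Because the value lives in a ContextVar rather than a module-level global, each thread and asyncio task sees its own metadata, which is why nothing needs to travel in the request body.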

Final precedence on every emitted row: sp.wrap(metadata=...) < @sp.trace(metadata=...) < sp.metadata(...) < extra_body / metadata=. Per-key, last-write-wins. Omitted keys fall through, so you only override the fields you want to change.
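The precedence chain amounts to merging four dicts from lowest to highest priority. The sketch below models that rule in plain Python; split_extra_body and resolve_metadata are hypothetical names, and the extra_body shape ({"sp_metadata": {...}}) is an assumption inferred from the description above rather than a confirmed SDK contract.

```python
from typing import Any, Dict, Optional, Tuple

Metadata = Dict[str, Any]


def split_extra_body(extra_body: Optional[Dict[str, Any]]) -> Tuple[Metadata, Dict[str, Any]]:
    """Separate sp_metadata from the keys that should still reach the provider."""
    remaining = dict(extra_body or {})
    per_call = remaining.pop("sp_metadata", None) or {}
    return dict(per_call), remaining


def resolve_metadata(
    wrap_defaults: Optional[Metadata],
    trace: Optional[Metadata],
    context: Optional[Metadata],
    per_call: Optional[Metadata],
) -> Metadata:
    """Merge layers from lowest to highest precedence; later layers win per key."""
    merged: Metadata = {}
    for layer in (wrap_defaults, trace, context, per_call):
        merged.update(layer or {})
    return merged


per_call, forwarded = split_extra_body(
    {"sp_metadata": {"customer_id": "cust_widgets_456"}, "user": "u_1"}
)
row = resolve_metadata(
    {"customer_id": "cust_acme_123", "environment": "production"},  # sp.wrap(metadata=...)
    {"feature": "doc_summary"},                                      # @sp.trace(metadata=...)
    {"prompt_version": "2"},                                         # sp.metadata(...)
    per_call,                                                        # extra_body / metadata=
)
# row keeps environment, feature, and prompt_version from the lower layers,
# while customer_id comes from the per-call layer; forwarded == {"user": "u_1"}.
```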

Step 3: View your data

Go to the Attribution page. You'll see:

  • KPI Cards: total SDK spend, request count, and average cost per request
  • Breakdown Tabs: slice by model, provider, customer, feature, team, or environment
  • Drilldowns: click any row to see nested attribution (e.g., models per customer)
  • Recent Requests: individual request log with tokens, cost, latency, and metadata

Switch to the Reconciliation tab to compare SDK-estimated costs against your actual provider bills.

Cost Estimation

The SDK includes a built-in pricing table for automatic cost estimation. Models with known pricing:

| Model Prefix | Provider |
| --- | --- |
| gpt-*, o3-*, o4-* | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google |
| grok-* | xAI |
| deepgram/nova-* | Deepgram (STT, per audio-second) |
| elevenlabs/eleven_* | ElevenLabs (TTS, per character) |

Unknown models still get tracked (tokens, latency, metadata). Cost shows as $0 until pricing is added.
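To check ahead of time whether a model name is likely to be priced, you can match it against the prefixes in the table. This sketch assumes glob-style matching, which the SDK may or may not use internally; provider_for and estimate_cost are hypothetical helpers, and the per-unit price is supplied by the caller because the SDK's actual rates are not listed here.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Prefix patterns copied from the pricing table; the matching strategy is an assumption.
PRICING_PREFIXES = [
    ("gpt-*", "OpenAI"),
    ("o3-*", "OpenAI"),
    ("o4-*", "OpenAI"),
    ("claude-*", "Anthropic"),
    ("gemini-*", "Google"),
    ("grok-*", "xAI"),
    ("deepgram/nova-*", "Deepgram"),
    ("elevenlabs/eleven_*", "ElevenLabs"),
]


def provider_for(model: str) -> Optional[str]:
    """Return the provider whose prefix matches `model`, or None if unknown."""
    for pattern, provider in PRICING_PREFIXES:
        if fnmatchcase(model, pattern):
            return provider
    return None


def estimate_cost(model: str, units: int, price_per_unit: float) -> float:
    """Return units * price for known prefixes and 0.0 for unknown models."""
    if provider_for(model) is None:
        return 0.0  # mirrors the "$0 until pricing is added" behavior
    return units * price_per_unit
```

For example, provider_for("claude-sonnet") returns "Anthropic", while an unrecognized name returns None and yields a cost of 0.0.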

Privacy

Never logged

Prompt content, response content, images, audio bytes, transcript text, tool arguments, and function results. The SDK never captures conversation or audio data, only cost-relevant measurements.

What is logged: provider, model, token counts (or audio-seconds / characters / events for voice rows), estimated cost, latency, status, and the metadata fields you set in Step 2.

Slack Alerts

Get notified in Slack when your AI spend crosses a threshold. Alerts require a Growth plan or above.

Connect Slack

  1. Go to Integrations and click Add to Slack
  2. Authorize SuperPenguin in your Slack workspace
  3. Choose a default channel for alert notifications
  4. Click Send test to verify. You should see a confirmation message in the channel.

Alternative: paste an Incoming Webhook URL instead of using the OAuth flow.

Create alert rules

Go to Alerts and create rules. Each rule fires at most once per period (calendar month or day) to avoid noise.

Alert Types

| Type | Threshold | Fires when |
| --- | --- | --- |
| Monthly budget | Dollar amount | Calendar-month spend exceeds the threshold (once per month) |
| Daily spike | Percentage (> 100%) | Today's spend exceeds yesterday's by the given percentage (once per day) |
| Daily amount | Dollar amount | Today's estimated spend exceeds the threshold (once per day) |
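The three conditions can be written as simple predicates. This is a rough model for reasoning about thresholds, not the service's implementation: the Spend type and function names are hypothetical, and the daily-spike rule assumes the percentage is measured relative to yesterday's total (so 150 means today is more than 150% of yesterday), which is one reading of the "> 100%" constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spend:
    """Hypothetical spend snapshot, in dollars."""
    month_to_date: float
    today: float
    yesterday: float


def monthly_budget_fires(spend: Spend, threshold_usd: float) -> bool:
    """Calendar-month spend exceeds the dollar threshold."""
    return spend.month_to_date > threshold_usd


def daily_amount_fires(spend: Spend, threshold_usd: float) -> bool:
    """Today's estimated spend exceeds the dollar threshold."""
    return spend.today > threshold_usd


def daily_spike_fires(spend: Spend, threshold_pct: float) -> bool:
    """Today's spend, as a percentage of yesterday's, exceeds the threshold.

    Assumption: with no spend yesterday there is no baseline, so no spike fires.
    """
    if spend.yesterday <= 0:
        return False
    return spend.today / spend.yesterday * 100 > threshold_pct


snapshot = Spend(month_to_date=1200.0, today=90.0, yesterday=50.0)
budget_hit = monthly_budget_fires(snapshot, 1000.0)  # 1200 > 1000
amount_hit = daily_amount_fires(snapshot, 100.0)     # 90 is not > 100
spike_hit = daily_spike_fires(snapshot, 150.0)       # 90 is 180% of 50
```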

Scoping & channel routing

Each rule can optionally be scoped to a specific provider, model, project, or API key, or left org-wide. You can also route individual rules to different Slack channels instead of the default.

Deduplication

Each rule fires at most once per period: once per calendar month for monthly budget alerts, once per day for daily alerts. You won't be spammed.
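One common way to get once-per-period behavior is to key each firing on the rule and the period it belongs to. The sketch below shows that idea with an in-memory set; AlertDeduper and period_key are hypothetical names, and how SuperPenguin actually stores this state is not documented here.

```python
from datetime import date
from typing import Set, Tuple


def period_key(rule_id: str, period: str, day: date) -> Tuple[str, str]:
    """Build a key that is identical for every check within the same period."""
    if period == "month":
        return (rule_id, day.strftime("%Y-%m"))
    if period == "day":
        return (rule_id, day.isoformat())
    raise ValueError(f"unknown period: {period!r}")


class AlertDeduper:
    """Remembers which (rule, period) pairs have already fired."""

    def __init__(self) -> None:
        self._fired: Set[Tuple[str, str]] = set()

    def should_fire(self, rule_id: str, period: str, day: date) -> bool:
        """Return True the first time a rule trips in a period, False afterwards."""
        key = period_key(rule_id, period, day)
        if key in self._fired:
            return False
        self._fired.add(key)
        return True


deduper = AlertDeduper()
first = deduper.should_fire("budget-1", "month", date(2025, 6, 3))       # fires
repeat = deduper.should_fire("budget-1", "month", date(2025, 6, 20))     # suppressed
next_month = deduper.should_fire("budget-1", "month", date(2025, 7, 1))  # fires again
```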

Troubleshooting

| Problem | Fix |
| --- | --- |
| sp.init() has not been called | Call sp.init(api_key="sp_...") or set the SP_API_KEY env var |
| Unsupported client type | sp.wrap() supports OpenAI, AsyncOpenAI, Anthropic, AsyncAnthropic, and google.genai.Client (AI Studio + Vertex AI) |
| Attribution page is empty | Data appears within seconds. Try refreshing. |
| Cost shows as $0 | Model may not be in the pricing table yet. Tokens and latency still track correctly. |