SDK Quick Start
Wrap your LLM client with the SuperPenguin Python or TypeScript SDK to get per-request cost tracking, customer attribution, and spend analytics.
Install the SDK & get your key
Install the Python SDK with pip install superpenguin or the TypeScript SDK with npm install @superpenguin/js, then go to SDK Keys and create an SDK API key. You'll get a key starting with sp_. Copy it. It's only shown once. For TypeScript apps, keep SP_API_KEY server-side only.
Wrap your client and attribute costs
Initialize the SDK with your sp_ key, then call sp.wrap() on a native provider client (Python or TypeScript), or use trackGenerateText() / trackStreamText() with the Vercel AI SDK. Pass metadata / spMetadata so every request is attributed to a customer, feature, or prompt. No base URL changes are needed. One row is logged per call with the right billing unit (tokens for LLMs, audio_seconds for Deepgram, characters for ElevenLabs).
Python (OpenAI)
import superpenguin as sp
from openai import OpenAI

sp.init(api_key="sp_...")

client = sp.wrap(OpenAI(), metadata={
    "customer_id": "cust_acme_123",
    "feature": "doc_summary",
    "team": "product",
    "environment": "production",
    "prompt_key": "summarize-article",
    "prompt_version": "1",
})

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Hello"}],
)
Metadata Fields
| Field | Type | Purpose |
|---|---|---|
| prompt_key | string | Identifies the prompt. Appears on the Prompts analytics page. |
| prompt_version | string or number | Version identifier of the prompt (e.g. "1", "beta", "2.1") |
| customer_id | string | End-customer or account consuming the AI call |
| feature | string | Product feature name (e.g. search, support_agent) |
| team | string | Internal team owning the feature |
| environment | string | production, staging, dev, etc. |
| * | string | Any other key is stored as a custom tag |
All fields are optional. Start with none and add them incrementally.
Per-call overrides (optional)
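Need to log a different value on every request (e.g. one call for customer A, the next for customer B)? customer_id and every other field accept any value you compute at request time. Three patterns are supported: a per-call argument (extra_body or metadata=, depending on the provider), the sp.metadata() context manager, and the @sp.trace decorator. Precedence is per-call argument > sp.metadata() > @sp.trace > wrap-time defaults, so you can layer them.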
Override support per provider
| Provider | extra_body | metadata= kwarg | sp.metadata() | @sp.trace |
|---|---|---|---|---|
| OpenAI | yes | n/a | yes | yes |
| Anthropic | yes (0.3.1+) | n/a | yes | yes |
| Gemini (google-genai) | not in SDK | n/a | yes | yes |
| LiteLLM | n/a | yes | yes | yes |
| Deepgram | not in SDK | n/a | yes | yes |
| ElevenLabs | not in SDK | n/a | yes | yes |
“not in SDK” means the provider's own Python library (Google's genai, Deepgram's deepgram-sdk, ElevenLabs' elevenlabs) doesn't accept that argument, so passing it would raise TypeError. Use sp.metadata() or @sp.trace for those providers. sp.metadata() is the recommended default: it works on every provider and never touches the HTTP body, so there's nothing for the provider's validator to reject.
Python
# sp.metadata() is the recommended per-call override. It works on every
# provider (OpenAI, Anthropic, Gemini, Deepgram, ElevenLabs, LiteLLM) and
# never touches the HTTP body, so there's nothing for a strict provider
# validator (like Anthropic's) to reject. Layers on top of any active
# @sp.trace metadata; per-call extra_body still wins on top.
def summarize_for(customer_id: str, text: str) -> str:
    with sp.metadata({"customer_id": customer_id, "prompt_version": "2"}):
        response = client.chat.completions.create(
            model="gpt-5.4",
            messages=[{"role": "user", "content": text}],
        )
    return response.choices[0].message.content

summarize_for("cust_acme_123", "First doc...")      # row 1, customer A
summarize_for("cust_widgets_456", "Second doc...")  # row 2, customer B
How does this actually work?
- sp.metadata(...) and @sp.trace both push metadata onto a contextvars.ContextVar that the wrapper reads when emitting each row. They never touch the HTTP body, so they work for every provider, including Gemini, Deepgram, and ElevenLabs, whose SDKs don't accept extra_body. sp.metadata() is a context manager (block scope); @sp.trace is a decorator (function scope). Both reset on exit (also on exception).
- OpenAI & Anthropic: extra_body is read out and stripped from the kwargs before the request leaves the SDK, so the provider never sees sp_metadata. Other extra_body keys (user, anthropic_beta, etc.) flow through unchanged.
- LiteLLM exposes metadata= as a first-class kwarg on litellm.completion(). The wrapper reads it from kwargs.
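The context-variable mechanism described above can be illustrated with the standard library alone. This is a minimal sketch, not the SDK's actual implementation: the names _active_metadata, metadata, and current_metadata are hypothetical stand-ins, and the only thing taken from the text is the idea of pushing metadata onto a contextvars.ContextVar and resetting it on exit.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the SDK's internal state (the real name is not documented).
_active_metadata = contextvars.ContextVar("active_metadata", default=None)


def current_metadata():
    """Return a copy of the metadata a wrapper would read when emitting a row."""
    return dict(_active_metadata.get() or {})


@contextmanager
def metadata(values):
    """Block-scoped override: layer `values` over whatever is already active."""
    token = _active_metadata.set({**current_metadata(), **values})
    try:
        yield
    finally:
        # Runs on normal exit and on exception, restoring the previous value.
        _active_metadata.reset(token)


# Nested blocks layer on top of each other and unwind in reverse order.
with metadata({"customer_id": "cust_acme_123"}):
    with metadata({"prompt_version": "2"}):
        inner = current_metadata()
    outer = current_metadata()
after = current_metadata()
```

Because the state lives in a ContextVar rather than a module-level dict, concurrent asyncio tasks each see their own value, which is one plausible reason to choose this design for per-request attribution.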
Final precedence on every emitted row: sp.wrap(metadata=...) < @sp.trace(metadata=...) < sp.metadata(...) < extra_body / metadata=. Per-key, last-write-wins. Omitted keys fall through, so you only override the fields you want to change.
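The precedence chain can be sketched as a plain dictionary merge, lowest layer first. This is an illustration under stated assumptions, not the SDK's code: pop_sp_metadata and resolve_metadata are hypothetical helper names, and the shape extra_body={"sp_metadata": {...}} is inferred from the description of the key being stripped before the request is sent.

```python
def pop_sp_metadata(kwargs):
    """Remove sp_metadata from kwargs["extra_body"] and return it.

    Other extra_body keys stay in place so they still reach the provider.
    Assumes sp_metadata holds a dict of metadata fields.
    """
    extra_body = dict(kwargs.get("extra_body") or {})
    per_call = extra_body.pop("sp_metadata", {})
    if extra_body:
        kwargs["extra_body"] = extra_body
    else:
        kwargs.pop("extra_body", None)
    return per_call


def resolve_metadata(wrap_defaults, trace_meta, context_meta, per_call):
    """Merge layers from lowest to highest precedence; the last write wins per key."""
    resolved = {}
    for layer in (wrap_defaults, trace_meta, context_meta, per_call):
        resolved.update(layer or {})
    return resolved


kwargs = {
    "model": "gpt-5.4",
    "extra_body": {"user": "u_1", "sp_metadata": {"customer_id": "cust_widgets_456"}},
}
per_call = pop_sp_metadata(kwargs)
row_meta = resolve_metadata(
    {"customer_id": "cust_acme_123", "team": "product", "prompt_version": "1"},  # sp.wrap defaults
    {"feature": "doc_summary"},                                                  # @sp.trace
    {"prompt_version": "2"},                                                     # sp.metadata()
    per_call,                                                                    # extra_body
)
```

In this example team falls through from the wrap-time defaults, prompt_version is overridden by the context layer, and customer_id is overridden by the per-call layer.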
View your data
Go to the Attribution page. You'll see:
KPI Cards
Total SDK spend, request count, avg cost per request
Breakdown Tabs
Slice by model, provider, customer, feature, team, or environment
Drilldowns
Click any row to see nested attribution (e.g., models per customer)
Recent Requests
Individual request log with tokens, cost, latency, metadata
Switch to the Reconciliation tab to compare SDK-estimated costs against your actual provider bills.
Cost Estimation
The SDK includes a built-in pricing table for automatic cost estimation. Models with known pricing:
| Model Prefix | Provider |
|---|---|
| gpt-*, o3-*, o4-* | OpenAI |
| claude-* | Anthropic |
| gemini-* | Google |
| grok-* | xAI |
| deepgram/nova-* | Deepgram (STT, per audio-second) |
| elevenlabs/eleven_* | ElevenLabs (TTS, per character) |
Unknown models still get tracked (tokens, latency, metadata). Cost shows as $0 until pricing is added.
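Prefix-based pricing lookup with a $0 fallback might look roughly like the sketch below. It is only an illustration: the function names are hypothetical, the pattern list covers a few of the prefixes from the table, and the prices are placeholder numbers chosen for easy arithmetic, not real provider rates.

```python
from fnmatch import fnmatchcase

# A few of the model prefixes from the table, as glob patterns.
PROVIDER_BY_PATTERN = [
    ("gpt-*", "OpenAI"),
    ("o3-*", "OpenAI"),
    ("o4-*", "OpenAI"),
    ("claude-*", "Anthropic"),
    ("grok-*", "xAI"),
    ("deepgram/nova-*", "Deepgram"),
    ("elevenlabs/eleven_*", "ElevenLabs"),
]

# Placeholder USD prices per one million tokens as (input, output). NOT real rates.
PLACEHOLDER_PRICES = {"gpt-*": (1.00, 2.00)}


def provider_for(model):
    """Return the provider whose pattern matches the model name, or None."""
    for pattern, provider in PROVIDER_BY_PATTERN:
        if fnmatchcase(model, pattern):
            return provider
    return None


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate cost in USD; models without pricing fall back to 0.0."""
    for pattern, (in_price, out_price) in PLACEHOLDER_PRICES.items():
        if fnmatchcase(model, pattern):
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return 0.0
```

A model with no pricing entry still produces a row with its token counts; only the cost field is zero, which matches the behavior described for unknown models.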
Privacy
Never logged
Prompt content, response content, images, audio bytes, transcript text, tool arguments, and function results. The SDK never captures conversation or audio data, only cost-relevant measurements.
What is logged: provider, model, token counts (or audio-seconds / characters / events for voice rows), estimated cost, latency, status, and the metadata fields you set when wrapping your client (Step 2).
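To make the split between logged and never-logged data concrete, here is a hypothetical row shape. The class name, field names, and the build_row helper are assumptions made for illustration; the SDK's real schema is not shown in this guide. The point is that the row is built from usage measurements only and never reads the content.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class UsageRow:
    """Hypothetical shape of one logged row: measurements and metadata only."""
    provider: str
    model: str
    unit: str              # "tokens", "audio_seconds", or "characters"
    quantity: float
    estimated_cost: float
    latency_ms: float
    status: str
    metadata: dict = field(default_factory=dict)


def build_row(provider, model, response, latency_ms, status, metadata=None):
    """Build a row from a response-like dict without touching its content."""
    usage = response["usage"]  # response["content"] is deliberately never read
    return UsageRow(
        provider=provider,
        model=model,
        unit="tokens",
        quantity=usage["total_tokens"],
        estimated_cost=0.0,  # would come from the pricing table
        latency_ms=latency_ms,
        status=status,
        metadata=dict(metadata or {}),
    )


row = build_row(
    "openai",
    "gpt-5.4",
    {"content": "private reply text", "usage": {"total_tokens": 42}},
    latency_ms=810.0,
    status="ok",
    metadata={"customer_id": "cust_acme_123"},
)
```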
Slack Alerts
Get notified in Slack when your AI spend crosses a threshold. Alerts require a Growth plan or above.
Connect Slack
- Go to Integrations and click Add to Slack
- Authorize SuperPenguin in your Slack workspace
- Choose a default channel for alert notifications
- Click Send test to verify. You should see a confirmation message in the channel.
Alternative: paste an Incoming Webhook URL instead of using the OAuth flow.
Create alert rules
Go to Alerts and create rules. Each rule fires once per period (month or day) to avoid noise.
Alert Types
| Type | Threshold | Fires when |
|---|---|---|
| Monthly budget | Dollar amount | Calendar-month spend exceeds the threshold (once per month) |
| Daily spike | Percentage (> 100%) | Today's spend exceeds yesterday's by the given percentage (once per day) |
| Daily amount | Dollar amount | Today's estimated spend exceeds the threshold (once per day) |
Scoping & channel routing
Each rule can optionally be scoped to a specific provider, model, project, or API key, or left org-wide. You can also route individual rules to different Slack channels instead of the default.
Deduplication
Each rule fires at most once per period: once per calendar month for monthly budget alerts, once per day for daily alerts. You won't be spammed.
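The three alert types and the once-per-period rule can be sketched as follows. This is a rough model of the behavior described above, not the service's implementation: the Rule class and function names are hypothetical, and the daily spike check assumes that a threshold of 150% means today's spend is more than 1.5 times yesterday's.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Rule:
    kind: str         # "monthly_budget", "daily_spike", or "daily_amount"
    threshold: float  # dollars, or a percentage for "daily_spike"
    fired_periods: set = field(default_factory=set)


def period_key(rule, today):
    """Monthly rules deduplicate per calendar month, daily rules per day."""
    if rule.kind == "monthly_budget":
        return (today.year, today.month)
    return today.isoformat()


def is_breached(rule, month_spend, today_spend, yesterday_spend):
    if rule.kind == "monthly_budget":
        return month_spend > rule.threshold
    if rule.kind == "daily_amount":
        return today_spend > rule.threshold
    if rule.kind == "daily_spike":
        # Assumption: the threshold is a percentage of yesterday's spend.
        return yesterday_spend > 0 and today_spend > yesterday_spend * rule.threshold / 100
    raise ValueError(f"unknown rule kind: {rule.kind}")


def should_fire(rule, today, *, month_spend=0.0, today_spend=0.0, yesterday_spend=0.0):
    """Return True at most once per period for a breached rule."""
    key = period_key(rule, today)
    if key in rule.fired_periods:
        return False
    if not is_breached(rule, month_spend, today_spend, yesterday_spend):
        return False
    rule.fired_periods.add(key)
    return True
```

For example, a daily amount rule with a $50 threshold fires on the first evaluation where today's spend is above $50, stays silent for the rest of that day, and can fire again the next day.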
Troubleshooting
| Problem | Fix |
|---|---|
| sp.init() has not been called | Call sp.init(api_key="sp_...") or set the SP_API_KEY env var |
| Unsupported client type | sp.wrap() supports OpenAI, AsyncOpenAI, Anthropic, AsyncAnthropic, and google.genai.Client (AI Studio + Vertex AI) |
| Attribution page is empty | Data appears within seconds. Try refreshing. |
| Cost shows as $0 | Model may not be in the pricing table yet. Tokens and latency still track correctly. |