Lightning API¶
InferencePort AI exposes two hosted API experiences on the same Lightning backend:
The Generation API is the subscription-backed API used for normal cloud chatting and the app’s default Lightning chat flow.
The Pay-2-Go (P2G) API is the credit-backed API intended for enterprise integrations, stable production workloads, and other usage that should be metered independently of a subscription.
Both APIs run on https://sharktide-lightning.hf.space and both can be used
with either a Supabase access token or a dashboard-generated Lightning API key.
Which API should I use?¶
Use this rule of thumb:
Generation (Subscription) API: best for high-volume, regular chatting with low token usage. This is the default for chatting in the product, but it comes with strict abuse controls and per-plan token limits.
P2G API: best for enterprise integrations, stable production APIs, and workloads that need predictable credit-based billing instead of plan quotas.
In other words:
If you are building a chat UI and want the same behavior as the app, start with the Generation API.
If you are building an external service, customer-facing integration, or production backend, use the P2G API.
How they work together¶
The two APIs are complementary, not competing:
A single account can use both.
Subscription access controls the generation/chat experience and the daily plan limits shown in the app.
P2G credits are separate and are spent from the wallet when you call the hosted /v1 routes.
The hosted console at https://inference.js.org/console is where you can review balances, inspect usage, buy credit packs, and create API keys.
Authentication¶
Lightning accepts either of these bearer credentials:
A Supabase access token for the signed-in user.
A dashboard-generated Lightning API key.
Use the same Authorization header for both APIs:
curl https://sharktide-lightning.hf.space/subscription \
-H "Authorization: Bearer YOUR_TOKEN"
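The same call can be sketched in Python with only the standard library. `build_request` is a hypothetical helper name and `YOUR_TOKEN` is a placeholder; the request is built here but only sent if you call `urlopen` on it.

```python
import urllib.request

BASE_URL = "https://sharktide-lightning.hf.space"

def build_request(path, token):
    """Build a GET request carrying the bearer credential (JWT or API key)."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

req = build_request("/subscription", "YOUR_TOKEN")
# To actually send it: urllib.request.urlopen(req).read()
```

Because both credential types use the same header, the helper does not need to know which one it was given.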
Create and manage API keys¶
Sign in at https://inference.js.org/console.
Open the API key section in the console dashboard.
Enter a label for the key.
Optionally add an ISO-8601 expiration timestamp.
Copy the key immediately after creation. The raw secret is only shown once.
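For the optional expiration, a minimal sketch of producing an ISO-8601 timestamp in Python. `expiry_timestamp` is a hypothetical helper, and the assumption that the console accepts a UTC offset of the form `+00:00` is not confirmed by this page.

```python
from datetime import datetime, timedelta, timezone

def expiry_timestamp(days, now=None):
    """Return an ISO-8601 UTC timestamp `days` from `now` (default: current time)."""
    start = now if now is not None else datetime.now(timezone.utc)
    return (start + timedelta(days=days)).replace(microsecond=0).isoformat()

# Example: a key that should expire 30 days from today.
print(expiry_timestamp(30))
```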
Generation API¶
The Generation API is the subscription-backed API. It uses plan resolution and plan quotas to control access to cloud chat, images, video, and audio.
Base URL¶
https://sharktide-lightning.hf.space
Plan and usage endpoints¶
GET /subscription¶
Returns the authenticated user’s resolved plan and subscription view.
Response fields:
email
signed_up
plan_key
plan_name
subscription: null for free users, otherwise a list of active or recent subscription rows
auth_type: jwt or api_key
GET /usage¶
Returns the current usage snapshot for the resolved identity.
Response fields:
plan_key
plan_name
generated_at
usage.cloudChatDaily
usage.imagesDaily
usage.videosDaily
usage.audioWeekly
Each usage metric contains:
limit
used
remaining
window
period
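As an illustration of reading such a snapshot, the sketch below lists the metrics that have no quota left. It assumes the metrics are nested under a `usage` object, as the field names above suggest; the sample values are made up and `exhausted_metrics` is a hypothetical helper.

```python
def exhausted_metrics(snapshot):
    """Return the names of usage metrics whose `remaining` count is zero or less."""
    usage = snapshot.get("usage", {})
    return sorted(
        name for name, metric in usage.items()
        if metric.get("remaining", 0) <= 0
    )

# Made-up sample shaped like the documented fields.
sample_snapshot = {
    "plan_key": "free",
    "usage": {
        "cloudChatDaily": {"limit": 100, "used": 25, "remaining": 75},
        "imagesDaily": {"limit": 5, "used": 5, "remaining": 0},
    },
}

print(exhausted_metrics(sample_snapshot))
```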
GET /tier-config¶
Returns the normalized plan catalog used by the UI.
defaultPlanKey
plans[] with key, name, url, price, limits, and order
GET /tiers¶
Returns the paid plans only, mainly for upgrade UI.
Status and discovery endpoints¶
GET /¶
Permanent redirect to the public site.
GET /models¶
Scrapes the Ollama library page and returns public model metadata used by the marketplace UI.
GET /status¶
Returns overall Lightning service health with:
state
services
notifications
latest
HEAD /status/image, /status/video, /status/sfx, /status/text¶
Lightweight capability checks that return content-type headers only.
Generation endpoints¶
All generation routes are rooted at /gen.
POST /gen/chat/completions¶
OpenAI-compatible chat completions endpoint.
Important request fields:
messages: required array
stream: optional boolean
tools: optional tool list
tool_choice: optional tool selection
Behavior notes:
Lightning chooses an upstream model automatically based on prompt complexity, code signals, tools, and image presence.
Authenticated requests consume cloudChatDaily usage from the resolved plan.
Streaming responses are returned as text/event-stream.
The first streaming metadata chunk includes router_metadata.model_name.
Example:
curl https://sharktide-lightning.hf.space/gen/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Write a haiku about shipping software."}
]
}'
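When stream is true, a client may want to know which model the router picked. The sketch below is one way to read that from the event stream. It assumes events arrive as `data: <json>` lines ending with `data: [DONE]` (the usual OpenAI-style convention) and that `router_metadata` is a top-level key of the chunk; neither detail is confirmed here, and the sample lines are invented.

```python
import json

def iter_sse_json(lines):
    """Yield decoded JSON payloads from `data:` lines, skipping anything else."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

def routed_model(lines):
    """Return router_metadata.model_name from the first chunk that carries it."""
    for chunk in iter_sse_json(lines):
        meta = chunk.get("router_metadata")
        if meta and "model_name" in meta:
            return meta["model_name"]
    return None

# Invented sample stream; "example-model" is a placeholder name.
sample_stream = [
    'data: {"router_metadata": {"model_name": "example-model"}}',
    "",
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    "data: [DONE]",
]
print(routed_model(sample_stream))
```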
POST /gen/image and GET /gen/image/{prompt}¶
Image generation endpoint.
Request fields:
prompt: required
mode: optional, fantasy or realistic
image_urls: optional list with up to two URLs or base64 image strings
Notes:
Base64 image inputs are stored temporarily and re-served through /asset-cdn/assets/{image_id}.
Image generation consumes imagesDaily usage.
POST /gen/video and GET /gen/video/{prompt}¶
Pollinations-backed video generation.
Request fields:
prompt: required
ratio: optional, 3:2, 2:3, or 1:1
mode: optional, normal or fun
duration: optional integer, clamped to 1 through 10
image_urls: optional list with up to two URLs or base64 image strings
Video generation consumes videosDaily usage.
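A client can mirror these constraints before sending a request. The following is a minimal sketch, not the server's validation logic: `video_payload` is a hypothetical helper, and rejecting unknown ratio or mode values on the client side is a design choice of this example.

```python
ALLOWED_RATIOS = {"3:2", "2:3", "1:1"}
ALLOWED_MODES = {"normal", "fun"}

def video_payload(prompt, ratio=None, mode=None, duration=None, image_urls=None):
    """Build a JSON body for POST /gen/video using the documented fields."""
    if not prompt:
        raise ValueError("prompt is required")
    body = {"prompt": prompt}
    if ratio is not None:
        if ratio not in ALLOWED_RATIOS:
            raise ValueError(f"unsupported ratio: {ratio}")
        body["ratio"] = ratio
    if mode is not None:
        if mode not in ALLOWED_MODES:
            raise ValueError(f"unsupported mode: {mode}")
        body["mode"] = mode
    if duration is not None:
        # Mirror the documented clamp to the range 1 through 10.
        body["duration"] = max(1, min(10, int(duration)))
    if image_urls:
        if len(image_urls) > 2:
            raise ValueError("at most two image inputs are accepted")
        body["image_urls"] = list(image_urls)
    return body

print(video_payload("a paper boat on a river", ratio="3:2", duration=42))
```

Clamping locally keeps the value the client records in line with what the server is documented to use.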
POST /gen/video/airforce and GET /gen/video/airforce/{prompt}¶
Alternate Airforce-backed video endpoint with the same auth model. It also
consumes videosDaily usage.
POST /gen/sfx and GET /gen/sfx/{prompt}¶
Music and sound-effect generation. Requires prompt and consumes
audioWeekly usage.
POST /gen/tts and GET /gen/tts/{prompt}¶
Text-to-speech generation. Requires prompt and also consumes
audioWeekly usage.
POST /gen/prompt_analyze¶
Returns the router’s chosen display model for a prompt payload. The request
body expects prompt as a message array.
Asset CDN¶
GET /asset-cdn/assets/{image_id}¶
Returns temporary PNG files created when image or video generation receives base64 image inputs.
Rate limits and identity¶
Lightning tracks these metrics per resolved identity:
cloudChatDaily
imagesDaily
videosDaily
audioWeekly
If no bearer token is present, Lightning falls back to a free-tier identity
derived from X-Client-ID when available, otherwise from request IP and
user-agent.
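The fallback order can be sketched as follows. This illustrates the documented order only: `resolve_identity` is a hypothetical function, and hashing the IP and user-agent with SHA-256 is an assumption of this example, not Lightning's actual scheme.

```python
import hashlib

def resolve_identity(headers, remote_ip):
    """Return (tier_hint, identity_key) following the documented fallback order."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer ") and auth[7:].strip():
        # A real server would verify the token and resolve the user's plan here.
        return ("authenticated", auth[7:].strip())
    client_id = headers.get("X-Client-ID", "").strip()
    if client_id:
        return ("free", "client:" + client_id)
    # Assumed scheme: hash IP and user-agent so raw values are not used as the key.
    fingerprint = f"{remote_ip}|{headers.get('User-Agent', '')}"
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:16]
    return ("free", "anon:" + digest)

print(resolve_identity({"X-Client-ID": "dev-1"}, "203.0.113.7"))
```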
Rate limits and fit¶
Generation is the right choice when you want:
Regular chat traffic.
Low token usage.
The same routing behavior used by the app’s default chat flow.
It is not the best choice when you need:
Enterprise-style billing isolation.
A stable public API surface for production systems.
A usage model that does not depend on subscription quotas.
P2G API¶
The Pay-2-Go API is the credit-billed API. It keeps its own wallet and ledger and charges credits per request.
Base URL¶
https://sharktide-lightning.hf.space/v1
Auth and identity¶
The P2G router resolves either a Supabase JWT or a dashboard-generated API key. It returns the authenticated user’s wallet and usage history.
GET /config¶
Returns the public dashboard configuration used by the web console.
GET /models¶
Returns the configured model list.
GET /me¶
Returns the current wallet and usage summary.
GET /credits/ledger¶
Returns the latest ledger rows.
GET /stripe/reconcile/{session_id}¶
Looks up a Stripe checkout session and returns reconciliation details used by the console confirm page.
Example:
curl https://sharktide-lightning.hf.space/v1/me \
-H "Authorization: Bearer YOUR_TOKEN"
Credit-backed generation endpoints¶
POST /chat/completions¶
OpenAI-compatible chat completions with metered credit charging.
Behavior notes:
Requests are rate-limited per identity.
The server checks wallet balance before generating.
Streamed responses include a payg-usage event when the final charge is known.
The response usage object includes payg_input_tokens, payg_output_tokens, payg_total_tokens, and payg_credits_charged.
Example:
curl https://sharktide-lightning.hf.space/v1/chat/completions \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Give me a concise API summary."}
]
}'
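To keep a running total of spend on the client, the charge can be read from each response. This sketch assumes a non-streamed response carries the documented fields under a top-level `usage` key; the sample responses and their numbers are invented, and `total_credits` is a hypothetical helper.

```python
def credits_charged(response):
    """Return payg_credits_charged from one response, or 0.0 if it is absent."""
    usage = response.get("usage") or {}
    return float(usage.get("payg_credits_charged", 0.0))

def total_credits(responses):
    """Sum the credits charged across a sequence of parsed responses."""
    return sum(credits_charged(r) for r in responses)

# Invented sample responses shaped like the documented usage object.
sample_responses = [
    {"usage": {"payg_input_tokens": 12, "payg_output_tokens": 40,
               "payg_total_tokens": 52, "payg_credits_charged": 0.25}},
    {"usage": {"payg_input_tokens": 30, "payg_output_tokens": 90,
               "payg_total_tokens": 120, "payg_credits_charged": 0.5}},
    {},  # a response without usage contributes nothing
]
print(total_credits(sample_responses))
```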
POST /images/generations¶
Generates an image and returns an OpenAI-style base64 payload.
POST /videos/generations¶
Generates a video and returns a video/mp4 response.
POST /audio/generations¶
Generates audio and returns an audio/mpeg response.
Pricing¶
P2G pricing is credit-based. The current default server rates in
helper/payg.py are:
textCreditPerMillionTokens: 0.75 credits per 1,000,000 text tokens
imageCreditPerImage: 0.02 credits per image
videoCreditPerSecond: 0.01 credits per second of video
audioCreditPerSecond: 0.01 credits per second of audio
The hosted configuration can override those defaults, and the console always shows the live dashboard values.
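As a worked example of the arithmetic, the sketch below estimates a charge from the default rates listed above. `estimate_credits` is a hypothetical helper; it ignores any rounding or minimum charge the server may apply, and the live rates may differ from these defaults.

```python
DEFAULT_RATES = {
    "textCreditPerMillionTokens": 0.75,
    "imageCreditPerImage": 0.02,
    "videoCreditPerSecond": 0.01,
    "audioCreditPerSecond": 0.01,
}

def estimate_credits(text_tokens=0, images=0, video_seconds=0,
                     audio_seconds=0, rates=DEFAULT_RATES):
    """Estimate credits for a mix of usage under the given rate table."""
    return (
        text_tokens / 1_000_000 * rates["textCreditPerMillionTokens"]
        + images * rates["imageCreditPerImage"]
        + video_seconds * rates["videoCreditPerSecond"]
        + audio_seconds * rates["audioCreditPerSecond"]
    )

# 400,000 tokens, 3 images, and 10 seconds of video: 0.30 + 0.06 + 0.10
print(round(estimate_credits(text_tokens=400_000, images=3, video_seconds=10), 6))
```

Under the default rates that mix comes to about 0.46 credits.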
API key behavior¶
API keys are long-lived bearer tokens that can be created from the console. They are stored hashed in the database, and only the prefix is retained for display purposes. A key can be revoked from the console at any time, after which it stops authenticating requests immediately.
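The store-hashed, show-prefix pattern described here can be sketched as follows. This is an illustration of the general technique, not Lightning's implementation: the choice of SHA-256, the 8-character prefix, and the record layout are all assumptions.

```python
import hashlib
import hmac
import secrets

def create_key_record(label):
    """Return (raw_key, record). Only the record would be persisted."""
    raw_key = secrets.token_urlsafe(32)
    record = {
        "label": label,
        "prefix": raw_key[:8],  # kept for display only
        "hash": hashlib.sha256(raw_key.encode("utf-8")).hexdigest(),
        "revoked": False,
    }
    return raw_key, record

def authenticate(presented_key, record):
    """Check a presented key against the stored hash; revoked keys always fail."""
    if record["revoked"]:
        return False
    candidate = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, record["hash"])

raw_key, record = create_key_record("ci-pipeline")
print(record["prefix"], authenticate(raw_key, record))
```

Because only the hash is stored, the raw secret cannot be shown again after creation, which is why the console displays it once.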
Failure modes¶
Common status codes:
400 for invalid request payloads
401 for invalid, expired, or revoked bearer credentials
402 when a P2G wallet does not have enough credits
413 for prompts that exceed configured size limits
429 for plan or spam-limit exhaustion
500 for upstream provider or server configuration failures
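A client might map these codes to a next step along the following lines. The action names and the decision to treat 429 and 500 as retryable are assumptions of this sketch; a 500 caused by server misconfiguration will not succeed on retry.

```python
ACTIONS = {
    400: "fix_request",
    401: "refresh_credentials",
    402: "top_up_credits",
    413: "shorten_prompt",
    429: "back_off",
    500: "retry_later",
}

def classify_status(status):
    """Map an HTTP status code to a suggested client action."""
    if 200 <= status < 300:
        return "ok"
    return ACTIONS.get(status, "unexpected")

def is_retryable(status):
    """Only quota exhaustion and server-side failures are worth retrying."""
    return classify_status(status) in {"back_off", "retry_later"}

for code in (200, 402, 429):
    print(code, classify_status(code), is_retryable(code))
```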