OpenAI Platform
AI models and API integration for GPT, DALL-E, Whisper, and embeddings

```shell
npx docs2skills add openai-platform
```
What this skill does
The OpenAI Platform provides REST API access to state-of-the-art AI models including GPT-4, GPT-3.5, DALL-E 3, Whisper, and text embeddings. It enables developers to integrate conversational AI, text generation, image creation, speech processing, and semantic search capabilities into applications through HTTP requests.
The platform handles model inference, scaling, and optimization behind a unified API. It supports both streaming and batch processing, fine-tuning custom models, and managing conversation context. OpenAI's API is the foundation for ChatGPT and powers thousands of AI applications across web, mobile, and enterprise systems.
Prerequisites
- OpenAI API key from platform.openai.com
- HTTP client library (curl, axios, requests, etc.)
- Node.js 16+ for JavaScript SDK
- Python 3.7+ for Python SDK
- Valid payment method for usage-based billing
- Organization ID for team accounts
Quick start
```shell
npm install openai
```

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const completion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say hello!" }],
  model: "gpt-3.5-turbo",
});

console.log(completion.choices[0].message.content);
```
Core concepts
Models are the AI systems you interact with. Each model has different capabilities, costs, and context limits. GPT-4 excels at complex reasoning, GPT-3.5-turbo balances performance and cost, DALL-E generates images, and Whisper processes audio.
Messages structure conversations with roles (system, user, assistant) that provide context and control behavior. The system message sets instructions, user messages contain prompts, and assistant messages are model responses. Message history maintains conversation context.
Tokens are text units that models process. Each model has token limits affecting context length and costs. Roughly 1 token equals 0.75 words in English. Token counting impacts both billing and technical constraints.
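The word-based rule of thumb above can be turned into a quick estimator. This is a heuristic only (the function name and ratio are illustrative); use the tiktoken library when exact counts matter:

```javascript
// Rough token estimate: ~1 token per 0.75 English words.
// Heuristic only -- use tiktoken for exact, model-specific counts.
function estimateTokens(text) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / 0.75);
}

console.log(estimateTokens("The quick brown fox jumps over the lazy dog")); // 12
```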
Streaming delivers responses incrementally instead of waiting for completion. Essential for real-time applications and long responses. Non-streaming returns complete responses in single requests.
Fine-tuning customizes models on specific datasets to improve performance for particular use cases, changing model behavior beyond what prompting alone can achieve.
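For chat models, fine-tuning datasets are JSONL files where each line is one complete training example containing a `messages` array (the example content below is made up):

```jsonl
{"messages": [{"role": "system", "content": "You are a support bot."}, {"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Go to Settings > Security and click Reset Password."}]}
```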
Key API surface
| Endpoint | Purpose |
|---|---|
| chat.completions.create() | Generate conversational responses with GPT models |
| completions.create() | Generate text completions (legacy models) |
| images.generate() | Create images with DALL-E models |
| images.createVariation() | Generate variations of existing images |
| audio.transcriptions.create() | Convert speech to text with Whisper |
| audio.translations.create() | Translate audio to English text |
| audio.speech.create() | Convert text to speech |
| embeddings.create() | Generate vector embeddings for text |
| files.create() | Upload files for fine-tuning or assistants |
| fine_tuning.jobs.create() | Start model fine-tuning jobs |
| models.list() | List available models and capabilities |
| moderations.create() | Check content for policy violations |
Common patterns
System-prompted conversations control model behavior:
```javascript
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Explain async/await in JavaScript" }
  ]
});
```
Streaming responses for real-time UX:
```javascript
const stream = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
Function calling for structured outputs:
```javascript
const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "What's the weather in Boston?" }],
  functions: [{
    name: "get_weather",
    description: "Get current weather",
    parameters: {
      type: "object",
      properties: { location: { type: "string" } }
    }
  }]
});
```
Image generation with customization:
```javascript
const image = await openai.images.generate({
  model: "dall-e-3",
  prompt: "A cyberpunk cityscape at night",
  size: "1024x1024",
  quality: "hd",
  style: "vivid"
});
```
Audio transcription:
```javascript
import fs from 'fs';

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("audio.mp3"),
  model: "whisper-1",
  language: "en"
});
```
Configuration
```javascript
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  organization: "org-123",              // Optional organization ID
  project: "proj-456",                  // Optional project ID
  baseURL: "https://api.openai.com/v1", // API endpoint (default)
  timeout: 60000,                       // Request timeout in ms
  maxRetries: 2,                        // Automatic retry attempts
  dangerouslyAllowBrowser: false        // Keep false: browser usage exposes your API key
});
```
Environment variables:
- OPENAI_API_KEY: Required API key
- OPENAI_ORG_ID: Organization identifier
- OPENAI_PROJECT_ID: Project scope
Best practices
Use system messages to set consistent behavior across conversations. They're more reliable than user instructions for controlling model personality and output format.
Implement exponential backoff for rate limit errors (429) and server errors (5xx). The SDK includes automatic retries, but add custom logic for critical applications.
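One way to sketch such custom logic is to precompute a capped exponential delay schedule (the function name and defaults here are illustrative, not part of the SDK):

```javascript
// Deterministic exponential backoff schedule, capped at 60s.
// In production, multiply each delay by Math.random() (full jitter)
// so concurrent clients don't retry in lockstep.
function backoffDelays(attempts, baseMs = 500, capMs = 60000) {
  return Array.from({ length: attempts }, (_, i) => Math.min(capMs, baseMs * 2 ** i));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch of a retry wrapper: retry `fn` on failure, waiting one
// scheduled delay between attempts, then rethrow when exhausted.
async function withBackoff(fn, attempts = 4) {
  const delays = backoffDelays(attempts);
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= delays.length) throw err;
      await sleep(delays[i]);
    }
  }
}
```

In a real application you would retry only on 429 and 5xx status codes rather than on every error.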
Count tokens before requests to avoid context limit errors. Use tiktoken library for accurate counting, especially with long conversations or documents.
Stream long responses to improve perceived performance. Buffer chunks appropriately and handle connection interruptions gracefully.
Validate and sanitize inputs before sending to the API. Use the moderation endpoint for user-generated content to ensure policy compliance.
Cache embeddings when possible - they're deterministic and expensive to regenerate. Store vectors in dedicated databases like Pinecone or Weaviate for similarity search.
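Similarity search over cached embeddings typically ranks vectors by cosine similarity, which can be sketched as:

```javascript
// Cosine similarity between two embedding vectors.
// OpenAI embeddings are normalized to length 1, so the dot product
// alone gives the same ranking; this version handles arbitrary vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
```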
Set temperature and top_p appropriately: temperature=0 for deterministic outputs, 0.7-0.9 for creative tasks. Lower top_p (0.1) for focused responses, higher (0.9) for diverse outputs. OpenAI recommends adjusting temperature or top_p, not both at once.
Use function calling instead of parsing free-form text when you need structured data. It's more reliable and handles edge cases better.
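Extracting the structured result might look like the sketch below. The helper name is illustrative; `mockResponse` is a made-up example mirroring the shape of a chat completion response from the legacy `functions` API, where `arguments` arrives as a JSON string:

```javascript
// Return the parsed function call, or null if the model answered in
// plain text instead of calling a function.
function extractFunctionCall(response) {
  const message = response.choices[0].message;
  if (!message.function_call) return null;
  return {
    name: message.function_call.name,
    args: JSON.parse(message.function_call.arguments), // arguments is a JSON string
  };
}

const mockResponse = {
  choices: [{ message: { function_call: { name: "get_weather", arguments: '{"location":"Boston"}' } } }],
};
console.log(extractFunctionCall(mockResponse)); // { name: 'get_weather', args: { location: 'Boston' } }
```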
Gotchas and common mistakes
- Rate limits vary by model and tier - GPT-4 has lower limits than GPT-3.5. Check headers for remaining quota and implement proper queuing
- Context limits are hard boundaries - Requests exceeding max tokens fail entirely. Truncate or summarize long conversations
- Streaming responses can fail mid-stream - Always handle incomplete responses and connection drops gracefully
- Token counts differ between models - The same text uses different token counts across model families
- Fine-tuning requires specific formats - JSONL files with exact schema. Validation errors only appear after upload
- Images in chat require vision models - Only GPT-4V and later support image inputs. Regular GPT models will error
- Function calling responses may not include function calls - Always check if the model chose to call functions before parsing
- Moderation results are not binary - Check specific category scores, not just the flagged boolean
- Organization billing applies to all members - API usage counts against organization limits regardless of individual API keys
- Browser usage exposes API keys - Never use API keys in client-side code. Always proxy through your backend
- Retries can cause duplicate processing - Implement idempotency for non-idempotent operations
- Model availability changes - Check model endpoints before deployment. Deprecated models eventually become unavailable
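The context-limit gotcha above can be mitigated with a small history-trimming helper. This sketch assumes the first message is the system prompt and uses a rough 4-characters-per-token estimate (swap in tiktoken for accuracy); the function name is illustrative:

```javascript
// Trim conversation history to a token budget: always keep the system
// message, then keep the newest turns, dropping the oldest first.
function trimHistory(messages, maxTokens) {
  const estimate = (m) => Math.ceil(m.content.length / 4); // rough heuristic
  const [system, ...rest] = messages;
  let budget = maxTokens - estimate(system);
  const kept = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimate(rest[i]);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(rest[i]); // preserve chronological order
  }
  return [system, ...kept];
}
```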