$ npx docs2skills add openai-platform

OpenAI Platform

AI models and API integration for GPT, DALL-E, Whisper, and embeddings

What this skill does

The OpenAI Platform provides REST API access to state-of-the-art AI models including GPT-4, GPT-3.5, DALL-E 3, Whisper, and text embeddings. It enables developers to integrate conversational AI, text generation, image creation, speech processing, and semantic search capabilities into applications through HTTP requests.

The platform handles model inference, scaling, and optimization behind a unified API. It supports both streaming and batch processing, fine-tuning custom models, and managing conversation context. OpenAI's API is the foundation for ChatGPT and powers thousands of AI applications across web, mobile, and enterprise systems.

Prerequisites

  • OpenAI API key from platform.openai.com
  • HTTP client library (curl, axios, requests, etc.)
  • Node.js 16+ for JavaScript SDK
  • Python 3.7+ for Python SDK
  • Valid payment method for usage-based billing
  • Organization ID for team accounts

Quick start

npm install openai

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const completion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say hello!" }],
  model: "gpt-3.5-turbo",
});

console.log(completion.choices[0].message.content);

Core concepts

Models are the AI systems you interact with. Each model has different capabilities, costs, and context limits. GPT-4 excels at complex reasoning, GPT-3.5-turbo balances performance and cost, DALL-E generates images, and Whisper processes audio.

Messages structure conversations with roles (system, user, assistant) that provide context and control behavior. The system message sets instructions, user messages contain prompts, and assistant messages are model responses. Message history maintains conversation context.
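The turn structure above can be sketched as a plain array that grows with each exchange. A minimal sketch (the addExchange helper is illustrative, not part of the SDK):

```javascript
// Maintain conversation context by carrying the full message array forward.
const history = [
  { role: "system", content: "You are a helpful assistant." },
];

// Hypothetical helper: record one user turn and the model's reply.
function addExchange(history, userText, assistantText) {
  history.push({ role: "user", content: userText });
  history.push({ role: "assistant", content: assistantText });
  return history;
}

addExchange(history, "Hi!", "Hello! How can I help?");
// On the next request, send the whole array:
// openai.chat.completions.create({ model: "gpt-3.5-turbo", messages: history })
```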

Tokens are text units that models process. Each model has token limits affecting context length and costs. Roughly 1 token equals 0.75 words in English. Token counting impacts both billing and technical constraints.

Streaming delivers responses incrementally instead of waiting for completion. Essential for real-time applications and long responses. Non-streaming returns complete responses in single requests.

Fine-tuning customizes models on specific datasets to improve performance for particular use cases, changing model behavior beyond what prompting alone can achieve.
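Fine-tuning for chat models expects JSONL training files where each line is a JSON object with a messages array, and format errors only surface after upload. A minimal local pre-check before calling files.create() and fine_tuning.jobs.create(), written as a sketch:

```javascript
// Validate each JSONL line locally before uploading with files.create().
function validateFineTuneLine(line) {
  let obj;
  try { obj = JSON.parse(line); } catch { return false; }
  if (!Array.isArray(obj.messages) || obj.messages.length === 0) return false;
  return obj.messages.every(
    (m) => typeof m.role === "string" && typeof m.content === "string"
  );
}

const sample = [
  '{"messages":[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"}]}',
  '{"prompt":"Hi","completion":"Hello!"}', // legacy completions format, not chat format
];
const results = sample.map(validateFineTuneLine);
```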

Key API surface

Endpoint and purpose:

  • chat.completions.create() - Generate conversational responses with GPT models
  • completions.create() - Generate text completions (legacy models)
  • images.generate() - Create images with DALL-E models
  • images.createVariation() - Generate variations of existing images
  • audio.transcriptions.create() - Convert speech to text with Whisper
  • audio.translations.create() - Translate audio to English text
  • audio.speech.create() - Convert text to speech
  • embeddings.create() - Generate vector embeddings for text
  • files.create() - Upload files for fine-tuning or assistants
  • fine_tuning.jobs.create() - Start model fine-tuning jobs
  • models.list() - List available models and capabilities
  • moderations.create() - Check content for policy violations

Common patterns

System-prompted conversations control model behavior:

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Explain async/await in JavaScript" }
  ]
});

Streaming responses for real-time UX:

const stream = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Function calling for structured outputs (the legacy functions parameter still works, but tools is the current form):

const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "What's the weather in Boston?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"]
      }
    }
  }]
});

// The model may or may not choose to call the function - check first
const toolCalls = response.choices[0].message.tool_calls;
if (toolCalls) {
  const args = JSON.parse(toolCalls[0].function.arguments);
  // dispatch to your own get_weather(args.location) implementation
}

Image generation with customization:

const image = await openai.images.generate({
  model: "dall-e-3",
  prompt: "A cyberpunk cityscape at night",
  size: "1024x1024",
  quality: "hd",
  style: "vivid"
});

Audio transcription:

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("audio.mp3"),
  model: "whisper-1",
  language: "en"
});

Configuration

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  organization: "org-123", // Optional organization ID
  project: "proj-456", // Optional project ID
  baseURL: "https://api.openai.com/v1", // API endpoint (default)
  timeout: 60000, // Request timeout in ms
  maxRetries: 2, // Automatic retry attempts
  dangerouslyAllowBrowser: false // Keep false: browser code exposes your API key
});

Environment variables:

  • OPENAI_API_KEY: Required API key
  • OPENAI_ORG_ID: Organization identifier
  • OPENAI_PROJECT_ID: Project scope

Best practices

Use system messages to set consistent behavior across conversations. They're more reliable than user instructions for controlling model personality and output format.

Implement exponential backoff for rate limit errors (429) and server errors (5xx). The SDK includes automatic retries, but add custom logic for critical applications.
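For that custom logic, an exponential backoff wrapper might look like the following sketch (the retry count and base delay are assumptions to tune):

```javascript
// Retry an async call with exponential backoff on 429s and 5xx errors.
async function withBackoff(fn, { retries = 3, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable =
        err.status === 429 || (err.status >= 500 && err.status < 600);
      if (!retryable || attempt >= retries) throw err;
      await new Promise((r) => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}

// Usage with the SDK (not executed here):
// const res = await withBackoff(() =>
//   openai.chat.completions.create({ model: "gpt-3.5-turbo", messages })
// );
```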

Count tokens before requests to avoid context limit errors. Use tiktoken library for accurate counting, especially with long conversations or documents.
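For exact counts use tiktoken; as a rough pre-flight check without a tokenizer dependency, English text averages about 4 characters per token. A sketch of that heuristic (it is an approximation, not a substitute for tiktoken):

```javascript
// Rough token estimate: ~4 characters per token for English text.
// Use the tiktoken library for exact, model-specific counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Check a message array against a token budget before sending it.
function fitsContext(messages, maxTokens) {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return total <= maxTokens;
}
```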

Stream long responses to improve perceived performance. Buffer chunks appropriately and handle connection interruptions gracefully.
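One way to buffer chunks and survive a dropped connection, sketched so the partial text accumulated so far is preserved even if the stream errors:

```javascript
// Consume a chat completion stream, accumulating text and tolerating drops.
// Works on anything async-iterable that yields OpenAI-style chunks.
async function collectStream(stream) {
  let text = "";
  try {
    for await (const chunk of stream) {
      text += chunk.choices[0]?.delta?.content || "";
    }
    return { text, complete: true };
  } catch {
    return { text, complete: false }; // keep the partial response
  }
}
```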

Validate and sanitize inputs before sending to the API. Use the moderation endpoint for user-generated content to ensure policy compliance.
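When using moderations.create(), inspect per-category scores rather than only the top-level flagged boolean. A sketch of interpreting one result object (the 0.5 threshold is an assumption to tune per category):

```javascript
// Collect categories whose score exceeds a custom threshold.
// The shape mirrors moderations.create() results: { flagged, category_scores }.
function riskyCategories(result, threshold = 0.5) {
  return Object.entries(result.category_scores)
    .filter(([, score]) => score >= threshold)
    .map(([category]) => category);
}

const sampleResult = {
  flagged: false, // flagged can be false even when some scores are high
  category_scores: { harassment: 0.72, violence: 0.03, "self-harm": 0.01 },
};
const risky = riskyCategories(sampleResult);
```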

Cache embeddings when possible - they're deterministic and expensive to regenerate. Store vectors in dedicated databases like Pinecone or Weaviate for similarity search.
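A sketch of the caching idea, with cosine similarity for comparing cached vectors (the in-memory Map stands in for a real vector database, and embed is any async wrapper you write around embeddings.create()):

```javascript
// In-memory embedding cache: avoids re-requesting vectors for repeated text.
// `embed` is injected so the cache stays API-agnostic.
function makeEmbeddingCache(embed) {
  const cache = new Map();
  return async (text) => {
    if (!cache.has(text)) cache.set(text, await embed(text));
    return cache.get(text);
  };
}

// Cosine similarity between two vectors, for semantic search over embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```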

Set temperature and top_p appropriately: temperature=0 for near-deterministic outputs, 0.7-0.9 for creative tasks. Lower top_p (0.1) for focused responses, higher (0.9) for diverse outputs. Adjust one of the two rather than both at once.

Use function calling instead of parsing free-form text when you need structured data. It's more reliable and handles edge cases better.

Gotchas and common mistakes

  • Rate limits vary by model and tier - GPT-4 has lower limits than GPT-3.5. Check headers for remaining quota and implement proper queuing
  • Context limits are hard boundaries - Requests exceeding max tokens fail entirely. Truncate or summarize long conversations
  • Streaming responses can fail mid-stream - Always handle incomplete responses and connection drops gracefully
  • Token counts differ between models - The same text uses different token counts across model families
  • Fine-tuning requires specific formats - JSONL files with exact schema. Validation errors only appear after upload
  • Images in chat require vision models - Only GPT-4V and later support image inputs. Regular GPT models will error
  • Function calling responses may not include function calls - Always check if the model chose to call functions before parsing
  • Moderation results are not binary - Check specific category scores, not just the flagged boolean
  • Organization billing applies to all members - API usage counts against organization limits regardless of individual API keys
  • Browser usage exposes API keys - Never use API keys in client-side code. Always proxy through your backend
  • Retries can cause duplicate processing - Implement idempotency for non-idempotent operations
  • Model availability changes - Check model endpoints before deployment. Deprecated models eventually become unavailable
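For the context-limit gotcha above, a common tactic is to drop the oldest non-system turns until the conversation fits a token budget. A sketch (the 4-characters-per-token estimate is a rough assumption; use tiktoken for exact counts):

```javascript
// Trim oldest user/assistant turns until the history fits maxTokens.
// Keeps the system message (index 0) so behavior instructions survive.
function trimToBudget(messages, maxTokens, estimate = (t) => Math.ceil(t.length / 4)) {
  const trimmed = [...messages];
  const total = () => trimmed.reduce((s, m) => s + estimate(m.content), 0);
  while (total() > maxTokens && trimmed.length > 1) {
    trimmed.splice(1, 1); // drop the oldest non-system message
  }
  return trimmed;
}
```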