$ npx docs2skills add openai-platform

OpenAI Platform

AI models and API integration for GPT, DALL-E, Whisper, and embeddings

What this skill does

The OpenAI Platform provides REST API access to state-of-the-art AI models including GPT-4, GPT-3.5, DALL-E 3, Whisper, and text embeddings. It enables developers to integrate conversational AI, text generation, image creation, speech processing, and semantic search capabilities into applications through HTTP requests.

The platform handles model inference, scaling, and optimization behind a unified API. It supports both streaming and batch processing, fine-tuning custom models, and managing conversation context. OpenAI's API is the foundation for ChatGPT and powers thousands of AI applications across web, mobile, and enterprise systems.

Prerequisites

  • OpenAI API key from platform.openai.com
  • HTTP client library (curl, axios, requests, etc.)
  • Node.js 16+ for JavaScript SDK
  • Python 3.7+ for Python SDK
  • Valid payment method for usage-based billing
  • Organization ID for team accounts

Quick start

npm install openai

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const completion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say hello!" }],
  model: "gpt-3.5-turbo",
});

console.log(completion.choices[0].message.content);

Core concepts

Models are the AI systems you interact with. Each model has different capabilities, costs, and context limits. GPT-4 excels at complex reasoning, GPT-3.5-turbo balances performance and cost, DALL-E generates images, and Whisper processes audio.

Messages structure conversations with roles (system, user, assistant) that provide context and control behavior. The system message sets instructions, user messages contain prompts, and assistant messages are model responses. Message history maintains conversation context.
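The turn structure above can be sketched as a plain array that grows with each exchange. A minimal sketch (the addExchange helper is illustrative, not part of the SDK):

```javascript
// Maintain conversation context by carrying the full message array forward.
const history = [
  { role: "system", content: "You are a helpful assistant." },
];

// Hypothetical helper: record one user turn and the model's reply.
function addExchange(history, userText, assistantText) {
  history.push({ role: "user", content: userText });
  history.push({ role: "assistant", content: assistantText });
  return history;
}

addExchange(history, "Hi!", "Hello! How can I help?");
// On the next request, send the whole array:
// openai.chat.completions.create({ model: "gpt-3.5-turbo", messages: history })
```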

Tokens are text units that models process. Each model has token limits affecting context length and costs. Roughly 1 token equals 0.75 words in English. Token counting impacts both billing and technical constraints.

Streaming delivers responses incrementally instead of waiting for completion. Essential for real-time applications and long responses. Non-streaming returns complete responses in single requests.

Fine-tuning customizes models on specific datasets to improve performance for particular use cases, changing model behavior beyond what prompting alone can achieve.
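Fine-tuning for chat models expects JSONL training files where each line is a JSON object with a messages array, and format errors only surface after upload. A minimal local pre-check before calling files.create() and fine_tuning.jobs.create(), written as a sketch:

```javascript
// Validate each JSONL line locally before uploading with files.create().
function validateFineTuneLine(line) {
  let obj;
  try { obj = JSON.parse(line); } catch { return false; }
  if (!Array.isArray(obj.messages) || obj.messages.length === 0) return false;
  return obj.messages.every(
    (m) => typeof m.role === "string" && typeof m.content === "string"
  );
}

const sample = [
  '{"messages":[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"}]}',
  '{"prompt":"Hi","completion":"Hello!"}', // legacy completions format, not chat format
];
const results = sample.map(validateFineTuneLine);
```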

Key API surface

Endpoint and purpose:

  • chat.completions.create() - Generate conversational responses with GPT models
  • completions.create() - Generate text completions (legacy models)
  • images.generate() - Create images with DALL-E models
  • images.createVariation() - Generate variations of existing images
  • audio.transcriptions.create() - Convert speech to text with Whisper
  • audio.translations.create() - Translate audio to English text
  • audio.speech.create() - Convert text to speech
  • embeddings.create() - Generate vector embeddings for text
  • files.create() - Upload files for fine-tuning or assistants
  • fine_tuning.jobs.create() - Start model fine-tuning jobs
  • models.list() - List available models and capabilities
  • moderations.create() - Check content for policy violations

Common patterns

System-prompted conversations control model behavior:

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Explain async/await in JavaScript" }
  ]
});

Streaming responses for real-time UX:

const stream = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Function calling for structured outputs (the legacy functions parameter still works, but tools is the current form):

const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "What's the weather in Boston?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"]
      }
    }
  }]
});

// The model may or may not choose to call the function - check first
const toolCalls = response.choices[0].message.tool_calls;
if (toolCalls) {
  const args = JSON.parse(toolCalls[0].function.arguments);
  // dispatch to your own get_weather(args.location) implementation
}

Image generation with customization:

const image = await openai.images.generate({
  model: "dall-e-3",
  prompt: "A cyberpunk cityscape at night",
  size: "1024x1024",
  quality: "hd",
  style: "vivid"
});

Audio transcription:

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("audio.mp3"),
  model: "whisper-1",
  language: "en"
});

Configuration

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  organization: "org-123", // Optional organization ID
  project: "proj-456", // Optional project ID
  baseURL: "https://api.openai.com/v1", // API endpoint (default)
  timeout: 60000, // Request timeout in ms
  maxRetries: 2, // Automatic retry attempts
  dangerouslyAllowBrowser: false // Keep false: browser code exposes your API key
});

Environment variables:

  • OPENAI_API_KEY: Required API key
  • OPENAI_ORG_ID: Organization identifier
  • OPENAI_PROJECT_ID: Project scope

Best practices

Use system messages to set consistent behavior across conversations. They're more reliable than user instructions for controlling model personality and output format.

Implement exponential backoff for rate limit errors (429) and server errors (5xx). The SDK includes automatic retries, but add custom logic for critical applications.
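For that custom logic, an exponential backoff wrapper might look like the following sketch (the retry count and base delay are assumptions to tune):

```javascript
// Retry an async call with exponential backoff on 429s and 5xx errors.
async function withBackoff(fn, { retries = 3, baseMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable =
        err.status === 429 || (err.status >= 500 && err.status < 600);
      if (!retryable || attempt >= retries) throw err;
      await new Promise((r) => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}

// Usage with the SDK (not executed here):
// const res = await withBackoff(() =>
//   openai.chat.completions.create({ model: "gpt-3.5-turbo", messages })
// );
```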

Count tokens before requests to avoid context limit errors. Use tiktoken library for accurate counting, especially with long conversations or documents.
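For exact counts use tiktoken; as a rough pre-flight check without a tokenizer dependency, English text averages about 4 characters per token. A sketch of that heuristic (it is an approximation, not a substitute for tiktoken):

```javascript
// Rough token estimate: ~4 characters per token for English text.
// Use the tiktoken library for exact, model-specific counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Check a message array against a token budget before sending it.
function fitsContext(messages, maxTokens) {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return total <= maxTokens;
}
```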

Stream long responses to improve perceived performance. Buffer chunks appropriately and handle connection interruptions gracefully.
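One way to buffer chunks and survive a dropped connection, sketched so the partial text accumulated so far is preserved even if the stream errors:

```javascript
// Consume a chat completion stream, accumulating text and tolerating drops.
// Works on anything async-iterable that yields OpenAI-style chunks.
async function collectStream(stream) {
  let text = "";
  try {
    for await (const chunk of stream) {
      text += chunk.choices[0]?.delta?.content || "";
    }
    return { text, complete: true };
  } catch {
    return { text, complete: false }; // keep the partial response
  }
}
```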

Validate and sanitize inputs before sending to the API. Use the moderation endpoint for user-generated content to ensure policy compliance.
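When using moderations.create(), inspect per-category scores rather than only the top-level flagged boolean. A sketch of interpreting one result object (the 0.5 threshold is an assumption to tune per category):

```javascript
// Collect categories whose score exceeds a custom threshold.
// The shape mirrors moderations.create() results: { flagged, category_scores }.
function riskyCategories(result, threshold = 0.5) {
  return Object.entries(result.category_scores)
    .filter(([, score]) => score >= threshold)
    .map(([category]) => category);
}

const sampleResult = {
  flagged: false, // flagged can be false even when some scores are high
  category_scores: { harassment: 0.72, violence: 0.03, "self-harm": 0.01 },
};
const risky = riskyCategories(sampleResult);
```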

Cache embeddings when possible - they're deterministic and expensive to regenerate. Store vectors in dedicated databases like Pinecone or Weaviate for similarity search.
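A sketch of the caching idea, with cosine similarity for comparing cached vectors (the in-memory Map stands in for a real vector database, and embed is any async wrapper you write around embeddings.create()):

```javascript
// In-memory embedding cache: avoids re-requesting vectors for repeated text.
// `embed` is injected so the cache stays API-agnostic.
function makeEmbeddingCache(embed) {
  const cache = new Map();
  return async (text) => {
    if (!cache.has(text)) cache.set(text, await embed(text));
    return cache.get(text);
  };
}

// Cosine similarity between two vectors, for semantic search over embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```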

Set temperature and top_p appropriately: temperature=0 for near-deterministic outputs, 0.7-0.9 for creative tasks. Lower top_p (0.1) for focused responses, higher (0.9) for diverse outputs. Adjust one of the two rather than both at once.

Use function calling instead of parsing free-form text when you need structured data. It's more reliable and handles edge cases better.

Gotchas and common mistakes

  • Rate limits vary by model and tier - GPT-4 has lower limits than GPT-3.5. Check headers for remaining quota and implement proper queuing
  • Context limits are hard boundaries - Requests exceeding max tokens fail entirely. Truncate or summarize long conversations
  • Streaming responses can fail mid-stream - Always handle incomplete responses and connection drops gracefully
  • Token counts differ between models - The same text uses different token counts across model families
  • Fine-tuning requires specific formats - JSONL files with exact schema. Validation errors only appear after upload
  • Images in chat require vision models - Only GPT-4V and later support image inputs. Regular GPT models will error
  • Function calling responses may not include function calls - Always check if the model chose to call functions before parsing
  • Moderation results are not binary - Check specific category scores, not just the flagged boolean
  • Organization billing applies to all members - API usage counts against organization limits regardless of individual API keys
  • Browser usage exposes API keys - Never use API keys in client-side code. Always proxy through your backend
  • Retries can cause duplicate processing - Implement idempotency for non-idempotent operations
  • Model availability changes - Check model endpoints before deployment. Deprecated models eventually become unavailable
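For the context-limit gotcha above, a common tactic is to drop the oldest non-system turns until the conversation fits a token budget. A sketch (the 4-characters-per-token estimate is a rough assumption; use tiktoken for exact counts):

```javascript
// Trim oldest user/assistant turns until the history fits maxTokens.
// Keeps the system message (index 0) so behavior instructions survive.
function trimToBudget(messages, maxTokens, estimate = (t) => Math.ceil(t.length / 4)) {
  const trimmed = [...messages];
  const total = () => trimmed.reduce((s, m) => s + estimate(m.content), 0);
  while (total() > maxTokens && trimmed.length > 1) {
    trimmed.splice(1, 1); // drop the oldest non-system message
  }
  return trimmed;
}
```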