langchain python
Platform for building, testing, and deploying AI agents with production tools

```shell
$ npx docs2skills add langchain-agent-py
```
What this skill does
LangChain provides a complete agent engineering platform combining production-grade observability through LangSmith with flexible open-source frameworks. The platform addresses the full agent lifecycle: LangSmith offers detailed tracing, performance metrics, evaluation on production data, prompt versioning with collaboration features, and one-click deployment infrastructure. The open-source frameworks provide different abstraction levels: Deep Agents for quick task-oriented agents, LangChain for customizable building blocks, and LangGraph for low-level orchestration with memory and human-in-the-loop support.
Trusted by AI teams at Replit, Clay, Rippling, Cloudflare, and Workday, the platform solves key production challenges like agent reliability, performance monitoring, continuous improvement through live data, and scaling deployment infrastructure. It meets enterprise requirements with HIPAA, SOC 2 Type 2, and GDPR compliance.
Prerequisites
- Python 3.9+ (LangChain Python) or Node.js 18+ (LangChain.js), depending on framework choice
- LangSmith account for observability features
- API keys for LLM providers (OpenAI, Anthropic, etc.)
- Docker for local deployment testing
- Git for prompt version control features
Quick start
```shell
pip install langchain langchain-openai langsmith
```

```python
import os

# Enable tracing before making model calls
os.environ["LANGSMITH_API_KEY"] = "your-key"
os.environ["LANGSMITH_TRACING"] = "true"

from langchain_openai import ChatOpenAI

# Make a simple model call
llm = ChatOpenAI(temperature=0)
response = llm.invoke("What is the capital of France?")
print(response.content)
# View traces at https://smith.langchain.com/
```
Core concepts
Agent Architecture: LangChain uses a modular approach with chains, agents, and tools. Chains combine multiple components sequentially, agents make decisions about which tools to use, and tools provide specific capabilities like web search or API calls.
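The sequential-chain idea can be sketched in plain Python (a conceptual illustration only; `SimpleChain` and the stand-in model are hypothetical, not LangChain classes):

```python
from typing import Callable, List

class SimpleChain:
    """Conceptual sketch of a chain: run components in sequence,
    feeding each output into the next component's input."""

    def __init__(self, steps: List[Callable]):
        self.steps = steps

    def invoke(self, value):
        for step in self.steps:
            value = step(value)
        return value

# A toy "prompt -> model -> parser" pipeline
prompt = lambda topic: f"Write one line about {topic}"
fake_model = lambda text: text.upper()  # stand-in for an LLM call
parser = lambda text: text.strip()

chain = SimpleChain([prompt, fake_model, parser])
result = chain.invoke("Paris")  # "WRITE ONE LINE ABOUT PARIS"
```

LangChain's actual runnables compose the same way: each component's output becomes the next component's input.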
Observability Layer: LangSmith automatically captures every agent interaction, creating detailed traces showing input/output, latency, token usage, and decision paths. This observability enables debugging complex agent behaviors and identifying performance bottlenecks.
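What a trace records can be approximated with a small decorator (a sketch of the idea behind `@traceable`; `traceable_sketch` and the `TRACES` list are illustrative, not the LangSmith API):

```python
import functools
import time

TRACES = []  # in production this data would stream to LangSmith

def traceable_sketch(fn):
    """Record inputs, output, and latency for every call; real tracing
    also captures token usage and nested run trees."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": output,
            "latency_s": time.perf_counter() - start,
        })
        return output
    return wrapper

@traceable_sketch
def answer(question: str) -> str:
    return f"Echo: {question}"

answer("capital of France?")  # one trace entry recorded
```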
Evaluation Framework: The platform distinguishes between offline evaluation (testing on datasets) and online evaluation (monitoring production performance). Evaluators can be LLM-based, rule-based, or custom functions measuring accuracy, helpfulness, or domain-specific metrics.
Prompt Engineering Workflow: Version control for prompts with branching, collaboration features, and A/B testing capabilities. Prompts are treated as code artifacts with proper deployment pipelines.
Key API surface
| Component | Purpose |
|---|---|
| `ChatOpenAI()`, `ChatAnthropic()` | LLM integrations with streaming |
| `AgentExecutor.from_agent_and_tools()` | Create agents with tool access |
| `PromptTemplate.from_template()` | Template prompts with variables |
| `Client.create_run()` (langsmith) | Manual trace creation |
| `@traceable` | Decorator for function tracing |
| `evaluate()` | Run evaluations on datasets |
| `StateGraph()` (langgraph) | Define agent workflows |
| `VectorStore.similarity_search()` | RAG document retrieval |
| `Tool.from_function()` | Convert functions to agent tools |
| `RunnableSequence` | Chain multiple components |
Common patterns
RAG Agent with Tracing:
```python
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langsmith import traceable

llm = ChatOpenAI(temperature=0)
vectorstore = Chroma.from_documents(documents, OpenAIEmbeddings())  # `documents` prepared earlier
retriever = vectorstore.as_retriever()

@traceable
def rag_agent(query: str) -> str:
    docs = retriever.invoke(query)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt).content
```
Custom Tool Integration:
```python
from langchain.tools import Tool

def weather_tool(location: str) -> str:
    # Your weather API call
    return f"Weather in {location}: Sunny"

tools = [Tool.from_function(
    func=weather_tool,
    name="weather",
    description="Get weather for a location",
)]
```
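Behind a tools list like this, the executor resolves the tool the LLM chose by name and calls it with the model-supplied input. A minimal sketch of that dispatch step (the registry and `run_tool` helper are hypothetical, for illustration):

```python
def weather(location: str) -> str:
    return f"Weather in {location}: Sunny"

def calculator(expression: str) -> str:
    return str(eval(expression))  # toy only; never eval untrusted input

# A registry like the tools list above: name -> (function, description)
TOOLS = {
    "weather": (weather, "Get weather for a location"),
    "calculator": (calculator, "Evaluate a math expression"),
}

def run_tool(tool_name: str, tool_input: str) -> str:
    """Look the chosen tool up by name and call it; in a real agent,
    the LLM picks tool_name by reading the tool descriptions."""
    fn, _description = TOOLS[tool_name]
    return fn(tool_input)

run_tool("weather", "Paris")     # "Weather in Paris: Sunny"
run_tool("calculator", "2 + 3")  # "5"
```

This is why descriptions matter: they are the only signal the model has when choosing `tool_name`.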
Evaluation Pipeline:
```python
from langsmith import evaluate

def accuracy_evaluator(run, example):
    match = run.outputs["answer"] == example.outputs["answer"]
    return {"score": 1 if match else 0}

evaluate(
    lambda inputs: my_agent.invoke(inputs),
    data="my_dataset",
    evaluators=[accuracy_evaluator],
)
```
Configuration
| Setting | Default | Purpose |
|---|---|---|
| `LANGSMITH_API_KEY` | (none) | Authenticate with LangSmith |
| `LANGSMITH_TRACING` | `false` | Enable automatic tracing |
| `LANGSMITH_PROJECT` | `default` | Project for organizing traces |
| `LANGCHAIN_VERBOSE` | `false` | Debug logging |
| `LANGCHAIN_CACHE` | `false` | Enable LLM response caching |
| `LANGSMITH_ENDPOINT` | API default | Custom LangSmith instance |
Best practices
Structure for Observability: Design agents with clear intermediate steps. Use named chains and add metadata to traces for easier debugging. Tag runs with environment info.
Evaluation-Driven Development: Start with evaluation datasets before building agents. Define success metrics early and run evaluations on every change. Use both automated and human evaluation.
Prompt Versioning: Store prompts in LangSmith with semantic versions. Use feature flags for A/B testing prompts. Document prompt changes with context about why changes were made.
Error Handling: Wrap agent calls in retry logic with exponential backoff. Log failures with full context. Use fallback agents for critical paths.
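The retry-with-backoff-and-fallback pattern above can be sketched framework-independently (the `invoke_with_retry` helper is an assumption for illustration; libraries like tenacity provide the same behavior declaratively):

```python
import time

def invoke_with_retry(agent_call, query, max_attempts=3, base_delay=1.0,
                      fallback=None, sleep=time.sleep):
    """Retry an agent call with exponential backoff (1s, 2s, 4s, ...),
    falling back to a secondary handler if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            return agent_call(query)
        except Exception:
            if attempt == max_attempts - 1:
                if fallback is not None:
                    return fallback(query)
                raise
            sleep(base_delay * (2 ** attempt))

# Usage: a flaky call that succeeds on the third try
attempts = []
def flaky(query):
    attempts.append(query)
    if len(attempts) < 3:
        raise RuntimeError("transient provider error")
    return f"answer to {query!r}"

result = invoke_with_retry(flaky, "hello", sleep=lambda _: None)
```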
Memory Management: For long conversations, implement context window management. Use LangGraph's built-in memory for stateful agents. Persist important context to external storage.
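A minimal sketch of context window management, assuming a rough characters-per-token heuristic (real code should count tokens with the model's tokenizer, e.g. tiktoken; `trim_history` is a hypothetical helper):

```python
def trim_history(messages, max_tokens=1000,
                 count_tokens=lambda m: len(m["content"]) // 4):
    """Keep the system message plus the most recent turns that fit
    within the token budget, dropping the oldest turns first."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(count_tokens(m) for m in system)
    for message in reversed(rest):  # newest first
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

Dropped turns need not be lost: summarize them or persist them to external storage before trimming.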
Gotchas and common mistakes
- Token Limits: LangChain doesn't automatically handle context window limits. Implement truncation or summarization for long conversations.
- API Rate Limits: Default retry logic may not handle all provider rate limits. Configure custom retry strategies per provider.
- Async Context: Tracing may not work correctly in async contexts without proper setup. Use the async methods (`ainvoke`, `acall`) with `await` consistently.
- Tool Schema Validation: Agent tools need proper type hints and descriptions. Missing or poor descriptions cause tool selection failures.
- State Persistence: LangGraph state isn't automatically persisted. Implement custom persistence for production workflows.
- Evaluation Bias: LLM evaluators can be biased toward certain response styles. Use multiple evaluation methods and human validation.
- Memory Leaks: Long-running agents may accumulate memory. Implement periodic cleanup of conversation history.
- Version Mismatches: LangChain and LangSmith versions must be compatible. Pin versions in production.
- Trace Sampling: High-volume applications should implement trace sampling to avoid overwhelming LangSmith.
- Security: Never log sensitive data in traces. Configure trace filtering for PII.
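The trace-sampling point above can be sketched as a deterministic per-run decision (a hypothetical helper; keying on the run id means every span of a given run gets the same keep/drop outcome):

```python
import hashlib

def should_trace(run_id: str, sample_rate: float = 0.1) -> bool:
    """Deterministic sampling keyed on the run id: hash into one of
    10,000 buckets and keep the run if its bucket falls in the sample."""
    bucket = int(hashlib.sha256(run_id.encode()).hexdigest(), 16) % 10_000
    return bucket < sample_rate * 10_000

# Roughly 10% of runs are kept, and the decision is stable per run
kept = sum(should_trace(f"run-{i}", sample_rate=0.1) for i in range(10_000))
```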