System Architecture
Architectural Overview
The system is structured as a modern three-tier web application with a clear separation between presentation, business logic, and data layers. The architecture prioritizes scalability, maintainability, and performance for handling large-scale knowledge graph visualization.
System Components:
┌─────────────────────┐
│ Frontend (Next.js)  │
│ - React Three Fiber │
│ - TypeScript        │
└──────────┬──────────┘
           │
           │ HTTP/REST API
           │
┌──────────▼──────────┐
│ Backend (FastAPI)   │
│ - Python            │
│ - AI Layer          │
└──────────┬──────────┘
           │
           │ SQL + pgvector
           │
┌──────────▼──────────┐
│ PostgreSQL          │
│ - pgvector ext      │
│ - Full-text search  │
└─────────────────────┘
Design Principles
- Separation of concerns: Clear boundaries between data processing, business logic, and presentation
- API-first design: Backend exposes RESTful API consumed by frontend, enabling future client diversity
- Stateless services: Backend API is stateless, allowing horizontal scaling
- Offline data generation: Expensive computations (embeddings, UMAP, clustering) performed offline in pipeline
- Client-side rendering: 3D visualization rendered on client GPU for performance and responsiveness
Frontend Architecture
Technology Stack
Next.js 14 (App Router)
React metaframework providing server-side rendering, file-based routing, API routes, and optimized production builds. The App Router paradigm enables React Server Components for improved performance.
React Three Fiber
React renderer for Three.js, enabling declarative 3D scene construction with React component patterns. Handles WebGL context management, frame loop orchestration, and scene graph updates.
Three.js
WebGL abstraction library providing camera controls, geometry primitives, materials, lighting, and rendering pipeline. Handles GPU-accelerated rendering of thousands of nodes and edges.
TypeScript
Statically typed superset of JavaScript providing compile-time type checking, enhanced IDE support, and improved code maintainability for large codebases.
Component Architecture
The frontend is organized into a hierarchical component structure:
Component Hierarchy:
App
├── Layout (navigation, metadata)
├── Page (route-specific content)
│   ├── Scene (3D visualization container)
│   │   ├── ArticleNode (individual node rendering)
│   │   ├── Edge (connection line rendering)
│   │   └── GraphControls (camera, interaction)
│   ├── SearchPanel (query interface)
│   └── DetailPanel (article information)
└── Documentation Pages (static content)
Scene Component
Root 3D component managing WebGL canvas, camera setup, and coordinate system. Orchestrates rendering of all nodes and edges, handles user interaction events, and coordinates with UI panels for selection and search highlighting.
ArticleNode Component
Renders each article as an instanced sphere. Handles hover states, click interactions, and visual encoding (color, size). Uses GPU instancing for efficient rendering of thousands of nodes.
Edge Component
Renders connections between articles as line segments. Uses BufferGeometry for efficient GPU-based line rendering. Color and width encode edge properties (direction, weight).
State Management
Application state is managed through React hooks and context:
useState for Local Component State
UI state such as panel visibility, hover targets, and input values is managed locally within components.
Context API for Shared State
Global state such as the selected article, search results, and graph data is shared across components via React Context, which avoids prop drilling while maintaining React's unidirectional data flow.
No External State Library
The application's complexity does not warrant Redux or a similar library; React's built-in state management provides sufficient control and predictability.
Data Loading Strategy
Graph data is fetched from the backend API on application load:
Load sequence:
1. Component mount triggers useEffect hook
2. Fetch all articles: GET /api/articles
3. Fetch all edges: GET /api/graph/edges
4. Fetch cluster metadata: GET /api/clusters
5. Parse JSON responses into typed interfaces
6. Render scene with loaded data
Data is cached in component state after the initial load. There are no real-time updates or polling; the data is treated as static for the duration of a visualization session.
Performance Optimizations
- GPU instancing: Render thousands of identical geometries (spheres) in single draw call
- Frustum culling: Three.js automatically culls objects outside camera view
- Level of detail: Can reduce geometry complexity for distant objects (not currently implemented)
- BufferGeometry: Use typed arrays for geometry data, avoiding JavaScript object overhead
- React.memo: Memoize components to prevent unnecessary re-renders
Backend Architecture
Technology Stack
FastAPI
Modern Python web framework built on Starlette and Pydantic. Provides automatic OpenAPI documentation, request validation, dependency injection, and high performance (comparable to Node.js and Go).
SQLAlchemy
Python SQL toolkit and ORM providing database abstraction, query construction, and connection pooling. Enables database-agnostic code and migration management.
Pydantic
Data validation library using Python type annotations. Provides automatic request/response validation, serialization, and API schema generation.
API Structure
The API is organized into logical route modules:
Route Organization:
/api
├── /articles
│   ├── GET / (list all articles)
│   ├── GET /:id (get single article)
│   └── GET /:id/connections (get article edges)
├── /clusters
│   ├── GET / (list all clusters)
│   └── GET /:id (get cluster details)
├── /graph
│   ├── GET /edges (all graph edges)
│   └── GET /stats (graph statistics)
└── /ai
    ├── POST /search/semantic (semantic search)
    ├── POST /search/text (keyword search)
    └── POST /journey (topic journey generation)
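A minimal sketch of how one of these route modules might be written with FastAPI's APIRouter; the ArticleOut schema, the Article ORM model, and the get_db dependency are illustrative names, not the project's actual code:

# Hypothetical sketch of the /api/articles route module.
# ArticleOut, Article, and get_db are assumed names for illustration.
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy.orm import Session

from app.db import get_db        # assumed session dependency (sketched later)
from app.models import Article   # assumed SQLAlchemy ORM model

router = APIRouter(prefix="/api/articles", tags=["articles"])

class ArticleOut(BaseModel):
    id: int
    title: str
    url: str
    x: float
    y: float
    z: float
    cluster_id: int | None = None

    model_config = {"from_attributes": True}  # build responses from ORM rows (Pydantic v2)

@router.get("/", response_model=list[ArticleOut])
def list_articles(db: Session = Depends(get_db)):
    return db.query(Article).all()

@router.get("/{article_id}", response_model=ArticleOut)
def get_article(article_id: int, db: Session = Depends(get_db)):
    article = db.get(Article, article_id)
    if article is None:
        raise HTTPException(status_code=404, detail="Article not found")
    return article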
Request/Response Flow
Typical request processing follows this pattern:
1. HTTP request arrives at FastAPI application
2. Route handler matched based on path and method
3. Pydantic validates request parameters/body
4. Dependencies injected (database session, config)
5. Business logic executes (database queries, AI calls)
6. Response model constructed from results
7. Pydantic serializes response to JSON
8. FastAPI sends HTTP response with status code
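As a concrete, hypothetical instance of this flow, a handler for POST /api/ai/search/semantic could look like the sketch below; the request/response models and the semantic_search helper are assumed names:

# Hypothetical handler illustrating steps 3-7 for POST /api/ai/search/semantic.
from fastapi import APIRouter, Depends
from pydantic import BaseModel, Field
from sqlalchemy.orm import Session

from app.db import get_db
from app.ai.semantic_search import semantic_search  # assumed helper function

router = APIRouter(prefix="/api/ai", tags=["ai"])

class SearchRequest(BaseModel):
    query: str = Field(min_length=1, max_length=500)   # step 3: request validation
    limit: int = Field(default=20, ge=1, le=100)

class SearchResult(BaseModel):
    article_id: int
    title: str
    similarity: float

@router.post("/search/semantic", response_model=list[SearchResult])
def search_semantic(body: SearchRequest, db: Session = Depends(get_db)):
    # steps 4-5: injected session, business logic delegated to the AI layer
    return semantic_search(db, body.query, limit=body.limit)
    # steps 6-8: FastAPI validates and serializes the result, then sends the response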
Database Layer
Connection Pooling
The SQLAlchemy connection pool maintains a set of open database connections and reuses them across requests, avoiding per-request connection overhead (TCP handshake, authentication).
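A sketch of how the pooled engine and per-request session dependency might be wired up; the connection string, module path, and pool sizes are placeholder assumptions:

# Hypothetical app/db.py: pooled engine plus a per-request session dependency.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DATABASE_URL = "postgresql+psycopg2://user:password@localhost:5432/knowledge_graph"  # assumed

# The pool keeps idle connections open and hands them out per request,
# avoiding a TCP handshake and authentication round-trip each time.
engine = create_engine(DATABASE_URL, pool_size=10, max_overflow=20, pool_pre_ping=True)
SessionLocal = sessionmaker(bind=engine, autoflush=False)

def get_db():
    db = SessionLocal()
    try:
        yield db      # injected into route handlers via Depends(get_db)
    finally:
        db.close()    # returns the connection to the pool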
ORM Models
Python classes map to database tables: Article, Edge, Cluster, Region. ORM handles query construction, result mapping, and relationship loading.
Query Optimization
Eager loading (joinedload) is used for frequently accessed relationships. Indexes cover commonly queried columns (id, cluster_id), and a vector index (IVFFlat or HNSW) accelerates embedding similarity queries.
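A sketch of the Article model and an eager-loading query, assuming SQLAlchemy 2.0-style declarations and the Vector type from the pgvector Python package; the exact fields here mirror the schema described later but are not the project's real model definitions:

# Hypothetical ORM models and an eager-loading query.
from typing import Optional

from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, joinedload

class Base(DeclarativeBase):
    pass

class Cluster(Base):
    __tablename__ = "clusters"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(Text)

class Article(Base):
    __tablename__ = "articles"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(Text, index=True)
    embedding = mapped_column(Vector(1536))    # pgvector column
    cluster_id: Mapped[Optional[int]] = mapped_column(ForeignKey("clusters.id"), index=True)
    cluster: Mapped[Optional["Cluster"]] = relationship()

def articles_with_clusters(db):
    # joinedload emits a single JOINed query instead of one lazy load per article
    return db.query(Article).options(joinedload(Article.cluster)).all()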
Error Handling
The API uses HTTP status codes and structured error responses:
Status Codes:
200 OK - Successful request
400 Bad Request - Invalid input (Pydantic validation error)
404 Not Found - Resource doesn't exist
500 Internal Server Error - Unexpected error
Error Response Format:
{
  "detail": "Error message",
  "error_type": "ValidationError"
}
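FastAPI's default behavior is to return 422 for validation errors with a plain {"detail": ...} body, so the 400 status and error_type field above imply custom exception handlers roughly along these lines (a sketch, not the project's actual handlers):

# Hypothetical exception handlers producing the error shape shown above.
from fastapi import FastAPI, Request
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse

app = FastAPI()

@app.exception_handler(RequestValidationError)
async def validation_error_handler(request: Request, exc: RequestValidationError):
    # Return 400 with a structured body instead of FastAPI's default 422 response.
    return JSONResponse(
        status_code=400,
        content={"detail": str(exc), "error_type": "ValidationError"},
    )

@app.exception_handler(Exception)
async def unhandled_error_handler(request: Request, exc: Exception):
    return JSONResponse(
        status_code=500,
        content={"detail": "Internal server error", "error_type": type(exc).__name__},
    )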
AI Layer
Integration Architecture
AI functionality is encapsulated in a dedicated module within the backend, isolating external API dependencies and enabling easier testing and model swapping:
AI Module Structure:
app/ai/
├── embeddings.py (OpenAI embedding generation)
├── semantic_search.py (vector similarity search)
├── llm.py (GPT-4 text generation)
├── cluster_naming.py (cluster name generation)
├── region_summary.py (region description generation)
└── journey.py (topic journey path finding)
OpenAI API Integration
Embedding Generation
Uses the text-embedding-ada-002 model via the OpenAI Python SDK. Handles batching (up to 2048 texts per request), retry logic, and rate limiting. Embeddings are cached in the database to avoid regeneration.
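A sketch of what batched embedding generation could look like with the OpenAI Python SDK (v1-style client); the helper name is an assumption:

# Hypothetical batched embedding generation.
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment
BATCH_SIZE = 2048   # API limit on inputs per embeddings request

def embed_texts(texts: list[str]) -> list[list[float]]:
    vectors: list[list[float]] = []
    for start in range(0, len(texts), BATCH_SIZE):
        batch = texts[start:start + BATCH_SIZE]
        response = client.embeddings.create(
            model="text-embedding-ada-002",
            input=batch,
        )
        # The API returns one 1536-dimensional vector per input, in order.
        vectors.extend(item.embedding for item in response.data)
    return vectors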
Text Generation
Uses GPT-4 or GPT-3.5-turbo for cluster naming and region summaries. Prompt engineering ensures consistent output format. Temperature=0.7 for creative but coherent names. Max tokens limited to control costs.
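Cluster naming might then be a single chat-completion call along these lines; the prompt wording and function name are illustrative only:

# Hypothetical cluster-naming call.
from openai import OpenAI

client = OpenAI()

def name_cluster(sample_titles: list[str]) -> str:
    prompt = (
        "Suggest a short, descriptive name (2-4 words) for a cluster of "
        "Wikipedia articles with these titles:\n" + "\n".join(sample_titles)
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,   # creative but coherent names
        max_tokens=20,     # a name only, keeping cost bounded
    )
    return response.choices[0].message.content.strip()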
Error Handling
Rate limit errors are retried with exponential backoff. Cached or default values serve as a fallback if the API is unavailable. All API calls are logged for debugging and cost tracking.
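The retry behavior could be a small wrapper such as the sketch below; a production implementation might instead use a library like tenacity:

# Hypothetical exponential-backoff wrapper for rate-limited OpenAI calls.
import logging
import time

from openai import RateLimitError

logger = logging.getLogger(__name__)

def with_retries(call, max_attempts: int = 5):
    """Invoke call() and retry on rate limits with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            delay = 2 ** attempt   # 1s, 2s, 4s, 8s, ...
            logger.warning("Rate limited; retrying in %ss", delay)
            time.sleep(delay)
    raise RuntimeError("OpenAI API still rate limited after retries")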
Asynchronous Processing
Some AI operations are expensive and performed asynchronously:
- Embedding generation: Performed offline in data pipeline, not real-time during requests
- Cluster naming: Generated after clustering completes, cached in database
- Search queries: Executed synchronously but optimized with vector indexes for sub-second response
Database Schema
Core Tables
articles:
├── id (integer, PK)
├── title (text, indexed)
├── content (text)
├── url (text)
├── embedding (vector(1536), indexed)
├── x, y, z (float, 3D coordinates)
├── cluster_id (integer, FK)
└── created_at (timestamp)
edges:
├── id (integer, PK)
├── source_id (integer, FK to articles)
├── target_id (integer, FK to articles)
├── weight (integer, 1 or 2)
└── is_bidirectional (boolean)
clusters:
├── id (integer, PK)
├── name (text)
├── member_count (integer)
├── centroid_x, centroid_y, centroid_z (float)
└── color (text, hex code)
Indexes
- articles.id: B-tree primary key index for fast single-article lookup
- articles.title: GIN index for full-text search
- articles.embedding: IVFFlat or HNSW index for vector similarity search
- edges(source_id, target_id): Composite index for edge queries
- articles.cluster_id: B-tree index for cluster membership queries
Vector Extension (pgvector)
PostgreSQL extension enabling vector storage and similarity search:
CREATE EXTENSION vector;
ALTER TABLE articles ADD COLUMN embedding vector(1536);
CREATE INDEX ON articles USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Similarity query:
SELECT * FROM articles ORDER BY embedding <=> query_vector LIMIT 20;
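At the application level, the same query can be expressed through SQLAlchemy using the comparator helpers from the pgvector Python package; this sketch assumes the hypothetical Article model and embed_texts helper sketched earlier:

# Hypothetical semantic search: embed the query, then order by cosine distance.
from sqlalchemy.orm import Session

from app.ai.embeddings import embed_texts   # assumed helper (see AI layer)
from app.models import Article              # assumed ORM model

def semantic_search(db: Session, query: str, limit: int = 20):
    query_vector = embed_texts([query])[0]
    distance = Article.embedding.cosine_distance(query_vector)  # pgvector <=> operator
    rows = (
        db.query(Article, distance.label("distance"))
        .order_by(distance)
        .limit(limit)
        .all()
    )
    # Convert cosine distance (0 = identical) into a similarity score for the API.
    return [
        {"article_id": a.id, "title": a.title, "similarity": 1.0 - d}
        for a, d in rows
    ]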
Data Pipeline
Data processing occurs offline in a separate pipeline module, not in the web application runtime:
Pipeline Stages:
1. Data Extraction
   - Fetch Wikipedia articles via API
   - Parse article text and metadata
   - Extract inter-article links
2. Embedding Generation
   - Batch articles into groups of 2048
   - Call OpenAI API for embeddings
   - Store embeddings in database
3. Dimensionality Reduction
   - Load all embeddings from database
   - Run UMAP to generate 3D coordinates
   - Update article x, y, z columns
4. Clustering
   - Run HDBSCAN on 3D coordinates
   - Assign cluster IDs to articles
   - Generate cluster names via GPT-4
5. Edge Construction
   - Process Wikipedia links into edges
   - Detect bidirectional connections
   - Calculate edge weights
   - Insert into edges table
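Stages 3 and 4 reduce to a few library calls in practice; a sketch with assumed, untuned parameters:

# Hypothetical core of pipeline stages 3-4: UMAP to 3D, then HDBSCAN clustering.
import hdbscan
import numpy as np
import umap

def reduce_and_cluster(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Stage 3: project 1536-dimensional embeddings onto 3D coordinates.
    coords = umap.UMAP(n_components=3, metric="cosine", random_state=42).fit_transform(embeddings)
    # Stage 4: density-based clustering on the 3D layout; label -1 marks noise points.
    labels = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(coords)
    return coords, labels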
Pipeline Execution
The pipeline is executed via a command-line interface:
$ python -m pipeline extract --count 1000
$ python -m pipeline embed
$ python -m pipeline reduce
$ python -m pipeline cluster
$ python -m pipeline edges
Each stage can be run independently, enabling iterative development and debugging. Intermediate results are stored in the database, allowing the pipeline to resume if interrupted.
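The python -m pipeline entry point could be a small argparse dispatcher like this sketch; the pipeline.stages module and its function names are assumed:

# Hypothetical pipeline/__main__.py: dispatches `python -m pipeline <stage>`.
import argparse

from pipeline import stages   # assumed module exposing one function per stage

def main() -> None:
    parser = argparse.ArgumentParser(prog="pipeline")
    subparsers = parser.add_subparsers(dest="stage", required=True)

    extract = subparsers.add_parser("extract", help="fetch Wikipedia articles")
    extract.add_argument("--count", type=int, default=1000)
    for name in ("embed", "reduce", "cluster", "edges"):
        subparsers.add_parser(name)

    args = parser.parse_args()
    if args.stage == "extract":
        stages.extract(count=args.count)
    else:
        getattr(stages, args.stage)()   # e.g. stages.embed(), stages.cluster()

if __name__ == "__main__":
    main()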
Deployment Architecture
Production Environment
Frontend Deployment
Next.js application deployed as static export or Node.js server. CDN (Cloudflare, Vercel) serves static assets (JS bundles, CSS) with edge caching. WebGL rendering occurs client-side, so no server GPU required.
Backend Deployment
FastAPI application runs as ASGI server (Uvicorn or Gunicorn with Uvicorn workers). Deployed on cloud platform (AWS, GCP, Render) behind HTTPS load balancer. Horizontal scaling adds more application server instances.
Database Deployment
PostgreSQL with pgvector extension. Managed service (RDS, Cloud SQL, Supabase) handles backups, replication, and maintenance. Vector indexes require sufficient memory for good performance.
Scaling Considerations
- Frontend: Scales horizontally with ease; multiple CDN edge nodes serve static content
- Backend API: Stateless design enables horizontal scaling with load balancer
- Database reads: Can use read replicas for scaled read throughput
- Vector search: May require database vertical scaling (more RAM) for large corpora
- OpenAI API: Rate limits and costs scale with usage; consider caching and batching