System Architecture
Architectural Overview
The system is structured as a modern three-tier web application with a clear separation between presentation, business logic, and data layers. The architecture prioritizes scalability, maintainability, and performance for handling large-scale knowledge graph visualization.
System Components:
┌─────────────────────┐
│ Frontend (Next.js)  │
│ - React Three Fiber │
│ - TypeScript        │
└──────────┬──────────┘
           │
           │ HTTP/REST API
           │
┌──────────▼──────────┐
│ Backend (FastAPI)   │
│ - Python            │
│ - AI Layer          │
└──────────┬──────────┘
           │
           │ SQL + pgvector
           │
┌──────────▼──────────┐
│ PostgreSQL          │
│ - pgvector ext      │
│ - Full-text search  │
└─────────────────────┘
Design Principles
- Separation of concerns: Clear boundaries between data processing, business logic, and presentation
- API-first design: Backend exposes RESTful API consumed by frontend, enabling future client diversity
- Stateless services: Backend API is stateless, allowing horizontal scaling
- Offline data generation: Expensive computations (embeddings, UMAP, clustering) performed offline in pipeline
- Client-side rendering: 3D visualization rendered on client GPU for performance and responsiveness
Frontend Architecture
Technology Stack
Next.js 14 (App Router)
React metaframework providing server-side rendering, file-based routing, API routes, and optimized production builds. The App Router paradigm enables React Server Components for improved performance.
React Three Fiber
React renderer for Three.js, enabling declarative 3D scene construction with React component patterns. Handles WebGL context management, frame loop orchestration, and scene graph updates.
Three.js
WebGL abstraction library providing camera controls, geometry primitives, materials, lighting, and rendering pipeline. Handles GPU-accelerated rendering of thousands of nodes and edges.
TypeScript
Statically typed superset of JavaScript providing compile-time type checking, enhanced IDE support, and improved code maintainability for large codebases.
Component Architecture
The frontend is organized into a hierarchical component structure:
Component Hierarchy:
App
├── Layout (navigation, metadata)
├── Page (route-specific content)
│   ├── Scene (3D visualization container)
│   │   ├── ArticleNode (individual node rendering)
│   │   ├── Edge (connection line rendering)
│   │   └── GraphControls (camera, interaction)
│   ├── SearchPanel (query interface)
│   └── DetailPanel (article information)
└── Documentation Pages (static content)
Scene Component
Root 3D component managing WebGL canvas, camera setup, and coordinate system. Orchestrates rendering of all nodes and edges, handles user interaction events, and coordinates with UI panels for selection and search highlighting.
ArticleNode Component
Renders each article as an instanced sphere. Handles hover states, click interactions, and visual encoding (color, size). Uses GPU instancing for efficient rendering of thousands of nodes.
Edge Component
Renders connections between articles as line segments. Uses BufferGeometry for efficient GPU-based line rendering. Color and width encode edge properties (direction, weight).
State Management
Application state is managed through React hooks and context:
useState for Local Component State
UI state such as panel visibility, hover targets, and input values is managed locally within components.
Context API for Shared State
Global state such as the selected article, search results, and graph data is shared across components via React Context, which avoids prop drilling while maintaining React's unidirectional data flow.
No External State Library
The application's complexity does not warrant Redux or a similar library; React's built-in state management provides sufficient control and predictability.
Data Loading Strategy
Graph data is fetched from the backend API on application load:
Load sequence:
1. Component mount triggers useEffect hook
2. Fetch all articles: GET /api/articles
3. Fetch all edges: GET /api/graph/edges
4. Fetch cluster metadata: GET /api/clusters
5. Parse JSON responses into typed interfaces
6. Render scene with loaded data
Data is cached in component state after the initial load. There are no real-time updates or polling; the data is treated as static for the duration of a visualization session.
Performance Optimizations
- GPU instancing: Render thousands of identical geometries (spheres) in single draw call
- Frustum culling: Three.js automatically culls objects outside camera view
- Level of detail: Can reduce geometry complexity for distant objects (not currently implemented)
- BufferGeometry: Use typed arrays for geometry data, avoiding JavaScript object overhead
- React.memo: Memoize components to prevent unnecessary re-renders
Backend Architecture
Technology Stack
FastAPI
Modern Python web framework built on Starlette and Pydantic. Provides automatic OpenAPI documentation, request validation, dependency injection, and high performance (comparable to Node.js and Go).
SQLAlchemy
Python SQL toolkit and ORM providing database abstraction, query construction, and connection pooling. Enables database-agnostic code and migration management.
Pydantic
Data validation library using Python type annotations. Provides automatic request/response validation, serialization, and API schema generation.
API Structure
The API is organized into logical route modules:
Route Organization:
/api
├── /articles
│   ├── GET / (list all articles)
│   ├── GET /:id (get single article)
│   └── GET /:id/connections (get article edges)
├── /clusters
│   ├── GET / (list all clusters)
│   └── GET /:id (get cluster details)
├── /graph
│   ├── GET /edges (all graph edges)
│   └── GET /stats (graph statistics)
└── /ai
    ├── POST /search/semantic (semantic search)
    ├── POST /search/text (keyword search)
    └── POST /journey (topic journey generation)
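A minimal sketch of how one of these route modules might be written with FastAPI's APIRouter; the ArticleOut schema, the Article ORM model, and the get_db dependency are illustrative names, not the project's actual code:

# Hypothetical sketch of the /api/articles route module.
# ArticleOut, Article, and get_db are assumed names for illustration.
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from sqlalchemy.orm import Session

from app.db import get_db        # assumed session dependency (sketched later)
from app.models import Article   # assumed SQLAlchemy ORM model

router = APIRouter(prefix="/api/articles", tags=["articles"])

class ArticleOut(BaseModel):
    id: int
    title: str
    url: str
    x: float
    y: float
    z: float
    cluster_id: int | None = None

    model_config = {"from_attributes": True}  # build responses from ORM rows (Pydantic v2)

@router.get("/", response_model=list[ArticleOut])
def list_articles(db: Session = Depends(get_db)):
    return db.query(Article).all()

@router.get("/{article_id}", response_model=ArticleOut)
def get_article(article_id: int, db: Session = Depends(get_db)):
    article = db.get(Article, article_id)
    if article is None:
        raise HTTPException(status_code=404, detail="Article not found")
    return article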
Request/Response Flow
Typical request processing follows this pattern:
1. HTTP request arrives at FastAPI application
2. Route handler matched based on path and method
3. Pydantic validates request parameters/body
4. Dependencies injected (database session, config)
5. Business logic executes (database queries, AI calls)
6. Response model constructed from results
7. Pydantic serializes response to JSON
8. FastAPI sends HTTP response with status code
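As a concrete, hypothetical instance of this flow, a handler for POST /api/ai/search/semantic could look like the sketch below; the request/response models and the semantic_search helper are assumed names:

# Hypothetical handler illustrating steps 3-7 for POST /api/ai/search/semantic.
from fastapi import APIRouter, Depends
from pydantic import BaseModel, Field
from sqlalchemy.orm import Session

from app.db import get_db
from app.ai.semantic_search import semantic_search  # assumed helper function

router = APIRouter(prefix="/api/ai", tags=["ai"])

class SearchRequest(BaseModel):
    query: str = Field(min_length=1, max_length=500)   # step 3: request validation
    limit: int = Field(default=20, ge=1, le=100)

class SearchResult(BaseModel):
    article_id: int
    title: str
    similarity: float

@router.post("/search/semantic", response_model=list[SearchResult])
def search_semantic(body: SearchRequest, db: Session = Depends(get_db)):
    # steps 4-5: injected session, business logic delegated to the AI layer
    return semantic_search(db, body.query, limit=body.limit)
    # steps 6-8: FastAPI validates and serializes the result, then sends the response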
Database Layer
Connection Pooling
The SQLAlchemy connection pool maintains a set of open database connections and reuses them across requests, avoiding per-request connection overhead (TCP handshake, authentication).
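A sketch of how the pooled engine and per-request session dependency might be wired up; the connection string, module path, and pool sizes are placeholder assumptions:

# Hypothetical app/db.py: pooled engine plus a per-request session dependency.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DATABASE_URL = "postgresql+psycopg2://user:password@localhost:5432/knowledge_graph"  # assumed

# The pool keeps idle connections open and hands them out per request,
# avoiding a TCP handshake and authentication round-trip each time.
engine = create_engine(DATABASE_URL, pool_size=10, max_overflow=20, pool_pre_ping=True)
SessionLocal = sessionmaker(bind=engine, autoflush=False)

def get_db():
    db = SessionLocal()
    try:
        yield db      # injected into route handlers via Depends(get_db)
    finally:
        db.close()    # returns the connection to the pool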
ORM Models
Python classes map to database tables: Article, Edge, Cluster, Region. ORM handles query construction, result mapping, and relationship loading.
Query Optimization
Eager loading (joinedload) is used for frequently accessed relationships. Indexes cover commonly queried columns (id, cluster_id), and a vector index (IVFFlat or HNSW) accelerates embedding similarity queries.
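A sketch of the Article model and an eager-loading query, assuming SQLAlchemy 2.0-style declarations and the Vector type from the pgvector Python package; the exact fields here mirror the schema described later but are not the project's real model definitions:

# Hypothetical ORM models and an eager-loading query.
from typing import Optional

from pgvector.sqlalchemy import Vector
from sqlalchemy import ForeignKey, Text
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, joinedload

class Base(DeclarativeBase):
    pass

class Cluster(Base):
    __tablename__ = "clusters"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(Text)

class Article(Base):
    __tablename__ = "articles"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(Text, index=True)
    embedding = mapped_column(Vector(1536))    # pgvector column
    cluster_id: Mapped[Optional[int]] = mapped_column(ForeignKey("clusters.id"), index=True)
    cluster: Mapped[Optional["Cluster"]] = relationship()

def articles_with_clusters(db):
    # joinedload emits a single JOINed query instead of one lazy load per article
    return db.query(Article).options(joinedload(Article.cluster)).all()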
Error Handling
The API uses HTTP status codes and structured error responses:
Status Codes:
200 OK - Successful request
400 Bad Request - Invalid input (Pydantic validation error)
404 Not Found - Resource doesn't exist
500 Internal Server Error - Unexpected error
Error Response Format:
{
  "detail": "Error message",
  "error_type": "ValidationError"
}
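FastAPI's default behavior is to return 422 for validation errors with a plain {"detail": ...} body, so the 400 status and error_type field above imply custom exception handlers roughly along these lines (a sketch, not the project's actual handlers):

# Hypothetical exception handlers producing the error shape shown above.
from fastapi import FastAPI, Request
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse

app = FastAPI()

@app.exception_handler(RequestValidationError)
async def validation_error_handler(request: Request, exc: RequestValidationError):
    # Return 400 with a structured body instead of FastAPI's default 422 response.
    return JSONResponse(
        status_code=400,
        content={"detail": str(exc), "error_type": "ValidationError"},
    )

@app.exception_handler(Exception)
async def unhandled_error_handler(request: Request, exc: Exception):
    return JSONResponse(
        status_code=500,
        content={"detail": "Internal server error", "error_type": type(exc).__name__},
    )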
AI Layer
Integration Architecture
AI functionality is encapsulated in a dedicated module within the backend, isolating external API dependencies and enabling easier testing and model swapping:
AI Module Structure:
app/ai/
├── embeddings.py (OpenAI embedding generation)
├── semantic_search.py (vector similarity search)
├── llm.py (GPT-4 text generation)
├── cluster_naming.py (cluster name generation)
├── region_summary.py (region description generation)
└── journey.py (topic journey path finding)
OpenAI API Integration
Embedding Generation
Uses the text-embedding-ada-002 model via the OpenAI Python SDK. Handles batching (up to 2048 texts per request), retry logic, and rate limiting. Embeddings are cached in the database to avoid regeneration.
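A sketch of what batched embedding generation could look like with the OpenAI Python SDK (v1-style client); the helper name is an assumption:

# Hypothetical batched embedding generation.
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment
BATCH_SIZE = 2048   # API limit on inputs per embeddings request

def embed_texts(texts: list[str]) -> list[list[float]]:
    vectors: list[list[float]] = []
    for start in range(0, len(texts), BATCH_SIZE):
        batch = texts[start:start + BATCH_SIZE]
        response = client.embeddings.create(
            model="text-embedding-ada-002",
            input=batch,
        )
        # The API returns one 1536-dimensional vector per input, in order.
        vectors.extend(item.embedding for item in response.data)
    return vectors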
Text Generation
Uses GPT-4 or GPT-3.5-turbo for cluster naming and region summaries. Prompt engineering ensures consistent output format. Temperature=0.7 for creative but coherent names. Max tokens limited to control costs.
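Cluster naming might then be a single chat-completion call along these lines; the prompt wording and function name are illustrative only:

# Hypothetical cluster-naming call.
from openai import OpenAI

client = OpenAI()

def name_cluster(sample_titles: list[str]) -> str:
    prompt = (
        "Suggest a short, descriptive name (2-4 words) for a cluster of "
        "Wikipedia articles with these titles:\n" + "\n".join(sample_titles)
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,   # creative but coherent names
        max_tokens=20,     # a name only, keeping cost bounded
    )
    return response.choices[0].message.content.strip()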
Error Handling
Rate limit errors are retried with exponential backoff. Cached or default values serve as a fallback if the API is unavailable. All API calls are logged for debugging and cost tracking.
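The retry behavior could be a small wrapper such as the sketch below; a production implementation might instead use a library like tenacity:

# Hypothetical exponential-backoff wrapper for rate-limited OpenAI calls.
import logging
import time

from openai import RateLimitError

logger = logging.getLogger(__name__)

def with_retries(call, max_attempts: int = 5):
    """Invoke call() and retry on rate limits with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            delay = 2 ** attempt   # 1s, 2s, 4s, 8s, ...
            logger.warning("Rate limited; retrying in %ss", delay)
            time.sleep(delay)
    raise RuntimeError("OpenAI API still rate limited after retries")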
Asynchronous Processing
Some AI operations are expensive and performed asynchronously:
- Embedding generation: Performed offline in data pipeline, not real-time during requests
- Cluster naming: Generated after clustering completes, cached in database
- Search queries: Executed synchronously but optimized with vector indexes for sub-second response
Database Schema
Core Tables
articles:
├── id (integer, PK)
├── title (text, indexed)
├── content (text)
├── url (text)
├── embedding (vector(1536), indexed)
├── x, y, z (float, 3D coordinates)
├── cluster_id (integer, FK)
└── created_at (timestamp)
edges:
├── id (integer, PK)
├── source_id (integer, FK to articles)
├── target_id (integer, FK to articles)
├── weight (integer, 1 or 2)
└── is_bidirectional (boolean)
clusters:
├── id (integer, PK)
├── name (text)
├── member_count (integer)
├── centroid_x, centroid_y, centroid_z (float)
└── color (text, hex code)
Indexes
- articles.id: B-tree primary key index for fast single-article lookup
- articles.title: GIN index for full-text search
- articles.embedding: IVFFlat or HNSW index for vector similarity search
- edges(source_id, target_id): Composite index for edge queries
- articles.cluster_id: B-tree index for cluster membership queries
Vector Extension (pgvector)
PostgreSQL extension enabling vector storage and similarity search:
CREATE EXTENSION vector;
ALTER TABLE articles ADD COLUMN embedding vector(1536);
CREATE INDEX ON articles USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Similarity query:
SELECT * FROM articles ORDER BY embedding <=> query_vector LIMIT 20;
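At the application level, the same query can be expressed through SQLAlchemy using the comparator helpers from the pgvector Python package; this sketch assumes the hypothetical Article model and embed_texts helper sketched earlier:

# Hypothetical semantic search: embed the query, then order by cosine distance.
from sqlalchemy.orm import Session

from app.ai.embeddings import embed_texts   # assumed helper (see AI layer)
from app.models import Article              # assumed ORM model

def semantic_search(db: Session, query: str, limit: int = 20):
    query_vector = embed_texts([query])[0]
    distance = Article.embedding.cosine_distance(query_vector)  # pgvector <=> operator
    rows = (
        db.query(Article, distance.label("distance"))
        .order_by(distance)
        .limit(limit)
        .all()
    )
    # Convert cosine distance (0 = identical) into a similarity score for the API.
    return [
        {"article_id": a.id, "title": a.title, "similarity": 1.0 - d}
        for a, d in rows
    ]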
Data Pipeline
Data processing occurs offline in a separate pipeline module, not in the web application runtime:
Pipeline Stages:
1. Data Extraction
   - Fetch Wikipedia articles via API
   - Parse article text and metadata
   - Extract inter-article links
2. Embedding Generation
   - Batch articles into groups of 2048
   - Call OpenAI API for embeddings
   - Store embeddings in database
3. Dimensionality Reduction
   - Load all embeddings from database
   - Run UMAP to generate 3D coordinates
   - Update article x, y, z columns
4. Clustering
   - Run HDBSCAN on 3D coordinates
   - Assign cluster IDs to articles
   - Generate cluster names via GPT-4
5. Edge Construction
   - Process Wikipedia links into edges
   - Detect bidirectional connections
   - Calculate edge weights
   - Insert into edges table
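Stages 3 and 4 reduce to a few library calls in practice; a sketch with assumed, untuned parameters:

# Hypothetical core of pipeline stages 3-4: UMAP to 3D, then HDBSCAN clustering.
import hdbscan
import numpy as np
import umap

def reduce_and_cluster(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Stage 3: project 1536-dimensional embeddings onto 3D coordinates.
    coords = umap.UMAP(n_components=3, metric="cosine", random_state=42).fit_transform(embeddings)
    # Stage 4: density-based clustering on the 3D layout; label -1 marks noise points.
    labels = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(coords)
    return coords, labels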
Pipeline Execution
The pipeline is executed via a command-line interface:
$ python -m pipeline extract --count 1000
$ python -m pipeline embed
$ python -m pipeline reduce
$ python -m pipeline cluster
$ python -m pipeline edges
Each stage can be run independently, enabling iterative development and debugging. Intermediate results are stored in the database, allowing the pipeline to resume if interrupted.
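The python -m pipeline entry point could be a small argparse dispatcher like this sketch; the pipeline.stages module and its function names are assumed:

# Hypothetical pipeline/__main__.py: dispatches `python -m pipeline <stage>`.
import argparse

from pipeline import stages   # assumed module exposing one function per stage

def main() -> None:
    parser = argparse.ArgumentParser(prog="pipeline")
    subparsers = parser.add_subparsers(dest="stage", required=True)

    extract = subparsers.add_parser("extract", help="fetch Wikipedia articles")
    extract.add_argument("--count", type=int, default=1000)
    for name in ("embed", "reduce", "cluster", "edges"):
        subparsers.add_parser(name)

    args = parser.parse_args()
    if args.stage == "extract":
        stages.extract(count=args.count)
    else:
        getattr(stages, args.stage)()   # e.g. stages.embed(), stages.cluster()

if __name__ == "__main__":
    main()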
Deployment Architecture
Production Environment
Frontend Deployment
Next.js application deployed as static export or Node.js server. CDN (Cloudflare, Vercel) serves static assets (JS bundles, CSS) with edge caching. WebGL rendering occurs client-side, so no server GPU required.
Backend Deployment
FastAPI application runs as ASGI server (Uvicorn or Gunicorn with Uvicorn workers). Deployed on cloud platform (AWS, GCP, Render) behind HTTPS load balancer. Horizontal scaling adds more application server instances.
Database Deployment
PostgreSQL with pgvector extension. Managed service (RDS, Cloud SQL, Supabase) handles backups, replication, and maintenance. Vector indexes require sufficient memory for good performance.
Scaling Considerations
- Frontend: Scales horizontally with ease; multiple CDN edge nodes serve static content
- Backend API: Stateless design enables horizontal scaling with load balancer
- Database reads: Can use read replicas for scaled read throughput
- Vector search: May require database vertical scaling (more RAM) for large corpora
- OpenAI API: Rate limits and costs scale with usage; consider caching and batching