AI Chatbot in Next.js App Router: Context-First Approach

The Problem With Generic AI Chatbots

Most portfolio chatbots are useless. You ask "what projects has Fahd worked on?" and it either hallucinates or says "I don't have that information." The reason is obvious in hindsight: the model was given no context about the person it's supposed to represent. This week I shipped two AI features for this site: a chat widget that answers questions about my work, and a project estimator that helps clients ballpark scope. Both follow the same pattern: static context data fed into a tight system prompt. No vector database, no RAG pipeline, no embeddings. Just a well-structured data file and a lean API route.

Here's exactly how it works.

Build the Context Data File First

The first thing I built was data/ai-context.ts. Not the API route, not the component — the data. If the model doesn't know what to say, nothing else matters.

The file exports a single getContextString() function that assembles a compact, text-friendly summary of everything the assistant needs: bio, skills, projects with descriptions, contact info. The key constraint is token budget — you're paying per token and you have a context window to respect, so be ruthless about what you include.

// data/ai-context.ts
export function getContextString(): string {
  return `
You are Fahd's portfolio assistant. Answer questions about Fahd Gamad, a full-stack MERN
engineer based in Cairo, Egypt. Use only the information below — do not invent details.

## About
Full-stack engineer specializing in MongoDB, Express, React, and Node.js.
Currently open to senior roles and consulting work.

## Projects
- OrderX: Restaurant management SaaS. Real-time orders, multi-tenant architecture,
  role-based access. Stack: React, Node, MongoDB, Socket.io.
- Oppozite Wears: E-commerce for a Cairo streetwear brand. Custom storefront,
  Stripe payments, inventory management.
- FastUp: Delivery tracking platform with live driver updates and SMS notifications.

## Contact
Email: fahdscode@gmail.com | Available for freelance and full-time.
`.trim()
}

Notice what's missing: metrics I haven't confirmed, client names that are confidential, anything I'd be embarrassed to have the model say out loud. The "use only the information below" instruction in the system prompt does most of the grounding work. The model is not magic: an explicit instruction to stay in bounds makes it far more likely to decline than to guess, but it is not a guarantee, so spot-check it with questions the context can't answer.
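If the context grows, it can help to keep the facts as typed data and render the prompt text from it, with a rough size check so the token budget stays visible. A minimal sketch, assuming a hypothetical Project type and renderProjects / estimateTokens helpers that are not part of the original file. The four-characters-per-token ratio is only a rule of thumb for English text, not the model's real tokenizer:

```typescript
// Hypothetical shape for one project entry; not from the original file.
interface Project {
  name: string
  summary: string
  stack?: string[]
}

// Render projects as compact bullet lines for the system prompt.
function renderProjects(projects: Project[]): string {
  return projects
    .map((p) => {
      const stack = p.stack && p.stack.length > 0 ? ` Stack: ${p.stack.join(', ')}.` : ''
      return `- ${p.name}: ${p.summary}${stack}`
    })
    .join('\n')
}

// Rough estimate only: about 4 characters per token for English prose.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

const projects: Project[] = [
  { name: 'OrderX', summary: 'Restaurant management SaaS.', stack: ['React', 'Node', 'MongoDB', 'Socket.io'] },
  { name: 'FastUp', summary: 'Delivery tracking platform with live driver updates and SMS notifications.' },
]

const projectSection = renderProjects(projects)
const approxTokens = estimateTokens(projectSection)
```

Logging approxTokens during development is a cheap way to notice when the context has outgrown the budget you had in mind.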

The API Route

With the context in place, the route handler is almost boring:

// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk'
import { getContextString } from '@/data/ai-context'

// Reads ANTHROPIC_API_KEY from the environment by default
const client = new Anthropic()

export async function POST(request: Request) {
  const { messages } = await request.json()

  if (!Array.isArray(messages) || messages.length === 0) {
    return new Response('Invalid request', { status: 400 })
  }

  const stream = await client.messages.stream({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 512,
    system: getContextString(),
    messages,
  })

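  // toReadableStream() emits newline-delimited JSON events, not SSE frames,
  // so label the body accordingly.
  return new Response(stream.toReadableStream(), {
    headers: { 'Content-Type': 'application/x-ndjson' },
  })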
}

A few deliberate choices here.

Haiku, not Sonnet. For a portfolio chatbot, Haiku is fast enough and cheap enough that I don't have to think about rate limits or cost. Sonnet is overkill when your context fits in 600 tokens and your answers are two paragraphs.

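max_tokens: 512. I want concise answers, not essays. Note that max_tokens is a hard ceiling, not a style hint: the model doesn't plan around it, and a reply that hits the limit is simply cut off. The cap bounds the cost of each reply; for brevity itself, ask for short answers in the system prompt.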

Stream from day one. Even short responses feel sluggish if you wait for the full completion before rendering anything. Streaming makes the chat feel alive. The Anthropic SDK's .toReadableStream() converts the message stream into a Web Streams ReadableStream, which Next.js Route Handlers accept natively, so no wrapper is needed on the server. The body it produces is newline-delimited JSON, one SDK event per line, so the client still has to extract the text from those events.

Messages come from the client. The route receives the full message history on every request. This is stateless — no session storage, no database. For a portfolio assistant handling casual visitor questions, that's the right call. If I needed persistent history across sessions I'd reach for Upstash, but that complexity isn't justified here.

The Project Estimator Variant

The estimator uses the same route pattern with a different framing. Instead of a conversational assistant, it takes structured form input — project type, rough scope, timeline — and returns a tiered estimate.

The system prompt changes entirely:

const ESTIMATOR_SYSTEM = `
You are a senior full-stack engineer estimating project scope.
Given the inputs below, return three tiers: Lean MVP, Full Build, and Premium.
For each tier: estimated weeks, ballpark cost range in USD, and 3 bullet points
on what's included vs. deferred. Be direct. Do not hedge.
`.trim()

Same API client, same model, completely different behavior — because the system prompt drives everything. This is the core insight: your system prompt is your product. The Claude API is infrastructure. What you do with the system prompt is the engineering.

For the estimator I also skip streaming. The user fills out a form and waits — a different UX contract than a chat window. A single client.messages.create() call is simpler and the latency is acceptable for a one-shot structured response.
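To make the form-to-prompt step concrete, here is a minimal sketch. The EstimatorInput fields, buildEstimatorMessage, estimate, and the narrow MessagesClient interface are all assumptions for illustration (the post only names project type, rough scope, and timeline). The interface mirrors just the slice of client.messages.create() used here, so the example runs against a stub; in the real route the SDK client would take its place:

```typescript
// Hypothetical form shape; the post only names project type, scope, and timeline.
interface EstimatorInput {
  projectType: string
  scope: string
  timeline: string
}

// Narrow stand-in for the part of the Anthropic client this sketch uses.
interface MessagesClient {
  messages: {
    create(params: {
      model: string
      max_tokens: number
      system: string
      messages: { role: 'user' | 'assistant'; content: string }[]
    }): Promise<{ content: { type: string; text?: string }[] }>
  }
}

// Turn the structured form input into a single user message.
function buildEstimatorMessage(input: EstimatorInput): string {
  return [
    `Project type: ${input.projectType}`,
    `Scope: ${input.scope}`,
    `Timeline: ${input.timeline}`,
  ].join('\n')
}

// One-shot, non-streaming call; joins the text blocks of the reply.
async function estimate(
  client: MessagesClient,
  system: string,
  input: EstimatorInput
): Promise<string> {
  const reply = await client.messages.create({
    model: 'claude-haiku-4-5-20251001',
    max_tokens: 1024, // assumed value; three tiers need more room than a chat answer
    system,
    messages: [{ role: 'user', content: buildEstimatorMessage(input) }],
  })
  return reply.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text ?? '')
    .join('')
}
```

Keeping the message builder as a pure function means the prompt format can be unit-tested without touching the API.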

Consuming the Stream on the Client

On the client side, the chat widget manages the message array in local state and POSTs to /api/chat on each submit. Consuming a streaming response is a ReadableStream + TextDecoder:

const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages }),
})

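const reader = response.body!.getReader()
const decoder = new TextDecoder()

let buffer = ''
let assistantMessage = ''
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  buffer += decoder.decode(value, { stream: true })

  // The body is newline-delimited JSON: one SDK event per line.
  // Hold back the last (possibly partial) line until the next chunk.
  const lines = buffer.split('\n')
  buffer = lines.pop() ?? ''
  for (const line of lines) {
    if (!line.trim()) continue
    const event = JSON.parse(line)
    // Only text deltas carry reply text; other events are bookkeeping.
    if (event.type === 'content_block_delta' && event.delta?.type === 'text_delta') {
      assistantMessage += event.delta.text
    }
  }
  setCurrentReply(assistantMessage)
}

The setCurrentReply call on every chunk produces the typewriter effect: no animation library, just React state updating as the stream arrives. One gotcha: stream.toReadableStream() does not hand the browser plain text. It emits newline-delimited JSON, one SDK event per line, which is why the loop buffers partial lines and appends only the text from content_block_delta events. That is still less work than calling Anthropic's raw HTTP endpoint, whose SSE framing (event: and data: lines with JSON payloads) you'd have to parse by hand. Use the SDK.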

What I'd Change at Scale

This setup works well for a low-traffic portfolio with a narrow, bounded use case. If I were wiring this into a product — a customer support bot for OrderX or a scoping tool inside a client-facing SaaS — I'd make a few changes:

Rate limiting. Right now anyone can POST to /api/chat and rack up API charges. On a personal site, obscurity is the protection. In a product, you add a token bucket at the edge. Upstash Ratelimit (the @upstash/ratelimit package) is a 10-minute integration with Next.js middleware.
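For readers who haven't met the term, a token bucket fits in a few lines. This is an in-memory sketch to show the mechanics, not Upstash's API: createTokenBucket and its parameters are hypothetical names, and it assumes a single long-lived process. Serverless or multi-instance deployments need shared storage such as Redis, which is the gap a hosted limiter fills.

```typescript
// Each key (for example a client IP) gets a bucket that holds up to
// `capacity` tokens and refills at `refillPerSecond`. One request costs one token.
function createTokenBucket(capacity: number, refillPerSecond: number) {
  const buckets = new Map<string, { tokens: number; last: number }>()

  // Returns true if the request is allowed. `now` is injectable for testing.
  return function take(key: string, now: number = Date.now()): boolean {
    const bucket = buckets.get(key) ?? { tokens: capacity, last: now }

    // Refill based on the time elapsed since the last request, capped at capacity.
    const elapsedSeconds = Math.max(0, now - bucket.last) / 1000
    bucket.tokens = Math.min(capacity, bucket.tokens + elapsedSeconds * refillPerSecond)
    bucket.last = now

    const allowed = bucket.tokens >= 1
    if (allowed) bucket.tokens -= 1
    buckets.set(key, bucket)
    return allowed
  }
}

// Usage: allow a burst of 5 requests, then 1 request every 2 seconds per key.
const allowRequest = createTokenBucket(5, 0.5)
```

In a route handler, a false result from allowRequest would map to a 429 response before the model is ever called.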

Input validation. I validate that messages is a non-empty array, but I don't check each message's shape, cap message length, or cap history depth. A malicious user can send a 100-message history to inflate token usage, or forge assistant turns to steer the model. In production, validate each message and truncate history to the last N turns before passing it to the API.
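A sketch of what that validation could look like. The sanitizeMessages name and the MAX_TURNS / MAX_CHARS limits are hypothetical, chosen for illustration. It also assumes the conversation should start with a user turn, which is what the Messages API expects as far as I know:

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string }

// Hypothetical limits; tune them to your own token budget.
const MAX_TURNS = 10
const MAX_CHARS = 2000

// Returns a cleaned, truncated history, or null if the payload is unusable.
function sanitizeMessages(input: unknown): ChatMessage[] | null {
  if (!Array.isArray(input)) return null

  const cleaned: ChatMessage[] = []
  for (const item of input) {
    if (typeof item !== 'object' || item === null) return null
    const { role, content } = item as { role?: unknown; content?: unknown }
    if (role !== 'user' && role !== 'assistant') return null
    if (typeof content !== 'string' || content.length === 0) return null
    // Cap each message so a single huge paste can't blow the budget.
    cleaned.push({ role: role as ChatMessage['role'], content: content.slice(0, MAX_CHARS) })
  }

  // Keep only the most recent turns, then drop leading assistant messages
  // so the truncated history still starts with a user turn.
  const recent = cleaned.slice(-MAX_TURNS)
  while (recent.length > 0 && recent[0].role !== 'user') recent.shift()
  return recent.length > 0 ? recent : null
}
```

The route would call this right after parsing the body and return a 400 when it gets null.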

Context retrieval. If the knowledge base grows — multiple products, long documentation, team members — you'd swap the static string for a retrieval step. Embed the query, pull the top-k chunks, inject them. But that's a RAG pipeline, and a RAG pipeline is infrastructure. Start with a static file; add retrieval when the context genuinely doesn't fit.
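For a sense of what the retrieval step involves, here is a minimal sketch of the "pull the top-k chunks" part. It assumes the embedding vectors already exist; producing them needs an embedding model or service, which is out of scope here. The Chunk type, cosineSimilarity, and topK are hypothetical names, and the two-dimensional vectors in the usage lines are toy values:

```typescript
// A piece of documentation with a precomputed embedding vector.
interface Chunk {
  text: string
  vector: number[]
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  const length = Math.min(a.length, b.length)
  for (let i = 0; i < length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every chunk against the query and keep the k best matches.
function topK(queryVector: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk)
}

// Toy usage: the selected texts would be joined and injected into the system prompt.
const knowledge: Chunk[] = [
  { text: 'OrderX pricing', vector: [1, 0] },
  { text: 'FastUp driver app', vector: [0, 1] },
  { text: 'OrderX onboarding', vector: [0.9, 0.1] },
]
const selected = topK([1, 0], knowledge, 2).map((chunk) => chunk.text)
```

A linear scan like this is fine for a few hundred chunks; a vector index only becomes necessary well beyond that.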

Takeaway

Build your context data file first, write a 20-line route handler, stream the response back. The AI capability is table stakes: the SDK makes calling the API nearly trivial. The engineering is in what you tell the model, what you leave out, and how you structure the context so it has as little room as possible to make things up.

Both the chat widget and the estimator shipped in a single afternoon. The time wasn't spent on AI integration. It was spent writing the context data well enough that the model's answers were actually true.