Building AI Comic Studio

An AI-Powered Comic Creation Tool for Education

JavaScript Firebase Gemini OpenAI IndexedDB

The Problem

Teaching complex topics—like the Great Oxygenation Event in Earth science—can be challenging. Students often struggle to engage with abstract scientific concepts that happened billions of years ago. Visual storytelling can bridge this gap, but creating comics or graphic novels traditionally requires artistic skills many students don't have.

Existing AI image generators are either too expensive for classroom use, require technical expertise, or lack the guardrails needed for educational settings. Teachers need a tool that's simple enough for students to use independently, safe for the classroom, and produces results that actually look like comic panels.

The Goal

Create a web-based comic creation tool where students describe scenes in plain English and AI generates consistent, comic-styled panels—with built-in content moderation and no data leaving the browser unnecessarily.

AI Comic Studio evolved to solve this: a modular JavaScript application with Firebase Cloud Functions handling AI generation, IndexedDB for local persistence, and a carefully tuned content moderation pipeline to keep things classroom-appropriate.

Architecture Overview

The application follows a modular architecture with clear separation of concerns:

public/js/modules/
├── index.js           // App bootstrap & global interface
├── database.js        // IndexedDB operations
├── storage.js         // Scene data persistence
├── sceneCards.js      // Scene card UI components
├── imageGeneration.js // AI image request handling
├── ui.js              // Comic display & interactions
├── export.js          // PDF generation
├── layout.js          // Panel layout management
├── thumbnailNav.js    // Scene thumbnail navigation
└── settings.js        // User preferences

functions/
└── index.js          // Firebase Cloud Functions (V2)

Key Design Decisions

ES Modules — Clean imports/exports, no build step required for development
Firebase Cloud Functions V2 — Better cold start performance, simpler configuration
IndexedDB over localStorage — Handles larger data (base64 images) without 5MB limits
Pub/Sub for async processing — Decouples request handling from image generation

Multi-Model AI Integration

The system supports multiple AI image generation models, each with different strengths:

Gemini

Google's multimodal model. Default choice—good quality, fast generation.

Imagen 3

Google's dedicated image model. Higher quality, supports person generation controls.

GPT-Image-1

OpenAI's image model. Alternative option with different aesthetic.
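
The source does not show how a model is chosen when one fails, so the following is only a sketch of one possible fallback loop. Both `generateWithFallback` and the injected `generate` callback are hypothetical names; `generate` stands in for whatever call reaches each provider.

```javascript
// Hypothetical helper: try each model in order and return the first success.
// `generate(model)` is an injected async function that calls the provider.
async function generateWithFallback(modelOrder, generate) {
  const errors = [];
  for (const model of modelOrder) {
    try {
      return { model, image: await generate(model) };
    } catch (error) {
      // Remember why this model failed, then move on to the next one
      errors.push(`${model}: ${error.message}`);
    }
  }
  throw new Error(`All models failed (${errors.join('; ')})`);
}
```

Passing the provider call in as a callback keeps the ordering logic separate from any SDK details.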

Prompt Engineering for Comic Styles

Each scene combines user input with style-specific prompt engineering:

const comicStyleDescriptions = {
  graphic_novel: "a European graphic novel (inspired by Cyril Pedrosa) " +
                 "with precise linework, flat coloring, architectural details...",
  anime_manga:   "A scenic anime-style illustration inspired by Studio Ghibli " +
                 "and Makoto Shinkai, featuring vibrant natural colors...",
  cartoon:       "a simplified, exaggerated cartoon panel with bright, " +
                 "saturated colors akin to Adventure Time...",
  photorealistic: "a photorealistic comic panel with hyper-detailed " +
                  "textures, accurate colors, and complex natural lighting..."
};

Camera Angle Descriptions

Camera angles are translated into detailed composition guidance:

const cameraDescriptions = {
  'birds-eye': "Captured from a bird's eye view, this overhead shot " +
               "looks directly down, showcasing layout and spatial relationships...",
  'dutch-angle': "Captured from a dutch angle, a diagonal composition " +
                 "due to a tilted camera, introduces unease or tension...",
  'low-angle': "Captured from a low angle, this shot looks up from below, " +
               "emphasizing power or dominance..."
};
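
The source does not show how these fragments are joined with the student's text, so here is a minimal sketch of one way to assemble a prompt. `buildImagePrompt`, the wording "Create ..." and "Scene: ...", and the abbreviated `styleText` and `cameraText` maps are all assumptions made for illustration.

```javascript
// Abbreviated stand-ins for the style and camera maps shown above
const styleText = {
  cartoon: 'a simplified, exaggerated cartoon panel with bright, saturated colors',
};
const cameraText = {
  'low-angle': 'Captured from a low angle, this shot looks up from below',
};

// Hypothetical helper: one line per part (style, scene, camera).
// Unknown style or camera keys are skipped rather than treated as errors.
function buildImagePrompt({ description, style, camera }) {
  const parts = [];
  const styleDescription = styleText[style];
  if (styleDescription) parts.push(`Create ${styleDescription}.`);
  parts.push(`Scene: ${description.trim()}`);
  const cameraDescription = cameraText[camera];
  if (cameraDescription) parts.push(`${cameraDescription}.`);
  return parts.join('\n');
}
```

Keeping the student's description on its own line makes it easy to see in logs which part of the prompt came from the user.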

Async Image Generation with Pub/Sub

Image generation can take 10-30 seconds. Rather than blocking HTTP requests, the system uses Google Cloud Pub/Sub for asynchronous processing:

// 1. Client submits request → gets job ID immediately
//    POST /api/generate → { jobId: "uuid-..." }

// 2. Request published to Pub/Sub topic (inside the async HTTP handler)
await pubSubClient.topic('image-generation-requests').publishMessage({
  data: Buffer.from(JSON.stringify({ jobId, prompt, imageModel /* , ... */ }))
});

// 3. Cloud Function triggered by Pub/Sub processes the job
exports.processImageGenerationPubSub = onMessagePublished(TOPIC, async (event) => {
  // Generate image, upload to Cloud Storage, update Firestore
});

// 4. Client polls for status
//    GET /api/checkStatus?jobId=... → { status: 'completed', imageUrl: '...' }

Why Pub/Sub?

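HTTP Cloud Functions time out after 60 seconds by default, and holding a browser connection open for the whole generation is fragile. Moving the work into a Pub/Sub-triggered function gives it its own timeout, lets failed jobs be retried when retries are enabled, and decouples the request/response cycle from the actual processing.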

Client-Side Polling with Visual Feedback

While waiting, the client shows a pixelation effect that animates through the existing image:

const pixelationFactors = [1, 2, 3, 4, 6, 8, 12, 16, 24, 32];

function drawPixelatedOnCanvas(ctx, canvas, sourceImage, factor) {
  ctx.imageSmoothingEnabled = false;
  const scaledW = Math.floor(canvas.width / factor);
  const scaledH = Math.floor(canvas.height / factor);
  
  // Draw small, then scale up for pixelation effect
  ctx.drawImage(sourceImage, 0, 0, scaledW, scaledH);
  ctx.drawImage(canvas, 0, 0, scaledW, scaledH, 0, 0, canvas.width, canvas.height);
}
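
The polling side of this loop is not shown in the source. A minimal sketch, assuming the status endpoint from the flow above, could look like the following. `pollForImage`, the `'failed'` status, and the interval and attempt defaults are assumptions, not values taken from the project.

```javascript
// Hypothetical polling loop. `checkStatus` is injected so the same function
// works with fetch in the browser or with a stub in tests.
async function pollForImage(jobId, checkStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await checkStatus(jobId);
    if (job.status === 'completed') return job.imageUrl;
    // 'failed' is an assumed status value; the source only shows 'completed'
    if (job.status === 'failed') throw new Error(job.error || 'Image generation failed');
    // Wait before asking again
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} checks`);
}

// Browser usage (endpoint path taken from the flow above):
// const checkStatus = (id) =>
//   fetch(`/api/checkStatus?jobId=${encodeURIComponent(id)}`).then((r) => r.json());
// const imageUrl = await pollForImage(jobId, checkStatus);
```

Capping the number of attempts gives the UI a clear point at which to stop the pixelation animation and show an error.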

Content Moderation Pipeline

For classroom use, content moderation is critical. Every prompt passes through OpenAI's GPT-4o-mini before image generation:

async function checkContentModeration(promptText) {
  const moderationPrompt = `
    Evaluate if the following image generation prompt contains 
    ANY explicitly inappropriate material for educational contexts.
    
    Check for:
    1. Sexual content or innuendo
    2. Graphic depictions of violence or gore
    3. Hate speech, discrimination, or harassment
    4. Illegal activities
    5. Self-harm references
    6. Profanity
    7. Dangerous instructional content
    
    Note: Do NOT flag lighthearted, scientific, educational, or 
    cartoonish references if they are not graphically violent.
    
    Prompt: "${promptText}"
    
    Return JSON: { flagged: boolean, reason: string, categories: [] }
  `;
  
  const result = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: moderationPrompt }],
    response_format: { type: 'json_object' }
  });
  
  return JSON.parse(result.choices[0].message.content);
}
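
The function above trusts `JSON.parse` to succeed and the model to return the requested shape. For a classroom tool it may be safer to fail closed. The sketch below shows one way to do that; `parseModerationResult` and its fallback message are hypothetical and not part of the original code.

```javascript
// Hypothetical wrapper for the JSON.parse step: anything that is not a
// well-formed verdict is treated as flagged, so a malformed reply never
// lets a prompt through unchecked.
function parseModerationResult(rawContent) {
  const failClosed = {
    flagged: true,
    reason: 'Moderation response could not be read',
    categories: [],
  };
  let parsed;
  try {
    parsed = JSON.parse(rawContent);
  } catch {
    return failClosed;
  }
  if (parsed === null || typeof parsed !== 'object' || typeof parsed.flagged !== 'boolean') {
    return failClosed;
  }
  // Normalize optional fields so callers can rely on their types
  return {
    flagged: parsed.flagged,
    reason: typeof parsed.reason === 'string' ? parsed.reason : '',
    categories: Array.isArray(parsed.categories) ? parsed.categories : [],
  };
}
```

With this in place, the last line of `checkContentModeration` could return `parseModerationResult(result.choices[0].message.content)` instead of parsing directly.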

Tuning for Education

The moderation prompt explicitly allows "lighthearted, scientific, educational, or cartoonish" content. This prevents false positives when students describe scenes like "bacteria struggling to survive" or "volcanic eruptions destroying habitats."

Flagged Prompts Logging

Flagged prompts are logged to Firestore for review, helping refine moderation rules over time:

if (moderationResult.flagged) {
  await db.collection('flaggedPrompts').add({
    prompt: promptText,
    reason: moderationResult.reason,
    categories: moderationResult.categories,
    flaggedTimestamp: FieldValue.serverTimestamp()
  });
}

Local-First Storage with IndexedDB

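Student work is saved locally in IndexedDB rather than in a server-side account; prompts and generated images still pass through the backend during generation. Keeping saved scenes in the browser provides privacy, offline access to existing work, and avoids long-term backend storage costs: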

const DB_NAME = 'graphic-novel-maker';
const SCENES_STORE = 'scenes';
const HISTORY_STORE = 'scene-history';

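// Version 1 is assumed here; bump it whenever the schema changes
const request = indexedDB.open(DB_NAME, 1);

request.onupgradeneeded = (event) => {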
  const db = event.target.result;
  
  // Composite key: [storyType, sceneNumber]
  const scenesStore = db.createObjectStore(SCENES_STORE, { 
    keyPath: ['storyType', 'sceneNumber'] 
  });
  scenesStore.createIndex('by_storyType', 'storyType');
  
  // Auto-increment for history entries
  const historyStore = db.createObjectStore(HISTORY_STORE, { 
    keyPath: 'id', 
    autoIncrement: true 
  });
  historyStore.createIndex('by_scene', ['storyType', 'sceneNumber']);
};
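
To show what the `by_storyType` index is for, here is a minimal sketch of loading every scene in one story. `requestToPromise` and `getScenesForStory` are hypothetical helpers; the store and index names come from the schema above.

```javascript
// Wraps an IDBRequest in a Promise so it can be awaited
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Hypothetical helper: loads all scenes of one story through the
// by_storyType index and returns them ordered by scene number.
async function getScenesForStory(db, storyType) {
  const store = db.transaction('scenes', 'readonly').objectStore('scenes');
  const scenes = await requestToPromise(store.index('by_storyType').getAll(storyType));
  // Explicit sort so callers do not depend on index ordering
  return scenes.sort((a, b) => a.sceneNumber - b.sceneNumber);
}
```

Here `db` is the `IDBDatabase` instance delivered by the open request's `onsuccess` event.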

Image Compression

Generated images are compressed before storage to stay within IndexedDB practical limits:

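async function compressBase64Image(base64, quality = 0.7, maxWidth = 1024) {
  const img = new Image();
  // Attach handlers before setting src so a fast load or a decode error is not missed
  await new Promise((resolve, reject) => {
    img.onload = resolve;
    img.onerror = () => reject(new Error('Could not decode image'));
    img.src = base64;
  });

  // Scale down proportionally when the image is wider than maxWidth
  let { width, height } = img;
  if (width > maxWidth) {
    height = Math.round((height * maxWidth) / width);
    width = maxWidth;
  }

  const canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  canvas.getContext('2d').drawImage(img, 0, 0, width, height);

  // JPEG has no alpha channel, so transparent pixels come out black
  return canvas.toDataURL('image/jpeg', quality);
}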
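
To check how much a given quality setting actually saves, the stored size can be estimated from the data URL itself. The helper below is a hypothetical addition, not part of the original module.

```javascript
// Estimates the decoded byte size of a base64 data URL without decoding it.
// Every 4 base64 characters encode 3 bytes; '=' padding marks missing bytes.
function estimateDataUrlBytes(dataUrl) {
  const base64 = dataUrl.slice(dataUrl.indexOf(',') + 1);
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - padding;
}
```

Because base64 text is about one third larger than the bytes it encodes, the string written to IndexedDB takes more space than this estimate suggests.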

Scene History

Every image generation saves the previous version to history, allowing students to restore earlier attempts:

// Before generating new image, save current to history
if (currentScene.imageUrl) {
  await saveSceneToHistory(currentScene);
}

// History modal lets students browse and restore
const historyItems = await getSceneHistory(storyType, sceneNumber);
// Returns array sorted by timestamp (newest first)

Scene Card Management

Each scene is represented by a card with controls for description, camera angle, mood, and aspect ratio.

Dynamic Scene Cards

Scene cards are created from a template and populated with saved data:

function renderSceneCard(sceneNumber, sceneData) {
  const template = document.getElementById('sceneCardTemplate');
  const card = template.content.cloneNode(true).querySelector('.scene-card');
  
  card.id = `scene-card-${sceneData.id}`;
  card.dataset.sceneId = sceneData.id;
  card.dataset.sceneNumber = sceneNumber;
  
  // Populate camera angle, mood, description, image...
  const imageElement = card.querySelector('.scene-image');
  if (sceneData.imageBase64) {
    imageElement.src = sceneData.imageBase64;
  } else if (sceneData.imageUrl) {
    imageElement.src = sceneData.imageUrl;
  } else {
    imageElement.src = '/img/placeholder.webp';
  }
  
  return card;
}

Thumbnail Navigation

A thumbnail strip provides quick navigation between scenes, with IntersectionObserver tracking which scene is currently visible:

const observer = new IntersectionObserver((entries) => {
  entries.forEach(entry => {
    if (entry.isIntersecting) {
      setActiveThumbnail(entry.target.dataset.sceneId);
    }
  });
}, { root: scrollContainer, threshold: 0.5 });

sceneCards.forEach(card => observer.observe(card));

Aspect Ratio Support

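Panels support three aspect ratios. Each maps to the closest size or ratio the image models accept, so the generated image does not always match the nominal layout ratio (for example, 1536x1024 is 3:2 rather than 2:1):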

// Panel types map to generation dimensions
const aspectRatios = {
  'square': { ratio: '1:1',  openai: '1024x1024', google: '1:1' },
  'wide':   { ratio: '2:1',  openai: '1536x1024', google: '16:9' },
  'tall':   { ratio: '1:2',  openai: '1024x1536', google: '9:16' }
};
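
A lookup like this is typically consumed by a small helper that picks the right field for the selected model. The sketch below is an assumption about how that could work: `getGenerationSize`, the returned `size` and `aspectRatio` field names, and the fallback to a square panel are illustrative and not confirmed by the source.

```javascript
// Same table as above, repeated so the example is self-contained
const aspectRatios = {
  square: { ratio: '1:1', openai: '1024x1024', google: '1:1' },
  wide: { ratio: '2:1', openai: '1536x1024', google: '16:9' },
  tall: { ratio: '1:2', openai: '1024x1536', google: '9:16' },
};

// Hypothetical helper: returns the size parameter for the chosen model.
// Unknown panel types fall back to a square panel.
function getGenerationSize(panelType, imageModel) {
  const entry = aspectRatios[panelType] ?? aspectRatios.square;
  // 'gpt-image-1' is the only OpenAI model named in this article;
  // every other identifier is treated as a Google model here.
  return imageModel === 'gpt-image-1'
    ? { size: entry.openai }
    : { aspectRatio: entry.google };
}
```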

Comic Layout & PDF Export

The final comic view arranges panels in a grid layout, respecting aspect ratios:

function getSlotsForDimensions(dimensions) {
  if (dimensions.width > dimensions.height) return 2; // Wide: 2 slots
  if (dimensions.height > dimensions.width) return 2; // Tall: 2 slots
  return 1; // Square: 1 slot
}

/* CSS Grid handles the layout (stylesheet, not JavaScript) */
.comic-page {
  display: grid;
  grid-template-columns: repeat(2, 1fr);
  gap: 15px;
}
.comic-panel.wide-panel {
  grid-column: span 2;
}
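
The slot counts can also drive pagination. The source does not show how panels are split across pages, so the following is only a sketch: `paginatePanels` and the default of four slots per page are assumptions.

```javascript
// Same slot rule as above, repeated so the example is self-contained
function getSlotsForDimensions(dimensions) {
  if (dimensions.width > dimensions.height) return 2; // Wide: 2 slots
  if (dimensions.height > dimensions.width) return 2; // Tall: 2 slots
  return 1; // Square: 1 slot
}

// Hypothetical helper: greedily fills pages until the slot budget is used up.
// It counts slots only and does not model gaps the CSS grid may leave.
function paginatePanels(panels, slotsPerPage = 4) {
  const pages = [];
  let current = [];
  let used = 0;
  for (const panel of panels) {
    const slots = getSlotsForDimensions(panel.dimensions);
    // Start a new page when this panel would not fit on the current one
    if (current.length > 0 && used + slots > slotsPerPage) {
      pages.push(current);
      current = [];
      used = 0;
    }
    current.push(panel);
    used += slots;
  }
  if (current.length > 0) pages.push(current);
  return pages;
}
```

Each inner array then corresponds to one `.comic-page` container in the rendered comic.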

PDF Generation

Export uses html2canvas to render pages, then jsPDF to create a downloadable PDF:

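async function exportToPDF() {
  const { jsPDF } = window.jspdf;
  const pdf = new jsPDF({
    orientation: 'portrait',
    unit: 'mm',
    format: [210, 210] // Square format
  });

  // Assumption: each rendered page is a .comic-page element, as in the CSS above
  const pages = document.querySelectorAll('.comic-page');
  const margin = 10; // mm
  const pdfWidth = 210 - 2 * margin; // Printable width in mm

  for (let i = 0; i < pages.length; i++) {
    // Render the i-th page rather than one shared container
    const canvas = await html2canvas(pages[i], {
      scale: 2,
      useCORS: true,
      backgroundColor: 'white'
    });

    // Keep the page's aspect ratio when placing it in the PDF
    const pdfHeight = (canvas.height * pdfWidth) / canvas.width;
    const imgData = canvas.toDataURL('image/jpeg', 1.0);
    if (i > 0) pdf.addPage();
    pdf.addImage(imgData, 'JPEG', margin, margin, pdfWidth, pdfHeight);
  }

  const studentId = localStorage.getItem('studentId') || 'anonymous';
  pdf.save(`${studentId}-comic.pdf`);
}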

Lessons Learned

Pub/Sub > Long HTTP requests

AI image generation is slow. Async processing with job polling provides better UX and avoids timeout issues.

LLM-based moderation is tunable

Using GPT-4o-mini for moderation allows nuanced rules that built-in safety filters can't provide—like allowing "educational violence."

IndexedDB handles images well

With compression, storing base64 images locally is practical. Students keep their work without server storage costs.

Prompt engineering is product work

The difference between "AI slop" and usable comic panels is entirely in the style prompts. Iterate relentlessly.

Multi-model flexibility matters

Different AI models have different strengths and rate limits. Supporting multiple models provides fallbacks and options.

Simple auth is enough

An 8-character student ID is stored in localStorage with a 30-day expiry that the app enforces itself, since localStorage entries never expire on their own. No passwords, no OAuth. Appropriate for the use case.
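
Because localStorage has no built-in expiry, the 30-day limit has to be checked in application code. A minimal sketch of one way to do that follows. The `studentId` key matches the export code above, while `studentIdExpiresAt`, `saveStudentId`, and `loadStudentId` are hypothetical names.

```javascript
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Stores the ID together with the time at which it should stop being valid.
// `storage` is any object with the localStorage interface.
function saveStudentId(storage, studentId, now = Date.now()) {
  storage.setItem('studentId', studentId);
  storage.setItem('studentIdExpiresAt', String(now + THIRTY_DAYS_MS));
}

// Returns the stored ID, or null (after clearing it) once it has expired
function loadStudentId(storage, now = Date.now()) {
  const expiresAt = Number(storage.getItem('studentIdExpiresAt'));
  if (!expiresAt || now >= expiresAt) {
    storage.removeItem('studentId');
    storage.removeItem('studentIdExpiresAt');
    return null;
  }
  return storage.getItem('studentId');
}
```

In the browser these would be called as `saveStudentId(localStorage, id)` and `loadStudentId(localStorage)`. Passing the storage object in keeps the logic testable outside a browser.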