novela/docs/TECHNICAL.md
Ivo Oskamp e4d2e2c636 DB-stored books, full-text search, backup restore, and AO3 scraper
- DB-stored books (Phase 1–6): chapters and images stored in PostgreSQL; grabber writes to DB, EPUB→DB conversion, DB→EPUB export, FTS search page (/search)
- Chapter editor: Monaco editor supports DB-stored books; inline title editing
- Grabber: DB/EPUB storage toggle on Convert page
- Backup: restore from Dropbox snapshot (browse snapshots, restore individual or selected files)
- AO3 scraper: initial implementation
- Changelog: v0.1.2 and v0.1.3 entries added to changelog.py and changelog.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 15:13:08 +02:00


Novela 2.0 - Technical Status (Develop)

Scope

This document describes the current technical status of the develop codebase. It is the primary technical reference for the current implementation.

Architecture

  • Stack: FastAPI, Jinja2 templates, plain JavaScript, PostgreSQL 16, Docker.
  • Startup lifecycle (main.py):
    1. init_pool()
    2. run_migrations()
    3. start_backup_scheduler()
    4. mount routers
  • Shutdown lifecycle:
    1. stop_backup_scheduler()
    2. close_pool()
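  • Source-of-truth rule: for file-based books (storage_type='file'), files on disk are authoritative and the database is an index/cache. DB-stored books (storage_type='db') have no file on disk; their content lives in PostgreSQL.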
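
The startup and shutdown order above can be sketched as an async context manager, which is the shape FastAPI accepts through its lifespan parameter. This is only an illustration: the four hook bodies are stand-ins that record their call order, mount_routers is a hypothetical helper, and whether the real hooks are sync or async is an assumption.

```python
import asyncio
from contextlib import asynccontextmanager

calls = []  # records the order in which the lifecycle hooks run

# Stand-ins for the functions named in the list above (bodies are placeholders).
def init_pool(): calls.append("init_pool")
def run_migrations(): calls.append("run_migrations")
def start_backup_scheduler(): calls.append("start_backup_scheduler")
def mount_routers(app): calls.append("mount_routers")  # hypothetical helper
def stop_backup_scheduler(): calls.append("stop_backup_scheduler")
def close_pool(): calls.append("close_pool")

@asynccontextmanager
async def lifespan(app):
    # Startup: pool first, so migrations and the scheduler can use the DB.
    init_pool()
    run_migrations()
    start_backup_scheduler()
    mount_routers(app)
    try:
        yield
    finally:
        # Shutdown: stop the scheduler before the pool it depends on.
        stop_backup_scheduler()
        close_pool()

async def demo():
    async with lifespan(app=None):
        calls.append("serving")

asyncio.run(demo())
```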

File Storage Paths

All files are stored under library/ (relative to the app working directory, mapped via Docker volume). LIBRARY_DIR = Path("library"), LIBRARY_ROOT = LIBRARY_DIR.resolve().

Path structure per format

| Format | Path pattern |
|---|---|
| EPUB (no series) | library/epub/{publisher}/{author}/Stories/{title}.epub |
| EPUB (series) | library/epub/{publisher}/{author}/Series/{series}/{idx:03d} - {title}.epub |
| PDF | library/pdf/{publisher}/{author}/{title}.pdf |
| CBR (no series) | library/comics/{publisher}/{author}/{title}.cbr |
| CBR (series) | library/comics/{publisher}/{author}/Series/{series}/{idx:03d} - {title}.cbr |
| CBZ (no series) | library/comics/{publisher}/{author}/{title}.cbz |
| CBZ (series) | library/comics/{publisher}/{author}/Series/{series}/{idx:03d} - {title}.cbz |
  • Segments are sanitised: special chars stripped, max lengths applied (publisher/author 80, title 140, series 80).
  • Series index is zero-padded to 3 digits (001, 002, …), clamped to 1–999.
  • Duplicate filenames get a (2), (3), … suffix.
  • After any file move, empty parent directories are pruned up to LIBRARY_ROOT.
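
As a rough illustration of the EPUB rules above, the sketch below builds a relative path and applies the duplicate suffix. It is not the project's make_rel_path: the function names, the exact set of characters stripped, the "Unknown" fallback, and the 1–999 clamp range are all assumptions.

```python
import re
from pathlib import PurePosixPath

MAX_LEN = {"publisher": 80, "author": 80, "title": 140, "series": 80}

def sanitise(value: str, kind: str) -> str:
    # Assumption: "special chars" means anything other than word characters,
    # spaces, and a few harmless punctuation marks.
    cleaned = re.sub(r"[^\w \-.,'()]", "", value).strip()
    return cleaned[: MAX_LEN[kind]].strip() or "Unknown"  # fallback is assumed

def epub_rel_path(publisher, author, title, series=None, series_index=None) -> str:
    base = PurePosixPath("library/epub") / sanitise(publisher, "publisher") / sanitise(author, "author")
    name = sanitise(title, "title")
    if series:
        # Three-digit, zero-padded index; the 1-999 clamp is an assumption.
        idx = min(max(int(series_index or 1), 1), 999)
        return str(base / "Series" / sanitise(series, "series") / f"{idx:03d} - {name}.epub")
    return str(base / "Stories" / f"{name}.epub")

def dedupe(path: str, existing: set[str]) -> str:
    # Append " (2)", " (3)", ... before the extension until the name is free.
    if path not in existing:
        return path
    stem, _, ext = path.rpartition(".")
    n = 2
    while f"{stem} ({n}).{ext}" in existing:
        n += 1
    return f"{stem} ({n}).{ext}"
```

Passing the set of existing paths in explicitly keeps the sketch free of filesystem access; a real implementation would presumably check the disk instead.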

Path logic

  • common.make_rel_path(media_type, publisher, author, title, series, series_index, ext) — used by import and grabber.
  • reader.py _make_rel_path(publisher, author, title, series, series_index, ext) — used by metadata PATCH; same logic, uses actual file extension.
  • Both functions produce identical paths for all formats.

Metadata save behaviour per format

| Format | File written? | DB written? |
|---|---|---|
| EPUB | Yes — OPF metadata updated in-place | Yes |
| PDF | No | Yes |
| CBR | No | Yes |
| CBZ | No (tags/metadata); rating written to ComicInfo.xml | Yes |

Router Status

routers/library.py

  • GET /library — library page
  • GET /api/library — book list JSON (fast-path by default)
  • POST /library/rescan — forced full disk rescan
  • POST /library/import — upload EPUB/PDF/CBR/CBZ
  • DELETE /library/file/{filename} — delete file + DB row + prune dirs
  • GET /download/{filename} — download file with Content-Disposition: attachment
  • GET /library/cover/{filename} — serve cover (EPUB from file; PDF/CBR from cache)
  • GET /library/cover-cached/{filename} — serve cover from DB cache only
  • POST /library/cover/{filename} — upload/replace cover (EPUB only)
  • POST /library/want-to-read/{filename} — toggle want-to-read flag
  • POST /library/archive/{filename} — toggle archived flag
  • POST /library/new/mark-reviewed — bulk set needs_review=false
  • POST /library/bulk-delete — delete multiple files; accepts {"filenames": [...]}, removes files from disk and DB in one query per batch; returns {ok, deleted, skipped}
  • POST /library/rating/{filename} — set/clear star rating {"rating": 0-5}
  • GET /home — home page
  • GET /api/home — home data JSON
  • GET /stats — statistics page
  • GET /api/stats — statistics data JSON
  • GET /api/disk — partition usage for the library directory: {total, used, free, pct_used}
  • POST /api/bulk-check-duplicates — accepts {"items": [{title, author, volume}, ...]}, returns {"duplicates": [bool, ...]} — when volume is a number, requires title+author+series_index to all match; when volume is absent, matches on title+author only
  • GET /library/list — compat alias
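
The matching rule of POST /api/bulk-check-duplicates listed above can be sketched as a pure function. The in-memory library list and the case-insensitive comparison are assumptions made for illustration; the real endpoint presumably does this in SQL.

```python
def is_duplicate(item: dict, library: list[dict]) -> bool:
    """Return True if `item` matches an existing book under the rule above."""
    def norm(s):
        # Assumption: comparison ignores case and surrounding whitespace.
        return (s or "").strip().casefold()

    volume = item.get("volume")
    has_volume = isinstance(volume, (int, float)) and not isinstance(volume, bool)
    for book in library:
        same_title = norm(book.get("title")) == norm(item.get("title"))
        same_author = norm(book.get("author")) == norm(item.get("author"))
        if not (same_title and same_author):
            continue
        if not has_volume:
            return True  # no volume given: title + author is enough
        if book.get("series_index") == volume:
            return True  # volume given: series_index must match as well
    return False

def bulk_check(items: list[dict], library: list[dict]) -> dict:
    return {"duplicates": [is_duplicate(i, library) for i in items]}
```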

GET /api/library runs in fast-path mode by default (DB-only, no full disk rescan). For a forced sync: GET /api/library?rescan=true or POST /library/rescan. include_file_info=true is optional for file size/mtime enrichment. ETag caching: response includes ETag: "{count}-{max_updated_at_unix}" and Cache-Control: no-cache. Client sends If-None-Match; server returns 304 Not Modified when nothing changed.
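
A minimal sketch of the ETag handshake described above, assuming the book count and the newest updated_at value have already been fetched by the cheap query. The helper names are hypothetical, and treating an empty library as timestamp 0 is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional

def make_etag(count: int, max_updated_at: Optional[datetime]) -> str:
    # Format from the text: "{count}-{max_updated_at_unix}", quotes included.
    unix = int(max_updated_at.timestamp()) if max_updated_at else 0
    return f'"{count}-{unix}"'

def status_for(if_none_match: Optional[str], etag: str) -> int:
    # 304 when the client's cached validator still matches, else a full 200.
    return 304 if if_none_match == etag else 200

etag = make_etag(3, datetime.fromtimestamp(1_700_000_000, tz=timezone.utc))
first = status_for(None, etag)    # first request, no validator yet
repeat = status_for(etag, etag)   # client echoes the ETag back
```

Because Cache-Control is no-cache, the client revalidates on every request, so the saving comes from skipping the full library query and response body rather than from skipping the round trip.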

/api/home returns:

  • continue_reading
  • shorts_unread
  • novels_unread
  • shorts_read
  • novels_read

/api/stats returns totals plus chart/history data for stats.html:

  • reads_by_month, reads_by_dow, reads_by_hour
  • genre_counts, publisher_counts, fav_genre, fav_publisher
  • top_books, history

Home sections exclude series books via:

  • COALESCE(series, '') = ''
  • filename NOT LIKE '%/Series/%'

Home read sections are ordered oldest-first:

  • shorts_read: ORDER BY MAX(read_at) ASC
  • novels_read: ORDER BY MAX(read_at) ASC

routers/reader.py

  • GET /library/db-images/{path:path} — serve image from content-addressed imagestore (library/images/); security: path must be under IMAGES_DIR
  • POST /api/library/convert-to-db/{filename:path} — convert on-disk EPUB to a DB-stored book; extracts chapters via _epub_body_inner (stores images in imagestore, rewrites src to /library/db-images/…), migrates all child tables (INSERT new library row → UPDATE children → DELETE old row), deletes EPUB file; returns {ok, new_filename}
  • GET /api/library/export-epub/{filename:path} — build and stream an EPUB from a DB-stored book; _rewrite_db_images_for_epub rewrites /library/db-images/… back to OEBPS/Images/… paths (dedup by sha256); returns as Content-Disposition: attachment
  • GET /library/epub/{filename} — serve EPUB inline (no attachment header)
  • GET /library/chapters/{filename} — EPUB spine as JSON; for storage_type='db' books returns chapters from book_chapters
  • GET /library/chapter/{index}/{filename} — single chapter as HTML fragment; for storage_type='db' books reads from book_chapters
  • GET /library/chapter-img/{path}?filename=… — image extracted from EPUB ZIP; path is the full internal ZIP path (e.g. OEBPS/Images/cover.jpg or EPUB/images/cover.jpg); case-insensitive fallback for mismatched folder names
  • GET /library/pdf/{filename}?page=N&dpi=150 — render PDF page as PNG
  • GET /api/pdf/info/{filename} → {"page_count": N}
  • GET /library/cbr/{filename}/{page} — CBR/CBZ page as image
  • GET /library/progress/{filename} — read progress
  • POST /library/progress/{filename} — save progress {"cfi": "…", "progress": N}
  • DELETE /library/progress/{filename} — clear progress
  • POST /library/mark-read/{filename} — mark as read (with optional date)
  • GET /library/book/{filename} — book detail page
  • GET /api/genres — all tags from book_tags (optional ?type=genre|subgenre|tag)
  • PATCH /library/book/{filename} — update metadata + tags; moves file if path fields change; DB-only for non-EPUB; for storage_type='db' books: recomputes synthetic db/… filename, FK-safe rename (INSERT→UPDATE children→DELETE old), updates book_chapters + bookmarks as well
  • POST /library/rating/{filename} — set/clear 1–5 star rating; writes to EPUB OPF / CBZ ComicInfo.xml; DB-only for CBR/PDF
  • GET /library/read/{filename} — reader page (EPUB or PDF); supports ?bm_ch=N&bm_scroll=F to jump to bookmark position
  • GET /library/bookmarks/{filename} — list bookmarks for a book
  • POST /library/bookmarks/{filename} — add bookmark {chapter_index, scroll_frac, chapter_title, note}
  • PATCH /library/bookmarks/{id} — update bookmark note
  • DELETE /library/bookmarks/{id} — delete bookmark
  • GET /api/bookmarks — all bookmarks across all books (includes book_title, book_author)

routers/bulk_import.py

  • GET /bulk-import — Bulk Import page
  • POST /library/bulk-import — import files with pre-parsed metadata; accepts multipart files[], rows (JSON array of per-file metadata), shared (JSON with author/publisher/status/genres/tags applied to all files)

Filename parsing is done client-side in bulk_import.html. The page uses a free-text %placeholder% pattern (e.g. %series% - %volume% - %title% - %year%). Available placeholders: %series% %volume% %title% %year% %month% %day% %author% %publisher% %ignore%. Colored chips can be clicked (insert at cursor) or dragged onto the input. Pattern is converted to a regex at parse time. Shared metadata fields override filename-parsed values. Files are uploaded in batches of 5 with a progress bar.
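
The real conversion runs client-side in JavaScript inside bulk_import.html; the Python sketch below only illustrates how a %placeholder% pattern could become a regex. Matching numeric placeholders with \d+ and the others with a lazy .+? is an assumption, as are the function names.

```python
import re

PLACEHOLDERS = {"series", "volume", "title", "year", "month", "day",
                "author", "publisher", "ignore"}
NUMERIC = {"volume", "year", "month", "day"}  # assumption: digits only

def pattern_to_regex(pattern: str) -> re.Pattern:
    out = []
    # The capturing group in split() keeps the placeholders as separate parts.
    for part in re.split(r"(%\w+%)", pattern):
        name = part[1:-1] if len(part) > 2 and part[0] == part[-1] == "%" else None
        if name in PLACEHOLDERS:
            if name == "ignore":
                out.append(r".+?")                  # matched but not captured
            elif name in NUMERIC:
                out.append(rf"(?P<{name}>\d+)")
            else:
                out.append(rf"(?P<{name}>.+?)")
        else:
            out.append(re.escape(part))             # literal separator text
    return re.compile("^" + "".join(out) + "$")

def parse_filename(stem: str, pattern: str):
    match = pattern_to_regex(pattern).match(stem)
    return match.groupdict() if match else None
```

Note that this sketch rejects a pattern that repeats a capturing placeholder (for example two %title% entries), because Python does not allow duplicate group names.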

routers/editor.py

  • GET /library/editor/{filename} — chapter editor page; supports both EPUB files and DB-stored books (db/… filenames); passes is_db flag to template; DB branch queries library table directly (no file check)
  • GET /api/edit/chapter/{index}/{filename} — get chapter content; DB branch reads from book_chapters and returns {index, href, title, content}
  • POST /api/edit/chapter/{index}/{filename} — save chapter; DB branch accepts {content, title}, calls upsert_chapter (updates content_tsv too)
  • POST /api/edit/chapter/add/{filename} — add new chapter after after_index; DB branch shifts chapter_index up via UPDATE … SET chapter_index = chapter_index + 1 WHERE chapter_index >= insert_idx then inserts
  • DELETE /api/edit/chapter/{index}/{filename} — delete chapter; DB branch deletes and re-indexes via UPDATE … SET chapter_index = chapter_index - 1 WHERE chapter_index > index
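
The re-indexing done by the DB branch of add and delete can be tried out with an in-memory SQLite table standing in for PostgreSQL. The table is reduced to three columns, the helper names are hypothetical, and insert_idx = after_index + 1 is an assumption about how the insert position is derived.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_chapters (filename TEXT, chapter_index INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO book_chapters VALUES ('db/x', ?, ?)",
    [(0, "One"), (1, "Two"), (2, "Three")],
)

def add_chapter(conn, filename, after_index, title):
    insert_idx = after_index + 1  # assumption: new chapter goes right after after_index
    with conn:  # one transaction: shift later chapters up, then insert
        conn.execute(
            "UPDATE book_chapters SET chapter_index = chapter_index + 1 "
            "WHERE filename = ? AND chapter_index >= ?",
            (filename, insert_idx),
        )
        conn.execute("INSERT INTO book_chapters VALUES (?, ?, ?)", (filename, insert_idx, title))

def delete_chapter(conn, filename, index):
    with conn:  # one transaction: delete, then close the gap
        conn.execute(
            "DELETE FROM book_chapters WHERE filename = ? AND chapter_index = ?",
            (filename, index),
        )
        conn.execute(
            "UPDATE book_chapters SET chapter_index = chapter_index - 1 "
            "WHERE filename = ? AND chapter_index > ?",
            (filename, index),
        )

def chapters(conn, filename):
    rows = conn.execute(
        "SELECT chapter_index, title FROM book_chapters WHERE filename = ? ORDER BY chapter_index",
        (filename,),
    )
    return rows.fetchall()
```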

routers/grabber.py

  • GET /grabber — grabber page
  • GET /convert — convert page
  • GET /credentials-manager — credentials manager UI
  • GET /debug — debug page
  • POST /debug/run — run debug scrape
  • GET /credentials — list stored credentials
  • POST /credentials — save credential
  • DELETE /credentials/{site} — delete credential
  • POST /preload — preload book info from URL
  • POST /convert — run scrape; body may include storage_mode: "db" (default) or "epub" to control output format
  • GET /events/{job_id} — SSE stream for job progress; done event includes storage_type ('db' or 'file')

Scrape/convert flow (DB storage — default):

  1. Fetch book info + chapters via scraper
  2. Per chapter: download images → write to library/images/{sha256[:2]}/{sha256}{ext} (content-addressed) → rewrite img[src] to /library/db-images/... → build content_html via element_to_xhtml
  3. One DB transaction: ensure_unique_db_filename → upsert_book (storage_type='db') → upsert_chapter for each chapter → upsert_cover_cache if cover provided
  4. Synthetic filename: db/{publisher}/{author}/{title} (or db/{pub}/{auth}/Series/{series}/{idx} - {title} for series)

Scrape/convert flow (EPUB file — storage_mode: "epub"):

  1–2. Same as DB flow (images downloaded, HTML built)
  3. Chapters converted to XHTML via make_chapter_xhtml; EPUB file built via make_epub and written to library/epub/…
  4. upsert_book called with storage_type='file'

routers/search.py

  • GET /search — full-text search page (search.html); Enter-to-search, ?q= param auto-runs on load
  • GET /api/search?q=… — FTS over book_chapters.content_tsv; uses plainto_tsquery('simple', q) with ts_rank ordering and ts_headline for highlighted snippets; also matches chapters whose title contains the query (case-insensitive ILIKE fallback); LIMIT 30; excludes archived books; results include filename, title, author, chapter_index, chapter_title, snippet, rank
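
One possible shape for the query behind GET /api/search is sketched below. It is not the project's actual SQL: the column names library.archived and book_chapters.content, the join, and the psycopg-style named placeholders are assumptions; only plainto_tsquery, ts_rank, ts_headline, the ILIKE fallback, and LIMIT 30 come from the description above.

```python
SEARCH_SQL = """
SELECT l.filename, l.title, l.author,
       c.chapter_index, c.title AS chapter_title,
       ts_headline('simple', c.content, q.query) AS snippet,
       ts_rank(c.content_tsv, q.query) AS rank
FROM book_chapters c
JOIN library l ON l.filename = c.filename
CROSS JOIN plainto_tsquery('simple', %(q)s) AS q(query)
WHERE (c.content_tsv @@ q.query OR c.title ILIKE %(like)s)
  AND NOT COALESCE(l.archived, false)
ORDER BY rank DESC
LIMIT 30
"""

def build_search_query(q: str) -> tuple[str, dict]:
    # Escape LIKE wildcards so a literal % or _ in the query is not a pattern.
    escaped = q.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return SEARCH_SQL, {"q": q, "like": f"%{escaped}%"}

sql, params = build_search_query("50%_off")
```

The pair returned here would be handed to a cursor's execute call; keeping the user input in parameters rather than in the SQL string avoids injection.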

routers/settings.py

  • GET /settings — settings page
  • GET /api/break-patterns — list chapter-break patterns
  • POST /api/break-patterns — add break pattern (type: regex or css_class)
  • PATCH /api/break-patterns/{id} — update pattern (enable/disable or change value)
  • DELETE /api/break-patterns/{id} — delete pattern
  • DELETE /api/reading-history — wipe all reading sessions

routers/builder.py

  • GET /builder — Book Builder index (draft list + new draft form)
  • POST /builder — create new draft; redirects to /builder/{id}
  • GET /builder/{draft_id} — draft editor page
  • DELETE /api/builder/{draft_id} — delete draft
  • GET /api/builder/{draft_id} — draft JSON (id, title, author, publisher, source_url, chapters)
  • POST /api/builder/{draft_id}/chapter — add chapter {title, after_index}; returns {index, count}
  • PUT /api/builder/{draft_id}/chapter/{idx} — save chapter {title?, content?}
  • DELETE /api/builder/{draft_id}/chapter/{idx} — delete chapter; returns {index, count}
  • POST /api/builder/{draft_id}/normalize/{idx} — normalize chapter HTML (preview only, does not save); returns {content}
  • POST /api/builder/{draft_id}/publish — normalize all chapters → build_epub() → write to library/epub/ → upsert_book() → delete draft; returns {filename}; redirects browser to /library/book/{filename}

Publish flow: all chapters are run through normalize_wysiwyg_html(), then build_epub() produces an EPUB 2.0 ZIP. The file path is computed via make_rel_path(media_type="epub", …). The book is inserted into the library with needs_review=True. The draft is deleted on success.

routers/following.py

  • GET /following — Following page (author URL management)
  • GET /api/following — all distinct library authors with URL (if set), book count, and last-added date
  • POST /api/following/{author_name} — set or clear URL for an author (empty url removes the record)

GET /api/following returns one entry per non-archived author:

{ "name": "Author Name", "book_count": 5, "last_added": "2026-03-27T…", "url": "https://…" }

URL is stored in the authors table (name unique, url, created_at, updated_at).

routers/backup.py

  • GET /backup — backup page
  • GET /api/backup/credentials — Dropbox settings (includes app_key_configured flag)
  • POST /api/backup/credentials — save Dropbox settings
  • DELETE /api/backup/credentials — remove all Dropbox credentials
  • POST /api/backup/oauth/prepare — save app key + secret, return Dropbox auth URL
  • POST /api/backup/oauth/exchange — exchange authorization code for refresh token
  • GET /api/backup/health — Dropbox connectivity check (includes schedule_enabled, schedule_interval_hours)
  • GET /api/backup/status — current backup status
  • GET /api/backup/history — backup run history (last 20)
  • GET /api/backup/progress — live progress of running backup {running, done, total, phase}
  • POST /api/backup/run — trigger backup (background task)
  • GET /api/backup/snapshots — list available snapshots {ok, snapshots: [{name, created_at}]}
  • GET /api/backup/snapshots/{snapshot_name}/files — list files in a snapshot with local existence check {ok, snapshot, files: [{path, size, sha256, exists_locally}]}
  • POST /api/backup/restore — restore files from a snapshot: {snapshot_name, files: [rel_paths]}; downloads from Dropbox, writes to disk, re-indexes via scan_media + upsert_book; returns {ok, restored, total, results: [{path, ok, error?}]}

Backup & Security

  • Dropbox token (refresh token or legacy access token) stored encrypted in credentials (site='dropbox').
  • Dropbox app key stored encrypted in credentials (site='dropbox_app_key').
  • Dropbox app secret stored encrypted in credentials (site='dropbox_app_secret').
  • Dropbox backup root stored encrypted in credentials (site='dropbox_backup_root').
  • Retention (snapshots to keep) stored encrypted in credentials (site='dropbox_backup_retention').
  • Backup schedule (enabled + interval_hours) stored encrypted in credentials (site='dropbox_backup_schedule').
  • Encryption uses NOVELA_MASTER_KEY (Fernet).

Dropbox authentication

  • Preferred: OAuth2 refresh token (does not expire). Set up via the two-step flow on /backup:
    1. Enter App Key + App Secret → click Generate Auth URL
    2. Approve in browser → paste the code → click Save & Activate
    • _dbx() uses oauth2_refresh_token + app_key + app_secret for automatic token renewal.
  • Fallback: legacy short-lived access token (backwards compatible; works without app key/secret).

Implementation details

  • Versioned backups with deduplication:
    • file objects in Dropbox: library_objects/{sha256_prefix}/{sha256}
    • snapshots in Dropbox: library_snapshots/snapshot-YYYYMMDD-HHMMSS.json
  • Each run creates a new snapshot version and uploads only missing objects.
  • Retention removes older snapshots above the configured limit.
  • Orphan object pruning removes objects no longer referenced by retained snapshots.
  • Local manifest cache (config/backup_manifest.json) speeds up change detection.
  • Database backup is done via pg_dump to Dropbox postgres/.
  • POST /api/backup/run always starts a background task and returns immediately.
  • GET /api/backup/progress returns in-memory progress updated per file; phases: starting → scanning → uploading → snapshot → pg_dump.
  • Scheduler runs in the background (start_backup_scheduler) and triggers on interval when enabled.
  • Concurrency guard: only one backup can run at a time.
  • After container restart/crash, stale running logs are auto-marked as interrupted/error.
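
The deduplication, retention, and orphan-pruning rules above can be modelled in memory as follows. Plain dicts and lists stand in for Dropbox, the two-character sha256 prefix is an assumption, and run_backup is a hypothetical name rather than the project's function.

```python
import hashlib

def object_key(data: bytes, prefix_len: int = 2) -> str:
    # Content-addressed key; the prefix length is an assumption.
    digest = hashlib.sha256(data).hexdigest()
    return f"library_objects/{digest[:prefix_len]}/{digest}"

def run_backup(files: dict, remote_objects: dict, snapshots: list,
               name: str, retention: int) -> int:
    """Upload missing objects, record a snapshot, apply retention, prune orphans."""
    uploaded = 0
    manifest = {}
    for rel_path, data in files.items():
        key = object_key(data)
        if key not in remote_objects:      # upload only objects not yet stored
            remote_objects[key] = data
            uploaded += 1
        manifest[rel_path] = key
    snapshots.append({"name": name, "files": manifest})

    # Retention: drop the oldest snapshots above the configured limit.
    if len(snapshots) > retention:
        del snapshots[: len(snapshots) - retention]

    # Orphan pruning: remove objects no retained snapshot references.
    referenced = {key for snap in snapshots for key in snap["files"].values()}
    for key in list(remote_objects):
        if key not in referenced:
            del remote_objects[key]
    return uploaded
```

Two files with identical bytes map to the same key, so they cost one upload regardless of how many paths or snapshots reference them.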

Environment

stack/novela.env should include at least:

  • POSTGRES_DB
  • POSTGRES_USER
  • POSTGRES_PASSWORD
  • NOVELA_MASTER_KEY
  • CONFIG_DIR

Dropbox settings are managed via the web UI on /backup.


Branding

Static assets in static/:

| File | Size | Purpose |
|---|---|---|
| logo.png | 546×575, transparent | Sidebar wordmark (displayed at 26px height) |
| favicon.ico | 16×16 | Browser tab (legacy) |
| favicon-32.png | 32×32 | Browser tab (modern) |
| favicon-256.png | 256×256 | Pinned tabs / high-DPI |
| apple-touch-icon.png | 180×180 | iOS/iPadOS home screen icon |

All 15 page templates include:

<link rel="icon" href="/static/favicon.ico" sizes="16x16"/>
<link rel="icon" type="image/png" sizes="32x32" href="/static/favicon-32.png"/>
<link rel="icon" type="image/png" sizes="256x256" href="/static/favicon-256.png"/>
<link rel="apple-touch-icon" sizes="180x180" href="/static/apple-touch-icon.png"/>

Sidebar logo: logo.png (26px, flex-aligned) next to the "Novela" wordmark ("No" in --text, "vela" in --accent). apple-touch-icon.png uses #0f0e0c background (= --bg) with the orange N logo centered at 60% of canvas size.


Shared CSS (static/theme.css)

Single :root { } block defining all global CSS custom properties. Loaded first on every page (<link rel="stylesheet" href="/static/theme.css"/>). No template defines its own global colours — only page-specific layout vars stay inline.

| Variable | Value | Role |
|---|---|---|
| --bg | #0f0e0c | Page background |
| --surface | #1a1815 | Card/panel background |
| --surface2 | #221f1b | Nested surface |
| --border | #2e2a24 | Borders |
| --accent | #ffa20e | Orange highlight (logo colour) |
| --accent2 | #ffb840 | Lighter orange |
| --text | #e8e2d9 | Body text |
| --text-dim | #8a8278 | Muted text |
| --text-faint | #4a453e | Very muted text |
| --success | #6baa6b | Success state |
| --warning | #c8a03a | Warning state |
| --error | #c85a3a | Error state |
| --radius | 6px | Border radius |
| --sidebar | 220px | Sidebar width |
| --mono | 'DM Mono', monospace | Monospace font stack |
| --serif | 'Libre Baskerville', Georgia, serif | Serif font stack |

Page-specific overrides: reader.html (--header-h, --footer-h, --content-w); backup.html (--ok, --warn, --err); editor.css (--danger, --header-h, --panel-w).

Shared JavaScript (static/books.js)

Loaded before any page-specific script on every page that needs book data or UI helpers.

| Function | Purpose |
|---|---|
| esc(s) | HTML-escape a string for safe insertion into markup |
| strHash(s) | Deterministic integer hash of a string (for colour selection) |
| COVER_PALETTES | Array of 8 [bg, fg] colour pairs for placeholder covers |
| wrapText(ctx, text, x, y, maxW, lineH) | Canvas word-wrap helper |
| truncate(s, n) | Truncate string with ellipsis |
| makePlaceholderCover(canvas, title, author) | Draw a generated book cover on a <canvas> |
| _filenameBase(filename) | Strip path and extension from a filename |
| bookTitle(b) | Return display title (falls back to filename parsing) |
| bookAuthor(b) | Return display author (falls back to filename parsing) |
| tagValuesByType(b, type) | Return tag strings of a given type from b.tags |
| bookGenres(b) | Tags of type genre; falls back to subject |
| bookSubgenres(b) | Tags of type subgenre |
| bookPlainTags(b) | Tags of type tag |
| filterBooks(books, query) | Filter book list by query across title, author, publisher, genre, sub-genre, tag |
| setupSearchInput(inputId, clearId, onSearch) | Wire input: show/hide clear button on input; call onSearch(query) on Enter |

Shared JavaScript (static/conversion.js)

Loaded by index.html (Convert page) and grabber.html (Grabber page). Requires books.js for esc().

| Function | Purpose |
|---|---|
| addLog(msg, cls) | Append a log line to #log-lines |
| connectConversionStream(job_id) | Open SSE stream /events/{job_id} and handle all conversion events: status, meta, chapters, progress, warning, error, done |

UI Notes

  • Library import accepts EPUB/PDF/CBR/CBZ.
  • Home supports the same import formats.
  • Home includes search.
  • Home header/dropzone alignment matches Library (search top-right, dropzone below).
  • New view supports Grid and List mode.
    • Bulk selection + Remove from New works only in List mode.
    • List mode has a column visibility filter: Publisher, Author, Series, Volume, Title, Has cover, Updated, Genres, Sub-genres, Tags, Status.
    • List mode supports multi-select with Shift+click range selection on checkboxes.
    • Grid mode shows no selection checkboxes or bulk actions.
  • All books view supports Grid and List mode (same columns as New).
    • View mode persisted in localStorage as novela.all.viewMode.
    • Column visibility persisted in localStorage as novela.all.visibleColumns.
    • List mode has a checkbox column, column visibility filter, and multi-select with Shift+click range selection.
    • List mode has a Delete selected bulk action: confirms then calls DELETE /library/file/{filename} for each selected book.
  • Publication status values: Complete, Ongoing, Temporary Hold, Long-Term Hold (blank = unknown). Hiatus was renamed to Long-Term Hold via startup migration migrate_rename_hiatus().
  • Status badges (top-right of grid card cover): circular icon, dark fill rgba(15,14,12,0.82) + box-shadow: 0 0 0 2px #0f0e0c ring for visibility on any cover colour. Icon colour per status: Complete=green #6baa6b, Ongoing=blue #4a90b8, Temporary Hold=amber #c8a03a, Long-Term Hold=orange #c8783a. statusBadgeHtml() in library.js is the single source for badge HTML across all grid views.
  • Want-to-read star (top-left of grid card cover): same dark fill + ring as status badges.
  • Status pills in Book Detail (book.css): status-complete, status-ongoing, status-temporary-hold, status-long-term-hold — same colour scheme as badges.
  • Grabber status mapping (grabber.py): Temporary-Hold (gayauthors.org) → Temporary Hold; Long-Term Hold passes through unchanged.
  • Star ratings (1–5) shown under the cover in all grid views:
    • Display-only in grid cards (no click, prevents accidental taps while scrolling).
    • Interactive in Book Detail (1.1rem, clickable; clicking the active star clears the rating).
    • Amber: filled #c8a03a, unfilled rgba(200, 160, 58, 0.25).
  • Reader settings (hamburger menu):
    • Content width slider (30–100 vw), persisted as reader-content-width-pct.
    • Text colour: 5 warm-tone presets from #e8e2d9 to #938d86, persisted as reader-text-colour.
    • Hamburger and back-link separated with margin-left: 1rem on .header-back.
  • Reader supports EPUB and PDF:
    • EPUB: chapter-text rendering; progress = {chapterIndex}:{scrollFrac}; progress % = (chapterIndex + scrollFrac) / total * 100.
    • PDF: page-image rendering via /library/pdf/{filename}?page=N; page count from /api/pdf/info/{filename}; progress = {pageIndex}:0; keyboard/button navigation identical.
    • reader.html branches on FORMAT variable injected by the server.
  • Edit EPUB button in Book Detail is only shown for .epub files.
  • Backup page supports: manual run, dry-run, Dropbox root, retention count, schedule (on/off + hours), status + history.
  • Bookmarks: saved per book via POST /library/bookmarks/{filename}; shown in Library sidebar section; navigated via ?bm_ch=N&bm_scroll=F URL params on reader page.
  • Convert page: after loading metadata, if a book with the same title+author already exists in the library, a warning banner is shown (with a link to the existing book); user can still proceed with conversion. Check is done server-side in /preload response (already_exists, existing_books).
  • Duplicates view (#duplicates): groups non-archived books by (title, author) (case-insensitive); shows only groups with ≥ 2 copies; counter in sidebar shows total number of duplicate books. Detection is entirely client-side from the existing library data.
  • Incomplete view (#incomplete): shows all non-archived books where publication_status is not Complete (Ongoing, Temporary Hold, Long-Term Hold, or blank); sidebar counter included.
  • Following page (/following): dedicated page in its own sidebar section between Library and Tools; shows all library authors with their external URL; two tabs — Following (authors with URL set) and All Authors; inline URL editing with keyboard support (Enter = save, Escape = cancel); clicking Visit opens the external URL in a new tab. Author URLs are stored in the authors table. Sidebar counter shows number of followed authors.
  • Book Builder (/builder): create EPUB books from scratch; drafts stored in builder_drafts (JSONB chapters); contenteditable editor with toolbar (bold/italic/underline/blockquote/author-note/scene-break/normalize); autosave every 30 s + Ctrl+S; publish normalizes HTML via normalize_wysiwyg_html() and builds EPUB via build_epub().
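
The reader progress formula given in the list above, progress % = (chapterIndex + scrollFrac) / total * 100, can be checked with a small helper that parses the stored {chapterIndex}:{scrollFrac} string. The function name, the clamp to 0–100, and returning 0 for an empty book are assumptions. The same helper covers the PDF form {pageIndex}:0.

```python
def progress_percent(position: str, total: int) -> float:
    """Convert a stored "index:frac" position into a percentage of the book."""
    index_str, _, frac_str = position.partition(":")
    if total <= 0:
        return 0.0  # assumption: an empty book reports no progress
    pct = (int(index_str) + float(frac_str or 0)) / total * 100
    return max(0.0, min(100.0, pct))  # clamp is an assumption

halfway_through_third = progress_percent("2:0.5", 10)  # chapter index 2 of 10
pdf_page_four = progress_percent("3:0", 4)             # PDF: page index 3 of 4
```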

Known Conventions

  • Book deletion flow: unlink file → prune_empty_dirs(parent) → DELETE FROM library (cascade removes child rows).
  • Empty dir pruning: prune_empty_dirs(start) walks up from start to LIBRARY_ROOT, removing each dir if empty; stops at first non-empty dir.
  • Cover strategy:
    • EPUB: GET /library/cover/{filename} checks library_cover_cache first; on miss, extracts from ZIP and warms the cache. Cover upload (POST /library/cover/{filename}) replaces the image inside the EPUB ZIP (OPF located via META-INF/container.xml, old cover found in manifest and removed) and updates the cache so subsequent requests return the new cover immediately.
    • PDF: first page rendered as thumbnail, cached
    • CBR/CBZ: first page extracted, cached
  • Rating storage:
    • EPUB: <meta name="novela:rating" content="N"/> in OPF
    • CBZ: <NovelaRating>N</NovelaRating> in ComicInfo.xml inside the ZIP
    • CBR/PDF: DB only
    • upsert_book uses CASE WHEN EXCLUDED.rating > 0 THEN EXCLUDED.rating ELSE library.rating END to restore rating from file without overwriting existing DB value.
  • Tag types in book_tags: genre, subgenre, tag, subject. No direct genres/subgenres fields on book objects; always use helpers bookGenres(), bookSubgenres(), bookPlainTags().
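
The empty-directory pruning convention above can be sketched as follows. The documented function takes only start and uses LIBRARY_ROOT; here the root is passed as a second parameter so the sketch can run against a temporary directory, which is an assumption about nothing more than the signature.

```python
import tempfile
from pathlib import Path

def prune_empty_dirs(start: Path, root: Path) -> None:
    """Walk up from `start`, removing empty directories, never touching `root`."""
    root = root.resolve()
    current = start.resolve()
    # Only act on directories strictly below the root.
    while current != root and root in current.parents:
        if any(current.iterdir()):
            break                  # stop at the first non-empty directory
        current.rmdir()
        current = current.parent

with tempfile.TemporaryDirectory() as tmp:
    library_root = Path(tmp)
    stories = library_root / "epub" / "Pub" / "Auth" / "Stories"
    stories.mkdir(parents=True)
    (library_root / "epub" / "keep.txt").write_text("other content")

    prune_empty_dirs(stories, library_root)
    pub_removed = not (library_root / "epub" / "Pub").exists()
    epub_kept = (library_root / "epub").is_dir()
```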

Performance Notes

  • Library load is optimized for large datasets (1000+ books):
    • list_library_json() uses json_agg in the main query to inline tags per book — eliminates a separate SELECT * FROM book_tags query and Python merge loop.
    • has_cached_cover is provided directly via SQL join instead of full cache fetch.
    • reading_sessions is pre-aggregated in a subquery.
    • ETag on /api/library: cheap COUNT + MAX(updated_at) query before full load; 304 Not Modified on cache hit.
  • Front-end rendering uses IntersectionObserver to defer both cover image loading and placeholder canvas drawing until cards enter the viewport — prevents hundreds of simultaneous HTTP requests and canvas operations on initial render.
  • renderBooksGrid, renderDuplicatesView, renderSeriesDetail all use a single DOM pass: cover <img> and <canvas> are set up via card.querySelector immediately after innerHTML is set, eliminating a second full iteration with document.getElementById calls.
  • Additional migration indexes:
    • idx_library_sort_coalesce
    • idx_library_needs_review
    • idx_library_archived
    • idx_reading_sessions_filename_readat
    • idx_book_tags_filename_tag

DB-Stored Books

Books scraped via the grabber are stored entirely in PostgreSQL by default (storage_type = 'db'). No EPUB file is written unless the request sets storage_mode: "epub".

New tables

| Table | Key columns | Notes |
|---|---|---|
| book_chapters | filename FK, chapter_index, title, content TEXT, content_tsv TSVECTOR | Unique on (filename, chapter_index); GIN index on content_tsv for FTS; content_tsv is `to_tsvector('simple', title …)` |
| book_images | sha256 PK, ext, media_type, size_bytes | Content-addressed; files live at library/images/{sha256[:2]}/{sha256}{ext} |

library.storage_type

| Value | Meaning |
|---|---|
| 'file' | Book lives on disk (EPUB/PDF/CBR/CBZ); default for all existing books |
| 'db' | Book content lives in book_chapters; no file on disk |

Synthetic filename for DB books

db/{publisher}/{author}/{title} — or for series: db/{publisher}/{author}/Series/{series}/{idx:03d} - {title}

Same sanitization rules as file-based paths. Uniqueness enforced via ensure_unique_db_filename (DB lookup, not filesystem).

Chapter editor for DB books

GET /library/editor/{filename} supports DB-stored books. The Monaco editor shows language: 'html' for DB books (vs 'xml' for EPUB). The header shows a title input instead of a read-only chapter name. Unsaved content and titles are preserved across chapter switches via pendingContent and pendingTitles maps. editor.focus() is called after every content load so the editor is immediately interactive.

Imagestore

Images embedded in chapter HTML are stored content-addressed at library/images/{sha256[:2]}/{sha256}{ext}.

  • Served via GET /library/db-images/{path:path}
  • URLs embedded in book_chapters.content as absolute paths: /library/db-images/...
  • book_images table registers each unique image (auto-deduplication via sha256)
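
A minimal sketch of content-addressed storage as described above. write_image_file is named elsewhere in this document, but its real signature is not shown, so the parameters and return value here are assumptions; registration in book_images is left out.

```python
import hashlib
import tempfile
from pathlib import Path

def write_image_file(images_dir: Path, data: bytes, ext: str) -> str:
    """Store `data` under its sha256 and return the URL to embed in chapter HTML."""
    digest = hashlib.sha256(data).hexdigest()
    target = images_dir / digest[:2] / f"{digest}{ext}"
    if not target.exists():                     # identical bytes are stored once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return f"/library/db-images/{digest[:2]}/{digest}{ext}"

with tempfile.TemporaryDirectory() as tmp:
    images_dir = Path(tmp)
    url_a = write_image_file(images_dir, b"same bytes", ".png")
    url_b = write_image_file(images_dir, b"same bytes", ".png")
    url_c = write_image_file(images_dir, b"other bytes", ".png")
    stored = sum(1 for p in images_dir.rglob("*") if p.is_file())
```

Because the path is derived from the content, two chapters that embed the same image end up pointing at one file.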

EPUB → DB conversion

POST /api/library/convert-to-db/{filename} converts an on-disk EPUB to storage_type='db':

  1. Parse EPUB spine → per item: extract body HTML via _epub_body_inner, store images in imagestore via write_image_file, rewrite img[src] to /library/db-images/…
  2. Compute new synthetic db/… filename via make_rel_path(media_type="db", …) + ensure_unique_db_filename
  3. DB transaction: INSERT new library row (storage_type='db') → UPDATE all child tables (book_tags, reading_progress, reading_sessions, bookmarks, library_cover_cache, book_chapters) → DELETE old library row
  4. Delete EPUB file from disk + prune_empty_dirs
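
Step 3 works around the fact that the filename is both the primary key and the foreign-key target, so it cannot simply be updated in place while children still reference it. The sketch below reproduces the INSERT → UPDATE children → DELETE order with SQLite standing in for PostgreSQL; the schema is reduced to one child table and rename_book is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE library (filename TEXT PRIMARY KEY, title TEXT, storage_type TEXT);
CREATE TABLE book_tags (filename TEXT REFERENCES library(filename), tag TEXT);
INSERT INTO library VALUES ('epub/P/A/Stories/T.epub', 'T', 'file');
INSERT INTO book_tags VALUES ('epub/P/A/Stories/T.epub', 'fantasy');
""")

def rename_book(conn, old: str, new: str, storage_type: str) -> None:
    with conn:  # single transaction: either all three steps apply or none
        # 1. Insert the new parent row first so children have a valid target.
        conn.execute(
            "INSERT INTO library (filename, title, storage_type) "
            "SELECT ?, title, ? FROM library WHERE filename = ?",
            (new, storage_type, old),
        )
        # 2. Re-point every child table (only book_tags in this sketch).
        conn.execute("UPDATE book_tags SET filename = ? WHERE filename = ?", (new, old))
        # 3. The old row has no children left, so it can be deleted safely.
        conn.execute("DELETE FROM library WHERE filename = ?", (old,))

rename_book(conn, "epub/P/A/Stories/T.epub", "db/P/A/T", "db")
library_rows = conn.execute("SELECT * FROM library").fetchall()
tag_rows = conn.execute("SELECT * FROM book_tags").fetchall()
```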

DB → EPUB export

GET /api/library/export-epub/{filename} streams an EPUB built from DB content:

  1. Query metadata, tags, chapters, cover from DB
  2. Per chapter: _rewrite_db_images_for_epub strips /library/db-images/ prefix, reads files from IMAGES_DIR, deduplicates by sha256, assigns OEBPS/Images/{sha256}{ext} paths, rewrites img[src] to ../Images/…
  3. Build EPUB via make_epub(); return as Content-Disposition: attachment
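
The src rewrite in step 2 might look roughly like the sketch below. It uses a regular expression rather than an HTML parser, assumes double-quoted src attributes, and does not read any files; the function only mirrors the name _rewrite_db_images_for_epub and is not the project's implementation.

```python
import re

PREFIX = "/library/db-images/"
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")' + re.escape(PREFIX) + r'([^"]+)(")')

def rewrite_db_images_for_epub(html: str, images: dict) -> str:
    """Rewrite imagestore URLs to ../Images/… and collect the files to embed.

    `images` maps the EPUB image name ({sha256}{ext}) to its path relative to
    the imagestore; reusing one dict across chapters deduplicates by sha256.
    """
    def repl(match: re.Match) -> str:
        rel = match.group(2)               # "{sha256[:2]}/{sha256}{ext}"
        name = rel.rsplit("/", 1)[-1]      # "{sha256}{ext}"
        images.setdefault(name, rel)
        return f"{match.group(1)}../Images/{name}{match.group(3)}"
    return IMG_SRC.sub(repl, html)

collected = {}
chapter = ('<p><img src="/library/db-images/ab/abc123.png"/>'
           '<img alt="x" src="/library/db-images/ab/abc123.png"/></p>')
rewritten = rewrite_db_images_for_epub(chapter, collected)
```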

Known Bugs Fixed

  • renderGenreView and renderSearchResults in library.js referenced b.genres (non-existent). Fixed: use bookGenres(), bookSubgenres(), bookPlainTags().
  • PillInput in book.js did not handle comma as delimiter and did not flush on save. Fixed: comma keydown + flush() in saveEdit().
  • PATCH /library/book failed for PDFs: _sync_epub_metadata tried to open PDF as ZIP. Fixed: only called for .epub.
  • _make_rel_path in reader.py lacked format prefix (epub/, pdf/, comics/). Fixed: aligned with common.make_rel_path.
  • common.make_rel_path always generated .cbr extension for CBZ files (both map to media_type="cbr"). Fixed: accepts optional ext parameter; library.py import now passes actual suffix.
  • /download/{filename} was referenced in book.html but no endpoint existed (404). Fixed: added GET /download/{filename} to library.py.
  • PDF reader showed infinite loading: reader.html called EPUB-only /library/chapters/. Fixed: PDF path uses /api/pdf/info/ + page-image rendering.
  • Empty dir pruning only ran when file was moved. Fixed: prune_empty_dirs(old_path.parent) always runs after a successful metadata save.