Ivo Oskamp 00e75a6106 Add duplicate detection, Convert warning, and performance TODO
- Convert: warn when title+author already exists in library (preload check)
- Library: Duplicates sidebar section with grouped view and live counter
- Fix: Duplicates view cover loading now uses same canvas/two-pass pattern as renderBooksGrid
- Docs: add TODO-PERF-library-load.md with four identified bottlenecks

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 16:22:02 +01:00


Novela 2.0 - Technical Status (Develop)

Scope

This document describes the current technical status of the develop codebase. It is the primary technical reference for the current implementation.

Architecture

  • Stack: FastAPI, Jinja2 templates, plain JavaScript, PostgreSQL 16, Docker.
  • Startup lifecycle (main.py):
    1. init_pool()
    2. run_migrations()
    3. start_backup_scheduler()
    4. mount routers
  • Shutdown lifecycle:
    1. stop_backup_scheduler()
    2. close_pool()
  • Source-of-truth rule: files on disk are authoritative, the database is an index/cache.
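The startup and shutdown order above can be sketched as an async lifespan context manager. This is a minimal illustration, not the code in main.py. The six lifecycle functions are stubs that only record their names. mount_routers is a hypothetical stand-in for step 4. Whether main.py uses a lifespan handler (passed as FastAPI(lifespan=...)) or startup/shutdown events is an assumption.

```python
import asyncio
from contextlib import asynccontextmanager

calls: list[str] = []


# Hypothetical stubs standing in for the real lifecycle functions.
def init_pool() -> None: calls.append("init_pool")
def run_migrations() -> None: calls.append("run_migrations")
def start_backup_scheduler() -> None: calls.append("start_backup_scheduler")
def mount_routers() -> None: calls.append("mount_routers")
def stop_backup_scheduler() -> None: calls.append("stop_backup_scheduler")
def close_pool() -> None: calls.append("close_pool")


@asynccontextmanager
async def lifespan(app=None):
    # Startup: the pool must exist before migrations run, and the schema must be
    # current before the scheduler or any route touches the database.
    init_pool()
    run_migrations()
    start_backup_scheduler()
    mount_routers()
    try:
        yield
    finally:
        # Shutdown: stop background work first, then release DB connections.
        stop_backup_scheduler()
        close_pool()


async def main() -> None:
    async with lifespan():
        calls.append("serving")


asyncio.run(main())
print(calls)
```

Because shutdown lives in the finally block, the scheduler is stopped and the pool closed even if the serving phase raises.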

File Storage Paths

All files are stored under library/ (relative to the app working directory, mapped via Docker volume). LIBRARY_DIR = Path("library"), LIBRARY_ROOT = LIBRARY_DIR.resolve().

Path structure per format

| Format | Path pattern |
| --- | --- |
| EPUB (no series) | library/epub/{publisher}/{author}/Stories/{title}.epub |
| EPUB (series) | library/epub/{publisher}/{author}/Series/{series}/{idx:03d} - {title}.epub |
| PDF | library/pdf/{publisher}/{author}/{title}.pdf |
| CBR | library/comics/{publisher}/{author}/{title}.cbr |
| CBZ | library/comics/{publisher}/{author}/{title}.cbz |
  • Segments are sanitised: special chars stripped, max lengths applied (publisher/author 80, title 140, series 80).
  • Series index is zero-padded to 3 digits (001, 002, …) and clamped to the range 1–999.
  • Duplicate filenames get a (2), (3), … suffix.
  • After any file move, empty parent directories are pruned up to LIBRARY_ROOT.
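A minimal sketch of the duplicate-suffix and pruning rules, using only pathlib. unique_path is a hypothetical helper name. prune_empty_dirs takes the root as a parameter here so it can be tested; the app's version is described as walking up to LIBRARY_ROOT.

```python
import tempfile
from pathlib import Path


def unique_path(path: Path) -> Path:
    """Return path if free, else the first free 'name (2).ext', 'name (3).ext', ..."""
    if not path.exists():
        return path
    n = 2
    while True:
        candidate = path.with_name(f"{path.stem} ({n}){path.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def prune_empty_dirs(start: Path, root: Path) -> None:
    """Remove empty directories from start upwards, stopping at root or the first non-empty dir."""
    root = root.resolve()
    current = start.resolve()
    while current != root and root in current.parents:
        try:
            current.rmdir()  # rmdir only succeeds on an empty directory
        except OSError:
            break  # non-empty (or already gone): stop walking up
        current = current.parent


# Usage: a book is deleted and its now-empty folders are cleaned up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "library"
    stories = root / "epub" / "Pub" / "Author" / "Stories"
    stories.mkdir(parents=True)
    (root / "epub" / "keep.txt").write_text("x")
    book = stories / "Title.epub"
    book.write_text("x")
    print(unique_path(book).name)  # Title (2).epub
    book.unlink()
    prune_empty_dirs(stories, root)
    print(sorted(p.name for p in (root / "epub").iterdir()))  # ['keep.txt']
```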

Path logic

  • common.make_rel_path(media_type, publisher, author, title, series, series_index, ext) — used by import and grabber.
  • reader.py _make_rel_path(publisher, author, title, series, series_index, ext) — used by metadata PATCH; same logic, uses actual file extension.
  • Both functions produce identical paths for all formats.
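The path rules can be illustrated with a simplified make_rel_path. This is a sketch under stated assumptions. The exact set of stripped characters is not documented, so the regex is hypothetical, as is the "Unknown" fallback for empty segments. The format directory is derived from the extension (as in the reader.py variant) rather than from media_type. The result is taken to be relative to library/.

```python
import re
from pathlib import PurePosixPath

MAX_LEN = {"publisher": 80, "author": 80, "title": 140, "series": 80}
FORMAT_DIR = {".epub": "epub", ".pdf": "pdf", ".cbr": "comics", ".cbz": "comics"}


def sanitise(value, max_len: int, fallback: str = "Unknown") -> str:
    # Hypothetical rule: drop characters that are unsafe in file names,
    # collapse whitespace, then cut to the segment's maximum length.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", value or "")
    cleaned = re.sub(r"\s+", " ", cleaned).strip()[:max_len].strip()
    return cleaned or fallback


def make_rel_path(publisher, author, title, series, series_index, ext: str) -> str:
    ext = ext.lower()
    base = PurePosixPath(
        FORMAT_DIR[ext],
        sanitise(publisher, MAX_LEN["publisher"]),
        sanitise(author, MAX_LEN["author"]),
    )
    name = sanitise(title, MAX_LEN["title"])
    if ext != ".epub":
        return str(base / f"{name}{ext}")  # PDF and comics: flat per author
    if not series:
        return str(base / "Stories" / f"{name}{ext}")
    idx = min(max(int(series_index or 1), 1), 999)  # clamp to 1..999
    folder = sanitise(series, MAX_LEN["series"])
    return str(base / "Series" / folder / f"{idx:03d} - {name}{ext}")


print(make_rel_path("Pub", "Author", "Book", "Saga", 2, ".epub"))
# epub/Pub/Author/Series/Saga/002 - Book.epub
```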

Metadata save behaviour per format

| Format | File written? | DB written? |
| --- | --- | --- |
| EPUB | Yes (OPF metadata updated in place) | Yes |
| PDF | No | Yes |
| CBR | No | Yes |
| CBZ | No for tags/metadata; rating is written to ComicInfo.xml | Yes |

Router Status

routers/library.py

  • GET /library — library page
  • GET /api/library — book list JSON (fast-path by default)
  • POST /library/rescan — forced full disk rescan
  • POST /library/import — upload EPUB/PDF/CBR/CBZ
  • DELETE /library/file/{filename} — delete file + DB row + prune dirs
  • GET /download/{filename} — download file with Content-Disposition: attachment
  • GET /library/cover/{filename} — serve cover (EPUB from file; PDF/CBR from cache)
  • GET /library/cover-cached/{filename} — serve cover from DB cache only
  • POST /library/cover/{filename} — upload/replace cover (EPUB only)
  • POST /library/want-to-read/{filename} — toggle want-to-read flag
  • POST /library/archive/{filename} — toggle archived flag
  • POST /library/new/mark-reviewed — bulk set needs_review=false
  • POST /library/rating/{filename} — set/clear star rating {"rating": 0-5}
  • GET /home — home page
  • GET /api/home — home data JSON
  • GET /stats — statistics page
  • GET /api/stats — statistics data JSON
  • GET /library/list — compat alias

GET /api/library runs in fast-path mode by default: it reads from the DB only and skips the full disk rescan. To force a sync, call GET /api/library?rescan=true or POST /library/rescan. The optional include_file_info=true parameter adds file size and mtime to each entry.

/api/home returns:

  • continue_reading
  • shorts_unread
  • novels_unread
  • shorts_read
  • novels_read

/api/stats returns totals plus chart/history data for stats.html:

  • reads_by_month, reads_by_dow, reads_by_hour
  • genre_counts, publisher_counts, fav_genre, fav_publisher
  • top_books, history

Home sections exclude series books via:

  • COALESCE(series, '') = ''
  • filename NOT LIKE '%/Series/%'

Home read sections are ordered oldest-first:

  • shorts_read: ORDER BY MAX(read_at) ASC
  • novels_read: ORDER BY MAX(read_at) ASC
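The Home series exclusion and the oldest-first ordering of the read sections can be expressed in Python as follows. The (filename, series, read_at) row shape and the function names are hypothetical. The split between shorts and novels is not covered, because its criterion is not described here.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, Optional[str], datetime]  # (filename, series, read_at)


def shown_on_home(series: Optional[str], filename: str) -> bool:
    # Same test as COALESCE(series, '') = '' AND filename NOT LIKE '%/Series/%'
    return (series or "") == "" and "/Series/" not in filename


def read_section(rows: Iterable[Row]) -> List[str]:
    """Filenames of non-series books, ordered by MAX(read_at) ascending (oldest first)."""
    last_read = {}
    for filename, series, read_at in rows:
        if not shown_on_home(series, filename):
            continue
        if filename not in last_read or read_at > last_read[filename]:
            last_read[filename] = read_at
    return sorted(last_read, key=last_read.__getitem__)
```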

routers/reader.py

  • GET /library/epub/{filename} — serve EPUB inline (no attachment header)
  • GET /library/chapters/{filename} — EPUB spine as JSON
  • GET /library/chapter/{index}/{filename} — single EPUB chapter as HTML fragment
  • GET /library/chapter-img/{path}?filename=… — image extracted from EPUB ZIP; path is the full internal ZIP path (e.g. OEBPS/Images/cover.jpg or EPUB/images/cover.jpg); case-insensitive fallback for mismatched folder names
  • GET /library/pdf/{filename}?page=N&dpi=150 — render PDF page as PNG
  • GET /api/pdf/info/{filename} → {"page_count": N}
  • GET /library/cbr/{filename}/{page} — CBR/CBZ page as image
  • GET /library/progress/{filename} — read progress
  • POST /library/progress/{filename} — save progress {"cfi": "…", "progress": N}
  • DELETE /library/progress/{filename} — clear progress
  • POST /library/mark-read/{filename} — mark as read (with optional date)
  • GET /library/book/{filename} — book detail page
  • GET /api/genres — all tags from book_tags (optional ?type=genre|subgenre|tag)
  • PATCH /library/book/{filename} — update metadata + tags; moves file if path fields change; DB-only for non-EPUB
  • POST /library/rating/{filename} — set/clear 1–5 star rating; writes to EPUB OPF / CBZ ComicInfo.xml; DB-only for CBR/PDF
  • GET /library/read/{filename} — reader page (EPUB or PDF); supports ?bm_ch=N&bm_scroll=F to jump to bookmark position
  • GET /library/bookmarks/{filename} — list bookmarks for a book
  • POST /library/bookmarks/{filename} — add bookmark {chapter_index, scroll_frac, chapter_title, note}
  • PATCH /library/bookmarks/{id} — update bookmark note
  • DELETE /library/bookmarks/{id} — delete bookmark
  • GET /api/bookmarks — all bookmarks across all books (includes book_title, book_author)
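The case-insensitive fallback used by GET /library/chapter-img can be sketched with the standard zipfile module: first an exact lookup on the full internal path, then a match that ignores case. The function names are hypothetical.

```python
import io
import zipfile
from typing import Optional


def find_zip_member(zf: zipfile.ZipFile, path: str) -> Optional[str]:
    """Exact match on the full internal path first, then a case-insensitive match."""
    names = zf.namelist()
    if path in names:
        return path
    wanted = path.lower()
    for name in names:
        if name.lower() == wanted:  # e.g. OEBPS/Images vs oebps/images
            return name
    return None


def read_epub_image(epub_bytes: bytes, path: str) -> Optional[bytes]:
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        member = find_zip_member(zf, path)
        return zf.read(member) if member else None
```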

routers/editor.py

  • GET /library/editor/{filename} — EPUB chapter editor page
  • GET /api/edit/chapter/{index}/{filename} — get chapter HTML
  • POST /api/edit/chapter/{index}/{filename} — save chapter HTML
  • POST /api/edit/chapter/add/{filename} — add new chapter
  • DELETE /api/edit/chapter/{index}/{filename} — delete chapter

routers/grabber.py

  • GET /grabber — grabber page
  • GET /convert — convert page
  • GET /credentials-manager — credentials manager UI
  • GET /debug — debug page
  • POST /debug/run — run debug scrape
  • GET /credentials — list stored credentials
  • POST /credentials — save credential
  • DELETE /credentials/{site} — delete credential
  • POST /preload — preload book info from URL
  • POST /convert — run scrape + convert to EPUB
  • GET /events/{job_id} — SSE stream for job progress

routers/settings.py

  • GET /settings — settings page
  • GET /api/break-patterns — list chapter-break patterns
  • POST /api/break-patterns — add break pattern (type: regex or css_class)
  • PATCH /api/break-patterns/{id} — update pattern (enable/disable or change value)
  • DELETE /api/break-patterns/{id} — delete pattern
  • DELETE /api/reading-history — wipe all reading sessions

routers/builder.py

  • GET /builder — Book Builder index (draft list + new draft form)
  • POST /builder — create new draft; redirects to /builder/{id}
  • GET /builder/{draft_id} — draft editor page
  • DELETE /api/builder/{draft_id} — delete draft
  • GET /api/builder/{draft_id} — draft JSON (id, title, author, publisher, source_url, chapters)
  • POST /api/builder/{draft_id}/chapter — add chapter {title, after_index}; returns {index, count}
  • PUT /api/builder/{draft_id}/chapter/{idx} — save chapter {title?, content?}
  • DELETE /api/builder/{draft_id}/chapter/{idx} — delete chapter; returns {index, count}
  • POST /api/builder/{draft_id}/normalize/{idx} — normalize chapter HTML (preview only, does not save); returns {content}
  • POST /api/builder/{draft_id}/publish — normalize all chapters → build_epub() → write to library/epub/ → upsert_book() → delete draft; returns {filename}; redirects browser to /library/book/{filename}

Publish flow: all chapters are run through normalize_wysiwyg_html(), then build_epub() produces an EPUB 2.0 ZIP. The file path is computed via make_rel_path(media_type="epub", …). The book is inserted into the library with needs_review=True. The draft is deleted on success.
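The ZIP container produced in the publish flow has to follow the EPUB OCF packaging rules. The mimetype entry comes first and is stored uncompressed, and META-INF/container.xml points at the OPF. The sketch below shows only that packaging step. package_epub, the OEBPS/ layout and passing in a prebuilt OPF string are assumptions, not the signature of build_epub().

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_epub(opf: str, files: dict) -> bytes:
    """Write an EPUB ZIP: mimetype (stored) first, then container.xml, OPF and content."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF rule: 'mimetype' must be the first entry and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf, compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():  # chapters, stylesheets, images
            zf.writestr(f"OEBPS/{name}", data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```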

routers/backup.py

  • GET /backup — backup page
  • GET /api/backup/credentials — Dropbox settings (includes app_key_configured flag)
  • POST /api/backup/credentials — save Dropbox settings
  • DELETE /api/backup/credentials — remove all Dropbox credentials
  • POST /api/backup/oauth/prepare — save app key + secret, return Dropbox auth URL
  • POST /api/backup/oauth/exchange — exchange authorization code for refresh token
  • GET /api/backup/health — Dropbox connectivity check (includes schedule_enabled, schedule_interval_hours)
  • GET /api/backup/status — current backup status
  • GET /api/backup/history — backup run history (last 20)
  • GET /api/backup/progress — live progress of running backup {running, done, total, phase}
  • POST /api/backup/run — trigger backup (background task)

Backup & Security

  • Dropbox token (refresh token or legacy access token) stored encrypted in credentials (site='dropbox').
  • Dropbox app key stored encrypted in credentials (site='dropbox_app_key').
  • Dropbox app secret stored encrypted in credentials (site='dropbox_app_secret').
  • Dropbox backup root stored encrypted in credentials (site='dropbox_backup_root').
  • Retention (snapshots to keep) stored encrypted in credentials (site='dropbox_backup_retention').
  • Backup schedule (enabled + interval_hours) stored encrypted in credentials (site='dropbox_backup_schedule').
  • Encryption uses NOVELA_MASTER_KEY (Fernet).

Dropbox authentication

  • Preferred: OAuth2 refresh token (does not expire). Set up via the two-step flow on /backup:
    1. Enter App Key + App Secret → click Generate Auth URL
    2. Approve in browser → paste the code → click Save & Activate
    • _dbx() uses oauth2_refresh_token + app_key + app_secret for automatic token renewal.
  • Fallback: legacy short-lived access token (backwards compatible; works without app key/secret).

Implementation details

  • Versioned backups with deduplication:
    • file objects in Dropbox: library_objects/{sha256_prefix}/{sha256}
    • snapshots in Dropbox: library_snapshots/snapshot-YYYYMMDD-HHMMSS.json
  • Each run creates a new snapshot version and uploads only missing objects.
  • Retention removes older snapshots above the configured limit.
  • Orphan object pruning removes objects no longer referenced by retained snapshots.
  • Local manifest cache (config/backup_manifest.json) speeds up change detection.
  • Database backup is done via pg_dump to Dropbox postgres/.
  • POST /api/backup/run always starts a background task and returns immediately.
  • GET /api/backup/progress returns in-memory progress, updated per file; phases: starting → scanning → uploading → snapshot → pg_dump.
  • Scheduler runs in the background (start_backup_scheduler) and triggers on interval when enabled.
  • Concurrency guard: only one backup can run at a time.
  • After container restart/crash, stale running logs are auto-marked as interrupted/error.
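The deduplication and retention rules above can be sketched with hashlib alone. Assumptions: sha256_prefix is taken to be the first two hex characters of the digest, a snapshot is modelled as a mapping from library path to object key, and all function names are hypothetical. The real code talks to Dropbox rather than to in-memory sets.

```python
import hashlib
from datetime import datetime


def object_key(data: bytes, prefix_len: int = 2) -> str:
    # Content-addressed key: identical files map to the same object.
    digest = hashlib.sha256(data).hexdigest()
    return f"library_objects/{digest[:prefix_len]}/{digest}"


def snapshot_name(ts: datetime) -> str:
    return f"library_snapshots/snapshot-{ts:%Y%m%d-%H%M%S}.json"


def plan_upload(files: dict, existing_keys: set):
    """Build a snapshot manifest (path -> key) and list only the objects not yet uploaded."""
    manifest = {path: object_key(data) for path, data in files.items()}
    missing = sorted(set(manifest.values()) - existing_keys)
    return manifest, missing


def apply_retention(snapshots: dict, all_object_keys: set, keep: int):
    """Return (snapshots to delete, orphaned object keys) when keeping the newest `keep`."""
    ordered = sorted(snapshots)  # timestamped names sort chronologically
    cut = max(len(ordered) - keep, 0)
    doomed, retained = ordered[:cut], ordered[cut:]
    referenced = {key for name in retained for key in snapshots[name].values()}
    return doomed, all_object_keys - referenced
```

Keying objects by content hash is what makes "upload only missing objects" cheap: an unchanged file produces the same key in every snapshot.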

Environment

stack/novela.env should include at least:

  • POSTGRES_DB
  • POSTGRES_USER
  • POSTGRES_PASSWORD
  • NOVELA_MASTER_KEY
  • CONFIG_DIR

Dropbox settings are managed via the web UI on /backup.
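A hypothetical startup check for the variables above. The master-key test relies on the documented Fernet key format (32 bytes, URL-safe base64 encoded). Whether Novela itself validates these values at startup is not stated in this document.

```python
import base64
import binascii
import os

REQUIRED_ENV = ("POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD",
                "NOVELA_MASTER_KEY", "CONFIG_DIR")


def missing_env(env=None) -> list:
    """Names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def valid_master_key(key: str) -> bool:
    """A Fernet key is 32 bytes encoded as URL-safe base64 (44 characters)."""
    try:
        return len(base64.urlsafe_b64decode(key.encode("ascii"))) == 32
    except (binascii.Error, UnicodeEncodeError):
        return False
```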


UI Notes

  • Library import accepts EPUB/PDF/CBR/CBZ.
  • Home supports the same import formats.
  • Home includes search.
  • Home header/dropzone alignment matches Library (search top-right, dropzone below).
  • New view supports Grid and List mode.
    • Bulk selection + Remove from New works only in List mode.
    • List mode has a column visibility filter: Publisher, Author, Series, Volume, Title, Has cover, Updated, Genres, Sub-genres, Tags, Status.
    • List mode supports multi-select with Shift+click range selection on checkboxes.
    • Grid mode shows no selection checkboxes or bulk actions.
  • All books view supports Grid and List mode (same columns as New).
    • View mode persisted in localStorage as novela.all.viewMode.
    • Column visibility persisted in localStorage as novela.all.visibleColumns.
    • List mode has a checkbox column, column visibility filter, and multi-select with Shift+click range selection.
    • List mode has a Delete selected bulk action: confirms then calls DELETE /library/file/{filename} for each selected book.
  • Star ratings (1–5) shown under the cover in all grid views:
    • Display-only in grid cards (no click, prevents accidental taps while scrolling).
    • Interactive in Book Detail (1.1rem, clickable; clicking the active star clears the rating).
    • Amber: filled #c8a03a, unfilled rgba(200, 160, 58, 0.25).
  • Reader settings (hamburger menu):
    • Content width slider (30–100 vw), persisted as reader-content-width-pct.
    • Text colour: 5 warm-tone presets from #e8e2d9 to #938d86, persisted as reader-text-colour.
    • Hamburger and back-link separated with margin-left: 1rem on .header-back.
  • Reader supports EPUB and PDF:
    • EPUB: chapter-text rendering; progress = {chapterIndex}:{scrollFrac}; progress % = (chapterIndex + scrollFrac) / total * 100.
    • PDF: page-image rendering via /library/pdf/{filename}?page=N; page count from /api/pdf/info/{filename}; progress = {pageIndex}:0; keyboard/button navigation identical.
    • reader.html branches on FORMAT variable injected by the server.
  • Edit EPUB button in Book Detail is only shown for .epub files.
  • Backup page supports: manual run, dry-run, Dropbox root, retention count, schedule (on/off + hours), status + history.
  • Bookmarks: saved per book via POST /library/bookmarks/{filename}; shown in Library sidebar section; navigated via ?bm_ch=N&bm_scroll=F URL params on reader page.
  • Convert page: after loading metadata, if a book with the same title+author already exists in the library, a warning banner is shown (with a link to the existing book); user can still proceed with conversion. Check is done server-side in /preload response (already_exists, existing_books).
  • Duplicates view (#duplicates): groups non-archived books by (title, author) (case-insensitive); shows only groups with ≥ 2 copies; counter in sidebar shows total number of duplicate books. Detection is entirely client-side from the existing library data.
  • Book Builder (/builder): create EPUB books from scratch; drafts stored in builder_drafts (JSONB chapters); contenteditable editor with toolbar (bold/italic/underline/blockquote/author-note/scene-break/normalize); autosave every 30 s + Ctrl+S; publish normalizes HTML via normalize_wysiwyg_html() and builds EPUB via build_epub().
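Two pieces of client-side logic from the list above are sketched here in Python for illustration: the Duplicates grouping and the reader progress percentage. The real code runs in the browser. Assumptions: books are dicts with title, author and archived keys, case-insensitivity uses lower() (mirroring a JavaScript toLowerCase()), and the sidebar counter counts every book that belongs to a duplicate group.

```python
from collections import defaultdict


def find_duplicates(books: list):
    """Group non-archived books by case-insensitive (title, author); keep groups with 2+ copies."""
    groups = defaultdict(list)
    for book in books:
        if book.get("archived"):
            continue
        key = ((book.get("title") or "").lower(), (book.get("author") or "").lower())
        groups[key].append(book)
    dupes = {key: items for key, items in groups.items() if len(items) >= 2}
    counter = sum(len(items) for items in dupes.values())  # books across all duplicate groups
    return dupes, counter


def progress_percent(progress: str, total: int) -> float:
    """EPUB '{chapterIndex}:{scrollFrac}' or PDF '{pageIndex}:0' -> percentage read."""
    index, frac = progress.split(":")
    return (int(index) + float(frac)) / total * 100
```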

Known Conventions

  • Book deletion flow: unlink file → prune_empty_dirs(parent) → DELETE FROM library (cascade removes child rows).
  • Empty dir pruning: prune_empty_dirs(start) walks up from start to LIBRARY_ROOT, removing each dir if empty; stops at first non-empty dir.
  • Cover strategy:
    • EPUB: GET /library/cover/{filename} checks library_cover_cache first; on miss, extracts from ZIP and warms the cache. Cover upload (POST /library/cover/{filename}) replaces the image inside the EPUB ZIP (OPF located via META-INF/container.xml, old cover found in manifest and removed) and updates the cache so subsequent requests return the new cover immediately.
    • PDF: first page rendered as thumbnail, cached
    • CBR/CBZ: first page extracted, cached
  • Rating storage:
    • EPUB: <meta name="novela:rating" content="N"/> in OPF
    • CBZ: <NovelaRating>N</NovelaRating> in ComicInfo.xml inside the ZIP
    • CBR/PDF: DB only
    • upsert_book uses CASE WHEN EXCLUDED.rating > 0 THEN EXCLUDED.rating ELSE library.rating END: a rating read from the file (> 0) is restored into the DB, while a file without a rating (0) leaves the existing DB value untouched.
  • Tag types in book_tags: genre, subgenre, tag, subject. No direct genres/subgenres fields on book objects; always use helpers bookGenres(), bookSubgenres(), bookPlainTags().
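A sketch of the CBZ rating write and of the rating merge performed by upsert_book, using only zipfile and xml.etree.ElementTree. A ZIP entry cannot be replaced in place, so the archive is copied with a fresh ComicInfo.xml. The function names, creating ComicInfo.xml when it is missing, and writing 0 for a cleared rating are assumptions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_cbz_rating(cbz_bytes: bytes) -> int:
    with zipfile.ZipFile(io.BytesIO(cbz_bytes)) as zf:
        if "ComicInfo.xml" not in zf.namelist():
            return 0
        node = ET.fromstring(zf.read("ComicInfo.xml")).find("NovelaRating")
    return int(node.text) if node is not None and node.text else 0


def set_cbz_rating(cbz_bytes: bytes, rating: int) -> bytes:
    src = zipfile.ZipFile(io.BytesIO(cbz_bytes))
    if "ComicInfo.xml" in src.namelist():
        root = ET.fromstring(src.read("ComicInfo.xml"))
    else:
        root = ET.Element("ComicInfo")  # assumption: create it if absent
    node = root.find("NovelaRating")
    if node is None:
        node = ET.SubElement(root, "NovelaRating")
    node.text = str(rating)
    out = io.BytesIO()
    # Copy every other entry unchanged, then append the updated ComicInfo.xml.
    with src, zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename != "ComicInfo.xml":
                dst.writestr(info, src.read(info.filename))
        dst.writestr("ComicInfo.xml", ET.tostring(root, encoding="utf-8", xml_declaration=True))
    return out.getvalue()


def merge_rating(db_rating: int, file_rating: int) -> int:
    # Python equivalent of: CASE WHEN EXCLUDED.rating > 0 THEN EXCLUDED.rating ELSE library.rating END
    return file_rating if file_rating > 0 else db_rating
```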

Performance Notes

  • Library load is optimized for large datasets:
    • list_library_json() uses pre-aggregation for reading_sessions.
    • has_cached_cover is computed directly via an SQL join instead of fetching the full cover cache.
  • Additional migration indexes:
    • idx_library_sort_coalesce
    • idx_library_needs_review
    • idx_library_archived
    • idx_reading_sessions_filename_readat
    • idx_book_tags_filename_tag
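The pre-aggregation and the has_cached_cover join can be demonstrated with an in-memory SQLite database standing in for PostgreSQL. The table names and the index name come from this document. The column names (title, read_count, last_read_at, image) and the exact query are assumptions, not the text of list_library_json().

```python
import sqlite3

SCHEMA = """
CREATE TABLE library (filename TEXT PRIMARY KEY, title TEXT);
CREATE TABLE reading_sessions (filename TEXT, read_at TEXT);
CREATE TABLE library_cover_cache (filename TEXT PRIMARY KEY, image BLOB);
CREATE INDEX idx_reading_sessions_filename_readat ON reading_sessions (filename, read_at);
"""

LIST_QUERY = """
WITH rs AS (  -- aggregate reading_sessions once, instead of once per book
    SELECT filename, COUNT(*) AS read_count, MAX(read_at) AS last_read_at
    FROM reading_sessions
    GROUP BY filename
)
SELECT l.filename,
       COALESCE(rs.read_count, 0) AS read_count,
       rs.last_read_at,
       c.filename IS NOT NULL AS has_cached_cover  -- flag only; the image is not fetched
FROM library l
LEFT JOIN rs ON rs.filename = l.filename
LEFT JOIN library_cover_cache c ON c.filename = l.filename
ORDER BY l.filename
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO library VALUES (?, ?)", [("a.epub", "A"), ("b.pdf", "B")])
conn.executemany("INSERT INTO reading_sessions VALUES (?, ?)",
                 [("a.epub", "2026-01-01"), ("a.epub", "2026-02-01")])
conn.execute("INSERT INTO library_cover_cache VALUES (?, ?)", ("b.pdf", b"png"))
rows = conn.execute(LIST_QUERY).fetchall()
print(rows)  # [('a.epub', 2, '2026-02-01', 0), ('b.pdf', 0, None, 1)]
```

In PostgreSQL the has_cached_cover column would come back as a boolean rather than 0/1.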

Known Bugs Fixed

  • renderGenreView and renderSearchResults in library.js referenced b.genres (non-existent). Fixed: use bookGenres(), bookSubgenres(), bookPlainTags().
  • PillInput in book.js did not handle comma as delimiter and did not flush on save. Fixed: comma keydown + flush() in saveEdit().
  • PATCH /library/book failed for PDFs: _sync_epub_metadata tried to open PDF as ZIP. Fixed: only called for .epub.
  • _make_rel_path in reader.py lacked format prefix (epub/, pdf/, comics/). Fixed: aligned with common.make_rel_path.
  • common.make_rel_path always generated a .cbr extension for CBZ files (both map to media_type="cbr"). Fixed: it accepts an optional ext parameter, and the library.py import now passes the actual suffix.
  • /download/{filename} was referenced in book.html but no endpoint existed (404). Fixed: added GET /download/{filename} to library.py.
  • PDF reader showed infinite loading: reader.html called EPUB-only /library/chapters/. Fixed: PDF path uses /api/pdf/info/ + page-image rendering.
  • Empty dir pruning only ran when the file was moved. Fixed: prune_empty_dirs(old_path.parent) now always runs after a successful metadata save.