novela/docs/TECHNICAL.md
2026-04-15 21:39:20 +02:00

# Novela 2.0 - Technical Status (Develop)
## Scope
This document describes the current technical status of the `develop` codebase.
It is the primary technical reference for the current implementation.
## Architecture
- Stack: FastAPI, Jinja2 templates, plain JavaScript, PostgreSQL 16, Docker.
- All routers import `templates` from `shared_templates.py` (a single `Jinja2Templates` instance). This module registers a `develop_mode()` callable as a Jinja2 global, making it available in every template without passing it explicitly per route.
- Startup lifecycle (`main.py`):
  1. `init_pool()`
  2. `run_migrations()`
  3. `start_backup_scheduler()`
  4. mount routers
- Shutdown lifecycle:
  1. `stop_backup_scheduler()`
  2. `close_pool()`
- Source-of-truth rule: files on disk are authoritative, the database is an index/cache.
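
The startup and shutdown order above can be sketched as a single async context manager. This is a minimal illustration only: the hook bodies are stand-ins, whether the real hooks are sync or async is an assumption, and `main.py` may wire them differently (for example via FastAPI's `lifespan` parameter or event handlers).

```python
import asyncio
from contextlib import asynccontextmanager

calls = []  # records the order in which the hooks run

# Hypothetical stand-ins for the real hooks named in the lists above.
async def init_pool(): calls.append("init_pool")
async def run_migrations(): calls.append("run_migrations")
def start_backup_scheduler(): calls.append("start_backup_scheduler")
def stop_backup_scheduler(): calls.append("stop_backup_scheduler")
async def close_pool(): calls.append("close_pool")

@asynccontextmanager
async def lifespan(app):
    # Startup: pool first, then migrations, then the scheduler.
    await init_pool()
    await run_migrations()
    start_backup_scheduler()
    # Routers would be mounted here in the real application.
    yield
    # Shutdown: stop the scheduler before closing the pool it may use.
    stop_backup_scheduler()
    await close_pool()

async def demo():
    async with lifespan(app=None):
        calls.append("serving")

asyncio.run(demo())
```

Stopping the scheduler before closing the pool mirrors the listed shutdown order, so a backup job is not left holding a closed connection.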
## File Storage Paths
All files are stored under `library/` (relative to the app working directory, mapped via Docker volume).
`LIBRARY_DIR = Path("library")`, `LIBRARY_ROOT = LIBRARY_DIR.resolve()`.
### Path structure per format
| Format | Path pattern |
|--------|-------------|
| EPUB (no series) | `library/epub/{publisher}/{author}/Stories/{title}.epub` |
| EPUB (series) | `library/epub/{publisher}/{author}/Series/{series}/{idx:03d}_-_{title}.epub` |
| PDF | `library/pdf/{publisher}/{author}/{title}.pdf` |
| CBR (no series) | `library/comics/{publisher}/{author}/{title}.cbr` |
| CBR (series) | `library/comics/{publisher}/{author}/Series/{series}/{idx:03d}_-_{title}.cbr` |
| CBZ (no series) | `library/comics/{publisher}/{author}/{title}.cbz` |
| CBZ (series) | `library/comics/{publisher}/{author}/Series/{series}/{idx:03d}_-_{title}.cbz` |
- Segments are sanitised: special chars stripped, spaces replaced with `_`, max lengths applied (publisher/author 80, title 140, series 80).
- Series index is zero-padded to 3 digits (`001`, `002`, …), clamped to 1–999.
- Duplicate filenames get a `(2)`, `(3)`, … suffix.
- After any file move, empty parent directories are pruned up to `LIBRARY_ROOT`.
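
The segment rules above could look roughly like the following. This is a sketch under assumptions: the exact set of characters treated as "special", the separator before the `(2)` suffix, and all helper names are hypothetical, and the clamp range of 1 to 999 is one reading of the rule above.

```python
import re

def sanitise_segment(value: str, max_len: int) -> str:
    """Strip special characters, replace spaces with '_', apply a max length."""
    # Assumption: keep word characters, whitespace, '.' and '-'.
    cleaned = re.sub(r"[^\w\s.-]", "", value).strip()
    cleaned = re.sub(r"\s+", "_", cleaned)
    return cleaned[:max_len]

def series_prefix(index: int) -> str:
    """Zero-pad the series index to 3 digits, clamped to the range 1..999."""
    return f"{min(max(index, 1), 999):03d}"

def dedupe_name(stem: str, ext: str, taken: set) -> str:
    """Append (2), (3), ... until the name is not already taken."""
    candidate = f"{stem}{ext}"
    n = 2
    while candidate in taken:
        # Assumption: a single space separates the stem and the counter.
        candidate = f"{stem} ({n}){ext}"
        n += 1
    return candidate

title = sanitise_segment("Hello, World!", 140)
series_name = f"{series_prefix(7)}_-_{title}.epub"
```

With these helpers, `series_name` follows the `{idx:03d}_-_{title}.epub` pattern from the table above.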
### Path logic
- `common.make_rel_path(media_type, publisher, author, title, series, series_index, series_suffix, ext)` — used by import and grabber.
- `reader.py _make_rel_path(publisher, author, title, series, series_index, series_suffix, ext)` — used by metadata PATCH; same logic, uses actual file extension.
- `series_volume` is not part of the file path; it is stored in DB and OPF only.
- Both functions produce identical paths for all formats.
### Metadata save behaviour per format
| Format | File written? | DB written? |
|--------|--------------|-------------|
| EPUB | Yes — OPF metadata updated in-place | Yes |
| PDF | No | Yes |
| CBR | No | Yes |
| CBZ | No (tags/metadata); rating written to ComicInfo.xml | Yes |
---
## Router Status
### `routers/library.py`
- `GET /library` — library page
- `GET /api/library` — book list JSON (fast-path by default)
- `POST /library/rescan` — forced full disk rescan
- `POST /library/import` — upload EPUB/PDF/CBR/CBZ
- `DELETE /library/file/{filename}` — delete file + DB row + prune dirs
- `GET /download/{filename}` — download file with `Content-Disposition: attachment`
- `GET /library/cover/{filename}` — serve cover (EPUB from file; PDF/CBR from cache)
- `GET /library/cover-cached/{filename}` — serve cover from DB cache only
- `POST /library/cover/{filename}` — upload/replace cover; for EPUB files: embeds cover in the EPUB and updates cache; for DB-stored books: stores cover directly in `library_cover_cache` and sets `has_cover = TRUE`
- `POST /library/want-to-read/{filename}` — toggle want-to-read flag
- `POST /library/archive/{filename}` — toggle archived flag
- `POST /library/archive-series` — set `archived` for all books in a series; body: `{"series": "…", "archive": true|false}`; returns `{ok, archived, count}`
- `POST /library/new/mark-reviewed` — bulk set `needs_review=false`
- `POST /library/bulk-delete` — delete multiple files; accepts `{"filenames": [...]}`, removes files from disk and DB in one query per batch; returns `{ok, deleted, skipped}`
- `POST /library/rating/{filename}` — set/clear star rating `{"rating": 0-5}`
- `GET /home` — home page
- `GET /api/home` — home data JSON
- `GET /stats` — statistics page
- `GET /api/stats` — statistics data JSON
- `GET /api/disk` — partition usage for the library directory: `{total, used, free, pct_used}`
- `POST /api/bulk-check-duplicates` — accepts `{"items": [{title, author, series, volume}, ...]}`, returns `{"duplicates": [bool, ...]}`; checks by title+author+series_index, with a fallback check by series+author+series_index (catches duplicates whose title format has changed); when volume is absent, matches on title+author only
- `GET /library/list` — compat alias
`GET /api/library` runs in fast-path mode by default (DB-only, no full disk rescan).
For a forced sync: `GET /api/library?rescan=true` or `POST /library/rescan`.
`include_file_info=true` is optional for file size/mtime enrichment.
ETag caching: response includes `ETag: "{count}-{max_updated_at_unix}"` and `Cache-Control: no-cache`. Client sends `If-None-Match`; server returns `304 Not Modified` when nothing changed.
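
A minimal sketch of the ETag comparison described above, assuming the count and the latest `updated_at` value have already been read from the database. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

def make_etag(count: int, max_updated_at: datetime) -> str:
    """Build the quoted ETag value "{count}-{max_updated_at_unix}"."""
    return f'"{count}-{int(max_updated_at.timestamp())}"'

def is_not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """True when the client's cached copy is still current (respond with 304)."""
    return if_none_match is not None and if_none_match.strip() == etag

etag = make_etag(3, datetime.fromtimestamp(1700000000, tz=timezone.utc))
```

Because the tag combines the row count with the newest timestamp, both an added or deleted book and an edited book produce a different value.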
`/api/home` returns:
- `continue_reading`
- `shorts_unread`
- `novels_unread`
- `shorts_read`
- `novels_read`
`/api/stats` returns totals plus chart/history data for `stats.html`:
- `reads_by_month`, `reads_by_dow`, `reads_by_hour`
- `genre_counts`, `publisher_counts`, `fav_genre`, `fav_publisher`
- `top_books`, `history`
Home sections exclude series books via:
- `COALESCE(series, '') = ''`
- `filename NOT LIKE '%/Series/%'`
Home read sections are ordered oldest-first:
- `shorts_read`: `ORDER BY MAX(read_at) ASC`
- `novels_read`: `ORDER BY MAX(read_at) ASC`
### `routers/reader.py`
- `GET /library/db-images/{path:path}` — serve image from content-addressed imagestore (`library/images/`); security: path must be under `IMAGES_DIR`
- `POST /api/library/convert-to-db/{filename:path}` — convert on-disk EPUB to a DB-stored book; extracts chapters via `_epub_body_inner` (stores images in imagestore, rewrites src to `/library/db-images/…`), migrates all child tables (INSERT new library row → UPDATE children → DELETE old row), deletes EPUB file; returns `{ok, new_filename}`
- `GET /api/library/export-epub/{filename:path}` — build and stream an EPUB from a DB-stored book; `_rewrite_db_images_for_epub` rewrites `/library/db-images/…` back to `OEBPS/Images/…` paths (dedup by sha256); returns as `Content-Disposition: attachment`
- `GET /library/epub/{filename}` — serve EPUB inline (no attachment header)
- `GET /library/chapters/{filename}` — EPUB spine as JSON; for `storage_type='db'` books returns chapters from `book_chapters`
- `GET /library/chapter/{index}/{filename}` — single chapter as HTML fragment; for `storage_type='db'` books reads from `book_chapters`
- `GET /library/chapter-img/{path}?filename=…` — image extracted from EPUB ZIP; `path` is the full internal ZIP path (e.g. `OEBPS/Images/cover.jpg` or `EPUB/images/cover.jpg`); case-insensitive fallback for mismatched folder names
- `GET /library/pdf/{filename}?page=N&dpi=150` — render PDF page as PNG
- `GET /api/pdf/info/{filename}` → `{"page_count": N}`
- `GET /library/cbr/{filename}/{page}` — CBR/CBZ page as image
- `GET /library/progress/{filename}` — read progress
- `POST /library/progress/{filename}` — save progress `{"cfi": "…", "progress": N}`
- `DELETE /library/progress/{filename}` — clear progress
- `POST /library/mark-read/{filename}` — mark as read (with optional date)
- `GET /library/book/{filename}` — book detail page
- `GET /api/genres` — all tags from `book_tags` (optional `?type=genre|subgenre|tag`)
- `PATCH /library/book/{filename}` — update metadata + tags; moves file if path fields change; DB-only for non-EPUB; for `storage_type='db'` books: recomputes synthetic `db/…` filename, FK-safe rename (INSERT→UPDATE children→DELETE old), updates `book_chapters` + `bookmarks` as well
- `POST /library/rating/{filename}` — set/clear 1–5 star rating; writes to EPUB OPF / CBZ ComicInfo.xml; DB-only for CBR/PDF
- `GET /library/read/{filename}` — reader page (EPUB or PDF); supports `?bm_ch=N&bm_scroll=F` to jump to bookmark position
- `GET /api/series-nav/{filename}` — returns `{prev, next}` (`{filename, title, index, suffix}` or `null`) for the adjacent books in the same series ordered by `series_index ASC, series_suffix ASC`; used by the reader for series navigation buttons and `markRead()` redirect
- `GET /library/bookmarks/{filename}` — list bookmarks for a book
- `POST /library/bookmarks/{filename}` — add bookmark `{chapter_index, scroll_frac, chapter_title, note}`
- `PATCH /library/bookmarks/{id}` — update bookmark note
- `DELETE /library/bookmarks/{id}` — delete bookmark
- `GET /api/bookmarks` — all bookmarks across all books (includes `book_title`, `book_author`)
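
The containment check noted for `GET /library/db-images/{path:path}` ("path must be under `IMAGES_DIR`") could be sketched as below. This is an illustration, not the router's actual code; `resolve_image_path` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

IMAGES_DIR = Path("library/images").resolve()

def resolve_image_path(requested: str) -> Optional[Path]:
    """Return the resolved path only if it stays inside IMAGES_DIR."""
    # resolve() collapses '..' segments; an absolute `requested` replaces the base.
    candidate = (IMAGES_DIR / requested).resolve()
    if IMAGES_DIR in candidate.parents:
        return candidate
    return None  # caller would answer with 404 or 403
```

Resolving before comparing is what defeats `../` traversal; comparing raw strings would not.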
### `routers/bulk_import.py`
- `GET /bulk-import` — Bulk Import page
- `POST /library/bulk-import` — import files with pre-parsed metadata; accepts multipart `files[]`, `rows` (JSON array of per-file metadata), `shared` (JSON with author/publisher/status/genres/tags applied to all files)
Filename parsing is done client-side in `bulk_import.html`. The page uses a free-text `%placeholder%` pattern (e.g. `%series% - %series_volume% - %volume% - %title% - %year%`). Available placeholders: `%series%` `%series_volume%` `%volume%` `%title%` `%year%` `%month%` `%day%` `%author%` `%publisher%` `%ignore%`. Colored chips can be clicked (insert at cursor) or dragged onto the input. Pattern is converted to a regex at parse time. Shared metadata fields (including "Year/Vol." for `series_volume`) override filename-parsed values. "Auto-generate titles" checkbox fills empty title cells as `Series (Year/Vol) #Number`. Skip checkbox is always visible for every row; skipped rows are excluded from import. Files are uploaded in batches of 5 with a progress bar.
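
The pattern-to-regex conversion happens in JavaScript inside `bulk_import.html`; the Python sketch below only illustrates the idea. Mapping each placeholder to a lazy named group and anchoring the whole pattern are assumptions, not the page's actual rules.

```python
import re

PLACEHOLDERS = {
    "series", "series_volume", "volume", "title", "year",
    "month", "day", "author", "publisher", "ignore",
}

def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Turn '%series% - %volume% - %title%' into an anchored regex."""
    # Splitting on a capturing group yields literal, name, literal, name, ...
    parts = re.split(r"%(\w+)%", pattern)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(re.escape(part))          # literal text between placeholders
        elif part == "ignore":
            out.append(r".+?")                   # matched but not captured
        elif part in PLACEHOLDERS:
            out.append(rf"(?P<{part}>.+?)")      # captured under its own name
        else:
            raise ValueError(f"unknown placeholder: %{part}%")
    return re.compile("^" + "".join(out) + "$")

rx = pattern_to_regex("%series% - %volume% - %title%")
parsed = rx.match("Saga - 003 - The Beginning").groupdict()
```

Here `parsed` holds one entry per placeholder, which corresponds to the per-file `rows` metadata sent to `POST /library/bulk-import`.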
### `routers/editor.py`
- `GET /library/editor/{filename}` — chapter editor page; supports both EPUB files and DB-stored books (`db/…` filenames); passes `is_db` flag to template; DB branch queries `library` table directly (no file check)
- `GET /api/edit/chapter/{index}/{filename}` — get chapter content; DB branch reads from `book_chapters` and returns `{index, href, title, content}`
- `POST /api/edit/chapter/{index}/{filename}` — save chapter; DB branch accepts `{content, title}`, calls `upsert_chapter` (updates `content_tsv` too)
- `POST /api/edit/chapter/add/{filename}` — add new chapter after `after_index`; DB branch shifts `chapter_index` up via `UPDATE … SET chapter_index = chapter_index + 1 WHERE chapter_index >= insert_idx` then inserts
- `DELETE /api/edit/chapter/{index}/{filename}` — delete chapter; DB branch deletes and re-indexes via `UPDATE … SET chapter_index = chapter_index - 1 WHERE chapter_index > index`
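
The re-indexing used by the DB branch of the add and delete endpoints can be tried out with an in-memory SQLite table. SQLite stands in for PostgreSQL here, and the column set (`filename`, `chapter_index`, `title`) is a simplified assumption about `book_chapters`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book_chapters (filename TEXT, chapter_index INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO book_chapters VALUES ('db/x', ?, ?)",
    [(0, "A"), (1, "B"), (2, "C")],
)

def add_chapter(conn, filename, after_index, title):
    insert_idx = after_index + 1
    # Make room: shift every chapter at or after the insert position up by one.
    conn.execute(
        "UPDATE book_chapters SET chapter_index = chapter_index + 1 "
        "WHERE filename = ? AND chapter_index >= ?",
        (filename, insert_idx),
    )
    conn.execute(
        "INSERT INTO book_chapters VALUES (?, ?, ?)", (filename, insert_idx, title)
    )

def delete_chapter(conn, filename, index):
    conn.execute(
        "DELETE FROM book_chapters WHERE filename = ? AND chapter_index = ?",
        (filename, index),
    )
    # Close the gap: shift every later chapter down by one.
    conn.execute(
        "UPDATE book_chapters SET chapter_index = chapter_index - 1 "
        "WHERE filename = ? AND chapter_index > ?",
        (filename, index),
    )

def chapters(conn, filename):
    rows = conn.execute(
        "SELECT chapter_index, title FROM book_chapters "
        "WHERE filename = ? ORDER BY chapter_index",
        (filename,),
    )
    return list(rows)
```

Both operations keep `chapter_index` dense (0, 1, 2, …), which the reader endpoints rely on when they address chapters by index.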
### `routers/grabber.py`
- `GET /grabber` — grabber page
- `GET /convert` — convert page
- `GET /credentials-manager` — credentials manager UI
- `GET /debug` — debug page
- `POST /debug/run` — run debug scrape
- `GET /credentials` — list stored credentials
- `POST /credentials` — save credential
- `DELETE /credentials/{site}` — delete credential
- `POST /preload` — preload book info from URL
- `POST /convert` — run scrape; body may include `storage_mode: "db"` (default) or `"epub"` to control output format
- `GET /events/{job_id}` — SSE stream for job progress; `done` event includes `storage_type` (`'db'` or `'file'`)
Scrape/convert flow (DB storage — default):
1. Fetch book info + chapters via scraper
2. Per chapter: download images → write to `library/images/{sha2}/{sha256}{ext}` (content-addressed) → rewrite `img[src]` to `/library/db-images/...`; break images replaced with `<hr>` before `element_to_xhtml` runs → build `content_html` via `element_to_xhtml` with `break_img_path="/static/break.png"`
3. One DB transaction: `ensure_unique_db_filename` → `upsert_book` (storage_type='db') → `upsert_chapter` for each chapter → `upsert_cover_cache` if cover provided
4. Synthetic filename: `db/{publisher}/{author}/{title}` (or `db/{pub}/{auth}/Series/{series}/{idx} - {title}` for series)
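
Steps 2 and 4 can be illustrated with two small helpers. The sketch assumes that `{sha2}` means the first two hex characters of the SHA-256 digest and that `{idx}` is inserted as given; neither is confirmed here, and the helper names are hypothetical.

```python
import hashlib
from typing import Optional

def image_store_path(data: bytes, ext: str) -> str:
    """Content-addressed location: identical bytes always map to the same file."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumption: the shard directory is the first two hex characters.
    return f"library/images/{digest[:2]}/{digest}{ext}"

def synthetic_filename(publisher: str, author: str, title: str,
                       series: Optional[str] = None,
                       idx: Optional[int] = None) -> str:
    """Build the synthetic db/... key used instead of a real file path."""
    if series:
        return f"db/{publisher}/{author}/Series/{series}/{idx} - {title}"
    return f"db/{publisher}/{author}/{title}"

path_a = image_store_path(b"same bytes", ".png")
path_b = image_store_path(b"same bytes", ".png")
```

Since `path_a` and `path_b` are equal, an image reused across chapters or books is written only once.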
Scrape/convert flow (EPUB file — `storage_mode: "epub"`):
1–2. Same as DB flow; `break_img_path="../Images/break.png"` passed to `element_to_xhtml`
3. Chapters converted to XHTML via `make_chapter_xhtml`; EPUB file built via `make_epub` (embeds `static/break.png` as `OEBPS/Images/break.png`) and written to `library/epub/…`
4. `upsert_book` called with `storage_type='file'`
### Scrapers (`scrapers/`)
All scrapers inherit `BaseScraper` and implement `matches(url)`, `login()`, `fetch_book_info()`, `fetch_chapter()`. Registration order in `scrapers/__init__.py` determines priority (first match wins).
| Scraper | Domain | Login | Notes |
|---|---|---|---|
| `ArchiveOfOurOwnScraper` | archiveofourown.org | Optional | Uses authenticity token; adult content gate via `?view_adult=true` |
| `AwesomeDudeScraper` | awesomedude.org | No | Chapter discovery via `.htm/.html` links in same directory; content extracted from largest non-layout block |
| `CodeysWorldScraper` | codeysworld.org | No | See below |
| `GayAuthorsScraper` | gayauthors.org | Optional | Genres + subgenres from `itemprop="genre"` links; tags from `ipsTags` list |
| `IomfatsScraper` | iomfats.org | No | See below; requires chapter URL as entry point |
| `NiftyNewScraper` | new.nifty.org | No | See below; registered before NiftyScraper |
| `NiftyScraper` | nifty.org (classic) | No | See below; excludes new.nifty.org; category/subcategory stored as tags |
| `TedLouisScraper` | tedlouis.com | No | Story index URL required as entry point; all pages use `?t=TOKEN` routing; chapter links in `<ul class="story-index-list">` |
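
The "first match wins" rule can be sketched with two of the scrapers from the table. The class bodies are simplified stand-ins: making `matches()` a classmethod, the host checks, and the `find_scraper` helper are assumptions for illustration.

```python
from urllib.parse import urlparse

class BaseScraper:
    @classmethod
    def matches(cls, url: str) -> bool:
        raise NotImplementedError

class NiftyNewScraper(BaseScraper):
    @classmethod
    def matches(cls, url: str) -> bool:
        return urlparse(url).hostname == "new.nifty.org"

class NiftyScraper(BaseScraper):
    @classmethod
    def matches(cls, url: str) -> bool:
        host = urlparse(url).hostname or ""
        on_nifty = host == "nifty.org" or host.endswith(".nifty.org")
        return on_nifty and host != "new.nifty.org"

# Registration order is the priority order.
SCRAPERS = [NiftyNewScraper, NiftyScraper]

def find_scraper(url: str):
    """Return the first registered scraper class that accepts the URL."""
    for scraper_cls in SCRAPERS:
        if scraper_cls.matches(url):
            return scraper_cls
    return None
```

Registering the more specific scraper first means a broad matcher never has to know about its narrower sibling, although the table notes that `NiftyScraper` also excludes `new.nifty.org` explicitly.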
#### NiftyNewScraper
`new.nifty.org` is a Next.js RSC application. Pages render proper HTML with semantic markup — no plain-text email format.
- URL normalisation: `_to_index_url()` strips a trailing `/N` (chapter index) so any URL (index or chapter) can be passed as entry point. Story URL pattern: `/stories/{slug}-{id}`.
- `fetch_book_info()`:
  - Title from `<h1>`; fallback: `<title>` with ` - … - Nifty Archive …` suffix stripped.
  - Author from `<strong itemprop="name">` inside `<a href="/authors/{id}">`.
  - Publication date from `<time itemprop="datePublished" datetime="…">`, updated date from `<time itemprop="dateModified" datetime="…">`; both truncated to `YYYY-MM-DD`.
  - Tags from all `<ul aria-label="Tags">` containers on the page, covering both the story category links (`/collections/…`) and the AI-generated content tags (`/search?query=tags%3A…`); deduplicated; `genres` and `subgenres` are always empty.
  - Description from `<meta name="description">`.
  - Chapter list: `<a>` links matching `/stories/{slug}/N` collected from page HTML; fallback: regex scan of RSC stream for `"index": N` values. URLs generated as `{index_url}/1` through `{index_url}/max`.
- `fetch_chapter()`:
  - Content extraction order:
    1. Chapter HTML (`{url}`): read `<article>` and collect `<p>` text
    2. Fallback on same HTML: extract escaped Next payload paragraphs (`\u003cp...\u003c/p`)
    3. Last fallback (`{url}?_rsc=1`): parse RSC line format (`{hex_id}:{json}`) for `["$","p",…]` nodes, then escaped paragraph fallback
  - Chapter title uses the precomputed chapter dict title (`Chapter N`).
  - Lead/tail boilerplate detection for common Nifty intro/donate text. Removed boilerplate is preserved as invisible HTML comments in chapter content:
    - `<!-- NIFTY_HIDDEN_LEAD: ... -->`
    - `<!-- NIFTY_HIDDEN_TAIL: ... -->`
  - No email-header stripping and no plain-text line-joining (those are specific to Nifty classic).
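
The URL normalisation for `new.nifty.org` could be approximated as follows. This is a sketch, not the scraper's `_to_index_url()`: it assumes the chapter index is always a purely numeric final path segment and that query strings can be dropped.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def to_index_url(url: str) -> str:
    """Strip a trailing /N chapter segment so index and chapter URLs agree."""
    parts = urlsplit(url)
    # '/stories/{slug}-{id}/4' -> '/stories/{slug}-{id}'; the '-{id}' suffix
    # is untouched because it is not a separate path segment.
    path = re.sub(r"/\d+/?$", "", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

def chapter_urls(index_url: str, max_index: int) -> list:
    """Generate {index_url}/1 through {index_url}/max."""
    return [f"{index_url}/{n}" for n in range(1, max_index + 1)]
```

Normalising first lets the chapter list be generated the same way regardless of which page the user pasted.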
#### NiftyScraper
Nifty classic pages are plain-text email submissions wrapped in a `<pre>` element.
- URL normalisation: `_to_index_url()` strips the chapter segment so any URL (index or chapter) can be passed as the entry point. Path structure: `/nifty/{category}/{subcategory}/{story}/` (index, 4 segments) vs `/nifty/{category}/{subcategory}/{story}/{chapter}` (chapter, 5 segments).
- `fetch_book_info()` performs up to 3 extra HTTP requests: chapter 1 (author + publication date), last chapter (`updated_date`), chapter 2 (boilerplate detection). Author and dates are extracted from the email headers (`From:`, `Date:`) embedded at the top of each chapter file. Date is parsed via `email.utils.parsedate` → `YYYY-MM-DD`.
- Boilerplate detection: leading paragraphs of chapters 1 and 2 (after email-header strip) are compared using normalised text (lowercase, whitespace collapsed). Consecutive matching paragraphs are recorded as `preamble_count` and stored in each chapter dict; `fetch_chapter()` skips them.
- `fetch_chapter()` pipeline:
  1. Extract `<pre>` text (fallback: full body text)
  2. Parse `Subject:` header → store as `<!-- Subject: … -->` comment in chapter content (invisible in reader, extractable later)
  3. Strip email header block (up to first blank line after `Date:`/`From:`/`Subject:` lines)
  4. Skip first `preamble_count` paragraphs
  5. Split on blank lines → paragraphs; join hard-wrapped lines within each paragraph with a space
  6. Detect and remove lead/tail boilerplate (common notice/disclaimer/author promo/donate blocks)
  7. Persist removed boilerplate as invisible comments:
     - `<!-- NIFTY_HIDDEN_LEAD: ... -->`
     - `<!-- NIFTY_HIDDEN_TAIL: ... -->`
  8. Scene-break patterns (`***`, `---`, `~~~`, `• • •`, etc.) → `<hr/>`
  9. Build `content_el` as a BeautifulSoup `<div>` of comments + `<p>` + `<hr/>` nodes
- Genres/subgenres from URL path: `category` (e.g. `gay` → `Gay`) and `subcategory` (e.g. `young-friends` → `Young Friends`).
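
A few of the Nifty classic steps (date parsing, paragraph splitting with line joining, and scene-break detection) can be sketched with the standard library alone. The scene-break regex and the plain-string output are simplifying assumptions; the real pipeline builds BeautifulSoup nodes and handles more patterns.

```python
import html
import re
from email.utils import parsedate
from typing import Optional

def parse_header_date(value: str) -> Optional[str]:
    """Convert an email Date: header into YYYY-MM-DD, or None if unparsable."""
    parsed = parsedate(value)
    if parsed is None:
        return None
    return f"{parsed[0]:04d}-{parsed[1]:02d}-{parsed[2]:02d}"

# Assumption: a scene break is three or more of * ~ • - with optional spaces.
SCENE_BREAK = re.compile(r"^(?:[*~•-]\s*){3,}$")

def to_blocks(text: str) -> list:
    """Split on blank lines, rejoin hard-wrapped lines, map breaks to <hr/>."""
    blocks = []
    for para in re.split(r"\n\s*\n", text.strip()):
        joined = " ".join(line.strip() for line in para.splitlines() if line.strip())
        if not joined:
            continue
        if SCENE_BREAK.match(joined):
            blocks.append("<hr/>")
        else:
            blocks.append(f"<p>{html.escape(joined)}</p>")
    return blocks

def path_label(segment: str) -> str:
    """Turn a URL path segment such as 'young-friends' into 'Young Friends'."""
    return segment.replace("-", " ").title()
```

Joining wrapped lines with a single space is what turns the fixed-width email layout back into reflowable paragraphs.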
#### CodeysWorldScraper
- Entry point: any `codeysworld.org` URL.
- Title from `<h1>`; author from `<h2>` matching `"by …"` pattern; fallback: URL path segment `/{author}/{category}/filename`.
- Category from URL path (second-to-last segment, e.g. `remembrances` → tag `"Remembrances"`).
- Chapter discovery: `.htm/.html` links in the same directory as the entry URL; audio/image links skipped. No chapter links → single-file story (entry URL is the only chapter).
- `fetch_chapter()`: removes all `<h1>`/`<h2>` headings, back-navigation links, audio links (`.mp3`), mailto links; falls back to `<body>` when no content wrapper is found.
#### IomfatsScraper
All stories by an author are listed on a single author page (`/storyshelf/hosted/{author}/`). Individual story pages do not exist.
- Entry point must be a **chapter URL** (`/storyshelf/hosted/{author}/{story-folder}/{chapter}.html`). Passing the author page URL raises a `ValueError` with a user-visible message.
- On load: navigates to the author page and scans `<div id="content">` for the matching story.
- Two page structures detected:
  - **Single story**: outer `<h3>` = book title; chapters are direct `<li><a>` children of the following `<ul>`.
  - **Multi-part series**: outer `<h3>` = series name; nested `<li><h3>` = book title per part; chapters in the sub-`<ul>` matching `story_folder`.
- Series index extracted from folder name suffix: `*-part{N}` or `*-{N}`.
- Publication status from `<p><small>[…]</small></p>` after the book title heading.
- `fetch_chapter()`: content from `<div id="content">`; removes `<h2>`/`<h3>` headings, `.chapternav` divs, `div.important` footer blocks, anchor-name elements.
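
The entry-point check and the series index rule for Iomfats might look like this. The regexes, the helper names, and the error text are assumptions made for illustration; only the URL shape and the `*-part{N}` / `*-{N}` suffixes come from the description above.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

CHAPTER_PATH = re.compile(r"^/storyshelf/hosted/([^/]+)/([^/]+)/[^/]+\.html$")

def parse_entry_url(url: str) -> Tuple[str, str]:
    """Return (author, story_folder) or raise ValueError for non-chapter URLs."""
    match = CHAPTER_PATH.match(urlsplit(url).path)
    if not match:
        # Hypothetical wording; the real message is not documented here.
        raise ValueError("Please pass a chapter URL, not the author page.")
    return match.group(1), match.group(2)

def series_index_from_folder(folder: str) -> Optional[int]:
    """Read N from a folder name ending in '-part{N}' or '-{N}'."""
    match = re.search(r"-(?:part)?(\d+)$", folder)
    return int(match.group(1)) if match else None
```

The author segment returned by `parse_entry_url` is enough to build the author page URL that the scraper then scans for the matching story.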
#### TedLouisScraper
All pages on `tedlouis.com` use opaque token-based routing: `https://tedlouis.com/?t=<TOKEN>`. There are no predictable URL patterns — tokens must be followed from the story index page.
- Entry point must be a **story index URL** (the page listing all chapters). Passing a chapter URL raises a `ValueError` with a user-visible message. Detection: story index has `<h2 class="story-page-title">`, chapter page has `<h1 class="story-title">`.
- `fetch_book_info()`:
  - Title from direct `NavigableString` children of `<h2 class="story-page-title">`; the element also contains a "Back" button (`<a class="btn">`) and the author byline (`<span class="story-author-by-line">`), which are skipped.
  - Author from `<span class="story-author-by-line"> <a>`.
  - Publication status from `<span class="story-status-text">` with "Status: " prefix stripped.
  - Updated date from `<span class="story-last-updated">` ("Last Updated: Month D, YYYY") → `YYYY-MM-DD`.
  - Chapter list from all `<ul class="story-index-list">` elements (three columns on the page); relative `?t=TOKEN` hrefs resolved to absolute URLs. Order preserved; duplicates removed.
  - No genres, subgenres, tags or description available on the page.
- `fetch_chapter()`: content from `<div id="chapter">`; strips `<h1 class="story-title">`, `<h2 class="chapter-title">`, `div.chapter-copyright-line`, and `div.chapter-copyright-notice-text` blocks. Chapter title refined from `<h2 class="chapter-title"> <span>`.
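
Two of the TedLouis conversions are easy to show in isolation: the "Last Updated" date and the resolution of `?t=TOKEN` links. The sketch assumes an English month name (so `%B` parses under the default C locale) and uses hypothetical helper names and placeholder tokens.

```python
from datetime import datetime
from urllib.parse import urljoin

def parse_last_updated(text: str) -> str:
    """'Last Updated: March 7, 2026' -> '2026-03-07'."""
    raw = text.split(":", 1)[1].strip()
    return datetime.strptime(raw, "%B %d, %Y").strftime("%Y-%m-%d")

def resolve_links(index_url: str, hrefs: list) -> list:
    """Resolve relative ?t=TOKEN hrefs, keep order, drop repeats."""
    absolute = [urljoin(index_url, href) for href in hrefs]
    # dict.fromkeys keeps the first occurrence of each URL in order.
    return list(dict.fromkeys(absolute))

links = resolve_links("https://tedlouis.com/?t=INDEX", ["?t=A", "?t=B", "?t=A"])
```

`urljoin` replaces only the query of the base URL when the reference is query-only, which is what token-based routing needs.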
#### `xhtml.element_to_xhtml()` — Comment handling
`bs4.Comment` objects (a `NavigableString` subclass) are now emitted as XML comments: `<!-- … -->`. The `--` sequence (illegal inside XML comments) is sanitised to `- -`. This allows scrapers to embed invisible metadata (e.g. the Nifty `Subject:` header) in chapter content without it appearing in the rendered reader.
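
The `--` sanitising rule can be sketched as a small helper. The function name, the trailing-hyphen guard, and the exact spacing inside the comment are assumptions; `element_to_xhtml()` may format its output differently.

```python
def to_xml_comment(text: str) -> str:
    """Wrap text in an XML comment, breaking up any '--' sequence."""
    # Repeat until stable: 'a---b' needs two passes to become 'a- - -b'.
    while "--" in text:
        text = text.replace("--", "- -")
    # A comment body must not end with '-' directly before the closing '-->'.
    if text.endswith("-"):
        text += " "
    return f"<!-- {text} -->"

comment = to_xml_comment("Subject: part 1--the start")
```

A single `replace` call is not enough for runs of three or more hyphens, which is why the helper loops.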
### `routers/search.py`
- `GET /search` — full-text search page (`search.html`); Enter-to-search, `?q=` param auto-runs on load
- `GET /api/search?q=…&mode=phrase|words&filter=all|unread_novels|unread_shorts` — FTS over `book_chapters.content_tsv`; `mode=phrase` (default) uses `phraseto_tsquery` (words in order); `mode=words` uses `plainto_tsquery` (all words present, any order); `ts_rank` and `ts_headline` always use `plainto_tsquery`; also matches chapters whose `title` contains the query (case-insensitive LIKE fallback); no result limit; excludes archived books; `filter=unread_novels` restricts to books with no reading sessions/progress and no `Shorts` tag; `filter=unread_shorts` restricts to books with no reading sessions/progress and a `Shorts` tag; results include `filename`, `title`, `author`, `chapter_index`, `chapter_title`, `snippet`, `rank`
### `routers/settings.py`
- `GET /settings` — settings page
- `GET /api/app-settings` — returns `{"develop_mode": bool, "break_image_url": str|null}`
- `PATCH /api/app-settings` — accepts `{"develop_mode": bool}`, persists to `app_settings` table
- `POST /api/app-settings/break-image` — multipart file upload (PNG/JPG/WebP); stores image in imagestore + overwrites `static/break.png`; saves `break_image_sha256` + `break_image_ext` to `app_settings`; returns `{"ok": true, "url": "/library/db-images/…"}`
- `GET /api/break-patterns` — list chapter-break patterns
- `POST /api/break-patterns` — add break pattern (type: `regex` or `css_class`)
- `PATCH /api/break-patterns/{id}` — update pattern (enable/disable or change value)
- `DELETE /api/break-patterns/{id}` — delete pattern
- `DELETE /api/reading-history` — wipe all reading sessions
`app_settings` table (single row, `id = 1`): `develop_mode BOOLEAN`, `break_image_sha256 VARCHAR(64)`, `break_image_ext VARCHAR(10)`.
### `routers/builder.py`
- `GET /builder` — Book Builder index (draft list + new draft form)
- `POST /builder` — create new draft; redirects to `/builder/{id}`
- `GET /builder/{draft_id}` — draft editor page
- `DELETE /api/builder/{draft_id}` — delete draft
- `GET /api/builder/{draft_id}` — draft JSON (id, title, author, publisher, source_url, chapters)
- `POST /api/builder/{draft_id}/chapter` — add chapter `{title, after_index}`; returns `{index, count}`
- `PUT /api/builder/{draft_id}/chapter/{idx}` — save chapter `{title?, content?}`
- `DELETE /api/builder/{draft_id}/chapter/{idx}` — delete chapter; returns `{index, count}`
- `POST /api/builder/{draft_id}/normalize/{idx}` — normalize chapter HTML (preview only, does not save); returns `{content}`
- `POST /api/builder/{draft_id}/publish` — normalize all chapters → `build_epub()` → write to `library/epub/` → `upsert_book()` → delete draft; returns `{filename}`; redirects browser to `/library/book/{filename}`
Publish flow: all chapters are run through `normalize_wysiwyg_html()`, then `build_epub()` produces an EPUB 2.0 ZIP. The file path is computed via `make_rel_path(media_type="epub", …)`. The book is inserted into the library with `needs_review=True`. The draft is deleted on success.
### `routers/following.py`
- `GET /following` — Following page (author URL management)
- `GET /api/following` — all distinct library authors with URL (if set), book count, and last-added date
- `POST /api/following/{author_name}` — set or clear URL for an author (empty `url` removes the record)
`GET /api/following` returns one entry per non-archived author:
```json
{ "name": "Author Name", "book_count": 5, "last_added": "2026-03-27T…", "url": "https://…" }
```
URL is stored in the `authors` table (`name` unique, `url`, `created_at`, `updated_at`).
### `routers/backup.py`
- `GET /backup` — backup page
- `GET /api/backup/credentials` — Dropbox settings (includes `app_key_configured` flag)
- `POST /api/backup/credentials` — save Dropbox settings
- `DELETE /api/backup/credentials` — remove all Dropbox credentials
- `POST /api/backup/oauth/prepare` — save app key + secret, return Dropbox auth URL
- `POST /api/backup/oauth/exchange` — exchange authorization code for refresh token
- `GET /api/backup/health` — Dropbox connectivity check (includes `schedule_enabled`, `schedule_interval_hours`)
- `GET /api/backup/status` — current backup status
- `GET /api/backup/history` — backup run history (last 20)
- `GET /api/backup/progress` — live progress of running backup `{running, done, total, phase}`
- `POST /api/backup/run` — trigger backup (background task)
- `GET /api/backup/snapshots` — list available snapshots `{ok, snapshots: [{name, created_at}]}`
- `GET /api/backup/snapshots/{snapshot_name}/files` — list files in a snapshot with local existence check `{ok, snapshot, files: [{path, size, sha256, exists_locally}]}`
- `POST /api/backup/restore` — restore files from a snapshot: `{snapshot_name, files: [rel_paths]}`; downloads from Dropbox, writes to disk, re-indexes via `scan_media` + `upsert_book`; returns `{ok, restored, total, results: [{path, ok, error?}]}`
---
## Backup & Security
- Dropbox token (refresh token or legacy access token) stored encrypted in `credentials` (`site='dropbox'`).
- Dropbox app key stored encrypted in `credentials` (`site='dropbox_app_key'`).
- Dropbox app secret stored encrypted in `credentials` (`site='dropbox_app_secret'`).
- Dropbox backup root stored encrypted in `credentials` (`site='dropbox_backup_root'`).
- Retention (`snapshots to keep`) stored encrypted in `credentials` (`site='dropbox_backup_retention'`).
- Backup schedule (`enabled` + `interval_hours`) stored encrypted in `credentials` (`site='dropbox_backup_schedule'`).
- Encryption uses `NOVELA_MASTER_KEY` (Fernet).
### Dropbox authentication
- Preferred: OAuth2 refresh token (does not expire). Set up via the two-step flow on `/backup`:
  1. Enter App Key + App Secret → click **Generate Auth URL**
  2. Approve in browser → paste the code → click **Save & Activate**
- `_dbx()` uses `oauth2_refresh_token` + `app_key` + `app_secret` for automatic token renewal.
- Fallback: legacy short-lived access token (backwards compatible; works without app key/secret).
### Implementation details
- Versioned backups with deduplication:
  - file objects in Dropbox: `library_objects/{sha256_prefix}/{sha256}`
  - snapshots in Dropbox: `library_snapshots/snapshot-YYYYMMDD-HHMMSS.json`
- Each run creates a new snapshot version and uploads only missing objects.
- Retention removes older snapshots above the configured limit.
- Orphan object pruning removes objects no longer referenced by retained snapshots.
- Local manifest cache (`config/backup_manifest.json`) speeds up change detection.
- Database backup is done via `pg_dump` to Dropbox `postgres/`.
- `POST /api/backup/run` always starts a background task and returns immediately.
- `GET /api/backup/progress` returns in-memory progress updated per file; phases: `starting` → `scanning` → `uploading` → `snapshot` → `pg_dump`.
- Scheduler runs in the background (`start_backup_scheduler`) and triggers on interval when enabled.
- Concurrency guard: only one backup can run at a time.
- After container restart/crash, stale `running` logs are auto-marked as interrupted/error.
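
The deduplicating snapshot scheme can be modelled without Dropbox. In this sketch the prefix length (two hex characters), the manifest shape, and all function names are assumptions; it only shows why unchanged files cost no upload and how retention selects snapshots to drop.

```python
import hashlib
from datetime import datetime

def object_key(data: bytes) -> str:
    """Content-addressed object path for a file's bytes."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumption: {sha256_prefix} is the first two hex characters.
    return f"library_objects/{digest[:2]}/{digest}"

def snapshot_name(now: datetime) -> str:
    return now.strftime("library_snapshots/snapshot-%Y%m%d-%H%M%S.json")

def plan_upload(local: dict, remote_keys: set):
    """Return (manifest, keys_to_upload); only missing objects are uploaded."""
    manifest, to_upload = {}, []
    for path, data in sorted(local.items()):
        key = object_key(data)
        manifest[path] = key
        if key not in remote_keys and key not in to_upload:
            to_upload.append(key)
    return manifest, to_upload

def snapshots_to_delete(names: list, keep: int) -> list:
    """Timestamped names sort chronologically; drop all but the newest `keep`."""
    ordered = sorted(names)
    return ordered[:-keep] if keep > 0 else ordered

files = {"epub/a.epub": b"same", "epub/copy.epub": b"same", "pdf/b.pdf": b"other"}
manifest, uploads = plan_upload(files, remote_keys={object_key(b"other")})
```

In this example three files reference two objects and only one upload is planned, because one object already exists remotely and the other is shared by two paths. Orphan pruning would then remove any object key that no retained manifest references.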
---
## Environment
`stack/novela.env` should include at least:
- `POSTGRES_DB`
- `POSTGRES_USER`
- `POSTGRES_PASSWORD`
- `NOVELA_MASTER_KEY`
- `CONFIG_DIR`
Dropbox settings are managed via the web UI on `/backup`.
---
## Branding
Static assets in `static/`:
| File | Size | Purpose |
|------|------|---------|
| `logo.png` | 546×575, transparent | Sidebar wordmark (displayed at 26px height) |
| `favicon.ico` | 16×16 | Browser tab (legacy) |
| `favicon-32.png` | 32×32 | Browser tab (modern) |
| `favicon-256.png` | 256×256 | Pinned tabs / high-DPI |
| `apple-touch-icon.png` | 180×180 | iOS/iPadOS home screen icon |
All 15 page templates include:
```html
<link rel="icon" href="/static/favicon.ico" sizes="16x16"/>
<link rel="icon" type="image/png" sizes="32x32" href="/static/favicon-32.png"/>
<link rel="icon" type="image/png" sizes="256x256" href="/static/favicon-256.png"/>
<link rel="apple-touch-icon" sizes="180x180" href="/static/apple-touch-icon.png"/>
```
Sidebar logo: `logo.png` (26px, flex-aligned) next to the "No**vela**" wordmark ("No" in `--text`, "vela" in `--accent`).
`apple-touch-icon.png` uses `#0f0e0c` background (= `--bg`) with the orange N logo centered at 60% of canvas size.
---
## Shared CSS (`static/theme.css`)
Single `:root { }` block defining all global CSS custom properties. Loaded first on every page (`<link rel="stylesheet" href="/static/theme.css"/>`). No template defines its own global colours — only page-specific layout vars stay inline.
| Variable | Value | Role |
|---|---|---|
| `--bg` | `#0f0e0c` | Page background |
| `--surface` | `#1a1815` | Card/panel background |
| `--surface2` | `#221f1b` | Nested surface |
| `--border` | `#2e2a24` | Borders |
| `--accent` | `#ffa20e` | Orange highlight (logo colour) |
| `--accent2` | `#ffb840` | Lighter orange |
| `--text` | `#e8e2d9` | Body text |
| `--text-dim` | `#8a8278` | Muted text |
| `--text-faint` | `#4a453e` | Very muted text |
| `--success` | `#6baa6b` | Success state |
| `--warning` | `#c8a03a` | Warning state |
| `--error` | `#c85a3a` | Error state |
| `--radius` | `6px` | Border radius |
| `--sidebar` | `220px` | Sidebar width |
| `--mono` | `'DM Mono', monospace` | Monospace font stack |
| `--serif` | `'Libre Baskerville', Georgia, serif` | Serif font stack |
Page-specific overrides: `reader.html` (`--header-h`, `--footer-h`, `--content-w`); `backup.html` (`--ok`, `--warn`, `--err`); `editor.css` (`--danger`, `--header-h`, `--panel-w`).
## Shared JavaScript (`static/books.js`)
Loaded before any page-specific script on every page that needs book data or UI helpers.
| Function | Purpose |
|---|---|
| `esc(s)` | HTML-escape a string for safe insertion into markup |
| `strHash(s)` | Deterministic integer hash of a string (for colour selection) |
| `COVER_PALETTES` | Array of 8 `[bg, fg]` colour pairs for placeholder covers |
| `wrapText(ctx, text, x, y, maxW, lineH)` | Canvas word-wrap helper |
| `truncate(s, n)` | Truncate string with ellipsis |
| `makePlaceholderCover(canvas, title, author)` | Draw a generated book cover on a `<canvas>` |
| `_filenameBase(filename)` | Strip path and extension from a filename |
| `bookTitle(b)` | Return display title (falls back to filename parsing) |
| `bookAuthor(b)` | Return display author (falls back to filename parsing) |
| `tagValuesByType(b, type)` | Return tag strings of a given type from `b.tags` |
| `bookGenres(b)` | Tags of type `genre`; falls back to `subject` |
| `bookSubgenres(b)` | Tags of type `subgenre` |
| `bookPlainTags(b)` | Tags of type `tag` |
| `filterBooks(books, query)` | Filter book list by query across title, author, publisher, genre, sub-genre, tag |
| `setupSearchInput(inputId, clearId, onSearch)` | Wire input: show/hide clear button on input; call `onSearch(query)` on Enter |
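
To illustrate the tag helpers in the table above, here is a rough Python mirror of `tagValuesByType` and `bookGenres`. The real helpers are JavaScript in `books.js`. The shape of `b.tags` is not documented in this section, so the `tag` and `type` keys below are assumptions made only for this sketch.

```python
from typing import Any

Book = dict[str, Any]


def tag_values_by_type(book: Book, tag_type: str) -> list[str]:
    """Return the tag strings of one type; a missing `tags` list yields []."""
    # ASSUMPTION: each entry looks like {"tag": "...", "type": "..."}.
    return [t["tag"] for t in (book.get("tags") or []) if t.get("type") == tag_type]


def book_genres(book: Book) -> list[str]:
    """Tags of type `genre`, falling back to `subject` when there are none."""
    return tag_values_by_type(book, "genre") or tag_values_by_type(book, "subject")
```

Routing every lookup through helpers like these keeps the `subject` fallback in one place, which is why callers should not read genre fields off the book object directly.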
## Shared JavaScript (`static/conversion.js`)
Loaded by `index.html` (Convert page) and `grabber.html` (Grabber page). Requires `books.js` for `esc()`.
| Function | Purpose |
|---|---|
| `addLog(msg, cls)` | Append a log line to `#log-lines` |
| `connectConversionStream(job_id)` | Open SSE stream `/events/{job_id}` and handle all conversion events: `status`, `meta`, `chapters`, `progress`, `warning`, `error`, `done` |
## UI Notes
- Library import accepts EPUB/PDF/CBR/CBZ.
- Home supports the same import formats.
- Home includes search.
- Home header/dropzone alignment matches Library (search top-right, dropzone below).
- `New` view supports `Grid` and `List` mode.
  - Bulk selection + `Remove from New` works only in `List` mode.
  - `List` mode has a column visibility filter: Publisher, Author, Series, Volume, Title, Has cover, Updated, Genres, Sub-genres, Tags, Status.
  - `List` mode supports multi-select with `Shift+click` range selection on checkboxes.
  - `Grid` mode shows no selection checkboxes or bulk actions.
- `All books` view supports `Grid` and `List` mode (same columns as `New`).
  - View mode persisted in `localStorage` as `novela.all.viewMode`.
  - Column visibility persisted in `localStorage` as `novela.all.visibleColumns`.
  - `List` mode has a checkbox column, column visibility filter, and multi-select with `Shift+click` range selection.
  - `List` mode has a `Delete selected` bulk action: confirms then calls `DELETE /library/file/{filename}` for each selected book.
- Publication status values: `Complete`, `Ongoing`, `Temporary Hold`, `Long-Term Hold` (blank = unknown). `Hiatus` was renamed to `Long-Term Hold` via startup migration `migrate_rename_hiatus()`.
- Status badges (top-right of grid card cover): circular icon, dark fill `rgba(15,14,12,0.82)` + `box-shadow: 0 0 0 2px #0f0e0c` ring for visibility on any cover colour. Icon colour per status: Complete=green `#6baa6b`, Ongoing=blue `#4a90b8`, Temporary Hold=amber `#c8a03a`, Long-Term Hold=orange `#c8783a`. `statusBadgeHtml()` in `library.js` is the single source for badge HTML across all grid views.
- Want-to-read star (top-left of grid card cover): same dark fill + ring as status badges.
- Status pills in Book Detail (`book.css`): `status-complete`, `status-ongoing`, `status-temporary-hold`, `status-long-term-hold` — same colour scheme as badges.
- Grabber status mapping (`grabber.py`): `Temporary-Hold` (gayauthors.org) → `Temporary Hold`; `Long-Term Hold` passes through unchanged.
- Star ratings (1–5) shown under the cover in all grid views:
  - Display-only in grid cards (no click, prevents accidental taps while scrolling).
  - Interactive in Book Detail (1.1rem, clickable; clicking the active star clears the rating).
  - Amber: filled `#c8a03a`, unfilled `rgba(200, 160, 58, 0.25)`.
- Reader settings (hamburger menu):
  - Content width slider (30–100 vw), persisted as `reader-content-width-pct`.
  - Font size slider (80–150%, default 105%), persisted as `reader-font-size`; applied via `--reader-font-size` CSS custom property on `#chapter-content`.
  - Text colour: 5 warm-tone presets from `#e8e2d9` to `#938d86`, persisted as `reader-text-colour`.
  - Hamburger and back-link separated with `margin-left: 1rem` on `.header-back`.
- Reader supports EPUB, PDF, and CBR/CBZ:
  - EPUB: chapter-text rendering; progress = `{chapterIndex}:{scrollFrac}`; progress % = `(chapterIndex + scrollFrac) / total * 100`.
  - PDF: page-image rendering via `/library/pdf/{filename}?page=N`; page count from `/api/pdf/info/{filename}`; progress = `{pageIndex}:0`; keyboard/button navigation identical.
  - `reader.html` branches on `FORMAT` variable injected by the server.
  - Series navigation: on load, `loadSeriesNav()` fetches `/api/series-nav/{filename}` and activates prev/next volume buttons in the header (hidden when no series); `markRead()` redirects to `/library/read/{next.filename}` when a next volume exists, otherwise to the book detail page.
- `Edit EPUB` button in Book Detail is only shown for `.epub` files.
- Backup page supports: manual run, dry-run, Dropbox root, retention count, schedule (on/off + hours), status + history.
- Bookmarks: saved per book via `POST /library/bookmarks/{filename}`; shown in Library sidebar section; navigated via `?bm_ch=N&bm_scroll=F` URL params on reader page.
- Convert page: after loading metadata, if a book with the same title+author already exists in the library, a warning banner is shown (with a link to the existing book); user can still proceed with conversion. Check is done server-side in `/preload` response (`already_exists`, `existing_books`).
- Authors view (`#authors`): lists all authors across `allBooks` (active + archived); authors whose books are all archived still appear. Sidebar counter (`count-authors`) counts only active-book authors. Author detail view (`#authors/{name}`) also uses `allBooks`; archived books show the `.badge-archived` overlay on their cover.
- Publishers view (`#publishers`): same rule — `allBooks` (active + archived); publishers with only archived books still appear. Sidebar counter uses active books only. Publisher detail also uses `allBooks`.
- Series detail view (`#series/{name}`): shows all books in a series as a cover grid. Header contains an "Archive series" / "Unarchive series" button — calls `POST /library/archive-series` to set `archived` for every book in the series at once; the button label reflects whether any book is still active.
- Duplicates view (`#duplicates`): groups non-archived books by `(title, author)` (case-insensitive); shows only groups with ≥ 2 copies; counter in sidebar shows total number of duplicate books. Detection is entirely client-side from the existing library data.
- Incomplete view (`#incomplete`): shows all non-archived books where `publication_status` is not `Complete` (Ongoing, Temporary Hold, Long-Term Hold, or blank); sidebar counter included.
- Following page (`/following`): dedicated page in its own sidebar section between Library and Tools; shows all library authors with their external URL; two tabs — Following (authors with URL set) and All Authors; inline URL editing with keyboard support (Enter = save, Escape = cancel); clicking Visit opens the external URL in a new tab. Author URLs are stored in the `authors` table. Sidebar counter shows number of followed authors.
- Book Builder (`/builder`): create EPUB books from scratch; drafts stored in `builder_drafts` (JSONB chapters); contenteditable editor with toolbar (bold/italic/underline/blockquote/author-note/scene-break/normalize); autosave every 30 s + Ctrl+S; publish normalizes HTML via `normalize_wysiwyg_html()` and builds EPUB via `build_epub()`.
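
The grouping rule of the Duplicates view above can be sketched as follows. The real detection runs client-side in JavaScript; this Python version only illustrates the logic. The `title`, `author` and `archived` keys are assumptions, and the title/author fallbacks that `bookTitle()` and `bookAuthor()` apply are left out.

```python
from collections import defaultdict


def find_duplicates(books):
    """Group non-archived books by case-insensitive (title, author).

    Returns (groups with >= 2 copies, total number of books in those groups).
    """
    groups = defaultdict(list)
    for book in books:
        if book.get("archived"):
            continue  # archived books never count as duplicates
        # ASSUMPTION: plain `title` / `author` keys; the UI uses helper functions.
        key = ((book.get("title") or "").casefold(),
               (book.get("author") or "").casefold())
        groups[key].append(book)
    dup_groups = [g for g in groups.values() if len(g) >= 2]
    return dup_groups, sum(len(g) for g in dup_groups)
```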
---
## Develop Mode
When enabled, every page shows a diagonal **DEVELOP** ribbon in the top-left corner and the browser tab title becomes **Novela Develop — …** instead of **Novela — …**.
- Persisted in `app_settings` table (single row, `id = 1`); created by `migrate_create_app_settings()`.
- `shared_templates._develop_mode()` reads this value from DB on every template render and is registered as a Jinja2 global (`develop_mode`), so all templates can use `{% if develop_mode() %}` without explicit context injection.
- Banner CSS lives in `static/sidebar.css` (`.develop-banner` / `.develop-banner-text`); rendered at the top of `templates/_sidebar.html`.
- Toggled via the **Develop mode** card on the Settings page (`/settings`); saving reloads the page so the banner and title take effect immediately.
---
## Known Conventions
- Book deletion flow: `unlink` file → `prune_empty_dirs(parent)` → `DELETE FROM library` (cascade removes child rows).
- Empty dir pruning: `prune_empty_dirs(start)` walks up from `start` to `LIBRARY_ROOT`, removing each dir if empty; stops at first non-empty dir.
- Cover strategy:
  - EPUB: `GET /library/cover/{filename}` checks `library_cover_cache` first; on miss, extracts from ZIP and warms the cache. Cover upload (`POST /library/cover/{filename}`) replaces the image inside the EPUB ZIP (OPF located via `META-INF/container.xml`, old cover found in manifest and removed) and updates the cache so subsequent requests return the new cover immediately.
  - PDF: first page rendered as thumbnail, cached
  - CBR/CBZ: first page extracted, cached
- Rating storage:
  - EPUB: `<meta name="novela:rating" content="N"/>` in OPF
  - CBZ: `<NovelaRating>N</NovelaRating>` in `ComicInfo.xml` inside the ZIP
  - CBR/PDF: DB only
- `upsert_book` uses `CASE WHEN EXCLUDED.rating > 0 THEN EXCLUDED.rating ELSE library.rating END` to restore rating from file without overwriting existing DB value.
- Tag types in `book_tags`: `genre`, `subgenre`, `tag`, `subject`. No direct `genres`/`subgenres` fields on book objects; always use helpers `bookGenres()`, `bookSubgenres()`, `bookPlainTags()`.
- `series_volume` (e.g. `"1982"`) is used for annual comic series where issue numbers restart each year. It is separate from `series_index` (issue number within the year) and `series_suffix` (letter variant like `"a"`). Stored in DB and EPUB OPF (`novela:series_volume`); not reflected in the file path. Sort order: `series → series_volume → series_index → series_suffix`. In `getSeriesSlots`, gap-detection runs per volume independently when any book has `series_volume` set; slot labels show as `(year) #index`.
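
The sort order `series → series_volume → series_index → series_suffix` can be expressed as a tuple sort key. This is a minimal sketch, not the project's code: the dictionary keys mirror the field names above, and the defaults for missing values (empty string, `0`) are assumptions.

```python
def series_sort_key(book):
    """Sort key: series, then volume (year), then issue index, then suffix."""
    return (
        (book.get("series") or "").casefold(),
        book.get("series_volume") or "",   # e.g. "1982"; ASSUMPTION: missing sorts first
        book.get("series_index") or 0,     # issue number within the volume
        book.get("series_suffix") or "",   # letter variant such as "a"
    )
```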
---
## Performance Notes
- Library load is optimized for large datasets (1000+ books):
  - `list_library_json()` uses `json_agg` in the main query to inline tags per book; this eliminates a separate `SELECT * FROM book_tags` query and Python merge loop.
  - `has_cached_cover` is provided directly via SQL join instead of full cache fetch.
  - `reading_sessions` is pre-aggregated in a subquery.
  - ETag on `/api/library`: cheap `COUNT + MAX(updated_at)` query before full load; `304 Not Modified` on cache hit.
  - Front-end rendering uses `IntersectionObserver` to defer both cover image loading and placeholder canvas drawing until cards enter the viewport, which prevents hundreds of simultaneous HTTP requests and canvas operations on initial render.
  - `renderBooksGrid`, `renderDuplicatesView`, `renderSeriesDetail` all use a single DOM pass: cover `<img>` and `<canvas>` are set up via `card.querySelector` immediately after `innerHTML` is set, eliminating a second full iteration with `document.getElementById` calls.
- Additional migration indexes:
  - `idx_library_sort_coalesce`
  - `idx_library_needs_review`
  - `idx_library_archived`
  - `idx_reading_sessions_filename_readat`
  - `idx_book_tags_filename_tag`
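
The ETag check on `/api/library` described above could look roughly like this. How the real ETag string is derived is not specified here, so the hashing scheme and both function names are hypothetical; only the inputs (`COUNT` and `MAX(updated_at)`) come from this document.

```python
import hashlib
from datetime import datetime
from typing import Optional


def library_etag(book_count: int, max_updated_at: Optional[datetime]) -> str:
    """Derive a quoted ETag from the two cheap aggregate values."""
    stamp = max_updated_at.isoformat() if max_updated_at else "empty"
    digest = hashlib.sha256(f"{book_count}:{stamp}".encode()).hexdigest()[:16]
    return f'"{digest}"'


def is_not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """True when the client's cached copy is current (respond with 304)."""
    # Simplification: ignores weak validators and comma-separated tag lists.
    return if_none_match is not None and if_none_match.strip() == etag
```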
---
## DB-Stored Books
Books scraped via the grabber are stored entirely in PostgreSQL (`storage_type = 'db'`). No EPUB file is written.
### New tables
| Table | Key columns | Notes |
|---|---|---|
| `book_chapters` | `filename FK, chapter_index, title, content TEXT, content_tsv TSVECTOR` | Unique on `(filename, chapter_index)`; GIN index on `content_tsv` for FTS; `content_tsv` is `to_tsvector('simple', title \|\| ' ' \|\| stripped_html)`; the title is included so FTS also matches on chapter titles |
| `book_images` | `sha256 PK, ext, media_type, size_bytes` | Content-addressed; files live at `library/images/{sha256[:2]}/{sha256}{ext}` |
### `library.storage_type`
| Value | Meaning |
|---|---|
| `'file'` | Book lives on disk (EPUB/PDF/CBR/CBZ); default for all existing books |
| `'db'` | Book content lives in `book_chapters`; no file on disk |
### Synthetic filename for DB books
`db/{publisher}/{author}/{title}` — or for series: `db/{publisher}/{author}/Series/{series}/{idx:03d} - {title}`
Same sanitization rules as file-based paths. Uniqueness enforced via `ensure_unique_db_filename` (DB lookup, not filesystem).
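
As a small worked example of the pattern above, the following sketch formats a synthetic filename. The function name and parameters are hypothetical, inputs are assumed to be sanitized already, and the uniqueness step (`ensure_unique_db_filename`) is deliberately not reproduced.

```python
from typing import Optional


def make_db_filename(publisher: str, author: str, title: str,
                     series: Optional[str] = None, idx: int = 0) -> str:
    """Build the synthetic `db/...` key for a DB-stored book."""
    # ASSUMPTION: all parts were sanitized by the caller.
    if series:
        return f"db/{publisher}/{author}/Series/{series}/{idx:03d} - {title}"
    return f"db/{publisher}/{author}/{title}"
```

Note that the series form uses `{idx:03d} - {title}` with spaces, unlike the `_-_` separator used in on-disk file paths.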
### Chapter editor for DB books
`GET /library/editor/{filename}` supports DB-stored books. The Monaco editor shows `language: 'html'` for DB books (vs `'xml'` for EPUB). The header shows a title input instead of a read-only chapter name. Unsaved content and titles are preserved across chapter switches via `pendingContent` and `pendingTitles` maps. `editor.focus()` is called after every content load so the editor is immediately interactive.
### Imagestore
Images embedded in chapter HTML are stored content-addressed at `library/images/{sha256[:2]}/{sha256}{ext}`.
- Served via `GET /library/db-images/{path:path}`
- URLs embedded in `book_chapters.content` as absolute paths: `/library/db-images/...`
- `book_images` table registers each unique image (auto-deduplication via sha256)
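
A minimal sketch of the content-addressed layout: the storage path is derived purely from the image bytes, so storing the same image twice yields the same path. The `IMAGES_DIR` value and the function name are assumptions for illustration; the real writer is `write_image_file`.

```python
import hashlib
from pathlib import Path

# ASSUMPTION: matches the `library/images/` layout described above.
IMAGES_DIR = Path("library/images")


def image_store_path(data: bytes, ext: str) -> tuple[str, Path]:
    """Return (sha256 hex digest, path) for an image blob.

    Layout: library/images/{sha256[:2]}/{sha256}{ext}
    """
    sha = hashlib.sha256(data).hexdigest()
    return sha, IMAGES_DIR / sha[:2] / f"{sha}{ext}"
```

The two-character shard directory keeps any single directory from accumulating every image in the library.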
### EPUB → DB conversion
`POST /api/library/convert-to-db/{filename}` converts an on-disk EPUB to `storage_type='db'`:
1. Parse EPUB spine → per item: extract body HTML via `_epub_body_inner`, store images in imagestore via `write_image_file`, rewrite `img[src]` to `/library/db-images/…`
2. Compute new synthetic `db/…` filename via `make_rel_path(media_type="db", …)` + `ensure_unique_db_filename`
3. DB transaction: INSERT new library row (storage_type='db') → UPDATE all child tables (book_tags, reading_progress, reading_sessions, bookmarks, library_cover_cache, book_chapters) → DELETE old library row
4. Delete EPUB file from disk + `prune_empty_dirs`
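
Step 4 relies on `prune_empty_dirs`, which Known Conventions describes as walking up from the start directory towards `LIBRARY_ROOT` and stopping at the first non-empty directory. A minimal sketch of that behaviour follows. The real function takes only `start`; the explicit `root` parameter, the return value, and the choice never to remove the root itself are assumptions of this sketch.

```python
from pathlib import Path


def prune_empty_dirs(start: Path, root: Path) -> list[Path]:
    """Remove empty directories from `start` upwards, never touching `root`."""
    root = root.resolve()
    current = start.resolve()
    removed: list[Path] = []
    # Only operate strictly inside the library root.
    while current != root and root in current.parents:
        if not current.is_dir() or any(current.iterdir()):
            break  # missing or non-empty directory: stop walking up
        current.rmdir()
        removed.append(current)
        current = current.parent
    return removed
```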
### DB → EPUB export
`GET /api/library/export-epub/{filename}` streams an EPUB built from DB content:
1. Query metadata, tags, chapters, cover from DB
2. Per chapter: `_rewrite_db_images_for_epub` strips `/library/db-images/` prefix, reads files from `IMAGES_DIR`, deduplicates by sha256, assigns `OEBPS/Images/{sha256}{ext}` paths, rewrites `img[src]` to `../Images/…`
3. Build EPUB via `make_epub()`; return as `Content-Disposition: attachment`
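
The `img[src]` rewrite in step 2 can be approximated with a regular expression. This is a simplified sketch, not `_rewrite_db_images_for_epub` itself: it does not read files from `IMAGES_DIR`, it handles only double-quoted `src` attributes, and it assumes the last URL segment is `{sha256}{ext}`.

```python
import re

DB_IMG_PREFIX = "/library/db-images/"
_IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc=")' + re.escape(DB_IMG_PREFIX) + r'([^"]+)(")'
)


def rewrite_db_images_for_epub(html: str, seen: dict[str, str]) -> str:
    """Point imagestore URLs at ../Images/ and record each unique image once.

    `seen` maps "{sha256}{ext}" to its path inside the EPUB; the caller shares
    it across chapters so repeated images are packaged a single time.
    """
    def repl(match: re.Match) -> str:
        # ASSUMPTION: the file name is the final path segment of the URL.
        name = match.group(2).rsplit("/", 1)[-1]
        seen.setdefault(name, f"OEBPS/Images/{name}")
        return f"{match.group(1)}../Images/{name}{match.group(3)}"

    return _IMG_SRC.sub(repl, html)
```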
---
## Known Bugs Fixed
- `renderGenreView` and `renderSearchResults` in `library.js` referenced `b.genres` (non-existent). Fixed: use `bookGenres()`, `bookSubgenres()`, `bookPlainTags()`.
- `PillInput` in `book.js` did not handle comma as delimiter and did not flush on save. Fixed: comma keydown + `flush()` in `saveEdit()`.
- `PillInput._add` in `book.js` added a pasted comma-separated list as one tag instead of splitting it. Fixed: `_add` now splits the value on commas and pushes each trimmed, non-empty, non-duplicate part individually.
- `PATCH /library/book` failed for PDFs: `_sync_epub_metadata` tried to open PDF as ZIP. Fixed: only called for `.epub`.
- `_make_rel_path` in `reader.py` lacked format prefix (`epub/`, `pdf/`, `comics/`). Fixed: aligned with `common.make_rel_path`.
- `common.make_rel_path` always generated `.cbr` extension for CBZ files (both map to `media_type="cbr"`). Fixed: accepts optional `ext` parameter; `library.py` import now passes actual suffix.
- `/download/{filename}` was referenced in `book.html` but no endpoint existed (404). Fixed: added `GET /download/{filename}` to `library.py`.
- PDF reader showed infinite loading: `reader.html` called EPUB-only `/library/chapters/`. Fixed: PDF path uses `/api/pdf/info/` + page-image rendering.
- Empty dir pruning only ran when file was moved. Fixed: `prune_empty_dirs(old_path.parent)` always runs after a successful metadata save.
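
The `PillInput._add` fix listed above (split on commas, trim, skip empty and duplicate parts) can be illustrated in a few lines. The real code is JavaScript in `book.js`; this Python sketch uses a hypothetical function name and assumes a case-sensitive duplicate check.

```python
def add_pills(existing: list[str], raw: str) -> list[str]:
    """Return `existing` plus each new tag found in a comma-separated string."""
    result = list(existing)  # do not mutate the caller's list
    for part in raw.split(","):
        tag = part.strip()
        # ASSUMPTION: duplicates are compared case-sensitively.
        if tag and tag not in result:
            result.append(tag)
    return result
```

For example, pasting `fantasy, slow burn, , magic,slow burn` into an input that already holds `fantasy` adds only `slow burn` and `magic`.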