novela/docs/TECHNICAL.md
2026-03-28 01:08:25 +01:00

320 lines
20 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Novela 2.0 - Technical Status (Develop)
## Scope
This document describes the current technical status of the `develop` codebase.
It is the primary technical reference for the current implementation.
## Architecture
- Stack: FastAPI, Jinja2 templates, plain JavaScript, PostgreSQL 16, Docker.
- Startup lifecycle (`main.py`):
1. `init_pool()`
2. `run_migrations()`
3. `start_backup_scheduler()`
4. mount routers
- Shutdown lifecycle:
1. `stop_backup_scheduler()`
2. `close_pool()`
- Source-of-truth rule: files on disk are authoritative, the database is an index/cache.
## File Storage Paths
All files are stored under `library/` (relative to the app working directory, mapped via Docker volume).
`LIBRARY_DIR = Path("library")`, `LIBRARY_ROOT = LIBRARY_DIR.resolve()`.
### Path structure per format
| Format | Path pattern |
|--------|-------------|
| EPUB (no series) | `library/epub/{publisher}/{author}/Stories/{title}.epub` |
| EPUB (series) | `library/epub/{publisher}/{author}/Series/{series}/{idx:03d} - {title}.epub` |
| PDF | `library/pdf/{publisher}/{author}/{title}.pdf` |
| CBR | `library/comics/{publisher}/{author}/{title}.cbr` |
| CBZ | `library/comics/{publisher}/{author}/{title}.cbz` |
- Segments are sanitised: special chars stripped, max lengths applied (publisher/author 80, title 140, series 80).
- Series index is zero-padded to 3 digits (`001`, `002`, …), clamped to 1999.
- Duplicate filenames get a `(2)`, `(3)`, … suffix.
- After any file move, empty parent directories are pruned up to `LIBRARY_ROOT`.
### Path logic
- `common.make_rel_path(media_type, publisher, author, title, series, series_index, ext)` — used by import and grabber.
- `reader.py _make_rel_path(publisher, author, title, series, series_index, ext)` — used by metadata PATCH; same logic, uses actual file extension.
- Both functions produce identical paths for all formats.
### Metadata save behaviour per format
| Format | File written? | DB written? |
|--------|--------------|-------------|
| EPUB | Yes — OPF metadata updated in-place | Yes |
| PDF | No | Yes |
| CBR | No | Yes |
| CBZ | No (tags/metadata); rating written to ComicInfo.xml | Yes |
---
## Router Status
### `routers/library.py`
- `GET /library` — library page
- `GET /api/library` — book list JSON (fast-path by default)
- `POST /library/rescan` — forced full disk rescan
- `POST /library/import` — upload EPUB/PDF/CBR/CBZ
- `DELETE /library/file/{filename}` — delete file + DB row + prune dirs
- `GET /download/{filename}` — download file with `Content-Disposition: attachment`
- `GET /library/cover/{filename}` — serve cover (EPUB from file; PDF/CBR from cache)
- `GET /library/cover-cached/{filename}` — serve cover from DB cache only
- `POST /library/cover/{filename}` — upload/replace cover (EPUB only)
- `POST /library/want-to-read/{filename}` — toggle want-to-read flag
- `POST /library/archive/{filename}` — toggle archived flag
- `POST /library/new/mark-reviewed` — bulk set `needs_review=false`
- `POST /library/rating/{filename}` — set/clear star rating `{"rating": 0-5}`
- `GET /home` — home page
- `GET /api/home` — home data JSON
- `GET /stats` — statistics page
- `GET /api/stats` — statistics data JSON
- `GET /library/list` — compat alias
`GET /api/library` runs in fast-path mode by default (DB-only, no full disk rescan).
For a forced sync: `GET /api/library?rescan=true` or `POST /library/rescan`.
`include_file_info=true` is optional for file size/mtime enrichment.
ETag caching: response includes `ETag: "{count}-{max_updated_at_unix}"` and `Cache-Control: no-cache`. Client sends `If-None-Match`; server returns `304 Not Modified` when nothing changed.
`/api/home` returns:
- `continue_reading`
- `shorts_unread`
- `novels_unread`
- `shorts_read`
- `novels_read`
`/api/stats` returns totals plus chart/history data for `stats.html`:
- `reads_by_month`, `reads_by_dow`, `reads_by_hour`
- `genre_counts`, `publisher_counts`, `fav_genre`, `fav_publisher`
- `top_books`, `history`
Home sections exclude series books via:
- `COALESCE(series, '') = ''`
- `filename NOT LIKE '%/Series/%'`
Home read sections are ordered oldest-first:
- `shorts_read`: `ORDER BY MAX(read_at) ASC`
- `novels_read`: `ORDER BY MAX(read_at) ASC`
### `routers/reader.py`
- `GET /library/epub/{filename}` — serve EPUB inline (no attachment header)
- `GET /library/chapters/{filename}` — EPUB spine as JSON
- `GET /library/chapter/{index}/{filename}` — single EPUB chapter as HTML fragment
- `GET /library/chapter-img/{path}?filename=…` — image extracted from EPUB ZIP; `path` is the full internal ZIP path (e.g. `OEBPS/Images/cover.jpg` or `EPUB/images/cover.jpg`); case-insensitive fallback for mismatched folder names
- `GET /library/pdf/{filename}?page=N&dpi=150` — render PDF page as PNG
- `GET /api/pdf/info/{filename}``{"page_count": N}`
- `GET /library/cbr/{filename}/{page}` — CBR/CBZ page as image
- `GET /library/progress/{filename}` — read progress
- `POST /library/progress/{filename}` — save progress `{"cfi": "…", "progress": N}`
- `DELETE /library/progress/{filename}` — clear progress
- `POST /library/mark-read/{filename}` — mark as read (with optional date)
- `GET /library/book/{filename}` — book detail page
- `GET /api/genres` — all tags from `book_tags` (optional `?type=genre|subgenre|tag`)
- `PATCH /library/book/{filename}` — update metadata + tags; moves file if path fields change; DB-only for non-EPUB
- `POST /library/rating/{filename}` — set/clear 15 star rating; writes to EPUB OPF / CBZ ComicInfo.xml; DB-only for CBR/PDF
- `GET /library/read/{filename}` — reader page (EPUB or PDF); supports `?bm_ch=N&bm_scroll=F` to jump to bookmark position
- `GET /library/bookmarks/{filename}` — list bookmarks for a book
- `POST /library/bookmarks/{filename}` — add bookmark `{chapter_index, scroll_frac, chapter_title, note}`
- `PATCH /library/bookmarks/{id}` — update bookmark note
- `DELETE /library/bookmarks/{id}` — delete bookmark
- `GET /api/bookmarks` — all bookmarks across all books (includes `book_title`, `book_author`)
### `routers/editor.py`
- `GET /library/editor/{filename}` — EPUB chapter editor page
- `GET /api/edit/chapter/{index}/{filename}` — get chapter HTML
- `POST /api/edit/chapter/{index}/{filename}` — save chapter HTML
- `POST /api/edit/chapter/add/{filename}` — add new chapter
- `DELETE /api/edit/chapter/{index}/{filename}` — delete chapter
### `routers/grabber.py`
- `GET /grabber` — grabber page
- `GET /convert` — convert page
- `GET /credentials-manager` — credentials manager UI
- `GET /debug` — debug page
- `POST /debug/run` — run debug scrape
- `GET /credentials` — list stored credentials
- `POST /credentials` — save credential
- `DELETE /credentials/{site}` — delete credential
- `POST /preload` — preload book info from URL
- `POST /convert` — run scrape + convert to EPUB
- `GET /events/{job_id}` — SSE stream for job progress
### `routers/settings.py`
- `GET /settings` — settings page
- `GET /api/break-patterns` — list chapter-break patterns
- `POST /api/break-patterns` — add break pattern (type: `regex` or `css_class`)
- `PATCH /api/break-patterns/{id}` — update pattern (enable/disable or change value)
- `DELETE /api/break-patterns/{id}` — delete pattern
- `DELETE /api/reading-history` — wipe all reading sessions
### `routers/builder.py`
- `GET /builder` — Book Builder index (draft list + new draft form)
- `POST /builder` — create new draft; redirects to `/builder/{id}`
- `GET /builder/{draft_id}` — draft editor page
- `DELETE /api/builder/{draft_id}` — delete draft
- `GET /api/builder/{draft_id}` — draft JSON (id, title, author, publisher, source_url, chapters)
- `POST /api/builder/{draft_id}/chapter` — add chapter `{title, after_index}`; returns `{index, count}`
- `PUT /api/builder/{draft_id}/chapter/{idx}` — save chapter `{title?, content?}`
- `DELETE /api/builder/{draft_id}/chapter/{idx}` — delete chapter; returns `{index, count}`
- `POST /api/builder/{draft_id}/normalize/{idx}` — normalize chapter HTML (preview only, does not save); returns `{content}`
- `POST /api/builder/{draft_id}/publish` — normalize all chapters → `build_epub()` → write to `library/epub/``upsert_book()` → delete draft; returns `{filename}`; redirects browser to `/library/book/{filename}`
Publish flow: all chapters are run through `normalize_wysiwyg_html()`, then `build_epub()` produces an EPUB 2.0 ZIP. The file path is computed via `make_rel_path(media_type="epub", …)`. The book is inserted into the library with `needs_review=True`. The draft is deleted on success.
### `routers/following.py`
- `GET /following` — Following page (author URL management)
- `GET /api/following` — all distinct library authors with URL (if set), book count, and last-added date
- `POST /api/following/{author_name}` — set or clear URL for an author (empty `url` removes the record)
`GET /api/following` returns one entry per non-archived author:
```json
{ "name": "Author Name", "book_count": 5, "last_added": "2026-03-27T…", "url": "https://…" }
```
URL is stored in the `authors` table (`name` unique, `url`, `created_at`, `updated_at`).
### `routers/backup.py`
- `GET /backup` — backup page
- `GET /api/backup/credentials` — Dropbox settings (includes `app_key_configured` flag)
- `POST /api/backup/credentials` — save Dropbox settings
- `DELETE /api/backup/credentials` — remove all Dropbox credentials
- `POST /api/backup/oauth/prepare` — save app key + secret, return Dropbox auth URL
- `POST /api/backup/oauth/exchange` — exchange authorization code for refresh token
- `GET /api/backup/health` — Dropbox connectivity check (includes `schedule_enabled`, `schedule_interval_hours`)
- `GET /api/backup/status` — current backup status
- `GET /api/backup/history` — backup run history (last 20)
- `GET /api/backup/progress` — live progress of running backup `{running, done, total, phase}`
- `POST /api/backup/run` — trigger backup (background task)
---
## Backup & Security
- Dropbox token (refresh token or legacy access token) stored encrypted in `credentials` (`site='dropbox'`).
- Dropbox app key stored encrypted in `credentials` (`site='dropbox_app_key'`).
- Dropbox app secret stored encrypted in `credentials` (`site='dropbox_app_secret'`).
- Dropbox backup root stored encrypted in `credentials` (`site='dropbox_backup_root'`).
- Retention (`snapshots to keep`) stored encrypted in `credentials` (`site='dropbox_backup_retention'`).
- Backup schedule (`enabled` + `interval_hours`) stored encrypted in `credentials` (`site='dropbox_backup_schedule'`).
- Encryption uses `NOVELA_MASTER_KEY` (Fernet).
### Dropbox authentication
- Preferred: OAuth2 refresh token (does not expire). Set up via the two-step flow on `/backup`:
1. Enter App Key + App Secret → click **Generate Auth URL**
2. Approve in browser → paste the code → click **Save & Activate**
- `_dbx()` uses `oauth2_refresh_token` + `app_key` + `app_secret` for automatic token renewal.
- Fallback: legacy short-lived access token (backwards compatible; works without app key/secret).
### Implementation details
- Versioned backups with deduplication:
- file objects in Dropbox: `library_objects/{sha256_prefix}/{sha256}`
- snapshots in Dropbox: `library_snapshots/snapshot-YYYYMMDD-HHMMSS.json`
- Each run creates a new snapshot version and uploads only missing objects.
- Retention removes older snapshots above the configured limit.
- Orphan object pruning removes objects no longer referenced by retained snapshots.
- Local manifest cache (`config/backup_manifest.json`) speeds up change detection.
- Database backup is done via `pg_dump` to Dropbox `postgres/`.
- `POST /api/backup/run` always starts a background task and returns immediately.
- `GET /api/backup/progress` returns in-memory progress updated per file; phases: `starting``scanning``uploading``snapshot``pg_dump`.
- Scheduler runs in the background (`start_backup_scheduler`) and triggers on interval when enabled.
- Concurrency guard: only one backup can run at a time.
- After container restart/crash, stale `running` logs are auto-marked as interrupted/error.
---
## Environment
`stack/novela.env` should include at least:
- `POSTGRES_DB`
- `POSTGRES_USER`
- `POSTGRES_PASSWORD`
- `NOVELA_MASTER_KEY`
- `CONFIG_DIR`
Dropbox settings are managed via the web UI on `/backup`.
---
## UI Notes
- Library import accepts EPUB/PDF/CBR/CBZ.
- Home supports the same import formats.
- Home includes search.
- Home header/dropzone alignment matches Library (search top-right, dropzone below).
- `New` view supports `Grid` and `List` mode.
- Bulk selection + `Remove from New` works only in `List` mode.
- `List` mode has a column visibility filter: Publisher, Author, Series, Volume, Title, Has cover, Updated, Genres, Sub-genres, Tags, Status.
- `List` mode supports multi-select with `Shift+click` range selection on checkboxes.
- `Grid` mode shows no selection checkboxes or bulk actions.
- `All books` view supports `Grid` and `List` mode (same columns as `New`).
- View mode persisted in `localStorage` as `novela.all.viewMode`.
- Column visibility persisted in `localStorage` as `novela.all.visibleColumns`.
- `List` mode has a checkbox column, column visibility filter, and multi-select with `Shift+click` range selection.
- `List` mode has a `Delete selected` bulk action: confirms then calls `DELETE /library/file/{filename}` for each selected book.
- Star ratings (15) shown under the cover in all grid views:
- Display-only in grid cards (no click, prevents accidental taps while scrolling).
- Interactive in Book Detail (1.1rem, clickable; clicking the active star clears the rating).
- Amber: filled `#c8a03a`, unfilled `rgba(200, 160, 58, 0.25)`.
- Reader settings (hamburger menu):
- Content width slider (30100 vw), persisted as `reader-content-width-pct`.
- Text colour: 5 warm-tone presets `#e8e2d9``#938d86`, persisted as `reader-text-colour`.
- Hamburger and back-link separated with `margin-left: 1rem` on `.header-back`.
- Reader supports EPUB and PDF:
- EPUB: chapter-text rendering; progress = `{chapterIndex}:{scrollFrac}`; progress % = `(chapterIndex + scrollFrac) / total * 100`.
- PDF: page-image rendering via `/library/pdf/{filename}?page=N`; page count from `/api/pdf/info/{filename}`; progress = `{pageIndex}:0`; keyboard/button navigation identical.
- `reader.html` branches on `FORMAT` variable injected by the server.
- `Edit EPUB` button in Book Detail is only shown for `.epub` files.
- Backup page supports: manual run, dry-run, Dropbox root, retention count, schedule (on/off + hours), status + history.
- Bookmarks: saved per book via `POST /library/bookmarks/{filename}`; shown in Library sidebar section; navigated via `?bm_ch=N&bm_scroll=F` URL params on reader page.
- Convert page: after loading metadata, if a book with the same title+author already exists in the library, a warning banner is shown (with a link to the existing book); user can still proceed with conversion. Check is done server-side in `/preload` response (`already_exists`, `existing_books`).
- Duplicates view (`#duplicates`): groups non-archived books by `(title, author)` (case-insensitive); shows only groups with ≥ 2 copies; counter in sidebar shows total number of duplicate books. Detection is entirely client-side from the existing library data.
- Incomplete view (`#incomplete`): shows all non-archived books where `publication_status` is not `Complete` (Ongoing, Hiatus, or blank); sidebar counter included.
- Following page (`/following`): dedicated page in its own sidebar section between Library and Tools; shows all library authors with their external URL; two tabs — Following (authors with URL set) and All Authors; inline URL editing with keyboard support (Enter = save, Escape = cancel); clicking Visit opens the external URL in a new tab. Author URLs are stored in the `authors` table. Sidebar counter shows number of followed authors.
- Book Builder (`/builder`): create EPUB books from scratch; drafts stored in `builder_drafts` (JSONB chapters); contenteditable editor with toolbar (bold/italic/underline/blockquote/author-note/scene-break/normalize); autosave every 30 s + Ctrl+S; publish normalizes HTML via `normalize_wysiwyg_html()` and builds EPUB via `build_epub()`.
---
## Known Conventions
- Book deletion flow: `unlink` file → `prune_empty_dirs(parent)``DELETE FROM library` (cascade removes child rows).
- Empty dir pruning: `prune_empty_dirs(start)` walks up from `start` to `LIBRARY_ROOT`, removing each dir if empty; stops at first non-empty dir.
- Cover strategy:
- EPUB: `GET /library/cover/{filename}` checks `library_cover_cache` first; on miss, extracts from ZIP and warms the cache. Cover upload (`POST /library/cover/{filename}`) replaces the image inside the EPUB ZIP (OPF located via `META-INF/container.xml`, old cover found in manifest and removed) and updates the cache so subsequent requests return the new cover immediately.
- PDF: first page rendered as thumbnail, cached
- CBR/CBZ: first page extracted, cached
- Rating storage:
- EPUB: `<meta name="novela:rating" content="N"/>` in OPF
- CBZ: `<NovelaRating>N</NovelaRating>` in `ComicInfo.xml` inside the ZIP
- CBR/PDF: DB only
- `upsert_book` uses `CASE WHEN EXCLUDED.rating > 0 THEN EXCLUDED.rating ELSE library.rating END` to restore rating from file without overwriting existing DB value.
- Tag types in `book_tags`: `genre`, `subgenre`, `tag`, `subject`. No direct `genres`/`subgenres` fields on book objects; always use helpers `bookGenres()`, `bookSubgenres()`, `bookPlainTags()`.
---
## Performance Notes
- Library load is optimized for large datasets (1000+ books):
- `list_library_json()` uses `json_agg` in the main query to inline tags per book — eliminates a separate `SELECT * FROM book_tags` query and Python merge loop.
- `has_cached_cover` is provided directly via SQL join instead of full cache fetch.
- `reading_sessions` is pre-aggregated in a subquery.
- ETag on `/api/library`: cheap `COUNT + MAX(updated_at)` query before full load; `304 Not Modified` on cache hit.
- Front-end rendering uses `IntersectionObserver` to defer both cover image loading and placeholder canvas drawing until cards enter the viewport — prevents hundreds of simultaneous HTTP requests and canvas operations on initial render.
- `renderBooksGrid`, `renderDuplicatesView`, `renderSeriesDetail` all use a single DOM pass: cover `<img>` and `<canvas>` are set up via `card.querySelector` immediately after `innerHTML` is set, eliminating a second full iteration with `document.getElementById` calls.
- Additional migration indexes:
- `idx_library_sort_coalesce`
- `idx_library_needs_review`
- `idx_library_archived`
- `idx_reading_sessions_filename_readat`
- `idx_book_tags_filename_tag`
---
## Known Bugs Fixed
- `renderGenreView` and `renderSearchResults` in `library.js` referenced `b.genres` (non-existent). Fixed: use `bookGenres()`, `bookSubgenres()`, `bookPlainTags()`.
- `PillInput` in `book.js` did not handle comma as delimiter and did not flush on save. Fixed: comma keydown + `flush()` in `saveEdit()`.
- `PATCH /library/book` failed for PDFs: `_sync_epub_metadata` tried to open PDF as ZIP. Fixed: only called for `.epub`.
- `_make_rel_path` in `reader.py` lacked format prefix (`epub/`, `pdf/`, `comics/`). Fixed: aligned with `common.make_rel_path`.
- `common.make_rel_path` always generated `.cbr` extension for CBZ files (both map to `media_type="cbr"`). Fixed: accepts optional `ext` parameter; `library.py` import now passes actual suffix.
- `/download/{filename}` was referenced in `book.html` but no endpoint existed (404). Fixed: added `GET /download/{filename}` to `library.py`.
- PDF reader showed infinite loading: `reader.html` called EPUB-only `/library/chapters/`. Fixed: PDF path uses `/api/pdf/info/` + page-image rendering.
- Empty dir pruning only ran when file was moved. Fixed: `prune_empty_dirs(old_path.parent)` always runs after a successful metadata save.