# Novela 2.0 - Technical Status (Develop)

## Scope
This document describes the current technical status of the `develop` codebase.
It is the primary technical reference for the current implementation.

## Architecture
- Stack: FastAPI, Jinja2 templates, plain JavaScript, PostgreSQL 16, Docker.
- Startup lifecycle (`main.py`):
  1. `init_pool()`
  2. `run_migrations()`
  3. `start_backup_scheduler()`
  4. mount routers
- Shutdown lifecycle:
  1. `stop_backup_scheduler()`
  2. `close_pool()`
- Source-of-truth rule: files on disk are authoritative; the database is only an index/cache.
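
The startup and shutdown order above can be sketched with an async context manager. This is a minimal illustration, not the contents of `main.py`: the function bodies are placeholders, and which of the functions are `async` is an assumption. Router mounting (step 4) is left out.

```python
import asyncio
from contextlib import asynccontextmanager

calls: list[str] = []  # records the order in which the steps run

# Placeholder stand-ins: only the names come from main.py, the bodies do not.
async def init_pool() -> None:
    calls.append("init_pool")

async def run_migrations() -> None:
    calls.append("run_migrations")

def start_backup_scheduler() -> None:
    calls.append("start_backup_scheduler")

def stop_backup_scheduler() -> None:
    calls.append("stop_backup_scheduler")

async def close_pool() -> None:
    calls.append("close_pool")

@asynccontextmanager
async def lifespan(app):
    # Startup steps, in the documented order.
    await init_pool()
    await run_migrations()
    start_backup_scheduler()
    try:
        yield  # the application serves requests while suspended here
    finally:
        # Shutdown steps, in the documented order.
        stop_backup_scheduler()
        await close_pool()

async def demo() -> list[str]:
    calls.clear()
    async with lifespan(app=None):
        calls.append("serving")
    return list(calls)

print(asyncio.run(demo()))
```

With FastAPI, a context manager of this shape can be passed as `FastAPI(lifespan=lifespan)`. Whether `main.py` uses that mechanism or startup/shutdown event handlers is not stated here.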

## File Storage Paths

All files are stored under `library/` (relative to the app working directory, mapped via Docker volume).
`LIBRARY_DIR = Path("library")`, `LIBRARY_ROOT = LIBRARY_DIR.resolve()`.

### Path structure per format

| Format | Path pattern |
|--------|-------------|
| EPUB (no series) | `library/epub/{publisher}/{author}/Stories/{title}.epub` |
| EPUB (series) | `library/epub/{publisher}/{author}/Series/{series}/{idx:03d} - {title}.epub` |
| PDF | `library/pdf/{publisher}/{author}/{title}.pdf` |
| CBR | `library/comics/{publisher}/{author}/{title}.cbr` |
| CBZ | `library/comics/{publisher}/{author}/{title}.cbz` |

- Segments are sanitised: special chars stripped, max lengths applied (publisher/author 80, title 140, series 80).
- Series index is zero-padded to 3 digits (`001`, `002`, …), clamped to 1–999.
- Duplicate filenames get a `(2)`, `(3)`, … suffix.
- After any file move, empty parent directories are pruned up to `LIBRARY_ROOT`.
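
The pruning rule in the last bullet can be sketched as follows. This is an illustrative version only: the real `prune_empty_dirs(start)` takes a single argument and uses `LIBRARY_ROOT`, whereas this sketch takes the root as an explicit `root` parameter (an assumption made so the example is self-contained).

```python
from pathlib import Path

def prune_empty_dirs(start: Path, root: Path) -> list[Path]:
    """Remove empty directories from `start` upward; never remove `root`."""
    removed: list[Path] = []
    root = root.resolve()
    current = start.resolve()
    # Walk upward only while we are strictly inside the root directory.
    while current != root and root in current.parents:
        try:
            current.rmdir()  # raises OSError if the directory is not empty
        except OSError:
            break  # first non-empty (or missing) directory ends the walk
        removed.append(current)
        current = current.parent
    return removed
```

Calling it with the parent directory of a file that was just moved or deleted removes that directory and each empty ancestor, and stops at the first directory that still has content.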

### Path logic

- `common.make_rel_path(media_type, publisher, author, title, series, series_index, ext)` — used by import and grabber.
- `_make_rel_path(publisher, author, title, series, series_index, ext)` in `reader.py` — used by metadata PATCH; same logic, uses actual file extension.
- Both functions produce identical paths for all formats.
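
A minimal sketch of the path rules above, following the `reader.py` signature. Several details are assumptions rather than the actual implementation: the exact set of stripped characters, the `"Unknown"` fallback for empty segments, the helper names `sanitise` and `dedupe`, and the placement of the duplicate suffix before the extension.

```python
import re
from pathlib import PurePosixPath

FORMAT_DIRS = {"epub": "epub", "pdf": "pdf", "cbr": "comics", "cbz": "comics"}

def sanitise(segment: str, max_len: int) -> str:
    # Assumption: "special chars" means characters that are unsafe in filenames.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", segment)
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned[:max_len].rstrip() or "Unknown"  # fallback is hypothetical

def make_rel_path(publisher, author, title, series=None, series_index=None,
                  ext="epub") -> PurePosixPath:
    """Build a path relative to the library root, per the table above."""
    ext = ext.lower().lstrip(".")
    base = PurePosixPath(FORMAT_DIRS[ext], sanitise(publisher, 80), sanitise(author, 80))
    name = sanitise(title, 140)
    if ext == "epub":
        if series:
            # Clamp the series index to 1..999, then zero-pad to three digits.
            idx = min(max(int(series_index or 1), 1), 999)
            return base / "Series" / sanitise(series, 80) / f"{idx:03d} - {name}.epub"
        return base / "Stories" / f"{name}.epub"
    return base / f"{name}.{ext}"

def dedupe(rel: PurePosixPath, taken: set[str]) -> PurePosixPath:
    """Append (2), (3), ... until the path is not in `taken`."""
    if str(rel) not in taken:
        return rel
    n = 2
    while True:
        candidate = rel.with_name(f"{rel.stem} ({n}){rel.suffix}")
        if str(candidate) not in taken:
            return candidate
        n += 1
```

Passing the actual file extension (`cbr` vs `cbz`) keeps both comic formats under `comics/` while preserving the real suffix.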

### Metadata save behaviour per format

| Format | File written? | DB written? |
|--------|--------------|-------------|
| EPUB | Yes — OPF metadata updated in-place | Yes |
| PDF | No | Yes |
| CBR | No | Yes |
| CBZ | Rating only (written to `ComicInfo.xml`); tags and other metadata are not written | Yes |

---

## Router Status

### `routers/library.py`
- `GET /library` — library page
- `GET /api/library` — book list JSON (fast-path by default)
- `POST /library/rescan` — forced full disk rescan
- `POST /library/import` — upload EPUB/PDF/CBR/CBZ
- `DELETE /library/file/{filename}` — delete file + DB row + prune dirs
- `GET /download/{filename}` — download file with `Content-Disposition: attachment`
- `GET /library/cover/{filename}` — serve cover (EPUB from file; PDF/CBR from cache)
- `GET /library/cover-cached/{filename}` — serve cover from DB cache only
- `POST /library/cover/{filename}` — upload/replace cover (EPUB only)
- `POST /library/want-to-read/{filename}` — toggle want-to-read flag
- `POST /library/archive/{filename}` — toggle archived flag
- `POST /library/new/mark-reviewed` — bulk set `needs_review=false`
- `POST /library/rating/{filename}` — set/clear star rating `{"rating": 0-5}`
- `GET /home` — home page
- `GET /api/home` — home data JSON
- `GET /stats` — statistics page
- `GET /api/stats` — statistics data JSON
- `GET /library/list` — compat alias

`GET /api/library` runs in fast-path mode by default (DB-only, no full disk rescan).
For a forced sync: `GET /api/library?rescan=true` or `POST /library/rescan`.
Pass `include_file_info=true` to additionally include file size and mtime for each book.

`/api/home` returns:
- `continue_reading`
- `shorts_unread`
- `novels_unread`
- `shorts_read`
- `novels_read`

`/api/stats` returns totals plus chart/history data for `stats.html`:
- `reads_by_month`, `reads_by_dow`, `reads_by_hour`
- `genre_counts`, `publisher_counts`, `fav_genre`, `fav_publisher`
- `top_books`, `history`

Home sections exclude series books via:
- `COALESCE(series, '') = ''`
- `filename NOT LIKE '%/Series/%'`
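
The effect of the two conditions can be tried out with an in-memory SQLite table standing in for PostgreSQL. The reduced `library` schema, the sample rows, and the helper name `home_candidates` are illustrative assumptions, as is combining the two conditions with `AND`.

```python
import sqlite3

def home_candidates(conn: sqlite3.Connection) -> list[str]:
    """Return filenames that pass both series-exclusion conditions."""
    rows = conn.execute(
        """
        SELECT filename FROM library
        WHERE COALESCE(series, '') = ''
          AND filename NOT LIKE '%/Series/%'
        ORDER BY filename
        """
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE library (filename TEXT PRIMARY KEY, series TEXT)")
conn.executemany(
    "INSERT INTO library VALUES (?, ?)",
    [
        ("epub/Pub/Author/Stories/Standalone.epub", None),
        ("epub/Pub/Author/Stories/Blank Series.epub", ""),
        ("epub/Pub/Author/Series/Saga/001 - First.epub", "Saga"),
        # Series missing in the DB row, but the path still reveals it.
        ("epub/Pub/Author/Series/Saga/002 - Untagged.epub", None),
    ],
)
print(home_candidates(conn))
```

The second condition catches books whose `series` column is empty even though the file sits in a `Series/` directory. Note that `LIKE` is case-insensitive for ASCII in SQLite but case-sensitive in PostgreSQL.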

Home read sections are ordered oldest-first:
- `shorts_read`: `ORDER BY MAX(read_at) ASC`
- `novels_read`: `ORDER BY MAX(read_at) ASC`

### `routers/reader.py`
- `GET /library/epub/{filename}` — serve EPUB inline (no attachment header)
- `GET /library/chapters/{filename}` — EPUB spine as JSON
- `GET /library/chapter/{index}/{filename}` — single EPUB chapter as HTML fragment
- `GET /library/chapter-img/{path}?filename=…` — image extracted from EPUB ZIP
- `GET /library/pdf/{filename}?page=N&dpi=150` — render PDF page as PNG
- `GET /api/pdf/info/{filename}` — `{"page_count": N}`
- `GET /library/cbr/{filename}/{page}` — CBR/CBZ page as image
- `GET /library/progress/{filename}` — read progress
- `POST /library/progress/{filename}` — save progress `{"cfi": "…", "progress": N}`
- `DELETE /library/progress/{filename}` — clear progress
- `POST /library/mark-read/{filename}` — mark as read (with optional date)
- `GET /library/book/{filename}` — book detail page
- `GET /api/genres` — all tags from `book_tags` (optional `?type=genre|subgenre|tag`)
- `PATCH /library/book/{filename}` — update metadata + tags; moves file if path fields change; DB-only for non-EPUB
- `POST /library/rating/{filename}` — set/clear 1–5 star rating; writes to EPUB OPF / CBZ ComicInfo.xml; DB-only for CBR/PDF
- `GET /library/read/{filename}` — reader page (EPUB or PDF)

### `routers/editor.py`
- `GET /library/editor/{filename}` — EPUB chapter editor page
- `GET /api/edit/chapter/{index}/{filename}` — get chapter HTML
- `POST /api/edit/chapter/{index}/{filename}` — save chapter HTML
- `POST /api/edit/chapter/add/{filename}` — add new chapter
- `DELETE /api/edit/chapter/{index}/{filename}` — delete chapter

### `routers/grabber.py`
- `GET /grabber` — grabber page
- `GET /convert` — convert page
- `GET /credentials-manager` — credentials manager UI
- `GET /debug` — debug page
- `POST /debug/run` — run debug scrape
- `GET /credentials` — list stored credentials
- `POST /credentials` — save credential
- `DELETE /credentials/{site}` — delete credential
- `POST /preload` — preload book info from URL
- `POST /convert` — run scrape + convert to EPUB
- `GET /events/{job_id}` — SSE stream for job progress

### `routers/settings.py`
- `GET /settings` — settings page
- `GET /api/break-patterns` — list chapter-break patterns
- `POST /api/break-patterns` — add break pattern (type: `regex` or `css_class`)
- `PATCH /api/break-patterns/{id}` — update pattern (enable/disable or change value)
- `DELETE /api/break-patterns/{id}` — delete pattern
- `DELETE /api/reading-history` — wipe all reading sessions

### `routers/backup.py`
- `GET /backup` — backup page
- `GET` / `POST` / `DELETE /api/backup/credentials` — Dropbox settings
- `GET /api/backup/health` — Dropbox connectivity check
- `GET /api/backup/status` — current backup status
- `GET /api/backup/history` — backup run history
- `POST /api/backup/run` — trigger backup (background task)

---

## Backup & Security
- Dropbox token is stored encrypted-at-rest in `credentials` (`site='dropbox'`).
- Dropbox backup root is stored encrypted in `credentials` (`site='dropbox_backup_root'`).
- Retention (`snapshots to keep`) is stored encrypted in `credentials` (`site='dropbox_backup_retention'`).
- Backup schedule (`enabled` + `interval_hours`) is stored encrypted in `credentials` (`site='dropbox_backup_schedule'`).
- Encryption uses `NOVELA_MASTER_KEY` (Fernet).

Implementation details:
- Versioned backups with deduplication:
  - file objects in Dropbox: `library_objects/{sha256_prefix}/{sha256}`
  - snapshots in Dropbox: `library_snapshots/snapshot-YYYYMMDD-HHMMSS.json`
- Each run creates a new snapshot version and uploads only missing objects.
- Retention removes the oldest snapshots once their count exceeds the configured limit.
- Orphan object pruning removes objects no longer referenced by retained snapshots.
- Local manifest cache (`config/backup_manifest.json`) speeds up change detection.
- Database backup is done via `pg_dump` to Dropbox `postgres/`.
- `POST /api/backup/run` always starts a background task and returns immediately.
- Scheduler runs in the background (`start_backup_scheduler`) and triggers on interval when enabled.
- Concurrency guard: only one backup can run at a time.
- After container restart/crash, stale `running` logs are auto-marked as interrupted/error.
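
The deduplication, retention, and orphan-pruning steps above can be sketched with plain data structures. This is not the backup module itself: the function names, the in-memory representation of snapshots, and the two-character `sha256_prefix` length are assumptions, and no Dropbox calls are made.

```python
import hashlib
from datetime import datetime

def object_key(data: bytes, prefix_len: int = 2) -> str:
    """Content-addressed key; the prefix length is an assumed value."""
    digest = hashlib.sha256(data).hexdigest()
    return f"library_objects/{digest[:prefix_len]}/{digest}"

def snapshot_name(now: datetime) -> str:
    return now.strftime("library_snapshots/snapshot-%Y%m%d-%H%M%S.json")

def plan_run(files: dict[str, bytes], remote_objects: set[str]):
    """Map each file to its object key and list the objects still to upload."""
    snapshot = {path: object_key(data) for path, data in files.items()}
    to_upload = sorted(set(snapshot.values()) - remote_objects)
    return snapshot, to_upload

def apply_retention(snapshots: dict[str, dict[str, str]], keep: int,
                    remote_objects: set[str]):
    """Return (snapshots to drop, objects no retained snapshot references)."""
    names = sorted(snapshots)  # timestamped names sort chronologically
    kept = names[-keep:] if keep > 0 else []
    dropped = [name for name in names if name not in kept]
    referenced = {key for name in kept for key in snapshots[name].values()}
    orphans = remote_objects - referenced
    return dropped, orphans
```

Because keys depend only on file content, two identical files share one object, and a second run over unchanged files uploads nothing.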

---

## Environment
`stack/novela.env` should include at least:
- `POSTGRES_DB`
- `POSTGRES_USER`
- `POSTGRES_PASSWORD`
- `NOVELA_MASTER_KEY`
- `CONFIG_DIR`

Dropbox settings are managed via the web UI on `/backup`.
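
A small check for the required variables listed above might look like this. The helper `missing_env` is hypothetical and is not claimed to exist in the codebase. It only illustrates failing early when `stack/novela.env` is incomplete.

```python
import os

REQUIRED_VARS = (
    "POSTGRES_DB",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "NOVELA_MASTER_KEY",
    "CONFIG_DIR",
)

def missing_env(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]

if __name__ == "__main__":
    missing = missing_env()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```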

---

## UI Notes
- Library import accepts EPUB/PDF/CBR/CBZ.
- Home supports the same import formats.
- Home includes search.
- Home header/dropzone alignment matches Library (search top-right, dropzone below).
- `New` view supports `Grid` and `List` mode.
  - Bulk selection + `Remove from New` works only in `List` mode.
  - `List` mode has a column visibility filter: Publisher, Author, Series, Volume, Title, Has cover, Updated, Genres, Sub-genres, Tags, Status.
  - `List` mode supports multi-select with `Shift+click` range selection on checkboxes.
  - `Grid` mode shows no selection checkboxes or bulk actions.
- `All books` view supports `Grid` and `List` mode (same columns as `New`, no selection/bulk actions).
  - View mode persisted in `localStorage` as `novela.all.viewMode`.
  - Column visibility persisted in `localStorage` as `novela.all.visibleColumns`.
- Star ratings (1–5) shown under the cover in all grid views:
  - Display-only in grid cards (no click, prevents accidental taps while scrolling).
  - Interactive in Book Detail (1.1rem, clickable; clicking the active star clears the rating).
  - Amber: filled `#c8a03a`, unfilled `rgba(200, 160, 58, 0.25)`.
- Reader settings (hamburger menu):
  - Content width slider (30–100 vw), persisted as `reader-content-width-pct`.
  - Text colour: 5 warm-tone presets `#e8e2d9` → `#938d86`, persisted as `reader-text-colour`.
  - Hamburger and back-link separated with `margin-left: 1rem` on `.header-back`.
- Reader supports EPUB and PDF:
  - EPUB: chapter-text rendering; progress = `{chapterIndex}:{scrollFrac}`.
  - PDF: page-image rendering via `/library/pdf/{filename}?page=N`; page count from `/api/pdf/info/{filename}`; progress = `{pageIndex}:0`; keyboard/button navigation identical.
  - `reader.html` branches on `FORMAT` variable injected by the server.
- `Edit EPUB` button in Book Detail is only shown for `.epub` files.
- Backup page supports: manual run, dry-run, Dropbox root, retention count, schedule (on/off + hours), status + history.
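
The reader progress strings described above (`{chapterIndex}:{scrollFrac}` for EPUB, `{pageIndex}:0` for PDF) share one shape, so a single parser can handle both. The sketch below is illustrative: the names `Position`, `parse_progress`, and `format_progress` are hypothetical, and clamping the fraction to the range 0 to 1 is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Position:
    index: int       # chapter index (EPUB) or page index (PDF)
    fraction: float  # scroll fraction within the chapter; 0 for PDF

def parse_progress(value: str) -> Position:
    """Parse 'index:fraction'; a missing fraction is treated as 0."""
    index_text, _, frac_text = value.partition(":")
    index = int(index_text)
    fraction = float(frac_text) if frac_text else 0.0
    return Position(index, min(max(fraction, 0.0), 1.0))

def format_progress(pos: Position) -> str:
    """Serialise back to 'index:fraction' without trailing zeros."""
    return f"{pos.index}:{pos.fraction:g}"
```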

---

## Known Conventions
- Book deletion flow: `unlink` file → `prune_empty_dirs(parent)` → `DELETE FROM library` (cascade removes child rows).
- Empty dir pruning: `prune_empty_dirs(start)` walks up from `start` to `LIBRARY_ROOT`, removing each dir if empty; stops at first non-empty dir.
- Cover strategy:
  - EPUB: extracted from ZIP + cached in `library_cover_cache`
  - PDF: first page rendered as thumbnail, cached
  - CBR/CBZ: first page extracted, cached
- Rating storage:
  - EPUB: `<meta name="novela:rating" content="N"/>` in OPF
  - CBZ: `<NovelaRating>N</NovelaRating>` in `ComicInfo.xml` inside the ZIP
  - CBR/PDF: DB only
  - `upsert_book` uses `CASE WHEN EXCLUDED.rating > 0 THEN EXCLUDED.rating ELSE library.rating END`: a rating read from the file (greater than 0) is restored into the DB, while a missing or zero rating leaves the existing DB value untouched.
- Tag types in `book_tags`: `genre`, `subgenre`, `tag`, `subject`. No direct `genres`/`subgenres` fields on book objects; always use helpers `bookGenres()`, `bookSubgenres()`, `bookPlainTags()`.
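
The behaviour of the rating `CASE` expression in `upsert_book` can be demonstrated with SQLite, whose upsert syntax accepts the same expression. The two-column table and the `upsert_rating` helper are simplified stand-ins for the real schema and function, not copies of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE library (filename TEXT PRIMARY KEY, rating INTEGER NOT NULL DEFAULT 0)"
)

def upsert_rating(filename: str, rating_from_file: int) -> int:
    """Insert or update a row and return the rating stored afterwards."""
    conn.execute(
        """
        INSERT INTO library (filename, rating) VALUES (?, ?)
        ON CONFLICT (filename) DO UPDATE SET
            rating = CASE WHEN excluded.rating > 0
                          THEN excluded.rating
                          ELSE library.rating END
        """,
        (filename, rating_from_file),
    )
    row = conn.execute(
        "SELECT rating FROM library WHERE filename = ?", (filename,)
    ).fetchone()
    return row[0]

print(upsert_rating("a.epub", 4))  # first scan: the file carries a rating
print(upsert_rating("a.epub", 0))  # rescan without a rating in the file
print(upsert_rating("a.epub", 2))  # rescan with a different rating in the file
```

A rescan that finds no rating in the file keeps the stored value, while a non-zero rating from the file replaces it.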

---

## Performance Notes
- Library load is optimised for large datasets:
  - `list_library_json()` uses pre-aggregation for `reading_sessions`.
  - `has_cached_cover` is provided directly via SQL join instead of full cache fetch.
- Additional migration indexes:
  - `idx_library_sort_coalesce`
  - `idx_library_needs_review`
  - `idx_library_archived`
  - `idx_reading_sessions_filename_readat`
  - `idx_book_tags_filename_tag`

---

## Known Bugs Fixed
- `renderGenreView` and `renderSearchResults` in `library.js` referenced `b.genres` (non-existent). Fixed: use `bookGenres()`, `bookSubgenres()`, `bookPlainTags()`.
- `PillInput` in `book.js` did not handle comma as delimiter and did not flush on save. Fixed: comma keydown + `flush()` in `saveEdit()`.
- `PATCH /library/book` failed for PDFs: `_sync_epub_metadata` tried to open PDF as ZIP. Fixed: only called for `.epub`.
- `_make_rel_path` in `reader.py` lacked format prefix (`epub/`, `pdf/`, `comics/`). Fixed: aligned with `common.make_rel_path`.
- `common.make_rel_path` always generated `.cbr` extension for CBZ files (both map to `media_type="cbr"`). Fixed: accepts optional `ext` parameter; `library.py` import now passes actual suffix.
- `/download/{filename}` was referenced in `book.html` but no endpoint existed (404). Fixed: added `GET /download/{filename}` to `library.py`.
- PDF reader showed infinite loading: `reader.html` called EPUB-only `/library/chapters/`. Fixed: PDF path uses `/api/pdf/info/` + page-image rendering.
- Empty dir pruning only ran when file was moved. Fixed: `prune_empty_dirs(old_path.parent)` always runs after a successful metadata save.