novela/docs/TECHNICAL.md
Ivo Oskamp 92cd301658 Add PDF reader/editor support, fix metadata save and dir cleanup
- PDF reader: page-image rendering via /library/pdf/{filename}?page=N;
  new /api/pdf/info/{filename} endpoint returns page count; reader.html
  branches on FORMAT (epub/pdf) injected by server
- PDF metadata edit: PATCH /library/book now updates DB for all formats;
  _sync_epub_metadata only called for .epub; non-EPUB formats skip file write
- Fix file path on metadata save: _make_rel_path now includes format prefix
  (epub/, pdf/, comics/) matching common.make_rel_path used during import;
  previously files were moved outside their format directory
- Fix empty dir cleanup: prune_empty_dirs always runs after successful
  metadata save, not only when file was moved
- Hide Edit EPUB button for non-EPUB files in book detail
- Docs: TECHNICAL.md and changelog-develop.md updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 08:47:01 +01:00

198 lines
8.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Novela 2.0 - Technical Status (Develop)
## Scope
This document describes the current technical status of the `develop` codebase.
It is the primary technical reference for the current implementation.
## Architecture
- Stack: FastAPI, Jinja2 templates, plain JavaScript, PostgreSQL 16, Docker.
- Startup lifecycle (`main.py`):
1. `init_pool()`
2. `run_migrations()`
3. `start_backup_scheduler()`
4. mount routers
- Shutdown lifecycle:
1. `stop_backup_scheduler()`
2. `close_pool()`
- Source-of-truth rule: files on disk are authoritative, the database is an index/cache.
## Router Status
### `routers/library.py`
- `GET /library`
- `GET /api/library`
- `POST /library/rescan`
- `POST /library/import` (EPUB/PDF/CBR/CBZ)
- `DELETE /library/file/{filename}`
- `GET /library/cover/{filename}`
- `GET /library/cover-cached/{filename}`
- `POST /library/cover/{filename}` (EPUB)
- `POST /library/want-to-read/{filename}`
- `POST /library/archive/{filename}`
- `POST /library/new/mark-reviewed` (bulk `needs_review=false`)
- `POST /library/rating/{filename}` (set/clear star rating, body: `{"rating": 0-5}`)
- `GET /home`
- `GET /api/home`
- `GET /stats`
- `GET /api/stats`
- `GET /library/list` (compat)
`GET /api/library` runs in fast-path mode by default (DB-only, no full disk rescan).
For a forced sync: `GET /api/library?rescan=true` or `POST /library/rescan`.
`include_file_info=true` is optional for file size/mtime enrichment.
`/api/home` returns:
- `continue_reading`
- `shorts_unread`
- `novels_unread`
- `shorts_read`
- `novels_read`
`/api/stats` returns totals plus chart/history data for `stats.html`:
- `reads_by_month`, `reads_by_dow`, `reads_by_hour`
- `genre_counts`, `publisher_counts`, `fav_genre`, `fav_publisher`
- `top_books`, `history`
Home sections exclude series books via:
- `COALESCE(series, '') = ''`
- `filename NOT LIKE '%/Series/%'`
Home read sections are ordered oldest-first:
- `shorts_read`: `ORDER BY MAX(read_at) ASC`
- `novels_read`: `ORDER BY MAX(read_at) ASC`
### `routers/reader.py`
- EPUB serving/chapters/images
- Reader page + book detail
- Metadata patch (`PATCH /library/book/{filename}`): updates DB for all formats; writes to file only for EPUB
- Progress read/write/delete
- Mark-as-read
- Star rating (`POST /library/rating/{filename}`): validates 05, writes to file (EPUB OPF / CBZ ComicInfo.xml) and DB; DB-only for CBR/PDF
- PDF render endpoint (`GET /library/pdf/{filename}?page=N&dpi=150`) — returns page as PNG
- PDF info endpoint (`GET /api/pdf/info/{filename}`) — returns `{"page_count": N}`
- CBR/CBZ page endpoint
- Genres endpoint
### `routers/editor.py`
- Editor page
- Chapter get/save
- Chapter add
- Chapter delete
### `routers/grabber.py`
- Grabber page + convert/debug flows
- SSE events
- Credential management for scraper sites
- Credentials manager UI (`/credentials-manager`)
### `routers/backup.py`
- `GET /backup`
- `GET/POST/DELETE /api/backup/credentials`
- `GET /api/backup/health`
- `GET /api/backup/status`
- `GET /api/backup/history`
- `POST /api/backup/run`
## Backup & Security
- Dropbox token is stored encrypted-at-rest in `credentials` (`site='dropbox'`).
- Dropbox backup root is stored encrypted in `credentials` (`site='dropbox_backup_root'`).
- Retention (`snapshots to keep`) is stored encrypted in `credentials` (`site='dropbox_backup_retention'`).
- Backup schedule (`enabled` + `interval_hours`) is stored encrypted in `credentials` (`site='dropbox_backup_schedule'`).
- Encryption uses `NOVELA_MASTER_KEY` (Fernet).
Implementation details:
- Versioned backups with deduplication:
- file objects in Dropbox: `library_objects/{sha256_prefix}/{sha256}`
- snapshots in Dropbox: `library_snapshots/snapshot-YYYYMMDD-HHMMSS.json`
- Each run creates a new snapshot version and uploads only missing objects.
- Retention removes older snapshots above the configured limit.
- Orphan object pruning removes objects no longer referenced by retained snapshots.
- Local manifest cache (`config/backup_manifest.json`) speeds up change detection.
- Database backup is done via `pg_dump` to Dropbox `postgres/`.
- `POST /api/backup/run` always starts a background task and returns immediately.
- Scheduler runs in the background (`start_backup_scheduler`) and triggers on interval when enabled.
- Concurrency guard: only one backup can run at a time.
- After container restart/crash, stale `running` logs are auto-marked as interrupted/error.
## Environment
`stack/novela.env` should include at least:
- `POSTGRES_DB`
- `POSTGRES_USER`
- `POSTGRES_PASSWORD`
- `NOVELA_MASTER_KEY`
- `CONFIG_DIR`
Dropbox settings are managed via the web UI on `/backup`.
## UI Notes
- Library import accepts EPUB/PDF/CBR/CBZ.
- Home supports the same import formats.
- Home includes search.
- Home header/dropzone alignment matches Library (search top-right, dropzone below).
- `New` view supports `Grid` and `List` mode.
- Bulk selection + `Remove from New` works only in `List` mode.
- `List` mode has a column visibility filter with columns:
- Publisher
- Author
- Series
- Volume
- Title
- Has cover
- Updated
- Genres
- Sub-genres
- Tags
- Status
- `List` mode supports multi-select with `Shift+click` range selection on checkboxes.
- `Grid` mode shows no selection checkboxes or bulk actions.
- `All books` view supports `Grid` and `List` mode (same columns as `New`, no selection/bulk actions).
- View mode persisted in `localStorage` as `novela.all.viewMode`.
- Column visibility persisted in `localStorage` as `novela.all.visibleColumns`.
- Star ratings (15) are shown under the cover in all grid views (Library, Home):
- Display-only in grid cards (no click handler, prevents accidental taps).
- Interactive in Book Detail (1.1rem, clickable; clicking the active star clears the rating).
- Amber color: filled `#c8a03a`, unfilled `rgba(200, 160, 58, 0.25)`.
- Reader has a text colour setting in the hamburger menu:
- 5 presets from `#e8e2d9` (bright) to `#938d86` (dim), persisted in `localStorage` as `reader-text-colour`.
- Hamburger and back-link are visually separated with `margin-left: 1rem` on `.header-back`.
- Backup page supports:
- manual run and dry-run
- Dropbox root settings
- snapshot retention count
- scheduled backup (on/off + interval in hours)
- status + history overview
- Reader supports EPUB and PDF:
- EPUB: chapter-text rendering (existing flow)
- PDF: page-image rendering via `/library/pdf/{filename}?page=N`; page count fetched from `/api/pdf/info/{filename}`; progress tracked per page; keyboard/button navigation identical to EPUB
- `reader.html` branches on `FORMAT` variable injected by the server
- `Edit EPUB` button in Book Detail is only shown for `.epub` files.
## Known Bugs Fixed
- `renderGenreView` and `renderSearchResults` in `library.js` referenced `b.genres` (non-existent field on the book object). All tag data lives in `b.tags` as `{tag, tag_type}` objects; the correct helpers are `bookGenres()`, `bookSubgenres()`, `bookPlainTags()`.
- `PillInput` in `book.js` did not handle comma as a delimiter and did not flush pending input on save. Fixed with comma keydown handler and `flush()` called in `saveEdit()`.
- `PATCH /library/book/{filename}` failed for PDFs: `_sync_epub_metadata` tried to open the PDF as a ZIP, throwing an exception that aborted the entire save (including the DB update). Fixed by only calling `_sync_epub_metadata` when `ext == ".epub"`.
- `_make_rel_path` in `reader.py` lacked the format prefix (`epub/`, `pdf/`, `comics/`) used by `common.make_rel_path`, causing files to be moved outside their format directory on metadata save. Fixed by aligning the path logic: EPUB → `epub/{publisher}/{author}/…`, PDF → `pdf/{author}/{title}.pdf`, CBR/CBZ → `comics/{author}/{title}{ext}`.
- PDF reader showed infinite loading: `reader.html` always called `/library/chapters/{filename}` (EPUB-only) and tried to render chapter text. PDF reader now fetches page count and renders page images.
## Known Conventions
- Book deletion flow: delete file, prune empty directories, then `DELETE FROM library` (cascade removes child rows).
- Cover strategy:
- EPUB: cover from file + cache
- PDF/CBR: thumbnail via cover cache
- Rating storage:
- EPUB: `<meta name="novela:rating" content="N"/>` in OPF
- CBZ: `<NovelaRating>N</NovelaRating>` in `ComicInfo.xml` inside the ZIP
- CBR/PDF: DB only
- `upsert_book` uses `CASE WHEN EXCLUDED.rating > 0 THEN EXCLUDED.rating ELSE library.rating END` to restore rating from file without overwriting existing DB value
## Performance Notes
- Library load is optimized for large datasets:
- `list_library_json()` uses pre-aggregation for `reading_sessions`.
- `has_cached_cover` is provided directly via SQL join instead of full cache fetch.
- Additional migration indexes:
- `idx_library_sort_coalesce`
- `idx_library_needs_review`
- `idx_library_archived`
- `idx_reading_sessions_filename_readat`
- `idx_book_tags_filename_tag`