Ivo Oskamp 39eef0a388 Add publisher to PDF and CBR/CBZ storage paths

All formats now use {publisher}/{author} consistently:
- pdf/{publisher}/{author}/{title}.pdf
- comics/{publisher}/{author}/{title}.cbr|cbz
Previously PDF and comics only had {author}, unlike EPUB.
Updated TECHNICAL.md path table accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-25 08:56:53 +01:00


Novela 2.0 - Technical Status (Develop)

Scope

This document describes the current technical status of the develop codebase. It is the primary technical reference for the current implementation.

Architecture

Stack: FastAPI, Jinja2 templates, plain JavaScript, PostgreSQL 16, Docker.
Startup lifecycle (main.py):
1. init_pool()
2. run_migrations()
3. start_backup_scheduler()
4. mount routers
Shutdown lifecycle:
1. stop_backup_scheduler()
2. close_pool()
Source-of-truth rule: files on disk are authoritative; the database is an index/cache.
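
The startup and shutdown order above can be sketched as an async context manager. This is a minimal illustration, not the actual main.py: the function bodies are hypothetical stubs that only record the call order, and whether the real functions are sync or async is an assumption.

```python
import asyncio
from contextlib import asynccontextmanager

calls = []  # records the order in which the lifecycle hooks run

# Hypothetical stand-ins for the functions named in the lifecycle list.
async def init_pool():
    calls.append("init_pool")

async def run_migrations():
    calls.append("run_migrations")

def start_backup_scheduler():
    calls.append("start_backup_scheduler")

def mount_routers(app):
    calls.append("mount_routers")

def stop_backup_scheduler():
    calls.append("stop_backup_scheduler")

async def close_pool():
    calls.append("close_pool")

@asynccontextmanager
async def lifespan(app):
    # Startup: everything before `yield` runs before requests are served.
    await init_pool()
    await run_migrations()
    start_backup_scheduler()
    mount_routers(app)
    try:
        yield
    finally:
        # Shutdown: stop the scheduler first, then release the DB pool.
        stop_backup_scheduler()
        await close_pool()

async def demo():
    # app=None is enough here because the stubs never touch it.
    async with lifespan(app=None):
        calls.append("serving")

asyncio.run(demo())
```

With FastAPI, a context manager of this shape is passed as `FastAPI(lifespan=lifespan)`.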

File Storage Paths

All files are stored under library/ (relative to the app working directory, mapped via Docker volume). LIBRARY_DIR = Path("library"), LIBRARY_ROOT = LIBRARY_DIR.resolve().

Path structure per format

| Format | Path pattern |
| --- | --- |
| EPUB (no series) | `library/epub/{publisher}/{author}/Stories/{title}.epub` |
| EPUB (series) | `library/epub/{publisher}/{author}/Series/{series}/{idx:03d} - {title}.epub` |
| PDF | `library/pdf/{publisher}/{author}/{title}.pdf` |
| CBR | `library/comics/{publisher}/{author}/{title}.cbr` |
| CBZ | `library/comics/{publisher}/{author}/{title}.cbz` |

Segments are sanitised: special chars stripped, max lengths applied (publisher/author 80, title 140, series 80).
Series index is zero-padded to 3 digits (001, 002, …), clamped to 1–999.
Duplicate filenames get a (2), (3), … suffix.
After any file move, empty parent directories are pruned up to LIBRARY_ROOT.
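
The sanitising, zero-padding, and duplicate-suffix rules above could look roughly like the sketch below. The helper names are hypothetical, the set of stripped characters is an assumption (the document only says "special chars"), and placing the suffix before the extension is also assumed.

```python
import re
from pathlib import PurePosixPath

# Maximum lengths per segment kind, as listed above.
MAX_LEN = {"publisher": 80, "author": 80, "title": 140, "series": 80}

def sanitise_segment(value: str, kind: str) -> str:
    # Assumption: strip characters that are unsafe in file names.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", value)
    # Collapse whitespace runs and trim stray spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    return cleaned[: MAX_LEN[kind]].rstrip()

def series_prefix(index: int) -> str:
    # Clamp to 1..999, then zero-pad to three digits.
    return f"{max(1, min(999, index)):03d}"

def dedupe_name(name: str, taken: set[str]) -> str:
    # Append " (2)", " (3)", ... before the extension until the name is free.
    if name not in taken:
        return name
    path = PurePosixPath(name)
    n = 2
    while f"{path.stem} ({n}){path.suffix}" in taken:
        n += 1
    return f"{path.stem} ({n}){path.suffix}"
```

For example, `dedupe_name("Dune.epub", {"Dune.epub", "Dune (2).epub"})` yields `Dune (3).epub` in this sketch.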

Path logic

common.make_rel_path(media_type, publisher, author, title, series, series_index, ext) — used by import and grabber.
reader.py _make_rel_path(publisher, author, title, series, series_index, ext) — used by metadata PATCH; same logic, uses actual file extension.
Both functions produce identical paths for all formats.
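
As an illustration of the shared path logic, the sketch below rebuilds the patterns from the path table. `sketch_rel_path` is a hypothetical name that mirrors the parameter order of common.make_rel_path. It assumes segments are already sanitised, that the result is relative to library/, and that a missing series index falls back to 1.

```python
from pathlib import PurePosixPath

# CBR and CBZ both use media_type "cbr"; the real extension arrives via `ext`.
PREFIX = {"epub": "epub", "pdf": "pdf", "cbr": "comics"}
DEFAULT_EXT = {"epub": ".epub", "pdf": ".pdf", "cbr": ".cbr"}

def sketch_rel_path(media_type, publisher, author, title,
                    series=None, series_index=None, ext=None):
    suffix = ext or DEFAULT_EXT[media_type]
    base = PurePosixPath(PREFIX[media_type], publisher, author)
    if media_type == "epub":
        if series:
            # Series books get a zero-padded, clamped index prefix.
            idx = max(1, min(999, int(series_index or 1)))
            return str(base / "Series" / series / f"{idx:03d} - {title}{suffix}")
        return str(base / "Stories" / f"{title}{suffix}")
    # PDF and comics have no Stories/Series level.
    return str(base / f"{title}{suffix}")
```

Passing `ext=".cbz"` with `media_type="cbr"` keeps the original extension, which matches the CBZ fix described under Known Bugs Fixed.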

Metadata save behaviour per format

| Format | File written? | DB written? |
| --- | --- | --- |
| EPUB | Yes (OPF metadata updated in-place) | Yes |
| PDF | No | Yes |
| CBR | No | Yes |
| CBZ | No (tags/metadata); rating written to ComicInfo.xml | Yes |

Router Status

`routers/library.py`

GET /library — library page
GET /api/library — book list JSON (fast-path by default)
POST /library/rescan — forced full disk rescan
POST /library/import — upload EPUB/PDF/CBR/CBZ
DELETE /library/file/{filename} — delete file + DB row + prune dirs
GET /download/{filename} — download file with Content-Disposition: attachment
GET /library/cover/{filename} — serve cover (EPUB from file; PDF/CBR from cache)
GET /library/cover-cached/{filename} — serve cover from DB cache only
POST /library/cover/{filename} — upload/replace cover (EPUB only)
POST /library/want-to-read/{filename} — toggle want-to-read flag
POST /library/archive/{filename} — toggle archived flag
POST /library/new/mark-reviewed — bulk set needs_review=false
POST /library/rating/{filename} — set/clear star rating {"rating": 0-5}
GET /home — home page
GET /api/home — home data JSON
GET /stats — statistics page
GET /api/stats — statistics data JSON
GET /library/list — compat alias

GET /api/library runs in fast-path mode by default (DB-only, no full disk rescan). To force a sync, use GET /api/library?rescan=true or POST /library/rescan. Pass include_file_info=true to additionally include file size and mtime for each book.

/api/home returns:

continue_reading
shorts_unread
novels_unread
shorts_read
novels_read

/api/stats returns totals plus chart/history data for stats.html:

reads_by_month, reads_by_dow, reads_by_hour
genre_counts, publisher_counts, fav_genre, fav_publisher
top_books, history

Home sections exclude series books via:

COALESCE(series, '') = ''
filename NOT LIKE '%/Series/%'

Home read sections are ordered oldest-first:

shorts_read: ORDER BY MAX(read_at) ASC
novels_read: ORDER BY MAX(read_at) ASC
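
The series filter and oldest-first ordering can be tried out with the sketch below. It uses an in-memory SQLite database as a stand-in for PostgreSQL. The table layout is a simplified assumption (only the columns needed here), and combining the two filter conditions with AND is also assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE library (filename TEXT PRIMARY KEY, series TEXT);
CREATE TABLE reading_sessions (filename TEXT, read_at TEXT);
INSERT INTO library VALUES
  ('epub/P/A/Stories/Old.epub', NULL),
  ('epub/P/A/Stories/New.epub', ''),
  ('epub/P/A/Series/S/001 - One.epub', 'S');
INSERT INTO reading_sessions VALUES
  ('epub/P/A/Stories/New.epub', '2026-03-01'),
  ('epub/P/A/Stories/Old.epub', '2025-01-10'),
  ('epub/P/A/Stories/Old.epub', '2025-02-01'),
  ('epub/P/A/Series/S/001 - One.epub', '2024-12-31');
""")

# Non-series books only, ordered by their most recent read, oldest first.
rows = conn.execute("""
SELECT l.filename, MAX(r.read_at) AS last_read
FROM library l
JOIN reading_sessions r ON r.filename = l.filename
WHERE COALESCE(l.series, '') = ''
  AND l.filename NOT LIKE '%/Series/%'
GROUP BY l.filename
ORDER BY MAX(r.read_at) ASC
""").fetchall()
```

The series book is dropped by both conditions, and the book whose latest read is oldest comes first.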

`routers/reader.py`

GET /library/epub/{filename} — serve EPUB inline (no attachment header)
GET /library/chapters/{filename} — EPUB spine as JSON
GET /library/chapter/{index}/{filename} — single EPUB chapter as HTML fragment
GET /library/chapter-img/{path}?filename=… — image extracted from EPUB ZIP
GET /library/pdf/{filename}?page=N&dpi=150 — render PDF page as PNG
GET /api/pdf/info/{filename} — {"page_count": N}
GET /library/cbr/{filename}/{page} — CBR/CBZ page as image
GET /library/progress/{filename} — read progress
POST /library/progress/{filename} — save progress {"cfi": "…", "progress": N}
DELETE /library/progress/{filename} — clear progress
POST /library/mark-read/{filename} — mark as read (with optional date)
GET /library/book/{filename} — book detail page
GET /api/genres — all tags from book_tags (optional ?type=genre|subgenre|tag)
PATCH /library/book/{filename} — update metadata + tags; moves file if path fields change; DB-only for non-EPUB
POST /library/rating/{filename} — set/clear 1–5 star rating; writes to EPUB OPF / CBZ ComicInfo.xml; DB-only for CBR/PDF
GET /library/read/{filename} — reader page (EPUB or PDF)

`routers/editor.py`

GET /library/editor/{filename} — EPUB chapter editor page
GET /api/edit/chapter/{index}/{filename} — get chapter HTML
POST /api/edit/chapter/{index}/{filename} — save chapter HTML
POST /api/edit/chapter/add/{filename} — add new chapter
DELETE /api/edit/chapter/{index}/{filename} — delete chapter

`routers/grabber.py`

GET /grabber — grabber page
GET /convert — convert page
GET /credentials-manager — credentials manager UI
GET /debug — debug page
POST /debug/run — run debug scrape
GET /credentials — list stored credentials
POST /credentials — save credential
DELETE /credentials/{site} — delete credential
POST /preload — preload book info from URL
POST /convert — run scrape + convert to EPUB
GET /events/{job_id} — SSE stream for job progress

`routers/settings.py`

GET /settings — settings page
GET /api/break-patterns — list chapter-break patterns
POST /api/break-patterns — add break pattern (type: regex or css_class)
PATCH /api/break-patterns/{id} — update pattern (enable/disable or change value)
DELETE /api/break-patterns/{id} — delete pattern
DELETE /api/reading-history — wipe all reading sessions

`routers/backup.py`

GET /backup — backup page
GET | POST | DELETE /api/backup/credentials — Dropbox settings
GET /api/backup/health — Dropbox connectivity check
GET /api/backup/status — current backup status
GET /api/backup/history — backup run history
POST /api/backup/run — trigger backup (background task)

Backup & Security

Dropbox token is stored encrypted-at-rest in credentials (site='dropbox').
Dropbox backup root is stored encrypted in credentials (site='dropbox_backup_root').
Retention (snapshots to keep) is stored encrypted in credentials (site='dropbox_backup_retention').
Backup schedule (enabled + interval_hours) is stored encrypted in credentials (site='dropbox_backup_schedule').
Encryption uses NOVELA_MASTER_KEY (Fernet).

Implementation details:

Versioned backups with deduplication:
- file objects in Dropbox: library_objects/{sha256_prefix}/{sha256}
- snapshots in Dropbox: library_snapshots/snapshot-YYYYMMDD-HHMMSS.json
Each run creates a new snapshot version and uploads only missing objects.
Retention removes older snapshots above the configured limit.
Orphan object pruning removes objects no longer referenced by retained snapshots.
Local manifest cache (config/backup_manifest.json) speeds up change detection.
Database backup is done via pg_dump to Dropbox postgres/.
POST /api/backup/run always starts a background task and returns immediately.
Scheduler runs in the background (start_backup_scheduler) and triggers on interval when enabled.
Concurrency guard: only one backup can run at a time.
After container restart/crash, stale running logs are auto-marked as interrupted/error.
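
The deduplication, retention, and orphan-pruning steps above can be sketched with an in-memory dict standing in for Dropbox. All function names are hypothetical, the two-character `sha256_prefix` length is an assumption, and snapshots are simplified to a mapping from relative path to object key.

```python
import hashlib

def object_key(data: bytes, prefix_len: int = 2) -> str:
    # Content-addressed key: library_objects/{sha256_prefix}/{sha256}
    digest = hashlib.sha256(data).hexdigest()
    return f"library_objects/{digest[:prefix_len]}/{digest}"

def run_backup(files: dict[str, bytes], remote: dict[str, bytes]):
    """Upload only missing objects; return (snapshot, uploaded_count)."""
    snapshot, uploaded = {}, 0
    for rel_path, data in files.items():
        key = object_key(data)
        if key not in remote:  # unchanged content is never re-uploaded
            remote[key] = data
            uploaded += 1
        snapshot[rel_path] = key
    return snapshot, uploaded

def apply_retention(snapshots: list, remote: dict, keep: int) -> list:
    """Keep the newest `keep` snapshots and drop unreferenced objects."""
    retained = snapshots[-keep:] if keep > 0 else []
    referenced = {key for snap in retained for key in snap.values()}
    for key in list(remote):
        if key not in referenced:
            del remote[key]  # orphan: no retained snapshot points at it
    return retained

remote = {}
snap1, n1 = run_backup({"a.epub": b"one", "b.epub": b"two"}, remote)
snap2, n2 = run_backup({"a.epub": b"one", "b.epub": b"TWO"}, remote)
retained = apply_retention([snap1, snap2], remote, keep=1)
```

In the second run only the changed file is uploaded, and after retention the object that only the first snapshot referenced is removed.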

Environment

stack/novela.env should include at least:

POSTGRES_DB
POSTGRES_USER
POSTGRES_PASSWORD
NOVELA_MASTER_KEY
CONFIG_DIR
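
A small startup check for these variables might look like the sketch below. `missing_env` is a hypothetical helper, not part of the codebase, and treating empty values as missing is an assumption.

```python
import os

REQUIRED = (
    "POSTGRES_DB",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "NOVELA_MASTER_KEY",
    "CONFIG_DIR",
)

def missing_env(environ=os.environ):
    # Return the required names that are unset or empty, in listed order.
    return [name for name in REQUIRED if not environ.get(name)]
```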

Dropbox settings are managed via the web UI on /backup.

UI Notes

Library import accepts EPUB/PDF/CBR/CBZ.
Home supports the same import formats.
Home includes search.
Home header/dropzone alignment matches Library (search top-right, dropzone below).
New view supports Grid and List mode.
- Bulk selection + Remove from New works only in List mode.
- List mode has a column visibility filter: Publisher, Author, Series, Volume, Title, Has cover, Updated, Genres, Sub-genres, Tags, Status.
- List mode supports multi-select with Shift+click range selection on checkboxes.
- Grid mode shows no selection checkboxes or bulk actions.
All books view supports Grid and List mode (same columns as New, no selection/bulk actions).
- View mode persisted in localStorage as novela.all.viewMode.
- Column visibility persisted in localStorage as novela.all.visibleColumns.
Star ratings (1–5) shown under the cover in all grid views:
- Display-only in grid cards (no click, prevents accidental taps while scrolling).
- Interactive in Book Detail (1.1rem, clickable; clicking the active star clears the rating).
- Amber: filled #c8a03a, unfilled rgba(200, 160, 58, 0.25).
Reader settings (hamburger menu):
- Content width slider (30–100 vw), persisted as reader-content-width-pct.
- Text colour: 5 warm-tone presets #e8e2d9 → #938d86, persisted as reader-text-colour.
- Hamburger and back-link separated with margin-left: 1rem on .header-back.
Reader supports EPUB and PDF:
- EPUB: chapter-text rendering; progress = {chapterIndex}:{scrollFrac}.
- PDF: page-image rendering via /library/pdf/{filename}?page=N; page count from /api/pdf/info/{filename}; progress = {pageIndex}:0; keyboard/button navigation identical.
- reader.html branches on FORMAT variable injected by the server.
Edit EPUB button in Book Detail is only shown for .epub files.
Backup page supports: manual run, dry-run, Dropbox root, retention count, schedule (on/off + hours), status + history.

Known Conventions

Book deletion flow: unlink file → prune_empty_dirs(parent) → DELETE FROM library (cascade removes child rows).
Empty dir pruning: prune_empty_dirs(start) walks up from start to LIBRARY_ROOT, removing each dir if empty; stops at first non-empty dir.
Cover strategy:
- EPUB: extracted from ZIP + cached in library_cover_cache
- PDF: first page rendered as thumbnail, cached
- CBR/CBZ: first page extracted, cached
Rating storage:
- EPUB: <meta name="novela:rating" content="N"/> in OPF
- CBZ: <NovelaRating>N</NovelaRating> in ComicInfo.xml inside the ZIP
- CBR/PDF: DB only
- upsert_book uses CASE WHEN EXCLUDED.rating > 0 THEN EXCLUDED.rating ELSE library.rating END: a rating found in the file is restored into the DB, while a file without a rating (0) leaves the existing DB value untouched.
Tag types in book_tags: genre, subgenre, tag, subject. No direct genres/subgenres fields on book objects; always use helpers bookGenres(), bookSubgenres(), bookPlainTags().
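
The rating upsert rule from the conventions above can be exercised with the sketch below. SQLite stands in for PostgreSQL here, the table is reduced to two columns, and `upsert_rating` is a hypothetical wrapper rather than the real upsert_book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE library (filename TEXT PRIMARY KEY, rating INTEGER NOT NULL DEFAULT 0)"
)

UPSERT = """
INSERT INTO library (filename, rating) VALUES (?, ?)
ON CONFLICT (filename) DO UPDATE SET
  rating = CASE WHEN excluded.rating > 0 THEN excluded.rating
                ELSE library.rating END
"""

def upsert_rating(filename: str, rating_from_file: int) -> int:
    # Insert or update, then return the rating now stored in the DB.
    conn.execute(UPSERT, (filename, rating_from_file))
    row = conn.execute(
        "SELECT rating FROM library WHERE filename = ?", (filename,)
    ).fetchone()
    return row[0]

pdf = "pdf/P/A/T.pdf"
first = upsert_rating(pdf, 0)         # new row, file carries no rating
conn.execute("UPDATE library SET rating = 4 WHERE filename = ?", (pdf,))
after_rescan = upsert_rating(pdf, 0)  # rescan must not wipe the DB-only rating
restored = upsert_rating("epub/P/A/Stories/T.epub", 5)  # rating read from file
```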

Performance Notes

Library load is optimized for large datasets:
- list_library_json() uses pre-aggregation for reading_sessions.
- has_cached_cover is provided directly via SQL join instead of full cache fetch.
Additional migration indexes:
- idx_library_sort_coalesce
- idx_library_needs_review
- idx_library_archived
- idx_reading_sessions_filename_readat
- idx_book_tags_filename_tag

Known Bugs Fixed

renderGenreView and renderSearchResults in library.js referenced b.genres (non-existent). Fixed: use bookGenres(), bookSubgenres(), bookPlainTags().
PillInput in book.js did not handle comma as delimiter and did not flush on save. Fixed: comma keydown + flush() in saveEdit().
PATCH /library/book failed for PDFs: _sync_epub_metadata tried to open PDF as ZIP. Fixed: only called for .epub.
_make_rel_path in reader.py lacked format prefix (epub/, pdf/, comics/). Fixed: aligned with common.make_rel_path.
common.make_rel_path always generated .cbr extension for CBZ files (both map to media_type="cbr"). Fixed: accepts optional ext parameter; library.py import now passes actual suffix.
/download/{filename} was referenced in book.html but no endpoint existed (404). Fixed: added GET /download/{filename} to library.py.
PDF reader showed infinite loading: reader.html called EPUB-only /library/chapters/. Fixed: PDF path uses /api/pdf/info/ + page-image rendering.
Empty dir pruning only ran when file was moved. Fixed: prune_empty_dirs(old_path.parent) always runs after a successful metadata save.
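
The pruning helper referenced in the last fix can be sketched as follows. This version takes the library root as an explicit argument for testability, whereas the signature described under Known Conventions is prune_empty_dirs(start). It is an illustration of the walk-up rule, not the actual implementation.

```python
import tempfile
from pathlib import Path

def prune_empty_dirs(start: Path, root: Path) -> list:
    """Remove empty directories from `start` upwards, stopping below `root`."""
    removed = []
    root = root.resolve()
    current = start.resolve()
    # Never remove the root itself or anything outside it.
    while current != root and root in current.parents:
        if not current.is_dir() or any(current.iterdir()):
            break  # stop at the first missing or non-empty directory
        current.rmdir()
        removed.append(current)
        current = current.parent
    return removed

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "library"
    leaf = root / "pdf" / "Pub" / "Author"
    leaf.mkdir(parents=True)
    (root / "pdf" / "keep.pdf").write_bytes(b"%PDF")
    removed_names = [p.name for p in prune_empty_dirs(leaf, root)]
    pdf_dir_survives = (root / "pdf").is_dir()
```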
