# Changelog - Develop
This file documents changes on the develop branch of this project.
## 2026-05-26 — Build/version number in the UI (Dropkeep-style)
### Added
- **Version metadata module `clearview_app/version.py`** — single source of truth mirroring Dropkeep: `VERSION = "v0.1.0"` (release) + `BUILD = 0` (explicit dev/test build segment, source state, not git-derived). `display_version()` returns `vX.Y.Z.N` when `BUILD > 0`, else `vX.Y.Z`; `cache_version()` strips the leading `v`.
- **`GET /api/version` endpoint** — returns `{"version": display_version()}`. The FastAPI app `version=` is also sourced from `version.py` (was hardcoded `"0.1.0"`).
- **Version shown in the UI** — the sidebar footer version (previously a hardcoded `v0.1.0` in `index.html`) is now populated at load time from `/api/version` via a new `loadVersion()` in `app.js` (span `id="appVersion"`). Operators see exactly which image build is running, e.g. `v0.1.0.3`.
- **Build wrapper `build.sh` + `scripts/`** — `./build.sh t` runs `scripts/bump-dev-build.py` (increments `BUILD`) then `./build-and-push.sh t`; `./build.sh r` runs `scripts/check-release-version.py` (asserts `BUILD == 0` and that `version.py` matches the top `docs/changelog.md` release heading) then `./build-and-push.sh r`. `scripts/set-release-version.py vX.Y.Z` sets a new release version and resets `BUILD = 0`. Build numbers are committed in source so the image carries the exact build with no Docker build args.
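
The version rules described in this entry can be sketched as follows. This is a minimal illustration reconstructed from the description, not the actual contents of `clearview_app/version.py`. The function parameters are an assumption added so the behaviour can be exercised without editing the module constants, and it is also an assumption that `cache_version()` is derived from the display version rather than from `VERSION` alone.

```python
# Hypothetical sketch of a version module; the real version.py may differ.
VERSION = "v0.1.0"  # release version
BUILD = 0           # explicit dev/test build segment; 0 means a release build


def display_version(version: str = VERSION, build: int = BUILD) -> str:
    """Return vX.Y.Z.N when build > 0, else vX.Y.Z."""
    return f"{version}.{build}" if build > 0 else version


def cache_version(version: str = VERSION, build: int = BUILD) -> str:
    """Return the display version without the leading 'v' (assumed behaviour)."""
    shown = display_version(version, build)
    return shown[1:] if shown.startswith("v") else shown
```

With `BUILD = 3` the UI would show `v0.1.0.3`, matching the example above; with `BUILD = 0` it falls back to the plain release version.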
## 2026-05-26 — Root report: expand Entra/M365 groups & readable direct users
### Added
- **Entra/AAD & M365 group expansion at site root** — the "Resolve groups" action now also expands Azure AD security groups and Microsoft 365 groups that are assigned **directly** at the site root, not just classic SharePoint site groups. Previously these claim-encoded principals (`c:0t.c|tenant|<guid>`, `c:0o.c|federateddirectoryclaimprovider|<guid>`) were skipped by `is_sharepoint_group_principal`, so the root report showed only the group name and never the people inside — making the inventory incomplete. New helpers in `scanners/sharepoint.py`: `_extract_aad_group_object_id` (parses the Entra object id out of the claim, incl. the `_o` owners suffix), `is_aad_group_principal`, `resolve_aad_group_members`, and `_expand_aad_group_by_id` (extracted from `_expand_aad_group_via_graph` so both mail-based and id-based lookups share the `/groups/{id}/members` + `/owners` Graph path, depth-limited to 3 with a per-resolve `seen` set). `POST /api/scan-jobs/{id}/resolve-groups` now routes AAD/M365 group principals to the Graph resolver and SharePoint groups to the existing `getbyname` resolver. Requires `GroupMember.Read.All` (or `Group.Read.All`) on Microsoft Graph; without it the group stays visible by name and counts as "skipped" — no crash.
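
As an illustration of the claim parsing mentioned above, the sketch below pulls the Entra object id out of the two claim shapes named in the entry. The function names mirror the helpers but drop the leading underscore to mark them as hypothetical; the regular expression and the choice to accept the `_o` suffix on both claim types are assumptions, not the code in `scanners/sharepoint.py`.

```python
import re
from typing import Optional

# GUID pattern for the Entra object id embedded in the claim.
_GUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"

# Matches c:0t.c|tenant|<guid> and c:0o.c|federateddirectoryclaimprovider|<guid>,
# with an optional "_o" owners suffix (assumed to be valid on either form).
_AAD_GROUP_CLAIM = re.compile(
    rf"^c:0(?:t\.c\|tenant|o\.c\|federateddirectoryclaimprovider)\|({_GUID})(?:_o)?$",
    re.IGNORECASE,
)


def extract_aad_group_object_id(login_name: str) -> Optional[str]:
    """Return the lower-cased object id, or None if this is not a group claim."""
    match = _AAD_GROUP_CLAIM.match(login_name.strip())
    return match.group(1).lower() if match else None


def is_aad_group_principal(login_name: str) -> bool:
    """True when the login name is a claim-encoded Entra/M365 group."""
    return extract_aad_group_object_id(login_name) is not None
```

User claims and classic SharePoint group names return `None`, so they would keep flowing to the existing resolvers.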
### Changed
- **Readable principals for directly-assigned users** — individual users granted rights directly on the site root now render as their UPN/email (e.g. `jan@contoso.com`) instead of the raw claim string `i:0#.f|membership|jan@contoso.com`. New helpers `_extract_user_upn` and `_display_principal` in `scanners/sharepoint.py`, applied in `_get_role_assignments` (so both the root scan and the deviation scan benefit, consistently on both sides of the root-vs-child set comparison). Only users with an `@`-shaped UPN are rewritten; groups, on-prem (`i:0#.w|domain\user`) and built-in/system accounts keep their original LoginName so claim object ids stay resolvable and the site-root noise filter (`SHAREPOINT\system`, `NT AUTHORITY\*`, etc.) keeps matching.
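
The rewrite rule for directly-assigned users can be sketched like this. It is a simplified illustration under stated assumptions: the function names are hypothetical stand-ins for the helpers named above, and the "`@`-shaped" check used here (non-empty local part, a dot in the domain, no whitespace) is a guess at what the real helper validates.

```python
from typing import Optional

_USER_CLAIM_PREFIX = "i:0#.f|membership|"


def extract_user_upn(login_name: str) -> Optional[str]:
    """Return the UPN from a membership claim, or None if it does not apply."""
    if not login_name.lower().startswith(_USER_CLAIM_PREFIX):
        return None
    candidate = login_name[len(_USER_CLAIM_PREFIX):]
    local, sep, domain = candidate.partition("@")
    # Assumed shape check: something@domain.tld without spaces or extra pipes.
    if sep and local and "." in domain and " " not in candidate and "|" not in candidate:
        return candidate
    return None


def display_principal(login_name: str) -> str:
    """Show the UPN for cloud users; leave every other principal untouched."""
    return extract_user_upn(login_name) or login_name
```

Because anything that is not a cloud-user claim is returned unchanged, group claims and system accounts stay comparable against the noise filter and the group resolvers.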
## [2026-04-28]
### Changed
- **Excel export sheet name + columns adapt to scan type** — second sheet is now named `Mailbox Permissions` for mailbox jobs, `Group Memberships` for Entra-group jobs, `Root Permissions` for SharePoint-root jobs, and `Deviations` for the original SharePoint deviation scan. Column sets are tailored per type so headers like "Object URL" / "Link Risk" / "Delta" no longer appear on exports where they don't apply. Targets sheet first column label switches between Site URL / Mailbox / Group based on the job.
### Added
- **Entra Group Scan** — new scan type `entra_groups` dedicated to enumerating Microsoft 365 / Azure AD group memberships. New `scanners/entra.py` resolves a target (Object ID, mail, or display name) via Microsoft Graph and stores one deviation per user with role `Member` or `Owner` (with `(via group > nested-group)` chain when expanded recursively). Group classification (Microsoft 365 / Security / Mail-enabled Security / Distribution) is stored in `permission_type`. New helper `entra.list_all_groups` for the "All groups in tenant" option. New CSV parser `parse_entra_groups_csv` reads the `Object ID` column from the Entra portal Groups export. New sidebar route `#/scan/entra` with three forms (manual IDs, CSV import, all-tenant). New filter option in the Scan Jobs type dropdown. Job Details renders Group / Group Type / User / Role columns for these jobs. Requires `Group.Read.All` on Microsoft Graph.
- **Recursive group expansion via Microsoft Graph** — when a SharePoint group member is itself a Microsoft 365 / Azure AD group, the resolver now expands it transitively. New helpers `_expand_aad_group_via_graph` and `_graph_collect` in `scanners/sharepoint.py` call `/groups?$filter=mail eq …` to look up the group, then `/groups/{id}/members` and `/groups/{id}/owners` to enumerate users. Owners are tagged with `(owner)` in the output. Recursion is depth-limited to 3 with a per-resolve `seen` set to break cycles. Output format puts nested members in square brackets after the group name, e.g. `Pharmacology@contoso.onmicrosoft.com [alice@contoso.com, bob@contoso.com (owner)]`. Requires the new `Group.Read.All` Application permission on Microsoft Graph (added to the onboarding instructions). Without it, group lines remain collapsed and labelled `(group, no readable members)`.
- **Resolve SharePoint groups** — new "Resolve groups" action on the Job Details panel for SharePoint and SharePoint-root jobs. Expands every SharePoint group principal (Owners / Members / Visitors / custom site groups) to its underlying user list via `/_api/web/sitegroups/getbyname/<group>/users` and writes the comma-separated members to `permission_deviations.resolved_members`. Members are rendered below the principal in the Deviations table and included in the Excel export. Azure AD security groups and federated claims (principals starting with `c:0…` / `i:0…` or containing `|`) are skipped — those would need `Group.Read.All` on Microsoft Graph. New endpoint `POST /api/scan-jobs/{id}/resolve-groups`, helper `sharepoint.is_sharepoint_group_principal()`.
- **SharePoint root-permissions scan mode** — new `scan_type='sharepoint_root'` that lists role assignments on the site root only, without traversing libraries/folders/files. Much faster (~1 HTTP call per target) and useful for an inventory of who has site-level access. New scanner function `sharepoint.scan_site_root_permissions`. Records are stored with `delta_type='root'` and `object_type='Site'`. Selectable on the New SharePoint Scan page via a "Scan mode" dropdown that controls both the manual-URL and CSV-import forms. New filter option in the Scan Jobs type filter. Noise filter `_is_noise_principal` excludes SharingLinks groups, `SHAREPOINT\system`/`NT AUTHORITY\*` accounts, and "Limited Access System Group" entries — these are SharePoint plumbing surfaced at the site root by item-level shares and are not part of a meaningful root inventory.
- **Tenant `primary_domain` field** — new column on `tenant_profiles`, exposed in the Add Tenant form (e.g. `contoso.onmicrosoft.com`). When set, the Mailbox scan page auto-fills the Organization field on tenant selection, and the API falls back to it when `organization` is omitted on a `scan_all_mailboxes` request. SharePoint scans are unaffected.
- **Expanded mailbox-scan onboarding instructions** — new "Enable mailbox scanning" section in the Add Tenant form covers adding the `Exchange.ManageAsApp` API permission, granting admin consent, assigning the Exchange Administrator Entra role to the service principal, certificate generation/upload, and primary-domain entry. Always visible (independent of automated/manual onboarding mode).
- **Scan all mailboxes in a tenant** — third option on the Mailbox scan page next to manual UPNs and CSV import. Clearview enumerates every mailbox via `Get-EXOMailbox -ResultSize Unlimited` and queues one target per mailbox. Requires the tenant's primary domain (e.g. `contoso.onmicrosoft.com`) and a tenant certificate. New PowerShell script `exo_scripts/list-mailboxes.ps1`, new Python helper `mailbox.list_mailboxes()`, new request fields `scan_all_mailboxes` and `organization`. Job source type is recorded as `tenant_all`.
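
The depth limit and `seen` set described in the recursive group expansion entry above can be illustrated with an in-memory sketch. It does not call Microsoft Graph: the `Directory` mapping is a hypothetical stand-in for the `/groups/{id}/members` and `/owners` lookups, and the exact meaning of "depth 3" (here: the top-level group counts as depth 0, and groups at depth 3 or deeper are not expanded) is an assumption.

```python
from typing import Dict, List, Optional, Set

MAX_DEPTH = 3  # depth limit named in the changelog entry

# Hypothetical stand-in for Microsoft Graph: group -> members and owners.
Directory = Dict[str, Dict[str, List[str]]]


def expand_group(directory: Directory, group: str, depth: int = 0,
                 seen: Optional[Set[str]] = None) -> List[str]:
    """Return users reachable from `group`; owners are tagged with '(owner)'."""
    seen = set() if seen is None else seen
    if group in seen or depth >= MAX_DEPTH:
        return []  # cycle or depth limit reached: stop expanding
    seen.add(group)

    users: List[str] = []
    entry = directory.get(group, {})
    for role, suffix in (("members", ""), ("owners", " (owner)")):
        for principal in entry.get(role, []):
            if principal in directory:  # nested group: recurse
                found = expand_group(directory, principal, depth + 1, seen)
            else:  # plain user
                found = [principal + suffix]
            for user in found:
                if user not in users:
                    users.append(user)
    return users


def format_group(directory: Directory, group: str) -> str:
    """Render 'group [user, user (owner)]' or the collapsed fallback label."""
    users = expand_group(directory, group)
    if not users:
        return f"{group} (group, no readable members)"
    return f"{group} [{', '.join(users)}]"
```

Sharing one `seen` set across the whole resolve is what breaks cycles: a group that has already been visited contributes nothing the second time it is reached.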
### Changed
- **Sidebar logo** — replaced with a dark-background variant (`assets/clearview-logo-dark.svg`) so the "view" wordmark stays legible on the dark sidebar (previously rendered in `#141413` and was invisible).
- **English-only UI** — replaced remaining Dutch labels in the application with English equivalents: probe status `Nog niet getest`/`Mislukt` → `Not tested yet`/`Failed`, button label `Testen…` → `Testing…`, error toast `Test mislukt:` → `Test failed:`, and probe hints in `scanners/sharepoint.py` + `scanners/mailbox.py`. The Dutch→English role-name mapping table in `sharepoint.py` is unchanged (it normalizes incoming SharePoint role names).
- **Mailbox permission scanning** — Clearview can now scan Exchange Online mailboxes for delegated access alongside SharePoint sites.
  - Permission categories collected: Full Access (`Get-MailboxPermission`), Send As (`Get-RecipientPermission`), Send on Behalf (`GrantSendOnBehalfTo` mailbox property), and folder delegations on Calendar and Inbox (`Get-MailboxFolderPermission`).
  - Implementation: `pwsh` subprocess invoking the `ExchangeOnlineManagement` module with certificate-based app-only authentication (same tenant profile cert as SharePoint scans).
  - Default principals (`NT AUTHORITY\SELF`, `S-1-5-*`, folder `Default`/`Anonymous=None`) are filtered out at scan time; only non-default permissions become deviations.
  - Mailbox scans require a tenant certificate plus the `Office 365 Exchange Online → Exchange.ManageAsApp` API permission and the **Exchange Administrator** Entra role on the scan app's service principal. Client-secret auth is not supported by Exchange Online.
- **Frontend sidebar layout** — single-page UI replaced with a fixed left sidebar (200px, dark) and routed pages, mirroring the AlertHub layout convention.
  - Routes via hash-based router: `#/dashboard`, `#/jobs`, `#/scan/sharepoint`, `#/scan/mailbox`, `#/tenants`, `#/settings`. Implementation stays vanilla HTML/JS/CSS (no React introduction).
  - Job Details panel adapts column labels and headers based on `scan_type`: SharePoint shows Site/Object/Type/Principal/Role/Delta; Mailbox shows Mailbox/Object/Permission Type/Principal/Access Rights. SharingLinks resolution is hidden for mailbox jobs.
  - Jobs list gets a **Type** column (SharePoint / Mailbox) and a type filter.
- **Scanners package** — `clearview_app/scanner.py` split into `clearview_app/scanners/{__init__.py, common.py, sharepoint.py, mailbox.py, exo_scripts/}`. Public dispatcher `scanners.scan(scan_type, target, auth, progress)` and `scanners.probe(scan_type, target, auth)`. The original `scanner.py` remains as a thin compatibility shim re-exporting the SharePoint API.
- **Datamodel changes** (auto-migrated on startup):
  - `scan_jobs.scan_type VARCHAR(32) NOT NULL DEFAULT 'sharepoint'`
  - `permission_deviations.permission_type VARCHAR(32)` — populated by mailbox scans (`FullAccess`, `SendAs`, `SendOnBehalf`, `Folder:Calendar`, `Folder:Inbox`)
  - `tenant_profiles.cert_public_pem TEXT` — public PEM is now stored alongside the private key so the mailbox scanner can build a `.pfx` for `Connect-ExchangeOnline -CertificateFilePath`. Existing tenants need to regenerate the certificate before mailbox scanning is available; SharePoint scans keep working with the existing key.
- **Mailbox CSV import** — `parse_mailboxes_csv` accepts `UserPrincipalName` / `UPN` / `Email` / `Mailbox` / `Primary SMTP Address` columns with case-insensitive matching, dedup, and email-shape validation.
- **API additions**:
  - `POST /api/scan-jobs` payload extended with `scan_type` and `mailboxes[]` next to the existing `site_urls[]`.
  - `POST /api/scan-jobs/import-csv` accepts a `scan_type` form field (`sharepoint`|`mailbox`).
  - `GET /api/scan-jobs?scan_type=…` filter.
  - `ScanJobSummary.scan_type` and `PermissionDeviationItem.permission_type` returned.
- **Dockerfile** now installs Microsoft PowerShell 7 from the official Microsoft repository plus the `ExchangeOnlineManagement` PowerShell module from PSGallery. Adds ~150 MB to the image.
- **Build script migration** — replaced the local `build-and-push.sh` with the shared version from `/docker/develop/shared-integrations/tooling/docker-build-and-push/`. Reads the version from `docs/changelog.md` (release-summary file) instead of `version.txt`.
- **`docs/changelog.md`** — new release-summary changelog file used by the new build script. The development log (`changelog-develop.md`) remains the append-only source of truth for individual changes.
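
The mailbox CSV import rules listed above (accepted column names, case-insensitive matching, dedup, email-shape validation) could look roughly like the sketch below. It is not the project's `parse_mailboxes_csv`: picking the first matching column in file order, lower-casing addresses for dedup, and the exact email pattern are all assumptions made for illustration.

```python
import csv
import io
import re
from typing import List

# Accepted header names from the changelog entry, compared case-insensitively.
_ACCEPTED_COLUMNS = {"userprincipalname", "upn", "email", "mailbox",
                     "primary smtp address"}
# Assumed "email shape": one @, no whitespace, a dot in the domain part.
_EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def parse_mailboxes_csv(text: str) -> List[str]:
    """Return unique, lower-cased mailbox addresses in first-seen order."""
    reader = csv.DictReader(io.StringIO(text))
    column = next(
        (name for name in (reader.fieldnames or [])
         if name and name.strip().lower() in _ACCEPTED_COLUMNS),
        None,
    )
    if column is None:
        raise ValueError("CSV has no recognised mailbox column")

    seen = set()
    mailboxes: List[str] = []
    for row in reader:
        value = (row.get(column) or "").strip().lower()
        if not _EMAIL_SHAPE.match(value) or value in seen:
            continue  # skip malformed rows and duplicates
        seen.add(value)
        mailboxes.append(value)
    return mailboxes
```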
## [2026-04-23]
### Added
- **Connection preflight per scan target** — before a target is scanned, a lightweight probe validates that the configured credentials can reach the site and read role assignments (`/_api/web` + `/_api/web/roleassignments?$top=1`). Targets that fail preflight are marked `failed` with a clear reason (401/403/404 hints) instead of attempting the full scan. Fixes the previous silent-failure behaviour when admin consent or the certificate upload was missing in Azure.
- **Manual "Test" button** — new button in the Targets table in Job Details that re-runs the probe on demand. New endpoint: `POST /api/scan-jobs/{id}/targets/{target_id}/test-connection`. Blocked while the job is still queued or running.
- **Probe status in UI** — each target row shows the last probe result (OK / Mislukt / Nog niet getest) with timestamp and error message. Fields persist until the next test, so "last known status" remains visible even after permissions are later revoked.
- `scan_targets` table extended with `last_probe_at`, `last_probe_ok`, `last_probe_message` (auto-migrated on startup).
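
To illustrate how a preflight result might be turned into the "clear reason" mentioned above, here is a small sketch that maps a probe's HTTP status to a pass/fail flag and a hint. The hint wording and the function name are hypothetical; only the 401/403/404 distinction comes from this entry.

```python
from typing import Tuple

# Hypothetical hint texts; the application's actual messages may differ.
_PROBE_HINTS = {
    401: "authentication rejected: check the client credentials or certificate upload",
    403: "access denied: check that admin consent was granted for the scan app",
    404: "site not found: check the site URL",
}


def describe_probe_result(status_code: int) -> Tuple[bool, str]:
    """Return (ok, message) for the HTTP status of a preflight request."""
    if 200 <= status_code < 300:
        return True, "OK"
    hint = _PROBE_HINTS.get(status_code, "unexpected response")
    return False, f"HTTP {status_code}: {hint}"
```

A result in this shape would map directly onto `last_probe_ok` and `last_probe_message`.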
## [2026-04-13]
### Added
- **Site filter in Job Details** — dropdown in the Selected Job Details panel to filter Targets and Deviations tables by site URL (client-side, no extra API call).
- **Excel export** — `GET /api/scan-jobs/{id}/export` endpoint (optional `?site_url=` filter) returns a `.xlsx` file with two sheets:
  - _Targets_: URL, status, attempts, error, timestamps.
  - _Deviations_: Site URL, relative Object URL, Object Type, Principal, Link Risk (colour-coded), Resolved Members, Role, Delta — sorted by Site URL → Object URL → Principal.
- **Hierarchical deduplication** — after scanning a target, deviations are post-processed to suppress child-level entries already covered by a parent (library/folder). Prevents result explosion on large sites with deeply inherited permissions. No additional API calls.
- **SharingLinks classification and colour coding** — SharePoint sharing-link principals are parsed and displayed with a risk badge in the Deviations table:
  - `Anonymous*` → Critical (red)
  - `Flexible` → High (orange)
  - `Organization*` → Low (blue)
  - `Direct*` → Low (green)
- **Resolve Sharing Links** — post-scan action in the Job Details panel. Fetches the actual member list of sharing-link groups via `/_api/web/sitegroups/getbyname/<group>/users`. Stored in new `permission_deviations.resolved_members` column. Anonymous links produce an empty member list (shown as `(public link)`). New endpoint: `POST /api/scan-jobs/{id}/resolve-sharing-links`.
- **Role name normalisation** — common Dutch SharePoint role names (e.g. "Volledig beheer", "Bijdragen") are translated to their English equivalents at scan time before being stored.
- **`openpyxl` dependency** added to `requirements.txt`.
- **Favicon** replaced with a dedicated icon (blue rounded square with eye/keyhole symbol) instead of the concept design SVG.
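
The SharingLinks risk mapping listed above can be sketched as a small classifier. It assumes sharing-link principals follow the `SharingLinks.<guid>.<LinkKind>.<guid>` naming that SharePoint typically uses for these groups; the function name and the `"Unknown"` fallback are hypothetical.

```python
from typing import Optional


def classify_sharing_link(principal: str) -> Optional[str]:
    """Return the risk level for a SharingLinks principal, None for others."""
    parts = principal.split(".")
    if len(parts) < 3 or parts[0] != "SharingLinks":
        return None  # not a sharing-link group
    link_kind = parts[2]
    if link_kind.startswith("Anonymous"):
        return "Critical"
    if link_kind == "Flexible":
        return "High"
    if link_kind.startswith(("Organization", "Direct")):
        return "Low"
    return "Unknown"  # assumed fallback for link kinds not listed in the changelog
```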
### Changed
- `SCAN_TARGET_TIMEOUT_SEC` default raised from 180 s to 3600 s (1 hour) to accommodate large sites with tens of thousands of files.
- `permission_deviations` table extended with `resolved_members TEXT` column (auto-migrated on startup).
- Object URL in the Deviations table and Excel export is now shown relative to the site URL (site URL prefix stripped).
- Principal display in the Deviations table strips the SharePoint claim prefix (e.g. `i:0#.f|membership|`) and shows only the email/name; full value visible on hover.
- Site URL in the Deviations table is abbreviated to the last path segment with full URL on hover.
- Deviations table uses `table-layout: fixed` with column widths sized to fit on a 1080p display.
- `docs/TECHNICAL.md` and `README.md` updated to reflect all new functionality.
## [2026-04-13]
### Added
- Certificate-based authentication for SharePoint app-only access:
  - Clearview generates a self-signed RSA-2048 certificate per tenant (no external CA required).
  - New endpoint `POST /api/tenants/{id}/generate-certificate` stores the private key and returns the public cert.
  - Public certificate downloadable as a `.cer` file from the UI, named after the tenant.
  - Scanner uses MSAL with certificate when available; client secret remains as fallback.
  - Resolves SharePoint error "Unsupported app only token" when using client secret authentication.
- `TenantProfile` extended with `cert_private_key`, `cert_thumbprint`, and `cert_expires_at`.
- Tenant table shows auth method (cert with expiry date or secret).
- Client secret is now optional when creating a tenant profile (can be omitted when a certificate will be used).
- Job deletion: `DELETE /api/scan-jobs/{id}` endpoint added (not allowed for queued or running jobs).
- Delete button per job in the UI; cascades to targets and deviations.
### Fixed
- SharePoint REST API error when fetching list items: removed `$filter=HasUniqueRoleAssignments eq true` as SharePoint does not support this field as an OData filter. The check is now performed client-side.
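
A client-side version of that check might look like the sketch below. It assumes each list item has already been fetched as a dictionary that includes the `HasUniqueRoleAssignments` field (for example by requesting it in the item query); the function name is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def items_with_unique_permissions(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only items whose permissions are not inherited from the parent."""
    # Items without the field are treated as inheriting (an assumption).
    return [item for item in items if item.get("HasUniqueRoleAssignments") is True]
```

The trade-off is that every item is transferred before filtering, which is why list-level scan caps and timeouts matter on large libraries.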
## [2026-04-13]
### Added
- Multi-tenant support: Clearview now manages multiple customer tenants from a single instance.
- New `TenantProfile` data model (`tenant_profiles` table) for storing customer credentials.
- `ScanJob` linked to a tenant profile via `tenant_profile_id` FK.
- API endpoints for tenant profile management: `GET/POST /api/tenants`, `DELETE /api/tenants/{id}`.
- `GET /api/scan-jobs` supports filtering by `tenant_profile_id`.
- UI fully redesigned for multi-tenant use:
  - New **Tenants** panel with a table of configured customers, Add/Delete actions, and a Scan shortcut per tenant.
  - Onboarding flow (Connect Microsoft / manual instructions) moved into the Add Tenant form.
  - Scan form uses a tenant profile dropdown; manual credentials only shown as a fallback option.
  - Jobs table extended with a **Tenant** column and a tenant filter dropdown.
  - Hero stats now show: Tenants / Jobs / Active Jobs.
- XSS escaping added for all user-supplied data rendered in the jobs and deviations tables.
### Changed
- `TECHNICAL.md` updated with multi-tenant model documentation, tenant profile API, and redesigned onboarding flow.
## [2026-04-13]
### Added
- Initial repository structure.
- `containers/` directory added with the `clearview` starter service.
- `build-and-push.sh` added for container build and push.
- `docs/TECHNICAL.md` added.
- `docs/changelog-develop.md` added.
- `version.txt` added with initial version.
- `.last-branch` added for branch tracking in the build script.
## [2026-04-13]
### Added
- FastAPI backend integrated into the `clearview` container (single-container app runtime).
- PostgreSQL-backed scan job model (`scan_jobs`, `scan_targets`, `permission_deviations`).
- Background scan worker with queue processing, retries, and per-target timeout controls.
- API endpoints for manual URL scan creation, CSV import, job listing, and job detail retrieval.
- CSV parsing support for Microsoft Sites export format with URL normalization and de-duplication.
- Default-site skip rules for tenant root and app catalog paths.
- Frontend replaced with production-oriented scan UI:
  - Manual URL submission
  - CSV upload
  - Job status overview
  - Target-level result view
  - Deviation table view
- Stack configuration extended with scan worker runtime environment settings.
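
The URL normalization, de-duplication, and default-site skip rules mentioned in the list above can be sketched as follows. The concrete rules are assumptions for illustration: trailing slashes are dropped, hosts are lower-cased, duplicates are compared case-insensitively, and "default sites" are taken to be the tenant root plus a `/sites/appcatalog` path.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


def normalize_site_url(raw: str) -> Optional[str]:
    """Return a cleaned site URL, or None if the input is not an http(s) URL."""
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def is_default_site(url: str) -> bool:
    """Assumed skip rule: tenant root and the app catalog site."""
    path = urlsplit(url).path.rstrip("/").lower()
    return path in ("", "/sites/appcatalog")


def clean_site_urls(raw_urls: Iterable[str]) -> List[str]:
    """Normalize, skip default sites, and de-duplicate in first-seen order."""
    seen = set()
    result: List[str] = []
    for raw in raw_urls:
        url = normalize_site_url(raw)
        if url is None or is_default_site(url):
            continue
        key = url.lower()  # SharePoint paths are treated as case-insensitive here
        if key not in seen:
            seen.add(key)
            result.append(url)
    return result
```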
### Changed
- `containers/clearview/Dockerfile` switched from static nginx hosting to Python FastAPI runtime.
### Added
- Real SharePoint scan implementation for app-only authentication mode (`SHAREPOINT_SCAN_MODE=sharepoint_app_only`):
  - OAuth2 client credentials token acquisition via Microsoft Entra ID.
  - Site root permission baseline loading through SharePoint REST `roleassignments`.
  - Document library, folder, and file traversal with unique-permission detection (`HasUniqueRoleAssignments`).
  - Deviation persistence only for rights not present on site root (`delta_type=added`).
  - HTTP retry/backoff and throttle handling (429/503), plus list-level scan caps.
- Scanner HTTP retry/backoff and list-limit controls added in backend configuration.
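
The retry/backoff behaviour for throttled responses (429/503) can be sketched independently of any HTTP library. In this illustration the `send` callable, the attempt count, and the delays are hypothetical parameters, not the scanner's actual configuration; `send` is assumed to return the status code together with an optional `Retry-After` value in seconds.

```python
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = {429, 503}  # throttling and temporary unavailability


def call_with_retry(
    send: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            break
        # Prefer the server's Retry-After hint, else back off exponentially.
        delay = retry_after if retry_after is not None else base_delay * (2 ** attempt)
        sleep(delay)
    return status
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in production code the default `time.sleep` would apply.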
### Changed
- Authentication flow updated to universal multi-tenant style:
  - Azure credentials are now supplied per scan job from the web UI/API payload.
  - No Azure tenant/client/secret dependency in stack `.env`.
- Added UI and technical documentation guidance for one-time Entra app setup and required SharePoint permission (`Sites.FullControl.All` + admin consent).
### Added
- Automated onboarding endpoint `POST /api/onboarding/create-scan-app`:
  - Creates a dedicated scan app in Entra for the connected tenant.
  - Configures SharePoint app permission `Sites.FullControl.All`.
  - Creates service principal and assigns app role consent.
  - Generates and returns a new client secret.
- Microsoft connect/admin-consent flow endpoints:
  - `GET /api/onboarding/microsoft/connect-url`
  - `GET /api/onboarding/microsoft/callback`
- UI onboarding flow updated:
  - `Connect Microsoft` button for admin consent redirect
  - Callback handling to capture tenant id
  - Automatic scan-app creation without manual bootstrap app input