# TECHNICAL

## Scope

Clearview scans SharePoint sites for permission deviations from the site root permission baseline. Designed to monitor multiple customer tenants from a single instance.

## Runtime Architecture

- Single `clearview` application container (no separate API container).
- `postgres` service for persistent job and result storage.
- `adminer` service for direct database inspection.

All services are defined in `stack/docker-compose.yml` for Portainer deployment.

## Application Layout

- `containers/clearview/site/`
  - Frontend UI (tenant management, manual URL input, CSV import, jobs, deviations)
- `containers/clearview/src/clearview_app/`
  - FastAPI backend
  - SQLAlchemy models
  - CSV parser
  - Default-site filtering
  - Background worker for long-running scans

## Multi-Tenant Model

Clearview uses **Tenant Profiles** to manage multiple customer tenants from one instance.

### Tenant Profiles

A tenant profile stores the Azure app credentials for one customer tenant:

| Field | Description |
|---|---|
| `name` | Label for internal reference (e.g. "Contoso") |
| `tenant_id` | Azure Directory (tenant) ID |
| `client_id` | Azure App (client) ID |
| `client_secret` | Azure App client secret |

Profiles are managed via the **Tenants** panel in the UI or directly via the API.

When starting a scan, you select a profile from the dropdown; no manual credential entry is needed. Ad-hoc scans without a saved profile are still supported via **Manual credentials** in the scan form.

### API Endpoints: Tenants

```
GET    /api/tenants                              List all profiles (client_secret not returned)
POST   /api/tenants                              Create a new profile
DELETE /api/tenants/{id}                         Delete a profile (jobs are retained, tenant link is cleared)
POST   /api/tenants/{id}/generate-certificate    Generate a self-signed certificate for this tenant
```

### Certificate Authentication

Clearview supports app-only authentication via a self-signed certificate (recommended) or a client secret.

**Generating a certificate:**

1. Click **Certificate** in the Tenants table.
2. Clearview generates a self-signed RSA-2048 certificate valid for 2 years.
3. Download the `.cer` file and upload it in Azure Portal → App registration → Certificates & secrets → Certificates.
4. The private key is stored internally; Clearview uses it automatically when starting a scan.

The scanner uses the certificate path when `cert_thumbprint` is present on the tenant profile; otherwise the client secret is used.

`TenantProfile` authentication fields:

| Field | Description |
|---|---|
| `client_secret` | Azure client secret (optional when a certificate is available) |
| `cert_private_key` | PEM-encoded private key (internal, never exposed via API) |
| `cert_thumbprint` | SHA-1 thumbprint (used by MSAL) |
| `cert_expires_at` | Certificate expiry date |

## Scan Processing Model

Scans run asynchronously through a DB-backed job queue:

1. User selects a tenant profile (or enters manual credentials) and submits URLs or a CSV.
2. API validates and normalizes URLs.
3. Default sites are skipped by rule (tenant root and app catalog).
4. A scan job is queued in PostgreSQL, linked to the tenant profile when applicable.
5. Background worker processes targets with retries and per-target timeout.
6. API/UI expose progress and deviations per job.

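
Steps 2 and 3 of the scan processing flow can be sketched as follows. This is a minimal illustration, not Clearview's actual code: the function names, the `/sites/appcatalog` path, and the decision to drop ports, query strings, and trailing slashes during normalization are all assumptions.

```python
from urllib.parse import urlsplit

# Assumed app catalog path; a real tenant may host its catalog elsewhere.
APP_CATALOG_PATHS = {"/sites/appcatalog"}


def normalize_site_url(raw: str) -> str:
    """Return an https URL with lower-case host, no port, query, or trailing slash."""
    parts = urlsplit(raw.strip())
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not a valid https site URL: {raw!r}")
    # urlsplit().hostname is already lower-cased by the standard library.
    return f"https://{parts.hostname}{parts.path.rstrip('/')}"


def is_default_site(url: str) -> bool:
    """True for the tenant root or the (assumed) app catalog site."""
    path = urlsplit(url).path.rstrip("/").lower()
    return path == "" or path in APP_CATALOG_PATHS


def prepare_targets(raw_urls: list[str]) -> tuple[list[str], list[str]]:
    """Split input into (targets to queue, skipped default sites), without duplicates."""
    targets: list[str] = []
    skipped: list[str] = []
    seen: set[str] = set()
    for raw in raw_urls:
        url = normalize_site_url(raw)
        key = url.lower()  # assumption: site URLs are compared case-insensitively
        if key in seen:
            continue
        seen.add(key)
        (skipped if is_default_site(url) else targets).append(url)
    return targets, skipped
```

Only the `targets` list would be attached to the queued job; returning `skipped` separately makes it possible to tell the user which URLs were filtered out.
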
### Timeout and Retry Controls

Configured through environment variables (defaults shown):

| Variable | Default | Description |
|---|---|---|
| `SCAN_TARGET_TIMEOUT_SEC` | `3600` | Max seconds per target before it is marked failed |
| `SCAN_TARGET_MAX_RETRIES` | `2` | Number of retries on transient failure |
| `SCAN_RETRY_BASE_DELAY_SEC` | `2` | Base delay for exponential back-off between retries |
| `SCAN_JOB_POLL_INTERVAL_SEC` | `3` | How often the worker polls for new queued jobs |
| `SCAN_HTTP_TIMEOUT_SEC` | `30` | Per-request HTTP timeout toward SharePoint |
| `SCAN_HTTP_MAX_RETRIES` | `3` | Retries on HTTP 429/503 or connection errors |
| `SCAN_LIST_PAGE_SIZE` | `200` | Items per page when listing library contents |
| `SCAN_MAX_ITEMS_PER_LIST` | `10000` | Cap on items with unique permissions per library |

## Deviation Detection

The scanner retrieves SharePoint REST role assignments at four levels:

- Site root
- Document library
- Folder
- File

Only permissions **added** relative to the site root are stored as deviations (`delta_type=added`).

No filesystem/NTFS permission model is used.

### Hierarchical Deduplication

After all deviations for a target are collected they are post-processed: if a `(principal, role)` deviation is already reported at a parent URL (library or folder), the same deviation on child items is suppressed. This prevents an explosion of results when a single folder grant propagates to thousands of files.

Deduplication is pure in-memory post-processing; no additional API calls are made.

### Role Name Normalisation

SharePoint returns role names in the language configured for the tenant. Clearview normalises common Dutch role names to their English equivalents before storing them:

| Dutch | English |
|---|---|
| Volledig beheer | Full Control |
| Bijdragen | Contribute |
| Lezen | Read |
| Bewerken | Edit |
| Ontwerpen | Design |
| Beperkte toegang | Limited Access |
| Goedkeuren | Approve |
| Hiërarchieën beheren | Manage Hierarchy |
| Weergeven alleen | View Only |
| Beperkt lezen | Restricted Read |

Unknown role names are stored as-is.

### SharingLinks

SharePoint creates internal groups named `SharingLinks.{guid}.{LinkType}.{guid}` whenever a user shares a file or folder via a sharing link. Clearview detects these and classifies them by risk:

| Link type | Risk | UI colour |
|---|---|---|
| `Anonymous*` | Critical | Red |
| `Flexible` | High | Orange |
| `Organization*` | Low | Blue |
| `Direct*` | Low | Green |

**Resolve Sharing Links**: after a scan completes, the Job Details panel shows a _Resolve Sharing Links_ section listing all SharingLinks types found in the job. The user selects which types to resolve and clicks **Resolve**. Clearview calls `/_api/web/sitegroups/getbyname('{name}')/users` for each unique group using the job's stored credentials and writes the member list to `permission_deviations.resolved_members`. Anonymous links have no resolvable members; their `resolved_members` field is stored as an empty string, displayed as `(public link)` in the UI.

Anonymous and Flexible types are pre-selected by default. Organization and Direct types are available but unchecked by default.

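
The SharingLinks risk table can be read as a prefix match on the `{LinkType}` segment of the group name. The sketch below illustrates that reading. The function name, the GUID pattern, and the `Unknown`/`Grey` fallback for unlisted link types are assumptions and not part of Clearview's documented behaviour.

```python
import re
from typing import Optional

# Matches SharingLinks.{guid}.{LinkType}.{guid}; assumes 36-character hyphenated GUIDs.
_SHARING_LINK_RE = re.compile(
    r"^SharingLinks\.[0-9A-Fa-f-]{36}\.(?P<link_type>[A-Za-z]+)\.[0-9A-Fa-f-]{36}$"
)

# (link type prefix, risk, UI colour), taken from the SharingLinks risk table.
_RISK_BY_PREFIX = (
    ("Anonymous", "Critical", "Red"),
    ("Flexible", "High", "Orange"),
    ("Organization", "Low", "Blue"),
    ("Direct", "Low", "Green"),
)


def classify_sharing_link(principal: str) -> Optional[tuple[str, str, str]]:
    """Return (link_type, risk, colour) for a SharingLinks group, else None."""
    match = _SHARING_LINK_RE.match(principal)
    if match is None:
        return None  # an ordinary user or group, not a sharing link
    link_type = match.group("link_type")
    for prefix, risk, colour in _RISK_BY_PREFIX:
        if link_type.startswith(prefix):
            return link_type, risk, colour
    # Assumed fallback for link types the table does not list.
    return link_type, "Unknown", "Grey"
```

For example, a principal whose link type segment is `AnonymousView` would be classified as `Critical`, while a regular group such as `Contoso Members` yields `None` and is left untouched.
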
### API Endpoints: Scan Jobs

```
GET    /api/scan-jobs                              List jobs (optional ?tenant_profile_id=)
POST   /api/scan-jobs                              Create job from URLs
POST   /api/scan-jobs/import-csv                   Create job from CSV upload
GET    /api/scan-jobs/{id}                         Get job detail (targets + deviations)
POST   /api/scan-jobs/{id}/cancel                  Cancel a queued or running job
DELETE /api/scan-jobs/{id}                         Delete a completed job and all its data
POST   /api/scan-jobs/{id}/resolve-sharing-links   Resolve SharingLinks group members post-scan
GET    /api/scan-jobs/{id}/export                  Download deviations as .xlsx (optional ?site_url=)
```

## Job Details UI

The **Selected Job Details** panel provides:

- **Site filter**: dropdown populated from the job's targets; filters both the Targets and Deviations tables client-side without a new API call.
- **Export Excel**: downloads a `.xlsx` with two sheets:
  - _Targets_: URL, status, attempts, error, timestamps
  - _Deviations_: Site URL, Object URL (relative to site), Object Type, Principal, Link Risk (colour-coded), Resolved Members, Role, Delta; sorted by Site URL → Object URL → Principal
- **Resolve Sharing Links**: see SharingLinks section above.

## CSV Import

Expected input is Microsoft Sites export format.

- URL column is auto-detected (`URL` / `Site URL` / `SiteUrl`).
- UTF-8 BOM is supported.
- Duplicate URLs are de-duplicated.

## Data Model

Main tables:

| Table | Key columns |
|---|---|
| `tenant_profiles` | credentials, `cert_private_key`, `cert_thumbprint`, `cert_expires_at` |
| `scan_jobs` | `status`, `tenant_profile_id`, progress counters, auth credentials |
| `scan_targets` | `job_id`, `site_url`, `status`, `attempts`, `error_message` |
| `permission_deviations` | `job_id`, `site_url`, `object_url`, `object_type`, `principal`, `role_name`, `delta_type`, `resolved_members` |

Scan jobs, targets, and deviations are cascade-deleted when a job is removed via `DELETE /api/scan-jobs/{id}`. Jobs with status `queued` or `running` cannot be deleted.

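
Returning to the three CSV Import rules listed earlier, the following is one way they could be implemented with the standard library. It is a sketch under assumptions: the function name is hypothetical, header matching is exact apart from surrounding whitespace, and duplicates are compared case-insensitively and without a trailing slash.

```python
import csv
import io

# Column names the CSV Import section lists for auto-detection.
URL_HEADERS = ("URL", "Site URL", "SiteUrl")


def extract_site_urls(raw: bytes) -> list[str]:
    """Return unique site URLs from a CSV export, in first-seen order."""
    # "utf-8-sig" strips a leading BOM when present and behaves like UTF-8 otherwise.
    text = raw.decode("utf-8-sig")
    reader = csv.DictReader(io.StringIO(text, newline=""))
    headers = reader.fieldnames or []
    column = next((h for h in headers if h.strip() in URL_HEADERS), None)
    if column is None:
        raise ValueError(f"no URL column found; expected one of {URL_HEADERS}")

    urls: list[str] = []
    seen: set[str] = set()
    for row in reader:
        url = (row.get(column) or "").strip()
        # Assumed duplicate rule: ignore case and a trailing slash.
        key = url.rstrip("/").lower()
        if url and key not in seen:
            seen.add(key)
            urls.append(url)
    return urls
```

The resulting list would then go through the same validation and default-site filtering as manually entered URLs.
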
Schema migrations for new columns are applied automatically on startup via `_ensure_schema_columns()` in `main.py`.

## Build and Release

Use `./build-and-push.sh` from repo root.

- `./build-and-push.sh t` for test build (`:dev` tag only)
- `./build-and-push.sh 1` patch release
- `./build-and-push.sh 2` minor release
- `./build-and-push.sh 3` major release

## Current Scan Mode

`SHAREPOINT_SCAN_MODE=sharepoint_app_only` is active by default.

Azure app-only credentials are resolved per scan job from the linked tenant profile, or from the raw credentials submitted with the job when no profile is used.

### Entra App Registration: two modes

The UI automatically detects which mode is active via `GET /api/onboarding/status`. The onboarding flow is accessed from the **Add Tenant** form in the Tenants panel.

#### Mode A: Automated (platform app configured)

Requires a pre-registered Clearview platform app in Azure AD with permission to create apps in customer tenants (`Application.ReadWrite.All` on Microsoft Graph).

Set the following in `stack/.env`:

```
ONBOARDING_CLIENT_ID=
ONBOARDING_CLIENT_SECRET=
ONBOARDING_REDIRECT_URI=https:///api/onboarding/microsoft/callback
```

Flow per customer tenant:

1. Click **Add Tenant** in the UI and then **Connect Microsoft**.
2. Approve admin consent in the customer's Microsoft tenant.
3. UI receives tenant context from the OAuth callback and pre-fills the tenant ID.
4. Click **Create Scan App Automatically** to create a tenant-local scan app via Graph API.
5. Clearview assigns SharePoint `Sites.FullControl.All` and generates a client secret.
6. Enter a name and click **Save Tenant** to store the profile.

#### Mode B: Manual (no platform app configured)

When `ONBOARDING_*` env vars are empty the UI shows step-by-step instructions to create the scan app manually per customer tenant:

1. Azure Portal → Entra ID → App registrations → New registration (Single tenant).
2. Copy Directory (tenant) ID and Application (client) ID from the Overview page.
3. API permissions → Add → SharePoint → Application permissions → `Sites.FullControl.All` → Grant admin consent.
4. Certificates & secrets → New client secret → copy the value (shown once).
5. Enter the details in **Add Tenant** and click **Save Tenant**.
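
Whichever mode created the profile, the scanner still has to choose between certificate and client secret at scan time, following the rule stated under Certificate Authentication. The sketch below shows that choice in isolation. The `TenantProfile` dataclass is an illustrative stand-in for the stored model and `build_client_credential` is a hypothetical helper. The returned value follows the shape that MSAL for Python accepts as `client_credential`: a plain secret string, or a dict with `private_key` and `thumbprint`.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class TenantProfile:
    """Illustrative stand-in; field names follow the tenant profile tables."""
    tenant_id: str
    client_id: str
    client_secret: Optional[str] = None
    cert_private_key: Optional[str] = None  # PEM-encoded private key
    cert_thumbprint: Optional[str] = None   # SHA-1 thumbprint


def build_client_credential(profile: TenantProfile) -> Union[str, dict]:
    """Prefer the certificate when a thumbprint is present, else use the secret."""
    if profile.cert_thumbprint:
        if not profile.cert_private_key:
            raise ValueError("cert_thumbprint is set but cert_private_key is missing")
        return {
            "private_key": profile.cert_private_key,
            "thumbprint": profile.cert_thumbprint,
        }
    if profile.client_secret:
        return profile.client_secret
    raise ValueError("profile has neither a certificate nor a client secret")


def authority_url(profile: TenantProfile) -> str:
    """Tenant-specific authority for the Microsoft identity platform."""
    return f"https://login.microsoftonline.com/{profile.tenant_id}"
```

With MSAL installed, the two results could be passed to `msal.ConfidentialClientApplication(profile.client_id, authority=authority_url(profile), client_credential=build_client_credential(profile))`; how Clearview itself wires this up is not shown in this document.
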