TECHNICAL
Scope
Clearview scans SharePoint sites for permissions that deviate from the site root permission baseline. It is designed to monitor multiple customer tenants from a single instance.
Runtime Architecture
- Single clearview application container (no separate API container).
- postgres service for persistent job and result storage.
- adminer service for direct database inspection.
All services are defined in stack/docker-compose.yml for Portainer deployment.
Application Layout
- containers/clearview/site/ - Frontend UI (tenant management, manual URL input, CSV import, jobs, deviations)
- containers/clearview/src/clearview_app/ - FastAPI backend
  - SQLAlchemy models
  - CSV parser
  - Default-site filtering
  - Background worker for long-running scans
Multi-Tenant Model
Clearview uses Tenant Profiles to manage multiple customer tenants from one instance.
Tenant Profiles
A tenant profile stores the Azure app credentials for one customer tenant:
| Field | Description |
|---|---|
| name | Label for internal reference (e.g. "Contoso") |
| tenant_id | Azure Directory (tenant) ID |
| client_id | Azure App (client) ID |
| client_secret | Azure App client secret |
Profiles are managed via the Tenants panel in the UI or directly via the API. When starting a scan, you select a profile from the dropdown — no manual credential entry needed.
Ad-hoc scans without a saved profile are still supported via Manual credentials in the scan form.
API Endpoints — Tenants
GET /api/tenants List all profiles (client_secret not returned)
POST /api/tenants Create a new profile
DELETE /api/tenants/{id} Delete a profile (jobs are retained, tenant link is cleared)
POST /api/tenants/{id}/generate-certificate Generate a self-signed certificate for this tenant
Certificate Authentication
Clearview supports app-only authentication via a self-signed certificate (recommended) or a client secret.
Generating a certificate:
- Click Certificate in the Tenants table.
- Clearview generates a self-signed RSA-2048 certificate valid for 2 years.
- Download the .cer file and upload it in Azure Portal → App registration → Certificates & secrets → Certificates.
- The private key is stored internally; Clearview uses it automatically when starting a scan.
The scanner uses the certificate path when cert_thumbprint is present on the tenant profile; otherwise the client secret is used.
TenantProfile authentication fields:
| Field | Description |
|---|---|
| client_secret | Azure client secret (optional when a certificate is available) |
| cert_private_key | PEM-encoded private key (internal, never exposed via API) |
| cert_thumbprint | SHA-1 thumbprint (used by MSAL) |
| cert_expires_at | Certificate expiry date |
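
The selection rule above (certificate when cert_thumbprint is present, client secret otherwise) could be sketched as follows. This is a minimal illustration, not Clearview's actual code: the `TenantProfile` dataclass and `build_client_credential` are hypothetical names, and the return value only mirrors the shapes that MSAL for Python accepts as `client_credential` (a secret string, or a dict with `private_key` and `thumbprint`).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class TenantProfile:
    # Hypothetical in-memory stand-in for a stored profile;
    # field names follow the tables in this document.
    tenant_id: str
    client_id: str
    client_secret: Optional[str] = None
    cert_private_key: Optional[str] = None
    cert_thumbprint: Optional[str] = None


def build_client_credential(profile: TenantProfile) -> Union[str, dict]:
    """Pick the credential: certificate if a thumbprint is present, else the secret."""
    if profile.cert_thumbprint:
        if not profile.cert_private_key:
            raise ValueError("cert_thumbprint is set but cert_private_key is missing")
        # Dict shape accepted by MSAL for certificate-based app-only auth.
        return {
            "private_key": profile.cert_private_key,
            "thumbprint": profile.cert_thumbprint,
        }
    if profile.client_secret:
        return profile.client_secret
    raise ValueError("profile has neither a certificate nor a client secret")
```

The result would typically be passed as `client_credential` when constructing an MSAL confidential client for the tenant's authority.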
Scan Processing Model
Scans run asynchronously through a DB-backed job queue:
- User selects a tenant profile (or enters manual credentials) and submits URLs or a CSV.
- API validates and normalizes URLs.
- Default sites are skipped by rule (tenant root and app catalog).
- A scan job is queued in PostgreSQL, linked to the tenant profile when applicable.
- Background worker processes targets with retries and per-target timeout.
- API/UI expose progress and deviations per job.
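
The normalisation and default-site steps above might look roughly like this. The function names are hypothetical, and the exact rules are assumptions: this sketch treats an empty path as the tenant root and `/sites/appcatalog` as the app catalog, which may differ from the rules Clearview actually applies.

```python
from urllib.parse import urlsplit


def normalize_site_url(raw: str) -> str:
    """Trim whitespace, lower-case the host, drop query, fragment and trailing slash."""
    parts = urlsplit(raw.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"not a valid https site URL: {raw!r}")
    return f"https://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def is_default_site(url: str) -> bool:
    """Assumed rule: skip the tenant root (empty path) and the app catalog site."""
    path = urlsplit(url).path.rstrip("/").lower()
    return path == "" or path == "/sites/appcatalog"


def select_targets(raw_urls):
    """Normalise every URL and keep only those that are not default sites."""
    normalized = (normalize_site_url(u) for u in raw_urls)
    return [u for u in normalized if not is_default_site(u)]
```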
Timeout and Retry Controls
Configured through environment variables (defaults shown):
| Variable | Default | Description |
|---|---|---|
| SCAN_TARGET_TIMEOUT_SEC | 3600 | Max seconds per target before it is marked failed |
| SCAN_TARGET_MAX_RETRIES | 2 | Number of retries on transient failure |
| SCAN_RETRY_BASE_DELAY_SEC | 2 | Base delay for exponential back-off between retries |
| SCAN_JOB_POLL_INTERVAL_SEC | 3 | How often the worker polls for new queued jobs |
| SCAN_HTTP_TIMEOUT_SEC | 30 | Per-request HTTP timeout toward SharePoint |
| SCAN_HTTP_MAX_RETRIES | 3 | Retries on HTTP 429/503 or connection errors |
| SCAN_LIST_PAGE_SIZE | 200 | Items per page when listing library contents |
| SCAN_MAX_ITEMS_PER_LIST | 10000 | Cap on items with unique permissions per library |
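
As an illustration of how the retry settings could interact, the sketch below reads the two retry variables and applies a doubling delay. The helper names and the exact formula (base × 2^(attempt − 1)) are assumptions; the worker's real back-off schedule may differ, and this version treats every exception as transient for simplicity.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Defaults match the table above.
MAX_RETRIES = int(os.environ.get("SCAN_TARGET_MAX_RETRIES", "2"))
BASE_DELAY_SEC = float(os.environ.get("SCAN_RETRY_BASE_DELAY_SEC", "2"))


def backoff_delay(attempt: int, base: float = BASE_DELAY_SEC) -> float:
    """Delay before retry number `attempt` (1-based): base, 2*base, 4*base, ..."""
    return base * (2 ** (attempt - 1))


def run_with_retries(
    task: Callable[[], T],
    max_retries: int = MAX_RETRIES,
    base: float = BASE_DELAY_SEC,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`; on failure, retry up to `max_retries` times with back-off."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # retries exhausted: let the caller mark the target failed
            sleep(backoff_delay(attempt, base))
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.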
Deviation Detection
The scanner retrieves SharePoint REST role assignments at four levels:
- Site root
- Document library
- Folder
- File
Only permissions added relative to the site root are stored as deviations (delta_type=added).
No filesystem/NTFS permission model is used.
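
Conceptually, the "added" delta is a set difference between an object's role assignments and the site root baseline. The sketch below assumes assignments can be reduced to (principal, role name) pairs; `Assignment` and `added_assignments` are illustrative names, not part of the Clearview code base.

```python
from typing import Iterable, NamedTuple, Set


class Assignment(NamedTuple):
    principal: str
    role_name: str


def added_assignments(
    baseline: Iterable[Assignment], current: Iterable[Assignment]
) -> Set[Assignment]:
    """Return assignments present on the object but absent from the site root."""
    return set(current) - set(baseline)


# Example: the folder grants Edit to alice on top of the inherited baseline.
root = [Assignment("HR Members", "Contribute"), Assignment("HR Owners", "Full Control")]
folder = root + [Assignment("alice@contoso.com", "Edit")]
deviations = added_assignments(root, folder)
```

Permissions that exist on the site root but were removed further down do not appear in this difference, which matches the statement that only added permissions are stored.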
Hierarchical Deduplication
After all deviations for a target are collected they are post-processed: if a (principal, role) deviation is already reported at a parent URL (library or folder), the same deviation on child items is suppressed. This prevents an explosion of results when a single folder grant propagates to thousands of files.
Deduplication is pure in-memory post-processing — no additional API calls are made.
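
One way to implement this kind of suppression is to process deviations parent-first and drop any entry whose URL sits below an already-kept URL with the same (principal, role) pair. The following is a sketch under that assumption; the dict keys mirror the permission_deviations columns, but the function itself is hypothetical.

```python
from typing import Dict, List, Tuple

Deviation = Dict[str, str]  # expected keys: object_url, principal, role_name


def dedupe_hierarchical(deviations: List[Deviation]) -> List[Deviation]:
    """Suppress a deviation when the same (principal, role) exists on a parent URL."""
    # Shorter URLs first: a parent URL is always shorter than its children.
    ordered = sorted(deviations, key=lambda d: len(d["object_url"]))
    kept: List[Deviation] = []
    kept_urls: Dict[Tuple[str, str], List[str]] = {}
    for dev in ordered:
        key = (dev["principal"], dev["role_name"])
        url = dev["object_url"].rstrip("/")
        parents = kept_urls.setdefault(key, [])
        # The trailing "/" prevents "/Docs/Payroll2" from matching "/Docs/Payroll".
        if any(url.startswith(parent + "/") for parent in parents):
            continue
        kept.append(dev)
        parents.append(url)
    return kept
```

The scan over kept parents is linear per deviation, which is acceptable for a sketch; a production version might index parents by path segment instead.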
Role Name Normalisation
SharePoint returns role names in the language configured for the tenant. Clearview normalises common Dutch role names to their English equivalents before storing them:
| Dutch | English |
|---|---|
| Volledig beheer | Full Control |
| Bijdragen | Contribute |
| Lezen | Read |
| Bewerken | Edit |
| Ontwerpen | Design |
| Beperkte toegang | Limited Access |
| Goedkeuren | Approve |
| Hiërarchieën beheren | Manage Hierarchy |
| Weergeven alleen | View Only |
| Beperkt lezen | Restricted Read |
Unknown role names are stored as-is.
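
The mapping above translates directly into a lookup table. In this sketch the names `ROLE_NAME_MAP` and `normalise_role_name` are illustrative, and exact, case-sensitive matching is an assumption about how the lookup behaves.

```python
ROLE_NAME_MAP = {
    "Volledig beheer": "Full Control",
    "Bijdragen": "Contribute",
    "Lezen": "Read",
    "Bewerken": "Edit",
    "Ontwerpen": "Design",
    "Beperkte toegang": "Limited Access",
    "Goedkeuren": "Approve",
    "Hiërarchieën beheren": "Manage Hierarchy",
    "Weergeven alleen": "View Only",
    "Beperkt lezen": "Restricted Read",
}


def normalise_role_name(name: str) -> str:
    """Map a known Dutch role name to English; return unknown names unchanged."""
    return ROLE_NAME_MAP.get(name.strip(), name)
```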
SharingLinks
SharePoint creates internal groups named SharingLinks.{guid}.{LinkType}.{guid} whenever a user shares a file or folder via a sharing link. Clearview detects these and classifies them by risk:
| Link type | Risk | UI colour |
|---|---|---|
| Anonymous* | Critical | Red |
| Flexible | High | Orange |
| Organization* | Low | Blue |
| Direct* | Low | Green |
Resolve Sharing Links — after a scan completes, the Job Details panel shows a Resolve Sharing Links section listing all SharingLinks types found in the job. The user selects which types to resolve and clicks Resolve. Clearview calls /_api/web/sitegroups/getbyname('{name}')/users for each unique group using the job's stored credentials and writes the member list to permission_deviations.resolved_members. Anonymous links have no resolvable members; their resolved_members field is stored as an empty string, displayed as (public link) in the UI.
Anonymous and Flexible types are pre-selected by default. Organization and Direct types are available but unchecked by default.
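
Classification can be driven entirely by the group name. The sketch below parses the SharingLinks.{guid}.{LinkType}.{guid} pattern and maps the link type to a risk level by prefix, following the table above. The regular expression, the function name, and the "Unknown" fallback for unlisted link types are assumptions rather than confirmed Clearview behaviour.

```python
import re
from typing import Optional, Tuple

SHARING_LINK_RE = re.compile(
    r"^SharingLinks\.[0-9a-fA-F-]{36}\.(?P<link_type>[A-Za-z]+)\.[0-9a-fA-F-]{36}$"
)

# Prefix match, mirroring the trailing * in the table (e.g. AnonymousView, AnonymousEdit).
RISK_BY_PREFIX = (
    ("Anonymous", "Critical"),
    ("Flexible", "High"),
    ("Organization", "Low"),
    ("Direct", "Low"),
)


def classify_sharing_link(principal: str) -> Optional[Tuple[str, str]]:
    """Return (link_type, risk) for a SharingLinks group, or None for other principals."""
    match = SHARING_LINK_RE.match(principal)
    if match is None:
        return None
    link_type = match.group("link_type")
    for prefix, risk in RISK_BY_PREFIX:
        if link_type.startswith(prefix):
            return link_type, risk
    return link_type, "Unknown"  # assumed fallback for link types not in the table
```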
API Endpoints — Scan Jobs
GET /api/scan-jobs List jobs (optional ?tenant_profile_id=)
POST /api/scan-jobs Create job from URLs
POST /api/scan-jobs/import-csv Create job from CSV upload
GET /api/scan-jobs/{id} Get job detail (targets + deviations)
POST /api/scan-jobs/{id}/cancel Cancel a queued or running job
DELETE /api/scan-jobs/{id} Delete a completed job and all its data
POST /api/scan-jobs/{id}/resolve-sharing-links Resolve SharingLinks group members post-scan
GET /api/scan-jobs/{id}/export Download deviations as .xlsx (optional ?site_url=)
Job Details UI
The Selected Job Details panel provides:
- Site filter: a dropdown populated from the job's targets. It filters both the Targets and Deviations tables client-side without a new API call.
- Export Excel: downloads an .xlsx file with two sheets.
  - Targets: URL, status, attempts, error, timestamps
  - Deviations: Site URL, Object URL (relative to site), Object Type, Principal, Link Risk (colour-coded), Resolved Members, Role, Delta, sorted by Site URL → Object URL → Principal
- Resolve Sharing Links: see the SharingLinks section above.
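
The row preparation for the Deviations sheet could be sketched as below: make the object URL relative to its site, then sort by site URL, object URL and principal. This assumes object_url is stored as an absolute URL, which the document does not state; `to_export_rows` is a hypothetical helper and the actual .xlsx writing is left out.

```python
from typing import Dict, List


def to_export_rows(deviations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the deviations with site-relative object URLs, in export order."""
    rows = []
    for dev in deviations:
        site = dev["site_url"].rstrip("/")
        obj = dev["object_url"]
        # Strip the site prefix only when the object really lives under that site.
        relative = obj[len(site):] if obj.startswith(site + "/") else obj
        rows.append({**dev, "object_url": relative})
    rows.sort(key=lambda r: (r["site_url"], r["object_url"], r["principal"]))
    return rows
```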
CSV Import
Expected input is Microsoft Sites export format.
- URL column is auto-detected (URL / Site URL / SiteUrl).
- UTF-8 BOM is supported.
- Duplicate URLs are de-duplicated.
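
A parser with these three properties might look like the following sketch. It uses the standard library only; `extract_site_urls` is an illustrative name, and both the case-insensitive header match and the case-insensitive duplicate check are assumptions about the real parser.

```python
import csv
import io
from typing import List

URL_COLUMNS = ("url", "site url", "siteurl")


def extract_site_urls(raw: bytes) -> List[str]:
    """Read site URLs from a CSV export, tolerating a UTF-8 BOM and duplicates."""
    text = raw.decode("utf-8-sig")  # "utf-8-sig" strips a leading BOM if present
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    column = next((f for f in fieldnames if f.strip().lower() in URL_COLUMNS), None)
    if column is None:
        raise ValueError("no URL column found (expected URL, Site URL or SiteUrl)")
    seen = set()
    urls: List[str] = []
    for row in reader:
        url = (row.get(column) or "").strip().rstrip("/")
        key = url.lower()
        if url and key not in seen:
            seen.add(key)
            urls.append(url)
    return urls
```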
Data Model
Main tables:
| Table | Key columns |
|---|---|
| tenant_profiles | credentials, cert_private_key, cert_thumbprint, cert_expires_at |
| scan_jobs | status, tenant_profile_id, progress counters, auth credentials |
| scan_targets | job_id, site_url, status, attempts, error_message |
| permission_deviations | job_id, site_url, object_url, object_type, principal, role_name, delta_type, resolved_members |
Scan jobs, targets, and deviations are cascade-deleted when a job is removed via DELETE /api/scan-jobs/{id}. Jobs with status queued or running cannot be deleted.
Schema migrations for new columns are applied automatically on startup via _ensure_schema_columns() in main.py.
Build and Release
Use ./build-and-push.sh from repo root.
- ./build-and-push.sh t for test build (:dev tag only)
- ./build-and-push.sh 1 for patch release
- ./build-and-push.sh 2 for minor release
- ./build-and-push.sh 3 for major release
Current Scan Mode
SHAREPOINT_SCAN_MODE=sharepoint_app_only is active by default.
Azure app-only credentials are resolved per scan job from the linked tenant profile, or from the raw credentials submitted with the job when no profile is used.
Entra App Registration — two modes
The UI automatically detects which mode is active via GET /api/onboarding/status.
The onboarding flow is accessed from the Add Tenant form in the Tenants panel.
Mode A — Automated (platform app configured)
Requires a pre-registered Clearview platform app in Azure AD with permission to create
apps in customer tenants (Application.ReadWrite.All on Microsoft Graph).
Set the following in stack/.env:
ONBOARDING_CLIENT_ID=<platform-app-client-id>
ONBOARDING_CLIENT_SECRET=<platform-app-client-secret>
ONBOARDING_REDIRECT_URI=https://<your-clearview-domain>/api/onboarding/microsoft/callback
Flow per customer tenant:
- Click Add Tenant in the UI and then Connect Microsoft.
- Approve admin consent in the customer's Microsoft tenant.
- UI receives tenant context from the OAuth callback and pre-fills the tenant ID.
- Click Create Scan App Automatically to create a tenant-local scan app via Graph API.
- Clearview assigns SharePoint Sites.FullControl.All and generates a client secret.
- Enter a name and click Save Tenant to store the profile.
Mode B — Manual (no platform app configured)
When ONBOARDING_* env vars are empty the UI shows step-by-step instructions to create
the scan app manually per customer tenant:
- Azure Portal → Entra ID → App registrations → New registration (Single tenant).
- Copy Directory (tenant) ID and Application (client) ID from the Overview page.
- API permissions → Add → SharePoint → Application permissions → Sites.FullControl.All → Grant admin consent.
- Certificates & secrets → New client secret → copy the value (shown once).
- Enter the details in Add Tenant and click Save Tenant.
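
The choice between the two modes described above comes down to whether the platform app is configured. A minimal sketch of that check is shown here; the function name, the returned labels, and the rule that all three ONBOARDING_* variables must be non-empty are assumptions, and the actual response shape of GET /api/onboarding/status is not documented in this file.

```python
from typing import Mapping

ONBOARDING_VARS = (
    "ONBOARDING_CLIENT_ID",
    "ONBOARDING_CLIENT_SECRET",
    "ONBOARDING_REDIRECT_URI",
)


def onboarding_mode(env: Mapping[str, str]) -> str:
    """Return "automated" (Mode A) when the platform app is configured, else "manual" (Mode B)."""
    configured = all(env.get(name, "").strip() for name in ONBOARDING_VARS)
    return "automated" if configured else "manual"
```

In a running service, `os.environ` would typically be passed as `env`.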