TECHNICAL
Scope
Clearview scans Microsoft 365 for permission deviations across two domains:
- SharePoint sites — deviations relative to the site root permission baseline (libraries, folders, files).
- Exchange Online mailboxes — non-default permissions: Full Access, Send As, Send on Behalf, and folder delegations (Calendar, Inbox).
Designed to monitor multiple customer tenants from a single instance.
Runtime Architecture
- Single clearview application container (no separate API container).
- postgres service for persistent job and result storage.
- adminer service for direct database inspection.
All services are defined in stack/docker-compose.yml for Portainer deployment.
Application Layout
containers/clearview/site/
- Frontend UI: vanilla HTML/JS/CSS with a fixed sidebar and hash-based routing.
- Routes: #/dashboard, #/jobs, #/scan/sharepoint, #/scan/mailbox, #/tenants, #/settings.
containers/clearview/src/clearview_app/
- FastAPI backend
- SQLAlchemy models
- CSV parser (SharePoint URLs and mailbox UPNs)
- Default-site filtering (SharePoint only)
- Background worker for long-running scans
containers/clearview/src/clearview_app/scanners/
- common.py: AuthConfig, DeviationRecord, ScanResult, ProbeResult, shared helpers.
- sharepoint.py: SharePoint REST scanner, MSAL token cache, hierarchical dedup, SharingLinks helpers.
- mailbox.py: Exchange Online scanner; spawns pwsh with the EXO scripts.
- exo_scripts/: PowerShell scripts (probe.ps1, get-permissions.ps1).
- Dispatcher: scanners.scan(scan_type, target, auth, progress) and scanners.probe(scan_type, target, auth).
Multi-Tenant Model
Clearview uses Tenant Profiles to manage multiple customer tenants from one instance.
Tenant Profiles
A tenant profile stores the Azure app credentials for one customer tenant:
| Field | Description |
|---|---|
| name | Label for internal reference (e.g. "Contoso") |
| tenant_id | Azure Directory (tenant) ID |
| client_id | Azure App (client) ID |
| client_secret | Azure App client secret |
Profiles are managed via the Tenants panel in the UI or directly via the API. When starting a scan, you select a profile from the dropdown — no manual credential entry needed.
Ad-hoc scans without a saved profile are still supported via Manual credentials in the scan form.
API Endpoints — Tenants
GET /api/tenants List all profiles (client_secret not returned)
POST /api/tenants Create a new profile
DELETE /api/tenants/{id} Delete a profile (jobs are retained, tenant link is cleared)
POST /api/tenants/{id}/generate-certificate Generate a self-signed certificate for this tenant
Certificate Authentication
Clearview supports app-only authentication via a self-signed certificate (recommended) or a client secret.
Generating a certificate:
- Click Certificate in the Tenants table.
- Clearview generates a self-signed RSA-2048 certificate valid for 2 years.
- Download the .cer file and upload it in Azure Portal → App registration → Certificates & secrets → Certificates.
- The private key is stored internally; Clearview uses it automatically when starting a scan.
The scanner uses the certificate path when cert_thumbprint is present on the tenant profile; otherwise the client secret is used.
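The selection rule above can be sketched as follows. This is a minimal illustration, not Clearview's actual code: the TenantAuth class and the choose_auth_mode function are hypothetical names, and raising an error when neither credential is present is an assumption. Only the field names come from the tenant profile.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TenantAuth:
    """Hypothetical stand-in for the auth-related tenant profile fields."""
    client_secret: Optional[str] = None
    cert_thumbprint: Optional[str] = None


def choose_auth_mode(profile: TenantAuth) -> str:
    """Prefer the certificate when a thumbprint is stored, else use the secret."""
    if profile.cert_thumbprint:
        return "certificate"
    if profile.client_secret:
        return "secret"
    # Assumption: a profile without any credential is rejected up front.
    raise ValueError("tenant profile has neither a certificate nor a client secret")
```

A profile that carries both a thumbprint and a secret resolves to the certificate path, which matches the recommendation above.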
TenantProfile authentication fields:
| Field | Description |
|---|---|
| client_secret | Azure client secret (optional when a certificate is available) |
| cert_private_key | PEM-encoded private key (internal, never exposed via API) |
| cert_public_pem | PEM-encoded public certificate (used to build a PFX for Exchange Online PowerShell) |
| cert_thumbprint | SHA-1 thumbprint (used by MSAL) |
| cert_expires_at | Certificate expiry date |
Scan Processing Model
Scans run asynchronously through a DB-backed job queue:
- User selects a tenant profile (or enters manual credentials) and submits URLs or a CSV.
- API validates and normalizes URLs.
- Default sites are skipped by rule (tenant root and app catalog).
- A scan job is queued in PostgreSQL, linked to the tenant profile when applicable.
- Background worker processes targets with retries and per-target timeout.
- API/UI expose progress and deviations per job.
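The normalization and default-site steps could look roughly like the sketch below. The function names are hypothetical, the exact normalization rules (lower-casing the host, dropping trailing slashes, query, and fragment) are assumptions, and the app catalog is assumed to live at the conventional /sites/appcatalog path, which a tenant may have changed.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the app catalog uses the conventional path.
APP_CATALOG_PATH = "/sites/appcatalog"


def normalize_site_url(raw: str) -> str:
    """Lower-case scheme and host, drop trailing slash, query, and fragment."""
    parts = urlsplit(raw.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def is_default_site(url: str) -> bool:
    """True for the tenant root (empty path) and the app catalog."""
    path = urlsplit(url).path.rstrip("/").lower()
    return path == "" or path == APP_CATALOG_PATH


def filter_targets(raw_urls):
    """Normalize every URL and keep only the non-default sites."""
    normalized = (normalize_site_url(u) for u in raw_urls)
    return [u for u in normalized if not is_default_site(u)]
```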
Connection Preflight
Before the full scan of a target runs, the worker performs a lightweight probe to verify that the configured credentials can actually reach the site and read role assignments. This catches the common setup errors (missing admin consent, certificate not yet uploaded to Azure, wrong tenant/client ID) early and with a clear message, instead of producing a silent 401 during the full scan.
The probe issues two calls:
- GET /_api/web?$select=Title validates the token, tenant, and site URL.
- GET /_api/web/roleassignments?$top=1&$select=PrincipalId validates that the app actually has permission to read role assignments (not only basic read).
The result is persisted per target in last_probe_at, last_probe_ok, and last_probe_message. If the probe fails, the target is marked failed with error_message = "Preflight: <hint>" and the full scan is skipped. Hints interpret common HTTP codes:
| Code | Hint |
|---|---|
| 401 on /_api/web | Certificate not uploaded in Azure, or wrong tenant/client ID |
| 401 on /roleassignments | Admin consent missing, or granted permission too low |
| 403 | App has no access to this site (e.g. Sites.Selected without a per-site grant) |
| 404 | Site not found |
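The hint table translates naturally into a small lookup. The sketch below is illustrative only: the function names and the endpoint labels "web" and "roleassignments" are hypothetical, and the fallback text for unlisted status codes is an assumption.

```python
def preflight_hint(status: int, endpoint: str) -> str:
    """Map a probe HTTP status to a hint; endpoint is "web" or "roleassignments"."""
    if status == 401:
        if endpoint == "roleassignments":
            return "Admin consent missing, or granted permission too low"
        return "Certificate not uploaded in Azure, or wrong tenant/client ID"
    if status == 403:
        return "App has no access to this site (e.g. Sites.Selected without a per-site grant)"
    if status == 404:
        return "Site not found"
    # Assumption: codes outside the table get a generic message.
    return f"Unexpected HTTP {status}"


def preflight_error_message(status: int, endpoint: str) -> str:
    """Build the value stored in error_message for a failed probe."""
    return f"Preflight: {preflight_hint(status, endpoint)}"
```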
The same probe is exposed as an on-demand Test connection action on each target in the Job Details UI (see API Endpoints below). The action is blocked while the job is still queued or running.
Timeout and Retry Controls
Configured through environment variables (defaults shown):
| Variable | Default | Description |
|---|---|---|
| SCAN_TARGET_TIMEOUT_SEC | 3600 | Max seconds per target before it is marked failed |
| SCAN_TARGET_MAX_RETRIES | 2 | Number of retries on transient failure |
| SCAN_RETRY_BASE_DELAY_SEC | 2 | Base delay for exponential back-off between retries |
| SCAN_JOB_POLL_INTERVAL_SEC | 3 | How often the worker polls for new queued jobs |
| SCAN_HTTP_TIMEOUT_SEC | 30 | Per-request HTTP timeout toward SharePoint |
| SCAN_HTTP_MAX_RETRIES | 3 | Retries on HTTP 429/503 or connection errors |
| SCAN_LIST_PAGE_SIZE | 200 | Items per page when listing library contents |
| SCAN_MAX_ITEMS_PER_LIST | 10000 | Cap on items with unique permissions per library |
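As an illustration of how the retry settings interact, the sketch below reads two of the variables and derives the wait before each retry. The helper names are hypothetical, and the doubling formula (base, 2 × base, 4 × base, ...) is an assumption about what "exponential back-off" means here; the worker may add jitter or a cap.

```python
import os


def env_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to the default."""
    return int(os.environ.get(name, default))


def retry_delays(max_retries: int, base_delay_sec: float) -> list:
    """Delay before each retry, doubling every time (assumed formula)."""
    return [base_delay_sec * (2 ** attempt) for attempt in range(max_retries)]


def load_retry_delays() -> list:
    """Combine the two target-level retry variables with their documented defaults."""
    return retry_delays(
        env_int("SCAN_TARGET_MAX_RETRIES", 2),
        env_int("SCAN_RETRY_BASE_DELAY_SEC", 2),
    )
```

Under this assumed formula, the documented defaults give waits of 2 and 4 seconds before the first and second retry.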
Deviation Detection
The scanner retrieves SharePoint REST role assignments at four levels:
- Site root
- Document library
- Folder
- File
Only permissions added relative to the site root are stored as deviations (delta_type=added).
No filesystem/NTFS permission model is used.
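Conceptually, the delta is a set difference between an object's role assignments and those of the site root. The sketch below assumes each assignment can be reduced to a (principal, role) pair; the function name and the returned record layout are hypothetical.

```python
from typing import List, Set, Tuple

Assignment = Tuple[str, str]  # (principal, role name)


def added_deviations(root: Set[Assignment], obj: Set[Assignment], object_url: str) -> List[dict]:
    """Return one record per assignment present on the object but not on the root."""
    return [
        {
            "object_url": object_url,
            "principal": principal,
            "role_name": role,
            "delta_type": "added",
        }
        for principal, role in sorted(obj - root)
    ]
```

Assignments that exist on the root but are missing on the object produce no record, in line with the rule that only added permissions are stored.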
Hierarchical Deduplication
After all deviations for a target are collected they are post-processed: if a (principal, role) deviation is already reported at a parent URL (library or folder), the same deviation on child items is suppressed. This prevents an explosion of results when a single folder grant propagates to thousands of files.
Deduplication is pure in-memory post-processing — no additional API calls are made.
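One way to implement this suppression is to visit deviations from shortest to longest URL and drop any whose (principal, role) pair was already kept on an ancestor path. The sketch below is an assumption about the approach, not the code in sharepoint.py; the Deviation class is a simplified, hypothetical stand-in for DeviationRecord.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Deviation:
    """Simplified, hypothetical stand-in for a deviation record."""
    object_url: str
    principal: str
    role_name: str


def dedupe_hierarchical(deviations: List[Deviation]) -> List[Deviation]:
    """Suppress deviations already reported on a parent library or folder."""
    kept: List[Deviation] = []
    kept_urls: Dict[Tuple[str, str], List[str]] = {}
    # Shorter URLs first, so parents are always processed before their children.
    for dev in sorted(deviations, key=lambda d: len(d.object_url.rstrip("/"))):
        url = dev.object_url.rstrip("/")
        parents = kept_urls.setdefault((dev.principal, dev.role_name), [])
        if any(url.startswith(parent + "/") for parent in parents):
            continue  # same grant is already reported higher up
        parents.append(url)
        kept.append(dev)
    return kept
```

Matching on parent + "/" rather than a bare prefix keeps a sibling such as /Docs/Folder2 from being mistaken for a child of /Docs/Folder.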
Role Name Normalisation
SharePoint returns role names in the language configured for the tenant. Clearview normalises common Dutch role names to their English equivalents before storing them:
| Dutch | English |
|---|---|
| Volledig beheer | Full Control |
| Bijdragen | Contribute |
| Lezen | Read |
| Bewerken | Edit |
| Ontwerpen | Design |
| Beperkte toegang | Limited Access |
| Goedkeuren | Approve |
| Hiërarchieën beheren | Manage Hierarchy |
| Weergeven alleen | View Only |
| Beperkt lezen | Restricted Read |
Unknown role names are stored as-is.
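The table maps directly onto a dictionary lookup with a pass-through default. In this sketch the names ROLE_NAME_MAP and normalise_role are hypothetical, and exact, case-sensitive matching is an assumption.

```python
# Dutch role names and their English equivalents, as listed in the table above.
ROLE_NAME_MAP = {
    "Volledig beheer": "Full Control",
    "Bijdragen": "Contribute",
    "Lezen": "Read",
    "Bewerken": "Edit",
    "Ontwerpen": "Design",
    "Beperkte toegang": "Limited Access",
    "Goedkeuren": "Approve",
    "Hiërarchieën beheren": "Manage Hierarchy",
    "Weergeven alleen": "View Only",
    "Beperkt lezen": "Restricted Read",
}


def normalise_role(name: str) -> str:
    """Translate a known Dutch role name; return anything else unchanged."""
    return ROLE_NAME_MAP.get(name, name)
```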
SharingLinks
SharePoint creates internal groups named SharingLinks.{guid}.{LinkType}.{guid} whenever a user shares a file or folder via a sharing link. Clearview detects these and classifies them by risk:
| Link type | Risk | UI colour |
|---|---|---|
| Anonymous* | Critical | Red |
| Flexible | High | Orange |
| Organization* | Low | Blue |
| Direct* | Low | Green |
Resolve Sharing Links — after a scan completes, the Job Details panel shows a Resolve Sharing Links section listing all SharingLinks types found in the job. The user selects which types to resolve and clicks Resolve. Clearview calls /_api/web/sitegroups/getbyname('{name}')/users for each unique group using the job's stored credentials and writes the member list to permission_deviations.resolved_members. Anonymous links have no resolvable members; their resolved_members field is stored as an empty string, displayed as (public link) in the UI.
Anonymous and Flexible types are pre-selected by default. Organization and Direct types are available but unchecked by default.
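Detection and classification of these groups can be sketched with a regular expression over the group name. The pattern, the function name, and the "Unknown" fallback for unlisted link types are assumptions; the risk levels and the pre-selection rule come from the text above.

```python
import re
from typing import Optional, Tuple

# Assumed shape: SharingLinks.<guid>.<LinkType>.<guid>
_SHARING_LINK = re.compile(
    r"^SharingLinks\.[0-9a-fA-F-]{36}\.(?P<link_type>\w+)\.[0-9a-fA-F-]{36}$"
)

# Prefix match mirrors the trailing * in the table (e.g. AnonymousEdit).
RISK_BY_PREFIX = [
    ("Anonymous", "Critical"),
    ("Flexible", "High"),
    ("Organization", "Low"),
    ("Direct", "Low"),
]

PRESELECTED_PREFIXES = ("Anonymous", "Flexible")


def classify_sharing_link(group_name: str) -> Optional[Tuple[str, str]]:
    """Return (link_type, risk) for a SharingLinks group, or None for other principals."""
    match = _SHARING_LINK.match(group_name)
    if match is None:
        return None
    link_type = match.group("link_type")
    for prefix, risk in RISK_BY_PREFIX:
        if link_type.startswith(prefix):
            return link_type, risk
    return link_type, "Unknown"  # assumption: unlisted types are not ranked


def is_preselected(link_type: str) -> bool:
    """Anonymous and Flexible types are ticked by default in the resolve dialog."""
    return link_type.startswith(PRESELECTED_PREFIXES)
```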
API Endpoints — Scan Jobs
GET /api/scan-jobs List jobs (optional ?tenant_profile_id=)
POST /api/scan-jobs Create job from URLs
POST /api/scan-jobs/import-csv Create job from CSV upload
GET /api/scan-jobs/{id} Get job detail (targets + deviations)
POST /api/scan-jobs/{id}/cancel Cancel a queued or running job
DELETE /api/scan-jobs/{id} Delete a completed job and all its data
POST /api/scan-jobs/{id}/resolve-sharing-links Resolve SharingLinks group members post-scan
POST /api/scan-jobs/{id}/targets/{tid}/test-connection Re-run the connection preflight for one target
GET /api/scan-jobs/{id}/export Download deviations as .xlsx (optional ?site_url=)
Job Details UI
The Selected Job Details panel provides:
- Site filter: dropdown populated from the job's targets; filters both the Targets and Deviations tables client-side without a new API call.
- Export Excel: downloads a .xlsx with two sheets:
  - Targets: URL, status, attempts, error, timestamps
  - Deviations: Site URL, Object URL (relative to site), Object Type, Principal, Link Risk (colour-coded), Resolved Members, Role, Delta; sorted by Site URL → Object URL → Principal
- Resolve Sharing Links: see SharingLinks section above.
CSV Import
Expected input is Microsoft Sites export format.
- URL column is auto-detected (URL / Site URL / SiteUrl).
- UTF-8 BOM is supported.
- Duplicate URLs are de-duplicated.
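The three rules above can be sketched with the standard csv module. This is not the code in csv_import.py: the function name is hypothetical, and matching the header names case-insensitively is an assumption for the SharePoint import.

```python
import csv
import io
from typing import List

# Accepted header names, compared case-insensitively (assumption).
URL_COLUMNS = ("url", "site url", "siteurl")


def extract_site_urls(raw: bytes) -> List[str]:
    """Return unique URLs from a Sites export, in first-seen order."""
    # utf-8-sig transparently strips a leading UTF-8 BOM if one is present.
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
    headers = reader.fieldnames or []
    column = next((h for h in headers if h.strip().lower() in URL_COLUMNS), None)
    if column is None:
        raise ValueError("no URL column found in CSV header")
    seen = set()
    urls = []
    for row in reader:
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```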
Data Model
Main tables:
| Table | Key columns |
|---|---|
| tenant_profiles | credentials, cert_private_key, cert_public_pem, cert_thumbprint, cert_expires_at |
| scan_jobs | status, scan_type (sharepoint/mailbox), tenant_profile_id, progress counters, auth credentials |
| scan_targets | job_id, site_url (holds UPN for mailbox jobs), status, attempts, error_message, last_probe_at, last_probe_ok, last_probe_message |
| permission_deviations | job_id, site_url, object_url, object_type, principal, role_name, delta_type, permission_type, resolved_members |
Scan jobs, targets, and deviations are cascade-deleted when a job is removed via DELETE /api/scan-jobs/{id}. Jobs with status queued or running cannot be deleted.
Schema migrations for new columns are applied automatically on startup via _ensure_schema_columns() in main.py.
Mailbox Scanning
Mailbox scans use Exchange Online PowerShell with certificate-based app-only auth.
What is collected
| Permission | PowerShell source | permission_type value |
|---|---|---|
| Full Access (and other mailbox-level rights) | Get-MailboxPermission | FullAccess |
| Send As | Get-RecipientPermission (AccessControlType=Allow) | SendAs |
| Send on Behalf | mailbox property GrantSendOnBehalfTo | SendOnBehalf |
| Folder delegation: Calendar | Get-MailboxFolderPermission "<upn>:\Calendar" | Folder:Calendar |
| Folder delegation: Inbox | Get-MailboxFolderPermission "<upn>:\Inbox" | Folder:Inbox |
The scanner filters out NT AUTHORITY\SELF, S-1-5-* SIDs, inherited mailbox permissions, and the default folder principals (Default, Anonymous with None rights). What remains is stored as deviations on the job — there is no SharePoint-style root baseline; every non-default principal counts.
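The filter can be pictured as a predicate applied to each permission entry returned by PowerShell. The sketch below is one reading of the rules, with a hypothetical function name and argument shape; in particular it assumes that Default and Anonymous are dropped only when their rights are None.

```python
from typing import List


def is_default_entry(principal: str, rights: List[str], is_inherited: bool = False) -> bool:
    """True when a mailbox or folder permission entry should not count as a deviation."""
    name = principal.strip()
    upper = name.upper()
    if upper == r"NT AUTHORITY\SELF":
        return True
    if upper.startswith("S-1-5-"):
        return True  # unresolved SID
    if is_inherited:
        return True  # inherited mailbox permission
    # Assumption: default folder principals are only noise when they grant nothing.
    if name in ("Default", "Anonymous") and [r.lower() for r in rights] == ["none"]:
        return True
    return False
```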
Authentication
Mailbox scanning uses the same tenant certificate as SharePoint, but Exchange Online requires a .pfx rather than a thumbprint + raw private key. At scan time Clearview builds an in-memory PFX from cert_private_key + cert_public_pem (random password), writes it to a tempdir, and removes it immediately after the pwsh process exits.
Targets
Three ways to seed a mailbox scan job:
- Manual UPNs: paste one UPN per line.
- CSV import: column UserPrincipalName / Email / Mailbox / Primary SMTP Address (auto-detected, case-insensitive).
- All mailboxes in tenant: Clearview enumerates every mailbox via Get-EXOMailbox -ResultSize Unlimited and queues one target per mailbox. Requires the tenant's primary domain (e.g. contoso.onmicrosoft.com) so Connect-ExchangeOnline -Organization can authenticate. Capped at 50000 mailboxes per job.
Required Azure permissions
In addition to the SharePoint setup the scan app needs:
- API permission: Office 365 Exchange Online → Application permissions → Exchange.ManageAsApp (admin-consented).
- Entra role assigned to the app's service principal: Exchange Administrator (cannot be granted via Microsoft Graph; must be assigned in Azure Portal → Entra ID → Roles and administrators).
Runtime requirements
The container image installs:
- PowerShell 7 (pwsh) from the official Microsoft package repo.
- ExchangeOnlineManagement module from PSGallery (Install-Module -Scope AllUsers).
Together these add roughly 150 MB to the image. Without them, mailbox probes return "pwsh not available in runtime" and scans fail.
Probe
Mailbox preflight runs probe.ps1 which connects to Exchange Online and calls Get-EXOMailbox -Identity <upn> -PropertySets Minimum. Failure hints map common errors:
| Error fragment | Hint |
|---|---|
| Unauthorized / 401 / AADSTS* | Check Exchange.ManageAsApp permission, admin consent, and the Exchange Administrator role assignment |
| Couldn't find object / not found | Mailbox does not exist in this tenant |
| module not available | ExchangeOnlineManagement PS module missing in the container |
Build and Release
Run ./build-and-push.sh from the repo root. The script is sourced from the shared script in /docker/develop/shared-integrations/tooling/docker-build-and-push/.
- ./build-and-push.sh t: test build, pushes the :dev tag only.
- ./build-and-push.sh r: release build, parses the version from docs/changelog.md (first ## vX.Y.Z heading), pushes :<version>, :dev, and :latest.
The script performs no git operations. After a successful release, run the git commit / git tag / git push --tags commands the script prints in its summary.
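The version lookup that the release build performs can be illustrated in Python, even though the real script is a shell script. The function names are hypothetical, and whether the pushed tag keeps the leading "v" is not stated in this document; the sketch assumes it is dropped.

```python
import re
from typing import List, Optional


def first_changelog_version(changelog_text: str) -> Optional[str]:
    """Return X.Y.Z from the first '## vX.Y.Z' heading, or None if there is none."""
    match = re.search(r"^## v(\d+\.\d+\.\d+)\b", changelog_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def release_tags(version: str) -> List[str]:
    """Tags pushed by a release build (assumes the tag omits the 'v' prefix)."""
    return [f":{version}", ":dev", ":latest"]
```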
Current Scan Mode
SHAREPOINT_SCAN_MODE=sharepoint_app_only is active by default.
Azure app-only credentials are resolved per scan job from the linked tenant profile, or from the raw credentials submitted with the job when no profile is used.
Entra App Registration — two modes
The UI automatically detects which mode is active via GET /api/onboarding/status.
The onboarding flow is accessed from the Add Tenant form in the Tenants panel.
Mode A — Automated (platform app configured)
Requires a pre-registered Clearview platform app in Azure AD with permission to create
apps in customer tenants (Application.ReadWrite.All on Microsoft Graph).
Set the following in stack/.env:
ONBOARDING_CLIENT_ID=<platform-app-client-id>
ONBOARDING_CLIENT_SECRET=<platform-app-client-secret>
ONBOARDING_REDIRECT_URI=https://<your-clearview-domain>/api/onboarding/microsoft/callback
Flow per customer tenant:
- Click Add Tenant in the UI and then Connect Microsoft.
- Approve admin consent in the customer's Microsoft tenant.
- UI receives tenant context from the OAuth callback and pre-fills the tenant ID.
- Click Create Scan App Automatically to create a tenant-local scan app via Graph API.
- Clearview assigns SharePoint Sites.FullControl.All and generates a client secret.
- Enter a name and click Save Tenant to store the profile.
Mode B — Manual (no platform app configured)
When ONBOARDING_* env vars are empty the UI shows step-by-step instructions to create
the scan app manually per customer tenant:
- Azure Portal → Entra ID → App registrations → New registration (Single tenant).
- Copy Directory (tenant) ID and Application (client) ID from the Overview page.
- API permissions → Add → SharePoint → Application permissions → Sites.FullControl.All → Grant admin consent.
- Certificates & secrets → New client secret → copy the value (shown once).
- Enter the details in Add Tenant and click Save Tenant.