Refactor scanner into modular package and add AlertHub-style frontend

- Split scanner.py into scanners/ package (entra, mailbox, sharepoint, common)
- Add Exchange Online PowerShell probe scripts under scanners/exo_scripts
- Frontend overhaul: AlertHub-style sidebar layout, dark logo asset, expanded app.js/index.html/styles.css
- Backend updates across main.py, worker.py, models.py, schemas.py, csv_import.py
- Update Dockerfile and build-and-push.sh
- Update TECHNICAL.md, changelog-develop.md, add summary changelog.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ivo Oskamp 2026-05-06 13:49:04 +02:00
parent bccc39b185
commit e304b2b3d4
20 changed files with 3709 additions and 963 deletions

@ -1,52 +1,56 @@
#!/usr/bin/env bash #!/usr/bin/env bash
set -euo pipefail set -euo pipefail
# ============================================================================
# build-and-push.sh
#
# Purpose:
# - Build & push Docker images for each service under ./containers/*
# - Two modes:
# t (test) = only push :dev
# r (release) = push :<version>, :dev, :latest
# version is read from the top of changelog.md
#
# No git operations: committing and tagging is done manually.
#
# Usage:
# ./build-and-push.sh [mode]
# - mode = t -> test build, push :dev only
# - mode = r -> release build, version taken from changelog.md
# - omitted -> prompt (default: t)
#
# Requirements:
# - docs/changelog.md (relative to repo root), with the most recent release
# at the top as:
# ## vX.Y.Z — YYYY-MM-DD
# (the version is parsed from the first such line)
# - One Dockerfile per service under ./containers/<service>/Dockerfile
# ============================================================================
DOCKER_REGISTRY="gitea.oskamp.info" DOCKER_REGISTRY="gitea.oskamp.info"
DOCKER_NAMESPACE="ivooskamp" DOCKER_NAMESPACE="ivooskamp"
VERSION_FILE="version.txt" CHANGELOG_FILE="docs/changelog.md"
START_VERSION="v0.1.0"
CONTAINERS_DIR="containers" CONTAINERS_DIR="containers"
LAST_BRANCH_FILE=".last-branch"
BUMP="${1:-}" # --- Input: prompt if missing ------------------------------------------------
if [[ -z "${BUMP}" ]]; then MODE="${1:-}"
echo "Select bump type: [1] patch, [2] minor, [3] major, [t] test (default: t)" if [[ -z "${MODE}" ]]; then
read -r BUMP echo "Select build type: [t] test build (push :dev only), [r] release build (default: t)"
BUMP="${BUMP:-t}" read -r MODE
MODE="${MODE:-t}"
fi fi
if [[ "$BUMP" != "1" && "$BUMP" != "2" && "$BUMP" != "3" && "$BUMP" != "t" ]]; then case "$MODE" in
echo "[ERROR] Unknown bump type '$BUMP' (use 1, 2, 3, or t)." t|test) MODE="t" ;;
r|release) MODE="r" ;;
*)
echo "[ERROR] Unknown mode '$MODE' (use 't' for test or 'r' for release)."
exit 1 exit 1
fi ;;
read_version() {
if [[ -f "$VERSION_FILE" ]]; then
tr -d ' \t\n\r' < "$VERSION_FILE"
else
echo "$START_VERSION"
fi
}
write_version() {
echo "$1" > "$VERSION_FILE"
}
bump_version() {
local cur="$1"
local kind="$2"
local core="${cur#v}"
IFS='.' read -r MA MI PA <<< "$core"
case "$kind" in
1) PA=$((PA + 1));;
2) MI=$((MI + 1)); PA=0;;
3) MA=$((MA + 1)); MI=0; PA=0;;
*) echo "[ERROR] Unknown bump kind"; exit 1;;
esac esac
echo "v${MA}.${MI}.${PA}"
}
# --- Helpers -----------------------------------------------------------------
check_docker_ready() { check_docker_ready() {
if ! docker info >/dev/null 2>&1; then if ! docker info >/dev/null 2>&1; then
echo "[ERROR] Docker daemon not reachable. Is Docker running and do you have permission to use it?" echo "[ERROR] Docker daemon not reachable. Is Docker running and do you have permission to use it?"
@ -70,7 +74,7 @@ validate_repo_component() {
local comp="$1" local comp="$1"
if [[ ! "$comp" =~ ^[a-z0-9]+([._-][a-z0-9]+)*$ ]]; then if [[ ! "$comp" =~ ^[a-z0-9]+([._-][a-z0-9]+)*$ ]]; then
echo "[ERROR] Invalid repository component '$comp'." echo "[ERROR] Invalid repository component '$comp'."
echo " Must match: ^[a-z0-9]+([._-][a-z0-9]+)*$" echo " Must match: ^[a-z0-9]+([._-][a-z0-9]+)*$ (lowercase, digits, ., _, - as separators)."
return 1 return 1
fi fi
} }
@ -88,11 +92,33 @@ validate_tag() {
fi fi
} }
if [[ ! -d ".git" ]]; then # Parse the first "## vX.Y.Z ..." heading from changelog.md.
echo "[ERROR] Not a git repository (.git missing)." # Accepts: ## v1.0.3 — 2026-04-24
# ## v1.0.3 - 2026-04-24
# ## v1.0.3
read_version_from_changelog() {
if [[ ! -f "$CHANGELOG_FILE" ]]; then
echo "[ERROR] $CHANGELOG_FILE not found in $(pwd)." >&2
exit 1 exit 1
fi fi
local line
# Match lines starting with "## v<digits>.<digits>.<digits>"
line="$(grep -m1 -E '^##[[:space:]]+v[0-9]+\.[0-9]+\.[0-9]+' "$CHANGELOG_FILE" || true)"
if [[ -z "$line" ]]; then
echo "[ERROR] No release heading found in $CHANGELOG_FILE (expected e.g. '## v1.0.3 — 2026-04-24' near the top)." >&2
exit 1
fi
# Extract the vX.Y.Z token
local version
version="$(echo "$line" | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"
if [[ -z "$version" ]]; then
echo "[ERROR] Could not parse version from line: $line" >&2
exit 1
fi
echo "$version"
}
# --- Preflight ---------------------------------------------------------------
if [[ ! -d "$CONTAINERS_DIR" ]]; then if [[ ! -d "$CONTAINERS_DIR" ]]; then
echo "[ERROR] '$CONTAINERS_DIR' directory missing. Expected ./${CONTAINERS_DIR}/<service>/ with a Dockerfile." echo "[ERROR] '$CONTAINERS_DIR' directory missing. Expected ./${CONTAINERS_DIR}/<service>/ with a Dockerfile."
exit 1 exit 1
@ -102,59 +128,40 @@ check_docker_ready
ensure_registry_login ensure_registry_login
validate_repo_component "$DOCKER_NAMESPACE" validate_repo_component "$DOCKER_NAMESPACE"
DETECTED_BRANCH="$(git branch --show-current 2>/dev/null || true)" # Informational: show branch and HEAD if this happens to be a git repo.
if [[ -z "$DETECTED_BRANCH" ]]; then BRANCH_INFO=""
DETECTED_BRANCH="$(git symbolic-ref --quiet --short HEAD 2>/dev/null || true)" HEAD_INFO=""
fi if [[ -d ".git" ]]; then
if [[ -z "$DETECTED_BRANCH" ]]; then BRANCH_INFO="$(git branch --show-current 2>/dev/null || echo unknown)"
DETECTED_BRANCH="main" HEAD_INFO="$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"
fi
UPSTREAM_REF="$(git rev-parse --abbrev-ref --symbolic-full-name @{u} 2>/dev/null || echo "origin/$DETECTED_BRANCH")"
HEAD_SHA="$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")"
LAST_BRANCH_FILE_PATH="$(pwd)/$LAST_BRANCH_FILE"
echo "[INFO] Repo: $(pwd)" echo "[INFO] Repo: $(pwd)"
echo "[INFO] Current branch: $DETECTED_BRANCH" echo "[INFO] Current branch: $BRANCH_INFO"
echo "[INFO] Upstream: $UPSTREAM_REF" echo "[INFO] HEAD (sha): $HEAD_INFO"
echo "[INFO] HEAD (sha): $HEAD_SHA"
CURRENT_VERSION="$(read_version)"
NEW_VERSION="$CURRENT_VERSION"
DO_TAG_AND_BUMP=true
if [[ "$BUMP" == "t" ]]; then
echo "[INFO] Test build: keeping version $CURRENT_VERSION; will only update :dev."
DO_TAG_AND_BUMP=false
else else
NEW_VERSION="$(bump_version "$CURRENT_VERSION" "$BUMP")" echo "[INFO] Repo: $(pwd) (not a git checkout)"
echo "[INFO] New version: $NEW_VERSION"
fi fi
if $DO_TAG_AND_BUMP; then # --- Determine version (release only) ----------------------------------------
validate_tag "$NEW_VERSION" VERSION=""
if [[ "$MODE" == "r" ]]; then
VERSION="$(read_version_from_changelog)"
echo "[INFO] Release version (from $CHANGELOG_FILE): $VERSION"
validate_tag "$VERSION"
validate_tag "latest" validate_tag "latest"
# Ask for confirmation so you never accidentally re-push an old version or a wrong one.
read -r -p "Proceed building & pushing as ${VERSION}? [y/N] " CONFIRM
CONFIRM="${CONFIRM:-N}"
if [[ ! "$CONFIRM" =~ ^[Yy]$ ]]; then
echo "[INFO] Aborted by user."
exit 0
fi
else
echo "[INFO] Test build: only :dev will be pushed."
fi fi
validate_tag "dev" validate_tag "dev"
if $DO_TAG_AND_BUMP; then # --- Build & push per service ------------------------------------------------
echo "[INFO] Writing $NEW_VERSION to $VERSION_FILE"
write_version "$NEW_VERSION"
echo "[INFO] Git add + commit (branch: $DETECTED_BRANCH)"
git add "$VERSION_FILE"
git commit -m "Release $NEW_VERSION on branch $DETECTED_BRANCH (bump type $BUMP)"
echo "[INFO] Git tag $NEW_VERSION"
git tag -a "$NEW_VERSION" -m "Release $NEW_VERSION"
echo "[INFO] Git push + tags"
git push origin "$DETECTED_BRANCH"
git push --tags
else
echo "[INFO] Skipping commit/tagging (test build)."
fi
shopt -s nullglob shopt -s nullglob
services=( "$CONTAINERS_DIR"/* ) services=( "$CONTAINERS_DIR"/* )
if [[ ${#services[@]} -eq 0 ]]; then if [[ ${#services[@]} -eq 0 ]]; then
@ -178,21 +185,21 @@ for svc_path in "${services[@]}"; do
IMAGE_BASE="${DOCKER_REGISTRY}/${DOCKER_NAMESPACE}/${svc}" IMAGE_BASE="${DOCKER_REGISTRY}/${DOCKER_NAMESPACE}/${svc}"
if $DO_TAG_AND_BUMP; then if [[ "$MODE" == "r" ]]; then
echo "============================================================" echo "============================================================"
echo "[INFO] Building ${svc} -> tags: ${NEW_VERSION}, dev, latest" echo "[INFO] Building ${svc} -> tags: ${VERSION}, dev, latest"
echo "============================================================" echo "============================================================"
docker build \ docker build \
-t "${IMAGE_BASE}:${NEW_VERSION}" \ -t "${IMAGE_BASE}:${VERSION}" \
-t "${IMAGE_BASE}:dev" \ -t "${IMAGE_BASE}:dev" \
-t "${IMAGE_BASE}:latest" \ -t "${IMAGE_BASE}:latest" \
"$svc_path" "$svc_path"
docker push "${IMAGE_BASE}:${NEW_VERSION}" docker push "${IMAGE_BASE}:${VERSION}"
docker push "${IMAGE_BASE}:dev" docker push "${IMAGE_BASE}:dev"
docker push "${IMAGE_BASE}:latest" docker push "${IMAGE_BASE}:latest"
BUILT_IMAGES+=("${IMAGE_BASE}:${NEW_VERSION}" "${IMAGE_BASE}:dev" "${IMAGE_BASE}:latest") BUILT_IMAGES+=("${IMAGE_BASE}:${VERSION}" "${IMAGE_BASE}:dev" "${IMAGE_BASE}:latest")
else else
echo "============================================================" echo "============================================================"
echo "[INFO] Test build ${svc} -> tag: dev" echo "[INFO] Test build ${svc} -> tag: dev"
@ -203,18 +210,27 @@ for svc_path in "${services[@]}"; do
fi fi
done done
echo "$DETECTED_BRANCH" > "$LAST_BRANCH_FILE_PATH" # --- Summary -----------------------------------------------------------------
echo "" echo ""
echo "============================================================" echo "============================================================"
echo "[SUMMARY] Build & push complete (branch: $DETECTED_BRANCH)" if [[ "$MODE" == "r" ]]; then
if $DO_TAG_AND_BUMP; then echo "[SUMMARY] Release build & push complete: $VERSION"
echo "[INFO] Release version: $NEW_VERSION"
else else
echo "[INFO] Test build (no version bump)" echo "[SUMMARY] Test build & push complete (:dev only)"
fi
if [[ -n "$BRANCH_INFO" ]]; then
echo "[INFO] Branch: $BRANCH_INFO HEAD: $HEAD_INFO"
fi fi
echo "[INFO] Images pushed:" echo "[INFO] Images pushed:"
for img in "${BUILT_IMAGES[@]}"; do for img in "${BUILT_IMAGES[@]}"; do
echo " - $img" echo " - $img"
done done
echo "============================================================" echo "============================================================"
echo ""
echo "[REMINDER] No git operations were performed. If this was a release,"
echo " commit and tag manually, e.g.:"
if [[ "$MODE" == "r" ]]; then
echo " git add -A && git commit -m \"Release ${VERSION}\""
echo " git tag -a ${VERSION} -m \"Release ${VERSION}\""
echo " git push && git push --tags"
fi
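The release path above depends on two small rules: normalizing the `t`/`r` mode argument, and taking the version from the first `## vX.Y.Z` heading in `docs/changelog.md`. The sketch below restates both in JavaScript so they can be checked without Docker or a registry. It is an illustration only, not part of the script, and the names `normalizeMode`, `readVersionFromChangelog` and `tagsFor` are hypothetical. The heading regex uses `[ \t]` rather than `\s` because `grep` matches one line at a time, whereas `\s` in a multiline JavaScript regex could also match a newline.

```javascript
// Sketch of the build-and-push.sh release rules; not part of the script.
// All function names here are hypothetical.

// First line starting with "## v<major>.<minor>.<patch>".
// [ \t] instead of \s so a match can never span a line break (grep is line-based).
const RELEASE_HEADING = /^##[ \t]+v\d+\.\d+\.\d+/m;
const VERSION_TOKEN = /v\d+\.\d+\.\d+/;

function readVersionFromChangelog(text) {
  const heading = text.match(RELEASE_HEADING);
  if (!heading) {
    throw new Error("No release heading found (expected e.g. '## v1.0.3 - 2026-04-24').");
  }
  // The matched heading always contains the vX.Y.Z token.
  return heading[0].match(VERSION_TOKEN)[0];
}

// Same spellings as the script's `case "$MODE" in`; an empty answer defaults to test,
// as it does at the script's prompt.
function normalizeMode(input) {
  const mode = input || 't';
  if (mode === 't' || mode === 'test') return 't';
  if (mode === 'r' || mode === 'release') return 'r';
  throw new Error("Unknown mode '" + mode + "' (use 't' for test or 'r' for release).");
}

// Tags pushed per service: release pushes version, dev and latest; test pushes dev only.
function tagsFor(mode, version) {
  return mode === 'r' ? [version, 'dev', 'latest'] : ['dev'];
}

const changelog = '# Changelog\n\n## v1.0.3 - 2026-04-24\n- fixes\n\n## v1.0.2\n';
const releaseVersion = readVersionFromChangelog(changelog);
console.log(releaseVersion, tagsFor(normalizeMode('release'), releaseVersion));
```

Because only the first matching heading counts, older entries further down the changelog never influence the pushed tag.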

@ -1,11 +1,33 @@
FROM python:3.12-slim FROM python:3.12-slim-bookworm
ENV PYTHONDONTWRITEBYTECODE=1 ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1 ENV PYTHONUNBUFFERED=1
ENV PYTHONPATH=/app/src ENV PYTHONPATH=/app/src
# Suppress PowerShell telemetry inside the container
ENV POWERSHELL_TELEMETRY_OPTOUT=1
ENV DOTNET_CLI_TELEMETRY_OPTOUT=1
WORKDIR /app WORKDIR /app
# ---------------------------------------------------------------------------
# PowerShell 7 + ExchangeOnlineManagement module
# Required for Exchange Online mailbox permission scanning.
# ---------------------------------------------------------------------------
RUN apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates curl \
&& curl -fsSL https://packages.microsoft.com/config/debian/12/packages-microsoft-prod.deb \
-o /tmp/packages-microsoft-prod.deb \
&& dpkg -i /tmp/packages-microsoft-prod.deb \
&& rm /tmp/packages-microsoft-prod.deb \
&& apt-get update \
&& apt-get install -y --no-install-recommends powershell \
&& pwsh -NoProfile -NonInteractive -Command \
"Set-PSRepository -Name PSGallery -InstallationPolicy Trusted; \
Install-Module -Name ExchangeOnlineManagement -Scope AllUsers -Force -AllowClobber" \
&& apt-get purge -y curl \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./requirements.txt COPY requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt RUN pip install --no-cache-dir -r requirements.txt

@ -4,6 +4,18 @@
selectedJobData: null, selectedJobData: null,
refreshTimer: null, refreshTimer: null,
tenants: [], tenants: [],
sharingLinkSelectionByJob: {},
currentRoute: 'dashboard',
};
const ROUTE_TITLES = {
'dashboard': 'Dashboard',
'jobs': 'Scan Jobs',
'scan-sharepoint': 'New SharePoint Scan',
'scan-mailbox': 'New Mailbox Scan',
'scan-entra': 'New Entra Group Scan',
'tenants': 'Tenants',
'settings': 'Settings',
}; };
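With the sidebar layout, `ROUTE_TITLES` maps each route key to the text shown in the content header. `navigateTo()` itself is not part of this excerpt, so the following is only a sketch of how such a lookup could be guarded, assuming that unknown routes should fall back to the dashboard. `resolveRoute` and `DEFAULT_ROUTE` are hypothetical names.

```javascript
// Copy of the route table from the diff.
const ROUTE_TITLES = {
  'dashboard': 'Dashboard',
  'jobs': 'Scan Jobs',
  'scan-sharepoint': 'New SharePoint Scan',
  'scan-mailbox': 'New Mailbox Scan',
  'scan-entra': 'New Entra Group Scan',
  'tenants': 'Tenants',
  'settings': 'Settings',
};
const DEFAULT_ROUTE = 'dashboard'; // assumption: fallback target for unknown routes

// Hypothetical helper: return a known route plus its header title.
// hasOwnProperty keeps inherited keys such as "toString" from counting as routes.
function resolveRoute(requested) {
  const known = Object.prototype.hasOwnProperty.call(ROUTE_TITLES, requested);
  const route = known ? requested : DEFAULT_ROUTE;
  return { route: route, title: ROUTE_TITLES[route] };
}
```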
const els = { const els = {
@ -24,6 +36,7 @@
scanAppDisplayName: document.getElementById('scanAppDisplayName'), scanAppDisplayName: document.getElementById('scanAppDisplayName'),
newTenantName: document.getElementById('newTenantName'), newTenantName: document.getElementById('newTenantName'),
newTenantTenantId: document.getElementById('newTenantTenantId'), newTenantTenantId: document.getElementById('newTenantTenantId'),
newTenantPrimaryDomain: document.getElementById('newTenantPrimaryDomain'),
newTenantClientId: document.getElementById('newTenantClientId'), newTenantClientId: document.getElementById('newTenantClientId'),
newTenantClientSecret: document.getElementById('newTenantClientSecret'), newTenantClientSecret: document.getElementById('newTenantClientSecret'),
saveTenantBtn: document.getElementById('saveTenantBtn'), saveTenantBtn: document.getElementById('saveTenantBtn'),
@ -43,11 +56,34 @@
csvFile: document.getElementById('csvFile'), csvFile: document.getElementById('csvFile'),
csvSkipDefaults: document.getElementById('csvSkipDefaults'), csvSkipDefaults: document.getElementById('csvSkipDefaults'),
submitFeedback: document.getElementById('submitFeedback'), submitFeedback: document.getElementById('submitFeedback'),
sharepointScanMode: document.getElementById('sharepointScanMode'),
// Mailbox scan panel
entraScanTenantSelect: document.getElementById('entraScanTenantSelect'),
manualEntraForm: document.getElementById('manualEntraForm'),
csvEntraForm: document.getElementById('csvEntraForm'),
allEntraForm: document.getElementById('allEntraForm'),
manualEntraIds: document.getElementById('manualEntraIds'),
csvEntraFile: document.getElementById('csvEntraFile'),
entraSubmitFeedback: document.getElementById('entraSubmitFeedback'),
mailboxScanTenantSelect: document.getElementById('mailboxScanTenantSelect'),
manualMailboxForm: document.getElementById('manualMailboxForm'),
csvMailboxForm: document.getElementById('csvMailboxForm'),
allMailboxesForm: document.getElementById('allMailboxesForm'),
allMailboxesOrg: document.getElementById('allMailboxesOrg'),
manualMailboxes: document.getElementById('manualMailboxes'),
csvMailboxFile: document.getElementById('csvMailboxFile'),
mailboxSubmitFeedback: document.getElementById('mailboxSubmitFeedback'),
// Jobs panel // Jobs panel
refreshJobsBtn: document.getElementById('refreshJobsBtn'), refreshJobsBtn: document.getElementById('refreshJobsBtn'),
jobTenantFilter: document.getElementById('jobTenantFilter'), jobTenantFilter: document.getElementById('jobTenantFilter'),
jobTypeFilter: document.getElementById('jobTypeFilter'),
jobsTableBody: document.getElementById('jobsTableBody'), jobsTableBody: document.getElementById('jobsTableBody'),
jobAutoRefresh: document.getElementById('jobAutoRefresh'), jobAutoRefresh: document.getElementById('jobAutoRefresh'),
// Sidebar / routing
contentTitle: document.getElementById('contentTitle'),
targetsTableHead: document.getElementById('targetsTableHead'),
targetsHeading: document.getElementById('targetsHeading'),
deviationsTableHead: document.getElementById('deviationsTableHead'),
// Job detail panel // Job detail panel
targetsTableBody: document.getElementById('targetsTableBody'), targetsTableBody: document.getElementById('targetsTableBody'),
deviationsTableBody: document.getElementById('deviationsTableBody'), deviationsTableBody: document.getElementById('deviationsTableBody'),
@ -60,6 +96,9 @@
sharingLinksTypes: document.getElementById('sharingLinksTypes'), sharingLinksTypes: document.getElementById('sharingLinksTypes'),
resolveSharingLinksBtn: document.getElementById('resolveSharingLinksBtn'), resolveSharingLinksBtn: document.getElementById('resolveSharingLinksBtn'),
resolveFeedback: document.getElementById('resolveFeedback'), resolveFeedback: document.getElementById('resolveFeedback'),
resolveGroupsBlock: document.getElementById('resolveGroupsBlock'),
resolveGroupsBtn: document.getElementById('resolveGroupsBtn'),
resolveGroupsFeedback: document.getElementById('resolveGroupsFeedback'),
// Hero stats // Hero stats
statTenants: document.getElementById('statTenants'), statTenants: document.getElementById('statTenants'),
statJobs: document.getElementById('statJobs'), statJobs: document.getElementById('statJobs'),
@ -97,6 +136,43 @@
return date.toLocaleString(); return date.toLocaleString();
} }
function renderProbeStatus(target) {
if (!target.last_probe_at) {
return '<span class="risk info">Not tested yet</span>';
}
const when = formatDate(target.last_probe_at);
const msg = target.last_probe_message || '';
if (target.last_probe_ok) {
return '<span class="risk ok" title="' + escHtml(msg) + '">OK</span> <span class="cell-members">' + escHtml(when) + '</span>';
}
return '<span class="risk critical" title="' + escHtml(msg) + '">Failed</span> <span class="cell-members">' + escHtml(when) + '</span><br><span class="cell-members">' + escHtml(msg) + '</span>';
}
async function testTargetConnection(targetId, button) {
if (!state.selectedJobId) return;
const originalLabel = button.textContent;
button.disabled = true;
button.textContent = 'Testing…';
try {
const resp = await fetch(
'/api/scan-jobs/' + encodeURIComponent(state.selectedJobId) +
'/targets/' + encodeURIComponent(targetId) + '/test-connection',
{ method: 'POST' }
);
if (!resp.ok) {
const body = await resp.text();
throw new Error('HTTP ' + resp.status + ': ' + body);
}
await resp.json();
await refreshSelectedJob();
} catch (err) {
window.alert('Test failed: ' + err.message);
} finally {
button.disabled = false;
button.textContent = originalLabel;
}
}
function statusBadge(status) { function statusBadge(status) {
const cls = status === 'completed' ? 'ok' const cls = status === 'completed' ? 'ok'
: status === 'running' ? 'warn' : status === 'running' ? 'warn'
@ -176,11 +252,9 @@
els.tenantsTableBody.querySelectorAll('[data-tenant-scan]').forEach(function (btn) { els.tenantsTableBody.querySelectorAll('[data-tenant-scan]').forEach(function (btn) {
btn.addEventListener('click', function () { btn.addEventListener('click', function () {
const id = btn.getAttribute('data-tenant-scan'); const id = btn.getAttribute('data-tenant-scan');
// Pre-select this tenant in the scan form
els.scanTenantSelect.value = id; els.scanTenantSelect.value = id;
onScanTenantChange(); onScanTenantChange();
// Scroll to scan panel navigateTo('scan-sharepoint');
els.manualScanForm.closest('.panel').scrollIntoView({ behavior: 'smooth' });
}); });
}); });
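`renderProbeStatus()` distinguishes three states: never probed, probe succeeded, probe failed. A sketch that separates that decision from the HTML makes the rule testable on its own; it leaves out the timestamp and message lines the real function appends. `probeState`, `probeBadge` and `escapeHtml` are hypothetical, and `escapeHtml` only stands in for the app's `escHtml()`, whose implementation this diff does not show.

```javascript
// Stand-in for the app's escHtml(); the real implementation is not in this diff.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Same three-way decision as renderProbeStatus(), expressed as data.
function probeState(target) {
  if (!target.last_probe_at) return { label: 'Not tested yet', cls: 'info' };
  return target.last_probe_ok
    ? { label: 'OK', cls: 'ok' }
    : { label: 'Failed', cls: 'critical' };
}

// Badge only; the probe message goes into the title attribute, escaped.
function probeBadge(target) {
  const state = probeState(target);
  const title = state.cls === 'info'
    ? ''
    : ' title="' + escapeHtml(target.last_probe_message || '') + '"';
  return '<span class="risk ' + state.cls + '"' + title + '>' + state.label + '</span>';
}
```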
@ -198,7 +272,7 @@
} }
function populateTenantDropdowns() { function populateTenantDropdowns() {
// Scan tenant select // SharePoint scan tenant select (supports manual creds)
const scanVal = els.scanTenantSelect.value; const scanVal = els.scanTenantSelect.value;
els.scanTenantSelect.innerHTML = els.scanTenantSelect.innerHTML =
'<option value="">-- Select a tenant --</option>' + '<option value="">-- Select a tenant --</option>' +
@ -210,6 +284,36 @@
els.scanTenantSelect.value = scanVal; els.scanTenantSelect.value = scanVal;
} }
// Entra scan tenant select (cert required for Graph)
if (els.entraScanTenantSelect) {
const ev = els.entraScanTenantSelect.value;
els.entraScanTenantSelect.innerHTML =
'<option value="">-- Select a tenant --</option>' +
state.tenants.map(function (t) {
var label = escHtml(t.name);
if (!t.has_certificate) label += ' (no certificate)';
return '<option value="' + escHtml(t.id) + '"' + (t.has_certificate ? '' : ' disabled') + '>' + label + '</option>';
}).join('');
if (ev) els.entraScanTenantSelect.value = ev;
}
// Mailbox scan tenant select (cert only, no manual creds)
if (els.mailboxScanTenantSelect) {
const mbVal = els.mailboxScanTenantSelect.value;
els.mailboxScanTenantSelect.innerHTML =
'<option value="">-- Select a tenant --</option>' +
state.tenants.map(function (t) {
var label = escHtml(t.name);
if (!t.has_certificate) {
label += ' (no certificate)';
}
return '<option value="' + escHtml(t.id) + '"' + (t.has_certificate ? '' : ' disabled') + '>' + label + '</option>';
}).join('');
if (mbVal) {
els.mailboxScanTenantSelect.value = mbVal;
}
}
// Job tenant filter select // Job tenant filter select
const filterVal = els.jobTenantFilter.value; const filterVal = els.jobTenantFilter.value;
els.jobTenantFilter.innerHTML = els.jobTenantFilter.innerHTML =
@ -244,6 +348,7 @@
async function saveTenant() { async function saveTenant() {
const name = (els.newTenantName.value || '').trim(); const name = (els.newTenantName.value || '').trim();
const tenantId = (els.newTenantTenantId.value || '').trim(); const tenantId = (els.newTenantTenantId.value || '').trim();
const primaryDomain = (els.newTenantPrimaryDomain ? els.newTenantPrimaryDomain.value : '').trim().toLowerCase();
const clientId = (els.newTenantClientId.value || '').trim(); const clientId = (els.newTenantClientId.value || '').trim();
const clientSecret = (els.newTenantClientSecret.value || '').trim(); const clientSecret = (els.newTenantClientSecret.value || '').trim();
@ -256,7 +361,13 @@
await requestJson('/api/tenants', { await requestJson('/api/tenants', {
method: 'POST', method: 'POST',
headers: { 'Content-Type': 'application/json' }, headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: name, tenant_id: tenantId, client_id: clientId, client_secret: clientSecret }), body: JSON.stringify({
name: name,
tenant_id: tenantId,
primary_domain: primaryDomain || null,
client_id: clientId,
client_secret: clientSecret,
}),
}); });
showFeedback(els.tenantFeedback, 'Tenant "' + name + '" saved.', 'ok'); showFeedback(els.tenantFeedback, 'Tenant "' + name + '" saved.', 'ok');
closeTenantForm(); closeTenantForm();
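`saveTenant()` now sends `primary_domain` trimmed and lower-cased, and sends `null` instead of an empty string when the field is blank. A DOM-free sketch of that payload construction follows, assuming the form values have already been read into a plain object; `buildTenantPayload` and its input field names are hypothetical.

```javascript
// Hypothetical pure helper mirroring the body saveTenant() posts to /api/tenants.
function buildTenantPayload(form) {
  // Missing or undefined inputs become empty strings, like `(value || '').trim()` in the diff.
  const clean = (value) => (value || '').trim();
  const primaryDomain = clean(form.primaryDomain).toLowerCase();
  return {
    name: clean(form.name),
    tenant_id: clean(form.tenantId),
    primary_domain: primaryDomain || null, // blank field is sent as null
    client_id: clean(form.clientId),
    client_secret: clean(form.clientSecret),
  };
}
```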
@ -287,6 +398,7 @@
els.addTenantBtn.removeAttribute('hidden'); els.addTenantBtn.removeAttribute('hidden');
els.newTenantName.value = ''; els.newTenantName.value = '';
els.newTenantTenantId.value = ''; els.newTenantTenantId.value = '';
if (els.newTenantPrimaryDomain) els.newTenantPrimaryDomain.value = '';
els.newTenantClientId.value = ''; els.newTenantClientId.value = '';
els.newTenantClientSecret.value = ''; els.newTenantClientSecret.value = '';
if (els.connectedTenantId) els.connectedTenantId.value = ''; if (els.connectedTenantId) els.connectedTenantId.value = '';
@ -371,6 +483,7 @@
if (status === 'connected') { if (status === 'connected') {
const tenantId = params.get('tenant_id') || ''; const tenantId = params.get('tenant_id') || '';
navigateTo('tenants');
openTenantForm(); openTenantForm();
if (tenantId && els.newTenantTenantId) { if (tenantId && els.newTenantTenantId) {
els.newTenantTenantId.value = tenantId; els.newTenantTenantId.value = tenantId;
@ -422,7 +535,8 @@
return; return;
} }
try { try {
const payload = Object.assign({ site_urls: urls, skip_default_sites: !!els.manualSkipDefaults.checked }, auth); const mode = (els.sharepointScanMode && els.sharepointScanMode.value) || 'sharepoint';
const payload = Object.assign({ scan_type: mode, site_urls: urls, skip_default_sites: !!els.manualSkipDefaults.checked }, auth);
const result = await requestJson('/api/scan-jobs', { const result = await requestJson('/api/scan-jobs', {
method: 'POST', method: 'POST',
headers: { 'Content-Type': 'application/json' }, headers: { 'Content-Type': 'application/json' },
@ -459,8 +573,10 @@
return; return;
} }
const skipDefaults = els.csvSkipDefaults.checked ? 'true' : 'false'; const skipDefaults = els.csvSkipDefaults.checked ? 'true' : 'false';
const mode = (els.sharepointScanMode && els.sharepointScanMode.value) || 'sharepoint';
const formData = new FormData(); const formData = new FormData();
formData.append('file', file); formData.append('file', file);
formData.append('scan_type', mode);
if (auth.tenant_profile_id) { if (auth.tenant_profile_id) {
formData.append('tenant_profile_id', auth.tenant_profile_id); formData.append('tenant_profile_id', auth.tenant_profile_id);
} else { } else {
@ -495,10 +611,14 @@
async function refreshJobs() { async function refreshJobs() {
const filterTenant = els.jobTenantFilter.value; const filterTenant = els.jobTenantFilter.value;
const filterType = els.jobTypeFilter ? els.jobTypeFilter.value : '';
let url = '/api/scan-jobs?limit=50'; let url = '/api/scan-jobs?limit=50';
if (filterTenant) { if (filterTenant) {
url += '&tenant_profile_id=' + encodeURIComponent(filterTenant); url += '&tenant_profile_id=' + encodeURIComponent(filterTenant);
} }
if (filterType) {
url += '&scan_type=' + encodeURIComponent(filterType);
}
const jobs = await requestJson(url); const jobs = await requestJson(url);
els.statJobs.textContent = String(jobs.length); els.statJobs.textContent = String(jobs.length);
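`refreshJobs()` appends optional `tenant_profile_id` and `scan_type` filters to the jobs URL by string concatenation. The same query can be built with the standard `URLSearchParams` API, sketched here with a hypothetical `buildJobsUrl` helper. One difference to keep in mind: `URLSearchParams` encodes a space as `+`, while `encodeURIComponent` produces `%20`. Most query-string parsers read both as a space, but that is worth confirming for the backend before switching.

```javascript
// Hypothetical helper equivalent to the URL concatenation in refreshJobs().
function buildJobsUrl(filters = {}) {
  const params = new URLSearchParams({ limit: '50' });
  // Parameters are serialized in insertion order.
  if (filters.tenantProfileId) params.set('tenant_profile_id', filters.tenantProfileId);
  if (filters.scanType) params.set('scan_type', filters.scanType);
  return '/api/scan-jobs?' + params.toString();
}
```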
@ -507,7 +627,7 @@
}).length); }).length);
if (!jobs.length) { if (!jobs.length) {
els.jobsTableBody.innerHTML = '<tr><td colspan="8">No jobs yet.</td></tr>'; els.jobsTableBody.innerHTML = '<tr><td colspan="9">No jobs yet.</td></tr>';
return; return;
} }
@ -522,9 +642,21 @@
const tenantLabel = job.tenant_name const tenantLabel = job.tenant_name
? '<span class="tenant-tag">' + escHtml(job.tenant_name) + '</span>' ? '<span class="tenant-tag">' + escHtml(job.tenant_name) + '</span>'
: '<span style="color:var(--cv-text-secondary);font-size:0.82rem">manual</span>'; : '<span style="color:var(--cv-text-secondary);font-size:0.82rem">manual</span>';
const scanType = job.scan_type || 'sharepoint';
var typeLabel;
if (scanType === 'mailbox') {
typeLabel = '<span class="risk info">Mailbox</span>';
} else if (scanType === 'sharepoint_root') {
typeLabel = '<span class="risk warn">SP Root</span>';
} else if (scanType === 'entra_groups') {
typeLabel = '<span class="risk high">Entra</span>';
} else {
typeLabel = '<span class="risk ok">SharePoint</span>';
}
return ( return (
'<tr>' + '<tr>' +
'<td><code>' + job.id + '</code></td>' + '<td><code>' + job.id + '</code></td>' +
'<td>' + typeLabel + '</td>' +
'<td>' + tenantLabel + '</td>' + '<td>' + tenantLabel + '</td>' +
'<td>' + job.source_type + '</td>' + '<td>' + job.source_type + '</td>' +
'<td>' + statusBadge(job.status) + '</td>' + '<td>' + statusBadge(job.status) + '</td>' +
@ -546,6 +678,7 @@
els.jobsTableBody.querySelectorAll('[data-job-inspect]').forEach(function (button) { els.jobsTableBody.querySelectorAll('[data-job-inspect]').forEach(function (button) {
button.addEventListener('click', function () { button.addEventListener('click', function () {
state.selectedJobId = button.getAttribute('data-job-inspect'); state.selectedJobId = button.getAttribute('data-job-inspect');
navigateTo('jobs');
refreshSelectedJob().catch(function () { refreshSelectedJob().catch(function () {
showFeedback(els.submitFeedback, 'Failed to load selected job details.', 'error'); showFeedback(els.submitFeedback, 'Failed to load selected job details.', 'error');
}); });
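The scan-type badge in `refreshJobs()` is an if/else chain in which anything unrecognized, including a missing `scan_type`, renders as SharePoint. A table-driven sketch with the same fallback follows. `SCAN_TYPE_BADGES` and `scanTypeBadge` are hypothetical names; the CSS classes and labels are copied from the diff.

```javascript
// Badge per scan type, copied from the if/else chain in refreshJobs().
const SCAN_TYPE_BADGES = {
  sharepoint: { cls: 'ok', label: 'SharePoint' },
  sharepoint_root: { cls: 'warn', label: 'SP Root' },
  mailbox: { cls: 'info', label: 'Mailbox' },
  entra_groups: { cls: 'high', label: 'Entra' },
};

// Unknown or missing types fall back to SharePoint, as in the original chain.
function scanTypeBadge(scanType) {
  const known = Object.prototype.hasOwnProperty.call(SCAN_TYPE_BADGES, scanType);
  const badge = SCAN_TYPE_BADGES[known ? scanType : 'sharepoint'];
  return '<span class="risk ' + badge.cls + '">' + badge.label + '</span>';
}
```

Adding a new scan type then means adding one table entry instead of another branch.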
@ -586,7 +719,7 @@
state.selectedJobData = null; state.selectedJobData = null;
els.selectedJobId.textContent = 'No selection'; els.selectedJobId.textContent = 'No selection';
els.jobSummary.textContent = 'Select a job to inspect targets and deviations.'; els.jobSummary.textContent = 'Select a job to inspect targets and deviations.';
els.targetsTableBody.innerHTML = '<tr><td colspan="4">No job selected.</td></tr>'; els.targetsTableBody.innerHTML = '<tr><td colspan="6">No job selected.</td></tr>';
els.deviationsTableBody.innerHTML = '<tr><td colspan="6">No deviation data yet.</td></tr>'; els.deviationsTableBody.innerHTML = '<tr><td colspan="6">No deviation data yet.</td></tr>';
els.jobSiteFilter.innerHTML = '<option value="">All sites</option>'; els.jobSiteFilter.innerHTML = '<option value="">All sites</option>';
els.exportJobBtn.setAttribute('hidden', ''); els.exportJobBtn.setAttribute('hidden', '');
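`renderJobTables()`, changed in the hunks that follow, swaps the targets heading and both table headers by scan type. Every variant keeps six columns, which is what keeps the `colspan="6"` placeholder rows aligned. The sketch below expresses those layouts as data. `TABLE_LAYOUTS`, `layoutFor`, `headerRow` and `targetsHeader` are hypothetical, and `sharepoint_root` falls back to the SharePoint layout, as it does in the diff.

```javascript
// Per-scan-type table layouts, copied from the header strings in renderJobTables().
const TABLE_LAYOUTS = {
  mailbox: {
    heading: 'Mailboxes',
    targetCol: 'Mailbox',
    deviationCols: ['Mailbox', 'Object', 'Permission Type', 'Principal', 'Access Rights', ''],
  },
  entra_groups: {
    heading: 'Groups',
    targetCol: 'Group',
    deviationCols: ['Group', 'Group Type', 'User', 'Role', '', ''],
  },
  sharepoint: {
    heading: 'Targets',
    targetCol: 'URL',
    deviationCols: ['Site', 'Object', 'Type', 'Principal', 'Role', 'Delta'],
  },
};

// Any other type (including sharepoint_root) uses the SharePoint layout.
function layoutFor(scanType) {
  const known = Object.prototype.hasOwnProperty.call(TABLE_LAYOUTS, scanType);
  return TABLE_LAYOUTS[known ? scanType : 'sharepoint'];
}

function headerRow(cols) {
  return '<tr>' + cols.map((c) => '<th>' + c + '</th>').join('') + '</tr>';
}

// Targets table: first column varies, the other five are shared.
function targetsHeader(scanType) {
  const layout = layoutFor(scanType);
  return headerRow([layout.targetCol, 'Status', 'Attempts', 'Error', 'Connection test', '']);
}
```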
@ -599,6 +732,9 @@
} }
function renderJobTables(job) { function renderJobTables(job) {
const scanTypeNow = job.scan_type || 'sharepoint';
const isMailbox = scanTypeNow === 'mailbox';
const isEntra = scanTypeNow === 'entra_groups';
const siteFilter = els.jobSiteFilter.value; const siteFilter = els.jobSiteFilter.value;
const filteredTargets = siteFilter const filteredTargets = siteFilter
@ -609,18 +745,91 @@
? job.deviations.filter(function (d) { return d.site_url === siteFilter; }) ? job.deviations.filter(function (d) { return d.site_url === siteFilter; })
: job.deviations; : job.deviations;
// Header swap based on scan type
if (els.targetsHeading) {
els.targetsHeading.textContent = isMailbox ? 'Mailboxes' : isEntra ? 'Groups' : 'Targets';
}
if (els.targetsTableHead) {
var targetsHead;
if (isMailbox) {
targetsHead = '<tr><th>Mailbox</th><th>Status</th><th>Attempts</th><th>Error</th><th>Connection test</th><th></th></tr>';
} else if (isEntra) {
targetsHead = '<tr><th>Group</th><th>Status</th><th>Attempts</th><th>Error</th><th>Connection test</th><th></th></tr>';
} else {
targetsHead = '<tr><th>URL</th><th>Status</th><th>Attempts</th><th>Error</th><th>Connection test</th><th></th></tr>';
}
els.targetsTableHead.innerHTML = targetsHead;
}
if (els.deviationsTableHead) {
var devHead;
if (isMailbox) {
devHead = '<tr><th>Mailbox</th><th>Object</th><th>Permission Type</th><th>Principal</th><th>Access Rights</th><th></th></tr>';
} else if (isEntra) {
devHead = '<tr><th>Group</th><th>Group Type</th><th>User</th><th>Role</th><th></th><th></th></tr>';
} else {
devHead = '<tr><th>Site</th><th>Object</th><th>Type</th><th>Principal</th><th>Role</th><th>Delta</th></tr>';
}
els.deviationsTableHead.innerHTML = devHead;
}
// Hide SharingLinks/Resolve Groups for non-SharePoint jobs
if (isMailbox || isEntra) {
if (els.sharingLinksResolveBlock) els.sharingLinksResolveBlock.setAttribute('hidden', '');
if (els.resolveGroupsBlock) els.resolveGroupsBlock.setAttribute('hidden', '');
} else if (els.resolveGroupsBlock) {
els.resolveGroupsBlock.removeAttribute('hidden');
}
els.targetsTableBody.innerHTML = filteredTargets.length els.targetsTableBody.innerHTML = filteredTargets.length
? filteredTargets.map(function (target) { ? filteredTargets.map(function (target) {
return ( return (
'<tr>' + '<tr data-target-id="' + target.id + '">' +
'<td>' + escHtml(target.site_url) + '</td>' + '<td>' + escHtml(target.site_url) + '</td>' +
'<td>' + statusBadge(target.status) + '</td>' + '<td>' + statusBadge(target.status) + '</td>' +
'<td>' + target.attempts + '</td>' + '<td>' + target.attempts + '</td>' +
'<td>' + escHtml(target.error_message || '-') + '</td>' + '<td>' + escHtml(target.error_message || '-') + '</td>' +
'<td class="probe-cell">' + renderProbeStatus(target) + '</td>' +
'<td><button type="button" class="btn btn-outline btn-small probe-btn" data-target-id="' + target.id + '">Test</button></td>' +
'</tr>' '</tr>'
); );
}).join('') }).join('')
: '<tr><td colspan="4">No targets.</td></tr>'; : '<tr><td colspan="6">No targets.</td></tr>';
if (isMailbox) {
els.deviationsTableBody.innerHTML = filteredDeviations.length
? filteredDeviations.map(function (d) {
return (
'<tr>' +
'<td class="col-site">' + escHtml(d.site_url) + '</td>' +
'<td class="col-object">' + escHtml(d.object_url) + '</td>' +
'<td class="col-type">' + escHtml(d.permission_type || d.object_type) + '</td>' +
'<td class="col-principal">' + escHtml(d.principal) + '</td>' +
'<td class="col-role">' + escHtml(d.role_name) + '</td>' +
'<td></td>' +
'</tr>'
);
}).join('')
: '<tr><td colspan="6">No mailbox permissions found for this job.</td></tr>';
return;
}
if (isEntra) {
els.deviationsTableBody.innerHTML = filteredDeviations.length
? filteredDeviations.map(function (d) {
return (
'<tr>' +
'<td class="col-site">' + escHtml(d.object_url) + '</td>' +
'<td class="col-type">' + escHtml(d.permission_type || '') + '</td>' +
'<td class="col-principal">' + escHtml(d.principal) + '</td>' +
'<td class="col-role">' + escHtml(d.role_name) + '</td>' +
'<td></td>' +
'<td></td>' +
'</tr>'
);
}).join('')
: '<tr><td colspan="6">No group memberships found for this job.</td></tr>';
return;
}
els.deviationsTableBody.innerHTML = filteredDeviations.length els.deviationsTableBody.innerHTML = filteredDeviations.length
? filteredDeviations.map(function (deviation) { ? filteredDeviations.map(function (deviation) {
@ -639,7 +848,11 @@
principalCell = '<td class="col-principal" title="' + escHtml(deviation.principal) + '">' + badge + members + '</td>'; principalCell = '<td class="col-principal" title="' + escHtml(deviation.principal) + '">' + badge + members + '</td>';
} else { } else {
const principalShort = shortPrincipal(deviation.principal); const principalShort = shortPrincipal(deviation.principal);
principalCell = '<td class="col-principal"><span class="cell-truncate" title="' + escHtml(deviation.principal) + '">' + escHtml(principalShort) + '</span></td>'; var membersBlock = '';
if (deviation.resolved_members) {
membersBlock = '<br><span class="cell-members">' + escHtml(deviation.resolved_members) + '</span>';
}
principalCell = '<td class="col-principal"><span class="cell-truncate" title="' + escHtml(deviation.principal) + '">' + escHtml(principalShort) + '</span>' + membersBlock + '</td>';
} }
return ( return (
'<tr>' + '<tr>' +
@ -700,25 +913,43 @@
els.exportJobBtn.removeAttribute('hidden'); els.exportJobBtn.removeAttribute('hidden');
// Build resolve sharing links section // Build resolve sharing links section
_renderResolveBlock(job); await _renderResolveBlock(job);
renderJobTables(job); renderJobTables(job);
} }
function _renderResolveBlock(job) { async function _renderResolveBlock(job) {
// Preserve current selection before re-render (auto-refresh runs every few seconds).
if (state.selectedJobId === job.id) {
var currentSelected = Array.from(
els.sharingLinksTypes.querySelectorAll('.sharing-link-type-check:checked')
).map(function (cb) { return cb.value; });
state.sharingLinkSelectionByJob[job.id] = currentSelected;
}
if (job.status === 'queued' || job.status === 'running') { if (job.status === 'queued' || job.status === 'running') {
els.sharingLinksResolveBlock.setAttribute('hidden', ''); els.sharingLinksResolveBlock.setAttribute('hidden', '');
return; return;
} }
// Collect unique link types present in this job's deviations if ((job.scan_type || 'sharepoint') === 'mailbox') {
els.sharingLinksResolveBlock.setAttribute('hidden', '');
return;
}
var typeCounts = {}; var typeCounts = {};
try {
var typeData = await requestJson('/api/scan-jobs/' + encodeURIComponent(job.id) + '/sharing-link-types');
typeCounts = typeData.type_counts || {};
} catch (_err) {
// Fallback to currently loaded deviations when aggregate endpoint fails.
job.deviations.forEach(function (dev) { job.deviations.forEach(function (dev) {
var lt = sharingLinkType(dev.principal); var lt = sharingLinkType(dev.principal);
if (lt) { if (lt) {
typeCounts[lt] = (typeCounts[lt] || 0) + 1; typeCounts[lt] = (typeCounts[lt] || 0) + 1;
} }
}); });
}
var types = Object.keys(typeCounts); var types = Object.keys(typeCounts);
if (!types.length) { if (!types.length) {
@ -727,8 +958,15 @@
} }
types.sort(); types.sort();
var rememberedSelection = state.sharingLinkSelectionByJob[job.id];
els.sharingLinksTypes.innerHTML = types.map(function (lt) { els.sharingLinksTypes.innerHTML = types.map(function (lt) {
var checked = SHARING_LINK_DEFAULT_CHECKED.indexOf(lt) !== -1 ? 'checked' : ''; var isChecked;
if (Array.isArray(rememberedSelection)) {
isChecked = rememberedSelection.indexOf(lt) !== -1;
} else {
isChecked = SHARING_LINK_DEFAULT_CHECKED.indexOf(lt) !== -1;
}
var checked = isChecked ? 'checked' : '';
var riskCls = sharingLinkRiskClass(lt); var riskCls = sharingLinkRiskClass(lt);
return ( return (
'<label class="checkline">' + '<label class="checkline">' +
@ -739,6 +977,14 @@
); );
}).join(''); }).join('');
els.sharingLinksTypes.querySelectorAll('.sharing-link-type-check').forEach(function (cb) {
cb.addEventListener('change', function () {
state.sharingLinkSelectionByJob[job.id] = Array.from(
els.sharingLinksTypes.querySelectorAll('.sharing-link-type-check:checked')
).map(function (x) { return x.value; });
});
});
els.sharingLinksResolveBlock.removeAttribute('hidden'); els.sharingLinksResolveBlock.removeAttribute('hidden');
showFeedback(els.resolveFeedback, '', ''); showFeedback(els.resolveFeedback, '', '');
} }
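The selection-preservation logic in `_renderResolveBlock` above comes down to one rule: a remembered per-job selection wins over the defaults, even when that selection is empty. Below is a minimal DOM-free sketch of that rule.

- `initialCheckedTypes` is a hypothetical helper name, not a function in app.js.
- `selectionByJob` stands in for `state.sharingLinkSelectionByJob`.

```javascript
// Defaults copied from SHARING_LINK_DEFAULT_CHECKED in app.js.
const SHARING_LINK_DEFAULT_CHECKED = ['AnonymousEdit', 'AnonymousView', 'Flexible'];

// Hypothetical helper (not in app.js): decide which of the available
// link types start out checked when the resolve block is re-rendered.
// selectionByJob mirrors state.sharingLinkSelectionByJob (jobId -> string[]).
function initialCheckedTypes(types, selectionByJob, jobId) {
  const remembered = selectionByJob[jobId];
  // Array.isArray (not a length check) so an empty array, meaning
  // "the user unchecked everything", is respected instead of reset.
  const source = Array.isArray(remembered) ? remembered : SHARING_LINK_DEFAULT_CHECKED;
  return types.filter((lt) => source.includes(lt));
}
```

The real code also writes the current DOM selection back into the map before re-rendering. That write-back is what keeps the checkboxes stable across the periodic auto-refresh.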
@ -774,9 +1020,26 @@
// Link types that are resolved by default (checked in the UI) // Link types that are resolved by default (checked in the UI)
var SHARING_LINK_DEFAULT_CHECKED = ['AnonymousEdit', 'AnonymousView', 'Flexible']; var SHARING_LINK_DEFAULT_CHECKED = ['AnonymousEdit', 'AnonymousView', 'Flexible'];
function extractSharingLinkGroupName(principal) {
if (!principal) return null;
var text = String(principal).trim();
var segments = text.split('|').map(function (s) { return s.trim(); }).filter(Boolean);
for (var i = segments.length - 1; i >= 0; i -= 1) {
if (/^sharinglinks\./i.test(segments[i])) {
return segments[i];
}
}
if (/^sharinglinks\./i.test(text)) {
return text;
}
return null;
}
function sharingLinkType(principal) { function sharingLinkType(principal) {
if (!principal || !principal.startsWith('SharingLinks.')) return null; var groupName = extractSharingLinkGroupName(principal);
var parts = principal.split('.'); if (!groupName) return null;
var parts = groupName.split('.');
return parts.length >= 3 ? parts[2] : null; return parts.length >= 3 ? parts[2] : null;
} }
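The new `extractSharingLinkGroupName` makes `sharingLinkType` tolerant of two cases: principals where the SharingLinks group name is one `|`-separated segment among others, and lowercase prefixes. Below is a standalone copy of the two helpers (same logic, `const`/arrow syntax) that can be run in Node to check that behaviour. Two assumptions apply:

- The sample principal strings are made up.
- The `SharingLinks.<id>.<LinkType>.<id>` layout is inferred from the code reading `parts[2]`, not taken from Microsoft documentation.

```javascript
// Return the "SharingLinks.*" segment of a principal, or null.
// Scans "|"-separated segments from the end, case-insensitively.
function extractSharingLinkGroupName(principal) {
  if (!principal) return null;
  const text = String(principal).trim();
  const segments = text.split('|').map((s) => s.trim()).filter(Boolean);
  for (let i = segments.length - 1; i >= 0; i -= 1) {
    if (/^sharinglinks\./i.test(segments[i])) return segments[i];
  }
  return /^sharinglinks\./i.test(text) ? text : null;
}

// Link type is assumed to be the third dot-separated part of the group name.
function sharingLinkType(principal) {
  const groupName = extractSharingLinkGroupName(principal);
  if (!groupName) return null;
  const parts = groupName.split('.');
  return parts.length >= 3 ? parts[2] : null;
}
```

One behavioural change from the old code is worth noting. The old check used `startsWith('SharingLinks.')`, which is case-sensitive. The regex is case-insensitive, so a `sharinglinks.` prefix now also counts.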
@ -872,6 +1135,14 @@
els.manualScanForm.addEventListener('submit', createManualJob); els.manualScanForm.addEventListener('submit', createManualJob);
els.csvScanForm.addEventListener('submit', createCsvJob); els.csvScanForm.addEventListener('submit', createCsvJob);
els.targetsTableBody.addEventListener('click', function (ev) {
var btn = ev.target.closest('.probe-btn');
if (!btn) return;
var targetId = btn.getAttribute('data-target-id');
if (!targetId) return;
testTargetConnection(targetId, btn);
});
els.refreshJobsBtn.addEventListener('click', function () { els.refreshJobsBtn.addEventListener('click', function () {
tick().catch(function () { tick().catch(function () {
showFeedback(els.submitFeedback, 'Refresh failed.', 'error'); showFeedback(els.submitFeedback, 'Refresh failed.', 'error');
@ -929,6 +1200,29 @@
}); });
}); });
if (els.resolveGroupsBtn) {
els.resolveGroupsBtn.addEventListener('click', function () {
if (!state.selectedJobId) return;
els.resolveGroupsBtn.disabled = true;
showFeedback(els.resolveGroupsFeedback, 'Resolving SharePoint groups…', '');
requestJson('/api/scan-jobs/' + encodeURIComponent(state.selectedJobId) + '/resolve-groups', {
method: 'POST',
}).then(function (result) {
showFeedback(
els.resolveGroupsFeedback,
result.resolved_groups + ' groups resolved, ' + result.skipped_groups + ' skipped (no readable members), ' +
result.updated_deviations + ' deviations updated.',
'ok'
);
return refreshSelectedJob();
}).catch(function (err) {
showFeedback(els.resolveGroupsFeedback, 'Resolve failed: ' + err.message, 'error');
}).finally(function () {
els.resolveGroupsBtn.disabled = false;
});
});
}
els.exportJobBtn.addEventListener('click', function () { els.exportJobBtn.addEventListener('click', function () {
if (!state.selectedJobId) return; if (!state.selectedJobId) return;
const siteFilter = els.jobSiteFilter.value; const siteFilter = els.jobSiteFilter.value;
@ -939,6 +1233,342 @@
window.location.href = url; window.location.href = url;
}); });
// -------------------------------------------------------------------------
// Mailbox scan creation
// -------------------------------------------------------------------------
function readMailboxScanAuth() {
const val = els.mailboxScanTenantSelect ? els.mailboxScanTenantSelect.value : '';
if (!val) {
throw new Error('Select a tenant profile with a certificate.');
}
return { tenant_profile_id: val };
}
async function createManualMailboxJob(event) {
event.preventDefault();
const upns = (els.manualMailboxes.value || '')
.split(/\r?\n/)
.map(function (line) { return line.trim().toLowerCase(); })
.filter(Boolean);
if (!upns.length) {
showFeedback(els.mailboxSubmitFeedback, 'Enter at least one UPN.', 'error');
return;
}
let auth;
try {
auth = readMailboxScanAuth();
} catch (err) {
showFeedback(els.mailboxSubmitFeedback, err.message, 'error');
return;
}
try {
const payload = Object.assign({ scan_type: 'mailbox', mailboxes: upns, skip_default_sites: false }, auth);
const result = await requestJson('/api/scan-jobs', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
});
showFeedback(
els.mailboxSubmitFeedback,
'Mailbox job queued: ' + result.job.id + ' | accepted=' + result.accepted_urls.length +
', invalid=' + result.invalid_urls.length,
'ok'
);
els.manualMailboxes.value = '';
state.selectedJobId = result.job.id;
navigateTo('jobs');
await refreshJobs();
await refreshSelectedJob();
} catch (err) {
showFeedback(els.mailboxSubmitFeedback, 'Mailbox scan failed: ' + err.message, 'error');
}
}
async function createCsvMailboxJob(event) {
event.preventDefault();
const file = els.csvMailboxFile.files && els.csvMailboxFile.files[0];
if (!file) {
showFeedback(els.mailboxSubmitFeedback, 'Select a CSV file first.', 'error');
return;
}
let auth;
try {
auth = readMailboxScanAuth();
} catch (err) {
showFeedback(els.mailboxSubmitFeedback, err.message, 'error');
return;
}
const formData = new FormData();
formData.append('file', file);
formData.append('scan_type', 'mailbox');
formData.append('tenant_profile_id', auth.tenant_profile_id);
try {
const result = await requestJson('/api/scan-jobs/import-csv?skip_default_sites=false', {
method: 'POST',
body: formData,
});
showFeedback(
els.mailboxSubmitFeedback,
'CSV mailbox job queued: ' + result.job.id + ' | accepted=' + result.accepted_urls.length +
', invalid=' + result.invalid_urls.length,
'ok'
);
els.csvMailboxFile.value = '';
state.selectedJobId = result.job.id;
navigateTo('jobs');
await refreshJobs();
await refreshSelectedJob();
} catch (err) {
showFeedback(els.mailboxSubmitFeedback, 'CSV import failed: ' + err.message, 'error');
}
}
async function createAllMailboxesJob(event) {
event.preventDefault();
const org = (els.allMailboxesOrg.value || '').trim().toLowerCase();
if (!org || org.indexOf('.') === -1) {
showFeedback(els.mailboxSubmitFeedback, 'Enter the tenant primary domain (e.g. contoso.onmicrosoft.com).', 'error');
return;
}
let auth;
try {
auth = readMailboxScanAuth();
} catch (err) {
showFeedback(els.mailboxSubmitFeedback, err.message, 'error');
return;
}
const submitBtn = els.allMailboxesForm.querySelector('button[type="submit"]');
if (submitBtn) submitBtn.disabled = true;
showFeedback(els.mailboxSubmitFeedback, 'Enumerating all mailboxes — this can take up to a minute…', '');
try {
const payload = Object.assign({
scan_type: 'mailbox',
scan_all_mailboxes: true,
organization: org,
skip_default_sites: false,
}, auth);
const result = await requestJson('/api/scan-jobs', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
});
showFeedback(
els.mailboxSubmitFeedback,
'All-mailboxes job queued: ' + result.job.id + ' | accepted=' + result.accepted_urls.length,
'ok'
);
state.selectedJobId = result.job.id;
navigateTo('jobs');
await refreshJobs();
await refreshSelectedJob();
} catch (err) {
showFeedback(els.mailboxSubmitFeedback, 'Scan-all failed: ' + err.message, 'error');
} finally {
if (submitBtn) submitBtn.disabled = false;
}
}
if (els.manualMailboxForm) {
els.manualMailboxForm.addEventListener('submit', createManualMailboxJob);
}
if (els.csvMailboxForm) {
els.csvMailboxForm.addEventListener('submit', createCsvMailboxJob);
}
if (els.allMailboxesForm) {
els.allMailboxesForm.addEventListener('submit', createAllMailboxesJob);
}
if (els.mailboxScanTenantSelect) {
els.mailboxScanTenantSelect.addEventListener('change', function () {
var id = els.mailboxScanTenantSelect.value;
var tenant = state.tenants.find(function (t) { return t.id === id; });
if (tenant && tenant.primary_domain && els.allMailboxesOrg) {
els.allMailboxesOrg.value = tenant.primary_domain;
}
});
}
if (els.jobTypeFilter) {
els.jobTypeFilter.addEventListener('change', function () {
tick().catch(function () { /* ignore */ });
});
}
// -------------------------------------------------------------------------
// Entra group scan creation
// -------------------------------------------------------------------------
function readEntraScanAuth() {
const val = els.entraScanTenantSelect ? els.entraScanTenantSelect.value : '';
if (!val) {
throw new Error('Select a tenant profile with a certificate.');
}
return { tenant_profile_id: val };
}
async function createManualEntraJob(event) {
event.preventDefault();
const ids = (els.manualEntraIds.value || '')
.split(/\r?\n/)
.map(function (s) { return s.trim(); })
.filter(Boolean);
if (!ids.length) {
showFeedback(els.entraSubmitFeedback, 'Enter at least one Object ID, mail, or display name.', 'error');
return;
}
let auth;
try { auth = readEntraScanAuth(); } catch (err) {
showFeedback(els.entraSubmitFeedback, err.message, 'error');
return;
}
try {
const payload = Object.assign({ scan_type: 'entra_groups', group_ids: ids, skip_default_sites: false }, auth);
const result = await requestJson('/api/scan-jobs', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
});
showFeedback(els.entraSubmitFeedback,
'Entra job queued: ' + result.job.id + ' | accepted=' + result.accepted_urls.length +
', invalid=' + result.invalid_urls.length, 'ok');
els.manualEntraIds.value = '';
state.selectedJobId = result.job.id;
navigateTo('jobs');
await refreshJobs();
await refreshSelectedJob();
} catch (err) {
showFeedback(els.entraSubmitFeedback, 'Entra scan failed: ' + err.message, 'error');
}
}
async function createCsvEntraJob(event) {
event.preventDefault();
const file = els.csvEntraFile.files && els.csvEntraFile.files[0];
if (!file) {
showFeedback(els.entraSubmitFeedback, 'Select a CSV file first.', 'error');
return;
}
let auth;
try { auth = readEntraScanAuth(); } catch (err) {
showFeedback(els.entraSubmitFeedback, err.message, 'error');
return;
}
const formData = new FormData();
formData.append('file', file);
formData.append('scan_type', 'entra_groups');
formData.append('tenant_profile_id', auth.tenant_profile_id);
try {
const result = await requestJson('/api/scan-jobs/import-csv?skip_default_sites=false', {
method: 'POST',
body: formData,
});
showFeedback(els.entraSubmitFeedback,
'CSV Entra job queued: ' + result.job.id + ' | accepted=' + result.accepted_urls.length, 'ok');
els.csvEntraFile.value = '';
state.selectedJobId = result.job.id;
navigateTo('jobs');
await refreshJobs();
await refreshSelectedJob();
} catch (err) {
showFeedback(els.entraSubmitFeedback, 'CSV import failed: ' + err.message, 'error');
}
}
async function createAllEntraJob(event) {
event.preventDefault();
let auth;
try { auth = readEntraScanAuth(); } catch (err) {
showFeedback(els.entraSubmitFeedback, err.message, 'error');
return;
}
const submitBtn = els.allEntraForm.querySelector('button[type="submit"]');
if (submitBtn) submitBtn.disabled = true;
showFeedback(els.entraSubmitFeedback, 'Enumerating all groups in tenant — this can take up to two minutes…', '');
try {
const payload = Object.assign({
scan_type: 'entra_groups',
scan_all_groups: true,
skip_default_sites: false,
}, auth);
const result = await requestJson('/api/scan-jobs', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
});
showFeedback(els.entraSubmitFeedback,
'All-groups job queued: ' + result.job.id + ' | accepted=' + result.accepted_urls.length, 'ok');
state.selectedJobId = result.job.id;
navigateTo('jobs');
await refreshJobs();
await refreshSelectedJob();
} catch (err) {
showFeedback(els.entraSubmitFeedback, 'Scan-all failed: ' + err.message, 'error');
} finally {
if (submitBtn) submitBtn.disabled = false;
}
}
if (els.manualEntraForm) els.manualEntraForm.addEventListener('submit', createManualEntraJob);
if (els.csvEntraForm) els.csvEntraForm.addEventListener('submit', createCsvEntraJob);
if (els.allEntraForm) els.allEntraForm.addEventListener('submit', createAllEntraJob);
// -------------------------------------------------------------------------
// Hash router
// -------------------------------------------------------------------------
function parseRoute() {
var hash = (window.location.hash || '').replace(/^#\/?/, '');
if (!hash) return 'dashboard';
if (hash.indexOf('/') !== -1) {
var parts = hash.split('/');
if (parts[0] === 'scan' && parts[1]) return 'scan-' + parts[1];
return parts[0];
}
return hash;
}
function applyRoute(route) {
if (!ROUTE_TITLES[route]) {
route = 'dashboard';
}
state.currentRoute = route;
document.querySelectorAll('.route-page').forEach(function (page) {
if (page.getAttribute('data-route-page') === route) {
page.removeAttribute('hidden');
} else {
page.setAttribute('hidden', '');
}
});
document.querySelectorAll('.sidebar-nav .nav-link').forEach(function (link) {
if (link.getAttribute('data-route') === route) {
link.classList.add('active');
} else {
link.classList.remove('active');
}
});
if (els.contentTitle) {
els.contentTitle.textContent = ROUTE_TITLES[route];
}
}
function navigateTo(route) {
var hash;
if (route === 'scan-sharepoint') hash = '#/scan/sharepoint';
else if (route === 'scan-mailbox') hash = '#/scan/mailbox';
else if (route === 'scan-entra') hash = '#/scan/entra';
else hash = '#/' + route;
if (window.location.hash !== hash) {
window.location.hash = hash;
} else {
applyRoute(route);
}
}
window.addEventListener('hashchange', function () {
applyRoute(parseRoute());
});
applyRoute(parseRoute());
// ------------------------------------------------------------------------- // -------------------------------------------------------------------------
// Init // Init
// ------------------------------------------------------------------------- // -------------------------------------------------------------------------
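The hash router added in this hunk maps `#/scan/<kind>` to a `scan-<kind>` route key and falls back to the dashboard for unknown keys. Below is a Node-runnable sketch of the same parsing rules, with the hash passed in rather than read from `window.location`. It rests on these assumptions:

- `ROUTE_TITLES` is defined elsewhere in app.js and is not shown in this diff. The map here takes its keys from the sidebar's `data-route` attributes, and its titles are placeholders.
- `routeToHash` and `resolveRoute` are hypothetical helper names.

```javascript
// Assumed route map: keys from the sidebar's data-route attributes,
// titles are placeholders (the real ROUTE_TITLES is not in this diff).
const ROUTE_TITLES = {
  dashboard: 'Dashboard',
  jobs: 'Scan Jobs',
  'scan-sharepoint': 'New SP Scan',
  'scan-mailbox': 'New Mailbox Scan',
  'scan-entra': 'New Entra Scan',
  tenants: 'Tenants',
  settings: 'Settings',
};

// Same rules as parseRoute() in app.js, but taking the hash as an argument.
function parseRoute(hash) {
  const path = (hash || '').replace(/^#\/?/, '');
  if (!path) return 'dashboard';
  if (path.indexOf('/') !== -1) {
    const parts = path.split('/');
    if (parts[0] === 'scan' && parts[1]) return 'scan-' + parts[1];
    return parts[0];
  }
  return path;
}

// Hypothetical inverse of parseRoute(), generalizing navigateTo()'s branches.
function routeToHash(route) {
  const m = /^scan-(.+)$/.exec(route);
  return m ? '#/scan/' + m[1] : '#/' + route;
}

// Hypothetical: parse, then fall back to the dashboard like applyRoute().
// Uses an own-property check so inherited keys do not count as routes.
function resolveRoute(hash) {
  const route = parseRoute(hash);
  return Object.prototype.hasOwnProperty.call(ROUTE_TITLES, route) ? route : 'dashboard';
}
```

`resolveRoute` uses `hasOwnProperty`, while `applyRoute` uses a plain `ROUTE_TITLES[route]` lookup. With the plain lookup, a hash such as `#/constructor` resolves to an inherited `Object.prototype` member and passes the check.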

@ -0,0 +1,16 @@
<svg width="286" height="72" viewBox="0 0 286 72" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="logoTitleDark logoDescDark">
<title id="logoTitleDark">Clearview</title>
<desc id="logoDescDark">Clearview logo for dark backgrounds</desc>
<g transform="translate(0 2)">
<ellipse cx="34" cy="34" rx="34" ry="20" fill="#0EA5E9" fill-opacity="0.20"/>
<ellipse cx="34" cy="34" rx="34" ry="20" stroke="#38BDF8" stroke-width="2.4"/>
<circle cx="34" cy="34" r="12" fill="#0EA5E9" fill-opacity="0.30"/>
<circle cx="34" cy="34" r="12" stroke="#38BDF8" stroke-width="2"/>
<circle cx="34" cy="31" r="4" fill="#38BDF8"/>
<rect x="32" y="34" width="4" height="8" rx="2" fill="#38BDF8"/>
<path d="M8 22C16 14 25 10 34 10C43 10 52 14 60 22" stroke="#38BDF8" stroke-opacity="0.55" stroke-width="2"/>
</g>
<text x="80" y="44" font-size="36" font-weight="600" font-family="'Space Grotesk', 'Avenir Next', 'Segoe UI', sans-serif">
<tspan fill="#38BDF8">Clear</tspan><tspan fill="#F4F7FB">view</tspan>
</text>
</svg>


@ -3,36 +3,64 @@
<head> <head>
<meta charset="utf-8"> <meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="viewport" content="width=device-width, initial-scale=1">
<title>Clearview | SharePoint Permission Deviations</title> <title>Clearview | Permission Deviations</title>
<meta name="description" content="Clearview scans SharePoint sites and reports only permission deviations from root level."> <meta name="description" content="Clearview scans Microsoft 365 SharePoint sites and Exchange Online mailboxes for permission deviations.">
<link rel="icon" href="assets/favicon.svg" type="image/svg+xml"> <link rel="icon" href="assets/favicon.svg" type="image/svg+xml">
<link rel="preconnect" href="https://fonts.googleapis.com"> <link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=IBM+Plex+Sans:wght@400;500;600&display=swap" rel="stylesheet"> <link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=IBM+Plex+Sans:wght@400;500;600&display=swap" rel="stylesheet">
<link rel="stylesheet" href="styles.css"> <link rel="stylesheet" href="styles.css">
</head> </head>
<body> <body class="app-shell">
<div class="bg-orb orb-one" aria-hidden="true"></div> <div class="bg-orb orb-one" aria-hidden="true"></div>
<div class="bg-orb orb-two" aria-hidden="true"></div> <div class="bg-orb orb-two" aria-hidden="true"></div>
<header class="topbar slide-in"> <aside class="sidebar">
<a href="#" class="brand" aria-label="Clearview home"> <div class="sidebar-brand">
<img src="assets/clearview-logo.svg" alt="Clearview logo" class="brand-logo"> <img src="assets/clearview-logo-dark.svg" alt="Clearview" class="brand-logo">
</a> </div>
<div class="topbar-actions"> <nav class="sidebar-nav">
<a href="#/dashboard" class="nav-link" data-route="dashboard">Dashboard</a>
<a href="#/jobs" class="nav-link" data-route="jobs">Scan Jobs</a>
<div class="nav-section">SharePoint</div>
<a href="#/scan/sharepoint" class="nav-link" data-route="scan-sharepoint">New SP Scan</a>
<div class="nav-section">Mailboxes</div>
<a href="#/scan/mailbox" class="nav-link" data-route="scan-mailbox">New Mailbox Scan</a>
<div class="nav-section">Entra</div>
<a href="#/scan/entra" class="nav-link" data-route="scan-entra">New Entra Scan</a>
<div class="nav-spacer"></div>
<a href="#/tenants" class="nav-link" data-route="tenants">Tenants</a>
<a href="#/settings" class="nav-link" data-route="settings">Settings</a>
</nav>
<div class="sidebar-foot">
<span class="sidebar-version">v0.1.0</span>
</div>
</aside>
<main class="content">
<header class="content-topbar">
<div class="content-title" id="contentTitle">Dashboard</div>
<div class="content-actions">
<button id="refreshJobsBtn" class="btn btn-outline" type="button">Refresh</button> <button id="refreshJobsBtn" class="btn btn-outline" type="button">Refresh</button>
</div> </div>
</header> </header>
<main class="layout"> <!-- =================================================================== -->
<section class="hero fade-up" style="--delay: 0.05s"> <!-- Route: Dashboard -->
<p class="eyebrow">Root Permission Drift Detection</p> <!-- =================================================================== -->
<h1>Monitor SharePoint permissions across all your customers</h1> <section class="route-page" data-route-page="dashboard">
<div class="hero fade-up">
<p class="eyebrow">Permission Drift Detection</p>
<h1>Monitor Microsoft 365 permissions across all customers</h1>
<p class="lede"> <p class="lede">
Clearview scans down to folder and file level and reports only rights that deviate from the Scan SharePoint sites for deviations from root permissions, and Exchange Online
root permissions of each site. mailboxes for delegated access (Full Access, Send As, Send on Behalf, folder delegations).
</p> </p>
<div class="hero-stats" id="heroStats"> <div class="hero-stats">
<article> <article>
<span class="kpi" id="statTenants">0</span> <span class="kpi" id="statTenants">0</span>
<span class="label">Tenants</span> <span class="label">Tenants</span>
@ -46,28 +74,29 @@
<span class="label">Active Jobs</span> <span class="label">Active Jobs</span>
</article> </article>
</div> </div>
</div>
</section> </section>
<!-- ------------------------------------------------------------------ --> <!-- =================================================================== -->
<!-- Tenants panel --> <!-- Route: Tenants -->
<!-- ------------------------------------------------------------------ --> <!-- =================================================================== -->
<section class="panel fade-up" style="--delay: 0.11s"> <section class="route-page" data-route-page="tenants" hidden>
<div class="panel">
<div class="panel-header split"> <div class="panel-header split">
<h2>Tenants</h2> <h2>Tenants</h2>
<button id="addTenantBtn" class="btn btn-outline" type="button">Add Tenant</button> <button id="addTenantBtn" class="btn btn-outline" type="button">Add Tenant</button>
</div> </div>
<!-- Add / Edit tenant form (hidden by default) -->
<div id="addTenantForm" class="scan-form" hidden> <div id="addTenantForm" class="scan-form" hidden>
<h3>New Tenant</h3> <h3>New Tenant</h3>
<!-- Automated onboarding -->
<div id="tenantSetupAutomated" class="setup-note" hidden> <div id="tenantSetupAutomated" class="setup-note" hidden>
<h3>Azure App Setup (automated)</h3> <h3>Azure App Setup (automated)</h3>
<p>Connect to the customer's Microsoft tenant, then create a dedicated scan app automatically.</p> <p>Connect to the customer's Microsoft tenant, then create a dedicated scan app automatically.</p>
<ul> <ul>
<li>Click <strong>Connect Microsoft</strong> and approve admin consent for the customer tenant</li> <li>Click <strong>Connect Microsoft</strong> and approve admin consent.</li>
<li>Created scan app receives SharePoint application permission: <code>Sites.FullControl.All</code></li> <li>Created scan app receives SharePoint <code>Sites.FullControl.All</code> with admin consent.</li>
<li>For mailbox scanning, the <strong>Exchange.ManageAsApp</strong> permission and <strong>Exchange Administrator</strong> Entra role must be added manually after creation — see the <em>Enable mailbox scanning</em> section below.</li>
</ul> </ul>
<form id="onboardingForm" class="onboarding-form" action="#" method="post"> <form id="onboardingForm" class="onboarding-form" action="#" method="post">
<div class="onboarding-grid"> <div class="onboarding-grid">
@ -87,21 +116,34 @@
</form> </form>
</div> </div>
<!-- Manual onboarding -->
<div id="tenantSetupManual" class="setup-note" hidden> <div id="tenantSetupManual" class="setup-note" hidden>
<h3>Azure App Setup (manual)</h3> <h3>Azure App Setup (manual)</h3>
<p>Create a dedicated Azure app registration in the customer's tenant and grant it SharePoint access.</p> <p>Create a dedicated Azure app registration in the customer's tenant.</p>
<ol class="setup-steps"> <ol class="setup-steps">
<li>Open <strong>Azure Portal</strong> and go to <strong>Entra ID &rarr; App registrations &rarr; New registration</strong>.</li> <li>Open <strong>Azure Portal</strong> and go to <strong>Entra ID → App registrations → New registration</strong>.</li>
<li>Fill in a name (e.g. <em>Clearview Scan App</em>), select <strong>Single tenant</strong>, click <strong>Register</strong>.</li> <li>Pick a name (e.g. <em>Clearview Scan App</em>), select <strong>Single tenant</strong>, click <strong>Register</strong>.</li>
<li>Copy the <strong>Directory (tenant) ID</strong> and <strong>Application (client) ID</strong> from the Overview page.</li> <li>Copy <strong>Directory (tenant) ID</strong> and <strong>Application (client) ID</strong>.</li>
<li>Go to <strong>API permissions &rarr; Add &rarr; SharePoint &rarr; Application permissions</strong>, add <code>Sites.FullControl.All</code>.</li> <li>For SharePoint: <strong>API permissions → Add a permission → SharePoint → Application permissions</strong>, select <code>Sites.FullControl.All</code>, then click <strong>Grant admin consent</strong>.</li>
<li>Click <strong>Grant admin consent</strong>.</li> <li>For group resolution (recommended): also add <strong>Microsoft Graph → Application permissions → <code>Group.Read.All</code></strong> and grant admin consent. This lets Clearview expand Microsoft 365 / Azure AD security groups to their members and owners during the <em>Resolve groups</em> action. Without it, M365 group entries are kept as a single line.</li>
<li>Go to <strong>Certificates &amp; secrets &rarr; New client secret</strong>, copy the <strong>Value</strong> immediately.</li> <li>The primary domain is the tenant's default Microsoft 365 domain — typically <code>&lt;tenantname&gt;.onmicrosoft.com</code>. Find it in <strong>Microsoft 365 admin center → Settings → Domains</strong> (the <em>Default</em> entry).</li>
</ol> </ol>
</div> </div>
<!-- Tenant fields --> <div id="tenantSetupMailbox" class="setup-note">
<h3>Enable mailbox scanning (Exchange Online)</h3>
<p>Mailbox scanning needs additional permissions on the scan app, on top of the SharePoint setup. Skip this section if the tenant only needs SharePoint scans.</p>
<ol class="setup-steps">
<li><strong>Add the API permission.</strong> Azure Portal → <strong>Entra ID → App registrations → [your scan app] → API permissions → Add a permission → APIs my organization uses</strong>. Search for <em>Office 365 Exchange Online</em>, choose <strong>Application permissions</strong> and tick <code>Exchange.ManageAsApp</code>. Click <strong>Add permissions</strong>.</li>
<li><strong>Grant admin consent.</strong> Still on the API permissions page, click <strong>Grant admin consent for &lt;tenant&gt;</strong>. Verify the status column shows <em>Granted for &lt;tenant&gt;</em>.</li>
<li><strong>Assign the Exchange Administrator role.</strong> Entra ID → <strong>Roles and administrators</strong> → search <em>Exchange Administrator</em> → click the role → <strong>Add assignments</strong> → search the scan app by name (you'll need to switch the picker to include <em>Service principals / Apps</em>) → select it and confirm. This role lets the app read mailbox permissions. Clearview's automated onboarding does not assign it, so it has to be added here in the portal.</li>
<li><strong>Generate a certificate.</strong> Save the tenant first using the tenant form below, then use the <strong>Certificate</strong> button in the Tenants table to generate a self-signed RSA-2048 key. The public PEM appears in a panel; click <strong>Download .cer</strong>.</li>
<li><strong>Upload the certificate to Azure.</strong> Back in the scan app, go to <strong>Certificates &amp; secrets → Certificates → Upload certificate</strong>, pick the downloaded <code>.cer</code> file, and confirm. Azure shows the SHA-1 thumbprint — it must match the one shown in the Tenants table.</li>
<li><strong>Fill in the Primary Domain field</strong> on the tenant form (e.g. <code>contoso.onmicrosoft.com</code>). Clearview uses this for <code>Connect-ExchangeOnline -Organization</code> and to auto-fill the Mailbox scan form.</li>
<li><strong>Test the connection.</strong> Run a <em>Scan all mailboxes</em> job for this tenant; the preflight check on the first target confirms that authentication works end to end.</li>
</ol>
<p class="setup-hint">Exchange Online does <strong>not</strong> support client-secret app-only authentication. Mailbox scans require a certificate. The same certificate is reused for SharePoint scans, so generating it once is enough.</p>
</div>
<div class="auth-grid"> <div class="auth-grid">
<label class="onboarding-wide"> <label class="onboarding-wide">
Tenant Name (label for your reference) Tenant Name (label for your reference)
@ -111,12 +153,16 @@
Tenant ID Tenant ID
<input id="newTenantTenantId" type="text" placeholder="00000000-0000-0000-0000-000000000000"> <input id="newTenantTenantId" type="text" placeholder="00000000-0000-0000-0000-000000000000">
</label> </label>
<label>
Primary Domain <span style="font-weight:400;font-size:0.82rem">(used by mailbox scanning, e.g. contoso.onmicrosoft.com)</span>
<input id="newTenantPrimaryDomain" type="text" placeholder="contoso.onmicrosoft.com">
</label>
<label> <label>
Client ID Client ID
<input id="newTenantClientId" type="text" placeholder="00000000-0000-0000-0000-000000000000"> <input id="newTenantClientId" type="text" placeholder="00000000-0000-0000-0000-000000000000">
</label> </label>
<label class="auth-secret"> <label class="auth-secret">
Client Secret <span style="font-weight:400;font-size:0.82rem">(optional — not needed when using a certificate)</span> Client Secret <span style="font-weight:400;font-size:0.82rem">(optional — not needed when using a certificate; not supported for mailbox scans)</span>
<input id="newTenantClientSecret" type="password" placeholder="Leave empty if you will generate a certificate"> <input id="newTenantClientSecret" type="password" placeholder="Leave empty if you will generate a certificate">
</label> </label>
</div> </div>
@ -127,7 +173,6 @@
</div> </div>
</div> </div>
<!-- Tenants table -->
<div class="table-wrap"> <div class="table-wrap">
<table> <table>
<thead> <thead>
@ -148,10 +193,9 @@
<div id="tenantFeedback" class="feedback" aria-live="polite"></div> <div id="tenantFeedback" class="feedback" aria-live="polite"></div>
<!-- Certificate display block (shown after generation) -->
<div id="certBlock" class="cert-block" hidden> <div id="certBlock" class="cert-block" hidden>
<h3>Public Certificate</h3> <h3>Public Certificate</h3>
<p>Upload this certificate in <strong>Azure Portal &rarr; App registrations &rarr; [your app] &rarr; Certificates &amp; secrets &rarr; Certificates &rarr; Upload certificate</strong>.</p> <p>Upload this certificate in <strong>Azure Portal → App registrations → [your app] → Certificates &amp; secrets → Certificates → Upload certificate</strong>.</p>
<textarea id="certPem" class="cert-pem" rows="10" readonly></textarea> <textarea id="certPem" class="cert-pem" rows="10" readonly></textarea>
<div class="form-actions"> <div class="form-actions">
<button id="downloadCertBtn" class="btn btn-solid" type="button">Download .cer</button> <button id="downloadCertBtn" class="btn btn-solid" type="button">Download .cer</button>
@ -159,30 +203,46 @@
<button id="closeCertBtn" class="btn btn-outline" type="button">Close</button> <button id="closeCertBtn" class="btn btn-outline" type="button">Close</button>
</div> </div>
</div> </div>
</div>
</section> </section>
<!-- ------------------------------------------------------------------ --> <!-- =================================================================== -->
<!-- Start New Scan panel --> <!-- Route: Scan SharePoint -->
<!-- ------------------------------------------------------------------ --> <!-- =================================================================== -->
<section class="panel fade-up" style="--delay: 0.17s"> <section class="route-page" data-route-page="scan-sharepoint" hidden>
<div class="panel">
<div class="panel-header split"> <div class="panel-header split">
<h2>Start New Scan</h2> <h2>New SharePoint Scan</h2>
<span class="badge">Async job queue</span> <span class="badge">SharePoint</span>
</div>
<div class="scan-form auth-block">
<h3>Scan mode</h3>
<label>
What to collect
<select id="sharepointScanMode">
<option value="sharepoint">Deviations from root (libraries, folders, files)</option>
<option value="sharepoint_root">Root permissions only (site-level role assignments)</option>
</select>
</label>
<p class="setup-hint">
<strong>Deviations from root</strong> traverses every document library and reports only permissions that
differ from the site root baseline. <strong>Root permissions only</strong> lists the role assignments
on the site root itself — much faster, useful for an inventory of who has site-level access.
</p>
</div> </div>
<!-- Tenant selector -->
<div class="scan-form auth-block"> <div class="scan-form auth-block">
<h3>Tenant</h3> <h3>Tenant</h3>
<label> <label>
Select Tenant Profile Select Tenant Profile
<select id="scanTenantSelect"> <select id="scanTenantSelect" data-shared-tenant-select>
<option value="">-- Select a tenant --</option> <option value="">-- Select a tenant --</option>
<option value="__manual__">Manual credentials...</option> <option value="__manual__">Manual credentials...</option>
</select> </select>
</label> </label>
</div> </div>
<!-- Manual credentials (only shown when __manual__ selected) -->
<div id="manualCredentialsBlock" class="scan-form auth-block" hidden> <div id="manualCredentialsBlock" class="scan-form auth-block" hidden>
<h3>Microsoft App Credentials</h3> <h3>Microsoft App Credentials</h3>
<div class="auth-grid"> <div class="auth-grid">
@ -230,15 +290,148 @@
</div> </div>
<div id="submitFeedback" class="feedback" aria-live="polite"></div> <div id="submitFeedback" class="feedback" aria-live="polite"></div>
</div>
</section> </section>
<!-- ------------------------------------------------------------------ --> <!-- =================================================================== -->
<!-- Scan Jobs panel --> <!-- Route: Scan Mailbox -->
<!-- ------------------------------------------------------------------ --> <!-- =================================================================== -->
<section class="panel fade-up" style="--delay: 0.23s"> <section class="route-page" data-route-page="scan-mailbox" hidden>
<div class="panel">
<div class="panel-header split">
<h2>New Mailbox Scan</h2>
<span class="badge">Exchange Online</span>
</div>
<div class="scan-form auth-block">
<h3>Tenant</h3>
<label>
Select Tenant Profile
<select id="mailboxScanTenantSelect" data-shared-tenant-select>
<option value="">-- Select a tenant --</option>
</select>
</label>
<p class="setup-hint">
Mailbox scanning requires a certificate on the tenant profile, the
<code>Exchange.ManageAsApp</code> application permission, and the Exchange Administrator role assigned to the scan app.
Client-secret authentication is not supported for Exchange Online.
</p>
</div>
<div class="form-grid">
<form id="manualMailboxForm" class="scan-form" action="#" method="post">
<h3>Manual UPNs</h3>
<label>
User Principal Names (one per line)
<textarea id="manualMailboxes" rows="6" placeholder="alice@contoso.com&#10;bob@contoso.com"></textarea>
</label>
<button class="btn btn-solid" type="submit">Queue mailbox scan</button>
</form>
<form id="csvMailboxForm" class="scan-form" action="#" method="post" enctype="multipart/form-data">
<h3>CSV Import</h3>
<label>
CSV with <code>UserPrincipalName</code> / <code>Email</code> column
<input id="csvMailboxFile" type="file" accept=".csv,text/csv">
</label>
<button class="btn btn-solid" type="submit">Queue CSV scan</button>
</form>
<form id="allMailboxesForm" class="scan-form" action="#" method="post">
<h3>All mailboxes in tenant</h3>
<label>
Organization (primary tenant domain)
<input id="allMailboxesOrg" type="text" placeholder="contoso.onmicrosoft.com">
</label>
<p class="setup-hint">
Clearview enumerates every mailbox in the tenant via <code>Get-EXOMailbox -ResultSize Unlimited</code>
and queues one target per mailbox. Enumeration can take 10 to 60 seconds for large tenants.
</p>
<button class="btn btn-solid" type="submit">Queue scan for all mailboxes</button>
</form>
</div>
<div id="mailboxSubmitFeedback" class="feedback" aria-live="polite"></div>
</div>
</section>
<!-- =================================================================== -->
<!-- Route: Scan Entra Groups -->
<!-- =================================================================== -->
<section class="route-page" data-route-page="scan-entra" hidden>
<div class="panel">
<div class="panel-header split">
<h2>New Entra Group Scan</h2>
<span class="badge">Microsoft Graph</span>
</div>
<div class="scan-form auth-block">
<h3>Tenant</h3>
<label>
Select Tenant Profile
<select id="entraScanTenantSelect">
<option value="">-- Select a tenant --</option>
</select>
</label>
<p class="setup-hint">
Entra group scans use the <strong>Microsoft Graph</strong> API. The scan app needs the
Application permission <code>Group.Read.All</code> with admin consent. Authentication
uses the same tenant certificate as SharePoint and Mailbox scans.
</p>
</div>
<div class="form-grid">
<form id="manualEntraForm" class="scan-form" action="#" method="post">
<h3>Manual Object IDs</h3>
<label>
Group identifiers (one per line — Object ID, mail address, or display name)
<textarea id="manualEntraIds" rows="6" placeholder="00000000-0000-0000-0000-000000000000&#10;Pharmacology@contoso.onmicrosoft.com"></textarea>
</label>
<button class="btn btn-solid" type="submit">Queue Entra scan</button>
</form>
<form id="csvEntraForm" class="scan-form" action="#" method="post" enctype="multipart/form-data">
<h3>CSV Import (Entra export)</h3>
<label>
CSV with <code>Object ID</code> column (Entra "Groups" export)
<input id="csvEntraFile" type="file" accept=".csv,text/csv">
</label>
<p class="setup-hint">
Export from Entra portal → Groups → All groups → Download. Clearview reads the
<code>Object ID</code> / <code>id</code> column; other columns are ignored.
</p>
<button class="btn btn-solid" type="submit">Queue CSV scan</button>
</form>
<form id="allEntraForm" class="scan-form" action="#" method="post">
<h3>All groups in tenant</h3>
<p class="setup-hint">
Enumerates every group in the tenant (any type) via Microsoft Graph and queues one
target per group. Enumeration can take 30 to 120 seconds for large tenants.
</p>
<button class="btn btn-solid" type="submit">Queue scan for all groups</button>
</form>
</div>
<div id="entraSubmitFeedback" class="feedback" aria-live="polite"></div>
</div>
</section>
<!-- =================================================================== -->
<!-- Route: Jobs (list + selected job details) -->
<!-- =================================================================== -->
<section class="route-page" data-route-page="jobs" hidden>
<div class="panel">
<div class="panel-header split"> <div class="panel-header split">
<h2>Scan Jobs</h2> <h2>Scan Jobs</h2>
<div class="panel-header-right"> <div class="panel-header-right">
<select id="jobTypeFilter" class="filter-select">
<option value="">All types</option>
<option value="sharepoint">SharePoint deviations</option>
<option value="sharepoint_root">SharePoint root</option>
<option value="mailbox">Mailbox</option>
<option value="entra_groups">Entra groups</option>
</select>
<select id="jobTenantFilter" class="filter-select"> <select id="jobTenantFilter" class="filter-select">
<option value="">All tenants</option> <option value="">All tenants</option>
</select> </select>
@ -250,6 +443,7 @@
<thead> <thead>
<tr> <tr>
<th>Job ID</th> <th>Job ID</th>
<th>Type</th>
<th>Tenant</th> <th>Tenant</th>
<th>Source</th> <th>Source</th>
<th>Status</th> <th>Status</th>
@ -260,21 +454,18 @@
</tr> </tr>
</thead> </thead>
<tbody id="jobsTableBody"> <tbody id="jobsTableBody">
<tr><td colspan="8">No jobs yet.</td></tr> <tr><td colspan="9">No jobs yet.</td></tr>
</tbody> </tbody>
</table> </table>
</div> </div>
</section> </div>
<!-- ------------------------------------------------------------------ --> <div class="panel">
<!-- Selected Job Details panel -->
<!-- ------------------------------------------------------------------ -->
<section class="panel fade-up" style="--delay: 0.29s">
<div class="panel-header split"> <div class="panel-header split">
<h2>Selected Job Details</h2> <h2>Selected Job Details</h2>
<div class="panel-header-right"> <div class="panel-header-right">
<select id="jobSiteFilter" class="filter-select"> <select id="jobSiteFilter" class="filter-select">
<option value="">All sites</option> <option value="">All targets</option>
</select> </select>
<button id="exportJobBtn" class="btn btn-outline" type="button" hidden>Export Excel</button> <button id="exportJobBtn" class="btn btn-outline" type="button" hidden>Export Excel</button>
<span id="selectedJobId" class="badge">No selection</span> <span id="selectedJobId" class="badge">No selection</span>
@ -284,19 +475,21 @@
<div id="jobSummary" class="job-summary">Select a job to inspect targets and deviations.</div> <div id="jobSummary" class="job-summary">Select a job to inspect targets and deviations.</div>
<div id="jobActivity" class="job-activity" hidden></div> <div id="jobActivity" class="job-activity" hidden></div>
<h3 class="subheading">Targets</h3> <h3 class="subheading" id="targetsHeading">Targets</h3>
<div class="table-wrap compact-wrap"> <div class="table-wrap compact-wrap">
<table> <table>
<thead> <thead id="targetsTableHead">
<tr> <tr>
<th>URL</th> <th>URL</th>
<th>Status</th> <th>Status</th>
<th>Attempts</th> <th>Attempts</th>
<th>Error</th> <th>Error</th>
<th>Connection test</th>
<th></th>
</tr> </tr>
</thead> </thead>
<tbody id="targetsTableBody"> <tbody id="targetsTableBody">
<tr><td colspan="4">No job selected.</td></tr> <tr><td colspan="6">No job selected.</td></tr>
</tbody> </tbody>
</table> </table>
</div> </div>
@ -311,10 +504,25 @@
<div id="resolveFeedback" class="feedback" aria-live="polite"></div> <div id="resolveFeedback" class="feedback" aria-live="polite"></div>
</div> </div>
<div id="resolveGroupsBlock" hidden>
<h3 class="subheading">Resolve SharePoint Groups</h3>
<p class="resolve-hint">
Expand SharePoint groups (Owners / Members / Visitors / custom site groups) to the underlying
user list. When a member is itself a Microsoft 365 / Azure AD group, Clearview recursively
expands it via Microsoft Graph (members + owners, depth 3) — requires
<code>Group.Read.All</code> on Microsoft Graph for that tenant. Without that permission the
M365 group lines stay collapsed. Members are written to the deviation rows and Excel export.
</p>
<div class="form-actions" style="margin-top:0.6rem">
<button id="resolveGroupsBtn" class="btn btn-outline" type="button">Resolve groups</button>
</div>
<div id="resolveGroupsFeedback" class="feedback" aria-live="polite"></div>
</div>
<h3 class="subheading">Permission Deviations</h3> <h3 class="subheading">Permission Deviations</h3>
<div class="table-wrap deviations-wrap"> <div class="table-wrap deviations-wrap">
<table> <table>
<thead> <thead id="deviationsTableHead">
<tr> <tr>
<th>Site</th> <th>Site</th>
<th>Object</th> <th>Object</th>
@ -329,6 +537,19 @@
</tbody> </tbody>
</table> </table>
</div> </div>
</div>
</section>
<!-- =================================================================== -->
<!-- Route: Settings (placeholder) -->
<!-- =================================================================== -->
<section class="route-page" data-route-page="settings" hidden>
<div class="panel">
<div class="panel-header split">
<h2>Settings</h2>
</div>
<p class="setup-hint">Runtime configuration is currently controlled via environment variables in <code>stack/.env</code>. See the <strong>TECHNICAL.md</strong> document for the full list (timeouts, retries, scan caps, onboarding).</p>
</div>
</section> </section>
</main> </main>
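The "Upload the certificate to Azure" step above says the SHA-1 thumbprint Azure displays must match the one shown in the Tenants table. A minimal sketch of that comparison, assuming both values are the uppercase-hex SHA-1 digest of the certificate's DER encoding (the helper names `sha1_thumbprint` and `thumbprints_match` are hypothetical, not part of Clearview), needs only the standard library:

```python
import hashlib
import ssl


def sha1_thumbprint(pem_cert: str) -> str:
    """Return the SHA-1 thumbprint of a PEM certificate as uppercase hex."""
    # PEM_cert_to_DER_cert checks the BEGIN/END CERTIFICATE lines and
    # base64-decodes the body; it does not parse the certificate itself.
    der = ssl.PEM_cert_to_DER_cert(pem_cert.strip())
    # The thumbprint is the SHA-1 digest of the DER bytes.
    return hashlib.sha1(der).hexdigest().upper()


def thumbprints_match(pem_cert: str, displayed: str) -> bool:
    """Compare against a thumbprint copied from a UI, ignoring case and separators."""
    cleaned = displayed.replace(":", "").replace(" ", "").upper()
    return sha1_thumbprint(pem_cert) == cleaned
```

The sketch only re-encodes and hashes; it does not validate the certificate's contents, chain, or expiry.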


@ -642,3 +642,157 @@ strong {
flex: 1; flex: 1;
} }
} }
/* ===========================================================================
Sidebar layout (added in mailbox-scanning refactor)
=========================================================================== */
.app-shell {
display: grid;
grid-template-columns: 220px 1fr;
min-height: 100vh;
}
.sidebar {
background: linear-gradient(180deg, #0f1d33 0%, #0b1424 100%);
color: #e6edf7;
display: flex;
flex-direction: column;
padding: 0;
position: sticky;
top: 0;
height: 100vh;
border-right: 1px solid rgba(255, 255, 255, 0.06);
}
.sidebar-brand {
height: 64px;
display: flex;
align-items: center;
padding: 0 1rem;
border-bottom: 1px solid rgba(255, 255, 255, 0.08);
}
.sidebar-brand .brand-logo {
height: 36px;
filter: brightness(1.05) saturate(1.1);
}
.sidebar-nav {
flex: 1;
display: flex;
flex-direction: column;
gap: 0.15rem;
padding: 0.75rem 0.5rem;
overflow-y: auto;
}
.sidebar-nav .nav-link {
display: block;
padding: 0.5rem 0.75rem;
border-radius: 8px;
color: rgba(230, 237, 247, 0.85);
text-decoration: none;
font-size: 0.9rem;
font-weight: 500;
transition: background-color 0.12s ease, color 0.12s ease;
}
.sidebar-nav .nav-link:hover {
background: rgba(255, 255, 255, 0.06);
color: #ffffff;
}
.sidebar-nav .nav-link.active {
background: rgba(14, 165, 233, 0.18);
color: #ffffff;
box-shadow: inset 2px 0 0 var(--cv-accent);
}
.sidebar-nav .nav-section {
padding: 0.65rem 0.75rem 0.25rem;
font-size: 0.7rem;
font-weight: 700;
letter-spacing: 0.08em;
text-transform: uppercase;
color: rgba(230, 237, 247, 0.45);
}
.sidebar-nav .nav-spacer {
flex: 1;
min-height: 1rem;
}
.sidebar-foot {
padding: 0.75rem 1rem;
border-top: 1px solid rgba(255, 255, 255, 0.08);
font-size: 0.78rem;
color: rgba(230, 237, 247, 0.55);
}
.content {
display: flex;
flex-direction: column;
min-width: 0;
padding: 0 1.25rem 2rem;
}
.content-topbar {
display: flex;
align-items: center;
justify-content: space-between;
height: 64px;
border-bottom: 1px solid var(--cv-border);
margin-bottom: 1rem;
}
.content-title {
font-family: "Space Grotesk", sans-serif;
font-size: 1.15rem;
font-weight: 600;
}
.content-actions {
display: flex;
gap: 0.5rem;
}
.route-page {
display: flex;
flex-direction: column;
gap: 1rem;
}
.route-page[hidden] {
display: none !important;
}
.setup-hint {
font-size: 0.85rem;
color: var(--cv-text-secondary);
margin: 0.5rem 0 0;
}
@media (max-width: 900px) {
.app-shell {
grid-template-columns: 1fr;
}
.sidebar {
position: static;
height: auto;
flex-direction: row;
flex-wrap: wrap;
}
.sidebar-nav {
flex-direction: row;
flex-wrap: wrap;
overflow-x: auto;
}
.sidebar-nav .nav-section,
.sidebar-nav .nav-spacer {
display: none;
}
.sidebar-foot {
display: none;
}
}
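The "Resolve SharePoint Groups" hint in the markup above says nested Microsoft 365 / Azure AD groups are expanded recursively via Microsoft Graph (members and owners, depth 3). That resolver is not part of this diff, so the following is only a sketch of depth-limited expansion with cycle protection. `Principal`, `expand_group` and the `fetch_members` callable (a stand-in for the Graph members/owners calls) are hypothetical, and counting the starting group as depth 1 is an assumption.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    id: str
    display_name: str
    is_group: bool


def expand_group(
    group_id: str,
    fetch_members: Callable[[str], Iterable[Principal]],
    max_depth: int = 3,
) -> list[Principal]:
    """Flatten a group into its users, expanding nested groups up to max_depth.

    fetch_members is a hypothetical stand-in for the Graph members/owners calls.
    """
    found: dict[str, Principal] = {}  # keyed by id, so duplicates collapse
    visited: set[str] = set()

    def walk(current: str, depth: int) -> None:
        visited.add(current)
        for member in fetch_members(current):
            if not member.is_group:
                found.setdefault(member.id, member)
            elif member.id in visited:
                continue  # membership cycle or a group already expanded
            elif depth < max_depth:
                walk(member.id, depth + 1)
            else:
                # Depth cap reached: keep the nested group as a single line.
                found.setdefault(member.id, member)

    walk(group_id, 1)
    return list(found.values())
```

Tracking visited groups matters because Entra allows a group to appear on several paths of a nesting tree; without it the same group could be fetched repeatedly, or forever if a cycle exists.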


@ -2,13 +2,18 @@ from __future__ import annotations
import csv import csv
import io import io
import re
from .default_sites import normalize_site_url from .default_sites import normalize_site_url
_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$")
class CsvImportResult: class CsvImportResult:
def __init__(self) -> None: def __init__(self) -> None:
self.urls: list[str] = [] self.urls: list[str] = []
self.mailboxes: list[str] = []
self.invalid_rows: list[str] = [] self.invalid_rows: list[str] = []
self.total_rows: int = 0 self.total_rows: int = 0
@ -22,7 +27,7 @@ def parse_sites_csv(content: bytes) -> CsvImportResult:
if not reader.fieldnames: if not reader.fieldnames:
return result return result
url_key = _resolve_url_column(reader.fieldnames) url_key = _resolve_column(reader.fieldnames, ("url", "site url", "siteurl"))
if not url_key: if not url_key:
return result return result
@ -49,9 +54,80 @@ def parse_sites_csv(content: bytes) -> CsvImportResult:
return result return result
def _resolve_url_column(fieldnames: list[str]) -> str | None: def parse_entra_groups_csv(content: bytes) -> CsvImportResult:
result = CsvImportResult()
text = content.decode("utf-8-sig", errors="replace")
reader = csv.DictReader(io.StringIO(text))
if not reader.fieldnames:
return result
id_key = _resolve_column(
reader.fieldnames,
("object id", "objectid", "id", "objectguid", "object_id"),
)
if not id_key:
return result
seen: set[str] = set()
for idx, row in enumerate(reader, start=2):
result.total_rows += 1
raw = (row.get(id_key) or "").strip()
if not raw:
result.invalid_rows.append(f"row {idx}: empty Object ID")
continue
normalized = raw.lower()
if normalized in seen:
continue
seen.add(normalized)
result.urls.append(normalized)
return result
def parse_mailboxes_csv(content: bytes) -> CsvImportResult:
result = CsvImportResult()
text = content.decode("utf-8-sig", errors="replace")
reader = csv.DictReader(io.StringIO(text))
if not reader.fieldnames:
return result
upn_key = _resolve_column(
reader.fieldnames,
("userprincipalname", "upn", "email", "emailaddress", "mail", "mailbox", "primary smtp address"),
)
if not upn_key:
return result
seen: set[str] = set()
for idx, row in enumerate(reader, start=2):
result.total_rows += 1
raw = (row.get(upn_key) or "").strip()
if not raw:
result.invalid_rows.append(f"row {idx}: empty mailbox")
continue
normalized = raw.lower()
if not _EMAIL_RE.match(normalized):
result.invalid_rows.append(f"row {idx}: invalid mailbox '{raw}'")
continue
if normalized in seen:
continue
seen.add(normalized)
result.mailboxes.append(normalized)
return result
def _resolve_column(fieldnames: list[str], candidates: tuple[str, ...]) -> str | None:
mapping = {name.strip().lower(): name for name in fieldnames} mapping = {name.strip().lower(): name for name in fieldnames}
for candidate in ("url", "site url", "siteurl"): for candidate in candidates:
if candidate in mapping: if candidate in mapping:
return mapping[candidate] return mapping[candidate]
return None return None
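The manual Entra form accepts an Object ID, a mail address, or a display name per line, while `parse_entra_groups_csv` above lowercases every non-empty value without checking its shape. How the backend tells the three manual forms apart is not visible in this diff; a minimal sketch of one way to do it, using a hypothetical helper `classify_group_identifier` and the same email pattern as `_EMAIL_RE`:

```python
import re
import uuid

# Same pattern as _EMAIL_RE in csv_import.py.
_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$")


def classify_group_identifier(value: str) -> tuple[str, str]:
    """Return (kind, normalized) for one line of the manual Entra group form."""
    raw = value.strip()
    if not raw:
        return "empty", ""
    try:
        # uuid.UUID accepts hyphenated, braced, or bare 32-digit hex;
        # str() gives the canonical lowercase hyphenated form.
        return "object_id", str(uuid.UUID(raw))
    except ValueError:
        pass
    if _EMAIL_RE.match(raw):
        return "mail", raw.lower()
    # Display names are not unique in Entra, so keep them verbatim for lookup.
    return "display_name", raw
```

Checking for a GUID first means a display name consisting of exactly 32 hex digits would be classified as an Object ID.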


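In the main.py hunks below, `_extract_sharing_link_group_and_type` lets `resolve_sharing_links_endpoint` key its groups on the bare `SharingLinks.<guid>.<LinkType>.<guid>` name instead of the raw principal, so a claims-encoded principal and its bare form are resolved once. The sketch restates that parsing rule in condensed form to show the effect. `split_sharing_link`, `group_deviations` and `count_link_types` are hypothetical names, and rows are plain `(id, site_url, principal)` tuples rather than ORM objects.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def split_sharing_link(principal: str) -> tuple[str, str] | None:
    """Return (group_name, link_type) for a SharingLinks principal, else None."""
    # Claims-encoded principals carry the group name in a later "|" segment,
    # so scan segments from the end, as the helper in main.py does.
    for segment in reversed(principal.strip().split("|")):
        segment = segment.strip()
        if segment.lower().startswith("sharinglinks."):
            parts = segment.split(".")
            # parts: ["SharingLinks", <guid>, <LinkType>, <guid>]
            return (segment, parts[2]) if len(parts) >= 3 else None
    return None


def group_deviations(
    rows: Iterable[tuple[int, str, str]],
    wanted_types: set[str],
) -> dict[tuple[str, str], list[int]]:
    """Map (site_url, group_name) to the deviation ids that share that link."""
    groups: dict[tuple[str, str], list[int]] = {}
    for row_id, site_url, principal in rows:
        parsed = split_sharing_link(principal)
        if parsed is None or parsed[1] not in wanted_types:
            continue
        groups.setdefault((site_url, parsed[0]), []).append(row_id)
    return groups


def count_link_types(principals: Iterable[str]) -> Counter[str]:
    """Tally link types, like the sharing-link-types endpoint does."""
    parsed = (split_sharing_link(p) for p in principals)
    return Counter(link_type for _name, link_type in filter(None, parsed))
```

Because both principal spellings map to the same key, each sharing link's members are looked up once per site and then applied to every matching deviation id.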
@ -1,18 +1,18 @@
from __future__ import annotations from __future__ import annotations
import io
import re
import uuid import uuid
from datetime import datetime from datetime import datetime
from pathlib import Path from pathlib import Path
import io
from fastapi import FastAPI, File, Form, HTTPException, UploadFile from fastapi import FastAPI, File, Form, HTTPException, UploadFile
from fastapi.responses import FileResponse, RedirectResponse, Response, StreamingResponse from fastapi.responses import FileResponse, RedirectResponse, Response, StreamingResponse
from fastapi.staticfiles import StaticFiles from fastapi.staticfiles import StaticFiles
from sqlalchemy import select, text from sqlalchemy import select, text
from sqlalchemy.orm import joinedload from sqlalchemy.orm import joinedload
from .csv_import import parse_sites_csv from .csv_import import parse_entra_groups_csv, parse_mailboxes_csv, parse_sites_csv
from .db import SessionLocal, engine from .db import SessionLocal, engine
from .default_sites import is_default_site, normalize_site_url from .default_sites import is_default_site, normalize_site_url
from .models import Base, PermissionDeviation, ScanJob, ScanTarget, TenantProfile from .models import Base, PermissionDeviation, ScanJob, ScanTarget, TenantProfile
@ -25,8 +25,11 @@ from .schemas import (
CreateScanJobRequest, CreateScanJobRequest,
CreateTenantProfileRequest, CreateTenantProfileRequest,
PermissionDeviationItem, PermissionDeviationItem,
ProbeResultResponse,
ResolveGroupsResponse,
ResolveSharingLinksRequest, ResolveSharingLinksRequest,
ResolveSharingLinksResponse, ResolveSharingLinksResponse,
SharingLinkTypesResponse,
ScanJobCreateResponse, ScanJobCreateResponse,
ScanJobDetail, ScanJobDetail,
ScanJobSummary, ScanJobSummary,
@ -34,7 +37,7 @@ from .schemas import (
TenantCertificateResponse, TenantCertificateResponse,
TenantProfileItem, TenantProfileItem,
) )
from .scanner import AuthConfig from .scanners import AuthConfig, probe
from .worker import ScanWorker from .worker import ScanWorker
app = FastAPI(title="Clearview API", version="0.1.0") app = FastAPI(title="Clearview API", version="0.1.0")
@ -43,6 +46,34 @@ worker = ScanWorker()
SITE_DIR = Path(__file__).resolve().parents[2] / "site" SITE_DIR = Path(__file__).resolve().parents[2] / "site"
def _extract_sharing_link_group_and_type(principal: str) -> tuple[str, str] | None:
"""
Extract (group_name, link_type) from principal values such as:
- SharingLinks.<guid>.<LinkType>.<guid>
- c:0o.c|federateddirectoryclaimprovider|SharingLinks.<guid>.<LinkType>.<guid>
"""
if not principal:
return None
text = principal.strip()
segments = [s.strip() for s in text.split("|") if s.strip()]
candidate = ""
for segment in reversed(segments):
if segment.lower().startswith("sharinglinks."):
candidate = segment
break
if not candidate and text.lower().startswith("sharinglinks."):
candidate = text
if not candidate:
return None
parts = candidate.split(".")
if len(parts) < 3:
return None
return candidate, parts[2]
@app.on_event("startup") @app.on_event("startup")
def on_startup() -> None: def on_startup() -> None:
Base.metadata.create_all(bind=engine) Base.metadata.create_all(bind=engine)
@ -81,6 +112,7 @@ def create_tenant(payload: CreateTenantProfileRequest) -> TenantProfileItem:
id=str(uuid.uuid4()), id=str(uuid.uuid4()),
name=payload.name.strip(), name=payload.name.strip(),
tenant_id=payload.tenant_id.strip(), tenant_id=payload.tenant_id.strip(),
primary_domain=payload.primary_domain.strip().lower() if payload.primary_domain else None,
client_id=payload.client_id.strip(), client_id=payload.client_id.strip(),
client_secret=payload.client_secret.strip() if payload.client_secret else None, client_secret=payload.client_secret.strip() if payload.client_secret else None,
created_at=now, created_at=now,
@ -100,6 +132,7 @@ def generate_certificate(profile_id: str) -> TenantCertificateResponse:
raise HTTPException(status_code=404, detail="Tenant profile not found") raise HTTPException(status_code=404, detail="Tenant profile not found")
result = generate_tenant_certificate() result = generate_tenant_certificate()
profile.cert_private_key = result.private_key_pem profile.cert_private_key = result.private_key_pem
profile.cert_public_pem = result.public_cert_pem
profile.cert_thumbprint = result.thumbprint profile.cert_thumbprint = result.thumbprint
profile.cert_expires_at = result.expires_at profile.cert_expires_at = result.expires_at
profile.updated_at = datetime.utcnow() profile.updated_at = datetime.utcnow()
@ -141,11 +174,43 @@ def create_scan_job(payload: CreateScanJobRequest) -> ScanJobCreateResponse:
client_id=payload.client_id, client_id=payload.client_id,
client_secret=payload.client_secret, client_secret=payload.client_secret,
) )
raw_urls = [str(item) for item in payload.site_urls] source_type = "manual"
return _create_job_from_urls( if payload.scan_type == "entra_groups":
raw_urls=raw_urls, if payload.scan_all_groups:
raw_targets = _enumerate_all_entra_groups(
tenant_id=tenant_id,
client_id=client_id,
client_secret=client_secret,
profile_id=profile_id,
)
source_type = "tenant_all"
else:
raw_targets = [str(g) for g in payload.group_ids]
elif payload.scan_type == "mailbox":
if payload.scan_all_mailboxes:
organization = payload.organization
if (not organization) and profile_id:
with SessionLocal() as db:
profile = db.get(TenantProfile, profile_id)
if profile and profile.primary_domain:
organization = profile.primary_domain
raw_targets = _enumerate_all_mailboxes(
organization=organization,
tenant_id=tenant_id,
client_id=client_id,
client_secret=client_secret,
profile_id=profile_id,
)
source_type = "tenant_all"
else:
raw_targets = [str(m) for m in payload.mailboxes]
else:
raw_targets = [str(item) for item in payload.site_urls]
return _create_job_from_targets(
raw_targets=raw_targets,
scan_type=payload.scan_type,
skip_default_sites=payload.skip_default_sites, skip_default_sites=payload.skip_default_sites,
source_type="manual", source_type=source_type,
tenant_id=tenant_id, tenant_id=tenant_id,
client_id=client_id, client_id=client_id,
client_secret=client_secret, client_secret=client_secret,
@ -156,6 +221,7 @@ def create_scan_job(payload: CreateScanJobRequest) -> ScanJobCreateResponse:
@app.post("/api/scan-jobs/import-csv", response_model=ScanJobCreateResponse) @app.post("/api/scan-jobs/import-csv", response_model=ScanJobCreateResponse)
def create_scan_job_from_csv( def create_scan_job_from_csv(
skip_default_sites: bool = True, skip_default_sites: bool = True,
scan_type: str = Form("sharepoint"),
tenant_profile_id: str | None = Form(None), tenant_profile_id: str | None = Form(None),
tenant_id: str | None = Form(None), tenant_id: str | None = Form(None),
client_id: str | None = Form(None), client_id: str | None = Form(None),
@ -171,9 +237,18 @@ def create_scan_job_from_csv(
client_secret=client_secret, client_secret=client_secret,
) )
content = file.file.read() content = file.file.read()
if scan_type == "mailbox":
parsed = parse_mailboxes_csv(content)
targets = parsed.mailboxes
elif scan_type == "entra_groups":
parsed = parse_entra_groups_csv(content)
targets = parsed.urls
else:
parsed = parse_sites_csv(content) parsed = parse_sites_csv(content)
response = _create_job_from_urls( targets = parsed.urls
raw_urls=parsed.urls, response = _create_job_from_targets(
raw_targets=targets,
scan_type=scan_type,
skip_default_sites=skip_default_sites, skip_default_sites=skip_default_sites,
source_type="csv", source_type="csv",
tenant_id=resolved_tenant_id, tenant_id=resolved_tenant_id,
@ -234,7 +309,11 @@ def delete_scan_job(job_id: str) -> Response:
@app.get("/api/scan-jobs", response_model=list[ScanJobSummary]) @app.get("/api/scan-jobs", response_model=list[ScanJobSummary])
def list_scan_jobs(limit: int = 20, tenant_profile_id: str | None = None) -> list[ScanJobSummary]: def list_scan_jobs(
limit: int = 20,
tenant_profile_id: str | None = None,
scan_type: str | None = None,
) -> list[ScanJobSummary]:
with SessionLocal() as db: with SessionLocal() as db:
stmt = ( stmt = (
select(ScanJob) select(ScanJob)
@ -244,10 +323,36 @@ def list_scan_jobs(limit: int = 20, tenant_profile_id: str | None = None) -> lis
) )
if tenant_profile_id: if tenant_profile_id:
stmt = stmt.where(ScanJob.tenant_profile_id == tenant_profile_id) stmt = stmt.where(ScanJob.tenant_profile_id == tenant_profile_id)
if scan_type:
stmt = stmt.where(ScanJob.scan_type == scan_type)
jobs = list(db.execute(stmt).unique().scalars()) jobs = list(db.execute(stmt).unique().scalars())
return [_to_job_summary(job) for job in jobs] return [_to_job_summary(job) for job in jobs]
@app.get("/api/scan-jobs/{job_id}/sharing-link-types", response_model=SharingLinkTypesResponse)
def get_sharing_link_types(job_id: str) -> SharingLinkTypesResponse:
with SessionLocal() as db:
job = db.get(ScanJob, job_id)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
principals = list(
db.execute(
select(PermissionDeviation.principal).where(PermissionDeviation.job_id == job_id)
).scalars()
)
type_counts: dict[str, int] = {}
for principal in principals:
parsed = _extract_sharing_link_group_and_type(str(principal or ""))
if not parsed:
continue
_group_name, link_type = parsed
type_counts[link_type] = type_counts.get(link_type, 0) + 1
return SharingLinkTypesResponse(type_counts=type_counts)
@app.post("/api/scan-jobs/{job_id}/resolve-sharing-links", response_model=ResolveSharingLinksResponse) @app.post("/api/scan-jobs/{job_id}/resolve-sharing-links", response_model=ResolveSharingLinksResponse)
def resolve_sharing_links_endpoint(job_id: str, payload: ResolveSharingLinksRequest) -> ResolveSharingLinksResponse: def resolve_sharing_links_endpoint(job_id: str, payload: ResolveSharingLinksRequest) -> ResolveSharingLinksResponse:
from .scanner import resolve_sharing_link_members from .scanner import resolve_sharing_link_members
@ -261,11 +366,13 @@ def resolve_sharing_links_endpoint(job_id: str, payload: ResolveSharingLinksRequ
cert_private_key: str | None = None cert_private_key: str | None = None
cert_thumbprint: str | None = None cert_thumbprint: str | None = None
cert_public_pem: str | None = None
if job.tenant_profile_id: if job.tenant_profile_id:
profile = db.get(TenantProfile, job.tenant_profile_id) profile = db.get(TenantProfile, job.tenant_profile_id)
if profile: if profile:
cert_private_key = profile.cert_private_key cert_private_key = profile.cert_private_key
cert_thumbprint = profile.cert_thumbprint cert_thumbprint = profile.cert_thumbprint
cert_public_pem = profile.cert_public_pem
auth = AuthConfig( auth = AuthConfig(
tenant_id=job.auth_tenant_id or "", tenant_id=job.auth_tenant_id or "",
@ -273,6 +380,7 @@ def resolve_sharing_links_endpoint(job_id: str, payload: ResolveSharingLinksRequ
client_secret=job.auth_client_secret or "", client_secret=job.auth_client_secret or "",
cert_private_key=cert_private_key, cert_private_key=cert_private_key,
cert_thumbprint=cert_thumbprint, cert_thumbprint=cert_thumbprint,
cert_public_pem=cert_public_pem,
) )
all_deviations = list( all_deviations = list(
@@ -282,15 +390,13 @@ def resolve_sharing_links_endpoint(job_id: str, payload: ResolveSharingLinksRequ
     # Group by (site_url, principal) so each unique group is resolved once
     groups: dict[tuple[str, str], list[int]] = {}
     for dev in all_deviations:
-        if not dev.principal.startswith("SharingLinks."):
+        parsed = _extract_sharing_link_group_and_type(dev.principal)
+        if not parsed:
             continue
-        parts = dev.principal.split(".", 3)
-        if len(parts) < 3:
-            continue
-        link_type = parts[2]
+        group_name, link_type = parsed
         if link_type not in payload.link_types:
             continue
-        key = (dev.site_url, dev.principal)
+        key = (dev.site_url, group_name)
         groups.setdefault(key, []).append(dev.id)
     updated_deviations = 0
@@ -311,6 +417,143 @@ def resolve_sharing_links_endpoint(job_id: str, payload: ResolveSharingLinksRequ
     )
+@app.post("/api/scan-jobs/{job_id}/resolve-groups", response_model=ResolveGroupsResponse)
+def resolve_groups_endpoint(job_id: str) -> ResolveGroupsResponse:
+    """
+    Expand SharePoint group principals on this job's deviations and write
+    each group's member list to permission_deviations.resolved_members.
+    Skips claim-encoded principals (federated/AAD), email-shape users, and
+    SharingLinks groups (those have their own resolver).
+    """
+    from .scanners.sharepoint import (
+        is_sharepoint_group_principal,
+        resolve_sharing_link_members,
+    )
+    with SessionLocal() as db:
+        job = db.get(ScanJob, job_id)
+        if not job:
+            raise HTTPException(status_code=404, detail="Job not found")
+        if job.status in ("queued", "running"):
+            raise HTTPException(status_code=409, detail="Job is still running")
+        if (job.scan_type or "sharepoint") == "mailbox":
+            raise HTTPException(status_code=400, detail="Group resolution is only available for SharePoint jobs")
+        cert_private_key: str | None = None
+        cert_thumbprint: str | None = None
+        cert_public_pem: str | None = None
+        if job.tenant_profile_id:
+            profile = db.get(TenantProfile, job.tenant_profile_id)
+            if profile:
+                cert_private_key = profile.cert_private_key
+                cert_thumbprint = profile.cert_thumbprint
+                cert_public_pem = profile.cert_public_pem
+        auth = AuthConfig(
+            tenant_id=job.auth_tenant_id or "",
+            client_id=job.auth_client_id or "",
+            client_secret=job.auth_client_secret or "",
+            cert_private_key=cert_private_key,
+            cert_thumbprint=cert_thumbprint,
+            cert_public_pem=cert_public_pem,
+        )
+        all_deviations = list(
+            db.execute(select(PermissionDeviation).where(PermissionDeviation.job_id == job_id)).scalars()
+        )
+    # Group deviations by (site_url, principal) so each unique SP group is resolved once
+    groups: dict[tuple[str, str], list[int]] = {}
+    for dev in all_deviations:
+        if not is_sharepoint_group_principal(dev.principal):
+            continue
+        key = (dev.site_url, dev.principal)
+        groups.setdefault(key, []).append(dev.id)
+    resolved = 0
+    skipped = 0
+    updated = 0
+    for (site_url, group_name), dev_ids in groups.items():
+        try:
+            members = resolve_sharing_link_members(site_url, group_name, auth)
+        except Exception:  # noqa: BLE001
+            members = []
+        if not members:
+            skipped += 1
+            continue
+        resolved_text = ", ".join(members)
+        with SessionLocal() as db:
+            for dev_id in dev_ids:
+                dev = db.get(PermissionDeviation, dev_id)
+                if dev:
+                    dev.resolved_members = resolved_text
+            db.commit()
+        resolved += 1
+        updated += len(dev_ids)
+    return ResolveGroupsResponse(
+        resolved_groups=resolved,
+        skipped_groups=skipped,
+        updated_deviations=updated,
+    )
+@app.post("/api/scan-jobs/{job_id}/targets/{target_id}/test-connection", response_model=ProbeResultResponse)
+def test_target_connection(job_id: str, target_id: int) -> ProbeResultResponse:
+    with SessionLocal() as db:
+        job = db.get(ScanJob, job_id)
+        if not job:
+            raise HTTPException(status_code=404, detail="Job not found")
+        target = db.get(ScanTarget, target_id)
+        if not target or target.job_id != job_id:
+            raise HTTPException(status_code=404, detail="Target not found")
+        if job.status in ("queued", "running"):
+            raise HTTPException(status_code=409, detail="Job is still running")
+        cert_private_key: str | None = None
+        cert_thumbprint: str | None = None
+        cert_public_pem: str | None = None
+        if job.tenant_profile_id:
+            profile = db.get(TenantProfile, job.tenant_profile_id)
+            if profile:
+                cert_private_key = profile.cert_private_key
+                cert_thumbprint = profile.cert_thumbprint
+                cert_public_pem = profile.cert_public_pem
+        auth = AuthConfig(
+            tenant_id=job.auth_tenant_id or "",
+            client_id=job.auth_client_id or "",
+            client_secret=job.auth_client_secret or "",
+            cert_private_key=cert_private_key,
+            cert_thumbprint=cert_thumbprint,
+            cert_public_pem=cert_public_pem,
+        )
+        site_url = target.site_url
+        job_scan_type = job.scan_type or "sharepoint"
+    result = probe(job_scan_type, site_url, auth)
+    with SessionLocal() as db:
+        target = db.get(ScanTarget, target_id)
+        if not target:
+            raise HTTPException(status_code=404, detail="Target not found")
+        now = datetime.utcnow()
+        target.last_probe_at = now
+        target.last_probe_ok = result.ok
+        target.last_probe_message = result.message
+        target.updated_at = now
+        db.commit()
+        db.refresh(target)
+        return ProbeResultResponse(
+            target_id=target.id,
+            ok=result.ok,
+            message=result.message,
+            last_probe_at=target.last_probe_at,
+        )
 @app.get("/api/scan-jobs/{job_id}/export")
 def export_scan_job(job_id: str, site_url: str | None = None) -> StreamingResponse:
     import openpyxl
@@ -364,10 +607,19 @@ def export_scan_job(job_id: str, site_url: str | None = None) -> StreamingRespon
             cell.font = header_font_white
             cell.fill = header_fill
+    scan_type = job.scan_type or "sharepoint"
+    target_label = {
+        "sharepoint": "Site URL",
+        "sharepoint_root": "Site URL",
+        "mailbox": "Mailbox",
+        "entra_groups": "Group",
+    }.get(scan_type, "Target")
     # Targets sheet
     ws_targets = wb.active
     ws_targets.title = "Targets"
-    _style_header(ws_targets, ["Site URL", "Status", "Attempts", "Error", "Started", "Finished"])
+    _style_header(ws_targets, [target_label, "Status", "Attempts", "Error", "Started", "Finished"])
     for t in targets:
         ws_targets.append([
             t.site_url,
@@ -380,7 +632,42 @@ def export_scan_job(job_id: str, site_url: str | None = None) -> StreamingRespon
     for col in ws_targets.columns:
         ws_targets.column_dimensions[col[0].column_letter].width = max(len(str(c.value or "")) for c in col) + 4
-    # Deviations sheet
+    # Results sheet — name and columns depend on scan type
+    if scan_type == "mailbox":
+        ws_dev = wb.create_sheet("Mailbox Permissions")
+        _style_header(ws_dev, ["Mailbox", "Object", "Permission Type", "Principal", "Access Rights"])
+        deviations.sort(key=lambda d: (d.site_url or "", d.permission_type or "", d.principal or ""))
+        for d in deviations:
+            ws_dev.append([
+                d.site_url,
+                d.object_url,
+                d.permission_type or d.object_type,
+                d.principal,
+                d.role_name,
+            ])
+    elif scan_type == "entra_groups":
+        ws_dev = wb.create_sheet("Group Memberships")
+        _style_header(ws_dev, ["Group", "Group Type", "User", "Role"])
+        deviations.sort(key=lambda d: (d.object_url or "", d.role_name or "", d.principal or ""))
+        for d in deviations:
+            ws_dev.append([
+                d.object_url,
+                d.permission_type or "",
+                d.principal,
+                d.role_name,
+            ])
+    elif scan_type == "sharepoint_root":
+        ws_dev = wb.create_sheet("Root Permissions")
+        _style_header(ws_dev, ["Site URL", "Principal", "Resolved Members", "Role"])
+        deviations.sort(key=lambda d: (d.site_url or "", d.principal or "", d.role_name or ""))
+        for d in deviations:
+            ws_dev.append([
+                d.site_url,
+                d.principal,
+                d.resolved_members or "",
+                d.role_name,
+            ])
+    else:
         ws_dev = wb.create_sheet("Deviations")
         _style_header(ws_dev, ["Site URL", "Object URL", "Object Type", "Principal", "Link Risk", "Resolved Members", "Role", "Delta"])
         deviations.sort(key=lambda d: (d.site_url or "", d.object_url or "", d.principal or ""))
@@ -410,7 +697,7 @@ def export_scan_job(job_id: str, site_url: str | None = None) -> StreamingRespon
     wb.save(buf)
     buf.seek(0)
-    filename = f"clearview_job_{job_id}.xlsx"
+    filename = _build_export_filename(job, job_id)
     return StreamingResponse(
         buf,
         media_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
@@ -452,6 +739,9 @@ def get_scan_job(job_id: str, site_url: str | None = None) -> ScanJobDetail:
                 error_message=t.error_message,
                 started_at=t.started_at,
                 finished_at=t.finished_at,
+                last_probe_at=t.last_probe_at,
+                last_probe_ok=t.last_probe_ok,
+                last_probe_message=t.last_probe_message,
             )
             for t in targets
         ],
@@ -464,6 +754,7 @@ def get_scan_job(job_id: str, site_url: str | None = None) -> ScanJobDetail:
                 principal=d.principal,
                 role_name=d.role_name,
                 delta_type=d.delta_type,
+                permission_type=d.permission_type,
                 resolved_members=d.resolved_members,
                 created_at=d.created_at,
             )
@@ -549,6 +840,98 @@ app.mount("/", StaticFiles(directory=SITE_DIR, html=True), name="site")
 # Helpers
 # ---------------------------------------------------------------------------
+_SCAN_TYPE_LABELS = {
+    "sharepoint": "Deviations",
+    "sharepoint_root": "Root",
+    "mailbox": "Mailbox",
+    "entra_groups": "EntraGroups",
+}
+def _build_export_filename(job: ScanJob, job_id: str) -> str:
+    tenant_label = (job.tenant_profile.name if job.tenant_profile else None) or "Manual"
+    safe_tenant = re.sub(r"[^A-Za-z0-9_-]+", "_", tenant_label).strip("_") or "Manual"
+    scan_type = job.scan_type or "sharepoint"
+    type_label = _SCAN_TYPE_LABELS.get(scan_type, scan_type)
+    short_id = job_id.replace("-", "")[-12:]
+    return f"ClearView_{safe_tenant}_{type_label}_{short_id}.xlsx"
+def _enumerate_all_entra_groups(
+    tenant_id: str,
+    client_id: str,
+    client_secret: str | None,
+    profile_id: str | None,
+) -> list[str]:
+    cert_private_key: str | None = None
+    cert_thumbprint: str | None = None
+    cert_public_pem: str | None = None
+    if profile_id:
+        with SessionLocal() as db:
+            profile = db.get(TenantProfile, profile_id)
+            if profile:
+                cert_private_key = profile.cert_private_key
+                cert_thumbprint = profile.cert_thumbprint
+                cert_public_pem = profile.cert_public_pem
+    auth = AuthConfig(
+        tenant_id=tenant_id,
+        client_id=client_id,
+        client_secret=client_secret or "",
+        cert_private_key=cert_private_key,
+        cert_thumbprint=cert_thumbprint,
+        cert_public_pem=cert_public_pem,
+    )
+    from .scanners import entra as _entra
+    try:
+        return _entra.list_all_groups(auth)
+    except Exception as exc:  # noqa: BLE001
+        raise HTTPException(status_code=400, detail=f"Group enumeration failed: {exc}") from exc
+def _enumerate_all_mailboxes(
+    organization: str | None,
+    tenant_id: str,
+    client_id: str,
+    client_secret: str | None,
+    profile_id: str | None,
+) -> list[str]:
+    if not organization or "." not in organization:
+        raise HTTPException(
+            status_code=400,
+            detail="organization (e.g. contoso.onmicrosoft.com) is required when scan_all_mailboxes is true",
+        )
+    cert_private_key: str | None = None
+    cert_thumbprint: str | None = None
+    cert_public_pem: str | None = None
+    if profile_id:
+        with SessionLocal() as db:
+            profile = db.get(TenantProfile, profile_id)
+            if profile:
+                cert_private_key = profile.cert_private_key
+                cert_thumbprint = profile.cert_thumbprint
+                cert_public_pem = profile.cert_public_pem
+    auth = AuthConfig(
+        tenant_id=tenant_id,
+        client_id=client_id,
+        client_secret=client_secret or "",
+        cert_private_key=cert_private_key,
+        cert_thumbprint=cert_thumbprint,
+        cert_public_pem=cert_public_pem,
+    )
+    from .scanners import mailbox as _mailbox
+    try:
+        return _mailbox.list_mailboxes(organization=organization.strip().lower(), auth=auth)
+    except Exception as exc:  # noqa: BLE001
+        raise HTTPException(status_code=400, detail=f"Mailbox enumeration failed: {exc}") from exc
 def _resolve_credentials(
     db,
     tenant_profile_id: str | None,
@@ -574,8 +957,9 @@ def _resolve_credentials(
     )
-def _create_job_from_urls(
-    raw_urls: list[str],
+def _create_job_from_targets(
+    raw_targets: list[str],
+    scan_type: str,
     skip_default_sites: bool,
     source_type: str,
     tenant_id: str,
@@ -583,59 +967,74 @@ def _create_job_from_urls(
     client_secret: str,
     tenant_profile_id: str | None = None,
 ) -> ScanJobCreateResponse:
-    accepted_urls: list[str] = []
+    accepted: list[str] = []
     skipped_default_urls: list[str] = []
-    invalid_urls: list[str] = []
+    invalid: list[str] = []
     seen: set[str] = set()
-    for raw in raw_urls:
-        normalized = normalize_site_url(raw)
-        if not normalized:
-            invalid_urls.append(raw)
-            continue
+    for raw in raw_targets:
+        if scan_type == "mailbox":
+            normalized = (raw or "").strip().lower()
+            if not normalized or "@" not in normalized:
+                invalid.append(raw)
+                continue
+        elif scan_type == "entra_groups":
+            normalized = (raw or "").strip()
+            if not normalized:
+                invalid.append(raw)
+                continue
+        else:
+            normalized = normalize_site_url(raw) or ""
+            if not normalized:
+                invalid.append(raw)
+                continue
         if normalized in seen:
             continue
         seen.add(normalized)
-        if skip_default_sites and is_default_site(normalized):
+        if scan_type in ("sharepoint", "sharepoint_root") and skip_default_sites and is_default_site(normalized):
             skipped_default_urls.append(normalized)
             continue
-        accepted_urls.append(normalized)
+        accepted.append(normalized)
     with SessionLocal() as db:
         now = datetime.utcnow()
         job = ScanJob(
             id=str(uuid.uuid4()),
             source_type=source_type,
-            status="queued" if accepted_urls else "completed",
+            scan_type=scan_type,
+            status="queued" if accepted else "completed",
             skip_default_sites=skip_default_sites,
             tenant_profile_id=tenant_profile_id,
             auth_tenant_id=tenant_id,
             auth_client_id=client_id,
             auth_client_secret=client_secret,
-            total_targets=len(accepted_urls),
+            total_targets=len(accepted),
             skipped_targets=len(skipped_default_urls),
             warning_message=None,
             error_message=None,
             created_at=now,
             updated_at=now,
-            finished_at=now if not accepted_urls else None,
+            finished_at=now if not accepted else None,
         )
-        if not accepted_urls:
+        if not accepted:
+            if scan_type == "mailbox":
+                job.warning_message = "No scannable mailboxes after validation"
+            else:
                 job.warning_message = "No scannable sites after validation and default-site filtering"
         db.add(job)
         db.flush()
-        for index, site_url in enumerate(accepted_urls, start=1):
+        for index, target in enumerate(accepted, start=1):
             db.add(
                 ScanTarget(
                     job_id=job.id,
-                    site_url=site_url,
+                    site_url=target,
                     source_row=index,
                     status="queued",
                     attempts=0,
@@ -646,15 +1045,14 @@ def _create_job_from_urls(
         db.commit()
-        # Reload with profile for summary
         stmt = select(ScanJob).options(joinedload(ScanJob.tenant_profile)).where(ScanJob.id == job.id)
         job = db.execute(stmt).unique().scalar_one()
         return ScanJobCreateResponse(
             job=_to_job_summary(job),
-            accepted_urls=accepted_urls,
+            accepted_urls=accepted,
             skipped_default_urls=skipped_default_urls,
-            invalid_urls=invalid_urls,
+            invalid_urls=invalid,
         )
@@ -663,6 +1061,7 @@ def _to_job_summary(job: ScanJob) -> ScanJobSummary:
         id=job.id,
         status=job.status,
         source_type=job.source_type,
+        scan_type=job.scan_type or "sharepoint",
         skip_default_sites=job.skip_default_sites,
         tenant_profile_id=job.tenant_profile_id,
         tenant_name=job.tenant_profile.name if job.tenant_profile else None,
@@ -687,6 +1086,7 @@ def _to_tenant_item(profile: TenantProfile) -> TenantProfileItem:
         id=profile.id,
         name=profile.name,
         tenant_id=profile.tenant_id,
+        primary_domain=profile.primary_domain,
         client_id=profile.client_id,
         has_certificate=bool(profile.cert_thumbprint),
         cert_thumbprint=profile.cert_thumbprint,
@@ -720,12 +1120,19 @@ def _ensure_schema_columns() -> None:
         "ALTER TABLE scan_jobs ADD COLUMN IF NOT EXISTS tenant_profile_id VARCHAR(36)",
         "ALTER TABLE scan_jobs ADD COLUMN IF NOT EXISTS items_scanned INTEGER NOT NULL DEFAULT 0",
         "ALTER TABLE scan_jobs ADD COLUMN IF NOT EXISTS scan_activity TEXT",
+        "ALTER TABLE scan_jobs ADD COLUMN IF NOT EXISTS scan_type VARCHAR(32) NOT NULL DEFAULT 'sharepoint'",
+        "ALTER TABLE permission_deviations ADD COLUMN IF NOT EXISTS permission_type VARCHAR(32)",
+        "ALTER TABLE tenant_profiles ADD COLUMN IF NOT EXISTS primary_domain VARCHAR(256)",
         "ALTER TABLE tenant_profiles ADD COLUMN IF NOT EXISTS client_secret TEXT",
         "ALTER TABLE tenant_profiles ALTER COLUMN client_secret DROP NOT NULL",
         "ALTER TABLE tenant_profiles ADD COLUMN IF NOT EXISTS cert_private_key TEXT",
+        "ALTER TABLE tenant_profiles ADD COLUMN IF NOT EXISTS cert_public_pem TEXT",
         "ALTER TABLE tenant_profiles ADD COLUMN IF NOT EXISTS cert_thumbprint VARCHAR(64)",
         "ALTER TABLE tenant_profiles ADD COLUMN IF NOT EXISTS cert_expires_at TIMESTAMP",
         "ALTER TABLE permission_deviations ADD COLUMN IF NOT EXISTS resolved_members TEXT",
+        "ALTER TABLE scan_targets ADD COLUMN IF NOT EXISTS last_probe_at TIMESTAMP",
+        "ALTER TABLE scan_targets ADD COLUMN IF NOT EXISTS last_probe_ok BOOLEAN",
+        "ALTER TABLE scan_targets ADD COLUMN IF NOT EXISTS last_probe_message TEXT",
     ]
     with engine.begin() as conn:
         for stmt in stmts:
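
The new export filename built by `_build_export_filename` depends only on the tenant name, the scan type and the job id. The sketch below restates those rules as a standalone function so the sanitisation can be checked in isolation. `export_filename` is a hypothetical helper that takes plain strings instead of a `ScanJob`; it is not part of the commit.

```python
from __future__ import annotations

import re

# Copy of the _SCAN_TYPE_LABELS mapping added in the diff above.
SCAN_TYPE_LABELS = {
    "sharepoint": "Deviations",
    "sharepoint_root": "Root",
    "mailbox": "Mailbox",
    "entra_groups": "EntraGroups",
}


def export_filename(tenant_name: str | None, scan_type: str | None, job_id: str) -> str:
    """Hypothetical pure-function restatement of _build_export_filename."""
    tenant_label = tenant_name or "Manual"
    # Replace each run of characters outside [A-Za-z0-9_-] with one underscore,
    # trim underscores at both ends, and fall back to "Manual" if nothing is left.
    safe_tenant = re.sub(r"[^A-Za-z0-9_-]+", "_", tenant_label).strip("_") or "Manual"
    scan_type = scan_type or "sharepoint"
    # Unknown scan types are used verbatim as the label.
    type_label = SCAN_TYPE_LABELS.get(scan_type, scan_type)
    # Last 12 characters of the job id with dashes removed.
    short_id = job_id.replace("-", "")[-12:]
    return f"ClearView_{safe_tenant}_{type_label}_{short_id}.xlsx"


print(export_filename("Contoso B.V.", "mailbox", "123e4567-e89b-12d3-a456-426614174000"))
# → ClearView_Contoso_B_V_Mailbox_426614174000.xlsx
```

Because the tenant name is reduced to a safe character set, the result can go straight into a `Content-Disposition` header without further escaping.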

@@ -16,9 +16,11 @@ class TenantProfile(Base):
     id: Mapped[str] = mapped_column(String(36), primary_key=True)
     name: Mapped[str] = mapped_column(String(256))
     tenant_id: Mapped[str] = mapped_column(String(128))
+    primary_domain: Mapped[str | None] = mapped_column(String(256), nullable=True)
     client_id: Mapped[str] = mapped_column(String(128))
     client_secret: Mapped[str | None] = mapped_column(Text, nullable=True)
     cert_private_key: Mapped[str | None] = mapped_column(Text, nullable=True)
+    cert_public_pem: Mapped[str | None] = mapped_column(Text, nullable=True)
     cert_thumbprint: Mapped[str | None] = mapped_column(String(64), nullable=True)
     cert_expires_at: Mapped[datetime | None] = mapped_column(DateTime, nullable=True)
     created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
@@ -34,6 +36,7 @@ class ScanJob(Base):
     status: Mapped[str] = mapped_column(String(32), default="queued", index=True)
     source_type: Mapped[str] = mapped_column(String(16), default="manual")
     skip_default_sites: Mapped[bool] = mapped_column(Boolean, default=True)
+    scan_type: Mapped[str] = mapped_column(String(32), default="sharepoint", index=True)
     tenant_profile_id: Mapped[str | None] = mapped_column(
         String(36), ForeignKey("tenant_profiles.id", ondelete="SET NULL"), nullable=True, index=True
     )
@@ -76,6 +79,10 @@ class ScanTarget(Base):
     attempts: Mapped[int] = mapped_column(Integer, default=0)
     error_message: Mapped[str | None] = mapped_column(Text, nullable=True)
+    last_probe_at: Mapped[datetime | None] = mapped_column(DateTime, nullable=True)
+    last_probe_ok: Mapped[bool | None] = mapped_column(Boolean, nullable=True)
+    last_probe_message: Mapped[str | None] = mapped_column(Text, nullable=True)
     created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
     updated_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
     started_at: Mapped[datetime | None] = mapped_column(DateTime, nullable=True)
@@ -98,6 +105,7 @@ class PermissionDeviation(Base):
     principal: Mapped[str] = mapped_column(Text)
     role_name: Mapped[str] = mapped_column(Text)
     delta_type: Mapped[str] = mapped_column(String(32))
+    permission_type: Mapped[str | None] = mapped_column(String(32), nullable=True)
     resolved_members: Mapped[str | None] = mapped_column(Text, nullable=True)
     created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
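
The new model columns (for example `last_probe_at`, `last_probe_ok` and `last_probe_message` on `scan_targets`) reach existing databases through `_ensure_schema_columns`, which relies on PostgreSQL's `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` so the statement list can run on every start-up. The sketch below illustrates the same idempotent pattern with the standard-library `sqlite3` module. SQLite has no `IF NOT EXISTS` clause for `ADD COLUMN`, so this version checks `PRAGMA table_info` first; `ensure_columns` and `NEW_COLUMNS` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import sqlite3

# Hypothetical column list modelled on the scan_targets statements in _ensure_schema_columns.
NEW_COLUMNS: dict[str, list[tuple[str, str]]] = {
    "scan_targets": [
        ("last_probe_at", "TIMESTAMP"),
        ("last_probe_ok", "BOOLEAN"),
        ("last_probe_message", "TEXT"),
    ],
}


def ensure_columns(conn: sqlite3.Connection, columns: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Add any missing columns and return the 'table.column' names that were added."""
    added: list[str] = []
    for table, cols in columns.items():
        # PRAGMA table_info yields one row per existing column; index 1 is the column name.
        existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        for name, sql_type in cols:
            if name not in existing:
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
                added.append(f"{table}.{name}")
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scan_targets (id INTEGER PRIMARY KEY, site_url TEXT)")
print(ensure_columns(conn, NEW_COLUMNS))  # first run adds all three columns
print(ensure_columns(conn, NEW_COLUMNS))  # second run finds nothing to add: []
```

New columns are added as nullable, which matches the `nullable=True` declarations in the models, so existing rows stay valid without a backfill.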

@@ -1,467 +1,27 @@
-from __future__ import annotations
-import time
-from collections.abc import Callable
-from dataclasses import dataclass, field
-from urllib.parse import urlparse
-import msal
-import requests
-from .config import (
-    SCAN_HTTP_BACKOFF_SEC,
-    SCAN_HTTP_MAX_RETRIES,
-    SCAN_HTTP_TIMEOUT_SEC,
-    SCAN_LIST_PAGE_SIZE,
-    SCAN_MAX_ITEMS_PER_LIST,
-    SHAREPOINT_SCAN_MODE,
-)
-@dataclass
-class DeviationRecord:
-    object_url: str
-    object_type: str
-    principal: str
-    role_name: str
-    delta_type: str
-@dataclass
-class ScanResult:
-    deviations: list[DeviationRecord]
-    warning: str | None = None
-@dataclass(frozen=True)
-class PermissionEntry:
-    principal: str
-    role_name: str
-@dataclass(frozen=True)
-class AuthConfig:
-    tenant_id: str
-    client_id: str
-    client_secret: str = ""
-    cert_private_key: str | None = None
-    cert_thumbprint: str | None = None
-_TOKEN_CACHE: dict[str, str] = {}
-ProgressCallback = Callable[[str, int], None]
-def scan_site_for_deviations(
-    site_url: str,
-    auth: AuthConfig,
-    progress: ProgressCallback | None = None,
-) -> ScanResult:
-    """
-    Scan SharePoint permission deviations versus site-root role assignments.
-    Only SharePoint role assignments are used (site/list/folder/file scope).
-    No filesystem/NTFS permission model is used.
-    """
-    if SHAREPOINT_SCAN_MODE == "placeholder":
-        return ScanResult(
-            deviations=[],
-            warning=(
-                "SharePoint scan mode is 'placeholder'. "
-                "Set SHAREPOINT_SCAN_MODE=sharepoint_app_only and configure Azure app credentials."
-            ),
-        )
+"""
+Backwards-compatibility shim. New code should import from clearview_app.scanners.
+"""
+from .scanners.common import (
+    AuthConfig,
+    DeviationRecord,
+    ProbeResult,
+    ProgressCallback,
+    ScanResult,
+)
+from .scanners.sharepoint import (
+    probe_site,
+    resolve_sharing_link_members,
+    scan_site_for_deviations,
+)
+__all__ = [
+    "AuthConfig",
+    "DeviationRecord",
+    "ProbeResult",
+    "ProgressCallback",
+    "ScanResult",
+    "probe_site",
+    "resolve_sharing_link_members",
+    "scan_site_for_deviations",
+]
-    if SHAREPOINT_SCAN_MODE != "sharepoint_app_only":
-        raise RuntimeError(f"Unsupported SHAREPOINT_SCAN_MODE='{SHAREPOINT_SCAN_MODE}'")
-    _validate_auth_config(auth)
-    def _report(activity: str, items: int = 0) -> None:
-        if progress:
-            progress(activity, items)
-    parsed = urlparse(site_url)
-    host = parsed.netloc
-    _report(f"Connecting to {host}")
-    token = _get_token_for_host(host, auth)
-    base_headers = {
-        "Accept": "application/json;odata=nometadata",
-        "Authorization": f"Bearer {token}",
-    }
-    _report(f"Loading site permissions: {site_url}")
-    root_assignments = _get_role_assignments(
-        f"{site_url}/_api/web/roleassignments?$expand=Member,RoleDefinitionBindings"
-        "&$select=Member/LoginName,Member/Title,Member/PrincipalType,RoleDefinitionBindings/Name",
-        base_headers,
-    )
-    root_set = set(root_assignments)
-    deviations: list[DeviationRecord] = []
-    warnings: list[str] = []
-    lists_url = (
-        f"{site_url}/_api/web/lists"
-        "?$select=Id,Title,BaseTemplate,Hidden,ItemCount,RootFolder/ServerRelativeUrl,HasUniqueRoleAssignments"
-        "&$expand=RootFolder"
-    )
-    for lst in _iter_paged(lists_url, base_headers):
-        if _to_bool(lst.get("Hidden")):
-            continue
-        if _to_int(lst.get("BaseTemplate")) != 101:
-            continue
-        list_id = str(lst.get("Id", "")).strip()
-        if not list_id:
-            continue
-        list_title = str(lst.get("Title") or "Document Library")
-        list_url = _absolute_url(host, str((lst.get("RootFolder") or {}).get("ServerRelativeUrl") or ""))
-        _report(f"Library: {list_title}")
-        if _to_bool(lst.get("HasUniqueRoleAssignments")):
-            list_assignments = _get_role_assignments(
-                f"{site_url}/_api/web/lists(guid'{list_id}')/roleassignments"
-                "?$expand=Member,RoleDefinitionBindings"
-                "&$select=Member/LoginName,Member/Title,Member/PrincipalType,RoleDefinitionBindings/Name",
-                base_headers,
-            )
-            deviations.extend(
-                _deviation_records_only_added(
-                    object_url=list_url,
-                    object_type="DocumentLibrary",
-                    root_set=root_set,
-                    current_set=set(list_assignments),
-                )
-            )
-        items_processed = 0
-        items_total = 0
-        items_url = (
-            f"{site_url}/_api/web/lists(guid'{list_id}')/items"
-            f"?$select=Id,FileRef,FileSystemObjectType,HasUniqueRoleAssignments&$top={SCAN_LIST_PAGE_SIZE}"
-        )
-        for item in _iter_paged(items_url, base_headers):
-            items_total += 1
-            if items_total % 50 == 0:
-                _report(f"Library: {list_title} ({items_total} items scanned)", 50)
-            if not _to_bool(item.get("HasUniqueRoleAssignments")):
-                continue
-            if items_processed >= SCAN_MAX_ITEMS_PER_LIST:
-                warnings.append(
-                    f"List '{list_title}' hit SCAN_MAX_ITEMS_PER_LIST={SCAN_MAX_ITEMS_PER_LIST}; remaining unique-permission items skipped"
-                )
-                break
-            item_id = _to_int(item.get("Id"))
-            if item_id <= 0:
-                continue
-            file_ref = str(item.get("FileRef") or "")
-            if not file_ref:
-                continue
-            item_type = "File" if _to_int(item.get("FileSystemObjectType")) == 0 else "Folder"
-            item_assignments = _get_role_assignments(
-                f"{site_url}/_api/web/lists(guid'{list_id}')/items({item_id})/roleassignments"
-                "?$expand=Member,RoleDefinitionBindings"
-                "&$select=Member/LoginName,Member/Title,Member/PrincipalType,RoleDefinitionBindings/Name",
-                base_headers,
-            )
-            deviations.extend(
-                _deviation_records_only_added(
-                    object_url=_absolute_url(host, file_ref),
-                    object_type=item_type,
-                    root_set=root_set,
-                    current_set=set(item_assignments),
-                )
-            )
-            items_processed += 1
-    _report("Scan complete", 0)
-    warning = " | ".join(warnings) if warnings else None
-    return ScanResult(deviations=_deduplicate_hierarchical(deviations), warning=warning)
-def resolve_sharing_link_members(
-    site_url: str,
-    group_name: str,
-    auth: AuthConfig,
-) -> list[str]:
-    """
-    Return the members of a SharePoint SharingLinks group.
-    Returns an empty list for anonymous links (no resolvable members).
-    """
-    _validate_auth_config(auth)
-    parsed = urlparse(site_url)
-    host = parsed.netloc
-    token = _get_token_for_host(host, auth)
-    headers = {
-        "Accept": "application/json;odata=nometadata",
-        "Authorization": f"Bearer {token}",
-    }
-    encoded = group_name.replace("'", "''")
-    url = (
-        f"{site_url}/_api/web/sitegroups/getbyname('{encoded}')/users"
-        "?$select=LoginName,Email,Title"
-    )
-    try:
-        data = _request_json(url, headers)
-    except Exception:  # noqa: BLE001
-        return []
-    members: list[str] = []
-    for user in _extract_values(data):
-        email = str(user.get("Email") or "").strip()
-        login = str(user.get("LoginName") or "").strip()
-        title = str(user.get("Title") or "").strip()
-        # Skip built-in SharePoint system accounts
-        if login.upper().startswith("SHAREPOINT\\") or login.startswith("c:0(.s|true"):
-            continue
-        if email:
-            members.append(email)
-        elif title:
-            members.append(title)
-        elif login:
-            members.append(login)
-    return members
-def _validate_auth_config(auth: AuthConfig) -> None:
-    missing = []
-    if not auth.tenant_id:
-        missing.append("tenant_id")
-    if not auth.client_id:
-        missing.append("client_id")
-    if not auth.client_secret and not (auth.cert_thumbprint and auth.cert_private_key):
-        missing.append("client_secret or certificate")
-    if missing:
-        raise RuntimeError("Missing required Azure auth settings: " + ", ".join(missing))
-def _get_token_for_host(host: str, auth: AuthConfig) -> str:
-    auth_method = "cert" if auth.cert_thumbprint and auth.cert_private_key else "secret"
-    cache_key = f"{host}|{auth.tenant_id}|{auth.client_id}|{auth_method}"
-    cached = _TOKEN_CACHE.get(cache_key)
-    if cached:
-        return cached
-    scope = f"https://{host}/.default"
-    authority = f"https://login.microsoftonline.com/{auth.tenant_id}"
-    if auth_method == "cert":
-        client_credential = {
-            "thumbprint": auth.cert_thumbprint,
-            "private_key": auth.cert_private_key,
-        }
-    else:
-        client_credential = auth.client_secret
-    app = msal.ConfidentialClientApplication(
-        client_id=auth.client_id,
-        authority=authority,
-        client_credential=client_credential,
-    )
-    result = app.acquire_token_for_client(scopes=[scope])
-    if "access_token" not in result:
-        error = result.get("error", "unknown")
-        description = result.get("error_description", "")
-        raise RuntimeError(f"Token request failed ({error}): {description[:300]}")
-    token = str(result["access_token"])
-    _TOKEN_CACHE[cache_key] = token
-    return token
def _iter_paged(url: str, headers: dict[str, str]):
next_url = url
while next_url:
data = _request_json(next_url, headers)
for item in _extract_values(data):
yield item
next_url = _extract_next_link(data)
def _request_json(url: str, headers: dict[str, str]) -> dict:
last_error: str | None = None
for attempt in range(1, SCAN_HTTP_MAX_RETRIES + 1):
try:
response = requests.get(url, headers=headers, timeout=SCAN_HTTP_TIMEOUT_SEC)
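if response.status_code in (429, 503):
# Record the status so the final error is informative if every attempt is throttled
last_error = f"HTTP {response.status_code}"
retry_after = _to_int(response.headers.get("Retry-After"))
delay = retry_after if retry_after > 0 else SCAN_HTTP_BACKOFF_SEC * attempt
time.sleep(delay)
continue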
if response.status_code >= 400:
raise RuntimeError(f"HTTP {response.status_code}: {response.text[:300]}")
return response.json()
except Exception as exc: # noqa: BLE001
last_error = str(exc)
if attempt < SCAN_HTTP_MAX_RETRIES:
time.sleep(SCAN_HTTP_BACKOFF_SEC * attempt)
continue
raise RuntimeError(f"Request failed for {url}: {last_error}") from exc
raise RuntimeError(f"Request failed for {url}: {last_error}")
def _extract_values(data: dict) -> list[dict]:
if "value" in data and isinstance(data["value"], list):
return data["value"]
d = data.get("d")
if isinstance(d, dict):
results = d.get("results")
if isinstance(results, list):
return results
return []
def _extract_next_link(data: dict) -> str | None:
for key in ("@odata.nextLink", "odata.nextLink", "__next"):
value = data.get(key)
if isinstance(value, str) and value:
return value
d = data.get("d")
if isinstance(d, dict):
value = d.get("__next")
if isinstance(value, str) and value:
return value
return None
def _get_role_assignments(url: str, headers: dict[str, str]) -> list[PermissionEntry]:
data = _request_json(url, headers)
assignments: list[PermissionEntry] = []
for item in _extract_values(data):
member = item.get("Member") or {}
principal = str(member.get("LoginName") or member.get("Title") or "").strip()
if not principal:
continue
role_bindings = item.get("RoleDefinitionBindings")
roles = _extract_role_names(role_bindings)
for role_name in roles:
if role_name.lower() == "limited access":
continue
assignments.append(PermissionEntry(principal=principal, role_name=role_name))
return assignments
_ROLE_NAME_NL_TO_EN: dict[str, str] = {
"volledig beheer": "Full Control",
"ontwerpen": "Design",
"bewerken": "Edit",
"bijdragen": "Contribute",
"lezen": "Read",
"beperkte toegang": "Limited Access",
"goedkeuren": "Approve",
"hiërarchieën beheren": "Manage Hierarchy",
"weergeven alleen": "View Only",
"beperkt lezen": "Restricted Read",
}
def _normalize_role_name(name: str) -> str:
return _ROLE_NAME_NL_TO_EN.get(name.lower(), name)
def _extract_role_names(bindings) -> list[str]:
if isinstance(bindings, list):
return [_normalize_role_name(str(x.get("Name") or "").strip()) for x in bindings if isinstance(x, dict) and x.get("Name")]
if isinstance(bindings, dict):
results = bindings.get("results")
if isinstance(results, list):
return [_normalize_role_name(str(x.get("Name") or "").strip()) for x in results if isinstance(x, dict) and x.get("Name")]
return []
def _deduplicate_hierarchical(deviations: list[DeviationRecord]) -> list[DeviationRecord]:
"""
Remove child-level deviations that are already covered by a parent in the URL hierarchy.
A deviation for (principal, role) at /sites/X/Lib/FolderA is redundant when the same
(principal, role) was already kept at that URL or at an ancestor such as /sites/X/Lib.
Sorting by URL length (ascending) guarantees that parents are evaluated before their children.
"""
sorted_devs = sorted(deviations, key=lambda d: len(d.object_url))
# Maps (principal, role_name) → list of ancestor URLs already reported
covered: dict[tuple[str, str], list[str]] = {}
result: list[DeviationRecord] = []
for dev in sorted_devs:
key = (dev.principal, dev.role_name)
ancestor_urls = covered.get(key)
if ancestor_urls:
parent = dev.object_url.rstrip("/")
already_covered = any(
parent == anc.rstrip("/") or parent.startswith(anc.rstrip("/") + "/")
for anc in ancestor_urls
)
if already_covered:
continue
else:
covered[key] = []
result.append(dev)
covered[key].append(dev.object_url)
return result
def _deviation_records_only_added(
object_url: str,
object_type: str,
root_set: set[PermissionEntry],
current_set: set[PermissionEntry],
) -> list[DeviationRecord]:
records: list[DeviationRecord] = []
for entry in sorted(current_set - root_set, key=lambda x: (x.principal.lower(), x.role_name.lower())):
records.append(
DeviationRecord(
object_url=object_url,
object_type=object_type,
principal=entry.principal,
role_name=entry.role_name,
delta_type="added",
)
)
return records
def _absolute_url(host: str, server_relative_url: str) -> str:
if not server_relative_url:
return f"https://{host}"
if server_relative_url.startswith("http://") or server_relative_url.startswith("https://"):
return server_relative_url
if not server_relative_url.startswith("/"):
server_relative_url = "/" + server_relative_url
return f"https://{host}{server_relative_url}"
def _to_int(value) -> int:
try:
if value is None:
return 0
return int(value)
except (TypeError, ValueError):
return 0
def _to_bool(value) -> bool:
if isinstance(value, bool):
return value
if isinstance(value, str):
return value.strip().lower() in ("1", "true", "yes")
return bool(value)
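`_get_token_for_host` stores each access token in `_TOKEN_CACHE` under a host/tenant/client/auth-method key but never evicts it, so a long-running worker can keep sending a token after it has expired (app-only tokens are short-lived, typically around an hour). A minimal sketch of an expiry-aware cache, assuming the `expires_in` field (seconds) that MSAL returns alongside `access_token` from `acquire_token_for_client`; the `TokenCache` class and the 300-second safety margin are hypothetical, not part of Clearview:

```python
from __future__ import annotations

import time
from collections.abc import Callable


class TokenCache:
    """Hypothetical expiry-aware token cache keyed like _TOKEN_CACHE."""

    def __init__(self, skew_sec: int = 300, clock: Callable[[], float] = time.monotonic) -> None:
        self._skew_sec = skew_sec  # treat tokens as expired this many seconds early
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def get(self, key: str) -> str | None:
        entry = self._entries.get(key)
        if entry is None:
            return None
        token, expires_at = entry
        if self._clock() >= expires_at - self._skew_sec:
            del self._entries[key]  # stale: force the caller to request a new token
            return None
        return token

    def put(self, key: str, result: dict) -> str:
        """Store the access token from an MSAL-style result dict and return it."""
        token = str(result["access_token"])
        expires_in = int(result.get("expires_in") or 0)
        self._entries[key] = (token, self._clock() + expires_in)
        return token
```

In `_get_token_for_host` this would replace the `_TOKEN_CACHE.get(...)` / `_TOKEN_CACHE[...] = token` pair with `cache.get(cache_key)` and `cache.put(cache_key, result)`. Another option is to keep one `msal.ConfidentialClientApplication` per credential instead of creating a new one per call, so that MSAL's own in-memory token cache can handle expiry.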

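`_deduplicate_hierarchical` sorts records by URL length so a parent path is always seen before its children, then drops a record when the same `(principal, role_name)` pair was already kept at that URL or an ancestor. A self-contained sketch of the same rule on plain `(url, principal, role)` tuples (the `dedupe` name and tuple layout are illustrative only):

```python
from __future__ import annotations


def dedupe(records: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Keep (url, principal, role) records not already covered by an ancestor URL."""
    kept: list[tuple[str, str, str]] = []
    covered: dict[tuple[str, str], list[str]] = {}
    # Shorter URLs first, so a parent is always processed before its children.
    for url, principal, role in sorted(records, key=lambda r: len(r[0])):
        path = url.rstrip("/")
        ancestors = covered.setdefault((principal, role), [])
        if any(path == a or path.startswith(a + "/") for a in ancestors):
            continue  # same grant already reported at this URL or a parent
        ancestors.append(path)
        kept.append((url, principal, role))
    return kept


records = [
    ("https://t/sites/X/Lib/FolderA/doc.docx", "alice", "Edit"),
    ("https://t/sites/X/Lib", "alice", "Edit"),
    ("https://t/sites/X/Lib/FolderA", "bob", "Read"),
    ("https://t/sites/X/Lib2", "alice", "Edit"),
]
for r in dedupe(records):
    print(r)
```

The file-level grant for `alice` is dropped because `Lib` already covers it, while the `"/"` boundary in the prefix check keeps `Lib2` from being mistaken for a child of `Lib`.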

@@ -0,0 +1,61 @@
"""
Scanner package dispatches scan requests by scan_type.
Public API:
- AuthConfig, DeviationRecord, ScanResult, ProbeResult, ProgressCallback (common)
- scan(scan_type, target, auth, progress): dispatcher
- probe(scan_type, target, auth): dispatcher
- resolve_sharing_link_members: SharePoint-specific, re-exported
"""
from __future__ import annotations
from .common import (
AuthConfig,
DeviationRecord,
ProbeResult,
ProgressCallback,
ScanResult,
)
from . import entra, mailbox, sharepoint
from .sharepoint import resolve_sharing_link_members
__all__ = [
"AuthConfig",
"DeviationRecord",
"ProbeResult",
"ProgressCallback",
"ScanResult",
"scan",
"probe",
"resolve_sharing_link_members",
]
def scan(
scan_type: str,
target: str,
auth: AuthConfig,
progress: ProgressCallback | None = None,
) -> ScanResult:
"""Dispatch a scan to the right scanner module."""
if scan_type == "sharepoint":
return sharepoint.scan_site_for_deviations(target, auth, progress)
if scan_type == "sharepoint_root":
return sharepoint.scan_site_root_permissions(target, auth, progress)
if scan_type == "mailbox":
return mailbox.scan_mailbox_for_deviations(target, auth, progress)
if scan_type == "entra_groups":
return entra.scan_entra_group(target, auth, progress)
raise RuntimeError(f"Unknown scan_type '{scan_type}'")
def probe(scan_type: str, target: str, auth: AuthConfig) -> ProbeResult:
"""Dispatch a preflight probe to the right scanner module."""
if scan_type in ("sharepoint", "sharepoint_root"):
return sharepoint.probe_site(target, auth)
if scan_type == "mailbox":
return mailbox.probe_mailbox(target, auth)
if scan_type == "entra_groups":
return entra.probe_entra(target, auth)
raise RuntimeError(f"Unknown scan_type '{scan_type}'")
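`scan()` and `probe()` map a `scan_type` string onto a scanner function with a chain of `if` statements. An equivalent table-driven shape is sketched below with stub scanners (all names here are hypothetical stand-ins, not Clearview functions, and only a subset of the scan types is shown); it keeps the supported types in one place and lets the unknown-type error list what is accepted:

```python
from __future__ import annotations

from collections.abc import Callable


# Stub scanners standing in for the sharepoint/mailbox/entra entry points.
def scan_sharepoint(target: str) -> str:
    return f"sharepoint:{target}"


def scan_mailbox(target: str) -> str:
    return f"mailbox:{target}"


def scan_entra(target: str) -> str:
    return f"entra:{target}"


_SCANNERS: dict[str, Callable[[str], str]] = {
    "sharepoint": scan_sharepoint,
    "mailbox": scan_mailbox,
    "entra_groups": scan_entra,
}


def scan(scan_type: str, target: str) -> str:
    try:
        handler = _SCANNERS[scan_type]
    except KeyError:
        known = ", ".join(sorted(_SCANNERS))
        raise RuntimeError(f"Unknown scan_type '{scan_type}' (expected one of: {known})") from None
    return handler(target)
```

The `if` chain in the package is equally correct; a table mainly pays off as more scan types are added, since `scan()` and `probe()` could then share one registry.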


@@ -0,0 +1,52 @@
from __future__ import annotations
from collections.abc import Callable
from dataclasses import dataclass
@dataclass(frozen=True)
class AuthConfig:
tenant_id: str
client_id: str
client_secret: str = ""
cert_private_key: str | None = None
cert_thumbprint: str | None = None
cert_public_pem: str | None = None
@dataclass
class DeviationRecord:
object_url: str
object_type: str
principal: str
role_name: str
delta_type: str
permission_type: str | None = None
@dataclass
class ScanResult:
deviations: list[DeviationRecord]
warning: str | None = None
@dataclass
class ProbeResult:
ok: bool
message: str
ProgressCallback = Callable[[str, int], None]
def validate_auth_config(auth: AuthConfig) -> None:
missing = []
if not auth.tenant_id:
missing.append("tenant_id")
if not auth.client_id:
missing.append("client_id")
if not auth.client_secret and not (auth.cert_thumbprint and auth.cert_private_key):
missing.append("client_secret or certificate")
if missing:
raise RuntimeError("Missing required Azure auth settings: " + ", ".join(missing))
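`ProgressCallback` is `Callable[[str, int], None]`: scanners call it with a human-readable activity and an item count. In the SharePoint deviation scan the count is an increment (50 each time another 50 list items have been scanned, 0 for status-only updates), so a consumer should add it to a running total rather than treat it as absolute. A minimal sketch of such a consumer, assuming nothing about how Clearview's worker actually persists progress (`ProgressTracker` and `fake_scan` are hypothetical):

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field

ProgressCallback = Callable[[str, int], None]


@dataclass
class ProgressTracker:
    """Hypothetical consumer: remembers the last activity and sums item increments."""

    activity: str = ""
    items: int = 0
    history: list[str] = field(default_factory=list)

    def __call__(self, activity: str, items: int) -> None:
        self.activity = activity
        self.items += items
        self.history.append(activity)


def fake_scan(progress: ProgressCallback | None = None) -> None:
    # Mirrors the scanners' _report pattern: status updates carry 0 items.
    def _report(activity: str, items: int = 0) -> None:
        if progress:
            progress(activity, items)

    _report("Connecting to contoso.sharepoint.com")
    for scanned in range(1, 121):
        if scanned % 50 == 0:
            _report(f"Library: Documents ({scanned} items scanned)", 50)
    _report("Scan complete", 0)


tracker = ProgressTracker()
fake_scan(tracker)
print(tracker.activity, tracker.items)
```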


@@ -0,0 +1,293 @@
"""
Entra (Azure AD) groups scanner.
For each target (group object ID, group mail address, or display name) Clearview retrieves:
- The group's display name and type (Microsoft 365 / Security / Distribution / Mail-enabled security)
- Every Member (recursive across nested groups)
- Every Owner (recursive across nested groups)
Each resulting user is stored as one deviation with:
- object_url = group display label
- object_type = 'EntraGroup'
- principal = userPrincipalName / mail / displayName
- role_name = 'Member' or 'Owner' (with " (via X > Y)" chain when nested)
- delta_type = 'present'
- permission_type = group type ("Microsoft 365" / "Security" / ...)
Authentication uses a Microsoft Graph token obtained from MSAL with the tenant
profile's certificate (or client secret). Required Application permission:
Group.Read.All on Microsoft Graph.
"""
from __future__ import annotations
from dataclasses import dataclass
from urllib.parse import quote
import requests
from ..config import SCAN_HTTP_BACKOFF_SEC, SCAN_HTTP_MAX_RETRIES, SCAN_HTTP_TIMEOUT_SEC
from .common import (
AuthConfig,
DeviationRecord,
ProbeResult,
ProgressCallback,
ScanResult,
validate_auth_config,
)
from .sharepoint import _get_token_for_host, _request_json
@dataclass
class _ResolvedUser:
upn: str
via: list[str]
def scan_entra_group(
target: str,
auth: AuthConfig,
progress: ProgressCallback | None = None,
) -> ScanResult:
validate_auth_config(auth)
def _report(activity: str, items: int = 0) -> None:
if progress:
progress(activity, items)
headers = _graph_headers(auth)
_report(f"Resolving group: {target}")
group = _resolve_group(target, headers)
if not group:
return ScanResult(deviations=[], warning=f"Group not found: {target}")
group_id = str(group.get("id") or "").strip()
label = (
str(group.get("displayName") or "").strip()
or str(group.get("mail") or "").strip()
or group_id
)
group_type = _classify_group_type(group)
_report(f"Members: {label}")
members = _collect_users(group_id, "/members", headers, [label])
_report(f"Owners: {label}")
owners = _collect_users(group_id, "/owners", headers, [label])
deviations: list[DeviationRecord] = []
for user in members:
deviations.append(_user_to_record(user, label, group_type, "Member"))
for user in owners:
deviations.append(_user_to_record(user, label, group_type, "Owner"))
_report("Scan complete", 0)
return ScanResult(deviations=deviations, warning=None)
def probe_entra(target: str, auth: AuthConfig) -> ProbeResult:
try:
validate_auth_config(auth)
except Exception as exc: # noqa: BLE001
return ProbeResult(ok=False, message=f"Config: {exc}")
if not (target or "").strip():
return ProbeResult(ok=False, message="Empty group target")
try:
headers = _graph_headers(auth)
except Exception as exc: # noqa: BLE001
return ProbeResult(ok=False, message=f"Token: {str(exc)[:240]}")
try:
group = _resolve_group(target, headers)
except Exception as exc: # noqa: BLE001
return ProbeResult(ok=False, message=_probe_hint(str(exc)))
if not group:
return ProbeResult(ok=False, message=f"Group not found: {target}")
return ProbeResult(ok=True, message="OK")
def list_all_groups(auth: AuthConfig, max_count: int = 50000) -> list[str]:
"""
Enumerate every group object id in the tenant (any group type) via Graph.
Returns a list of object IDs that can each be queued as a scan target.
"""
validate_auth_config(auth)
headers = _graph_headers(auth)
next_url: str | None = (
"https://graph.microsoft.com/v1.0/groups"
"?$select=id,displayName,mail&$top=999"
)
ids: list[str] = []
while next_url:
data = _request_json(next_url, headers)
for g in data.get("value", []):
gid = str(g.get("id") or "").strip()
if gid:
ids.append(gid)
if len(ids) > max_count:
raise RuntimeError(f"Group count exceeds limit {max_count}")
nl = data.get("@odata.nextLink")
next_url = nl if isinstance(nl, str) and nl else None
return ids
def _user_to_record(user: _ResolvedUser, group_label: str, group_type: str, role: str) -> DeviationRecord:
via_chain = " > ".join(user.via)
role_name = role
if user.via and user.via != [group_label]:
role_name = f"{role} (via {via_chain})"
return DeviationRecord(
object_url=group_label,
object_type="EntraGroup",
principal=user.upn,
role_name=role_name,
delta_type="present",
permission_type=group_type,
)
def _graph_headers(auth: AuthConfig) -> dict[str, str]:
token = _get_token_for_host("graph.microsoft.com", auth)
return {
"Accept": "application/json",
"Authorization": f"Bearer {token}",
}
def _resolve_group(target: str, headers: dict[str, str]) -> dict | None:
"""Accept a GUID, an email/SMTP, or a displayName."""
cleaned = (target or "").strip()
if not cleaned:
return None
if _is_guid(cleaned):
try:
return _request_json(
f"https://graph.microsoft.com/v1.0/groups/{cleaned}"
"?$select=id,displayName,mail,groupTypes,securityEnabled,mailEnabled",
headers,
)
except Exception: # noqa: BLE001
return None
literal = cleaned.replace("'", "''")  # OData string literal: double single quotes
field = "mail" if "@" in cleaned else "displayName"
# Percent-encode the filter expression (keeping the quotes) so characters such as
# '&', '#' and spaces in a group name cannot split the query string.
flt = quote(f"{field} eq '{literal}'", safe="'")
url = (
"https://graph.microsoft.com/v1.0/groups"
f"?$filter={flt}"
"&$select=id,displayName,mail,groupTypes,securityEnabled,mailEnabled"
)
try:
data = _request_json(url, headers)
except Exception: # noqa: BLE001
return None
items = data.get("value") or []
return items[0] if items else None
def _classify_group_type(group: dict) -> str:
types = group.get("groupTypes") or []
if isinstance(types, list) and any(str(t).lower() == "unified" for t in types):
return "Microsoft 365"
mail_enabled = bool(group.get("mailEnabled"))
security_enabled = bool(group.get("securityEnabled"))
if mail_enabled and security_enabled:
return "Mail-enabled Security"
if security_enabled:
return "Security"
if mail_enabled:
return "Distribution"
return "Group"
def _collect_users(
group_id: str,
relative: str,
headers: dict[str, str],
via_chain: list[str],
seen_groups: set[str] | None = None,
depth: int = 0,
) -> list[_ResolvedUser]:
if depth > 5:
return []
if seen_groups is None:
seen_groups = set()
next_url: str | None = (
f"https://graph.microsoft.com/v1.0/groups/{group_id}{relative}"
"?$select=id,userPrincipalName,mail,displayName&$top=999"
)
out: list[_ResolvedUser] = []
while next_url:
try:
data = _request_json(next_url, headers)
except Exception: # noqa: BLE001
break
for entry in data.get("value", []):
otype = str(entry.get("@odata.type") or "")
if otype.endswith("user"):
upn = (
str(entry.get("userPrincipalName") or "").strip()
or str(entry.get("mail") or "").strip()
or str(entry.get("displayName") or "").strip()
)
if upn:
out.append(_ResolvedUser(upn=upn, via=list(via_chain)))
elif otype.endswith("group"):
nested_id = str(entry.get("id") or "").strip()
if not nested_id or nested_id in seen_groups:
continue
seen_groups.add(nested_id)
nested_label = (
str(entry.get("displayName") or "").strip()
or str(entry.get("mail") or "").strip()
or nested_id
)
# Nested groups are always expanded through their own /members, whether
# they appeared under the parent's /members or /owners. Members of a nested
# owner group are therefore reported with the parent's role (plus a "via"
# chain); the nested group's own owners are not owners of the parent, so
# its /owners endpoint is never followed.
out.extend(
_collect_users(
nested_id,
"/members",
headers,
via_chain + [nested_label],
seen_groups,
depth + 1,
)
)
nl = data.get("@odata.nextLink")
next_url = nl if isinstance(nl, str) and nl else None
return out
def _is_guid(value: str) -> bool:
if not value or len(value) != 36:
return False
parts = value.split("-")
if len(parts) != 5:
return False
return all(all(c in "0123456789abcdefABCDEF" for c in p) for p in parts)
def _probe_hint(error: str) -> str:
low = error.lower()
if "401" in low or "unauthorized" in low or "aadsts" in low:
return f"{error[:200]} — verify Group.Read.All permission and admin consent on Microsoft Graph"
if "403" in low or "forbidden" in low:
return f"{error[:200]} — Microsoft Graph permission denied (Group.Read.All missing?)"
if "404" in low:
return f"{error[:200]} — group not found in this tenant"
return error[:240]
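`_collect_users` expands nested groups depth-first: users are emitted with the chain of group labels they were reached through, nested groups are followed via `/members` only, a shared `seen_groups` set prevents repeat visits, and recursion stops past depth 5. The same traversal over an in-memory membership map instead of Microsoft Graph (the `GRAPH` dict and `expand` function are illustrative only). One deliberate difference: the sketch seeds `seen` with the starting group, whereas `_collect_users` starts with an empty set, so a cycle leading back to the root group appears to re-list the root's direct users once with a longer "via" chain.

```python
from __future__ import annotations

# Illustrative membership map: group id -> list of (kind, id_or_upn, label).
GRAPH: dict[str, list[tuple[str, str, str]]] = {
    "g-root": [("user", "alice@contoso.com", ""), ("group", "g-team", "Team")],
    "g-team": [("user", "bob@contoso.com", ""), ("group", "g-root", "Root")],  # cycle
}


def expand(
    group_id: str,
    via: list[str],
    seen: set[str] | None = None,
    depth: int = 0,
    max_depth: int = 5,
) -> list[tuple[str, str]]:
    """Return (upn, 'A > B > ...') pairs for every user reachable from group_id."""
    if depth > max_depth:
        return []
    if seen is None:
        seen = {group_id}  # seed with the starting group so a cycle back to it stops
    out: list[tuple[str, str]] = []
    for kind, ident, label in GRAPH.get(group_id, []):
        if kind == "user":
            out.append((ident, " > ".join(via)))
        elif kind == "group" and ident not in seen:
            seen.add(ident)  # mark before recursing so cycles terminate
            out.extend(expand(ident, via + [label or ident], seen, depth + 1, max_depth))
    return out


print(expand("g-root", ["Root"]))
```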

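`_is_guid` checks that the string is 36 characters long, has five hyphen-separated parts and contains only hex digits, but not the 8-4-4-4-12 group lengths, so a value such as `123456789-123-1234-1234-123456789012` is still routed to the `/groups/{id}` lookup, which then fails and surfaces as "Group not found". A stricter sketch using a regular expression (`is_guid` is a hypothetical drop-in, not part of the module):

```python
from __future__ import annotations

import re

# 8-4-4-4-12 hexadecimal groups, e.g. 0f8fad5b-d9cb-469f-a165-70867728950e
_GUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def is_guid(value: str) -> bool:
    """Stricter variant of _is_guid that also checks each group's length."""
    return bool(value) and _GUID_RE.fullmatch(value) is not None


# Passes the length/part/hex checks in _is_guid, but the groups are 9-3-4-4-12.
print(is_guid("123456789-123-1234-1234-123456789012"))
print(is_guid("0f8fad5b-d9cb-469f-a165-70867728950e"))
```

In practice display names rarely look like this, so the looser check is unlikely to misfire; the regex mainly documents the expected format.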

@@ -0,0 +1,257 @@
"""
Mailbox permission scanner: Exchange Online via a PowerShell subprocess.
Requires `pwsh` and the `ExchangeOnlineManagement` module to be installed
in the runtime container. Authentication uses certificate-based app-only
auth from the same tenant profile as the SharePoint scanner; a client secret
is not accepted (see _require_certificate).
"""
from __future__ import annotations
import json
import os
import secrets
import shutil
import subprocess
import tempfile
from pathlib import Path
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.serialization import pkcs12
from cryptography import x509
from .common import (
AuthConfig,
DeviationRecord,
ProbeResult,
ProgressCallback,
ScanResult,
validate_auth_config,
)
_SCRIPTS_DIR = Path(__file__).parent / "exo_scripts"
_PROBE_SCRIPT = _SCRIPTS_DIR / "probe.ps1"
_GET_PERMS_SCRIPT = _SCRIPTS_DIR / "get-permissions.ps1"
_LIST_SCRIPT = _SCRIPTS_DIR / "list-mailboxes.ps1"
# pwsh subprocess timeout — connect can take ~10s, scan up to a few minutes per mailbox
_PWSH_TIMEOUT_SEC = 600
def scan_mailbox_for_deviations(
upn: str,
auth: AuthConfig,
progress: ProgressCallback | None = None,
) -> ScanResult:
validate_auth_config(auth)
_require_certificate(auth)
def _report(activity: str, items: int = 0) -> None:
if progress:
progress(activity, items)
organization = _resolve_organization(auth, upn)
_report(f"Connecting to Exchange Online ({organization})")
payload = _run_pwsh(_GET_PERMS_SCRIPT, auth, organization, upn)
if not payload.get("ok"):
raise RuntimeError(payload.get("error") or "Mailbox scan failed")
entries = payload.get("entries") or []
warnings = payload.get("warnings") or []
mailbox_id = payload.get("mailbox") or upn
_report(f"Mailbox: {mailbox_id} ({len(entries)} entries)", len(entries))
deviations: list[DeviationRecord] = []
for entry in entries:
principal = str(entry.get("principal") or "").strip()
if not principal:
continue
deviations.append(
DeviationRecord(
object_url=str(entry.get("object") or mailbox_id),
object_type=str(entry.get("object_type") or "Mailbox"),
principal=principal,
role_name=str(entry.get("role_name") or ""),
delta_type="present",
permission_type=str(entry.get("permission_type") or ""),
)
)
_report("Scan complete", 0)
warning_text = " | ".join(str(w) for w in warnings) if warnings else None
return ScanResult(deviations=deviations, warning=warning_text)
def list_mailboxes(organization: str, auth: AuthConfig, max_count: int = 50000) -> list[str]:
"""
Enumerate every UserPrincipalName in the tenant via Exchange Online.
`organization` must be the tenant's primary domain (e.g. contoso.onmicrosoft.com).
Raises on connection failure or when the count exceeds max_count.
"""
validate_auth_config(auth)
_require_certificate(auth)
if not shutil.which("pwsh"):
raise RuntimeError("pwsh not available in runtime")
payload = _run_pwsh(_LIST_SCRIPT, auth, organization, mailbox=None, timeout_sec=300)
if not payload.get("ok"):
raise RuntimeError(payload.get("error") or "Mailbox enumeration failed")
mailboxes = payload.get("mailboxes") or []
if not isinstance(mailboxes, list):
return []
cleaned = [str(m).strip().lower() for m in mailboxes if isinstance(m, str) and m.strip()]
if len(cleaned) > max_count:
raise RuntimeError(f"Mailbox count {len(cleaned)} exceeds limit {max_count}")
return cleaned
def probe_mailbox(upn: str, auth: AuthConfig) -> ProbeResult:
try:
validate_auth_config(auth)
_require_certificate(auth)
except Exception as exc: # noqa: BLE001
return ProbeResult(ok=False, message=f"Config: {exc}")
if not shutil.which("pwsh"):
return ProbeResult(ok=False, message="pwsh not available in runtime")
if not (upn or "").strip() or "@" not in upn:
return ProbeResult(ok=False, message="Invalid mailbox (UPN/email)")
organization = _resolve_organization(auth, upn)
try:
payload = _run_pwsh(_PROBE_SCRIPT, auth, organization, upn)
except Exception as exc: # noqa: BLE001
return ProbeResult(ok=False, message=f"pwsh: {str(exc)[:240]}")
ok = bool(payload.get("ok"))
message = str(payload.get("message") or ("OK" if ok else "Unknown error"))
if not ok:
message = _probe_hint(message)
return ProbeResult(ok=ok, message=message)
def _require_certificate(auth: AuthConfig) -> None:
if not (auth.cert_thumbprint and auth.cert_private_key):
raise RuntimeError(
"Mailbox scanning requires a certificate on the tenant profile "
"(client secret is not supported by Exchange Online for app-only auth)."
)
def _resolve_organization(auth: AuthConfig, upn: str) -> str:
"""
Exchange Online expects the organization as the tenant's primary domain
(e.g. contoso.onmicrosoft.com). The UPN domain is the practical default.
"""
domain = upn.split("@", 1)[-1].strip().lower()
return domain or auth.tenant_id
def _run_pwsh(
script: Path,
auth: AuthConfig,
organization: str,
mailbox: str | None = None,
timeout_sec: int = _PWSH_TIMEOUT_SEC,
) -> dict:
if not shutil.which("pwsh"):
raise RuntimeError("pwsh not available in runtime")
public_pem = _resolve_public_cert_pem(auth)
pfx_password = secrets.token_urlsafe(16)
with tempfile.TemporaryDirectory(prefix="clearview-exo-") as tmp:
pfx_path = Path(tmp) / "cert.pfx"
_write_pfx(
private_key_pem=auth.cert_private_key or "",
public_cert_pem=public_pem,
out_path=pfx_path,
password=pfx_password,
)
cmd = [
"pwsh",
"-NoProfile",
"-NonInteractive",
"-File", str(script),
"-TenantId", auth.tenant_id,
"-ClientId", auth.client_id,
"-Organization", organization,
"-CertPath", str(pfx_path),
]
if mailbox is not None:
cmd.extend(["-Mailbox", mailbox])
env = os.environ.copy()
env["CLEARVIEW_PFX_PASSWORD"] = pfx_password
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=timeout_sec,
env=env,
)
except subprocess.TimeoutExpired as exc:
raise RuntimeError(f"pwsh script timed out after {timeout_sec}s") from exc
if result.returncode != 0:
stderr = (result.stderr or "").strip()[:500]
raise RuntimeError(f"pwsh exited with code {result.returncode}: {stderr}")
out = (result.stdout or "").strip()
if not out:
raise RuntimeError("pwsh returned empty output")
last_line = out.splitlines()[-1]
try:
return json.loads(last_line)
except json.JSONDecodeError as exc:
raise RuntimeError(f"Could not parse pwsh JSON output: {out[:500]}") from exc
def _resolve_public_cert_pem(auth: AuthConfig) -> str:
"""
The public cert PEM is stored on the tenant profile and passed in as
AuthConfig.cert_public_pem. This helper raises if it is missing, which
happens for tenants whose certificate was generated before
cert_public_pem was stored.
"""
pem = getattr(auth, "cert_public_pem", None)
if not pem:
raise RuntimeError(
"Tenant certificate has no public PEM stored. "
"Regenerate the certificate to enable mailbox scanning."
)
return pem
def _write_pfx(private_key_pem: str, public_cert_pem: str, out_path: Path, password: str) -> None:
private_key = serialization.load_pem_private_key(private_key_pem.encode(), password=None)
cert = x509.load_pem_x509_certificate(public_cert_pem.encode())
pfx_bytes = pkcs12.serialize_key_and_certificates(
name=b"clearview",
key=private_key,
cert=cert,
cas=None,
encryption_algorithm=serialization.BestAvailableEncryption(password.encode()),
)
out_path.write_bytes(pfx_bytes)
def _probe_hint(message: str) -> str:
low = message.lower()
if "unauthorized" in low or "401" in low or "aadsts" in low:
return f"{message[:200]} — verify Exchange.ManageAsApp permission, admin consent, and the Exchange Administrator role assignment"
if "not found" in low or "couldn't find object" in low:
return f"{message[:200]} — mailbox not found in this tenant"
if "module not available" in low:
return f"{message[:200]} — install the ExchangeOnlineManagement module in the container"
return message[:240]
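`_run_pwsh` treats the last line of the script's stdout as the result: earlier lines may contain other output, and the final line must be a single JSON object with an `ok` flag plus script-specific fields (`entries`/`warnings`/`mailbox`, `mailboxes`, or `message`). The `.ps1` scripts under `exo_scripts` are not shown here, so this contract is inferred from how `mailbox.py` reads the payload. `parse_last_json_line` is a hypothetical standalone version of that parsing step; unlike the original, it also rejects a last line that is valid JSON but not an object:

```python
from __future__ import annotations

import json


def parse_last_json_line(stdout: str) -> dict:
    """Return the JSON object on the last non-empty line of a script's stdout."""
    out = (stdout or "").strip()
    if not out:
        raise RuntimeError("script returned empty output")
    last_line = out.splitlines()[-1]
    try:
        payload = json.loads(last_line)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Could not parse JSON output: {out[:500]}") from exc
    if not isinstance(payload, dict):
        raise RuntimeError("expected a JSON object on the last line")
    return payload


sample = "WARNING: module loaded\nConnected.\n" + json.dumps(
    {"ok": True, "mailbox": "ann@contoso.com", "entries": [], "warnings": []}
)
print(parse_last_json_line(sample)["mailbox"])
```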


@@ -0,0 +1,722 @@
from __future__ import annotations
import time
from dataclasses import dataclass
from urllib.parse import urlparse
import msal
import requests
from ..config import (
SCAN_HTTP_BACKOFF_SEC,
SCAN_HTTP_MAX_RETRIES,
SCAN_HTTP_TIMEOUT_SEC,
SCAN_LIST_PAGE_SIZE,
SCAN_MAX_ITEMS_PER_LIST,
SHAREPOINT_SCAN_MODE,
)
from .common import (
AuthConfig,
DeviationRecord,
ProbeResult,
ProgressCallback,
ScanResult,
validate_auth_config,
)
@dataclass(frozen=True)
class PermissionEntry:
principal: str
role_name: str
_TOKEN_CACHE: dict[str, str] = {}
def scan_site_for_deviations(
site_url: str,
auth: AuthConfig,
progress: ProgressCallback | None = None,
) -> ScanResult:
"""
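Scan a site for permission deviations relative to its site-root role assignments.
Only visible document libraries (BaseTemplate 101) and their items are checked, and
only grants that are absent at the site root are reported (delta_type='added').
Only SharePoint role assignments are used (site/list/folder/file scope);
no filesystem/NTFS permission model is involved.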
"""
if SHAREPOINT_SCAN_MODE == "placeholder":
return ScanResult(
deviations=[],
warning=(
"SharePoint scan mode is 'placeholder'. "
"Set SHAREPOINT_SCAN_MODE=sharepoint_app_only and configure Azure app credentials."
),
)
if SHAREPOINT_SCAN_MODE != "sharepoint_app_only":
raise RuntimeError(f"Unsupported SHAREPOINT_SCAN_MODE='{SHAREPOINT_SCAN_MODE}'")
validate_auth_config(auth)
def _report(activity: str, items: int = 0) -> None:
if progress:
progress(activity, items)
parsed = urlparse(site_url)
host = parsed.netloc
_report(f"Connecting to {host}")
token = _get_token_for_host(host, auth)
base_headers = {
"Accept": "application/json;odata=nometadata",
"Authorization": f"Bearer {token}",
}
_report(f"Loading site permissions: {site_url}")
root_assignments = _get_role_assignments(
f"{site_url}/_api/web/roleassignments?$expand=Member,RoleDefinitionBindings"
"&$select=Member/LoginName,Member/Title,Member/PrincipalType,RoleDefinitionBindings/Name",
base_headers,
)
root_set = set(root_assignments)
deviations: list[DeviationRecord] = []
warnings: list[str] = []
lists_url = (
f"{site_url}/_api/web/lists"
"?$select=Id,Title,BaseTemplate,Hidden,ItemCount,RootFolder/ServerRelativeUrl,HasUniqueRoleAssignments"
"&$expand=RootFolder"
)
for lst in _iter_paged(lists_url, base_headers):
if _to_bool(lst.get("Hidden")):
continue
if _to_int(lst.get("BaseTemplate")) != 101:
continue
list_id = str(lst.get("Id", "")).strip()
if not list_id:
continue
list_title = str(lst.get("Title") or "Document Library")
list_url = _absolute_url(host, str((lst.get("RootFolder") or {}).get("ServerRelativeUrl") or ""))
_report(f"Library: {list_title}")
if _to_bool(lst.get("HasUniqueRoleAssignments")):
list_assignments = _get_role_assignments(
f"{site_url}/_api/web/lists(guid'{list_id}')/roleassignments"
"?$expand=Member,RoleDefinitionBindings"
"&$select=Member/LoginName,Member/Title,Member/PrincipalType,RoleDefinitionBindings/Name",
base_headers,
)
deviations.extend(
_deviation_records_only_added(
object_url=list_url,
object_type="DocumentLibrary",
root_set=root_set,
current_set=set(list_assignments),
)
)
items_processed = 0
items_total = 0
items_url = (
f"{site_url}/_api/web/lists(guid'{list_id}')/items"
f"?$select=Id,FileRef,FileSystemObjectType,HasUniqueRoleAssignments&$top={SCAN_LIST_PAGE_SIZE}"
)
for item in _iter_paged(items_url, base_headers):
items_total += 1
if items_total % 50 == 0:
_report(f"Library: {list_title} ({items_total} items scanned)", 50)
if not _to_bool(item.get("HasUniqueRoleAssignments")):
continue
if items_processed >= SCAN_MAX_ITEMS_PER_LIST:
warnings.append(
f"List '{list_title}' hit SCAN_MAX_ITEMS_PER_LIST={SCAN_MAX_ITEMS_PER_LIST}; remaining unique-permission items skipped"
)
break
item_id = _to_int(item.get("Id"))
if item_id <= 0:
continue
file_ref = str(item.get("FileRef") or "")
if not file_ref:
continue
item_type = "File" if _to_int(item.get("FileSystemObjectType")) == 0 else "Folder"
item_assignments = _get_role_assignments(
f"{site_url}/_api/web/lists(guid'{list_id}')/items({item_id})/roleassignments"
"?$expand=Member,RoleDefinitionBindings"
"&$select=Member/LoginName,Member/Title,Member/PrincipalType,RoleDefinitionBindings/Name",
base_headers,
)
deviations.extend(
_deviation_records_only_added(
object_url=_absolute_url(host, file_ref),
object_type=item_type,
root_set=root_set,
current_set=set(item_assignments),
)
)
items_processed += 1
_report("Scan complete", 0)
warning = " | ".join(warnings) if warnings else None
return ScanResult(deviations=_deduplicate_hierarchical(deviations), warning=warning)
def scan_site_root_permissions(
site_url: str,
auth: AuthConfig,
progress: ProgressCallback | None = None,
) -> ScanResult:
"""
Collect the role assignments at the site-root level without traversing
libraries, folders, or items. Each assignment is reported as a record
with delta_type='root' so it is distinguishable from the deviation scan.
"""
if SHAREPOINT_SCAN_MODE == "placeholder":
return ScanResult(
deviations=[],
warning="SharePoint scan mode is 'placeholder'.",
)
if SHAREPOINT_SCAN_MODE != "sharepoint_app_only":
raise RuntimeError(f"Unsupported SHAREPOINT_SCAN_MODE='{SHAREPOINT_SCAN_MODE}'")
validate_auth_config(auth)
def _report(activity: str, items: int = 0) -> None:
if progress:
progress(activity, items)
parsed = urlparse(site_url)
host = parsed.netloc
_report(f"Connecting to {host}")
token = _get_token_for_host(host, auth)
headers = {
"Accept": "application/json;odata=nometadata",
"Authorization": f"Bearer {token}",
}
_report(f"Loading root permissions: {site_url}")
root_assignments = _get_role_assignments(
f"{site_url}/_api/web/roleassignments?$expand=Member,RoleDefinitionBindings"
"&$select=Member/LoginName,Member/Title,Member/PrincipalType,RoleDefinitionBindings/Name",
headers,
)
filtered = [e for e in root_assignments if not _is_noise_principal(e.principal)]
records: list[DeviationRecord] = []
for entry in sorted(filtered, key=lambda e: (e.principal.lower(), e.role_name.lower())):
records.append(
DeviationRecord(
object_url=site_url,
object_type="Site",
principal=entry.principal,
role_name=entry.role_name,
delta_type="root",
)
)
_report("Scan complete", 0)
skipped = len(root_assignments) - len(filtered)
warning = f"{skipped} SharingLinks/system entries hidden" if skipped else None
return ScanResult(deviations=records, warning=warning)
def is_sharepoint_group_principal(principal: str) -> bool:
"""
Heuristic: a SharePoint group has a plain display-name principal
(no claim-encoded prefix, no email shape). Used to decide which entries
can be resolved via /_api/web/sitegroups/getbyname.
"""
if not principal:
return False
p = principal.strip()
if not p:
return False
# Claim-encoded principals: c:0o.c|..., i:0#.f|..., c:0t.c|..., c:0(.s|...
if p.startswith(("c:0", "i:0")):
return False
if "|" in p:
return False
# Email-shape user
if "@" in p:
return False
# SharingLinks are handled by the dedicated resolver
if p.lower().startswith("sharinglinks."):
return False
return True
def _is_noise_principal(principal: str) -> bool:
"""
SharePoint surfaces several principal types at site-root level that are
not part of a meaningful root-permission inventory:
- SharingLinks.<guid>.<LinkType>.<guid>: auto-created when a child item is shared
- System / built-in accounts (SHAREPOINT\\system, NT AUTHORITY\\*)
- "Limited Access System Group" SP groups
"""
if not principal:
return True
p = principal.lower()
if "sharinglinks." in p:
return True
if p.startswith("sharepoint\\") or p.startswith("nt authority\\"):
return True
if "limited access system group" in p:
return True
return False


def probe_site(site_url: str, auth: AuthConfig) -> ProbeResult:
    """
    Lightweight preflight: validate that the configured credentials can
    reach the site and read role assignments.
    """
    if SHAREPOINT_SCAN_MODE == "placeholder":
        return ProbeResult(ok=False, message="SHAREPOINT_SCAN_MODE=placeholder")
    try:
        validate_auth_config(auth)
    except Exception as exc:  # noqa: BLE001
        return ProbeResult(ok=False, message=f"Config: {exc}")
    parsed = urlparse(site_url)
    host = parsed.netloc
    if not host:
        return ProbeResult(ok=False, message="Invalid site URL")
    # Avoid "//_api" when the stored URL ends with a slash
    base = site_url.rstrip("/")
    try:
        token = _get_token_for_host(host, auth)
    except Exception as exc:  # noqa: BLE001
        return ProbeResult(ok=False, message=f"Token: {str(exc)[:240]}")
    headers = {
        "Accept": "application/json;odata=nometadata",
        "Authorization": f"Bearer {token}",
    }
    # Call 1: token, tenant and site URL are valid
    try:
        _probe_request(f"{base}/_api/web?$select=Title", headers)
    except Exception as exc:  # noqa: BLE001
        return ProbeResult(ok=False, message=_probe_hint(str(exc), stage="site"))
    # Call 2: the app may read role assignments, not only basic site data
    try:
        _probe_request(
            f"{base}/_api/web/roleassignments?$top=1&$select=PrincipalId",
            headers,
        )
    except Exception as exc:  # noqa: BLE001
        return ProbeResult(ok=False, message=_probe_hint(str(exc), stage="roleassignments"))
    return ProbeResult(ok=True, message="OK")


def resolve_sharing_link_members(
    site_url: str,
    group_name: str,
    auth: AuthConfig,
) -> list[str]:
    """
    Return members of a SharePoint group. When a member is itself an
    M365/AAD group, expand it via Microsoft Graph (recursion-bounded).
    Returns an empty list for anonymous links and groups that cannot be read.
    """
    raw_users = _get_sp_group_users(site_url, group_name, auth)
    members: list[str] = []
    seen_groups: set[str] = set()
    for user in raw_users:
        members.extend(_render_principal(user, auth, seen_groups, depth=0))
    return members


def _get_sp_group_users(site_url: str, group_name: str, auth: AuthConfig) -> list[dict]:
    validate_auth_config(auth)
    parsed = urlparse(site_url)
    host = parsed.netloc
    token = _get_token_for_host(host, auth)
    headers = {
        "Accept": "application/json;odata=nometadata",
        "Authorization": f"Bearer {token}",
    }
    # OData string literals escape a single quote by doubling it
    encoded = group_name.replace("'", "''")
    url = (
        f"{site_url}/_api/web/sitegroups/getbyname('{encoded}')/users"
        "?$select=LoginName,Email,Title,PrincipalType"
    )
    try:
        data = _request_json(url, headers)
    except Exception:  # noqa: BLE001
        return []
    return list(_extract_values(data))


# SharePoint PrincipalType values (bit flags):
#   1 = User, 2 = DistributionList, 4 = SecurityGroup, 8 = SharePointGroup, 15 = All
_PRINCIPAL_TYPE_GROUP = {2, 4}


def _render_principal(user: dict, auth: AuthConfig, seen: set[str], depth: int) -> list[str]:
    email = str(user.get("Email") or "").strip()
    login = str(user.get("LoginName") or "").strip()
    title = str(user.get("Title") or "").strip()
    # Skip system accounts and the "Everyone" claim
    if login.upper().startswith("SHAREPOINT\\") or login.startswith("c:0(.s|true"):
        return []
    is_group = (
        _to_int(user.get("PrincipalType")) in _PRINCIPAL_TYPE_GROUP
        or "federateddirectoryclaimprovider" in login.lower()
        or "tenant|" in login.lower()
    )
    if is_group and email and depth < 3:
        nested = _expand_aad_group_via_graph(email, auth, seen, depth=depth + 1)
        label = title or email
        if nested:
            return [f"{label} [{', '.join(nested)}]"]
        return [f"{label} (group, no readable members)"]
    if email:
        return [email]
    if title:
        return [title]
    if login:
        return [login]
    return []


def _expand_aad_group_via_graph(
    group_mail: str,
    auth: AuthConfig,
    seen: set[str],
    depth: int,
) -> list[str]:
    if depth > 3:
        return ["… (recursion limit)"]
    key = group_mail.strip().lower()
    # `seen` breaks cycles between groups that (indirectly) contain each other
    if not key or key in seen:
        return []
    seen.add(key)
    try:
        token = _get_token_for_host("graph.microsoft.com", auth)
    except Exception:  # noqa: BLE001
        return []
    headers = {"Accept": "application/json", "Authorization": f"Bearer {token}"}
    safe_mail = key.replace("'", "''")
    lookup_url = (
        "https://graph.microsoft.com/v1.0/groups"
        f"?$filter=mail eq '{safe_mail}'&$select=id,displayName"
    )
    try:
        data = _request_json(lookup_url, headers)
    except Exception:  # noqa: BLE001
        return []
    groups = data.get("value") or []
    if not groups:
        return []
    group_id = str(groups[0].get("id") or "").strip()
    if not group_id:
        return []
    out: list[str] = []
    out.extend(_graph_collect(f"/groups/{group_id}/members", headers, auth, seen, depth, owner=False))
    out.extend(_graph_collect(f"/groups/{group_id}/owners", headers, auth, seen, depth, owner=True))
    return _dedup_preserve_order(out)


def _graph_collect(
    relative: str,
    headers: dict[str, str],
    auth: AuthConfig,
    seen: set[str],
    depth: int,
    owner: bool,
) -> list[str]:
    next_url: str | None = (
        f"https://graph.microsoft.com/v1.0{relative}"
        "?$select=id,userPrincipalName,mail,displayName"
    )
    out: list[str] = []
    while next_url:
        try:
            data = _request_json(next_url, headers)
        except Exception:  # noqa: BLE001
            return out
        for entry in data.get("value", []):
            otype = str(entry.get("@odata.type") or "")
            if otype.endswith("user"):
                upn = (
                    str(entry.get("userPrincipalName") or "").strip()
                    or str(entry.get("mail") or "").strip()
                    or str(entry.get("displayName") or "").strip()
                )
                if upn:
                    out.append(f"{upn} (owner)" if owner else upn)
            elif otype.endswith("group"):
                nested_mail = str(entry.get("mail") or "").strip()
                if nested_mail:
                    nested = _expand_aad_group_via_graph(nested_mail, auth, seen, depth + 1)
                    label = str(entry.get("displayName") or nested_mail)
                    if nested:
                        out.append(f"{label} [{', '.join(nested)}]")
                    else:
                        out.append(f"{label} (group, no readable members)")
        # Follow Graph paging until no @odata.nextLink is returned
        nl = data.get("@odata.nextLink")
        next_url = nl if isinstance(nl, str) and nl else None
    return out


def _dedup_preserve_order(items: list[str]) -> list[str]:
    seen: set[str] = set()
    result: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def _probe_request(url: str, headers: dict[str, str]) -> None:
    response = requests.get(url, headers=headers, timeout=SCAN_HTTP_TIMEOUT_SEC)
    if response.status_code >= 400:
        snippet = (response.text or "").strip()[:200]
        raise RuntimeError(f"HTTP {response.status_code}: {snippet or '{}'}")


def _probe_hint(error: str, stage: str) -> str:
    # _probe_request formats errors as "HTTP <code>: <body>". Matching on that prefix
    # keeps a status code that only appears inside the response body from being misread.
    if error.startswith("HTTP 401"):
        if stage == "roleassignments":
            return f"{error[:180]} — likely missing admin consent or insufficient permission"
        return f"{error[:180]} — likely certificate not uploaded in Azure, or wrong tenant/client id"
    if error.startswith("HTTP 403"):
        return f"{error[:180]} — app has no access to this site (Sites.Selected without per-site grant?)"
    if error.startswith("HTTP 404"):
        return f"{error[:180]} — site not found"
    return error[:220]


def _get_token_for_host(host: str, auth: AuthConfig) -> str:
    auth_method = "cert" if auth.cert_thumbprint and auth.cert_private_key else "secret"
    cache_key = f"{host}|{auth.tenant_id}|{auth.client_id}|{auth_method}"
    # NOTE: this function never evicts cache entries. Unless _TOKEN_CACHE is cleared
    # elsewhere, a long-running worker can keep returning a token after it expires
    # (Entra ID access tokens typically last 60 to 90 minutes).
    cached = _TOKEN_CACHE.get(cache_key)
    if cached:
        return cached
    # App-only tokens use the "<resource>/.default" scope
    scope = f"https://{host}/.default"
    authority = f"https://login.microsoftonline.com/{auth.tenant_id}"
    if auth_method == "cert":
        client_credential = {
            "thumbprint": auth.cert_thumbprint,
            "private_key": auth.cert_private_key,
        }
    else:
        client_credential = auth.client_secret
    app = msal.ConfidentialClientApplication(
        client_id=auth.client_id,
        authority=authority,
        client_credential=client_credential,
    )
    result = app.acquire_token_for_client(scopes=[scope])
    if "access_token" not in result:
        error = result.get("error", "unknown")
        description = result.get("error_description", "")
        raise RuntimeError(f"Token request failed ({error}): {description[:300]}")
    token = str(result["access_token"])
    _TOKEN_CACHE[cache_key] = token
    return token


def _iter_paged(url: str, headers: dict[str, str]):
    next_url = url
    while next_url:
        data = _request_json(next_url, headers)
        for item in _extract_values(data):
            yield item
        next_url = _extract_next_link(data)


def _request_json(url: str, headers: dict[str, str]) -> dict:
    last_error: str | None = None
    for attempt in range(1, SCAN_HTTP_MAX_RETRIES + 1):
        try:
            response = requests.get(url, headers=headers, timeout=SCAN_HTTP_TIMEOUT_SEC)
            if response.status_code in (429, 503):
                # Throttled: honour Retry-After when present, otherwise back off linearly.
                # Record the status so the final error is not "None" if every attempt is throttled.
                last_error = f"HTTP {response.status_code} (throttled)"
                retry_after = _to_int(response.headers.get("Retry-After"))
                delay = retry_after if retry_after > 0 else SCAN_HTTP_BACKOFF_SEC * attempt
                time.sleep(delay)
                continue
            if response.status_code >= 400:
                raise RuntimeError(f"HTTP {response.status_code}: {response.text[:300]}")
            return response.json()
        except Exception as exc:  # noqa: BLE001
            last_error = str(exc)
            if attempt < SCAN_HTTP_MAX_RETRIES:
                time.sleep(SCAN_HTTP_BACKOFF_SEC * attempt)
                continue
            raise RuntimeError(f"Request failed for {url}: {last_error}") from exc
    raise RuntimeError(f"Request failed for {url}: {last_error}")


def _extract_values(data: dict) -> list[dict]:
    # nometadata / Graph shape: {"value": [...]}
    if "value" in data and isinstance(data["value"], list):
        return data["value"]
    # verbose OData shape: {"d": {"results": [...]}}
    d = data.get("d")
    if isinstance(d, dict):
        results = d.get("results")
        if isinstance(results, list):
            return results
    return []


def _extract_next_link(data: dict) -> str | None:
    for key in ("@odata.nextLink", "odata.nextLink", "__next"):
        value = data.get(key)
        if isinstance(value, str) and value:
            return value
    d = data.get("d")
    if isinstance(d, dict):
        value = d.get("__next")
        if isinstance(value, str) and value:
            return value
    return None


def _get_role_assignments(url: str, headers: dict[str, str]) -> list[PermissionEntry]:
    data = _request_json(url, headers)
    assignments: list[PermissionEntry] = []
    for item in _extract_values(data):
        member = item.get("Member") or {}
        principal = str(member.get("LoginName") or member.get("Title") or "").strip()
        if not principal:
            continue
        role_bindings = item.get("RoleDefinitionBindings")
        roles = _extract_role_names(role_bindings)
        for role_name in roles:
            # "Limited Access" is granted implicitly so a user can reach a uniquely
            # shared child item; it is not an explicit grant and is skipped.
            if role_name.lower() == "limited access":
                continue
            assignments.append(PermissionEntry(principal=principal, role_name=role_name))
    return assignments


# Dutch-language tenants return localized permission-level names; map them to English
_ROLE_NAME_NL_TO_EN: dict[str, str] = {
    "volledig beheer": "Full Control",
    "ontwerpen": "Design",
    "bewerken": "Edit",
    "bijdragen": "Contribute",
    "lezen": "Read",
    "beperkte toegang": "Limited Access",
    "goedkeuren": "Approve",
    "hiërarchieën beheren": "Manage Hierarchy",
    "alleen weergeven": "View Only",
    "weergeven alleen": "View Only",
    "beperkt lezen": "Restricted Read",
}


def _normalize_role_name(name: str) -> str:
    return _ROLE_NAME_NL_TO_EN.get(name.lower(), name)


def _extract_role_names(bindings) -> list[str]:
    # nometadata responses return a plain list; verbose OData wraps it in {"results": [...]}
    if isinstance(bindings, dict):
        bindings = bindings.get("results")
    if not isinstance(bindings, list):
        return []
    return [
        _normalize_role_name(str(x.get("Name") or "").strip())
        for x in bindings
        if isinstance(x, dict) and x.get("Name")
    ]


def _deduplicate_hierarchical(deviations: list[DeviationRecord]) -> list[DeviationRecord]:
    """
    Remove child-level deviations that are already covered by a parent in the URL hierarchy.
    """
    # Sorting by URL length visits every ancestor before its descendants
    sorted_devs = sorted(deviations, key=lambda d: len(d.object_url))
    # (principal, role) -> object URLs already reported for that pair
    covered: dict[tuple[str, str], list[str]] = {}
    result: list[DeviationRecord] = []
    for dev in sorted_devs:
        key = (dev.principal, dev.role_name)
        ancestor_urls = covered.get(key)
        if ancestor_urls:
            url = dev.object_url.rstrip("/")
            already_covered = any(
                url == anc.rstrip("/") or url.startswith(anc.rstrip("/") + "/")
                for anc in ancestor_urls
            )
            if already_covered:
                continue
        else:
            covered[key] = []
        result.append(dev)
        covered[key].append(dev.object_url)
    return result


def _deviation_records_only_added(
    object_url: str,
    object_type: str,
    root_set: set[PermissionEntry],
    current_set: set[PermissionEntry],
) -> list[DeviationRecord]:
    records: list[DeviationRecord] = []
    # Only grants present on the object but absent from the site root are reported
    for entry in sorted(current_set - root_set, key=lambda x: (x.principal.lower(), x.role_name.lower())):
        records.append(
            DeviationRecord(
                object_url=object_url,
                object_type=object_type,
                principal=entry.principal,
                role_name=entry.role_name,
                delta_type="added",
            )
        )
    return records


def _absolute_url(host: str, server_relative_url: str) -> str:
    if not server_relative_url:
        return f"https://{host}"
    if server_relative_url.startswith("http://") or server_relative_url.startswith("https://"):
        return server_relative_url
    if not server_relative_url.startswith("/"):
        server_relative_url = "/" + server_relative_url
    return f"https://{host}{server_relative_url}"


def _to_int(value) -> int:
    try:
        if value is None:
            return 0
        return int(value)
    except (TypeError, ValueError):
        return 0


def _to_bool(value) -> bool:
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.strip().lower() in ("1", "true", "yes")
    return bool(value)
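`_get_token_for_host` stores each access token without an expiry. If nothing else clears `_TOKEN_CACHE`, a worker that outlives the token's lifetime would keep sending a stale bearer token. Below is a minimal sketch of an expiry-aware variant. It assumes the token call returns an MSAL-style dict that includes `expires_in`. `CachedToken`, `_CACHE`, `get_cached_token`, and the injected `fetch`/`now` callables are hypothetical names, not part of Clearview.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedToken:
    value: str
    expires_at: float  # epoch seconds


# Hypothetical replacement for _TOKEN_CACHE: same key, but the expiry is stored too
_CACHE: Dict[str, CachedToken] = {}


def get_cached_token(
    cache_key: str,
    fetch: Callable[[], dict],
    now: Callable[[], float] = time.time,
    skew_sec: float = 300.0,
) -> str:
    """Return a cached token, fetching a new one `skew_sec` seconds before expiry.

    `fetch` is assumed to return an MSAL-style result dict: `access_token` and
    `expires_in` (seconds) on success, `error` / `error_description` on failure.
    """
    cached = _CACHE.get(cache_key)
    if cached is not None and now() < cached.expires_at - skew_sec:
        return cached.value
    result = fetch()
    if "access_token" not in result:
        error = result.get("error", "unknown")
        description = str(result.get("error_description", ""))[:300]
        raise RuntimeError(f"Token request failed ({error}): {description}")
    token = str(result["access_token"])
    expires_in = float(result.get("expires_in") or 0)
    _CACHE[cache_key] = CachedToken(value=token, expires_at=now() + expires_in)
    return token
```

In `_get_token_for_host` the `fetch` argument would wrap `app.acquire_token_for_client(scopes=[scope])`. Another option is to keep one `ConfidentialClientApplication` per tenant/client and reuse it, since MSAL keeps an in-memory token cache on the application object.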

@@ -10,12 +10,14 @@ class CreateTenantProfileRequest(BaseModel):
tenant_id: str
client_id: str
client_secret: str | None = None
primary_domain: str | None = None
class TenantProfileItem(BaseModel):
id: str
name: str
tenant_id: str
primary_domain: str | None = None
client_id: str
has_certificate: bool
cert_thumbprint: str | None
@@ -31,7 +33,13 @@ class TenantCertificateResponse(BaseModel):
class CreateScanJobRequest(BaseModel):
scan_type: str = "sharepoint"
site_urls: list[HttpUrl] = Field(default_factory=list)
mailboxes: list[str] = Field(default_factory=list)
scan_all_mailboxes: bool = False
organization: str | None = None
group_ids: list[str] = Field(default_factory=list)
scan_all_groups: bool = False
skip_default_sites: bool = True
tenant_profile_id: str | None = None
tenant_id: str | None = None
@@ -43,6 +51,7 @@ class ScanJobSummary(BaseModel):
id: str
status: str
source_type: str
scan_type: str
skip_default_sites: bool
tenant_profile_id: str | None
tenant_name: str | None
@@ -72,6 +81,16 @@ class ScanTargetItem(BaseModel):
error_message: str | None
started_at: datetime | None
finished_at: datetime | None
last_probe_at: datetime | None = None
last_probe_ok: bool | None = None
last_probe_message: str | None = None
class ProbeResultResponse(BaseModel):
target_id: int
ok: bool
message: str
last_probe_at: datetime
class PermissionDeviationItem(BaseModel):
@@ -82,7 +101,8 @@ class PermissionDeviationItem(BaseModel):
principal: str
role_name: str
delta_type: str
resolved_members: str | None permission_type: str | None = None
resolved_members: str | None = None
created_at: datetime
@@ -95,6 +115,16 @@ class ResolveSharingLinksResponse(BaseModel):
updated_deviations: int
class ResolveGroupsResponse(BaseModel):
resolved_groups: int
skipped_groups: int
updated_deviations: int
class SharingLinkTypesResponse(BaseModel):
type_counts: dict[str, int]
class ScanJobDetail(ScanJobSummary):
targets: list[ScanTargetItem]
deviations: list[PermissionDeviationItem]

@@ -16,7 +16,7 @@ from .config import (
)
from .db import SessionLocal
from .models import PermissionDeviation, ScanJob, ScanTarget, TenantProfile
from .scanner import AuthConfig, scan_site_for_deviations from .scanners import AuthConfig, ProbeResult, probe, scan
log = logging.getLogger(__name__)
@@ -121,6 +121,28 @@ class ScanWorker:
job.updated_at = now
db.commit()
# probe_result (not "probe") so the imported scanners.probe function is not shadowed
probe_result = self._run_probe(target_id)
if not probe_result.ok:
with SessionLocal() as db:
job = db.get(ScanJob, job_id)
target = db.get(ScanTarget, target_id)
if not job or not target:
return
now = datetime.utcnow()
target.status = "failed"
target.attempts = 1
target.error_message = f"Preflight: {probe_result.message}"
target.finished_at = now
target.updated_at = now
job.processed_targets += 1
job.failed_targets += 1
job.heartbeat_at = now
job.updated_at = now
if not job.error_message:
job.error_message = "One or more scan targets failed preflight"
db.commit()
return
max_attempts = SCAN_TARGET_MAX_RETRIES + 1
last_error: str | None = None
latest_warning: str | None = None
@@ -147,6 +169,7 @@ class ScanWorker:
principal=deviation.principal,
role_name=deviation.role_name,
delta_type=deviation.delta_type,
permission_type=deviation.permission_type,
)
)
@@ -196,6 +219,48 @@ class ScanWorker:
db.commit()
def _run_probe(self, target_id: int) -> ProbeResult:
with SessionLocal() as db:
target = db.get(ScanTarget, target_id)
if not target:
return ProbeResult(ok=False, message="Target not found")
site_url = target.site_url
job = db.get(ScanJob, target.job_id)
if not job:
return ProbeResult(ok=False, message="Job not found")
scan_type = job.scan_type or "sharepoint"
cert_private_key: str | None = None
cert_thumbprint: str | None = None
cert_public_pem: str | None = None
if job.tenant_profile_id:
profile = db.get(TenantProfile, job.tenant_profile_id)
if profile:
cert_private_key = profile.cert_private_key
cert_thumbprint = profile.cert_thumbprint
cert_public_pem = profile.cert_public_pem
auth = AuthConfig(
tenant_id=job.auth_tenant_id or "",
client_id=job.auth_client_id or "",
client_secret=job.auth_client_secret or "",
cert_private_key=cert_private_key,
cert_thumbprint=cert_thumbprint,
cert_public_pem=cert_public_pem,
)
result = probe(scan_type, site_url, auth)
with SessionLocal() as db:
target = db.get(ScanTarget, target_id)
if target:
now = datetime.utcnow()
target.last_probe_at = now
target.last_probe_ok = result.ok
target.last_probe_message = result.message
target.updated_at = now
db.commit()
return result
def _scan_with_timeout(self, target_id: int, timeout_sec: int):
with SessionLocal() as db:
target = db.get(ScanTarget, target_id)
@@ -205,25 +270,30 @@ class ScanWorker:
job = db.get(ScanJob, target.job_id)
if not job:
raise RuntimeError(f"Job {target.job_id} not found for target {target_id}")
scan_type = job.scan_type or "sharepoint"
job_id = job.id
cert_private_key: str | None = None
cert_thumbprint: str | None = None
cert_public_pem: str | None = None
if job.tenant_profile_id:
profile = db.get(TenantProfile, job.tenant_profile_id)
if profile:
cert_private_key = profile.cert_private_key
cert_thumbprint = profile.cert_thumbprint
cert_public_pem = profile.cert_public_pem
auth = AuthConfig(
tenant_id=job.auth_tenant_id or "",
client_id=job.auth_client_id or "",
client_secret=job.auth_client_secret or "",
cert_private_key=cert_private_key,
cert_thumbprint=cert_thumbprint,
cert_public_pem=cert_public_pem,
)
def progress_callback(activity: str, items: int) -> None:
try:
with SessionLocal() as db:
job = db.get(ScanJob, target.job_id) job = db.get(ScanJob, job_id)
if job:
job.scan_activity = activity
if items > 0:
@@ -235,7 +305,7 @@ class ScanWorker:
pass
with ThreadPoolExecutor(max_workers=1) as pool:
future = pool.submit(scan_site_for_deviations, site_url, auth, progress_callback) future = pool.submit(scan, scan_type, site_url, auth, progress_callback)
try:
return future.result(timeout=timeout_sec)
except FutureTimeoutError as exc:

@@ -2,7 +2,11 @@
## Scope
Clearview scans SharePoint sites for permission deviations from the site root permission baseline. Clearview scans Microsoft 365 for permission deviations across two domains:
1. **SharePoint sites** — deviations relative to the site root permission baseline (libraries, folders, files).
2. **Exchange Online mailboxes** — non-default permissions: Full Access, Send As, Send on Behalf, and folder delegations (Calendar, Inbox).
Designed to monitor multiple customer tenants from a single instance.
## Runtime Architecture
@@ -16,13 +20,20 @@ All services are defined in `stack/docker-compose.yml` for Portainer deployment.
## Application Layout
- `containers/clearview/site/`
- Frontend UI (tenant management, manual URL input, CSV import, jobs, deviations) - Frontend UI: vanilla HTML/JS/CSS with a fixed sidebar and hash-based routing.
- Routes: `#/dashboard`, `#/jobs`, `#/scan/sharepoint`, `#/scan/mailbox`, `#/tenants`, `#/settings`.
- `containers/clearview/src/clearview_app/`
- FastAPI backend
- SQLAlchemy models
- CSV parser - CSV parser (SharePoint URLs and mailbox UPNs)
- Default-site filtering - Default-site filtering (SharePoint only)
- Background worker for long-running scans
- `containers/clearview/src/clearview_app/scanners/`
- `common.py` — `AuthConfig`, `DeviationRecord`, `ScanResult`, `ProbeResult`, shared helpers.
- `sharepoint.py` — SharePoint REST scanner, MSAL token cache, hierarchical dedup, SharingLinks helpers.
- `mailbox.py` — Exchange Online scanner; spawns `pwsh` with the EXO scripts.
- `exo_scripts/` — PowerShell scripts (`probe.ps1`, `get-permissions.ps1`).
- Dispatcher: `scanners.scan(scan_type, target, auth, progress)` and `scanners.probe(scan_type, target, auth)`.
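The two dispatcher entry points can be pictured as a table lookup keyed on `scan_type`. Below is a minimal sketch of that idea, assuming a registry dict and stub scanners. `_SCANNERS`, `_PROBES`, and the `_scan_sharepoint`/`_probe_sharepoint` stubs are hypothetical, and the real `scanners/__init__.py` may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ProbeResult:
    ok: bool
    message: str


Progress = Optional[Callable[[str, int], None]]


# Hypothetical stubs standing in for the per-domain scanner modules
def _scan_sharepoint(target: str, auth: object, progress: Progress) -> List[str]:
    if progress:
        progress(f"Scanning {target}", 0)
    return []


def _probe_sharepoint(target: str, auth: object) -> ProbeResult:
    return ProbeResult(ok=True, message="OK")


_SCANNERS: Dict[str, Callable[[str, object, Progress], List[str]]] = {
    "sharepoint": _scan_sharepoint,
}
_PROBES: Dict[str, Callable[[str, object], ProbeResult]] = {
    "sharepoint": _probe_sharepoint,
}


def scan(scan_type: str, target: str, auth: object, progress: Progress = None) -> List[str]:
    try:
        fn = _SCANNERS[scan_type]
    except KeyError:
        raise ValueError(f"Unsupported scan_type: {scan_type!r}") from None
    return fn(target, auth, progress)


def probe(scan_type: str, target: str, auth: object) -> ProbeResult:
    fn = _PROBES.get(scan_type)
    if fn is None:
        # Report unknown types as a failed preflight instead of raising
        return ProbeResult(ok=False, message=f"Unsupported scan_type: {scan_type}")
    return fn(target, auth)
```

In this sketch `probe` returns a failed `ProbeResult` rather than raising, which fits the worker's pattern of recording preflight failures on the target.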
## Multi-Tenant Model
@@ -71,6 +82,7 @@ The scanner uses the certificate path when `cert_thumbprint` is present on the t
|---|---|
| `client_secret` | Azure client secret (optional when a certificate is available) |
| `cert_private_key` | PEM-encoded private key (internal, never exposed via API) |
| `cert_public_pem` | PEM-encoded public certificate (used to build a PFX for Exchange Online PowerShell) |
| `cert_thumbprint` | SHA-1 thumbprint (used by MSAL) |
| `cert_expires_at` | Certificate expiry date |
@@ -85,6 +97,26 @@ Scans run asynchronously through a DB-backed job queue:
5. Background worker processes targets with retries and per-target timeout.
6. API/UI expose progress and deviations per job.
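Steps 5 and 6 combine a retry budget with a per-target timeout. The worker's `_scan_with_timeout` submits the scan to a one-thread `ThreadPoolExecutor` and waits with `future.result(timeout=...)`. Below is a minimal sketch of that pattern plus a retry loop. `call_with_timeout` and `run_with_retries` are hypothetical names. One detail to watch: if the timeout branch leaves a `with ThreadPoolExecutor(...)` block, the implicit `shutdown(wait=True)` still waits for the hung call, so this sketch shuts the pool down with `wait=False` instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeoutError
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_timeout(fn: Callable[[], T], timeout_sec: float) -> T:
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_sec)
    finally:
        # wait=False: do not block on a hung call. The thread cannot be killed;
        # it keeps running in the background until fn returns.
        pool.shutdown(wait=False)


def run_with_retries(
    fn: Callable[[], T], max_retries: int, timeout_sec: float, backoff_sec: float = 0.0
) -> T:
    last_error = None
    # One initial attempt plus max_retries retries (cf. SCAN_TARGET_MAX_RETRIES + 1)
    for attempt in range(1, max_retries + 2):
        try:
            return call_with_timeout(fn, timeout_sec)
        except FutureTimeoutError:
            last_error = f"timed out after {timeout_sec}s"
        except Exception as exc:  # noqa: BLE001
            last_error = str(exc)
        if attempt <= max_retries:
            time.sleep(backoff_sec * attempt)
    raise RuntimeError(f"failed after {max_retries + 1} attempts: {last_error}")
```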
### Connection Preflight
Before the full scan of a target runs, the worker performs a lightweight probe to verify that the configured credentials can reach the target. For SharePoint this means reaching the site and reading its role assignments. Mailbox targets use the Exchange Online probe described under Mailbox Scanning. The probe surfaces common setup errors (missing admin consent, certificate not yet uploaded to Azure, wrong tenant/client ID) early and with a clear message, rather than as an opaque 401 partway through the full scan.
For SharePoint, the probe issues two calls:
1. `GET /_api/web?$select=Title` — validates token + tenant + site URL.
2. `GET /_api/web/roleassignments?$top=1&$select=PrincipalId` — validates that the app actually has permission to read role assignments (not only basic read).
The result is persisted per target in `last_probe_at`, `last_probe_ok`, and `last_probe_message`. If the probe fails, the target is marked `failed` with `error_message = "Preflight: <hint>"` and the full scan is skipped. Hints interpret common HTTP codes:
| Code | Hint |
|---|---|
| 401 on `/_api/web` | Certificate not uploaded in Azure, or wrong tenant/client ID |
| 401 on `/roleassignments` | Admin consent missing, or granted permission too low |
| 403 | App has no access to this site (e.g. `Sites.Selected` without a per-site grant) |
| 404 | Site not found |
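The hint table maps onto a small lookup keyed on status code and probe stage. Below is a minimal sketch, assuming errors are formatted as `HTTP <code>: <body>` (as `_probe_request` does). Parsing the code with a regular expression avoids matching a `401` that merely appears in the response body. `probe_hint` and `_HINTS` are hypothetical names.

```python
import re
from typing import Dict, Optional, Tuple

_STATUS_RE = re.compile(r"^HTTP (\d{3})\b")

# (status, stage) -> hint; a stage of None applies to every stage
_HINTS: Dict[Tuple[int, Optional[str]], str] = {
    (401, "site"): "certificate not uploaded in Azure, or wrong tenant/client ID",
    (401, "roleassignments"): "admin consent missing, or granted permission too low",
    (403, None): "app has no access to this site (e.g. Sites.Selected without a per-site grant)",
    (404, None): "site not found",
}


def probe_hint(error: str, stage: str) -> str:
    match = _STATUS_RE.match(error)
    if not match:
        return error[:220]  # timeouts, DNS failures, etc.: no status code to interpret
    status = int(match.group(1))
    hint = _HINTS.get((status, stage)) or _HINTS.get((status, None))
    return f"{error[:180]} ({hint})" if hint else error[:220]
```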
The same probe is exposed as an on-demand **Test connection** action on each target in the Job Details UI (see API Endpoints below). The action is blocked while the job is still queued or running.
### Timeout and Retry Controls
Configured through environment variables (defaults shown):
@@ -162,6 +194,7 @@ GET /api/scan-jobs/{id} Get job detail (targets
POST /api/scan-jobs/{id}/cancel Cancel a queued or running job
DELETE /api/scan-jobs/{id} Delete a completed job and all its data
POST /api/scan-jobs/{id}/resolve-sharing-links Resolve SharingLinks group members post-scan
POST /api/scan-jobs/{id}/targets/{tid}/test-connection Re-run the connection preflight for one target
GET /api/scan-jobs/{id}/export Download deviations as .xlsx (optional ?site_url=)
```
@@ -189,23 +222,77 @@ Main tables:
| Table | Key columns |
|---|---|
| `tenant_profiles` | credentials, `cert_private_key`, `cert_thumbprint`, `cert_expires_at` | | `tenant_profiles` | credentials, `cert_private_key`, `cert_public_pem`, `cert_thumbprint`, `cert_expires_at` |
| `scan_jobs` | `status`, `tenant_profile_id`, progress counters, auth credentials | | `scan_jobs` | `status`, `scan_type` (`sharepoint`/`mailbox`), `tenant_profile_id`, progress counters, auth credentials |
| `scan_targets` | `job_id`, `site_url`, `status`, `attempts`, `error_message` | | `scan_targets` | `job_id`, `site_url` (holds UPN for mailbox jobs), `status`, `attempts`, `error_message`, `last_probe_at`, `last_probe_ok`, `last_probe_message` |
| `permission_deviations` | `job_id`, `site_url`, `object_url`, `object_type`, `principal`, `role_name`, `delta_type`, `resolved_members` | | `permission_deviations` | `job_id`, `site_url`, `object_url`, `object_type`, `principal`, `role_name`, `delta_type`, `permission_type`, `resolved_members` |
Scan jobs, targets, and deviations are cascade-deleted when a job is removed via `DELETE /api/scan-jobs/{id}`. Jobs with status `queued` or `running` cannot be deleted.
Schema migrations for new columns are applied automatically on startup via `_ensure_schema_columns()` in `main.py`.
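The document only says that `_ensure_schema_columns()` applies new columns at startup. One common way to do this is to compare each table's existing columns with an expected list and run `ALTER TABLE ... ADD COLUMN` for any that are missing. The sketch below uses the standard-library `sqlite3` module. The column list is a hypothetical subset of the table above, the SQL types are assumptions, and the real function targets Clearview's configured database and may work differently.

```python
import sqlite3

# Hypothetical subset of the columns added in this release: (table, column, SQL type)
_EXPECTED_COLUMNS = [
    ("scan_jobs", "scan_type", "VARCHAR(32)"),
    ("scan_targets", "last_probe_at", "TIMESTAMP"),
    ("scan_targets", "last_probe_ok", "BOOLEAN"),
    ("scan_targets", "last_probe_message", "TEXT"),
    ("permission_deviations", "permission_type", "VARCHAR(64)"),
]


def ensure_schema_columns(conn: sqlite3.Connection) -> list:
    """Add any missing columns; return the 'table.column' names that were added."""
    added = []
    for table, column, sql_type in _EXPECTED_COLUMNS:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if not existing:
            continue  # table not created yet; table creation will include the column
        if column not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
            added.append(f"{table}.{column}")
    conn.commit()
    return added
```

Running the function twice is safe. The second run finds every column present and adds nothing, which is the property a startup migration needs.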
## Mailbox Scanning
Mailbox scans use Exchange Online PowerShell with certificate-based app-only auth.
### What is collected
| Permission | PowerShell source | `permission_type` value |
|---|---|---|
| Full Access (and other mailbox-level rights) | `Get-MailboxPermission` | `FullAccess` |
| Send As | `Get-RecipientPermission` (`AccessControlType=Allow`) | `SendAs` |
| Send on Behalf | mailbox property `GrantSendOnBehalfTo` | `SendOnBehalf` |
| Folder delegation — Calendar | `Get-MailboxFolderPermission "<upn>:\Calendar"` | `Folder:Calendar` |
| Folder delegation — Inbox | `Get-MailboxFolderPermission "<upn>:\Inbox"` | `Folder:Inbox` |
The scanner filters out `NT AUTHORITY\SELF`, `S-1-5-*` SIDs, inherited mailbox permissions, and the default folder principals (`Default`, `Anonymous` with `None` rights). What remains is stored as deviations on the job — there is no SharePoint-style root baseline; every non-default principal counts.
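The filtering rules above can be written as a predicate over parsed PowerShell output. Below is a minimal sketch. It assumes each row has been converted (for example via `ConvertTo-Json`) into a flat dict with `User`, `AccessRights`, and, for mailbox permissions, `IsInherited` keys. `is_default_entry` and `deviations` are hypothetical helpers. Treating `Default`/`Anonymous` as default only when their rights are exactly `None` is one reading of the rule above.

```python
import re
from typing import List, Tuple

_SID_RE = re.compile(r"^S-1-5-", re.IGNORECASE)


def is_default_entry(row: dict) -> bool:
    """Return True for entries the mailbox scan should NOT report as deviations."""
    user = str(row.get("User") or "").strip()
    rights = row.get("AccessRights") or []
    if isinstance(rights, str):
        rights = [rights]
    if not user or user.upper() == "NT AUTHORITY\\SELF" or _SID_RE.match(user):
        return True
    if row.get("IsInherited") is True:
        return True
    if user in ("Default", "Anonymous") and [str(r) for r in rights] == ["None"]:
        return True
    return False


def deviations(rows: List[dict], permission_type: str) -> List[Tuple[str, str, str]]:
    """Keep non-default rows as (principal, rights, permission_type) tuples."""
    out = []
    for row in rows:
        if is_default_entry(row):
            continue
        rights = row.get("AccessRights") or []
        rights_text = rights if isinstance(rights, str) else ", ".join(map(str, rights))
        out.append((str(row["User"]), rights_text, permission_type))
    return out
```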
### Authentication
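Mailbox scanning uses the **same tenant certificate** as SharePoint, but Exchange Online PowerShell cannot use the thumbprint + raw private key pair that MSAL consumes. In a Linux container the practical route is a certificate file passed to `Connect-ExchangeOnline -CertificateFilePath`, because `-CertificateThumbprint` looks the certificate up in the Windows certificate store. At scan time Clearview therefore builds a PFX in memory from `cert_private_key` + `cert_public_pem`, protects it with a random password, writes it to a temporary directory, and deletes it as soon as the `pwsh` process exits.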
### Targets
Three ways to seed a mailbox scan job:
1. **Manual UPNs** — paste one UPN per line.
2. **CSV import** — column `UserPrincipalName` / `Email` / `Mailbox` / `Primary SMTP Address` (auto-detected, case-insensitive).
3. **All mailboxes in tenant** — Clearview enumerates every mailbox via `Get-EXOMailbox -ResultSize Unlimited` and queues one target per mailbox. This requires the tenant's primary domain, which for app-only authentication should be its `*.onmicrosoft.com` domain (e.g. `contoso.onmicrosoft.com`), since that is the value `Connect-ExchangeOnline -Organization` expects. Capped at 50000 mailboxes per job.
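Column auto-detection for the CSV import can work by normalising each header and picking the first one it recognises. Below is a minimal sketch using the standard `csv` module. `parse_mailbox_csv` is a hypothetical name (the real parser lives in `csv_import.py` and may differ). The header priority order and the de-duplication of UPNs are assumptions.

```python
import csv
import io
from typing import List, Set

# Recognised headers, compared case-insensitively; tuple order sets the priority
_MAILBOX_HEADERS = ("userprincipalname", "email", "mailbox", "primary smtp address")


def parse_mailbox_csv(text: str) -> List[str]:
    reader = csv.DictReader(io.StringIO(text))
    # Excel often prefixes the first header with a UTF-8 BOM; strip it before matching
    headers = {h.lstrip("\ufeff").strip().lower(): h for h in (reader.fieldnames or [])}
    column = next((headers[h] for h in _MAILBOX_HEADERS if h in headers), None)
    if column is None:
        raise ValueError("No mailbox column found (expected e.g. UserPrincipalName)")
    seen: Set[str] = set()
    result: List[str] = []
    for row in reader:
        upn = (row.get(column) or "").strip()
        # Skip blanks and case-insensitive duplicates
        if upn and upn.lower() not in seen:
            seen.add(upn.lower())
            result.append(upn)
    return result
```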
### Required Azure permissions
In addition to the SharePoint setup the scan app needs:
- API permission: **Office 365 Exchange Online → Application permissions → `Exchange.ManageAsApp`** (admin-consented).
- Entra role assigned to the app's service principal: **Exchange Administrator**. This is a directory role, not an API permission, so it is not granted through the app registration's API permissions. Assign it in Azure Portal → Entra ID → Roles and administrators.
### Runtime requirements
The container image installs:
- **PowerShell 7 (`pwsh`)** from the official Microsoft package repo.
- **`ExchangeOnlineManagement`** module from PSGallery (`Install-Module -Scope AllUsers`).
Together they add roughly 150 MB to the image. Without them, mailbox probes return `pwsh not available in runtime` and mailbox scans fail.
### Probe
Mailbox preflight runs `probe.ps1` which connects to Exchange Online and calls `Get-EXOMailbox -Identity <upn> -PropertySets Minimum`. Failure hints map common errors:
| Error fragment | Hint |
|---|---|
| `Unauthorized` / `401` / `AADSTS*` | Check `Exchange.ManageAsApp` permission, admin consent, and the Exchange Administrator role assignment |
| `Couldn't find object` / `not found` | Mailbox does not exist in this tenant |
| `module not available` | `ExchangeOnlineManagement` PS module missing in the container |
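The error-fragment table can be applied with an ordered, case-insensitive search over the `pwsh` output. Below is a minimal sketch. `mailbox_probe_hint` and the matching order are assumptions. `AADSTS` is matched as the prefix of Entra ID error codes such as `AADSTS700027`.

```python
import re
from typing import List, Pattern, Tuple

# Ordered (pattern, hint) pairs; the first match wins
_MAILBOX_HINTS: List[Tuple[Pattern[str], str]] = [
    (
        re.compile(r"unauthorized|\b401\b|AADSTS\d+", re.IGNORECASE),
        "Check Exchange.ManageAsApp permission, admin consent, and the Exchange Administrator role assignment",
    ),
    (
        re.compile(r"couldn't find object|not found", re.IGNORECASE),
        "Mailbox does not exist in this tenant",
    ),
    (
        re.compile(r"module not available", re.IGNORECASE),
        "ExchangeOnlineManagement PS module missing in the container",
    ),
]


def mailbox_probe_hint(output: str) -> str:
    for pattern, hint in _MAILBOX_HINTS:
        if pattern.search(output):
            return f"{output.strip()[:180]} ({hint})"
    return output.strip()[:220]
```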
## Build and Release
Use `./build-and-push.sh` from repo root. `./build-and-push.sh` from the repo root, sourced from the shared script in `/docker/develop/shared-integrations/tooling/docker-build-and-push/`.
- `./build-and-push.sh t` for test build (`:dev` tag only) - `./build-and-push.sh t` — test build, push `:dev` tag only.
- `./build-and-push.sh 1` patch release - `./build-and-push.sh r` — release build, parses the version from `docs/changelog.md` (first `## vX.Y.Z` heading), pushes `:<version>`, `:dev`, and `:latest`.
- `./build-and-push.sh 2` minor release
- `./build-and-push.sh 3` major release The script performs no git operations. After a successful release, run the `git commit` / `git tag` / `git push --tags` commands the script prints in its summary.
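Release mode reads the version from the first `## vX.Y.Z` heading in `docs/changelog.md` and pushes three tags. The script itself is Bash. The same rule is sketched here in Python for clarity. `read_release_version` and `release_tags` are hypothetical helpers, and the image name in the usage below is a placeholder.

```python
import re
from typing import List

# First line that starts with "## vX.Y.Z" (anything after the version is ignored)
_VERSION_HEADING = re.compile(r"^##\s+(v\d+\.\d+\.\d+)\b", re.MULTILINE)


def read_release_version(changelog_text: str) -> str:
    """Return the version from the first '## vX.Y.Z' heading."""
    match = _VERSION_HEADING.search(changelog_text)
    if not match:
        raise ValueError("no '## vX.Y.Z' heading found in changelog")
    return match.group(1)


def release_tags(image: str, version: str) -> List[str]:
    # Release mode pushes :<version>, :dev and :latest; test mode pushes only :dev
    return [f"{image}:{version}", f"{image}:dev", f"{image}:latest"]
```

For example, `release_tags("registry.example/ns/app", read_release_version(text))` yields the three tag references for a release.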
## Current Scan Mode

@@ -2,6 +2,55 @@
This file documents changes on the develop branch of this project.
## [2026-04-28]
### Changed
- **Excel export sheet name + columns adapt to scan type** — second sheet is now named `Mailbox Permissions` for mailbox jobs, `Group Memberships` for Entra-group jobs, `Root Permissions` for SharePoint-root jobs, and `Deviations` for the original SharePoint deviation scan. Column sets are tailored per type so headers like "Object URL" / "Link Risk" / "Delta" no longer appear on exports where they don't apply. Targets sheet first column label switches between Site URL / Mailbox / Group based on the job.
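The per-type sheet naming amounts to a lookup on `scan_type`. Below is a minimal sketch. The `scan_type` keys (`mailbox`, `entra_groups`, `sharepoint_root`, `sharepoint`) and the labels come from this changelog. `export_sheet_name`, `targets_first_column`, and the fallbacks for an unknown type are assumptions.

```python
from typing import Optional

_SHEET_NAMES = {
    "mailbox": "Mailbox Permissions",
    "entra_groups": "Group Memberships",
    "sharepoint_root": "Root Permissions",
    "sharepoint": "Deviations",
}
_TARGET_COLUMN_LABELS = {"mailbox": "Mailbox", "entra_groups": "Group"}


def export_sheet_name(scan_type: Optional[str]) -> str:
    """Name of the second (results) sheet; a missing scan_type means 'sharepoint'."""
    return _SHEET_NAMES.get(scan_type or "sharepoint", "Deviations")


def targets_first_column(scan_type: Optional[str]) -> str:
    """Header of the Targets sheet's first column."""
    return _TARGET_COLUMN_LABELS.get(scan_type or "sharepoint", "Site URL")
```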
### Added
- **Entra Group Scan** — new scan type `entra_groups` dedicated to enumerating Microsoft 365 / Azure AD group memberships. New `scanners/entra.py` resolves a target (Object ID, mail, or display name) via Microsoft Graph and stores one deviation per user with role `Member` or `Owner` (with `(via group > nested-group)` chain when expanded recursively). Group classification (Microsoft 365 / Security / Mail-enabled Security / Distribution) is stored in `permission_type`. New helper `entra.list_all_groups` for the "All groups in tenant" option. New CSV parser `parse_entra_groups_csv` reads the `Object ID` column from the Entra portal Groups export. New sidebar route `#/scan/entra` with three forms (manual IDs, CSV import, all-tenant). New filter option in the Scan Jobs type dropdown. Job Details renders Group / Group Type / User / Role columns for these jobs. Requires `Group.Read.All` on Microsoft Graph.
- **Recursive group expansion via Microsoft Graph** — when a SharePoint group member is itself a Microsoft 365 / Azure AD group, the resolver now expands it transitively. New helpers `_expand_aad_group_via_graph` and `_graph_collect` in `scanners/sharepoint.py` call `/groups?$filter=mail eq …` to look up the group, then `/groups/{id}/members` and `/groups/{id}/owners` to enumerate users. Owners are tagged with `(owner)` in the output. Recursion is depth-limited to 3 with a per-resolve `seen` set to break cycles. Output format puts nested members in square brackets after the group name, e.g. `Pharmacology@contoso.onmicrosoft.com [alice@contoso.com, bob@contoso.com (owner)]`. Requires the new `Group.Read.All` Application permission on Microsoft Graph (added to the onboarding instructions). Without it, group lines remain collapsed and labelled `(group, no readable members)`.
- **Resolve SharePoint groups** — new "Resolve groups" action on the Job Details panel for SharePoint and SharePoint-root jobs. Expands every SharePoint group principal (Owners / Members / Visitors / custom site groups) to its underlying user list via `/_api/web/sitegroups/getbyname/<group>/users` and writes the comma-separated members to `permission_deviations.resolved_members`. Members are rendered below the principal in the Deviations table and included in the Excel export. Azure AD security groups and federated claims (principals starting with `c:0…` / `i:0…` or containing `|`) are skipped — those would need `Group.Read.All` on Microsoft Graph. New endpoint `POST /api/scan-jobs/{id}/resolve-groups`, helper `sharepoint.is_sharepoint_group_principal()`.
- **SharePoint root-permissions scan mode** — new `scan_type='sharepoint_root'` that lists role assignments on the site root only, without traversing libraries/folders/files. Much faster (~1 HTTP call per target) and useful for an inventory of who has site-level access. New scanner function `sharepoint.scan_site_root_permissions`. Records are stored with `delta_type='root'` and `object_type='Site'`. Selectable on the New SharePoint Scan page via a "Scan mode" dropdown that controls both the manual-URL and CSV-import forms. New filter option in the Scan Jobs type filter. Noise filter `_is_noise_principal` excludes SharingLinks groups, `SHAREPOINT\system`/`NT AUTHORITY\*` accounts, and "Limited Access System Group" entries — these are SharePoint plumbing surfaced at site-root by spotted-item shares and are not part of a meaningful root inventory.
- **Tenant `primary_domain` field** — new column on `tenant_profiles`, exposed in the Add Tenant form (e.g. `contoso.onmicrosoft.com`). When set, the Mailbox scan page auto-fills the Organization field on tenant selection, and the API falls back to it when `organization` is omitted on a `scan_all_mailboxes` request. SharePoint scans are unaffected.
- **Expanded mailbox-scan onboarding instructions** — new "Enable mailbox scanning" section in the Add Tenant form covers adding the `Exchange.ManageAsApp` API permission, granting admin consent, assigning the Exchange Administrator Entra role to the service principal, certificate generation/upload, and primary-domain entry. Always visible (independent of automated/manual onboarding mode).
- **Scan all mailboxes in a tenant** — third option on the Mailbox scan page next to manual UPNs and CSV import. Clearview enumerates every mailbox via `Get-EXOMailbox -ResultSize Unlimited` and queues one target per mailbox. Requires the tenant's primary domain (e.g. `contoso.onmicrosoft.com`) and a tenant certificate. New PowerShell script `exo_scripts/list-mailboxes.ps1`, new Python helper `mailbox.list_mailboxes()`, new request fields `scan_all_mailboxes` and `organization`. Job source type is recorded as `tenant_all`.
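The recursive group expansion follows a standard pattern: a depth-limited walk with a shared `seen` set to break cycles. The sketch below shows that pattern. It replaces the Graph calls (`/groups?$filter=mail eq …`, `/groups/{id}/members`, `/groups/{id}/owners`) with an in-memory dictionary. `FakeGroup` and `expand_group` are hypothetical stand-ins, not the actual `_expand_aad_group_via_graph` / `_graph_collect` helpers. The output format follows the bracketed example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

MAX_DEPTH = 3  # depth limit stated in the changelog entry


@dataclass
class FakeGroup:
    """Hypothetical stand-in for a Graph group lookup result."""
    members: List[str] = field(default_factory=list)
    owners: List[str] = field(default_factory=list)


def expand_group(
    mail: str,
    groups: Dict[str, FakeGroup],
    depth: int = 0,
    seen: Optional[Set[str]] = None,
) -> str:
    """Render a group as 'mail [member, owner (owner), nested [...]]'."""
    seen = set() if seen is None else seen
    group = groups.get(mail)
    if group is None:
        # Lookup failed (e.g. Group.Read.All missing): keep the line collapsed.
        return f"{mail} (group, no readable members)"
    if mail in seen or depth >= MAX_DEPTH:
        # Cycle or depth limit reached: stop expanding and print the bare name.
        return mail
    seen.add(mail)
    entries = [(m, False) for m in group.members] + [(o, True) for o in group.owners]
    parts = []
    for principal, is_owner in entries:
        # Principals that resolve to a known group are expanded recursively.
        text = expand_group(principal, groups, depth + 1, seen) if principal in groups else principal
        parts.append(f"{text} (owner)" if is_owner else text)
    return f"{mail} [{', '.join(parts)}]"
```

The `seen` set is shared for one resolve call. A group that appears in several branches is therefore expanded only once, and a cycle such as A containing B containing A ends at the bare group name instead of recursing.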
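Two principal checks are described above. The "Resolve groups" action skips claims-encoded principals, and the root-permissions scan drops SharePoint plumbing accounts. The following is a hypothetical restatement of both rules based only on the descriptions in those entries. The real helpers are `sharepoint.is_sharepoint_group_principal()` and `_is_noise_principal`, and their exact matching logic may differ. The `SharingLinks.` prefix and the case-insensitive matching are assumptions.

```python
# Hypothetical restatement of the rules above; not the actual Clearview helpers.
CLAIM_PREFIXES = ("c:0", "i:0")


def looks_like_sharepoint_group(principal: str) -> bool:
    """True for plain SharePoint group names; False for claims (AAD groups, users)."""
    p = principal.strip()
    if not p:
        return False
    return not (p.startswith(CLAIM_PREFIXES) or "|" in p)


def is_noise(principal: str) -> bool:
    """True for SharePoint plumbing principals excluded from a root inventory."""
    p = principal.strip().lower()
    return (
        p.startswith("sharinglinks.")  # assumed prefix of SharingLinks groups
        or p == "sharepoint\\system"
        or p.startswith("nt authority\\")
        or p.startswith("limited access system group")
    )
```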
### Changed
- **Sidebar logo** — replaced with a dark-background variant (`assets/clearview-logo-dark.svg`) so the "view" wordmark stays legible on the dark sidebar (previously rendered in `#141413` and was invisible).
- **English-only UI** — replaced remaining Dutch labels in the application with English equivalents: probe status `Nog niet getest`/`Mislukt` → `Not tested yet`/`Failed`, button label `Testen…` → `Testing…`, error toast `Test mislukt:` → `Test failed:`, and probe hints in `scanners/sharepoint.py` + `scanners/mailbox.py`. The Dutch→English role-name mapping table in `sharepoint.py` is unchanged (it normalizes incoming SharePoint role names).
- **Mailbox permission scanning** — Clearview can now scan Exchange Online mailboxes for delegated access alongside SharePoint sites.
  - Permission categories collected: Full Access (`Get-MailboxPermission`), Send As (`Get-RecipientPermission`), Send on Behalf (`GrantSendOnBehalfTo` mailbox property), and folder delegations on Calendar and Inbox (`Get-MailboxFolderPermission`).
  - Implementation: `pwsh` subprocess invoking the `ExchangeOnlineManagement` module with certificate-based app-only authentication (same tenant profile cert as SharePoint scans).
  - Default principals (`NT AUTHORITY\SELF`, `S-1-5-*`, folder `Default`/`Anonymous=None`) are filtered out at scan time; only non-default permissions become deviations.
  - Mailbox scans require a tenant certificate plus the `Office 365 Exchange Online → Exchange.ManageAsApp` API permission and the **Exchange Administrator** Entra role on the scan app's service principal. Client-secret auth is not supported by Exchange Online.
- **Frontend sidebar layout** — single-page UI replaced with a fixed left sidebar (200px, dark) and routed pages, mirroring the AlertHub layout convention.
  - Routes via hash-based router: `#/dashboard`, `#/jobs`, `#/scan/sharepoint`, `#/scan/mailbox`, `#/tenants`, `#/settings`. Implementation stays vanilla HTML/JS/CSS (no React introduction).
  - Job Details panel adapts column labels and headers based on `scan_type`: SharePoint shows Site/Object/Type/Principal/Role/Delta; Mailbox shows Mailbox/Object/Permission Type/Principal/Access Rights. SharingLinks resolution is hidden for mailbox jobs.
  - Jobs list gets a **Type** column (SharePoint / Mailbox) and a type filter.
- **Scanners package** — `clearview_app/scanner.py` split into `clearview_app/scanners/{__init__.py, common.py, sharepoint.py, mailbox.py, exo_scripts/}`. Public dispatcher `scanners.scan(scan_type, target, auth, progress)` and `scanners.probe(scan_type, target, auth)`. The original `scanner.py` remains as a thin compatibility shim re-exporting the SharePoint API.
- **Datamodel changes** (auto-migrated on startup):
  - `scan_jobs.scan_type VARCHAR(32) NOT NULL DEFAULT 'sharepoint'`
  - `permission_deviations.permission_type VARCHAR(32)` — populated by mailbox scans (`FullAccess`, `SendAs`, `SendOnBehalf`, `Folder:Calendar`, `Folder:Inbox`)
  - `tenant_profiles.cert_public_pem TEXT` — public PEM is now stored alongside the private key so the mailbox scanner can build a `.pfx` for `Connect-ExchangeOnline -CertificateFilePath`. Existing tenants need to regenerate the certificate before mailbox scanning is available; SharePoint scans keep working with the existing key.
- **Mailbox CSV import** — `parse_mailboxes_csv` accepts `UserPrincipalName` / `UPN` / `Email` / `Mailbox` / `Primary SMTP Address` columns with case-insensitive matching, dedup, and email-shape validation.
- **API additions**:
  - `POST /api/scan-jobs` payload extended with `scan_type` and `mailboxes[]` next to the existing `site_urls[]`.
  - `POST /api/scan-jobs/import-csv` accepts a `scan_type` form field (`sharepoint`|`mailbox`).
  - `GET /api/scan-jobs?scan_type=…` filter.
  - `ScanJobSummary.scan_type` and `PermissionDeviationItem.permission_type` returned.
- **Dockerfile** now installs Microsoft PowerShell 7 from the official Microsoft repository plus the `ExchangeOnlineManagement` PowerShell module from PSGallery. Adds ~150 MB to the image.
- **Build script migration** — replaced the local `build-and-push.sh` with the shared version from `/docker/develop/shared-integrations/tooling/docker-build-and-push/`. Reads the version from `docs/changelog.md` (release-summary file) instead of `version.txt`.
- **`docs/changelog.md`** — new release-summary changelog file used by the new build script. The development log (`changelog-develop.md`) remains the append-only source of truth for individual changes.
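The scanners package exposes one entry point, `scanners.scan(scan_type, target, auth, progress)`, that routes to a per-type implementation. A registry keyed on `scan_type` is one common way to build such a dispatcher. The sketch below assumes that approach. The `register` decorator, the placeholder scanners and the return type are hypothetical, and the real package may dispatch differently.

```python
from typing import Any, Callable, Dict, List

# Hypothetical scanner signature: (target, auth, progress) -> list of deviation dicts.
ScanFn = Callable[[str, Any, Callable[[str], None]], List[dict]]

_REGISTRY: Dict[str, ScanFn] = {}


def register(scan_type: str) -> Callable[[ScanFn], ScanFn]:
    """Decorator that maps a scan_type string to its scanner function."""
    def decorator(fn: ScanFn) -> ScanFn:
        _REGISTRY[scan_type] = fn
        return fn
    return decorator


@register("sharepoint")
def _scan_site(target: str, auth: Any, progress: Callable[[str], None]) -> List[dict]:
    progress(f"scanning site {target}")
    return []  # placeholder: a real scanner returns permission deviations


@register("mailbox")
def _scan_mailbox(target: str, auth: Any, progress: Callable[[str], None]) -> List[dict]:
    progress(f"scanning mailbox {target}")
    return []  # placeholder


def scan(scan_type: str, target: str, auth: Any, progress: Callable[[str], None]) -> List[dict]:
    """Dispatch to the scanner registered for scan_type."""
    try:
        fn = _REGISTRY[scan_type]
    except KeyError:
        raise ValueError(f"unsupported scan_type: {scan_type!r}") from None
    return fn(target, auth, progress)
```

Adding a scan type such as `entra_groups` then requires only a new registered function. The worker loop does not need to change.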
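The mailbox CSV import rules are column aliases, case-insensitive header matching, deduplication and email-shape validation. The following sketch implements them with the standard `csv` module. It is an approximation, not `parse_mailboxes_csv` itself. The alias priority order, the case-insensitive dedup key, the tolerance for a leading byte-order mark and the loose email regex are all assumptions.

```python
import csv
import io
import re
from typing import List

# Assumed priority order when several alias columns are present.
ALIASES = ["userprincipalname", "upn", "email", "mailbox", "primary smtp address"]
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose "looks like an email" check


def parse_mailbox_rows(text: str) -> List[str]:
    """Return unique, email-shaped mailbox addresses from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    # Normalize headers: drop a UTF-8 BOM, trim, lowercase.
    by_lower = {name.lstrip("\ufeff").strip().lower(): name for name in (reader.fieldnames or [])}
    column = next((by_lower[a] for a in ALIASES if a in by_lower), None)
    if column is None:
        raise ValueError("no UPN/email/mailbox column found")
    seen, result = set(), []
    for row in reader:
        value = (row.get(column) or "").strip()
        key = value.lower()
        if not EMAIL_RE.match(value) or key in seen:
            continue  # skip malformed values and duplicates
        seen.add(key)
        result.append(value)
    return result
```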
## [2026-04-23]
### Added
- **Connection preflight per scan target** — before a target is scanned, a lightweight probe validates that the configured credentials can reach the site and read role assignments (`/_api/web` + `/_api/web/roleassignments?$top=1`). Targets that fail preflight are marked `failed` with a clear reason (401/403/404 hints) instead of attempting the full scan. Fixes the previous silent-failure behaviour when admin consent or the certificate upload was missing in Azure.
- **Manual "Test" button** — new button in the Targets table in Job Details that re-runs the probe on demand. New endpoint: `POST /api/scan-jobs/{id}/targets/{target_id}/test-connection`. Blocked while the job is still queued or running.
- **Probe status in UI** — each target row shows the last probe result (OK, Mislukt = failed, or Nog niet getest = not tested yet) with timestamp and error message. Fields persist until the next test, so "last known status" remains visible even after permissions are later revoked.
- `scan_targets` table extended with `last_probe_at`, `last_probe_ok`, `last_probe_message` (auto-migrated on startup).
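The preflight makes two lightweight requests and maps an early failure to a readable hint. The sketch below shows that flow. It injects a `get_status` function instead of performing real authenticated HTTP calls. The hint texts and the `probe` name are hypothetical; the real messages live in the scanner modules.

```python
from typing import Callable, Tuple

PROBE_PATHS = ("/_api/web", "/_api/web/roleassignments?$top=1")

# Hypothetical hint texts for the status codes named in the changelog.
HINTS = {
    401: "authentication failed: check certificate upload or client secret",
    403: "access denied: check API permissions and admin consent",
    404: "site not found: check the site URL",
}


def probe(site_url: str, get_status: Callable[[str], int]) -> Tuple[bool, str]:
    """Return (ok, message); stops at the first non-2xx response."""
    base = site_url.rstrip("/")
    for path in PROBE_PATHS:
        status = get_status(base + path)
        if not 200 <= status < 300:
            hint = HINTS.get(status, "unexpected response")
            return False, f"HTTP {status} on {path}: {hint}"
    return True, "OK"
```

Probing `/_api/web` first makes it possible to separate a bad URL or rejected credentials from the narrower case where the site is readable but its role assignments are not.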
## [2026-04-13]
### Added

docs/changelog.md
# Clearview changelog
This file is the **release-summary** changelog used by `build-and-push.sh` to determine the current version. The first heading must be the most recent release in the form `## vX.Y.Z — YYYY-MM-DD`.
For day-by-day development history see [`changelog-develop.md`](changelog-develop.md).
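The version lookup described above takes the first `## vX.Y.Z` heading. Here is a minimal Python equivalent for illustration; the shared build script itself is bash. Requiring whitespace or end of line after the patch number is an assumption about how strict the real parser is.

```python
import re

# First level-2 heading of the form "## vX.Y.Z", followed by whitespace or end of line.
HEADING_RE = re.compile(r"^## (v\d+\.\d+\.\d+)(?=\s|$)")


def current_version(changelog_text: str) -> str:
    """Return the version from the first matching release heading."""
    for line in changelog_text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            return match.group(1)
    raise ValueError("no release heading of the form '## vX.Y.Z' found")
```

Only the first match counts, so adding a new release at the top of the file is enough to change what a release build pushes.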
## v0.1.0 — 2026-04-13
### Added
- Initial Clearview release: SharePoint permission deviation scanning across multiple customer tenants.
- Tenant Profiles with certificate-based or client-secret authentication.
- Asynchronous scan job queue with per-target preflight probe and retry handling.
- Job Details panel with site filter, Excel export, and SharingLinks resolution.
- CSV import of Microsoft Sites export format.
- Two onboarding modes (automated via Graph platform app, or manual).