Skip to main content

Structured Data Governance

The previous chapters in this section address data protection for unstructured content — files in SharePoint, OneDrive, Exchange, and Teams. Sensitivity labels, DLP policies, and the Purview Information Protection scanner all operate on documents. This chapter addresses a different surface: structured data in relational databases, typically Azure SQL or Azure Synapse, where the governing tool is the Purview Data Map rather than Purview Information Protection.

The Escaped-RBAC Problem

Almost every organization that has moved beyond spreadsheets has a version of this problem. A system of record — an ERP, CRM, HRIS, or project management platform — enforces access control natively. Sales representatives see their territory. HR sees their region. Finance sees everything. That access model is built into the application and maintained by its own role administration.

The problem arrives when someone needs better reporting than the system provides. The natural solution is to replicate data into Azure SQL and build Power BI reports against it. This is the correct architectural move — but it has a consequence that is easy to miss: the moment data lands in Azure SQL, the source system's RBAC no longer applies. The database does not know that a sales representative is only supposed to see the Pacific Northwest. It knows only that the user has read access to the sales table.

The result is a governance gap with a predictable shape:

In the Source SystemIn Azure SQL (unaddressed)
Sales rep sees own territorySales rep can query all territories
HR sees own regionHR can query all regions
Finance sees everythingFinance sees everything
Access revoked on terminationSQL access persists until manually removed
Field-level restrictions enforced by appAll columns readable by anyone with table access

This gap is not a security failure — it is an architectural reality that requires an explicit governance response. Purview Data Map provides the discovery and classification layer; Power BI Row-Level Security provides the enforcement layer.

Purview Data Map vs. Purview Information Protection

These are distinct capabilities that share the Purview brand but serve different surfaces:

Purview Information ProtectionPurview Data Map
Primary surfaceM365 content (files, email, Teams messages)Structured data sources (Azure SQL, Synapse, ADLSg2, on-prem SQL)
Governance unitDocument / emailTable, column, data asset
ClassificationSensitivity labels applied to filesSensitive Information Types mapped to columns
PortalPurview compliance portalPurview governance portal (purview.microsoft.com)
EnforcementDLP policies, label encryptionPower BI RLS, column masking, access policies
AI groundingCopilot scoped by label + permissionCopilot grounded in certified, cataloged sources

Both portals are part of the Microsoft Purview product family. Both are required for a complete governance posture when an organization has structured data outside M365.

Purview Data Map availability in GCC High

Purview Data Map (formerly Azure Purview) is available in Azure Government. Access the governance portal at purview.microsoft.us. Confirm that your Azure SQL instance is deployed in an Azure Government region before registering it as a data source — cross-cloud scanning (Azure Government SQL → commercial Purview) is not supported.

Purview Data Map: Registering Azure SQL

The Purview Data Map is a metadata catalog. It does not store your data — it stores facts about your data: schema, column names, detected sensitive information types, data lineage, and business glossary terms. Registering a data source teaches Purview what exists. Scanning it classifies what it finds.

Prerequisites

  • Purview account: Create a Microsoft Purview account in the Azure portal. The account requires a managed identity that will be used to authenticate to data sources.
  • Azure SQL firewall: Allow the Purview managed identity to reach the Azure SQL server. The simplest approach is to enable Allow Azure services and resources to access this server in the SQL server firewall settings, then restrict further using Purview's managed private endpoints if required.
  • SQL permissions: Grant the Purview managed identity db_datareader on the target databases. This is read-only access used only for schema discovery and data sampling during classification.
-- Run in each database Purview will scan
CREATE USER [your-purview-account-name] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your-purview-account-name];

Registering the Data Source

  1. In the Purview governance portal, navigate to Data Map > Sources.
  2. Click Register and select Azure SQL Database.
  3. Select your subscription and SQL server. Choose whether to register individual databases or all databases on the server.
  4. Assign the source to a Collection — a logical grouping that controls who can see and manage the asset. Create collections that mirror your organizational structure (e.g., Finance, Operations, HR).

Running a Classification Scan

Once registered, create a scan to discover and classify the data:

  1. Select the registered source and click New scan.
  2. Choose a scan rule set — start with the built-in AzureSqlDatabase rule set, which includes Purview's library of Sensitive Information Types (SITs) covering financial account numbers, PII, health data, and credentials.
  3. Set the scan scope: select specific databases and schemas, or scan everything.
  4. Schedule the scan — run immediately for initial discovery, then weekly or monthly for ongoing monitoring.

After the scan completes, each table and column is annotated with detected classifications. A column containing 16-digit numbers in a payment context will be detected as Credit Card Number. A column with 9-digit patterns will be detected as U.S. Social Security Number. Custom classifications can be added for business-specific patterns (routing numbers, employee IDs, contract numbers).

Reading Scan Results

Navigate to Data Map > Assets and filter by source to see classified tables and columns. For each asset you can see:

ViewWhat It Shows
SchemaAll tables and columns with their detected classifications
LineageWhere data came from and where it flows (populated when Power BI datasets are also registered)
ContactsAssigned owners and stewards
Glossary termsBusiness definitions linked to this asset
ClassificationsSensitive Information Types detected in this column

Column-Level Classification

Purview's classification engine samples data during the scan and matches column values against its SIT library. For structured data from ERP or CRM systems, the most commonly detected types include:

Sensitive Information TypeTypical SourceRisk if Exposed
Bank Account NumberAP/AR systemsFinancial fraud
Routing NumberPayment processingFinancial fraud
Credit Card NumberPoint-of-sale, e-commercePCI-DSS scope, fraud
U.S. Social Security NumberHR, payrollIdentity theft
Date of BirthHR, CRMPII — privacy regulation
Email AddressCRM, ERPLow individually; high in aggregate
Street AddressCRM, ERPPII in regulated contexts
IP AddressLog tablesNetwork intelligence

Custom Classifications

For business-specific sensitive columns not covered by built-in SITs — contract numbers, internal project codes, employee badge numbers — create custom classifications:

  1. Navigate to Management > Classification rules in the Purview governance portal.
  2. Create a Regular expression rule (for structured patterns like PRJ-\d{6}) or a Dictionary rule (for enumerated values like internal department codes).
  3. Add the rule to a custom scan rule set and re-scan affected sources.

Column Masking in Azure SQL

Classification in Purview is a metadata operation — it labels but does not restrict. For columns classified as sensitive, enforce Dynamic Data Masking in Azure SQL to limit what low-privilege users see in query results without changing the underlying data:

-- Mask bank account numbers — show only last 4 digits
ALTER TABLE dbo.Payments
ALTER COLUMN BankAccountNumber ADD MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)');

-- Mask SSNs — show only last 4 digits
ALTER TABLE dbo.Employees
ALTER COLUMN SSN ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');

-- Grant unmasked access to privileged roles only
GRANT UNMASK TO [FinanceAdmins];

Dynamic Data Masking operates at the SQL engine level — no application changes required — and applies to all query interfaces including Power BI DirectQuery connections.

Power BI Row-Level Security

Classification identifies what data is sensitive. Row-Level Security (RLS) enforces who can see which rows. For organizations that replicated ERP data into Azure SQL to escape reporting limitations, RLS in the Power BI semantic model is the primary enforcement mechanism that restores the access discipline the source system provided.

Mapping Source System Roles to RLS

Begin with the source system's role model. Document each role and its data scope before writing any DAX:

Source System RoleData ScopePower BI RLS Equivalent
Sales RepresentativeAssigned territory / neighborhoodRows where Territory = USERPRINCIPALNAME() lookup
Builder / ContractorAssigned project or neighborhoodRows where Project IN (user's project list)
Regional ManagerMultiple territoriesRows where Region = USERPRINCIPALNAME() lookup
AccountingAll financial dataNo row filter (full access)
Executive / LeadershipAll dataNo row filter (full access)

This mapping becomes the blueprint for your RLS roles in Power BI Desktop.

Implementing RLS in Power BI Desktop

RLS is defined in the Power BI Desktop file (.pbix) before publishing to the Power BI service.

Step 1: Create a user mapping table in Azure SQL

Rather than hardcoding territory assignments in DAX, maintain a mapping table that mirrors the source system's role assignments:

CREATE TABLE dbo.UserTerritoryMapping (
UserPrincipalName NVARCHAR(255) NOT NULL,
Territory NVARCHAR(100) NOT NULL,
AccessLevel NVARCHAR(50) NOT NULL -- 'Territory', 'Region', 'All'
);

Populate this table from the source system's role export and refresh it on the same schedule as the ERP replication.

Step 2: Define RLS roles in Power BI Desktop

In Power BI Desktop, go to Modeling > Manage roles and create a role for each access tier:

-- Role: Territory Rep
-- Filter on the Sales table
[Territory] = LOOKUPVALUE(
UserTerritoryMapping[Territory],
UserTerritoryMapping[UserPrincipalName], USERPRINCIPALNAME()
)

-- Role: Regional Manager
-- Filter on the Sales table (region-scoped)
[Region] = LOOKUPVALUE(
UserTerritoryMapping[Territory],
UserTerritoryMapping[UserPrincipalName], USERPRINCIPALNAME()
)

For users who should see all data (accounting, executives), do not assign them to any RLS role — users not assigned to a role see all data.

Step 3: Publish and assign roles in the Power BI service

After publishing the report to a Power BI workspace:

  1. Navigate to the semantic model settings in the Power BI service.
  2. Select Security and assign Entra ID users or security groups to each RLS role.
  3. Use Test as role to verify each role's filter behavior before communicating access to end users.
Sync RLS assignments from Entra ID groups

Rather than managing individual user assignments in Power BI, assign Entra ID security groups to RLS roles. The groups can be maintained in Entra with the same lifecycle management (provisioning, de-provisioning on termination) that governs other access. When a sales rep's territory changes, updating the mapping table and the Entra group membership is all that is required — no changes to the Power BI report.

DirectQuery vs. Import mode affects RLS behavior

RLS works correctly in both Import and DirectQuery mode, but the timing differs. In Import mode, data is refreshed on a schedule and RLS filters apply to the cached dataset — a terminated user's access is removed when their Entra account is disabled, not when data refreshes. In DirectQuery mode, every query hits Azure SQL live, and SQL-level permissions (Dynamic Data Masking, row-level security in SQL itself) are an additional enforcement layer. For highly sensitive data with frequent role changes, DirectQuery with SQL-native RLS provides the strongest posture.

Testing RLS Before Deployment

Test each role using View as in the Power BI service:

  1. Open the published report.
  2. Select More options (...) on the semantic model > Security.
  3. Use Test as role and enter a test user's UPN to verify they see only the rows their role permits.
  4. Confirm that totals and aggregates reflect only in-scope data — not just that individual row details are hidden.

Aggregate leakage is the most common RLS mistake: a sales rep who cannot see other territories' individual records may still see a company-wide total if the DAX measure is not filtered correctly. Validate every card, KPI, and summary visual.

Business Glossary and Source-of-Truth Certification

The Purview Data Catalog addresses a problem that technical controls cannot: inconsistent definitions. When multiple teams use the same word to mean different things — "sale" means contract signed to Sales, means revenue recognized to Finance, means unit closed to Operations — reports contradict each other and AI responses become unreliable.

Building the Business Glossary

In the Purview governance portal, navigate to Data Catalog > Business glossary and create terms for each canonical concept:

TermDefinitionOwnerRelated Assets
SaleA contract executed by a buyer, confirmed with a deposit, and not subsequently cancelled. Excludes reservations and holds.VP Salesdbo.Contracts, dbo.SalesTransactions
CancellationA signed contract subsequently voided by either party prior to closing. Distinct from an expired reservation.VP Salesdbo.Contracts
Closed UnitA sale that has completed escrow and transferred title. The authoritative count for revenue recognition.Controllerdbo.Closings
Active InventoryUnits that are built or under construction, not yet sold or cancelled.VP Operationsdbo.Units

Once terms are defined, link them to the columns in the Data Map that represent them. A Power BI developer or Copilot querying "how many sales did we close in Q1?" now has an authoritative, agreed-upon definition to draw from rather than relying on column names or stale documentation.

Certifying Authoritative Sources

Purview's endorsement feature allows data stewards to mark assets as authoritative. This is the technical mechanism for declaring that Azure SQL (replicated from the ERP) is the source of truth — not the Excel exports individual users have saved to SharePoint.

To endorse an asset:

  1. Navigate to the asset in the Data Catalog.
  2. Select Edit > Certification and choose Certified.
  3. Assign a data steward as the certifying contact.

Certified assets are surfaced preferentially in Purview search results and, when integrated with Microsoft Fabric, in Copilot's data grounding pipeline. Non-certified assets (desktop exports, unmanaged SharePoint files) are discoverable but not promoted as authoritative.

Data Stewardship

Technical certification is only as durable as the human process behind it. Assign explicit steward roles for each certified data domain:

DomainSteward RoleResponsibilities
Sales dataSales OperationsValidate glossary terms, approve new column classifications, review access reports quarterly
Financial dataControllerCertify financial assets, approve column masking policy changes
HR / PayrollHR ManagerApprove access to HR schema, review user mapping table changes
Customer PIIPrivacy Officer (or equivalent)Review scan results for new PII columns, escalate classification gaps

These are not full-time roles — they are accountability assignments that take a few hours per quarter once the initial catalog is established.

Copilot Governance: Scoping AI to Certified Sources

Microsoft 365 Copilot draws from the data its grounding pipeline can reach: M365 content the user has access to, and (with Copilot in Power BI or Fabric) the semantic models the user can query. The governance work done in earlier steps directly shapes what Copilot can and cannot draw from.

Sensitivity labels limit which M365 content Copilot will surface. A document labeled Highly Confidential and encrypted to a Finance group will not appear in a sales rep's Copilot response.

Power BI RLS limits which rows Copilot in Power BI returns. A sales rep asking "how are we tracking against target?" receives data filtered to their territory — not because Copilot has territory logic, but because the semantic model it queries enforces RLS against their identity.

Purview certification guides which data sources Copilot and Microsoft Fabric treat as authoritative when multiple assets contain similar data. When a user asks a Fabric Copilot question against the data catalog, certified assets are weighted higher than uncertified ones.

The ungoverned path to bad AI: If individual users continue exporting ERP data to Excel and storing it on SharePoint, Copilot will find those files and may return answers grounded in stale, ungoverned exports rather than the certified SQL source. The technical controls (RLS, masking, certification) cannot compensate for an uncontrolled export pipeline. Governance requires a change management conversation alongside the technical deployment:

"The Excel exports you've been using for ad-hoc analysis are not going away — but they are not authoritative. Any report or AI response that needs to be trusted by leadership or customers should come from Power BI, which draws from the governed source."

Phased Implementation

The structured data governance work described in this chapter is substantial. A practical sequencing:

PhaseFocusOutcome
1 — DiscoveryRegister Azure SQL in Purview Data Map. Run initial scan. Document tables, columns, detected classifications. Map source system roles to SQL tables.Written data estate map; column classification report
2 — Enforce AccessImplement Power BI RLS aligned to source system roles. Apply Dynamic Data Masking to highest-sensitivity columns.Source system access model restored in reporting layer
3 — CatalogBuild business glossary for key terms. Certify authoritative assets. Assign stewards. Remove or label non-authoritative SharePoint exports.Single source of truth established; AI grounding trustworthy
4 — SustainAutomate mapping table updates from source system role changes. Schedule Purview re-scans for new tables. Quarterly steward reviews.Governance maintained without manual intervention

Structured Data — CMMC Control Mapping

Structured data governance addresses CMMC controls in the access control and media protection families, extending coverage beyond M365 content to Azure SQL workloads:

CMMC PracticeStructured Data Governance Mechanism
AC.L2-3.1.1 (Authorized Access)Power BI RLS restricts report access to authorized users. Azure SQL Dynamic Data Masking limits field-level exposure. User mapping table maintained in sync with Entra ID provisioning lifecycle.
AC.L2-3.1.3 (CUI Flow Control)Purview Data Map classification identifies columns containing CUI-equivalent data (account numbers, PII) in Azure SQL. Column masking prevents unauthorized read of sensitive fields.
AC.L2-3.1.5 (Least Privilege)RLS roles are scoped to minimum required data access. Accounting and executive roles with full access are explicitly documented and reviewed. Service accounts used for Power BI refresh are granted db_datareader only.
AU.L2-3.3.1 (Audit Logging)Azure SQL Auditing logs all query activity to a Log Analytics workspace. Power BI usage metrics track which reports and datasets are accessed by which users.
MP.L2-3.8.1 (Media Protection)Purview Data Map scan results feed the asset inventory for CUI in Azure SQL. Classified columns are tracked in the data catalog as part of the SSP data inventory.
CM.L2-3.4.2 (Security Configuration)Azure SQL server configuration (firewall rules, auditing enabled, TDE enforced, managed identity authentication only) is documented as a baseline configuration in the SSP.

Structured Data Next Steps

TopicWhere
Sensitivity labels for M365 filesSensitivity Labels
DLP policies for M365 contentData Loss Prevention Policies
Purview deployment phasingPurview Deployment Blueprint
Scanning on-premises file sharesPurview Information Protection Scanner
Asset inventory across M365Asset Inventory

📩 Don't Miss the Next Solution

Join the list to see the real-time solutions I'm delivering to my GCC High clients.