Structured Data Governance

The previous chapters in this section address data protection for unstructured content: files in SharePoint, OneDrive, Exchange, and Teams. Sensitivity labels, DLP policies, and the Purview Information Protection scanner all operate on documents. This chapter addresses a different surface: structured data in relational databases, typically Azure SQL or Azure Synapse, where the governing tool is the Purview Data Map rather than Purview Information Protection.

The Escaped-RBAC Problem

Almost every organization that has moved beyond spreadsheets has a version of this problem. A system of record (an ERP, CRM, HRIS, or project management platform) enforces access control natively. Sales representatives see their territory. HR sees their region. Finance sees everything. That access model is built into the application and maintained by its own role administration.

The problem arrives when someone needs better reporting than the system provides. The natural solution is to replicate data into Azure SQL and build Power BI reports against it. This is the correct architectural move, but it has a consequence that is easy to miss: the moment data lands in Azure SQL, the source system's RBAC no longer applies. The database does not know that a sales representative is only supposed to see the Pacific Northwest. It knows only that the user has read access to the sales table.

The result is a governance gap with a predictable shape:

In the Source System	In Azure SQL (unaddressed)
Sales rep sees own territory	Sales rep can query all territories
HR sees own region	HR can query all regions
Finance sees everything	Finance sees everything
Access revoked on termination	SQL access persists until manually removed
Field-level restrictions enforced by app	All columns readable by anyone with table access

Purview Data Map provides the discovery and classification layer; Power BI Row-Level Security provides the enforcement layer.

Purview Data Map vs. Purview Information Protection

These are distinct capabilities that share the Purview brand but serve different surfaces:

	Purview Information Protection	Purview Data Map
Primary surface	M365 content (files, email, Teams messages)	Structured data sources (Azure SQL, Synapse, ADLSg2, on-prem SQL)
Governance unit	Document / email	Table, column, data asset
Classification	Sensitivity labels applied to files	Sensitive Information Types mapped to columns
Portal	Purview compliance portal	Purview governance portal (`purview.microsoft.com`)
Enforcement	DLP policies, label encryption	Power BI RLS, column masking, access policies
AI grounding	Copilot scoped by label + permission	Copilot grounded in certified, cataloged sources

Both are required for a complete governance posture when an organization has structured data outside M365.

GCC High
Commercial

Purview Data Map availability in GCC High

Purview Data Map (formerly Azure Purview) is available in Azure Government. Access the governance portal at purview.microsoft.us. Confirm that your Azure SQL instance is deployed in an Azure Government region before registering it as a data source: cross-cloud scanning (Azure Government SQL → commercial Purview) is not supported.

The Purview governance portal for commercial tenants is at purview.microsoft.com. Azure SQL instances in any commercial Azure region can be registered as data sources.

Purview Data Map: Registering Azure SQL

The Purview Data Map is a metadata catalog. It does not store your data: it stores facts about your data: schema, column names, detected sensitive information types, data lineage, and business glossary terms. Registering a data source teaches Purview what exists. Scanning it classifies what it finds.

Prerequisites

Purview account: Create a Microsoft Purview account in the Azure portal. The account requires a managed identity that will be used to authenticate to data sources.
Azure SQL firewall: Allow the Purview managed identity to reach the Azure SQL server. The simplest approach is to enable Allow Azure services and resources to access this server in the SQL server firewall settings, then restrict further using Purview's managed private endpoints if required.
SQL permissions: Grant the Purview managed identity db_datareader on the target databases. This is read-only access used only for schema discovery and data sampling during classification.

-- Run in each database Purview will scan
CREATE USER [your-purview-account-name] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your-purview-account-name];

Registering the Data Source

In the Purview governance portal, navigate to Data Map > Sources.
Click Register and select Azure SQL Database.
Select your subscription and SQL server. Choose whether to register individual databases or all databases on the server.
Assign the source to a Collection, a logical grouping that controls who can see and manage the asset. Create collections that mirror your organizational structure (e.g., Finance, Operations, HR).

Running a Classification Scan

Once registered, create a scan to discover and classify the data:

Select the registered source and click New scan.
Choose a scan rule set: start with the built-in AzureSqlDatabase rule set, which includes Purview's library of Sensitive Information Types (SITs) covering financial account numbers, PII, health data, and credentials.
Set the scan scope: select specific databases and schemas, or scan everything.
Schedule the scan: run immediately for initial discovery, then weekly or monthly for ongoing monitoring.

After the scan completes, each table and column is annotated with detected classifications. A column containing 16-digit numbers in a payment context will be detected as Credit Card Number. A column with 9-digit patterns will be detected as U.S. Social Security Number. Custom classifications can be added for business-specific patterns (routing numbers, employee IDs, contract numbers).

Reading Scan Results

Navigate to Data Map > Assets and filter by source to see classified tables and columns. For each asset you can see:

View	What It Shows
Schema	All tables and columns with their detected classifications
Lineage	Where data came from and where it flows (populated when Power BI datasets are also registered)
Contacts	Assigned owners and stewards
Glossary terms	Business definitions linked to this asset
Classifications	Sensitive Information Types detected in this column

Column-Level Classification

Purview's classification engine samples data during the scan and matches column values against its SIT library. For structured data from ERP or CRM systems, the most commonly detected types include:

Sensitive Information Type	Typical Source	Risk if Exposed
Bank Account Number	AP/AR systems	Financial fraud
Routing Number	Payment processing	Financial fraud
Credit Card Number	Point-of-sale, e-commerce	PCI-DSS scope, fraud
U.S. Social Security Number	HR, payroll	Identity theft
Date of Birth	HR, CRM	PII: privacy regulation
Email Address	CRM, ERP	Low individually; high in aggregate
Street Address	CRM, ERP	PII in regulated contexts
IP Address	Log tables	Network intelligence

Custom Classifications

For business-specific sensitive columns not covered by built-in SITs (contract numbers, internal project codes, employee badge numbers), create custom classifications:

Navigate to Management > Classification rules in the Purview governance portal.
Create a Regular expression rule (for structured patterns like PRJ-\d{6}) or a Dictionary rule (for enumerated values like internal department codes).
Add the rule to a custom scan rule set and re-scan affected sources.

Column Masking in Azure SQL

Classification in Purview is a metadata operation: it labels but does not restrict. For columns classified as sensitive, enforce Dynamic Data Masking in Azure SQL to limit what low-privilege users see in query results without changing the underlying data:

-- Mask bank account numbers — show only last 4 digits
ALTER TABLE dbo.Payments
ALTER COLUMN BankAccountNumber ADD MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)');

-- Mask SSNs — show only last 4 digits
ALTER TABLE dbo.Employees
ALTER COLUMN SSN ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');

-- Grant unmasked access to privileged roles only
GRANT UNMASK TO [FinanceAdmins];

Dynamic Data Masking operates at the SQL engine level (no application changes required) and applies to all query interfaces including Power BI DirectQuery connections.

Power BI Row-Level Security

For organizations that replicated ERP data into Azure SQL to escape reporting limitations, Row-Level Security (RLS) in the Power BI semantic model is the primary enforcement mechanism that restores the access discipline the source system provided.

Mapping Source System Roles to RLS

Begin with the source system's role model. Document each role and its data scope before writing any DAX:

Source System Role	Data Scope	Power BI RLS Equivalent
Sales Representative	Assigned territory / neighborhood	Rows where `Territory = USERPRINCIPALNAME()` lookup
Builder / Contractor	Assigned project or neighborhood	Rows where `Project IN` (user's project list)
Regional Manager	Multiple territories	Rows where `Region = USERPRINCIPALNAME()` lookup
Accounting	All financial data	No row filter (full access)
Executive / Leadership	All data	No row filter (full access)

This mapping becomes the blueprint for your RLS roles in Power BI Desktop.

Implementing RLS in Power BI Desktop

RLS is defined in the Power BI Desktop file (.pbix) before publishing to the Power BI service.

Step 1: Create a user mapping table in Azure SQL

Rather than hardcoding territory assignments in DAX, maintain a mapping table that mirrors the source system's role assignments:

CREATE TABLE dbo.UserTerritoryMapping (
    UserPrincipalName NVARCHAR(255) NOT NULL,
    Territory         NVARCHAR(100) NOT NULL,
    AccessLevel       NVARCHAR(50)  NOT NULL  -- 'Territory', 'Region', 'All'
);

Populate this table from the source system's role export and refresh it on the same schedule as the ERP replication.

Step 2: Define RLS roles in Power BI Desktop

In Power BI Desktop, go to Modeling > Manage roles and create a role for each access tier:

-- Role: Territory Rep
-- Filter on the Sales table
[Territory] = LOOKUPVALUE(
    UserTerritoryMapping[Territory],
    UserTerritoryMapping[UserPrincipalName], USERPRINCIPALNAME()
)

-- Role: Regional Manager
-- Filter on the Sales table (region-scoped)
[Region] = LOOKUPVALUE(
    UserTerritoryMapping[Territory],
    UserTerritoryMapping[UserPrincipalName], USERPRINCIPALNAME()
)

For users who should see all data (accounting, executives), do not assign them to any RLS role: users not assigned to a role see all data.

Step 3: Publish and assign roles in the Power BI service

After publishing the report to a Power BI workspace:

Navigate to the semantic model settings in the Power BI service.
Select Security and assign Entra ID users or security groups to each RLS role.
Use Test as role to verify each role's filter behavior before communicating access to end users.

Sync RLS assignments from Entra ID groups

Rather than managing individual user assignments in Power BI, assign Entra ID security groups to RLS roles. The groups can be maintained in Entra with the same lifecycle management (provisioning, de-provisioning on termination) that governs other access. When a sales rep's territory changes, updating the mapping table and the Entra group membership is all that is required: no changes to the Power BI report.

DirectQuery vs. Import mode affects RLS behavior

RLS works correctly in both Import and DirectQuery mode, but the timing differs. In Import mode, data is refreshed on a schedule and RLS filters apply to the cached dataset: a terminated user's access is removed when their Entra account is disabled, not when data refreshes. In DirectQuery mode, every query hits Azure SQL live, and SQL-level permissions (Dynamic Data Masking, row-level security in SQL itself) are an additional enforcement layer. For highly sensitive data with frequent role changes, DirectQuery with SQL-native RLS provides the strongest posture.

Testing RLS Before Deployment

Test each role using View as in the Power BI service:

Open the published report.
Select More options (...) on the semantic model > Security.
Use Test as role and enter a test user's UPN to verify they see only the rows their role permits.
Confirm that totals and aggregates reflect only in-scope data, not just that individual row details are hidden.

Aggregate leakage is the most common RLS mistake: a sales rep who cannot see other territories' individual records may still see a company-wide total if the DAX measure is not filtered correctly. Validate every card, KPI, and summary visual.

Business Glossary and Source-of-Truth Certification

The Purview Data Catalog addresses a problem that technical controls cannot: inconsistent definitions. When multiple teams use the same word to mean different things ("sale" means contract signed to Sales, means revenue recognized to Finance, means unit closed to Operations), reports contradict each other and AI responses become unreliable.

Building the Business Glossary

In the Purview governance portal, navigate to Data Catalog > Business glossary and create terms for each canonical concept:

Term	Definition	Owner	Related Assets
Sale	A contract executed by a buyer, confirmed with a deposit, and not subsequently cancelled. Excludes reservations and holds.	VP Sales	`dbo.Contracts`, `dbo.SalesTransactions`
Cancellation	A signed contract subsequently voided by either party prior to closing. Distinct from an expired reservation.	VP Sales	`dbo.Contracts`
Closed Unit	A sale that has completed escrow and transferred title. The authoritative count for revenue recognition.	Controller	`dbo.Closings`
Active Inventory	Units that are built or under construction, not yet sold or cancelled.	VP Operations	`dbo.Units`

Once terms are defined, link them to the columns in the Data Map that represent them. A Power BI developer or Copilot querying "how many sales did we close in Q1?" now has an authoritative, agreed-upon definition to draw from rather than relying on column names or stale documentation.

Certifying Authoritative Sources

Purview's endorsement feature allows data stewards to mark assets as authoritative. This is the technical mechanism for declaring that Azure SQL (replicated from the ERP) is the source of truth, not the Excel exports individual users have saved to SharePoint.

To endorse an asset:

Navigate to the asset in the Data Catalog.
Select Edit > Certification and choose Certified.
Assign a data steward as the certifying contact.

Certified assets are surfaced preferentially in Purview search results and, when integrated with Microsoft Fabric, in Copilot's data grounding pipeline. Non-certified assets (desktop exports, unmanaged SharePoint files) are discoverable but not promoted as authoritative.

Data Stewardship

Technical certification is only as durable as the human process behind it. Assign explicit steward roles for each certified data domain:

Domain	Steward Role	Responsibilities
Sales data	Sales Operations	Validate glossary terms, approve new column classifications, review access reports quarterly
Financial data	Controller	Certify financial assets, approve column masking policy changes
HR / Payroll	HR Manager	Approve access to HR schema, review user mapping table changes
Customer PII	Privacy Officer (or equivalent)	Review scan results for new PII columns, escalate classification gaps

These are not full-time roles: they are accountability assignments that take a few hours per quarter once the initial catalog is established.

Copilot Governance: Scoping AI to Certified Sources

Microsoft 365 Copilot draws from the data its grounding pipeline can reach: M365 content the user has access to, and (with Copilot in Power BI or Fabric) the semantic models the user can query. The governance work done in earlier steps directly shapes what Copilot can and cannot draw from.

Sensitivity labels limit which M365 content Copilot will surface. A document labeled Restricted and encrypted to a Finance group will not appear in a sales rep's Copilot response.

Power BI RLS limits which rows Copilot in Power BI returns. A sales rep asking "how are we tracking against target?" receives data filtered to their territory, not because Copilot has territory logic, but because the semantic model it queries enforces RLS against their identity.

Purview certification guides which data sources Copilot and Microsoft Fabric treat as authoritative when multiple assets contain similar data. When a user asks a Fabric Copilot question against the data catalog, certified assets are weighted higher than uncertified ones.

The ungoverned path to bad AI: If individual users continue exporting ERP data to Excel and storing it on SharePoint, Copilot will find those files and may return answers grounded in stale, ungoverned exports rather than the certified SQL source. The technical controls (RLS, masking, certification) cannot compensate for an uncontrolled export pipeline. Governance requires a change management conversation alongside the technical deployment:

"The Excel exports you've been using for ad-hoc analysis are not going away, but they are not authoritative. Any report or AI response that needs to be trusted by leadership or customers should come from Power BI, which draws from the governed source."

Phased Implementation

The structured data governance work described in this chapter is substantial. A practical sequencing:

Phase	Focus	Outcome
1: Discovery	Register Azure SQL in Purview Data Map. Run initial scan. Document tables, columns, detected classifications. Map source system roles to SQL tables.	Written data estate map; column classification report
2: Enforce Access	Implement Power BI RLS aligned to source system roles. Apply Dynamic Data Masking to highest-sensitivity columns.	Source system access model restored in reporting layer
3: Catalog	Build business glossary for key terms. Certify authoritative assets. Assign stewards. Remove or label non-authoritative SharePoint exports.	Single source of truth established; AI grounding trustworthy
4: Sustain	Automate mapping table updates from source system role changes. Schedule Purview re-scans for new tables. Quarterly steward reviews.	Governance maintained without manual intervention

GCC High
Commercial

Structured Data: CMMC Control Mapping

Structured data governance addresses CMMC controls in the access control and media protection families, extending coverage beyond M365 content to Azure SQL workloads:

CMMC Practice	Structured Data Governance Mechanism
AC.L2-3.1.1 (Authorized Access)	Power BI RLS restricts report access to authorized users. Azure SQL Dynamic Data Masking limits field-level exposure. User mapping table maintained in sync with Entra ID provisioning lifecycle.
AC.L2-3.1.3 (CUI Flow Control)	Purview Data Map classification identifies columns containing CUI-equivalent data (account numbers, PII) in Azure SQL. Column masking prevents unauthorized read of sensitive fields.
AC.L2-3.1.5 (Least Privilege)	RLS roles are scoped to minimum required data access. Accounting and executive roles with full access are explicitly documented and reviewed. Service accounts used for Power BI refresh are granted `db_datareader` only.
AU.L2-3.3.1 (Audit Logging)	Azure SQL Auditing logs all query activity to a Log Analytics workspace. Power BI usage metrics track which reports and datasets are accessed by which users.
MP.L2-3.8.1 (Media Protection)	Purview Data Map scan results feed the asset inventory for CUI in Azure SQL. Classified columns are tracked in the data catalog as part of the SSP data inventory.
CM.L2-3.4.2 (Security Configuration)	Azure SQL server configuration (firewall rules, auditing enabled, TDE enforced, managed identity authentication only) is documented as a baseline configuration in the SSP.

Structured Data: NIST SP 800-171 Rev. 3 Control Mapping

Structured data governance extends NIST SP 800-171 coverage to Azure SQL workloads beyond M365 content:

NIST Requirement	Structured Data Governance Mechanism
3.1.1 (Authorized Access)	Power BI RLS restricts report access to authorized users. Azure SQL Dynamic Data Masking limits field-level exposure.
3.1.3 (Information Flow Control)	Purview Data Map classification identifies sensitive columns. Column masking and RLS control the flow of sensitive data to reporting consumers.
3.1.5 (Least Privilege)	RLS roles scoped to minimum required data access. Service accounts for Power BI refresh granted read-only database access.
3.3.1 (Audit Logging)	Azure SQL Auditing logs query activity to Log Analytics. Power BI usage metrics track report and dataset access.
3.4.2 (Security Configuration)	Azure SQL baseline configuration (firewall, auditing, TDE, managed identity) documented as part of the system security plan.

Structured Data Next Steps

Topic	Where
Sensitivity labels for M365 files	Sensitivity Labels
DLP policies for M365 content	Data Loss Prevention Policies
Purview deployment phasing	Purview Deployment Blueprint
Scanning on-premises file shares	Purview Information Protection Scanner
Asset inventory across M365	Asset Inventory

📩 Don't Miss the Next Solution

Join the list to see the real-time solutions I'm delivering to my GCC High clients.

The Escaped-RBAC Problem​

Purview Data Map vs. Purview Information Protection​

Purview Data Map: Registering Azure SQL​

Prerequisites​

Registering the Data Source​

Running a Classification Scan​

Reading Scan Results​

Column-Level Classification​

Custom Classifications​

Column Masking in Azure SQL​

Power BI Row-Level Security​

Mapping Source System Roles to RLS​

Implementing RLS in Power BI Desktop​

Testing RLS Before Deployment​

Business Glossary and Source-of-Truth Certification​

Building the Business Glossary​

Certifying Authoritative Sources​

Data Stewardship​

Copilot Governance: Scoping AI to Certified Sources​

Phased Implementation​

Structured Data: CMMC Control Mapping​

Structured Data: NIST SP 800-171 Rev. 3 Control Mapping​

Structured Data Next Steps​