Structured Data Governance
The previous chapters in this section address data protection for unstructured content — files in SharePoint, OneDrive, Exchange, and Teams. Sensitivity labels, DLP policies, and the Purview Information Protection scanner all operate on documents. This chapter addresses a different surface: structured data in relational databases, typically Azure SQL or Azure Synapse, where the governing tool is the Purview Data Map rather than Purview Information Protection.
The Escaped-RBAC Problem
Almost every organization that has moved beyond spreadsheets has a version of this problem. A system of record — an ERP, CRM, HRIS, or project management platform — enforces access control natively. Sales representatives see their territory. HR sees their region. Finance sees everything. That access model is built into the application and maintained by its own role administration.
The problem arrives when someone needs better reporting than the system provides. The natural solution is to replicate data into Azure SQL and build Power BI reports against it. This is the correct architectural move — but it has a consequence that is easy to miss: the moment data lands in Azure SQL, the source system's RBAC no longer applies. The database does not know that a sales representative is only supposed to see the Pacific Northwest. It knows only that the user has read access to the sales table.
The result is a governance gap with a predictable shape:
| In the Source System | In Azure SQL (unaddressed) |
|---|---|
| Sales rep sees own territory | Sales rep can query all territories |
| HR sees own region | HR can query all regions |
| Finance sees everything | Finance sees everything |
| Access revoked on termination | SQL access persists until manually removed |
| Field-level restrictions enforced by app | All columns readable by anyone with table access |
This gap is not a security failure — it is an architectural reality that requires an explicit governance response. Purview Data Map provides the discovery and classification layer; Power BI Row-Level Security provides the enforcement layer.
Purview Data Map vs. Purview Information Protection
These are distinct capabilities that share the Purview brand but serve different surfaces:
| Purview Information Protection | Purview Data Map | |
|---|---|---|
| Primary surface | M365 content (files, email, Teams messages) | Structured data sources (Azure SQL, Synapse, ADLSg2, on-prem SQL) |
| Governance unit | Document / email | Table, column, data asset |
| Classification | Sensitivity labels applied to files | Sensitive Information Types mapped to columns |
| Portal | Purview compliance portal | Purview governance portal (purview.microsoft.com) |
| Enforcement | DLP policies, label encryption | Power BI RLS, column masking, access policies |
| AI grounding | Copilot scoped by label + permission | Copilot grounded in certified, cataloged sources |
Both portals are part of the Microsoft Purview product family. Both are required for a complete governance posture when an organization has structured data outside M365.
- GCC High
- Commercial
Purview Data Map (formerly Azure Purview) is available in Azure Government. Access the governance portal at purview.microsoft.us. Confirm that your Azure SQL instance is deployed in an Azure Government region before registering it as a data source — cross-cloud scanning (Azure Government SQL → commercial Purview) is not supported.
The Purview governance portal for commercial tenants is at purview.microsoft.com. Azure SQL instances in any commercial Azure region can be registered as data sources.
Purview Data Map: Registering Azure SQL
The Purview Data Map is a metadata catalog. It does not store your data — it stores facts about your data: schema, column names, detected sensitive information types, data lineage, and business glossary terms. Registering a data source teaches Purview what exists. Scanning it classifies what it finds.
Prerequisites
- Purview account: Create a Microsoft Purview account in the Azure portal. The account requires a managed identity that will be used to authenticate to data sources.
- Azure SQL firewall: Allow the Purview managed identity to reach the Azure SQL server. The simplest approach is to enable Allow Azure services and resources to access this server in the SQL server firewall settings, then restrict further using Purview's managed private endpoints if required.
- SQL permissions: Grant the Purview managed identity
db_datareaderon the target databases. This is read-only access used only for schema discovery and data sampling during classification.
-- Run in each database Purview will scan
CREATE USER [your-purview-account-name] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your-purview-account-name];
Registering the Data Source
- In the Purview governance portal, navigate to Data Map > Sources.
- Click Register and select Azure SQL Database.
- Select your subscription and SQL server. Choose whether to register individual databases or all databases on the server.
- Assign the source to a Collection — a logical grouping that controls who can see and manage the asset. Create collections that mirror your organizational structure (e.g., Finance, Operations, HR).
Running a Classification Scan
Once registered, create a scan to discover and classify the data:
- Select the registered source and click New scan.
- Choose a scan rule set — start with the built-in AzureSqlDatabase rule set, which includes Purview's library of Sensitive Information Types (SITs) covering financial account numbers, PII, health data, and credentials.
- Set the scan scope: select specific databases and schemas, or scan everything.
- Schedule the scan — run immediately for initial discovery, then weekly or monthly for ongoing monitoring.
After the scan completes, each table and column is annotated with detected classifications. A column containing 16-digit numbers in a payment context will be detected as Credit Card Number. A column with 9-digit patterns will be detected as U.S. Social Security Number. Custom classifications can be added for business-specific patterns (routing numbers, employee IDs, contract numbers).
Reading Scan Results
Navigate to Data Map > Assets and filter by source to see classified tables and columns. For each asset you can see:
| View | What It Shows |
|---|---|
| Schema | All tables and columns with their detected classifications |
| Lineage | Where data came from and where it flows (populated when Power BI datasets are also registered) |
| Contacts | Assigned owners and stewards |
| Glossary terms | Business definitions linked to this asset |
| Classifications | Sensitive Information Types detected in this column |
Column-Level Classification
Purview's classification engine samples data during the scan and matches column values against its SIT library. For structured data from ERP or CRM systems, the most commonly detected types include:
| Sensitive Information Type | Typical Source | Risk if Exposed |
|---|---|---|
| Bank Account Number | AP/AR systems | Financial fraud |
| Routing Number | Payment processing | Financial fraud |
| Credit Card Number | Point-of-sale, e-commerce | PCI-DSS scope, fraud |
| U.S. Social Security Number | HR, payroll | Identity theft |
| Date of Birth | HR, CRM | PII — privacy regulation |
| Email Address | CRM, ERP | Low individually; high in aggregate |
| Street Address | CRM, ERP | PII in regulated contexts |
| IP Address | Log tables | Network intelligence |
Custom Classifications
For business-specific sensitive columns not covered by built-in SITs — contract numbers, internal project codes, employee badge numbers — create custom classifications:
- Navigate to Management > Classification rules in the Purview governance portal.
- Create a Regular expression rule (for structured patterns like
PRJ-\d{6}) or a Dictionary rule (for enumerated values like internal department codes). - Add the rule to a custom scan rule set and re-scan affected sources.
Column Masking in Azure SQL
Classification in Purview is a metadata operation — it labels but does not restrict. For columns classified as sensitive, enforce Dynamic Data Masking in Azure SQL to limit what low-privilege users see in query results without changing the underlying data:
-- Mask bank account numbers — show only last 4 digits
ALTER TABLE dbo.Payments
ALTER COLUMN BankAccountNumber ADD MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)');
-- Mask SSNs — show only last 4 digits
ALTER TABLE dbo.Employees
ALTER COLUMN SSN ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XX-",4)');
-- Grant unmasked access to privileged roles only
GRANT UNMASK TO [FinanceAdmins];
Dynamic Data Masking operates at the SQL engine level — no application changes required — and applies to all query interfaces including Power BI DirectQuery connections.
Power BI Row-Level Security
Classification identifies what data is sensitive. Row-Level Security (RLS) enforces who can see which rows. For organizations that replicated ERP data into Azure SQL to escape reporting limitations, RLS in the Power BI semantic model is the primary enforcement mechanism that restores the access discipline the source system provided.
Mapping Source System Roles to RLS
Begin with the source system's role model. Document each role and its data scope before writing any DAX:
| Source System Role | Data Scope | Power BI RLS Equivalent |
|---|---|---|
| Sales Representative | Assigned territory / neighborhood | Rows where Territory = USERPRINCIPALNAME() lookup |
| Builder / Contractor | Assigned project or neighborhood | Rows where Project IN (user's project list) |
| Regional Manager | Multiple territories | Rows where Region = USERPRINCIPALNAME() lookup |
| Accounting | All financial data | No row filter (full access) |
| Executive / Leadership | All data | No row filter (full access) |
This mapping becomes the blueprint for your RLS roles in Power BI Desktop.
Implementing RLS in Power BI Desktop
RLS is defined in the Power BI Desktop file (.pbix) before publishing to the Power BI service.
Step 1: Create a user mapping table in Azure SQL
Rather than hardcoding territory assignments in DAX, maintain a mapping table that mirrors the source system's role assignments:
CREATE TABLE dbo.UserTerritoryMapping (
UserPrincipalName NVARCHAR(255) NOT NULL,
Territory NVARCHAR(100) NOT NULL,
AccessLevel NVARCHAR(50) NOT NULL -- 'Territory', 'Region', 'All'
);
Populate this table from the source system's role export and refresh it on the same schedule as the ERP replication.
Step 2: Define RLS roles in Power BI Desktop
In Power BI Desktop, go to Modeling > Manage roles and create a role for each access tier:
-- Role: Territory Rep
-- Filter on the Sales table
[Territory] = LOOKUPVALUE(
UserTerritoryMapping[Territory],
UserTerritoryMapping[UserPrincipalName], USERPRINCIPALNAME()
)
-- Role: Regional Manager
-- Filter on the Sales table (region-scoped)
[Region] = LOOKUPVALUE(
UserTerritoryMapping[Territory],
UserTerritoryMapping[UserPrincipalName], USERPRINCIPALNAME()
)
For users who should see all data (accounting, executives), do not assign them to any RLS role — users not assigned to a role see all data.
Step 3: Publish and assign roles in the Power BI service
After publishing the report to a Power BI workspace:
- Navigate to the semantic model settings in the Power BI service.
- Select Security and assign Entra ID users or security groups to each RLS role.
- Use Test as role to verify each role's filter behavior before communicating access to end users.
Rather than managing individual user assignments in Power BI, assign Entra ID security groups to RLS roles. The groups can be maintained in Entra with the same lifecycle management (provisioning, de-provisioning on termination) that governs other access. When a sales rep's territory changes, updating the mapping table and the Entra group membership is all that is required — no changes to the Power BI report.
RLS works correctly in both Import and DirectQuery mode, but the timing differs. In Import mode, data is refreshed on a schedule and RLS filters apply to the cached dataset — a terminated user's access is removed when their Entra account is disabled, not when data refreshes. In DirectQuery mode, every query hits Azure SQL live, and SQL-level permissions (Dynamic Data Masking, row-level security in SQL itself) are an additional enforcement layer. For highly sensitive data with frequent role changes, DirectQuery with SQL-native RLS provides the strongest posture.
Testing RLS Before Deployment
Test each role using View as in the Power BI service:
- Open the published report.
- Select More options (...) on the semantic model > Security.
- Use Test as role and enter a test user's UPN to verify they see only the rows their role permits.
- Confirm that totals and aggregates reflect only in-scope data — not just that individual row details are hidden.
Aggregate leakage is the most common RLS mistake: a sales rep who cannot see other territories' individual records may still see a company-wide total if the DAX measure is not filtered correctly. Validate every card, KPI, and summary visual.
Business Glossary and Source-of-Truth Certification
The Purview Data Catalog addresses a problem that technical controls cannot: inconsistent definitions. When multiple teams use the same word to mean different things — "sale" means contract signed to Sales, means revenue recognized to Finance, means unit closed to Operations — reports contradict each other and AI responses become unreliable.
Building the Business Glossary
In the Purview governance portal, navigate to Data Catalog > Business glossary and create terms for each canonical concept:
| Term | Definition | Owner | Related Assets |
|---|---|---|---|
| Sale | A contract executed by a buyer, confirmed with a deposit, and not subsequently cancelled. Excludes reservations and holds. | VP Sales | dbo.Contracts, dbo.SalesTransactions |
| Cancellation | A signed contract subsequently voided by either party prior to closing. Distinct from an expired reservation. | VP Sales | dbo.Contracts |
| Closed Unit | A sale that has completed escrow and transferred title. The authoritative count for revenue recognition. | Controller | dbo.Closings |
| Active Inventory | Units that are built or under construction, not yet sold or cancelled. | VP Operations | dbo.Units |
Once terms are defined, link them to the columns in the Data Map that represent them. A Power BI developer or Copilot querying "how many sales did we close in Q1?" now has an authoritative, agreed-upon definition to draw from rather than relying on column names or stale documentation.
Certifying Authoritative Sources
Purview's endorsement feature allows data stewards to mark assets as authoritative. This is the technical mechanism for declaring that Azure SQL (replicated from the ERP) is the source of truth — not the Excel exports individual users have saved to SharePoint.
To endorse an asset:
- Navigate to the asset in the Data Catalog.
- Select Edit > Certification and choose Certified.
- Assign a data steward as the certifying contact.
Certified assets are surfaced preferentially in Purview search results and, when integrated with Microsoft Fabric, in Copilot's data grounding pipeline. Non-certified assets (desktop exports, unmanaged SharePoint files) are discoverable but not promoted as authoritative.
Data Stewardship
Technical certification is only as durable as the human process behind it. Assign explicit steward roles for each certified data domain:
| Domain | Steward Role | Responsibilities |
|---|---|---|
| Sales data | Sales Operations | Validate glossary terms, approve new column classifications, review access reports quarterly |
| Financial data | Controller | Certify financial assets, approve column masking policy changes |
| HR / Payroll | HR Manager | Approve access to HR schema, review user mapping table changes |
| Customer PII | Privacy Officer (or equivalent) | Review scan results for new PII columns, escalate classification gaps |
These are not full-time roles — they are accountability assignments that take a few hours per quarter once the initial catalog is established.
Copilot Governance: Scoping AI to Certified Sources
Microsoft 365 Copilot draws from the data its grounding pipeline can reach: M365 content the user has access to, and (with Copilot in Power BI or Fabric) the semantic models the user can query. The governance work done in earlier steps directly shapes what Copilot can and cannot draw from.
Sensitivity labels limit which M365 content Copilot will surface. A document labeled Highly Confidential and encrypted to a Finance group will not appear in a sales rep's Copilot response.
Power BI RLS limits which rows Copilot in Power BI returns. A sales rep asking "how are we tracking against target?" receives data filtered to their territory — not because Copilot has territory logic, but because the semantic model it queries enforces RLS against their identity.
Purview certification guides which data sources Copilot and Microsoft Fabric treat as authoritative when multiple assets contain similar data. When a user asks a Fabric Copilot question against the data catalog, certified assets are weighted higher than uncertified ones.
The ungoverned path to bad AI: If individual users continue exporting ERP data to Excel and storing it on SharePoint, Copilot will find those files and may return answers grounded in stale, ungoverned exports rather than the certified SQL source. The technical controls (RLS, masking, certification) cannot compensate for an uncontrolled export pipeline. Governance requires a change management conversation alongside the technical deployment:
"The Excel exports you've been using for ad-hoc analysis are not going away — but they are not authoritative. Any report or AI response that needs to be trusted by leadership or customers should come from Power BI, which draws from the governed source."
Phased Implementation
The structured data governance work described in this chapter is substantial. A practical sequencing:
| Phase | Focus | Outcome |
|---|---|---|
| 1 — Discovery | Register Azure SQL in Purview Data Map. Run initial scan. Document tables, columns, detected classifications. Map source system roles to SQL tables. | Written data estate map; column classification report |
| 2 — Enforce Access | Implement Power BI RLS aligned to source system roles. Apply Dynamic Data Masking to highest-sensitivity columns. | Source system access model restored in reporting layer |
| 3 — Catalog | Build business glossary for key terms. Certify authoritative assets. Assign stewards. Remove or label non-authoritative SharePoint exports. | Single source of truth established; AI grounding trustworthy |
| 4 — Sustain | Automate mapping table updates from source system role changes. Schedule Purview re-scans for new tables. Quarterly steward reviews. | Governance maintained without manual intervention |
- GCC High
- Commercial
Structured Data — CMMC Control Mapping
Structured data governance addresses CMMC controls in the access control and media protection families, extending coverage beyond M365 content to Azure SQL workloads:
| CMMC Practice | Structured Data Governance Mechanism |
|---|---|
| AC.L2-3.1.1 (Authorized Access) | Power BI RLS restricts report access to authorized users. Azure SQL Dynamic Data Masking limits field-level exposure. User mapping table maintained in sync with Entra ID provisioning lifecycle. |
| AC.L2-3.1.3 (CUI Flow Control) | Purview Data Map classification identifies columns containing CUI-equivalent data (account numbers, PII) in Azure SQL. Column masking prevents unauthorized read of sensitive fields. |
| AC.L2-3.1.5 (Least Privilege) | RLS roles are scoped to minimum required data access. Accounting and executive roles with full access are explicitly documented and reviewed. Service accounts used for Power BI refresh are granted db_datareader only. |
| AU.L2-3.3.1 (Audit Logging) | Azure SQL Auditing logs all query activity to a Log Analytics workspace. Power BI usage metrics track which reports and datasets are accessed by which users. |
| MP.L2-3.8.1 (Media Protection) | Purview Data Map scan results feed the asset inventory for CUI in Azure SQL. Classified columns are tracked in the data catalog as part of the SSP data inventory. |
| CM.L2-3.4.2 (Security Configuration) | Azure SQL server configuration (firewall rules, auditing enabled, TDE enforced, managed identity authentication only) is documented as a baseline configuration in the SSP. |
Structured Data — NIST SP 800-171 Rev. 3 Control Mapping
Structured data governance extends NIST SP 800-171 coverage to Azure SQL workloads beyond M365 content:
| NIST Requirement | Structured Data Governance Mechanism |
|---|---|
| 3.1.1 (Authorized Access) | Power BI RLS restricts report access to authorized users. Azure SQL Dynamic Data Masking limits field-level exposure. |
| 3.1.3 (Information Flow Control) | Purview Data Map classification identifies sensitive columns. Column masking and RLS control the flow of sensitive data to reporting consumers. |
| 3.1.5 (Least Privilege) | RLS roles scoped to minimum required data access. Service accounts for Power BI refresh granted read-only database access. |
| 3.3.1 (Audit Logging) | Azure SQL Auditing logs query activity to Log Analytics. Power BI usage metrics track report and dataset access. |
| 3.4.2 (Security Configuration) | Azure SQL baseline configuration (firewall, auditing, TDE, managed identity) documented as part of the system security plan. |
Structured Data Next Steps
| Topic | Where |
|---|---|
| Sensitivity labels for M365 files | Sensitivity Labels |
| DLP policies for M365 content | Data Loss Prevention Policies |
| Purview deployment phasing | Purview Deployment Blueprint |
| Scanning on-premises file shares | Purview Information Protection Scanner |
| Asset inventory across M365 | Asset Inventory |
📩 Don't Miss the Next Solution
Join the list to see the real-time solutions I'm delivering to my GCC High clients.