TL;DR:
My 3 straightforward Jokers and one hidden Joker are:
- Joker #1: Pattern-Driven Repository Structure 🗂️
- Joker #2: Organized Code 👩🏻‍💻
- Joker #3: (Non-) Embedded Documentation 📜
- [🃏 Hidden Joker: Refinement Loop 🃏]
!["Simple and Consistent. - This is how I would describe to someone the 2 most important dimensions to keep in mind when building a semantic layer." [Photo by Zuzana Ruttkay on Unsplash]](jpg/04gbqrk65hfepzh1c-scaled.jpg)
Semantics – the study of linguistic meaning
According to Wikipedia, semantics, the study of linguistic meaning, examines what meaning is.
Or, put differently, how words get their meaning and how the meaning of a complex expression depends on its parts [1].
Although the term is straightforwardly explained, I honestly had to dwell for a while on the part "the meaning of a complex expression depends on its parts", because I wanted to re-use it to explain the semantic layer in analytics.
After rereading it, my explanation goes as follows:
Similar to semantics in the linguistic context, the semantic layer in analytics is about making data meaningful.
Much like words together form a specific meaning, which leads to an understanding of what was said, the raw data from different sources gets enriched and forms specific insights.
Much like meaning depends on the combination of the words in expressions, the outcome derived from the raw data depends on the modelling approach in the semantic layer.
And much like properly structured and well-formed expressions lead to ease of understanding, properly modelled data leads to quality data insights.
In summary, it is all about how one can create better, faster, and novel value from raw data, which leads to insights and an understanding of the business action(s) to take.
This is the core purpose of the semantic layer, and building one is filled with numerous challenges.
The two key ones that I always face when building a semantic layer are (1) simplicity and (2) consistency.
Or, better said, how to achieve them both.
- When I try to simplify my semantic layer by placing focus on one area of my analytical development, this usually undermines my consistency.
- And when I try to make my semantic layer consistent, it usually results in being overly complex.
So, finding a balance, or more precisely, the structure of the semantic layer that will be both simple and consistent, is relevant as it impacts the speed and sometimes the quality of delivering business value.
Hence, in this post, I’ll share my 3 "jokers" for balancing these challenges while creating a semantic layer that’s built to evolve with your business needs.
My explanations, with the provided visual templates, will mostly be based on how I built a semantic layer using Looker.
However, they will be generalized and can be applied to building a semantic layer in any other BI or modelling tool (like dbt Core or Dataform).
Let’s dive in.
Joker #1: Pattern-Driven Repository Structure 🗂️
The repository structure of the semantic layer is the blueprint that you need to focus on. As a foundation task, it really needs to be both simple and consistent.
Why, you ask?
- Because the proper repository structure of the semantic layer needs to serve both technical (for data modelling and testing development) and business colleagues (for the self-service analytics part). Long story short, it needs to be understandable to both sides.
- Because business requirements will "explode" with time, and so will your analytical development. If you don’t keep track of the semantic layer repository structure patterns, the chances are high your development will result in redundancy and it will lack proper development standards.
So, what are the key components that a good semantic layer repository should contain?
- (1) Layers of folders or sub-folders—create clear layers in your repository hierarchy, separating business logic from technical logic. This way, both technical and business users can "be on the same page" when they talk about the same data model or the same method for comparing the data insights (e.g., a specific period-over-period method or a specific forecasting model).
- (2) Naming conventions—correlated to the first component, naming conventions play an important part in standardizing development and keeping your repository structure tidy. Properly defined naming conventions result in faster data modelling by following consistent patterns and also ease navigability during the troubleshooting process.
To provide more context to my theory above, I will explain visually my blueprint for the semantic layer repository structure.
```
.
└── Semantic Layer/
    ├── area_association_rules/
    │   ├── models/
    │   │   ├── association_rules.model.lkml
    │   │   └── frequent_items.model.lkml
    │   └── views_shop/
    │       ├── association_rules.view.lkml
    │       └── frequent_items.view.lkml
    ├── area_benchmarking/
    ├── area_demand_forecast/
    ├── area_financial_forecasting/
    ├── area_customer_intelligence/
    ├── ....
    ├── area_business_controlling/
    │   ├── models/
    │   ├── views/
    │   ├── views_derived/
    │   └── views_aggregated/
    ├── area_finances/
    ├── area_logistics/
    ├── area_performance_marketing/
    ├── ....
    ├── base_views/
    ├── base_date_granularity/
    ├── base_date_granularity_customised/
    ├── base_pop_logic/
    ├── base_pop_logic_customised/
    ├── ....
    ├── data_tests/
    ├── documentation/
    ├── locales/
    ├── manifest.lkml
    └── README.md
```
How I usually structure my semantic layer repository is that I separate it into the `area_*` and the `base_*` folders:

(1) The `area_*` folders.

- The use of these folders is to focus analytical development on organizing the code into specific business areas (e.g., `area_finances` or `area_marketing`) OR the cross-shared business analytical cases (e.g., `area_demand_forecast`, `area_benchmarking`, etc.).
- Each `area_*` folder encapsulates the logic and code relevant to that specific business department or use case, and it's further split into `models` and `views_*` folders.
- The `models` folder in the semantic layer contains the `model` files, while the `views_*` folders contain the `view` files or aggregated/derived views.

| 👉🏼 Note: More info about these files can be found in the next section.

(2) The `base_*` folders.

- The use of these folders is to focus on separating the core logic or functionality, like customized timeframe analysis methods, and cross-shared `views` that are used in multiple business areas and encapsulated within numerous `model` files of the semantic layer.
- This organization ensures that the common business modelling logic is not scattered across multiple areas and is centrally managed, resulting in consistency across the whole semantic layer.
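This folder pattern also lends itself to automated checks. Below is a minimal Python sketch of such a check; the folder names are illustrative (taken from the blueprint above, not from an actual project):

```python
import re

# Hypothetical top-level folder names, mirroring the blueprint above.
FOLDERS = [
    "area_finances",
    "area_logistics",
    "base_pop_logic",
    "data_tests",
    "documentation",
    "locales",
    "misc_stuff",  # deliberately violates the convention
]

# A folder is valid if it follows the area_*/base_* pattern
# or is one of the fixed top-level folders from the blueprint.
PATTERN = re.compile(r"^(area|base)_[a-z0-9_]+$")
ALLOWED = {"data_tests", "documentation", "locales"}

def check_folders(folders):
    """Return the folder names that break the naming convention."""
    return [f for f in folders if not PATTERN.match(f) and f not in ALLOWED]

print(check_folders(FOLDERS))  # → ['misc_stuff']
```

A check like this can run in CI, so the convention is enforced automatically rather than during code review.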
Applying this template, I experienced faster data development (of the whole team) and speed/ease in troubleshooting because I focused on creating a pattern-driven repository structure.
Having in mind that the repository folders themselves contain different files, keeping the code tidy is equally important.

This leads me to my second Joker – creating clean code files.

Joker #2: Organized Code 👩🏻‍💻

In most semantic layers, there are common types of code files that a data project repository contains.

As an example, in the Looker semantic layer, the 3 main types of code files are [model](https://cloud.google.com/looker/docs/lookml-terms-and-concepts#model), [view](https://cloud.google.com/looker/docs/lookml-terms-and-concepts#view), and [manifest](https://cloud.google.com/looker/docs/reference/param-project-manifest) [2].

| 👉🏼 Note: There are [more file types in Looker](https://cloud.google.com/looker/docs/lookml-project-files#view_files), but I will focus on the three main ones listed above.

Every single one of the listed files has its own code logic, and this code logic can be organized.

Let me provide you again with visual templates of how I organized my code within specific files in Looker:

(1) The `model` file – contains information about the views (tables) and how they should be joined together in an Explore [2].
```lookml
###########
# MODEL_NAME & METADATA
# Description: This model reflects the [business context].
# Author: [Your Name] | Created on: [Date]
# Contributors: [Your Name] | Last change on: [Date]
###########

#########
# 1. DISPLAY & CONNECTION PARAMETERS
#########

## 1.1 Label and Connection Setup
connection: "your_connection_name"
label: "Your Model Label"

#########
# 2. STRUCTURAL PARAMETERS
#########

## 2.1 Include Statements for Views
include: "../views/*.view"
include: "../views_aggregated/*.view"
include: "../views_derived/*.view"

## 2.2 Additional Include Statements (E.g. Logistics and Base Logic Views | Optional)
include: "/area_logistics/views/*.view"
include: "/base/*.view"
include: "/base_granularity/*.view"

## 2.3 Include Statements for Data Tests
include: "../../data_tests/*.lkml"
include: "../../data_tests/views/*.view"

#########
# 3. ACCESS CONTROL (Optional)
#########

## 3.1 Define Access Grants
access_grant: access_grant_name {
  user_attribute: user_attribute_name
  allowed_values: ["value_1", "value_2"]
}

#########
# 4. EXPLORES
#########

## 4.1 Main Explore for Your Data Model
explore: explore_name {
  label: "Explore Label"
  view_name: view_name
  persist_for: "N hours"

  ## 4.1.1 SQL Filters and Conditions (Optional)
  sql_always_where:
    @{sql_always_where_condition_1}
    AND
    {% if some_field._in_query %} ${some_field} IS NOT NULL {% else %} 1=1 {% endif %}
    AND
    @{sql_always_where_condition_2};;

  ## 4.1.2 Joins for Additional Views
  join: another_view_name {
    type: left_outer
    relationship: one_to_many
    sql_on: ${view_name.field_name} = ${another_view_name.field_name};;
  }

  join: another_join_view {
    type: inner
    relationship: one_to_one
    sql_on: ${view_name.field_name} = ${another_join_view.field_name};;
  }

  # Add more joins to create the data model
}

#########
# 5. DATA TESTING EXPLORE
#########

## 5.1 Define Explore for Data Testing (Optional)
explore: data_testing {
  label: "Data Testing Explore"
  view_name: view_name
  hidden: yes

  join: another_test_view {
    type: left_outer
    relationship: one_to_one
    sql_on: ${view_name.field_name} = ${another_test_view.field_name};;
  }

  # Add more testing joins as required
}

#########
# 6. MAP LAYER (Optional)
#########

## 6.1 Define Map Layer for Geographic Data (Optional)
map_layer: map_name {
  file: "../files/your_map_file.json"
  property_key: "your_property_key"
  label: "Map Label"
  format: topojson
  max_zoom_level: 15
  min_zoom_level: 2
}
```
(2) The `view` file – contains dimensions and measures accessed from a specific database table (or across multiple joined tables) [2].
```lookml
###########
# VIEW_NAME & METADATA
# Description: This view reflects the [business context] or [data source] it represents.
# Author: [Your Name] | Created on: [Date]
# Contributors: [Your Name] | Last change on: [Date]
###########

view: view_name {
  sql_table_name: `project.dataset.table_name` ;;

  #########
  # 1. DISPLAY PARAMETERS
  #########

  ## 1.1 Label for View
  # Specifies how the view name will appear in the field picker
  label: "Your View Display Name"

  ## 1.2 Fields Hidden by Default
  # When set to yes, hides all fields in the view by default
  fields_hidden_by_default: yes

  #########
  # 2. STRUCTURAL & FILTER PARAMETERS (Optional)
  #########

  ## 2.1 Include Files
  # Includes additional files or views to be part of this view
  include: "filename_or_pattern"

  ## 2.2 Extends View
  # Specifies views that this view will extend
  extends: [another_view_name]

  ## 2.3 Drill Fields
  # Specifies the default list of fields shown when drilling into measures
  drill_fields: [dimension_name, another_dimension]

  ## 2.4 Default Filters for Common Queries
  filter: default_date_filter {
    label: "Date Filter"
    type: date
    sql: ${order_date} ;;
    description: "Filter data based on order date."
  }

  ## 2.5 Suggestions for Dimensions
  # Enables or disables suggestions for all dimensions in this view
  suggestions: yes

  ## 2.6 Set of Fields
  # Defines a reusable set of dimensions and measures
  set: set_name {
    fields: [dimension_name, measure_name]
  }

  #########
  # 3. DIMENSIONS
  #########

  ## 3.1 Simple Dimensions (Directly from DB)
  dimension: dimension_name {
    label: "Dimension Display Name"
    type: string
    sql: ${TABLE}.column_name ;;
    description: "This dimension represents [business context] and contains values like [example]."
  }

  dimension: another_dimension {
    label: "Another Dimension Display Name"
    type: number
    sql: ${TABLE}.other_column ;;
    description: "Explanation of the dimension, including business context and possible values."
  }

  ## 3.2 Compound Dimensions (Concatenated from Existing Dimensions)
  dimension: compound_dimension {
    label: "Compound Dimension Name"
    type: string
    sql: CONCAT(${dimension_name}, "-", ${another_dimension}) ;;
    description: "A compound dimension created by concatenating [dimension_name] and [another_dimension]."
  }

  ## 3.3 Derived Dimensions (Filtered/Grouped Values from Existing Dimensions)
  dimension: filtered_dimension {
    label: "Filtered Dimension Name"
    type: string
    sql: CASE
           WHEN ${dimension_name} = 'specific_value' THEN 'Subset Value'
           ELSE 'Other'
         END ;;
    description: "This dimension subsets values from [dimension_name] based on specific business rules."
  }

  ## 3.4 Tiered Dimension (Grouped by Tiers)
  dimension: order_amount_tier {
    label: "Order Amount Tier [€]"
    type: tier
    tiers: [50, 100, 150]
    sql: ${revenue_column} ;;
    description: "This dimension creates tiers of order amounts based on thresholds (50, 100, 150)."
  }

  #########
  # 4. MEASURES
  #########

  ## 4.1 Simple Aggregated Measures (Sum, Count, Average)
  measure: total_revenue {
    group_label: "KPIs"
    label: "Total Revenue [€]"
    type: sum
    sql: ${revenue_column} ;;
    value_format_name: currency_format
    description: "Total revenue, summing up all revenue from each record."
  }

  ## 4.2 Calculated Measures (Derived from Existing Measures)
  measure: profit_margin {
    group_label: "KPIs"
    label: "Profit Margin [%]"
    type: number
    sql: (${total_revenue} - ${cost_column}) / NULLIF(${total_revenue}, 0) ;;
    value_format_name: percent_2
    description: "Calculated profit margin as (Revenue - Cost) / Revenue."
  }
}
```
(3) The `manifest` file – a configuration file that contains project constants, code for using files imported from other projects, code for localization settings, and serves for adding extensions or custom visualizations [2].
```lookml
###########
# MANIFEST_INFO & METADATA
# Description: This file reflects the [business context].
# Author: [Your Name] | Created on: [Date]
# Contributors: [Your Name] | Last change on: [Date]
###########

#########
# 1. STRUCTURAL PARAMETERS
#########

## 1.1 Project Name & LookML Runtime
project_name: "Current Project Name"
new_lookml_runtime: yes

## 1.2 Local Dependency
local_dependency: {
  project: "project_name"
  override_constant: constant_name {
    value: "string value"
  }
}
# Add additional local dependencies as needed.

## 1.3 Remote Dependency (Optional)
remote_dependency: remote_project_name {
  url: "remote_project_url"
  ref: "remote_project_ref"
  override_constant: constant_name {
    value: "string value"
  }
}
# Add additional remote dependencies as needed.

## 1.4 Constants (Optional, but useful)
constant: constant_name {
  value: "string value"
  export: none | override_optional | override_required
}

#########
# 2. LOCALIZATION PARAMETERS
#########

## 2.1 Localization Settings
localization_settings: {
  localization_level: strict | permissive
  default_locale: locale_name
}

#########
# 3. EXTENSION FRAMEWORK PARAMETERS (Optional)
#########

## 3.1 Application Definitions
application: application_name {
  label: "Application Label"
  url: "application_url"
  file: "application_file_path"

  ## 3.1.1 Mount Points
  mount_points: {
    # Define mount points here (refer to the application page for more details)
  }

  ## 3.1.2 Entitlements
  entitlements: {
    # Define entitlements here (refer to the application page for more details)
  }
}
# Add additional application declarations as required.

#########
# 4. CUSTOM VISUALIZATION PARAMETERS (Optional)
#########

## 4.1 Visualization Definition
visualization: {
  id: "unique-id"
  label: "Visualization Label"
  url: "visualization_url"
  sri_hash: "SRI hash"
  dependencies: ["dependency_url_1", "dependency_url_2"]
  file: "visualization_file_path"
}
# Add additional visualizations as needed.
```
Finally, after presenting the visual examples, I will list the two traits of organizing the `model`, `view`, and `manifest` files in a similar way:
- (1) Design clarity: By using enumeration in the code sections, the code logic can be more easily tracked, adapted, extended, and troubleshot.
- (2) Consistency framework: Creating patterns in your code files brings uniformity to development. This results in easier collaboration and faster development delivery.
These two traits lead me to a Joker that is correlated with the code structure and organization, i.e., embedding of the documentation.
Joker #3: (Non-) Embedded Documentation 📜
When I start with my analytical development in the semantic layer, I always focus on the input-output flow or the process flow.
I can’t stress enough the reason for this, so I will repeat it again: I want my semantic layer to be understandable to both technical and business colleagues.
And to achieve this, I need to document my inputs, explain how I modelled them, and then elaborate on derived outputs.
In my early days of development, I always used Confluence for documentation of my flows and explanation of the data models in my semantic layers.
Lucky for me, most analytical tools now offer an embedded documentation feature, and the two that I always tend to adopt are:
_(1) Data dictionary_
Serves as my reference guide to explain to my stakeholders the core of the data models—i.e., dimensions and measures.
I tend to embed the formulas, business rules, and terminology within the code files, and then, by using the Data Dictionary, educate my end-users on how they can explore the data models.
With this approach, I save myself numerous minutes, if not hours, per week on responding to chat and mail enquiries about the meaning of the specific measures and dimensions.
On top of this, it serves me for the technical purpose of identifying if I have redundant fields in my project or if I have fields without annotated descriptions.
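That technical check can also be scripted. The sketch below is a minimal illustration, not part of any official tooling; the view excerpt and field names are made up. It scans LookML source for dimensions and measures that lack a `description` parameter:

```python
import re

# A toy excerpt of a LookML view file; the view and field names are illustrative.
LKML = """
view: orders {
  dimension: order_id {
    type: string
    sql: ${TABLE}.order_id ;;
    description: "Unique order identifier."
  }
  measure: total_revenue {
    type: sum
    sql: ${TABLE}.revenue ;;
  }
}
"""

def fields_without_description(lkml_source):
    """Return names of dimensions/measures that lack a description parameter."""
    missing, current, has_desc = [], None, False
    for raw in lkml_source.splitlines():
        line = raw.strip()
        m = re.match(r"(dimension|measure):\s+(\w+)\s*\{", line)
        if m:
            # Flush the previous field before starting a new one.
            if current and not has_desc:
                missing.append(current)
            current, has_desc = m.group(2), False
        elif line.startswith("description:"):
            has_desc = True
    if current and not has_desc:
        missing.append(current)
    return missing

print(fields_without_description(LKML))  # → ['total_revenue']
```

Run over every `*.view.lkml` file, this gives a quick worklist of fields still waiting for their annotation.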
_(2) Data lineage_
Nowadays, a must-have for me in my data project is to have a graphical explanation of where my data comes from, how it flows through transformation, and where it ends up.
For this purpose, I tend to adopt tools like dbt Core with the native Data Lineage feature. And if I don’t have tools in my data architecture/project with native data lineage features, I will make sure to get one that only serves this purpose.
This gives my data team and stakeholders a clear view of the data’s entire journey – from the raw tables to presentational tables.
It makes code troubleshooting easier and elevates the usability of the semantic layer by providing a clear data path.
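Even without a native lineage feature, a rough extraction gets you started. Here is a minimal sketch (the file contents and table names are invented for illustration) that maps each Looker view to the warehouse table behind its `sql_table_name`:

```python
import re

# Illustrative view-file contents keyed by file name (all names are made up).
VIEW_FILES = {
    "orders.view.lkml": "view: orders {\n  sql_table_name: `shop.raw.orders` ;;\n}",
    "customers.view.lkml": "view: customers {\n  sql_table_name: `shop.raw.customers` ;;\n}",
}

def extract_lineage(view_files):
    """Map each view name to the warehouse table it reads from."""
    lineage = {}
    for source in view_files.values():
        view = re.search(r"view:\s+(\w+)", source)
        table = re.search(r"sql_table_name:\s*`([^`]+)`", source)
        if view and table:
            lineage[view.group(1)] = table.group(1)
    return lineage

print(extract_lineage(VIEW_FILES))
# → {'orders': 'shop.raw.orders', 'customers': 'shop.raw.customers'}
```

Feeding these pairs into a graph library (or even a simple diagram) already gives stakeholders the "raw table → view" half of the journey.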
And the usability topic leads me to my final Joker, which is to constantly work on the improvement of your semantic layer.
[🃏 Hidden Joker: Refinement Loop 🃏]
My Refinement Loop: Iterate, Correct, Educate 🔄
This is where it all comes together if you want to build a semantic layer that is both simple and consistent.
To achieve this, you will probably need to go through the process of constant iteration, correction, and education.
— Why?
Because every semantic layer is built by people to benefit people – or, put more catchily, "for the people, by the people."
With this sentence, I want to remind you that the semantic layer is a living and evolving component of the data project that usually grows together with the data team and rising business requirements.
So, even if you currently don’t have enough knowledge or experience to build a semantic layer that is both simple and consistent, just keep investing time and effort until it becomes one.
With each improvement iteration, the goal is to think about the end users, how to increase the usability of the insights created within the semantic layer, and how to create business value faster by delivering quality insights.
And if you manage to bring more value to your stakeholders in a shorter time, that’s the real Joker that can position you and your data team in the business landscape.
Thank you for reading my post. Stay connected for more stories on Medium, Substack ✍️ and LinkedIn 🖇️.
Looking for more analytical [templates | tutorials | reads]?
My 3 posts related to this topic:
- The One Page Data and Analytics Templates | Master Data and Analytics Reports and Processes with 5 Templates
- 3 Clusters of Knowledge Base Templates for Data and Analytics Teams | Create External, Shared and Internal Documentation in a Systematic Way
- Decoding Data Development Process | How Is Kowalski Providing "Ad-Hoc" Data Insights?
References
[1] Wikipedia: Semantics, https://en.wikipedia.org/wiki/Semantics, accessed on September 18, 2024
[2] Google Cloud Documentation: LookML terms and concepts, https://cloud.google.com/looker/docs/lookml-terms-and-concepts, accessed on September 19, 2024