I work for a medium-sized organization doing big things with data. We run lean and mean, with a handful of extremely talented people making our data architecture serve the needs of the entire organization. We prefer to buy technology rather than build something custom. So, when Tableau Catalog came on the scene, I was interested in using it to fill a data governance need we had.
Tableau Catalog is part of the Tableau Data Management add-on, which means it is an additional cost. Having this additional component for your Tableau Server or Tableau Cloud deployment unlocks a variety of capabilities. While there are many features included in Tableau Catalog, here are my favorites.
Lineage and Impact
Out of the box (once you have licensed and enabled the add-on), the first thing you'll get is a new Lineage tab for every object (workbooks, data sources, lenses, etc.). From here you can explore where the data is coming from and where it is going.
On a workbook's Lineage tab, the right side of the screen shows that the data is coming from one database (in this case, a Google Sheets file) and four separate tables. The data is going to six separate sheets, one dashboard, and three metrics. Each of those lineage steps is clickable and will list the corresponding objects.
Selecting the check box for a field on the left updates the right side of the screen. The left side shows a single data source at a time, so selecting one of its fields narrows the lineage to that field. Here, the table count on the right drops to one because the field belongs to a single table, and the selected field is used in two sheets, one dashboard, and three metrics.
Selecting the Owners section of the lineage diagram lists all the people who own content connected to the workbook, data source, field, etc. Clicking the Select All button at the top, or manually selecting certain individuals, enables the Send Email button, which makes it really easy to let people know that changes to a data source or field might impact them. Tableau Catalog does all that heavy lifting.
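To make the idea of impact analysis concrete, here is a minimal sketch of the kind of traversal a lineage tool performs: start at a field, walk everything downstream of it, and collect the owners who should hear about a change. This is not how Tableau Catalog is implemented, and the graph, object names, and email addresses below are all hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the objects listed in its value.
# These names are illustrative and do not come from a real Tableau site.
DOWNSTREAM = {
    "table:orders": ["field:Sales"],
    "field:Sales": ["sheet:Revenue Trend", "sheet:Top Customers"],
    "sheet:Revenue Trend": ["dashboard:Executive Summary"],
    "sheet:Top Customers": [],
    "dashboard:Executive Summary": [],
}

# Hypothetical owner of each piece of content.
OWNERS = {
    "sheet:Revenue Trend": "ana@example.com",
    "sheet:Top Customers": "ben@example.com",
    "dashboard:Executive Summary": "ana@example.com",
}


def downstream_of(start, graph):
    """Return every object reachable from start, in breadth-first order."""
    seen = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited through another path
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen


def owners_to_notify(start, graph, owners):
    """Return the sorted, de-duplicated owners of content downstream of start."""
    return sorted({owners[n] for n in downstream_of(start, graph) if n in owners})


impacted = downstream_of("field:Sales", DOWNSTREAM)
recipients = owners_to_notify("field:Sales", DOWNSTREAM, OWNERS)
print(impacted)
print(recipients)
```

The de-duplication step matters in practice: one person often owns several downstream objects, and they only need one email.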
Field Descriptions
Did you see them? In two of the screenshots above, field descriptions are visible. They help not only with understanding the data from a lineage perspective, but also with usability. When an individual is using the data source (whether via Ask Data, in Tableau Desktop, or elsewhere), the field descriptions will be there to help. In fact, this is not the last place field descriptions will be available to users. More on that later.
Unfortunately, field definitions do not come out of the box. They must be defined and added by the individual creating the data source. But this effort is not wasted. Not only will the process help the initial developer create a more thoughtful data source, but anyone who uses the data source will benefit. I'm looking forward to a future integration between a business glossary and Tableau Catalog, but for the time being, see my post 12 Enterprise Tips & Best Practices for Tableau on how to create these definitions.
Data Quality Warnings
Another feature that is immediately available upon configuring Tableau Catalog is the data quality warning, which applies to published data sources. There are two options: a quality warning and a monitoring warning.
The quality warning is set and removed manually by the data source owner or administrator. There are a variety of settings, such as the type of warning and how visible the warning should be to end users. A custom message can also be configured. Ultimately, this warning is shown on the data source page, in Tableau Desktop, in the Data Details pane, and in subscriptions where the affected data source is used.
Once it has been configured, the monitoring warning is applied automatically whenever a refresh does not complete successfully. It shows up in the same places the quality warning appears.
Tableau Catalog makes it incredibly easy to let the organization know that:
- a data source is being worked on
- data is delayed due to a third-party vendor issue
- a data source will eventually be retired
- a data source contains sensitive information
- etc.
The possibilities are endless. I plan to explore this further and provide data quality templates. So, make sure you subscribe!
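As a rough illustration of what a set of reusable warning templates could look like, the sketch below pairs each scenario from the list above with a warning type, a default message, and a visibility setting. Everything here is hypothetical: the `WarningTemplate` class, the template keys, and the `render` helper are my own names, and the type labels are assumed to mirror the warning types offered in the Tableau dialog. Nothing in it calls Tableau.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningTemplate:
    warning_type: str      # label assumed to match a type offered in Tableau
    message: str           # default custom message shown to end users
    high_visibility: bool  # whether the warning should be made more prominent


# Hypothetical templates, one per scenario in the list above.
TEMPLATES = {
    "in_progress": WarningTemplate(
        "Under maintenance", "This data source is being worked on.", True
    ),
    "vendor_delay": WarningTemplate(
        "Stale data", "Data is delayed due to a third-party vendor issue.", True
    ),
    "retiring": WarningTemplate(
        "Deprecated", "This data source will eventually be retired.", False
    ),
    "sensitive": WarningTemplate(
        "Sensitive data", "This data source contains sensitive information.", False
    ),
}


def render(template_key, detail=""):
    """Build the text an owner would paste into the warning's message box."""
    template = TEMPLATES[template_key]
    message = f"{template.message} {detail}".strip()
    prefix = "[HIGH] " if template.high_visibility else ""
    return f"{prefix}{template.warning_type}: {message}"


print(render("vendor_delay", "Expected by Friday."))
print(render("retiring"))
```

Keeping the wording in one place means every owner describes the same situation the same way, which makes warnings easier for end users to recognize.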
Data Details
With Tableau Catalog enabled, field descriptions defined, and data quality warnings in place, the Data Details pane becomes an invaluable tool for end users. When looking at a view, click the Data Details button and a pane will appear on the right side of the screen. This pane provides an overview of how often the view is referenced, along with project and owner information. Next, the data sources are listed, along with any data quality warnings that are active. Finally, the fields used in the view are listed and can be expanded to show data definitions and/or calculated field logic. Together, this gives end users the information about data freshness, behind-the-scenes logic, and business definitions they need to self-serve and get to answers faster.
With the addition of Tableau Catalog to our Tableau deployment, I find that I am curating data sources differently than I did in the past. Here are a few things that have changed:
- Putting more thought into the name of each field
- Ensuring each field has a clear definition
- Creating an Ask Data lens so users can ask questions of the data in an ad hoc fashion
- Setting up a refresh schedule for each data source
- Setting up a monitoring warning for each data source
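Most of that checklist can be expressed as a simple pre-publish check. The sketch below is one hypothetical way to do it: the dictionary shape and the `curation_gaps` function are assumptions of mine, not a Tableau API, and the first item (thoughtful field names) is left out because it needs human judgment.

```python
def curation_gaps(datasource):
    """Return a list of human-readable gaps for one data source record.

    The record is a plain dict in a hypothetical shape; it is not
    something Tableau produces in this form.
    """
    gaps = []
    for field in datasource.get("fields", []):
        # Treat a missing, None, or whitespace-only description as a gap.
        if not (field.get("description") or "").strip():
            gaps.append(f"field '{field['name']}' has no description")
    if not datasource.get("has_lens"):
        gaps.append("no Ask Data lens")
    if not datasource.get("refresh_schedule"):
        gaps.append("no refresh schedule")
    if not datasource.get("monitoring_warning"):
        gaps.append("no monitoring warning")
    return gaps


# Illustrative record for a made-up data source.
example = {
    "name": "Sales",
    "fields": [
        {"name": "Order Date", "description": "Date the order was placed."},
        {"name": "Sales", "description": ""},
    ],
    "has_lens": True,
    "refresh_schedule": "daily",
    "monitoring_warning": False,
}

gaps = curation_gaps(example)
for gap in gaps:
    print(gap)
```

An empty list means the data source meets every item the check covers.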