Need help finding the right cheese? And you think you know data!

Building a Successful Data Organization

Sep 13, 2021

I wanted to share some of my successes in our journey to enable the organization to leverage data more effectively. This story focuses on the Data Ops aspect; I’ll cover other aspects in subsequent articles. Most companies have a data organization by now, but very few of the ones I speak to have an effective Data Ops group.

Taking a Step Back

Taking a step back, a data organization typically gets kicked off to bring order to what feels like bad data quality. The company is spending money on projects that have a data component, but each project still feels like an uphill climb: each one needs to contact the source systems to get data, ensure the data fits the purpose at hand, and then bring it together. The source systems continue to evolve, and there are often unanticipated impacts downstream. Additionally, with changing business demands and staff attrition, the knowledge of why certain data quality measures are needed on a data point isn’t preserved, which is another reason issues occur. The other prevalent symptom is multiple people using the same or similar data and showing different numbers. This may be because they are taking different data points that sound similar (daily vs. monthly), not being part of a corrections process (typically handled via email, which brings manual intervention), taking the same data at different points in time (month-end data on Business Day 1 will differ from Business Day 5), or taking the data from a downstream source because it seems expedient (the downstream source might already have all the data points consolidated).

A Typical Starting Point — The Data Engineering Team

If you’re an IT leader, your organization might already have a data engineering team. This is the group that sources data to bring into a data warehouse or data lake, or whatever term you’re using for the place where data is consolidated. The role of such a group is easily understood. Often the engineering team works with business users of the data and is driven by a product owner for a set of needs. Data quality issues are handled as they come up, by a combination of the appropriate business user and a tech team member adding checks or working with the upstream data provider to establish a different process.

While this org structure is often in place in the name of efficiency, users of the data often build their own data quality processes, including manual checks, or creating copies of the data to expose to their customers. The result is a data delivery pipeline that isn’t efficient, has delays, and tends to be more prone to issues.

Next Step — Data Operations

As the number of users grows for your data lakehouse (today’s popular term for your version of warehouse or data lake), there’s recognition that a more robust data quality process needs to be in place and centralized. The term ‘Data Ops’ is now prevalent and speaks to this data quality monitoring — ensuring values are as expected and within tolerance, data is received on time, and data issues are handled with a repeatable and auditable process. This is looked at from the perspective of the source providers. We expect monthly sales results to be in by business day 3 after month-end and if they aren’t, then someone’s going to investigate. If the trade type field has a new value, we’re going to check. Sometimes, you might hold downstream processes until these issues are resolved.
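The two checks described above (on-time receipt and unexpected field values) can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names are hypothetical, "business day" is assumed to mean Monday to Friday with holidays ignored, and the trade type values are made up for the example.

```python
from datetime import date, timedelta
from typing import Iterable, List, Optional


def nth_business_day_after(day: date, n: int) -> date:
    """Return the n-th weekday (Mon-Fri) strictly after `day`.

    Assumption: holidays are ignored; a real calendar would be needed in practice.
    """
    current = day
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return current


def is_late(month_end: date, received_on: Optional[date], as_of: date,
            deadline_bd: int = 3) -> bool:
    """True if the feed missed its deadline (business day 3 by default)."""
    deadline = nth_business_day_after(month_end, deadline_bd)
    if received_on is None:
        # Not received yet: late only once the deadline has passed.
        return as_of > deadline
    return received_on > deadline


def unexpected_values(observed: Iterable[str], known: Iterable[str]) -> List[str]:
    """Return values seen in the feed that are not in the known reference set."""
    return sorted(set(observed) - set(known))


# Hypothetical example: August 2021 sales still missing on Sep 6.
if is_late(date(2021, 8, 31), None, as_of=date(2021, 9, 6)):
    print("Monthly sales are late: open an investigation")

# Hypothetical trade type values; "SWAP" is new and should be reviewed.
print(unexpected_values(["SPOT", "FWD", "SWAP"], known={"SPOT", "FWD"}))
```

Keeping each check as a small pure function makes the result easy to log, which helps with the repeatable and auditable handling mentioned above.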

The Leap — Data as a Product

As you mature the data practice at an organization further, you realize that not only do you need to manage in the manner above, but you also want to manage for cost and assure different levels of service to different data consumers, who might be using the very same data. Some critical data flows might require that you hold up the data until issues are fixed, while other consumers are fine with starting their analytics right away and accepting the potential for small issues while the data gets fixed.

The shift that occurs is that you now need to manage data from a consumer’s perspective. The data organization is making a commitment to each consumer that they will get the data they need, on time and without issues. This requires that you grow your understanding to include both the data providers and the data consumers. The questions such a shift requires you to ask include: What products does consumer X need the data for? What fields are important? How is that data used in processing? How sensitive are they to a 10-minute delay or a 2-day delay? The answers allow you to prioritize issues. They also allow you to monitor for the impacts of upcoming changes, so you can either address them or get the impacted consumer groups involved in discussions and testing.
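One way to picture the consumer view is as a small registry that records, per consumer, which fields they rely on and how much delay they can tolerate. The sketch below is only an illustration under assumptions: the `Consumer` class, the consumer names, the field names, and the action labels are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import FrozenSet, List


@dataclass(frozen=True)
class Consumer:
    name: str
    fields_used: FrozenSet[str]   # fields this consumer depends on
    max_delay: timedelta          # how long they can wait for the data
    hold_on_issue: bool           # True: hold delivery until issues are fixed


# Hypothetical consumers of the very same data with different service levels.
CONSUMERS = [
    Consumer("regulatory_reporting", frozenset({"trade_type", "notional"}),
             timedelta(minutes=10), hold_on_issue=True),
    Consumer("analytics", frozenset({"trade_type", "region"}),
             timedelta(days=2), hold_on_issue=False),
    Consumer("marketing", frozenset({"region"}),
             timedelta(days=2), hold_on_issue=False),
]


def impacted_consumers(field: str, consumers: List[Consumer]) -> List[Consumer]:
    """Consumers that use `field`, most delay-sensitive first."""
    hit = [c for c in consumers if field in c.fields_used]
    return sorted(hit, key=lambda c: c.max_delay)


def delivery_action(consumer: Consumer, has_open_issue: bool) -> str:
    """Decide what to do with a delivery for one consumer."""
    if not has_open_issue:
        return "release"
    return "hold" if consumer.hold_on_issue else "release_with_warning"


# An issue on trade_type: who is affected, and what happens to their delivery?
for c in impacted_consumers("trade_type", CONSUMERS):
    print(c.name, delivery_action(c, has_open_issue=True))
```

The same registry can be queried when an upstream change is announced, to find which consumer groups to bring into discussions and testing.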

The business value of such a shift is that the data quality functions are now centralized and predictable. The hours or FTE counts baked into the operations of consumer groups can now be released or shifted to more value-added work. These cost efficiencies are somewhat hard to calculate because the work done by downstream consumers can be episodic, distributed, or hidden. It’s also not easy to release these costs, since they are likely a small component of each team’s work. However, the benefits are realized over time, as other org changes allow for additional shifts of work or allow the teams to absorb more changes.

Most importantly, as new needs come up, this group becomes the go-to group to vet requirements and get to solutions faster. This data-as-a-product group has a good understanding of end-to-end data flows and can call out impacts quickly. Others in the organization will gravitate to this model because it helps them be more successful while off-loading work they were uncomfortable performing, since they didn’t know the full complexity of the data. An example might be which product name to use in which context: do you use the 25-digit name or the 40-digit name, or which product code do you use when shipping a return label?

This group will be a combination of technical and business-oriented resources. It could be a partnership between an IT group and a business group, or it could sit in IT or the business, so long as the team has members with technical skills, some with an analytical mindset, some who are process-oriented, and still others who are business-minded.

Conclusion

No one ever says data is easy. As you spend more time with it, some of these issues become more clearly understood, but they remain hard to explain to someone who hasn’t spent time with data. The best approach is to implement these steps incrementally, so that people appreciate each step for the benefits it provides and you gain credibility along the journey.