Digital Transformations – Preparing Your Data for AI
Digital transformation and data
In an earlier post, “Why Digital Transformations Fail – Future Proofing”, we argued that digital transformations must “design for capabilities for which both a strong business case and well defined requirements exist”, and recommended “future proofing enough – but not too much”. In this post, we present what seems to be a contrarian view: investing in data in a way that, at first sight, might itself be guilty of future-proofing.
A digital transformation is a critical event intended, among other goals, to position the company for a new stage of growth over the next 2-5 years. Typically, by the time a company is mature enough for scaling to become a strategic priority, the value of its data has become meaningful. An intuitive explanation is that the data has reached the critical mass at which insights that go beyond intuition can be harvested. As a corollary, data needs to be properly architected in order to yield these insights. Furthermore, proper data collection and curation are foundational prerequisites for building an Artificial Intelligence (AI) sub-system into the product.
Why think about data during a digital transformation?
Digital transformation moves a company to a new stage of maturity, and to new business practices. During this transformation, data generated by the product also evolves in three major directions.
The rearchitecture of data
First, data needs to be rearchitected along with the code. Often data, more than code, is the primary dimension that drives the architecture.
Localizing data to each microservice is a core design goal of any re-architecture project.
Optimizing performance of data access is often a key driver to increase scale. While code can be scaled horizontally almost ad infinitum, it is much more difficult to do so with data.
With growth, data has to meet more onerous security and compliance requirements, for example data locality constraints adopted to help meet GDPR.
Future-proofing your data
When a company is small, the amount of data it holds is small. Insights can be derived by combing over a spreadsheet. As the company grows, and the data it holds becomes larger and more varied, business intelligence and data science tools can discover insights that intuition alone could not have imagined. Consequently, ensuring that data is consistent across the product, as well as with internal company data, will save a huge amount of time in the future. This is where “future-proofing” comes in. Even if there is no immediate plan to harvest product data, it is important to:
Have consistent data formats and meaning across the product, accompanied by data dictionaries.
Promote data sharing along with access and discoverability across all functions of the company, while maintaining proper security, so that each department can experiment with the data in order to gain more insights on its own operations.
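As an illustration of the first point, a data dictionary can be kept in a machine-readable form and used to check records before they are stored. The sketch below is only one possible shape for this: the `FieldSpec` class, the `ORDER_DICTIONARY` entries and the field names are hypothetical, chosen for the example rather than taken from any particular product.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry: a field's type, its meaning, and a validity rule."""
    dtype: type
    description: str
    is_valid: Callable[[Any], bool]


# Hypothetical dictionary shared by every service that emits order records.
ORDER_DICTIONARY = {
    "order_id": FieldSpec(str, "unique order identifier", lambda v: len(v) > 0),
    "amount_cents": FieldSpec(int, "order total in cents, never negative", lambda v: v >= 0),
    "currency": FieldSpec(str, "three-letter uppercase currency code",
                          lambda v: len(v) == 3 and v.isupper()),
}


def validate(record: dict, dictionary: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, spec.dtype):
            problems.append(
                f"{name}: expected {spec.dtype.__name__}, got {type(value).__name__}"
            )
        elif not spec.is_valid(value):
            problems.append(f"{name}: value {value!r} violates rule ({spec.description})")
    # Fields that nobody documented are flagged too, so the dictionary stays complete.
    for name in record:
        if name not in dictionary:
            problems.append(f"undocumented field: {name}")
    return problems


if __name__ == "__main__":
    record = {"order_id": "A2", "amount_cents": "19.99", "currency": "usd", "note": "gift"}
    for problem in validate(record, ORDER_DICTIONARY):
        print(problem)
```

Keeping the description next to the rule means the same artifact serves both people (as documentation) and pipelines (as a check), which is one way to keep meaning consistent across teams.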
Opportunity for increased revenue
Finally, data can be used to increase revenues – which we cover in the next section.
Examples of how data increases revenue
The power of data lies in the diversity of ways it can be applied. Rather than attempt to provide an exhaustive list of applications, this section is meant to provide examples that stimulate the imagination.
Improve decision making
Data can be universally used to improve decision making.
The simplest approach is to combine data gathered in the product with data from internal operations. For example, track which marketing campaigns are the most effective, predict demand and churn.
In addition, by instrumenting the product, product managers can track which features are used, or not, particularly to confirm that a newly introduced feature is seen and used by end users. Similarly, product managers can track usage patterns to identify areas of the product that are confusing, or patterns that lend themselves to simplification. Finally, tracking usage patterns should confirm how users perceive the value of the product, and thus lead to pricing optimization.
Leveraging user analytics, growth marketers can directly drive revenue growth by using data generated from individuals’ interactions with the product to prompt them to purchase additional features relevant to their usage. For some companies, this is the primary driver of revenue growth.
Generate new sources of revenue
The examples below show various means to increase revenue, either by increasing engagement and the perceived value of the product (and thus increasing retention and the ability to raise prices), by increasing usage by better understanding users’ needs, or by monetizing the data directly.
Trend analysis and recommendation systems increase product and services unit sales by suggesting additional purchases based on purchase history, product similarity or purchases of users with similar profiles. While seasons or news are well understood influencers of purchase decisions, other trends can only be discovered through the application of machine learning.
AI-based language analysis allows a company to ‘read the minds of its users’ by analyzing all text-based and voice-based exchanges from users, as well as prospects, across all communication channels, internal or external to the company such as phone, email, chat, and social media. Companies can thus discover friction with existing features, as well as unmet needs.
Analysis of data aggregated across all of the company’s customers may reveal trends that are not visible at a smaller scale, and may allow local trends to be generalized, simply because the aggregated data pool is bigger and broader.
As with all complex endeavors, a progressive approach, with measurable success milestones, is recommended. For example, a capability-driven progression could be:
1. Descriptive analytics: Document ‘what happened?’ (e.g. ‘Alert, a server crashed’, ‘N customers bought item X today’.) Nowadays, this capability is expected from any non-demo software.
2. Diagnostic analytics: Explain ‘why did it happen?’ (e.g. ‘what specific service/line of code caused the server to crash?’, ’What drove this customer to purchase item X?’) This is expected from mature software. It is important information to improve the product on both technical and business fronts.
3. Predictive analytics: Predict what will happen. Provide insights into the future. (e.g. ‘this service requires data to be cached’, ‘people who bought this product also bought this other product’.) Thanks to the insights derived from predictive analytics, companies can drive additional revenues and optimize costs. This technology is now widely available.
4. Prescriptive analytics: Determine ‘how can we make it happen?’ (e.g. ‘automatically increase the compute capacity for a service based on intelligence gathered from data’, ‘automatically order more supplies, or buy more advertising based on algorithms and data’). Decisions are made faster, without a human in the loop, based on the data collected. Billion-dollar companies do this. For smaller companies, gaining and applying this expertise is a clear opportunity to differentiate themselves, and get the associated lift in revenues.
5. AI-driven operations: Discover unknown unknowns. (e.g. improve predictive analytics even further by applying AI algorithms to the company’s data, or by leveraging generative AI, whose models are trained on vast amounts of publicly available data.) AI-driven operations are a leading-edge capability, which requires an internal team of experts as well as sustained investment over time to fine-tune the technology to the company’s use cases. At the time of this writing, generative AI is an emerging technology whose applications are yet to be fully discovered.
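To make the difference between the levels concrete, here is a minimal sketch of levels 1, 3 and 4 applied to the same small series of daily unit sales. It is an illustration under simple assumptions, not a recommended forecasting method: the sales figures are invented, the function names are hypothetical, and the prediction is a plain least-squares straight line. Level 2 (diagnostic) is omitted because it depends on joining in other data, such as campaigns or incidents.

```python
from statistics import mean

# Invented daily unit sales of item X, oldest first.
daily_units = [20, 22, 25, 27, 31, 33, 36]


def describe(series):
    """Level 1, descriptive: what happened?"""
    return {"total": sum(series), "average": mean(series), "latest": series[-1]}


def fit_trend(series):
    """Least-squares line through (day index, units). Needs at least two points."""
    xs = range(len(series))
    x_bar, y_bar = mean(xs), mean(series)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


def predict(series, days_ahead):
    """Level 3, predictive: extrapolate the linear trend for the next days."""
    slope, intercept = fit_trend(series)
    start = len(series)
    return [slope * (start + d) + intercept for d in range(days_ahead)]


def prescribe(series, stock_on_hand, days_ahead=7):
    """Level 4, prescriptive: units to reorder so stock covers forecast demand."""
    forecast_demand = sum(predict(series, days_ahead))
    return max(0, round(forecast_demand - stock_on_hand))


if __name__ == "__main__":
    print(describe(daily_units))
    print(predict(daily_units, 3))
    print(prescribe(daily_units, stock_on_hand=100))
```

Note that `prescribe` always returns a number, whether or not a straight line is a sensible model for the data. Nothing in the code says how good the forecast is, which is why each level should be validated before decisions are automated on top of it.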
Finally, provided the company obtains users’ consent, the company can sell its user-generated data.
Preparing for AI
In SVSG’s experience, it is dangerous to attempt to skip steps in the progression presented in the previous section, for the simple reason that analytics always produce a result but do not tell you whether that result is correct, or optimal. It is easy to make a prediction; it is much harder to make a good one.
Capture relevant data
A critical first step is to capture all the company’s relevant data, in a clean way, as described earlier in the section “Why think about data during a digital transformation?” The importance of clean data with correct meaning cannot be overstated. Incorrect data will lead to incorrect decisions (to state the often-overlooked obvious). A commonly cited rule of thumb is that about 80% of the cost of AI projects is spent on data preparation. Hence, the earlier tools and processes are put in place to curate data, the lower the cost.
Progressing through the first four levels of data-skills demonstrates the company’s skill at collecting and analyzing data correctly, and thus its readiness for AI.
Grow your AI talent
The second step is to acquire AI talent. AI is a different engineering field from software development. The best software engineer without AI education will not deliver quality AI capabilities. To be clear, both skills are needed, yet AI algorithm development is more akin to science. Once the AI team has figured out the algorithms (and the data required) to generate new revenue, then the software team jumps in to productize it.
In practice this means that time and resources for experimentation need to be budgeted for the AI team to research algorithms, tune them, and optimize them to the company’s use cases, and demonstrate business value. Naturally, as for any research project, success is not guaranteed.
AI research and development
Finally, investment in AI, both research and data operations, must be maintained. Unlike conventional software, which can often run unchanged once it works, AI requires constant optimization as the data, and the people who generate it, change. In addition, processes must be set up to detect and mitigate known side effects such as drift and bias.
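One common way to watch for drift is to compare the distribution of a model input today against the distribution it had when the model was trained. The sketch below uses the Population Stability Index (PSI) for this. It is a simplified illustration: the equal-width binning, the small floor used for empty bins and the function names are our assumptions, and the alert threshold is a rule of thumb that each team should calibrate on its own data.

```python
import math


def bin_shares(values, edges):
    """Fraction of values in each bin; out-of-range values go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    floor = 1e-6  # assumption: small floor so an empty bin does not cause log(0)
    return [max(c / len(values), floor) for c in counts]


def population_stability_index(baseline, current, bins=5):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    # Equal-width bins derived from the baseline (training-time) data only.
    edges = [lo + i * width for i in range(bins + 1)]
    expected = bin_shares(baseline, edges)
    actual = bin_shares(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    training_values = list(range(100))               # invented feature values
    todays_values = [v + 40 for v in training_values]  # same shape, shifted upward
    psi = population_stability_index(training_values, todays_values)
    # A PSI above roughly 0.25 is often treated as a significant shift.
    print("drift suspected" if psi > 0.25 else "stable")
```

A check like this can run on a schedule against each important input, so that a shift in user behavior triggers a review before it quietly degrades the model.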
Data is a product
The examples above are far from exhaustive, yet they illustrate the value of data. While the harvesting of data in a data warehouse, or data mesh, may lag the digital transformation effort, it is critical that data be properly architected during the transformation, primarily because the cost and time to do so after the fact are so much greater.
In practice, data must be treated as a product with its own product manager(s) and development team(s).
The data team’s role is to:
Nurture data to ensure accuracy, completeness, as well as correctness.
Ensure quality so that data has a well-defined, consistent meaning and format across the product and internal systems.
Provide access and tools to harvest the data across the whole organization, while maintaining security. Insights come from unplanned places and people.
Drive the company’s capabilities along the skill progression outlined earlier. Exploring the potential use of AI requires enlisting qualified AI engineers, as well as patience in terms of time and budget for the research to demonstrate customer value and a business case.
Final thoughts
With the emergence of generative AI, ‘don’t forget data’ seems like a timid recommendation, yet for most companies, it is the necessary, difficult, first step in a pivot to a world that has become data-first.