Toward customer hyper-personalization experience — A data-driven approach

Abstract
Today’s omnichannel business models incorporate physical and digital touchpoints that interact with customers. A hyper-personalization strategy relies on the organization’s capability to gather customer data and transform it into personalized experiences; therefore, when a hyper-personalization organizational plan is put in place, it serves two main functions: to deliver personalized experiences and to increase the number of customers receiving such experiences. For this to happen, a hyper-personalization strategy requires four elements: data foundation, decisions, design, and distribution. While customer master data management relies on the correct identification of a customer, real customer insight can only be achieved when three types of customer data are gathered: Identity, Contactability, and Traceability (I, C, T), which together fulfill the first element of a hyper-personalization strategy. This article aims to identify the benefits, in terms of the total number of customers that can receive a hyper-personalization strategy, of linking real-time touchpoints to a customer Master Data Management (MDM) system that integrates the three types of customer data.


PUBLIC INTEREST STATEMENT
The growing volume of consumer data has turned its analysis into a powerful tool to increase productivity and develop focused strategies for the target consumer. From the companies' perspective, consumer data is crucial for decision-making and for strengthening the company-customer relationship. Although big data plays an increasingly critical role in decision-making, it is still necessary to analyze data with the appropriate techniques depending on the final purpose. This paper focuses on identifying the benefit of integrating customer data from digital and physical touchpoints by filling in incomplete customer information in databases. The effectiveness of customer database migration and of strategies that transform anonymized records into complete customer records is analyzed to identify the benefits in the total number of customers that can receive a hyper-personalization strategy in real time.

Introduction
Today, organizations must aspire to an institutional and comprehensive knowledge of all interactions between them and their customers (Jain et al., 2018; Kalia & Paul, 2021; Low, 2000). In addition, the need to deliver personalized communication and services is a challenge that organizations face due to the introduction of digital channels while expanding traditional ones (Andreassen et al., 2018; Bleier & Eisenbeiss, 2015; Wolny & Charoensuksai, 2014).
Each interaction between the organization's touchpoints and its customers generates data, and those touchpoints offer companies the opportunity to record the customers' data that can generate value for them (Erevelles et al., 2016; Rekettye & Rekettye, 2019). Each touchpoint is supported by platforms or systems, which ingest that data into their databases (Ducange et al., 2018).
Customer information must be available for all touchpoints and source systems to deliver hyper-personalized experiences through an omnichannel business model (Kalia & Paul, 2021). This work uses the definition of personalization established by Imhoff et al. (2001): "Personalization is the ability of a company to recognize and treat its customers as individuals through personal messaging, targeted banner ads, special offers on bills, or other personal transactions." Previous works identified two personalization aims. The first is delivering information relevant to specific individuals or groups of particular individuals in the format and layout specified at appointed time intervals. The second is an increase in revenue and a decrease in business costs by applying a one-to-one marketing understanding of the customers' needs, habits, lifestyle, preferences, likes, and dislikes. In the end, the combination of both aims satisfies, or at least gives the illusion of satisfying, the customers' individual needs and preferences (Hess et al., 2020; Jain et al., 2018, 2021; Won, 2002).
A hyper-personalization strategy has four elements: data foundation, decisions, design, and distribution (Boudet et al., 2017; Jain et al., 2021). All these elements are necessary; however, the data foundation is the starting point because a hyper-personalization strategy needs customer feedback to deliver the experiences. For example, if all customers are unknown, a hyper-personalization approach will first help build the customer database so that those customers can receive hyper-personalized experiences later on.
Digital clienteling is defined as improving customer engagement by providing a unified customer information resource (Jain et al., 2021); the increasing use of smartphone applications makes the smartphone the preferred tool for customization (Mallya & Nair, 2016).
As the current literature establishes, implementing hyper-personalization strategies can improve the clients' contactability and customer engagement (Jain et al., 2021; Micu et al., 2022; Ngoc Thang et al., 2020). However, it is necessary to study the impact, peak, and limit of these strategies and the best possible moment to implement a new approach to continue increasing the number of customers that can receive hyper-personalized engagements. Additionally, most studies focus on either e-commerce or physical stores but not on a combination of both (Micu et al., 2022; Ngoc Thang et al., 2020). This study intends to demonstrate the added benefits of establishing a hyper-personalization strategy through physical and digital touchpoints with respect to the number of customers receiving such a strategy.
To examine this, a retail company was selected to run hyper-personalization strategies. The trial was necessary to demonstrate the impact of implementing digitalization strategies on the number of customers who could receive hyper-personalization strategies. The company's database made it possible to consider 15 million clients during the research, and they were monitored for 15 months. In the end, the work resulted in an increase in the number of customers who could receive such strategies. The collected data show that more significant benefits can be obtained when customers receive hyper-personalized offers supported by real-time data input through an architecture that links a customer data platform (real-time) and an MDM (near real-time).

Literature review
Data for a personalization strategy, data gathering across touchpoints, and master data management are the sections that make up the theoretical framework. The literature review was performed to provide theoretical support for the proposed architecture for the hyper-personalization experience.

Data for a personalization strategy
The need to integrate customer data across the organization is an old problem that has been addressed by various technologies and governance methodologies (White et al., 2006). Three types of information can be gathered from customers: Identity, Contactability, and Traceability. Customers also provide hidden information without being aware of it, such as cookies or IP addresses; when handled correctly, this type of information can also be grouped into one or all three of those categories.
Customer information can be defined as follows.

Identity (I):
Personal information about the customer, such as name, last name, date of birth, gender, social security number, tax ID, and all other types of information used to authenticate an individual, excluding contact information (Narayanan & Shmatikov, 2010).

Contactability (C):
Contact information used in campaign management for direct and online channels (Anderson, 2009), such as address, telephone, cell phone, app, WhatsApp, and social networks.

Traceability (T):
Information that identifies that a customer is interacting with the company. It is divided into two types.
(1) Transaction or receipt: an interaction in which a transaction is carried out (Kurniawan et al., 2017) and there is (or should be) a record of what occurred, for example, a purchase, first- and second-party data (Paulina, 2017), a complaint, or a balance inquiry.
(2) Visit: an interaction that does not create a record, such as a customer's store visit without a purchase.
Personalization is defined as individualization reached by considering specific customer preferences (Nobile & Kalbaska, 2020). Therefore, the path toward hyper-personalization is based on customer information: Identity, Contactability, and Traceability. Figure 1 shows the relationships among a customer's knowledge elements. The relationship among the information gathered from customers enables organizations to map their customer information and allocate it to one of the seven states. The initial result of mapping customer information into states A to Z will vary across industries and business models. Some industries have a more significant customer concentration in state A because of their innate customer relationship, for example, e-commerce, banking, and retail by membership, among others. However, some strategies help complete data profiles and upgrade them toward state A regardless of industry and business model. The initial result reflects the organization's business model and its current process of customer data gathering.
The initial result can be improved by migrating a customer record, that is, by completing the missing information (I, C, T). The desired state is A, where all three customer information types are present. Personalization can be delivered to customers in states A and D; the main difference is that state A allows the organization to address them personally. Therefore, state A represents hyper-personalization. In addition, both states (A and D) make it possible to analyze customers and connect with them using historical omnichannel information.
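The allocation of a record to one of the seven states can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the mapping of data-type combinations to state letters is inferred from how the states are described in the text (A holds I, C, and T; D is contactable and traceable; B adds Identity to contact data; Y is contact-only; X is anonymous transactions; Z is identity-only; C is assigned by elimination), and the names `CustomerRecord`, `STATE_BY_FLAGS`, and `classify_state` are hypothetical.

```python
from typing import NamedTuple


class CustomerRecord(NamedTuple):
    """Presence flags for the three customer data types (hypothetical type)."""
    has_identity: bool        # I
    has_contactability: bool  # C
    has_traceability: bool    # T


# Assumed mapping, inferred from the text rather than copied from Figure 1.
STATE_BY_FLAGS = {
    (True, True, True): "A",    # I + C + T: hyper-personalization
    (True, True, False): "B",   # I + C
    (True, False, True): "C",   # I + T (assigned by elimination)
    (False, True, True): "D",   # C + T: anonymous but contactable
    (False, False, True): "X",  # T only: anonymous transactions
    (False, True, False): "Y",  # C only
    (True, False, False): "Z",  # I only
}


def classify_state(record: CustomerRecord) -> str:
    """Return the state letter for a record, or raise if it holds no data."""
    key = (record.has_identity, record.has_contactability, record.has_traceability)
    if key not in STATE_BY_FLAGS:
        raise ValueError("a record with no I, C, or T data has no state")
    return STATE_BY_FLAGS[key]
```

For example, `classify_state(CustomerRecord(False, True, True))` returns `"D"` under this assumed mapping.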

Data gathering across touchpoints
Data gathering has been facilitated by digital technologies, which interface with customers through digital touchpoints (Erevelles et al., 2016; Straker & Wrigley, 2018). For pure digital business models, where most touchpoints are digital, customer data is easy to find (Wirtz, 2019). However, business models that rely on physical touchpoints must make extra efforts to collect customer data. Table 1 compares the customer experience when purchasing online and through a physical store. Customer knowledge and identification are facilitated when digital touchpoints are created or enhanced throughout the organization (López García et al., 2019).
Depending on the business model, customer information supports different personalization levels. It is easy to imagine the knowledge that a ride-sharing app has about its customers: in addition to contact information, identity, and form of payment, it has collected the geographical coordinates of all origins and destinations, and even the customer's AC preference (Gilibert & Ribas, 2019; Shaheen & Cohen, 2019).

Master data management
Master Data Management (MDM) enables organizations to have a customer core entity solution for data consistency, simplification, and uniformity of process, analysis, and communication across the enterprise (Erevelles et al., 2016; White et al., 2006). MDM has proved efficient in breaking data silos of a defined entity across an organization and delivering a unified customer record. However, its primary purposes have been described as Deduplicate and Regular Batch Matching for Duplicate Removal (Haneem et al., 2017; Oracle, 2015).

Table 1. Customer experience when purchasing online and through a physical store

Stage | Online purchase | Physical store
Search and Selection | Each element selected, search made, selection, and movement made to the cart is known | NA
Check out | The products, place of delivery, payment method, and contact with the customer are known; all the elements (I, C, T) are present | The products and payment methods are known (T)

Proposed architecture for hyper-personalization experience
The architecture shown in Figure 2 is proposed to build a complete customer information database. The proposed architecture comprises five blocks: Real-time Feedback Source Systems, Other Source Systems, Master Data Management (MDM) Repository, Customer Data Platform (CDP), and Machine Learning Repository (Karanam et al., 2021; Richter & Wood, 2015). Each block is described as follows:
(1) Real-time Feedback Source Systems: three source systems selected to provide real-time feedback on customer interaction: Point of Sale, E-commerce, and Delivery. All three source systems integrate contactability information (mobile phone number and email) for real-time matching. Customer information from these source systems flows through the Customer Data Platform and then to the MDM, allowing the CDP to feed the real-time systems the data they require.
(2) Other Source Systems: customer data from all other source systems follows the traditional path and is connected directly to the Master Data Management system.
(3) Master Data Management Repository: contains the customer golden record after the following processes have been performed: data normalization, data cleaning, matching, and de-duplication.
(4) Customer Data Platform: a historical version of the MDM that includes all source systems and is the real-time enabler between the real-time source systems (Point of Sale, E-commerce, and Delivery) and the traditional MDM information flow. Through its connection with the MDM, any interaction with a known customer triggers a real-time response that includes a Best Next Offer or Best Next Action prepared for that customer in the Machine Learning Repository.
(5) Machine Learning Repository: a repository that enables advanced matching through customer Traceability or Contactability information.

Real-time matching CDP
An algorithm that triggers a real-time response to a customer interaction is implemented in the customer data platform. This algorithm works with the machine learning repository to integrate successfully with master data management by creating new customer records while avoiding duplicates (ActionIQ, 2021). Figure 3 shows the proposed algorithm.
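Since Figure 3 is not reproduced here, the following Python sketch only illustrates the general idea of real-time matching on contactability data: look up the email or mobile number captured at a touchpoint, return the known profile (with its prepared offer) on a match, and otherwise create a new prospect so that no duplicate is generated later. The class and function names, the normalization rules, and the in-memory index are all assumptions made for illustration; they are not the authors' algorithm or any vendor's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Profile:
    customer_id: int
    email: Optional[str] = None
    mobile: Optional[str] = None
    best_next_offer: Optional[str] = None  # assumed to be prepared elsewhere


def contact_keys(email: Optional[str], mobile: Optional[str]) -> List[str]:
    """Normalize contact data into lookup keys (assumed normalization rules)."""
    keys = []
    if email:
        keys.append("email:" + email.strip().lower())
    if mobile:
        digits = "".join(ch for ch in mobile if ch.isdigit())
        if digits:
            keys.append("mobile:" + digits)
    return keys


class CustomerDataPlatform:
    """Toy in-memory stand-in for the CDP lookup (hypothetical)."""

    def __init__(self) -> None:
        self._index: Dict[str, Profile] = {}
        self._next_id = 1

    def add_profile(self, email=None, mobile=None, best_next_offer=None) -> Profile:
        profile = Profile(self._next_id, email, mobile, best_next_offer)
        self._next_id += 1
        for key in contact_keys(email, mobile):
            self._index[key] = profile
        return profile

    def handle_interaction(self, email=None, mobile=None) -> Tuple[Profile, bool]:
        """Return (profile, matched); create a prospect when nothing matches."""
        keys = contact_keys(email, mobile)
        if not keys:
            raise ValueError("an email or mobile number is required for matching")
        for key in keys:
            if key in self._index:
                return self._index[key], True
        return self.add_profile(email, mobile), False
```

In this sketch a second interaction with the same mobile number matches the prospect created by the first one, which is how duplicate records are avoided.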

Architecture methodologies
Regardless of the business model or touchpoints, it is possible to achieve greater customer participation in State A. The path to personalization, however, begins from the customer's present state. Table 2 shows the different communication efforts that can be performed in each state. Reaching the desired state does not mean that communication with the customer should stop; rather, communication should offer the best possible offer for each customer, based on data mining of that customer's specific preferences, in what is known as the Best Next Offer or Best Next Action (BNO/BNA; Fabrizi & Banoub, 2014).
States A and D enable personalized communication based on previous interactions where BNO and BNA can be achieved (Liermann & Stegmann, 2019; Valtonen, 2020). Best Next Offer and Best Next Action mean that different customers receive different content depending on their previous behavior. State B enables communication with minimal personalization derived from Identity data. State Y allows only the simplest form of communication, where no data but the contact information is known; therefore, almost all customers receive the same content.
It is crucial to have an omnichannel communication strategy that respects the number of touches for the customer at the company level. It is also fundamental to have a communication strategy that includes direct channels: email, SMS, push notification, www, app, and contact center (phone, WhatsApp, FB Messenger; Goersch, 2002). Moreover, proven strategies can migrate a customer from any state to a better one.
A customer's state determines whether the company can develop active or passive communication strategies. States that include Contactability (C) allow active communication strategies: State Y can be upgraded to D, B, and A, while D and B can be upgraded to A only. If the state does not include Contactability (C), the organization must wait until the next customer interaction to gather information. Strategies that upgrade a customer data profile are shown in Table 3 (Bolton et al., 2008; Wang & Hong, 2006) and can be described as follows:
(1) Loyalty Program: a mechanism that allows the customer to access redemptions or benefit levels under a system of points or visits. The loyalty program of choice must be built on the information about customers in States A and D to prevent losses (Rigby et al., 2003).
(2) Data Quality Program: the information about customers must be kept constantly updated; all contact points must seek confirmation of the information under business rules that do not create friction. The program monitors the data quality of each contact point and manages it in order to improve it.
(3) Receipt Registration: a digital system through which customers who pay in cash or have not previously used a digital channel are encouraged to register their receipt or folio and, in exchange, participate in an activity or sweepstake.
(4) Sweepstake or Digital Activation: the digitalization of sweepstakes, through which customers register their form of payment or digitalize their receipts to participate in exchange for a benefit.
(5) Digital Receipt: a process by which the receipt is sent electronically to the customer's email or cell phone, offering them the possibility to consult it in their digital profile.
(6) Purchase with Home Delivery: by the nature of this process, the customer is likely to provide the delivery address and contact data in order to be kept updated on the delivery status.
(7) Digital Record: digital channels can capture all the information (I, C, and T), but the mandatory fields for registering a new customer need to be defined.
(8) Subscription: a medium that keeps customers close by letting them receive commercial, informative, and promotional information.
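The upgrade rules stated before the list can be checked with a short Python sketch. It assumes the same state-to-data-type mapping inferred from the text (not taken from Figure 1) and treats an upgrade as a move to any state whose data types strictly contain the current ones; the names `STATE_DATA`, `supports_active_communication`, and `upgrade_targets` are hypothetical.

```python
# Assumed mapping of states to the data types they contain (inferred from the text).
STATE_DATA = {
    "A": {"I", "C", "T"},
    "B": {"I", "C"},
    "C": {"I", "T"},
    "D": {"C", "T"},
    "X": {"T"},
    "Y": {"C"},
    "Z": {"I"},
}


def supports_active_communication(state: str) -> bool:
    """Active strategies need Contactability (C); otherwise the company must wait."""
    return "C" in STATE_DATA[state]


def upgrade_targets(state: str) -> list:
    """States reachable by adding missing data types (strict supersets)."""
    current = STATE_DATA[state]
    return sorted(s for s, data in STATE_DATA.items() if data > current)
```

Under these assumptions, `upgrade_targets("Y")` yields A, B, and D, while `upgrade_targets("D")` and `upgrade_targets("B")` yield only A, which agrees with the rules described in the text.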

The methodology
In the case study, the authors analyzed migration data when hyper-personalization strategies were applied to a retail company with a client database of 15 million individuals classified into the seven states according to Figure 1. A snapshot of the migration data was obtained every three months for 15 months after the strategies were applied. All customer information is stored in the Master Data Management (MDM) system, which contained a recorded history of 5 years at the time of this study. Before a record is kept in the MDM, a quality process is applied explicitly to each field, but all fields are set as optional to maximize the value of data gathering at each touchpoint. This sets this work apart, since an MDM customarily has a set of mandatory fields. The database consists of 15 million customer records in the MDM, and the first iteration of this methodology placed those records in the seven states presented in Figure 1, as represented in Tables 4 and 8 (Month 0). No new customer records are added to the data set, and all migration strategies in Table 3 are put in place, so the data presented in Table 8 represent the state transitions due to data enrichment caused by the migration strategies, the current methodology, and real-time matching.

STAGE 1 completeness
This stage evaluates the completeness of a customer record and focuses on being able to identify a customer through a set of fields. All records within the MDM were assigned a completeness level based on Table 4. This stage assesses whether the MDM records contain all the fields required to determine a customer's identity. While the MDM of this specific institution accepts null values for any given field, in this process a mandatory or optional attribute is assigned to each field.

STAGE 2 matching
To determine the uniqueness of a customer record, the record has to go through a process that evaluates whether it contains enough identity elements. According to Table 5, records with Quality Levels 1, 2, and 3 are assigned an Entity Level. This process matches records with the desired quality within the MDM, identifies which records are unique and which will have to go through a merge/unification process, and flags all records for which it is impossible to determine the unique identity of a customer (namesakes). All Entity Level 3 results remain in this stage.

STAGE 3 merge
If two or more records correspond to a unique customer, these records must be merged and receive the same customer ID. Records with Entity Levels 1 and 2 go through a match and combine process to deliver a unique customer ID. Every customer record that is an output of this stage is guaranteed to have the Identity data type.
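The merge step can be illustrated with a small Python sketch. The actual matching rules depend on the Entity Levels of Table 5, which are not reproduced here, so the sketch substitutes a deliberately simple and hypothetical rule: records are treated as the same customer when a normalized name plus date of birth coincide. Field names such as `name`, `date_of_birth`, `email`, and `mobile` are assumptions, and the golden record simply keeps the first non-empty value seen for each field.

```python
from collections import defaultdict
from typing import Dict, List


def identity_key(record: Dict[str, str]) -> str:
    """Hypothetical match key: lower-cased, whitespace-normalized name + birth date."""
    name = " ".join(record["name"].lower().split())
    return f"{name}|{record['date_of_birth']}"


def merge_records(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Group matching records and build one merged record per unique customer."""
    groups = defaultdict(list)
    for record in records:
        groups[identity_key(record)].append(record)

    merged = []
    for customer_id, group in enumerate(groups.values(), start=1):
        golden: Dict[str, object] = {"customer_id": customer_id}
        for record in group:
            for field, value in record.items():
                # Keep the first non-empty value seen for each field.
                if value and not golden.get(field):
                    golden[field] = value
        merged.append(golden)
    return merged
```

A production process would also need to handle namesakes, which this simple key cannot distinguish and which the methodology flags separately in Stage 2.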

STAGE 4 evaluating contactability
Once identity has been determined through the previous stages, this stage focuses on establishing the completeness of the contactability information. All records assigned a unique customer ID are evaluated on their contact information according to Table 6. All contact levels except Level 5 guarantee that the record has the Contactability data type. All data from Stage 1 with Quality Level 4 that contain contact information fall under the anonymous but contactable category.

STAGE 5 evaluating traceability
This stage focuses on linking transactional information to customer IDs, enabling traditional transaction records to be assigned to a customer. Transactional sources include purchases, deliveries, contact center, wedding registry, e-commerce, app, credit card, curbside pickup, tailoring, etc. All records assigned a unique customer ID are then given a Traceability Level based on Table 7. All other transactions that do not reach a Traceability Level remain anonymous.

STAGE 6 mapping the identity, contactability and traceability framework
This stage assigns all records to States A, B, C, D, X, Y, and Z shown in Figure 1, making it possible to understand the current identity, contactability, and traceability maturity of the customer records.

STAGE 7 design and execute strategies
Once all records are within a state shown in Figure 1, strategies are designed and executed to help enhance customer information in the MDM. Eight actions were applied at the studied company following the hyper-personalization path, with each stage applied in turn, and migration data were analyzed to determine the changes in customer state shown in Table 8; the applied actions and their results on the customer state are presented in Table 9.
After a record has been placed in one of the seven states and subjected to data enhancement from the migration strategies, its transitions can be observed in Table 8 for 15 months. The participation of each state is recorded every three months. From month 0 to month 3, significant shifts occur in States Z, X, D, and A, almost eliminating the identity-only records in State Z. From months 3 to 6, 6 to 9, 9 to 12, and 12 to 15, the transition occurs from State X to State A.
Since all eight migration strategies are put in place simultaneously, it is essential to record those with a significant impact on the record transition. State A is the desired state, and the migration strategies that helped increase State A participation from 16.53% to 31.37% are detailed in Table 9.
The behavior of the customer migration from the different states to State A is presented in Figure 4. When migration strategies are implemented, customer contactability increases with an exponential-like behavior. An exponential saturation equation (Eq. 1) was proposed to model the increase in customers in State A and to learn the possible limit of customer migration.
In the proposed model, A represents the saturation value, that is, the maximum percentage of customers that can be attracted to State A, and τ is a time constant (in months) that regulates the customer migration to State A. When fitted to the data from the case study, Equation 1 takes the values expressed in Equation 2:

$$y = 32.46 - 19.3\,e^{-t/5.15} \qquad (2)$$

The fitted equation has an R-square value of 0.99, indicating substantial agreement between the data and the proposed model. The model tends to a saturation value of 32.46%; therefore, it can be said that no more customers can be attracted once this value has been reached while the same strategies continue to be implemented over time.
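The fitted model can be evaluated numerically with a short Python sketch. It reads Equation 2 as y(t) = 32.46 − 19.3·exp(−t/5.15), with t in months and y in percent, which corresponds to a general saturation form y(t) = A − B·exp(−t/τ); the symbol B for the amplitude and the function names below are assumptions made for illustration.

```python
import math

SATURATION = 32.46  # A: saturation value, in percent of customers
AMPLITUDE = 19.3    # B: amplitude of the exponential term (symbol assumed)
TAU_MONTHS = 5.15   # tau: time constant, in months


def state_a_share(t_months: float) -> float:
    """Modeled percentage of customers in State A after t_months."""
    return SATURATION - AMPLITUDE * math.exp(-t_months / TAU_MONTHS)


def months_to_reach(share: float) -> float:
    """Invert the model: months needed to reach a given State A percentage."""
    if not (SATURATION - AMPLITUDE) <= share < SATURATION:
        raise ValueError("share must lie between the model's start and its saturation")
    return -TAU_MONTHS * math.log((SATURATION - share) / AMPLITUDE)
```

At t = 15 months the model gives about 31.4%, close to the 31.37% reported for State A, and the remaining gap to the 32.46% saturation value shrinks by a factor of e every 5.15 months.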

Discussion
Personalization strategies depend on organizations' capabilities to deliver personalized actions to their customers through an omnichannel system; most studies go in depth into which capabilities must be present to implement a personalization strategy. The other key element for delivering a personalized approach is the set of known customers (I, C, T). This work addresses which methods increase that set and provides a framework to migrate customers' information to the desired State A.
The proposed methodology allows the migration of customer information from incomplete states to a fully identifiable customer; each methodology block helps in the migration strategy. MDMs can be used to complete information and avoid duplication through continuous updates and batch matching for deduplication (Haneem et al., 2017; Ng et al., 2017); MDM deduplication helps enhance the behavior of the customer database (Ng et al., 2017).
Touchpoints enable organizations to deliver personalized service; however, the customer must be identified before a customized service can be provided. This work implemented customer identification in physical touchpoints by requesting an email or mobile number (contactability information) and deliberately excluded other customer identification data such as the customer's name and last name. If such information matches a record in the CDP, a personalized action can occur while the State A record is updated; if not, a prospect is created that contains a means of customer contact and reaches State D or Y.
Real-time matching increases the identifiable customer database through the integration of physical and digital channels, enhancing customer identification. Finally, establishing a data migration framework enables organizations to develop migration strategies toward the desired state and is key to developing hyper-personalization strategies.
In the case study, the initial mapping showed that State X contained more than half of the records, while State A accounted for only 16.53%. As seen in Table 8, there is a significant upgrade from State X to State D and State A within three months.
After 15 months, States A and D together have a total participation of 57.77%; these states enable advanced communications based on customer knowledge, such as BNA and BNO. Over the same period, State X decreased from 54.29% to 36.82%. Although the proposed model in Equation 2 can accurately describe the behavior of the studied company, it is valid only over short time spans and changes according to the company's conditions; therefore, the proposed model can explain the customer evolution only for a limited time horizon.
While the model expects State A to saturate at 32.46%, new migration strategies could deliver a new curve. During the 15 months of data observation, the participation of the sales channels remained at the same levels; if the digital sales channels increase their participation, this could also deliver a new curve. This methodology can be used in industries other than retail and in various international markets, provided that the fields required for determining a customer ID are available. The detail of the migration strategies would have to reflect the nature of the industry, although many of the ones presented here have multi-industry purposes: loyalty program, data quality program, sweepstake or digital activation, digital record, and subscription.
For this methodology to work, an MDM must exist within the organization, and quality rules must be applied to the fields beforehand.

Marketing implications
States A and D enable the organization to communicate personally with its customers by activating one or more personalization initiatives. Enabling personalization requires customer data and marketing technologies such as campaign management, audience management, and customer analytics platforms.
When initiatives are measured by Return on Investment (ROI), product offerings and recommendations, in-person customer experience, online customer experience, and pricing strategies are already delivering a better ROI than social media; however, none of the ROIs that organizations currently obtain is greater than 40%. Therefore, the ROI of these initiatives is expected to grow in the following years, and initiatives better suited to leveraging personalized information are expected to deliver more significant ROIs, as shown in Table 10.
Demonstrating the effectiveness of personalized initiatives and their progression in time will make institutions shift resources and budgets to more effective and measurable marketing initiatives.

Conclusions
Personalization strategies help deliver personalized actions to customers and help increase the number of customers known by the organization. This cycle enables the organization to implement and strengthen its personalization strategies and to establish means of migrating customers from poor information states to complete information states.
All migration strategies are intended to complete and upgrade the original state of information that an organization already has. Currently, most organizations plan to migrate to State A, which contains the intersection of Identity, Contactability, and Traceability (I, C, T). However, certain conditions that are not attributable to the migration strategies prevent some records from upgrading (invalid data, identity data errors, and homonyms, among others). State X, which contains anonymous transactions, contributed the most records to the transition to the complete State A.
Real-time matching helps deliver personalized experiences and adds prospects to the customer database with an initial State D, which is why State D increases from 16.96% to 26.40%. It is important to note that the main migration path is from State X to State D to State A.
The mobile or email request in physical touchpoints increased States Y and D. One of the data issues faced when using a mobile number as a customer identifier is that several customers can sometimes share the same number across their accounts. In our experiment, 0.58% of customers shared a mobile number. A similar issue was found with email as a customer identifier: 8.7% of all customers had more than one account, each with a different email.
While conducting this study, all digitalization strategies were implemented simultaneously. This study fails to address the impact, peak, and limit of each digitalization strategy running by itself. This might be because the corporation implemented a digitalization program that contained all eight digitalization strategies for a unified implementation to reduce costs and resource allocation. Therefore, the implementation of each digitalization strategy should run independently in future studies.