1 Introduction

Technology is now part of almost every aspect of our lives. With the help of intelligent technology, tasks that were once constrained are now carried out with ease. The “internet of things” has unquestionably enabled people to act in remarkable ways. For example, Amazon, one of the largest e-commerce companies, created “Alexa,” a “cloud-based speech service” that offers its consumers an intuitive form of interaction based on “Automatic Speech Recognition.” This method of turning spoken words into text has become so seamless that even children can use it.

‘Artificial intelligence’ (AI) can be understood as the way machines, specifically computer systems, simulate human intelligence. AI is broadly applied in ‘natural language processing’, ‘machine vision’, ‘speech recognition’ and ‘expert systems’. In a research article, Stephen Dick explains that AI is not limited to attempts to replicate human intelligence by mechanical means; it also reflects how our thinking about the term ‘intelligence’ itself has changed [13]. In the book “Artificial Intelligence: A Modern Approach” (Pearson, 1995), Stuart Russell and Peter Norvig define AI as “the designing and building of intelligent agents that receive percepts from the environment and take actions that affect that environment” [30]. The phrase “take actions” marks the key distinction between general-purpose software and artificial intelligence. AI enables machines to react autonomously to signals from the outside world, signals that programmers do not directly control and cannot foresee [29].

The COVID-19 period brought about changes that might not otherwise have happened; working from home, online school classes, and online exams are a few examples. The lockdown also made e-entertainment even more necessary. For the existing Over-the-Top (OTT) IPTV (Internet Protocol Television) platforms, the years 2019 through 2021 were a significant boom time in the entertainment business. While movie theatres were closed, content from various OTT platforms occupied the free time of audiences across all age groups. The OTT platforms provided access to movies, original material, and television content.

Over-the-top (OTT) media services use the internet instead of relying on cable or satellite for the transmission and distribution of content. This marked the emergence of streaming content. According to the Verizon website, streaming is defined as "any media content, live or recorded, supplied via the internet and played back in real-time to computers and mobile devices. Typical examples of streaming content include podcasts, webcasts, motion pictures, television shows, and music videos" [38]. Because OTT IPTV services are more accessible and offer more content, they have radically changed all other facets of the media landscape [15]. There has been a sharp rise in OTT IPTV service subscriptions, which has spurred a “content arms race” [25].

Chang and Chen (2008) and Delafrooz et al. (2011) highlight customer engagement (CE) and quality of service experience (QoSE) as two additional factors in their studies. Other factors include customer attitude, usefulness, quality, ease of use, perceived risk, and security, to name a few [9, 10].

1.1 Netflix: the leader

Netflix is unquestionably in the lead among OTT IPTV service providers despite its hefty subscription costs [27]. Its distinctive content is central to that position. The “video-on-demand” streaming giant is diligently researching methods to improve the viewing experience and keep up with users' escalating expectations [8]. Artificial intelligence tools and algorithms may be necessary to improve the “audience relevance” of the material [1]. After beginning in 1998, Netflix quickly moved into the digital video industry. Along with a solid recommendation system, Netflix also had effective marketing tactics. To encourage people to improve upon its recommendation algorithm, Netflix offered a “$1 million cash prize” in 2006 [24] (Table 1, Fig. 1).

Table 1 Parameters of the survey
Fig. 1 The pie chart shows the most preferred OTT platform among the survey respondents

A survey of 307 respondents in Delhi who use OTT platforms was conducted to determine whether Netflix is, in fact, the market leader. Of these, 63.8% selected Netflix as their top option (Fig. 2).

Fig. 2 Netflix uses a variety of current technologies to deliver its streaming service

1.2 Netflix and technology

The technologies shown in Fig. 2 are just a few of those used by Netflix, and the company continues to adopt new technologies as they become available. Apart from these, Netflix uses artificial intelligence in various ways to enhance its streaming service. Some examples include:

  • Recommendation systems: Netflix uses AI algorithms to suggest TV shows and movies to users based on their viewing history and preferences. Several algorithms make up the Netflix Recommendation Engine (NRE), which filters content according to a user’s profile. The engine uses 1300 recommendation clusters to sift through more than 3000 titles based on the tastes of each individual user. Personalised recommendations account for 80% of Netflix viewing activity, so the approach is effective and pays off. By preventing subscription cancellations, it also helps Netflix save billions of dollars annually [20].

  • Content creation: AI is used to analyse data and make decisions about which original shows and movies to produce, and which actors and directors to hire.

  • Quality control: AI algorithms are used to detect and remove unwanted content, such as explicit images or copyrighted material.

  • Personalization: AI is used to create custom and personalized user interfaces, as well as to generate trailers and promotional content.

  • Dynamic optimization: AI algorithms dynamically optimise video quality based on a user's network speed, device, and other parameters to provide a smooth and uninterrupted streaming experience.

These are just a few examples of how Netflix uses AI, and the company is constantly exploring new ways to leverage the technology to improve its service.

The main development of the technology era is media streaming, which enables consumers to access high-quality “on-demand” and “live media” material anywhere and at any time. Because of rising network utilisation and consumer demand for high-quality media content, achieving the optimum quality of service (QoS) and quality of experience (QoE) is currently a major challenge. Conventional streaming methods struggle to provide customers with high-quality media content [23].

As the number of web-based content providers increases, platforms are looking for ways to surface relevant content to users through “automated recommendation”, so that users can choose from the “overwhelming set of choices” [17]. The most widely used algorithm in this field is collaborative filtering (CF), which evaluates user history and makes product recommendations to a target user based on a correlation score. The term ‘collaborative filtering’ was coined by the developers of “Tapestry”, pioneers in recommendation systems, who explained how different algorithms can be built on the similarity among users who share a certain set of behaviours. For instance, if users X and Y liked the same programme in the past, it is assumed that they will behave similarly in the future [17]. However, this method frequently encounters issues with data sparsity. To address this issue, Behera and Nain propose a deep nonlinear non-negative matrix factorization (DNNMF) method. They first impose non-negative constraints in the embedding layer to create non-negative vectors. Deep neural networks (DNN) are then used to extract the nonlinear interactions between users and products. The Adam optimizer simultaneously updates the latent features and parameters [5]. The proposed model is reported to address the CF sparsity issue.
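The correlation-based idea described above can be illustrated with a minimal sketch of user-based collaborative filtering. The ratings, user names, and function names below are hypothetical and chosen only for illustration; this is not the algorithm of any particular platform.

```python
from math import sqrt

# Hypothetical ratings (user -> {title: rating on a 1-5 scale}); illustrative only.
ratings = {
    "X": {"A": 5, "B": 4, "C": 1},
    "Y": {"A": 4, "B": 5, "C": 2, "D": 5},
    "Z": {"A": 1, "B": 2, "C": 5, "D": 1},
}

def pearson(u, v):
    """Pearson correlation between two users over the titles both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    ru = [ratings[u][t] for t in common]
    rv = [ratings[v][t] for t in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    norm = sqrt(sum((a - mu) ** 2 for a in ru) * sum((b - mv) ** 2 for b in rv))
    return cov / norm if norm else 0.0

def predict(user, title):
    """Predict a rating as the user's mean plus correlation-weighted deviations of neighbours."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other in ratings:
        if other == user or title not in ratings[other]:
            continue
        w = pearson(user, other)
        mean_o = sum(ratings[other].values()) / len(ratings[other])
        num += w * (ratings[other][title] - mean_o)
        den += abs(w)
    return mean_u + num / den if den else mean_u

# User X has not rated title D; the estimate leans on the like-minded user Y
# and on the opposite tastes of user Z.
print(round(predict("X", "D"), 2))
```

In this toy data set, X correlates positively with Y and negatively with Z, so both neighbours push the predicted rating for D upwards. With only a few shared ratings per pair the correlations become unreliable, which is the sparsity problem noted above.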

Breese et al. established a popular division of CF algorithms into memory-based and model-based techniques. Memory-based approaches attempt to relate users to one another through a range of distance (similarity) measures. Model-based methods, on the other hand, attempt to build a compact model from the training data, for instance by learning the parameters of a parametric posterior distribution [7, 19].
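As a rough illustration of the model-based family, the sketch below learns a compact non-negative latent-factor model from observed ratings using projected stochastic gradient descent. It is a deliberately simplified stand-in and not the DNNMF method of Behera and Nain: there is no neural network and no Adam optimizer, and the data, hyperparameters, and function names are assumptions made for this example.

```python
import random
from math import sqrt

# Hypothetical observed ratings as (user index, item index, rating);
# unobserved pairs are simply absent, which is what makes the matrix sparse.
OBSERVED = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 5),
    (2, 0, 1), (2, 2, 5), (2, 3, 1),
]

def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(P, Q, observed):
    """Root-mean-square error over the observed ratings only."""
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed) / len(observed))

def factorise(observed, n_users, n_items, k=2, steps=500, lr=0.01, reg=0.02, seed=0):
    """Learn non-negative factors P (users) and Q (items) by projected SGD."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step, then clip at zero to keep the factors non-negative.
                P[u][f] = max(0.0, pu + lr * (err * qi - reg * pu))
                Q[i][f] = max(0.0, qi + lr * (err * pu - reg * qi))
    return P, Q

P0, Q0 = factorise(OBSERVED, 3, 4, steps=0)   # random initial factors
P, Q = factorise(OBSERVED, 3, 4)              # trained factors
print(rmse(P0, Q0, OBSERVED), rmse(P, Q, OBSERVED))
```

Once trained, the model can score any user and item pair, including pairs with no recorded rating, because the prediction depends only on the learned factor vectors.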

“Collaborative filtering” and “content-based recommendation” are the two primary flavours of recommendation system. Collaborative filtering is the method that recommender systems use most frequently. The collaborative filtering algorithm makes predictions using information from neighbouring or otherwise similar users [3, 22, 31, 39]. This technique provides a helpful list of movies, filtered according to how closely a user's ratings resemble those of other users. The most important phase in collaborative filtering is identifying users whose preferences are comparable to those of the target user. However, it has drawbacks including sparsity, cold start, and scalability [4, 34]. According to Kanmani et al. in their paper “Recency augmented hybrid collaborative movie recommendation system”, “Amazon and Netflix are the pioneers of the movie recommender systems” [21].

Netflix has fully transitioned to the AWS cloud, which hosts the technologies it uses to manage big data. Amazon Prime Video, the closest competitor of Netflix, stores data using Amazon Simple Storage Service (Amazon S3) for improved scalability, availability, and performance. As a result, machine learning and AI not only play a crucial part in determining how Netflix and Amazon Prime Video operate, but also create new potential for both companies to expand [2].

1.3 Importance of thumbnail images for OTT IPTV platforms

OTT IPTV content providers are racing to reach more people and thereby increase their subscriptions year on year. Even though the work is finely divided among thousands of employees, companies are leaving no stone unturned to attract extra viewers and earn additional revenue. Good companies listen to what their clients have to say. However, there are significant differences between what customers say they want and what they actually do [18].

The dynamic optimisation enabled by artificial intelligence (AI) is enhancing the user experience and streamlining operations at Netflix [1214]. Conventional methods have been replaced by an AI-powered recommendation system that gives a customised experience based on users’ viewing history and preferences [28]. Aesthetic visual analysis (AVA), an image selection tool patented by Netflix for individual user thumbnails, is creating a niche for itself in the over-the-top market [16].

A streaming provider must persuade a viewer that a film is worth watching. Choosing a relevant thumbnail to represent each title is one way to do so. Finding a visual that is approachable and emphasises the key elements of a title is challenging, though. For this purpose, contextual bandits (online learning algorithms) take into account how users have interacted with a particular title, their country of origin, preferred languages, the type of device they are using, the time of day, and the day of the week [37].

First, these attributes are supplied to the model. Second, the artificial intelligence system predicts, for each candidate image, the probability that the title will be played in a particular member context. Third, it uses these probabilities to rank the candidate set of images. Finally, the viewer is shown the image with the highest predicted probability of leading to a play.
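The four steps above can be sketched as a very small contextual bandit. The version below is an assumption-laden simplification: it uses an epsilon-greedy policy and a separate smoothed play-rate estimate for each (context, image) pair, whereas the production model is not described in the source. The class name, image names, and context tuple are hypothetical.

```python
import random
from collections import defaultdict

class ThumbnailBandit:
    """Epsilon-greedy contextual bandit with one play-rate estimate per (context, image)."""

    def __init__(self, images, epsilon=0.1, seed=0):
        self.images = list(images)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shown = defaultdict(int)    # impressions per (context, image)
        self.played = defaultdict(int)   # plays per (context, image)

    def play_probability(self, context, image):
        # Laplace-smoothed estimate, so unseen pairs start at 0.5.
        key = (context, image)
        return (self.played[key] + 1) / (self.shown[key] + 2)

    def rank(self, context):
        # Step 3: order the candidate images by predicted play probability.
        return sorted(self.images,
                      key=lambda img: self.play_probability(context, img),
                      reverse=True)

    def choose(self, context):
        # Step 4: usually show the top-ranked image, occasionally explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.images)
        return self.rank(context)[0]

    def update(self, context, image, was_played):
        # Feedback: record the impression and whether it led to a play.
        key = (context, image)
        self.shown[key] += 1
        self.played[key] += int(was_played)

# Hypothetical context: (country, device, time of day).
bandit = ThumbnailBandit(["romance_still", "action_still"], epsilon=0.0)
ctx = ("IN", "mobile", "evening")
for _ in range(3):
    bandit.update(ctx, "action_still", True)
    bandit.update(ctx, "romance_still", False)
print(bandit.choose(ctx))  # prints "action_still"
```

A tabular estimate like this cannot generalise across contexts; a learned model over the context features would be needed for that, at the cost of more data and complexity.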

1.4 The patented technology of Netflix: AVA for auto-generation and personalisation

‘Aesthetic visual analysis’ (AVA) is one of the features that Netflix uses to select thumbnail images. Unlike traditional manual selection, AVA aesthetically analyses all the frames in the video and selects a frame that both clearly represents the show and is visually appealing, so that it sparks the viewer's interest in watching it [26] (Fig. 3).

Fig. 3 Demonstration of how the same programme may have different thumbnails

AVA came into existence as a ‘research project’ through a collaboration between Xerox Corporation and the University of Barcelona, Spain, to address the problem of image classification and processing on the internet. AVA classifies and rates images using an array of statistical measurements such as mean, variance, and standard deviation [11]. Based on the distributions generated from these statistics, AVA evaluates semantic difficulty and selects the appropriate images for the database. The main benefits of AVA are that it supports training on more images and addresses the need for extended benchmarking. The analysis and evaluation of an image's aesthetic qualities is known as aesthetic image analysis. Three main types of annotations are used in AVA:

“Aesthetic annotations”: Each image has a distribution of scores, in which each score corresponds to a single vote. These score distributions are a rich source of aesthetic evaluations made by hundreds of skilled amateur and professional photographers. The annotations have a high intrinsic worth because they reflect how both amateurs and professionals perceive visual aesthetics.

“Semantic annotations”: Images carry semantic tags, so data sets can be built by sampling images from AVA at random and training content-based classifiers, which perform better than a generic classifier [21].

“Photographic style annotations”: A photographic style is a consistent way of taking photographs by manipulating camera settings such as ISO level, shutter speed, etc. Photographic styles include Complementary Colours, Duotones, High Dynamic Range, Image Grain, Light on White, Long Exposure, Macro, Motion Blur, Negative Image, Rule of Thirds, Shallow Depth of Field, Silhouettes, Soft Focus, Vanishing Point and more [21].
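To make the statistical measurements behind the aesthetic annotations concrete, the following minimal sketch computes the mean, variance, and standard deviation of a per-image score distribution. The 1 to 10 score scale, the vote counts, and the function name are assumptions for illustration; the source does not specify them.

```python
from math import sqrt

def score_statistics(votes):
    """Return (mean, variance, standard deviation) of a score histogram.

    votes[i] is the number of votes for score i + 1, so a ten-element list
    covers an assumed 1-10 scale.
    """
    total = sum(votes)
    mean = sum((i + 1) * n for i, n in enumerate(votes)) / total
    variance = sum(n * ((i + 1) - mean) ** 2 for i, n in enumerate(votes)) / total
    return mean, variance, sqrt(variance)

# Hypothetical images: one that voters agree on, one that splits opinion.
consensus = [0, 0, 0, 0, 10, 80, 10, 0, 0, 0]
divisive = [50, 0, 0, 0, 0, 0, 0, 0, 0, 50]

for name, votes in [("consensus", consensus), ("divisive", divisive)]:
    mean, variance, std = score_statistics(votes)
    print(name, round(mean, 2), round(variance, 2), round(std, 2))
```

Two images can share a similar mean score while differing sharply in variance, which is why the full distribution carries more information than the average alone.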

2 Methodology

This paper examines Netflix's use of AI algorithms to give users the most personalised viewing experience possible. In addition, a random survey of 307 respondents who use OTT platforms was conducted to understand which OTT service provider is the most popular. Data were collected via an online questionnaire.

To enrich this study and address two questions, namely “how this dimension of intelligence is evolving new branches in the entertainment sector” and whether “the exclusive AI technology used by Netflix, i.e. AVA (Aesthetic Visual Analysis) can work more like humans”, a content analysis methodology was adopted. A range of studies on the image selection tool AVA (Aesthetic Visual Analysis) and related literature were examined to broaden the scope of the study and build a better understanding of the subject.

3 Findings

Netflix has around 220.67 million paid subscribers worldwide [32, 33]. Its success is primarily due to the fact that it began utilising machine learning before its rivals did. One of the key ways in which artificial intelligence saves Netflix time is by using AVA technology to shorten the lengthy process of title image selection. To preserve the depth of the story in the title image, the procedure begins with “frame annotation”, which analyses each individual frame of the video. Netflix processes videos more quickly by using the “Archer” framework (Fig. 4).

Fig. 4 The process of editorial image selection from the source video (Image credit: Netflix Technology Blog [6])

To facilitate concurrent video processing, Archer divides the footage into very small chunks. After the frames are obtained, several image recognition algorithms are applied to them in order to create ‘meta-data’. The meta-data fall into three categories: visual, contextual, and composition (Fig. 5).

Fig. 5 Meta-data categories
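The chunk-and-process idea attributed to Archer above can be sketched with the Python standard library. This is not Archer's API: the function names, chunk size, and the placeholder annotation rule are all hypothetical, and a real pipeline would run image recognition models on decoded frames.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(n_frames, chunk_size):
    """Divide frame indices 0..n_frames-1 into consecutive chunks."""
    return [range(start, min(start + chunk_size, n_frames))
            for start in range(0, n_frames, chunk_size)]

def annotate_chunk(frame_indices):
    """Placeholder frame annotation: flags every 24th frame as a candidate still."""
    return [{"frame": i, "candidate": i % 24 == 0} for i in frame_indices]

def annotate_video(n_frames, chunk_size=48, workers=4):
    """Annotate all chunks concurrently and merge the meta-data in frame order."""
    chunks = split_into_chunks(n_frames, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the input chunks.
        results = list(pool.map(annotate_chunk, chunks))
    return [meta for chunk in results for meta in chunk]

metadata = annotate_video(100)
print(len(metadata), sum(m["candidate"] for m in metadata))  # prints "100 5"
```

Because each chunk is annotated independently, the work scales with the number of workers; for CPU-bound image analysis a process pool or a distributed scheduler would be the more realistic choice than threads.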

Merchandise stills (static video frames) are used to increase a title's visibility on the Netflix streaming service; they are taken straight from the original video content [15]. Three crucial factors (the main actors, the visual field, and the sensitivity filters) are taken into consideration when selecting the “best” image:

  • AVA distinguishes primary characters from supporting characters or extras using a combination of actor recognition and face clustering, in order to determine the main character of a specific episode [36].

  • The contextual meta-data is used to create an image variety index that allows all the relevant photos gathered for a movie or episode to be assessed according to their aesthetic appeal.

  • Image vectors are used to give low importance to, and filter out, frames containing sensitive elements such as violence, nudity, and marketing material. In this manner, such frames are eliminated from the process.
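The three factors listed above can be combined into a simple selection routine, sketched below. The `Frame` fields, the scores, and the ranking rule (main-character frames first, then aesthetic score) are assumptions made for illustration and are not taken from Netflix's implementation.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    index: int
    main_character: bool     # output of actor recognition / face clustering (assumed)
    aesthetic_score: float   # higher is more visually appealing (assumed 0-1 scale)
    sensitive: bool          # flagged by a sensitivity filter (assumed)

def select_best(frames, top_k=1):
    """Drop sensitive frames, then rank main-character frames first by aesthetic score."""
    eligible = [f for f in frames if not f.sensitive]
    ranked = sorted(eligible,
                    key=lambda f: (f.main_character, f.aesthetic_score),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical candidate frames for one episode.
candidates = [
    Frame(0, True, 0.90, True),    # appealing but flagged as sensitive
    Frame(1, False, 0.95, False),  # appealing but shows only extras
    Frame(2, True, 0.70, False),
    Frame(3, True, 0.80, False),
]
print([f.index for f in select_best(candidates, top_k=2)])  # prints "[3, 2]"
```

Treating the sensitivity flag as a hard filter and the main-character flag as a priority key is one possible design; a weighted score over all factors would be an alternative.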

According to Tsao et al., when representativeness is used as a selection criterion for thumbnail images, the relevance of a frame to the video from which it was derived is measured. A representative frame should clearly convey the essence of the video [35]. For a romantic movie, a frame showing the interaction of a couple would be a good option. Similarly, a frame depicting what occurs on the battlefield would be a good pick for a war movie [40].

“Creative and visual diversity” is a subjective field, since there are various ways to understand and describe diversity in imagery. Here, “image diversity” refers more specifically to the algorithm’s capacity to capture the “heuristic” variance that occurs naturally within a single film or episode. Camera shot types (long shot vs. medium shot), visual similarities (rule of thirds, brightness, contrast), colour (most prevalent colours), and saliency maps (to identify negative space and complexity) are a few of the visual heuristic variables that have been implemented in AVA to surface a diverse image set for a title.
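Two of the heuristics named above, brightness and contrast, are simple enough to sketch, together with one possible way of surfacing a diverse subset of frames. The greedy farthest-point selection, the tiny grayscale "frames", and the function names are assumptions for this example; the source does not describe how AVA combines its heuristics.

```python
from math import dist
from statistics import fmean, pstdev

def features(pixels):
    """Brightness (mean intensity) and contrast (standard deviation) of grayscale pixels."""
    return (fmean(pixels), pstdev(pixels))

def diverse_subset(frames, k):
    """Greedily pick k frames, each as far as possible (in feature space) from those chosen."""
    feats = {name: features(p) for name, p in frames.items()}
    names = list(frames)
    chosen = [names[0]]
    while len(chosen) < min(k, len(names)):
        def gap(name):
            # Distance to the nearest frame that is already selected.
            return min(dist(feats[name], feats[c]) for c in chosen)
        chosen.append(max((n for n in names if n not in chosen), key=gap))
    return chosen

# Hypothetical frames, each reduced to four grayscale pixel values (0-255).
frames = {
    "dark_flat": [10, 10, 10, 10],
    "dark_flat_2": [12, 12, 12, 12],
    "bright_flat": [240, 240, 240, 240],
    "high_contrast": [0, 255, 0, 255],
}
print(diverse_subset(frames, 3))  # prints "['dark_flat', 'bright_flat', 'high_contrast']"
```

The near-duplicate frame is left out because it adds little variety, which is the behaviour a diversity heuristic is meant to produce.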

4 Conclusion

On the question of whether AVA works like or better than humans, the answer is that these AI technologies perform the task with ease and, more importantly, on a par with or better than humans, because lengthy procedures are completed in a short time frame.

Netflix has used AI to power content recommendations for viewers, which has enabled the platform to deliver improved and personalised content to its users. “Collaborative filtering” and “content-based recommendation” systems are designed to gain insight into users' viewing patterns and recommend alternatives in similar categories. This is considered an excellent method for retaining visitors on a platform. AI backed by the available data is capable of recognising event-based trends and delivering content to users accordingly. The user data available to the platforms, combined with AVA, will help OTT platforms deliver content tied to the birthday or anniversary of a user's favourite actor, director, or singer. Based on the last content watched, it can even recommend upcoming events and news, such as new movie releases. This strategy will help platform owners increase their revenues along with their user base.

AVA-driven user experiences make it simpler for customers to browse and discover pertinent content. A strong AI-driven infrastructure helps to reduce the bounce rate. As device computing power rises, artificial intelligence algorithms have become more significant in the OTT business and have boosted content delivery across varied audiences. Although AI technology itself cannot actively cause harm, the data it generates can be misused. To secure user data, media firms and regulatory bodies must create a framework that ensures certain security requirements are satisfied. In the meantime, OTT companies are using data to apply AVA to enhance the user experience. Technologies like AVA are expected to be adopted by other popular OTT platforms in the future. The focus should be on adapting to data-driven personalised content and services and on delivering a more immersive and engaging experience for users.

5 Future scope of research

Since only Netflix is using a patented artificial intelligence technology, it is important to understand what its close competitors are doing to match or overtake Netflix. Netflix regularly pursues new technologies and, with its use of artificial intelligence and machine learning, is set to introduce a new sound system that is expected to produce studio-quality sound that is clearer, more intense, and more fully immersive. This is another key innovation introduced by Netflix. AVA may efficiently cluster image frames based on a unique vector for variety by integrating the heuristic variables described above; in addition, a diversity index can be created by combining several vectors and used to evaluate the entire “candidate imagery” of a particular episode or movie. Researchers can further explore, through primary research in the above-mentioned areas, how these OTT IPTV platforms are innovating in their use of AI and ML.