Methods of data analysis in the problem of optimizing the rental schedule

In this paper, the main problems of scheduling screenings in a cinema are considered. The operation of one cinema, "X", is described, and the constraints that must be observed when planning the schedule of sessions are analyzed. The problem is relevant because planning a cinema schedule is complex and time-consuming. To address it, the article analyzes statistical data obtained from cinema "X". Using methods of statistical data analysis, predictive factors of movie attendance were identified and regression models of attendance were developed. These models yielded statistical estimates of the attendance predictors. Based on these results, an optimization model for forming the rental schedule has been developed that makes it possible to increase the cinema's box office.


Introduction
There are currently a large number of cinemas across the country, each of which shows many films every day. In practice, film scheduling is careful work that relies more on intuition than on technical analysis. This article addresses detailed schedule planning with the aim of optimizing how a cinema forms its rental schedule.
For each rental week, management must decide which films will be shown in which halls and at what times. Typically, a cinema runs between five and eight screenings per day in each hall, where a "screening" is a single showing of one film, including trailers and commercials.
To fix which films will be shown in a rental week, and how many, the cinema enters into agreements with distributors that specify the minimum desired number of screenings per day for each film [1].
Once all contracts are approved and the number of films and screenings is fixed, the distributors send the keys that allow the films to be shown, and scheduling begins.
Developing a weekly schedule requires forecasts of $A_{mt}$, the attendance of a showing of movie $m$ starting at time $t$, for all movies available for showing at every possible time (in hourly intervals) on each day of that week. For simplicity and efficiency, we build these forecasts with linear regression models [2].
It is well known that the longer a film has been in theaters, the lower the demand for it. For most films, this decline can be estimated from theater attendance reports. Other factors, such as holidays, also affect attendance, and we include variables to account for them [3,4].
In addition to the above factors, there are other points to consider. In particular, we introduce variables for the day of the week and the time of day (in hourly intervals) at which movies are shown. If data are available, weather variables can also be included: rainy weather, for example, might either draw people out of their homes and into the cinema or, conversely, keep everyone at home so that attendance falls below normal. School holidays can also be taken into account if accurate data on them are available [5].

Model and method
In the first step, we use past attendance data to estimate a demand model that separates the various time-varying effects. Formally, we model the attendance $A_{mt}$ of a screening of movie $m$ starting at time $t$ as

$$\ln A_{mt} = \alpha_m + \beta_m w_{mt} + \sum_{h} \gamma_h H_{ht} + \sum_{d} \delta_d D_{dt} + \sum_{k=1}^{5} \lambda_k S_{kt} + \sum_{j} \mu_j P_{jt} + \varepsilon_{mt}, \qquad (1)$$

where $w_{mt}$ is the number of weeks at time $t$ since movie $m$ was first shown in the cinema; $H_{ht}$ is an indicator that the start time $t$ falls within hour $h$; $D_{dt}$ is an indicator that the start time $t$ falls on day of the week $d$; $S_{kt}$ and $P_{jt}$ are indicators that $t$ falls within school-holiday period $k$ or on public holiday $j$; $\varepsilon_{mt}$ is a random error; and $\alpha_m$, $\beta_m$ are two movie-specific parameters reflecting the weekly appeal of an individual movie. In particular, we assume that each movie's attractiveness follows an exponential decay pattern, with $\alpha_m$ characterizing the magnitude of the movie's attractiveness at launch and $\beta_m$ capturing its weekly decline. Another type of time-related variation in attendance comes from moviegoers' preferences about when to watch a film. The day-of-week effect is captured by $\delta_d$ and the time-of-day effect by $\gamma_h$ [6]. The effect of "free time" during the year is captured by two sets of parameters, $\lambda_k$ and $\mu_j$. Since moviegoers tend to have more free time for leisure activities such as going to the cinema during public holidays and school holidays, we expect all of these holiday parameters to be positive. First, five school-holiday periods are identified; public holidays are then added to account for other potential holiday effects. The demand model assumes that the attendance of movie $m$ starting at time $t$ does not depend on which other movies are shown at the same time [7].
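A model of the form (1) is linear in its parameters once attendance is logged, so it can be fitted by ordinary least squares on a design matrix of film dummies, film dummies multiplied by weeks since release, hour and day dummies, and holiday indicators. The sketch below assumes four hour bands, Monday and the first hour band as reference categories, a single school-holiday indicator, and made-up "true" parameters used only to simulate data; none of these values come from cinema "X". The helper names `design_row` and `fit_log_linear` are hypothetical.

```python
import numpy as np

N_HOURS, N_DAYS = 4, 7  # hypothetical: 4 hour bands, 7 weekdays

def design_row(film, n_films, week, hour, day, school):
    """One row of the design matrix for a log-linear demand model like (1).

    Columns: film dummies (alpha_m), film dummy * weeks since release (beta_m),
    hour-band dummies (band 0 is the reference), day dummies (day 0 = Monday is
    the reference), and a school-holiday indicator.
    """
    film_dummy = np.zeros(n_films)
    film_dummy[film] = 1.0
    hour_dummy = np.zeros(N_HOURS - 1)
    if hour > 0:
        hour_dummy[hour - 1] = 1.0
    day_dummy = np.zeros(N_DAYS - 1)
    if day > 0:
        day_dummy[day - 1] = 1.0
    return np.concatenate([film_dummy, film_dummy * week, hour_dummy, day_dummy, [school]])

def fit_log_linear(X, attendance):
    """Ordinary least squares on log attendance; returns coefficients and residual variance."""
    y = np.log(attendance)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = float(resid @ resid) / (len(y) - X.shape[1])
    return coef, sigma2

# Made-up "true" parameters for two films, used only to simulate data.
true = np.concatenate([
    [3.0, 2.5],                       # alpha_m: log appeal at launch
    [-0.4, -0.2],                     # beta_m: weekly change in log appeal
    [0.3, 0.6, 0.2],                  # hour bands 1..3 relative to band 0
    [0.0, 0.0, 0.0, 0.1, 0.4, 0.3],   # Tue..Sun relative to Monday
    [0.25],                           # school-holiday effect
])
X = np.array([
    design_row(m, 2, week, h, d, school=1.0 if week == 2 else 0.0)
    for m in range(2) for week in range(3) for d in range(N_DAYS) for h in range(N_HOURS)
])
rng = np.random.default_rng(42)
attendance = np.exp(X @ true + rng.normal(0.0, 0.1, size=len(X)))

coef, sigma2 = fit_log_linear(X, attendance)
print("alpha:", coef[:2].round(2), "beta:", coef[2:4].round(2), "sigma^2:", round(sigma2, 4))
```

The film-specific dummies play the role of the intercept, which is why one hour band and one weekday are dropped; keeping all of them would make the design matrix rank-deficient.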
In determining the attractiveness of individual films, the two vectors of film parameters, $\alpha$ and $\beta$, are critical to our demand forecasts. We split the prediction procedure into two cases: movies with two or more weeks of attendance data, and newly released movies with no data (because they have not been screened yet) or with one week of data if the movie has already opened. When there are two or more weeks of data for a movie title (case 1), we have enough information to estimate both $\alpha_m$ and $\beta_m$ using equation (1). To make predictions for such a movie, we use the estimates $\hat{\alpha}_m$ and $\hat{\beta}_m$ obtained in the first step from the most recent data [8]. For films with limited or no attendance data (case 2), however, the first step yields no estimate of $\alpha_m$ (and/or $\beta_m$). To solve this problem, we first build a regression model that relates $\hat{\alpha}_m$ and $\hat{\beta}_m$ to movie attributes that can explain them. We then combine the estimates from this regression with the attribute values of the new movie to estimate its $\alpha_m$ and $\beta_m$. For example, we regress $\hat{\alpha}_m$, the first-step estimates of $\alpha_m$, on various movie attributes using the model

$$\hat{\alpha}_m = \sum_{c} \kappa_c C_{cm} + \sum_{r} \nu_r R_{rm} + \sum_{g} \eta_g G_{gm} + u_m, \qquad (2)$$

where $C_{cm}$ are indicator variables for the country of production of film $m$ (Russia, the USA, or other countries); $R_{rm}$ are indicator variables for the rating of film $m$ (from 1 to 4, from 4 to 7, or from 7 to 10); $G_{gm}$ are indicator variables for the genre of film $m$ (drama, action, comedy, cartoon, horror, science fiction, fantasy, thriller, war, melodrama); and $u_m$ is a random error.
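The second step can be sketched as another least-squares fit, this time across films, followed by a prediction for a film that has no attendance data. Because the country indicators and the rating indicators each sum to one for every film, this sketch assumes one common fix: keep an intercept and drop one reference category per group ("russia" and "1-4"). Genres are treated as multi-label, one column each. The training films, the coefficients and the stand-in values for $\hat{\alpha}_m$ are all synthetic; `attribute_row` is a hypothetical helper. The same code applies to $\hat{\beta}_m$.

```python
import itertools
import numpy as np

COUNTRIES = ["russia", "usa", "other"]  # "russia" is the reference category
RATINGS = ["1-4", "4-7", "7-10"]        # "1-4" is the reference category
GENRES = ["drama", "action", "comedy", "cartoon", "horror",
          "science_fiction", "fantasy", "thriller", "war", "melodrama"]

def attribute_row(country, rating, genres):
    """Intercept + country dummies + rating-band dummies + one 0/1 column per genre."""
    row = [1.0]
    row += [float(country == c) for c in COUNTRIES[1:]]
    row += [float(rating == r) for r in RATINGS[1:]]
    row += [float(g in genres) for g in GENRES]
    return row

# Hypothetical training set: every country/rating combination, each with a single genre
# and with a pair of adjacent genres (multi-genre films keep the genre block identifiable).
films = []
for c, r, i in itertools.product(COUNTRIES, RATINGS, range(len(GENRES))):
    films.append((c, r, {GENRES[i]}))
    films.append((c, r, {GENRES[i], GENRES[(i + 1) % len(GENRES)]}))
X = np.array([attribute_row(*f) for f in films])

rng = np.random.default_rng(1)
true_kappa = rng.normal(0.0, 0.5, size=X.shape[1])               # made-up coefficients
alpha_hat = X @ true_kappa + rng.normal(0.0, 0.05, size=len(X))  # stand-in for step-one estimates

kappa, *_ = np.linalg.lstsq(X, alpha_hat, rcond=None)

# Predict the launch appeal of a new release with no attendance data yet.
new_film = attribute_row("usa", "7-10", {"cartoon", "comedy"})
alpha_new = float(np.array(new_film) @ kappa)
print(round(alpha_new, 3))
```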
At this point, we must generate predictions for all films available for screening at every possible time in the new film program [9]. In particular, we take all parameter estimates from the first step, and where necessary the estimates from the second step, to predict the expected attendance of movie $m$ at a future start time $t$:

$$\hat{A}_{mt} = c \cdot \exp\Big(\hat{\alpha}_m + \hat{\beta}_m w_{mt} + \sum_{h} \hat{\gamma}_h H_{ht} + \sum_{d} \hat{\delta}_d D_{dt} + \sum_{k=1}^{5} \hat{\lambda}_k S_{kt} + \sum_{j} \hat{\mu}_j P_{jt}\Big). \qquad (3)$$

Since the parameter estimates are derived from the logarithmic equation (1), simply exponentiating the fitted values gives forecasts with a downward bias. The correction factor $c$ in equation (3) compensates for this bias [10].
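The downward bias arises because the exponential of an expected log value is smaller than the expected value itself. A common choice of correction factor, assumed in this sketch, is the lognormal correction $\exp(\hat{\sigma}^2/2)$, where $\hat{\sigma}^2$ is the residual variance of the log model; Duan's smearing estimator (the mean of the exponentiated residuals) is a distribution-free alternative. The simulation uses a hypothetical median occupancy of 40% with a residual standard deviation of 0.5 on the log scale, not data from cinema "X".

```python
import numpy as np

def forecast_attendance(linear_predictor, sigma2):
    """Back-transform exp(x'b) multiplied by the lognormal correction exp(sigma2 / 2)."""
    return np.exp(linear_predictor) * np.exp(sigma2 / 2.0)

def smearing_factor(residuals):
    """Duan's smearing estimator: the mean of exp(residuals)."""
    return float(np.mean(np.exp(residuals)))

rng = np.random.default_rng(0)
mu, sigma = np.log(40.0), 0.5            # hypothetical log occupancy and residual sd
sample = np.exp(rng.normal(mu, sigma, size=200_000))

print(round(float(sample.mean()), 1))                       # true mean, near 40 * exp(0.125)
print(round(float(np.exp(mu)), 1))                          # naive back-transform, biased low
print(round(float(forecast_attendance(mu, sigma**2)), 1))   # corrected forecast
print(round(smearing_factor(np.log(sample) - mu), 3))       # smearing factor, near exp(0.125)
```

With $\sigma = 0.5$ the naive forecast understates mean attendance by roughly 12%, which is the gap the correction factor closes.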

Research and results
Cinema "X" provided its attendance reports for 2020, on which the following calculations and data analysis are based.
We begin by assessing the significance of factors using contingency tables. To do this, we first divide the range of revenue values into categories: 1) low revenue, from 0 to 7,000 rubles; 2) small revenue, from 7,000 to 25,000 rubles; 3) medium revenue, from 25,000 to 50,000 rubles; 4) good revenue, from 50,000 to 150,000 rubles; 5) high revenue, above 150,000 rubles. Figure 1 shows the result of the contingency table analysis graphically. The histograms clearly show that low revenue is most common in the autumn period on every day of the week, whereas high revenue is observed only in the winter period [11].
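The categorization and cross-tabulation can be sketched in a few lines. The text does not say which side of a boundary a value such as 7,000 rubles falls on, so this sketch assumes the lower bound is inclusive. It also assumes Pearson's chi-square statistic as the significance measure, since that is the standard test for a contingency table; the paper does not name its test. The five records are made up for illustration.

```python
from bisect import bisect_right
from collections import Counter

BOUNDS = [7_000, 25_000, 50_000, 150_000]           # category boundaries, rubles
LABELS = ["low", "small", "medium", "good", "high"]

def revenue_category(revenue):
    """Map a revenue value to its category; a boundary value goes to the upper category."""
    return LABELS[bisect_right(BOUNDS, revenue)]

def contingency_table(records):
    """Count records by (factor level, revenue category)."""
    return Counter((level, revenue_category(rev)) for level, rev in records)

def pearson_chi2(table, rows, cols):
    """Pearson chi-square statistic; cells with zero expected count are skipped."""
    n = sum(table[(r, c)] for r in rows for c in cols)
    row_tot = {r: sum(table[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(table[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            if expected > 0:
                stat += (table[(r, c)] - expected) ** 2 / expected
    return stat

# Made-up (season, revenue) records, not data from cinema "X".
records = [("autumn", 3_500), ("autumn", 12_000), ("autumn", 0),
           ("winter", 180_000), ("winter", 60_000)]
table = contingency_table(records)
for season in ("autumn", "winter"):
    print(season, [table[(season, label)] for label in LABELS])
print(round(pearson_chi2(table, ["autumn", "winter"], LABELS), 3))
```

A toy table this small is far below what the chi-square approximation needs; on real data the statistic would be compared against a chi-square distribution with (rows − 1) × (non-empty columns − 1) degrees of freedom.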
The dependence of the revenue category on genre is shown in Figure 2, where 1 = comedy, 2 = cartoon, 3 = science fiction, 4 = fantasy, 5 = melodrama, 6 = drama, 7 = war film, 8 = action movie, 9 = horror, 10 = thriller. For a clearer picture, Figure 3 shows histograms of revenue categories by genre. Figure 4 shows histograms of revenue categories by week of release, and Figure 5 shows how revenue depends on two criteria at once: week of release and genre. The genre histograms show that low revenue can occur in any genre, while high revenue is typical of cartoons, fantasy, melodrama and comedy. The "small" revenue category is empty only for genre 10, while the "medium" category is empty for genres 9 and 10. The histograms by week of release show that the longer a film stays at the box office, the less revenue it brings; in the first week of release a film is most likely to reach the "high" category, and in general every revenue category occurs in the first week [12].

These results show that revenue depends on a number of factors. We now apply a clustering method to divide the entire list of films into several groups and check whether the resulting grouping is meaningful. If it is, a new film could be assigned to a cluster by its features as soon as it is released, which would show where in the schedule it should be placed. Figure 6 shows the division of films into 4 clusters [13]. The clustering was carried out using the characteristics of the films, namely genre, film rating, age limit, duration, film format and the producer's country code.
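The paper does not name its clustering algorithm, so the sketch below assumes k-means (Lloyd's algorithm) with a deterministic farthest-point initialization. It assumes film features have already been encoded numerically: 0/1 indicators for format and age limit, and standardized duration and rating. The twelve films are synthetic and form four well-separated groups.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start from the first film, then repeatedly add the film
    farthest (squared Euclidean distance) from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Lloyd iterations: assign films to the nearest center, then move centers to cluster means."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical encoded films: is_3d, is_18plus, duration (standardized), rating (standardized).
films = np.array([
    [1, 0, 0.0, 0.2], [1, 0, 0.1, 0.0], [1, 0, -0.1, 0.1],
    [0, 0, -1.0, 0.5], [0, 0, -1.1, 0.6], [0, 0, -0.9, 0.4],
    [0, 1, 2.0, 0.0], [0, 1, 2.2, 0.1], [0, 1, 1.9, -0.1],
    [0, 1, -0.5, -1.5], [0, 1, -0.4, -1.6], [0, 1, -0.6, -1.4],
], dtype=float)
labels, centers = kmeans(films, 4)
print(labels)
```

Standardizing numeric features matters here: with raw durations in minutes, the distance would be dominated by duration and the 0/1 indicators would barely influence the clusters.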
It can also be noted that within each cluster one main feature stands out: the first cluster consists of films in 3D format, the second contains no films with an 18+ age limit, the third contains the films with the longest duration, and the fourth is characterized by films for the older generation [14].
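One way to pick the "main feature" of a cluster, assumed here since the paper does not describe its procedure, is to compare each cluster's feature means with the overall means in units of the overall standard deviation and report the feature with the largest absolute deviation. The feature names, including `is_older_audience`, and the eight films with their cluster labels are hypothetical.

```python
import numpy as np

FEATURES = ["is_3d", "is_18plus", "duration_std", "is_older_audience"]

def main_feature_per_cluster(X, labels, names):
    """For each cluster, the feature whose cluster mean deviates most from the overall
    mean, measured in units of that feature's overall standard deviation."""
    overall_mean = X.mean(axis=0)
    overall_std = X.std(axis=0)
    overall_std[overall_std == 0] = 1.0   # avoid division by zero for constant features
    result = {}
    for j in np.unique(labels):
        z = (X[labels == j].mean(axis=0) - overall_mean) / overall_std
        k = int(np.abs(z).argmax())
        result[int(j)] = (names[k], "high" if z[k] > 0 else "low")
    return result

# Hypothetical films (rows) with cluster labels from an earlier clustering step.
X = np.array([
    [1, 1, 0.0, 0], [1, 0, 0.1, 0],     # cluster 0
    [0, 0, -0.5, 0], [0, 0, -0.4, 0],   # cluster 1
    [0, 1, 2.0, 0], [0, 1, 1.8, 0],     # cluster 2
    [0, 1, 0.0, 1], [0, 1, -0.2, 1],    # cluster 3
], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
print(main_feature_per_cluster(X, labels, FEATURES))
```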
At this stage, the main factors influencing revenue have been identified and all available films have been assigned to clusters. We now pass to the next stage of the analysis, regression analysis, where we determine whether the significance of the factors differs between clusters or whether the same factors matter regardless of the cluster [15].
All of the results above were obtained for the entire list of films, but more accurate scheduling requires an analysis of each film separately.
To apply regression, the data must first be converted into indicator variables. Table 1 shows a fragment of the initial data converted into indicator variables for estimating the demand model (1). We illustrate the calculations with the film "1917" over two weeks of release. Once the table is complete, we can estimate the attendance demand model. Since the model describes the logarithm of attendance, we first take the logarithm of the percentage attendance of each movie [16].
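Converting a session record into the indicator form of Table 1 can be sketched as follows. The start-time bands are hypothetical, modeled on the intervals discussed for "1917" and "Forward" rather than on strictly hourly intervals. Because a session with zero attendance has no logarithm, the sketch adds a small constant `eps` before taking logs; this is an assumption, and the paper does not say how empty sessions were handled. `hour_band` and `encode_session` are hypothetical helpers.

```python
import math

# Hypothetical start-time bands (hours), loosely following the intervals in the results.
HOUR_BANDS = [(9, 12), (12, 15), (15, 20), (20, 22), (22, 24)]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def hour_band(start_hour):
    """Index of the band containing the start hour."""
    for i, (lo, hi) in enumerate(HOUR_BANDS):
        if lo <= start_hour < hi:
            return i
    raise ValueError(f"start hour {start_hour} is outside the covered bands")

def encode_session(start_hour, day, weeks_since_release, occupancy_pct, eps=0.5):
    """Convert one session record into indicator variables plus log attendance."""
    band = hour_band(start_hour)
    row = {"week": weeks_since_release}
    row.update({f"H{i}": int(i == band) for i in range(len(HOUR_BANDS))})
    row.update({f"D_{d}": int(d == day) for d in DAYS})
    # log(0) is undefined, so a small constant eps is added before taking logs (an assumption).
    row["log_attendance"] = math.log(occupancy_pct + eps)
    return row

row = encode_session(start_hour=18, day="Sat", weeks_since_release=1, occupancy_pct=35.0)
print(row["H2"], row["D_Sat"], round(row["log_attendance"], 3))
```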
After taking logarithms, we use equation (1) to estimate the film's attendance at the beginning of its release and how it changes towards the end of the release. Tables 2, 3 and 4 present the results of the regression analysis, the analysis of variance and the estimated coefficients, respectively. These results show that attendance of "1917" was not at its highest at the beginning of the release; for this film, attendance increased towards the end of the release. The results also show that the most visited sessions were between 15:00 and 20:00, although sessions from 12:00 to 15:00 and from 20:00 to 22:00 were also in fairly good demand. By day of the week, the highest attendance for this film is observed on Saturday [17].
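Assuming model (1) has the log-linear form $\ln A_{mt} = \alpha_m + \beta_m w_{mt} + (\text{hour, day and holiday terms}) + \varepsilon_{mt}$, the film parameters have a direct multiplicative reading: holding the other terms fixed, one more week of release multiplies expected attendance by $e^{\beta_m}$. The two values of $\beta_m$ below are illustrative, not estimates for "1917".

```latex
\frac{\mathbb{E}[A_m \mid w+1]}{\mathbb{E}[A_m \mid w]}
  = \frac{e^{\alpha_m + \beta_m (w+1) + \cdots}\,\mathbb{E}[e^{\varepsilon}]}
         {e^{\alpha_m + \beta_m w + \cdots}\,\mathbb{E}[e^{\varepsilon}]}
  = e^{\beta_m},
\qquad
\beta_m = 0.1 \Rightarrow e^{0.1} \approx 1.105 \;(+10.5\%\ \text{per week}),
\qquad
\beta_m = -0.3 \Rightarrow e^{-0.3} \approx 0.741 \;(-25.9\%\ \text{per week}).
```

Under this form, attendance that rises towards the end of a release, as described for "1917", corresponds to $\beta_m > 0$, while the usual exponential decline corresponds to $\beta_m < 0$.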
Similarly, we perform the calculations for each available film from the following list: "Angry Birds 2 in the cinema", "Invasion", "Kola Superdeep", "Antebellum", "(Not) an ideal man", "Agent Eve", "Greenland", "Little Women", "Pinocchio", "Running", "Grinch", "Ice 2", "Not Everyone Is Home", "One Breath", "Toy Story 4", "Capone. Scarface", "Kalashnikov", "Mirrors of Incarnation", "Gentlemen", "Mulan", "To the Stars", "Chicken Race", "Call of the Wild", "Star Wars: Skywalker", "Gnomes in Action", "In someone else's skin". After obtaining attendance estimates for each film, we determine the film parameters using formula (2). Table 1.5 contains a fragment of the data used to determine the film parameters, presented as indicator variables (except for the attendance estimates). To estimate the film parameters with formula (2), we need data on the films' ratings, countries of production and genres [18]. Using the filled-in data and formula (2), we obtain the film parameters (Table 1.6). These results suggest that cartoons are the most popular genre and that fantasy, drama and melodrama are also preferred by visitors, while war films and horror films are in the least demand.
We have thus estimated the demand model and the film parameters from last year's statistics. We can now forecast attendance, compare the forecasts with the available data and check their accuracy [19].
Using formula (3), we obtain an attendance forecast for each movie. To check the accuracy of the forecasts, we chose the film "Forward", for which attendance statistics for the first week of release are available. An attendance forecast was produced for each time interval, and the comparison with actual attendance is given in Table 1.7. Table 1.7 shows that the forecasts could not be checked for 9:00 to 12:00 and for 20:00 to 00:00, because the film was not shown at those times. The cinema employee who compiled the schedule assumed that this film would attract no visitors in the morning and evening hours. However, the forecasts indicate that on Saturday and Sunday mornings at least 20% hall occupancy can be expected [20].
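The comparison in Table 1.7 amounts to computing deviations only for intervals in which the film was actually shown. The sketch below marks unshown intervals with `None` and reports per-interval deviations and the mean absolute error in percentage points of hall occupancy. The occupancy figures are hypothetical, not the values in Table 1.7, and `compare_forecast` is a hypothetical helper.

```python
def compare_forecast(forecast, actual):
    """Per-interval deviation (forecast minus actual, percentage points of hall
    occupancy) and mean absolute error, skipping intervals with no session (None)."""
    deviations = {k: forecast[k] - actual[k] for k in forecast if actual.get(k) is not None}
    mae = sum(abs(v) for v in deviations.values()) / len(deviations) if deviations else float("nan")
    return deviations, mae

# Hypothetical occupancy (%) for one day; None means the film was not shown then.
forecast = {"09-12": 20.0, "12-15": 35.0, "15-20": 48.0, "20-00": 30.0}
actual = {"09-12": None, "12-15": 31.0, "15-20": 50.0, "20-00": None}
deviations, mae = compare_forecast(forecast, actual)
print(deviations, mae)
```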

Conclusions
Comparing the forecasts with the actual data for the same time intervals, we see that in most cases the deviation of the forecast from the actual values is small, which indicates that the forecasts are in line with expectations.
In summary, as soon as a new film is released, its characteristics should first be evaluated and its attendance predicted. After an agreement with the distributors has been concluded for a new film, we look at its genres, its anticipation rating and its country of production, substitute the corresponding values into the model and obtain an attendance forecast for each time interval. Based on these forecasts and the number of sessions specified in the contract, we draw up the schedule for the day.
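The last step, turning forecasts and contracted session counts into a daily schedule, can be sketched for a single hall with a greedy rule: repeatedly place the (film, slot) pair with the highest forecast, as long as the slot is free and the film still needs sessions. This is a simplified heuristic assumed for illustration, not the paper's optimization model; it is not guaranteed to be optimal, and it ignores film durations and changeover times. An exact version would be an integer program. The slots, films and forecasts are hypothetical.

```python
def greedy_schedule(forecast, sessions_required, slots):
    """Assign films to the start slots of one hall, highest forecast first.

    forecast: dict mapping (film, slot) -> predicted attendance
    sessions_required: dict mapping film -> number of sessions from the contract
    Returns (schedule as {slot: film}, films whose required sessions could not be placed).
    """
    remaining = dict(sessions_required)
    candidates = sorted(
        ((forecast[(film, slot)], film, slot) for film in remaining for slot in slots),
        reverse=True,
    )
    schedule = {}
    for _, film, slot in candidates:
        if slot not in schedule and remaining[film] > 0:
            schedule[slot] = film
            remaining[film] -= 1
    unmet = {film: n for film, n in remaining.items() if n > 0}
    return schedule, unmet

# Hypothetical forecasts (hall occupancy, %) for two films and five start times.
slots = [10, 13, 16, 19, 22]
forecast = {
    ("A", 10): 15, ("A", 13): 30, ("A", 16): 45, ("A", 19): 50, ("A", 22): 20,
    ("B", 10): 25, ("B", 13): 35, ("B", 16): 40, ("B", 19): 42, ("B", 22): 30,
}
schedule, unmet = greedy_schedule(forecast, {"A": 2, "B": 2}, slots)
print(dict(sorted(schedule.items())), unmet)
```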
Note that in cinema "X" the schedule is, in most cases, the same for the whole week, so a daily schedule automatically yields a schedule for the entire rental week. Adjustments are nonetheless easy to make, since we have an attendance forecast for the entire first week of release.
Finally, this work confirmed that the estimated coefficients are significant. It is also important that their values match expectations: attendance increases towards the evening (until 20:00), and among the days of the week the attendance estimates are largest for Saturday and Sunday.