Assessing the Efficacy of the UV Index in Predicting Surface UV Radiation: A Comprehensive Analysis Using Statistical and Machine Learning Methods

ABSTRACT


I. Introduction
This research originated from a deep concern about the impact of ultraviolet (UV) light exposure on human health and the environment. As global warming intensifies and the ozone layer depletes, the intensity of UV exposure, particularly UVA and UVB, increases [1]-[3]. Excessive UV exposure can cause a range of serious health problems, including a higher risk of skin cancer, faster skin aging, and various eye disorders [4]-[6]. UV rays also damage ecosystems, harm crops, and degrade water quality [7]-[9]. The problem is becoming increasingly urgent because worsening global climate change exacerbates UV exposure [1], [3], [10]. To understand and address these negative impacts, this research analyzes the influence of the UV Index on the UVA and UVB variables and investigates the relationship between them. It aims to support the development of effective mitigation strategies to protect human health and preserve the environment. Through careful and comprehensive analysis, it contributes to the assessment of UV-related risks and assists in the development of better analysis methods and prediction models. The results are expected to be used by policy makers, researchers, and health practitioners to implement more effective measures to protect human health and the environment from UV hazards.
The study aims to analyze the effect of the UV Index on the UVA and UVB variables. It measures the extent to which the UV Index affects the level of UVA and UVB exposure under various environmental conditions and determines whether there are significant differences in UVA and UVB exposure based on variations in the UV Index. It also investigates the relationship between the UVA and UVB variables, examining the correlation between the two to understand how they interact and identifying factors that may influence that relationship. The study further evaluates the performance of a Naive Bayes classification model in predicting UV level categories. It develops a prediction model that categorizes UV levels into classes such as "Very Low", "Low", "Medium", "High", and "Very High", and tests the accuracy and reliability of the Naive Bayes model in predicting these categories from observational data. In addition, the study uses the decision tree method to estimate the UV Index value: a decision tree model predicts UV Index values from the observed features, and the contribution of each feature to the prediction is analyzed. Finally, the study develops an understanding of the impact of UV exposure on health and the environment. It provides scientific knowledge on how variations in the UV Index, UVA, and UVB can affect human health, including the risk of skin cancer and premature aging, and it assesses the environmental impacts of increased UV exposure, including effects on ecosystems and air quality.
The research is expected to provide several significant benefits. First, it contributes to science by deepening the understanding of the relationship between the UV Index, UVA, and UVB and their impact on human health and the environment. The data and findings can be used by other researchers for further research or for policy development relating to UV exposure. Second, the study develops a prediction model. Accurate prediction models for UV level categories can be used by authorities to provide early warnings to the public about the risks of UV exposure, and the decision tree-based analysis tool can be used to identify the main factors affecting UV Index values. Third, the research can improve public awareness and health policy. Information on the dangers of UV exposure and on ways to protect against it can help people be more vigilant, and the research can assist governments and health agencies in designing educational programs and policies to reduce the health risks associated with UV exposure. Fourth, the research has applications in environmental management. Its data can be used to monitor and manage the environmental impacts of UV exposure, such as impacts on plants, animals, and water quality, and it can support the development of mitigation strategies to protect ecosystems from damage due to increased UV exposure. Finally, the research evaluates the performance of the analysis and classification models used. It assesses the effectiveness of the ANOVA analysis method and of the Naive Bayes and decision tree classification models and, based on the findings, provides recommendations for the use of better analysis and classification methods in the future.
The study has several limitations that should be noted when interpreting the results. First, it used data from a single training dataset. Relying on a single source may limit the generalizability of the results to larger or different populations [11], [12]; additional data from multiple sources or different geographical locations could provide more comprehensive results [13]. The variability in the dataset may not reflect the variation in the wider population, which can affect the accuracy and validity of the model. Second, the study is limited in the variables it examines. The main focus is on only three variables: UV Index, UVA, and UVB. Other influential variables, such as atmospheric conditions, pollution levels, or weather conditions, were not included in the analysis, which limits a thorough understanding of the factors that influence the UV Index. Nor did the study explore in depth other factors that may influence UVA and UVB, such as geographical or temporal effects. Third, ANOVA, Naive Bayes classification, and decision trees are specific method choices that have limitations in identifying non-linear patterns or complex interactions between variables. Other methods, such as non-linear regression, artificial neural networks, or ensemble methods, could provide different explanations. The decision tree method used in this study is rpart, whereas other tree-based methods such as Random Forest or Gradient Boosting can provide more accurate results and consider more features and interactions between variables [14]-[16]. Fourth, the model evaluation showed some limitations. The Naive Bayes model performed well in the "Very Low" and "Low" categories but was less effective in the other categories, which suggests that the model has difficulty predicting data with more complex or varied distributions. The study did not provide complete information on the evaluation metrics used to assess model performance, such as precision, recall, F1-score, or AUC-ROC. Fifth, the study did not consider external factors that could influence the results, such as climate change, environmental policies, or UV protection technologies. The influence of these factors can be significant and needs to be considered in further research. Sixth, the data were collected over a limited period of time and do not reflect seasonal changes or long-term trends; long-term and seasonal data may provide a more accurate picture of the patterns of UV Index, UVA, and UVB. Finally, the results cannot be directly applied to the wider population without further verification. Further studies with varied datasets are needed to ensure that the results generalize, and the implementation of the research results in policies or mitigation measures needs to be tested to assess its real-world effectiveness, including the costs, benefits, and impacts of implementation.
The research fills several gaps in the literature on the influence of the UV Index on the UVA and UVB variables and on the use of UV prediction models. Most studies focus on the individual impacts of UVA and UVB on human health and the environment without comprehensively analyzing the relationship between the UV Index and these two variables [17], [18]. Previous studies have often examined only one type of UV radiation at a time and therefore cannot give a complete picture of how UVA and UVB interact and contribute to total UV exposure [19], [20]. Many studies use simple statistical methods that are insufficient to capture the complexity of the relationship between UV Index, UVA, and UVB, so more sophisticated and diverse analytical methods are needed to understand the dynamics of UV data. Previous research has often not considered alternative analysis and classification methods that may be more effective or provide a different perspective. For example, other machine learning methods such as Random Forest or Support Vector Machine (SVM) are still rarely applied for classification and prediction, and time series analysis and dynamic prediction models that account for temporal changes in UV data are rarely used. Many previous studies used limited datasets or data from specific locations, which do not represent global conditions or wider geographical variation. Many studies also do not evaluate the performance of their prediction models in depth. Model evaluation is often limited to prediction accuracy, without other metrics such as precision, recall, and F1-score, and the lack of cross-validation and of testing on independent data to ensure generalizability is a common weakness. A comprehensive evaluation is essential to ensure that a model is reliable and performs well across different data conditions. Finally, existing studies often do not link their scientific findings to practical implications or policies that could reduce the risk of UV exposure. The lack of clear recommendations on mitigation measures for individuals or governments points to the need for more applicable and policy-relevant research.
This research contributes to a deeper understanding of the relationship between the UV Index, UVA, and UVB. While there have been previous studies on this topic, this research takes a more specific approach by highlighting the interactions between the three. By combining statistical analysis with ANOVA, which tests the influence of the UV Index on the UVA and UVB variables, with predictive models such as Naive Bayes and decision trees, the research not only confirmed the significant influence of the UV Index on both variables but also provided a deeper understanding of how these variables influence each other. The performance evaluation showed that the Naive Bayes model can predict UV Index levels accurately in some categories, although it has limitations in others. The visual representation of the classification predictions not only helps in assessing the accuracy of the model but is also important for assessing the risks of UV exposure to humans and the environment. With increasing attention to the health impacts of UV exposure, the research emphasizes the importance of understanding the separate contributions of UVA and UVB to the UV Index. It provides a firmer foundation for developing more effective and responsive mitigation strategies against the health and environmental threats caused by UV light.

II. Theory
The Role and Impact of the UV Index on Public Health
The UV Index is an internationally standardized measure of the intensity of ultraviolet radiation from the sun reaching the Earth's surface [21]-[23]. It was developed by the World Health Organization (WHO), the World Meteorological Organization (WMO), the United Nations Environment Programme (UNEP), and the International Commission on Non-Ionizing Radiation Protection (ICNIRP) [24]-[26]. The aim is to raise public awareness of the risks of UV exposure. Ultraviolet A radiation (UVA) is the part of the UV spectrum with wavelengths between 320 and 400 nm [27]-[29]. UVA penetrates the skin and can cause skin aging and contribute to the development of skin cancer [30]-[32].
Research by [33] found that the UV Index can be used as a proxy to measure UV radiation exposure, including UVA, although the UV Index is influenced more by UVB. The study emphasized that although the UV Index was originally developed to measure UVB, it also reflects significant UVA exposure. Research by [21] found that variations in the UV Index were significantly related to changes in total UV radiation levels, which include UVA. These results suggest that changes in UV Index values can be used to estimate variations in UVA exposure, providing an empirical basis for treating the UV Index as an effective indicator of the different types of UV radiation reaching the Earth's surface. Research by [34] showed that variations in the UV Index significantly affect UVB exposure in different geographical locations: differences in the UV Index can be attributed to differences in UVB exposure at the Earth's surface. This is due to factors such as altitude, latitude, and time of year, which affect the intensity of UV reaching the Earth's surface [35]. According to [36], the UV Index is an indicator designed to convey the level of risk of UV exposure, which varies with environmental and geographical factors. Inter-group differences in the UV Index reflect variations in UVB exposure resulting from different environmental conditions. Research by [37] showed that the UV Index significantly influences the level of UVB radiation measured at the Earth's surface; variations in the UV Index are closely correlated with changes in UVB intensity, with an increase in the UV Index indicating an increase in UVB levels. Research by [38] found that the UV Index can be used as a predictive tool to estimate UVB radiation levels. In that study, UVB radiation measurements showed a strong correlation with the observed UV Index values, supporting the claim that the UV Index is a valid indicator of UVB variations. Research by [39] supports the assertion that the UV Index is a strong indicator for predicting UVB radiation levels and shows that UV Index values can be used to estimate the risk of UVB exposure, which is important for public health protection measures. Furthermore, the study by [40] confirms that the UV Index is a reliable tool for monitoring and predicting UVB exposure.
Studies that use analysis of variance (ANOVA) to explore the relationship between environmental variables, particularly UV radiation exposure, and health provide strong evidence of the effectiveness of this method. The study by [41] illustrates the use of ANOVA to analyze the effects of UV radiation on DNA and shows that variations in UV exposure can significantly affect the extent of genetic damage. Similarly, [42] used ANOVA to evaluate the relationship between UV exposure levels and skin cancer incidence; the results showed that variations in UV exposure levels correlated significantly with variations in skin cancer incidence. These findings underscore that ANOVA can identify significant relationships between environmental exposures and health and provides a strong scientific foundation for further research on environmental factors in public health.

Naive Bayes Model for UV Index Prediction
Naive Bayes is a classification algorithm based on Bayes' Theorem with the assumption of independence between features [43]-[45]. In UV Index prediction, the model uses features such as UVA radiation, UVB radiation, and other environmental parameters to estimate UV Index values. The model is computationally efficient and fast, and it is suitable for large datasets.
Research using the Naive Bayes model to predict UV Index values is based on the concept that this algorithm can identify patterns in training data to predict target values in test data [46], [47]. The basic concept of machine learning emphasizes the development of algorithms that allow computers to learn from data and make decisions based on identified patterns [48], [49].
Research by [50] shows that the algorithm can be effective in predicting environmental values, including UV Index prediction that uses UVA and UVB radiation data as predictor features. The studies by [51] and [52] discussed in detail the strengths and weaknesses of the Naive Bayes model and its applications in various domains, including sensor data-based prediction. They underline that, despite its simplicity, Naive Bayes can provide good results when the independence assumption holds reasonably well. This is supported by [53], which explains that machine learning does not rely on Naive Bayes alone but includes various other techniques and models for analyzing and predicting environmental data. Data visualization and summary statistics play an important role in evaluating the accuracy of a prediction model. Distribution plots, the confusion matrix, and evaluation metrics such as accuracy, precision, recall, and F1-score provide a deeper understanding of how Naive Bayes models predict UV Index values. [54] and [55] separately emphasized the importance of effective visualization and summary statistics in conveying information from data analysis, which can help researchers evaluate and improve the performance of prediction models.
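The evaluation metrics named above can be made concrete with a short sketch. The following Python example is illustrative only: the labels and predictions are hypothetical values, not data from the study, and the function names are assumptions introduced for this example. It builds a confusion matrix and derives accuracy, precision, recall, and F1-score from it.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of observations on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp   # predicted as label, but wrong
        fn = sum(matrix[i]) - tp                  # actual label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical UV level categories and predictions (not from the study)
labels = ["Very Low", "Low", "Medium"]
actual = ["Very Low", "Very Low", "Low", "Low", "Medium", "Medium"]
predicted = ["Very Low", "Very Low", "Low", "Very Low", "Low", "Low"]

cm = confusion_matrix(actual, predicted, labels)
metrics = per_class_metrics(cm, labels)
for label in labels:
    print(label, metrics[label])
print("accuracy:", accuracy(cm))
```

In this toy example the classifier never predicts "Medium", so that class has zero precision, recall, and F1 even though overall accuracy is 0.5, which is why accuracy alone can hide weak categories.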

Decision Tree and Artificial Neural Networks for UV Index Prediction
A decision tree is a machine learning algorithm used for classification and prediction based on measured features [56], [57]. It splits the dataset into subsets based on the most significant features, forming a tree-like structure with nodes as features and branches as decision rules [58].
The study used decision trees and artificial neural networks to predict and classify UV Index values. Both methods have their own advantages in handling complex and heterogeneous environmental data. The use of decision trees is supported by [59], which shows that decision trees are very effective for classification and prediction because of their ability to handle complex and heterogeneous datasets. Decision trees work by dividing a dataset into smaller subsets based on its most significant features. The process forms a tree-like structure in which each node represents a feature and each branch represents a decision rule [58]. At the lowest level of the tree, the leaves hold the final decision or prediction [60]. Decision trees can thus decompose large and complex datasets into a form that is easier to understand and interpret.
A decision tree can be used to classify UV Index values based on several measured features, including ultraviolet A (UVA) and B (UVB) radiation intensity, time of day, and weather conditions. The tree shows how each of these features affects the UV Index value. For example, UVB radiation intensity may be more significant in determining the UV Index value than UVA radiation intensity, or certain weather conditions such as clouds or rain may have a large impact on the UV Index value. A tree might show that on sunny days with high UVB radiation intensity the UV Index value tends to be high, whereas on cloudy days with low UVB radiation intensity it tends to be lower. Another advantage of decision trees is their ability to handle missing data and various types of data, both numerical and categorical [61], [62]. This makes decision trees a flexible and powerful tool for analyzing environmental data, such as predicting UV Index values. Research by [63] shows that decision trees can be interpreted easily and provide a clear view of the relationship between input and output variables. Decision tree analysis can therefore help identify the features that most strongly influence the UV Index value.
Artificial neural networks are machine learning models inspired by how the human brain works [64], [65]. The model consists of layers of connected neurons that work in parallel to process information. Artificial neural networks can handle large and complex datasets and discover non-linear patterns that may not be visible with other methods [66], [67]. The study by [68] introduced backpropagation, a key algorithm for training artificial neural networks. With it, complex and accurate models can be developed to predict the value of the UV Index. Artificial neural networks learn from training data and correct the weights of the connections between neurons based on prediction errors [69], [70]. This allows neural networks to produce accurate predictions despite variations in the data. Research by [71] shows that artificial neural networks, especially deep learning models, can make highly accurate predictions in various domains, including the prediction of environmental values such as the UV Index.
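To illustrate how a tree chooses a decision rule on a feature, the sketch below searches for the single threshold on one numeric feature that most reduces the sum of squared errors of the target, which is the kind of criterion a regression tree applies at each node. It is a minimal Python illustration under stated assumptions: the UVB and UV Index values are hypothetical, the function names are introduced here, and it is not the rpart implementation.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(feature, target):
    """Return (threshold, sse) for the split that minimises within-group SSE.

    Candidate thresholds are midpoints between consecutive distinct feature
    values. Returns (None, total SSE) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(feature, target))
    best_threshold, best_sse = None, sum_squared_error(target)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [t for x, t in pairs if x <= threshold]
        right = [t for x, t in pairs if x > threshold]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if sse < best_sse:
            best_threshold, best_sse = threshold, sse
    return best_threshold, best_sse


# Hypothetical measurements (not from the study)
uvb = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
uv_index = [1.0, 1.0, 2.0, 7.0, 8.0, 8.0]

threshold, sse = best_split(uvb, uv_index)
print("split UVB at", threshold, "remaining SSE", sse)
```

A full tree repeats this search over every feature and then recurses into each resulting subset; the leaf prediction is the mean target value of the observations that reach it.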

Support Vector Machine (SVM) in UV Index Prediction
Support Vector Machine (SVM) is a machine learning method often used for classification and regression tasks [72], [73]. SVM works by finding a hyperplane that separates the data into different classes with maximum margin [73], [74]. SVMs are very effective in high-dimensional spaces and in cases where the number of dimensions is greater than the number of samples [75], [76]. SVMs are also known to be resistant to overfitting, especially in high-dimensional spaces [76], [77].
The basic concept of SVM involves several elements that make the method effective in classification and regression tasks [73], [78]. The first is the hyperplane and margin. A hyperplane is a plane in high-dimensional space that separates the data into different classes [79]. The main goal of SVM is to find the hyperplane that maximizes the distance, or margin, between the hyperplane and the closest data point of each class [73], [74]. By maximizing the margin, SVM models improve their generalization ability and reduce the risk of overfitting [80], [81]. The second element is the kernel trick, which SVM uses to handle data that are not linearly separable. Several types of kernels are commonly used. Linear kernels are used for linearly separable data, where a simple hyperplane is sufficient to separate the data [82]. Polynomial kernels map data to a higher-dimensional space using polynomials, which allows the separation of more complex data [83]. The Radial Basis Function (RBF) kernel, also known as the Gaussian kernel, is one of the most effective kernels for non-linear data [84]. It maps the data to a higher-dimensional space using a Gaussian function, enabling the separation of more complex and non-linear data [85]. The third element is the regularization parameter (C) [86]. This parameter controls the trade-off between maximizing the margin and minimizing classification error [87]: it determines the extent to which the SVM model tries to separate the data by a large margin while reducing misclassification of the training data [88]. A well-chosen C helps SVM models achieve a balance between bias and variance, which is important for good performance on previously unseen data [89]. In UV Index prediction, these concepts are used to train SVM models that can predict UV Index values with high accuracy. The hyperplane and margin help separate UV Index data based on environmental parameters, and the kernel trick allows complex and non-linear data to be handled [90]. Proper setting of the regularization parameter ensures that the SVM model is not only accurate on training data but also generalizes well to new data [78].
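The roles of the margin, the regularization parameter C, and the RBF kernel can be sketched in a few lines of Python. This is a simplified illustration, not the method of any cited study: the linear SVM is trained by plain subgradient descent on the hinge-loss objective, the learning rate and epoch count are arbitrary assumptions, and the standardized feature values and the +1/-1 labels (standing for high and low UV levels) are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b)))
    by full-batch subgradient descent. Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:    # point is inside the margin: hinge is active
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * yi * xj
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify by the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical standardized (UVA, UVB) features; +1 = high UV, -1 = low UV
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
     [2.0, 1.0], [1.0, 2.0], [1.5, 1.5]]
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y, C=1.0)
print("weights:", w, "bias:", b)
print("predictions:", [predict(w, b, x) for x in X])
print("RBF similarity of first two points:", rbf_kernel(X[0], X[1], gamma=0.5))
```

A larger C penalizes margin violations more heavily, while a smaller C favors a wider margin; a kernelized SVM would replace the dot product with a function such as `rbf_kernel`.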

Indonesian Review of Physics (IRIP)
Vol.6, No.
One of the most important studies was conducted by [51], who used SVM to predict air quality based on environmental parameters such as humidity, temperature, and pollutant concentration. The results showed that SVM achieved high accuracy in air quality prediction, demonstrating the effectiveness of this method in handling complex environmental data. The studies by [91] and [92] applied SVM to predict weather conditions using historical meteorological data. They showed that SVMs were able to produce accurate and reliable weather predictions, again confirming the ability of SVMs in environmental data analysis. Research by [93], [94] discussed the use of various machine learning models, including SVM, to predict UV radiation exposure and showed that SVM outperformed the other models in UV radiation prediction.

K-means Clustering in UV Index and UV Measurements Analysis
K-means clustering is an unsupervised learning method used to group data into a chosen number of clusters based on similarity of features [95], [96]. The algorithm aims to partition n data points into k clusters such that each point belongs to the cluster with the closest centroid, as measured by the Euclidean distance [97], [98].
The K-means algorithm starts with initialization. The desired number of clusters (k) is determined first; this is a crucial step because it affects the final clustering result. The initial centroids are then selected. This can be done randomly, but methods such as K-means++ select the initial centroids more carefully to speed up convergence and improve clustering accuracy [99], [100]. The algorithm then proceeds to the cluster assignment stage. The distance from each data point to each centroid is measured. The Euclidean distance is most commonly used, although other distance metrics can be used depending on the data. Each point is assigned to its closest centroid, which groups the data into temporary clusters. Once the points are assigned, each centroid is updated to the average position of all points in its cluster, that is, the mean of each feature over the cluster members. The new centroid becomes the new center of the cluster. The algorithm then repeats the assignment and update steps: at each iteration, the distance from each point to the new centroids is measured again, each point is assigned to the nearest centroid, and the centroids are updated based on the new assignments. The iterative process continues until convergence, that is, until there is no significant change in the cluster assignments or centroid positions [101]-[103]. Convergence indicates that the algorithm has found a stable partition.
In the study of UV Index and UV measurements, K-means clustering was used to group data based on similarities in UV Index, UVA, and UVB measurements. This makes it possible to discover hidden patterns in the data that may not be detected through simple descriptive or statistical analysis. Grouping data into meaningful clusters allows the identification of groups with similar characteristics, such as high or low levels of UV exposure [104]. Clusters with a high UV Index indicate areas or conditions that are susceptible to excessive UV exposure, while clusters with a low UV Index may indicate safer conditions.
Research by [105] shows how K-means clustering can be used to group environmental data and identify statistically significant patterns. By applying the K-means algorithm, the researchers were able to cluster geographic areas based on similarities in UV exposure [105]. Clustering helps in understanding the variation of UV exposure across regions and time, which is very important in epidemiological studies of the impact of UV exposure on public health. The results showed that K-means clustering can identify groups of data with similar characteristics. The studies by [105] and [106] highlight the application of K-means clustering in weather data analysis to predict extreme atmospheric conditions. In these studies, K-means was used to cluster complex meteorological data, including UV light intensity. The results show that K-means clustering can effectively cluster weather data, providing more in-depth information about atmospheric conditions and variations in UV light intensity. These studies support the use of K-means in clustering UV data to understand variations in light exposure and to aid in the prediction of extreme weather conditions.
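The assignment and update steps described above can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the observations are hypothetical (UV Index, UVA, UVB) triples, and the initial centroids are simply the first k points, a deliberate simplification in place of random or K-means++ initialization.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, assignments)."""
    # Simplistic initialisation (assumption): use the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no assignment changed
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Hypothetical (UV Index, UVA, UVB) observations, not data from the study
points = [
    (1.0, 0.5, 0.1), (8.0, 4.0, 1.0), (1.2, 0.6, 0.1),
    (0.8, 0.4, 0.1), (8.5, 4.2, 1.1), (7.5, 3.8, 0.9),
]

centroids, assignments = kmeans(points, k=2)
print("assignments:", assignments)
print("centroids:", centroids)
```

Because the result depends on the starting centroids, practical implementations usually run several initializations and keep the partition with the lowest within-cluster sum of squares.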

III. Method
ANOVA Method
The method used in the research is ANOVA (Analysis of Variance), carried out with the `aov` function [107]. The study analyzed the impact of the UV Index variable on the UVA and UVB variables. The procedure begins by performing an ANOVA using the formula `UVA ~ UV Index` on the `data_train` dataset. The model consists of two terms, namely UV Index and Residuals. For each term, the sum of squares is calculated, providing a measure of the variation attributable to that term [108]. The degrees of freedom (Df) for each term are calculated, representing the number of independent values that can vary [109], [110]. The standard error of the residuals is then determined, which serves as an estimate of the error in the model [111], [112]. To summarize the results, the `summary(anova_result)` function is used. The summary reports the degrees of freedom (Df), sum of squares (Sum Sq), and mean square (Mean Sq) for each term in the model [113], [114], together with the F values and their probabilities.
The study evaluated the influence of the UV Index variable on the UVA and UVB variables. ANOVA indicates whether there is a significant difference between the groups formed from the UV Index variable for each of the variables under study. For the UVA variable, the null hypothesis (H0) states that μ1 = μ2 = μ3 = ... = μk, indicating that there is no significant difference in mean UVA between the groups formed from the UV Index variable [115], [116]. The alternative hypothesis (H1) states that at least one pair of groups differs significantly, implying that the UV Index variable has a significant impact on the UVA variable. Similarly, for the UVB variable, the null hypothesis (H0) states that μ1 = μ2 = μ3 = ... = μk, indicating that there is no significant difference in mean UVB between the groups formed from the UV Index variable [117], [118]. The alternative hypothesis (H1) states that at least one pair of groups differs significantly, indicating that the UV Index variable significantly affects the UVB variable.
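The quantities reported in an ANOVA summary (Df, Sum Sq, Mean Sq, and the F value) can be reproduced by hand for a one-way design. The sketch below is a Python illustration under stated assumptions: it treats each UV Index level as a group, the UVA values are hypothetical, and the function name is introduced here. It is not the `aov` implementation and does not compute the probability associated with F.

```python
def one_way_anova(groups):
    """Return a dict with Df, Sum Sq, Mean Sq and F for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per factor level.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: variation explained by the factor.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group (residual) sum of squares.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within),
        "sum_sq": (ss_between, ss_within),
        "mean_sq": (ms_between, ms_within),
        "f_value": ms_between / ms_within,
    }


# Hypothetical UVA readings grouped by three UV Index levels
uva_by_uv_index = [
    [1.0, 2.0, 3.0],   # UV Index level 1
    [2.0, 3.0, 4.0],   # UV Index level 2
    [5.0, 6.0, 7.0],   # UV Index level 3
]

result = one_way_anova(uva_by_uv_index)
print(result)
```

The null hypothesis μ1 = μ2 = ... = μk is rejected when the F value is large relative to the F distribution with the two degrees of freedom shown; statistical software reports this as the probability next to the F value.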

Naive Bayes Classification Method
The research method used is Naive Bayes classification [119]. Two sets of data are used in this method, namely "predictions" and "data_test". The "predictions" data contains the values predicted by the model for the target variable. The "data_test" data consists of several observed variables: "Year", "Month", "Day", "UVA", "UVB", "UV Index", and "Prediction". Summary statistics are given for each of these variables.
To visualize the classification results, a classification plot for the Naive Bayes method was used [120]. This plot shows the distribution of classification results on the test data. The x-axis shows the actual value of the UV Index variable, while the bars are filled with colors that represent the predictions of the Naive Bayes model. The height of each bar indicates the number of observations, or frequency, of each prediction category in the test data. The plot provides a visual understanding of how well the Naive Bayes model classifies the test data.
The confusion matrix is also reported. It shows the predictions of the Naive Bayes classification model on the test data [121], [122], giving the number of predictions that match the actual class as well as the prediction errors. It shows the performance of the model in predicting the "Very Low" and "Low" categories on the available test data, as well as the inability of the model to make predictions for the other categories.
A further plot displays the proportions of the actual and predicted classifications. The x-axis shows the actual classification, the y-axis shows the proportion, and the color indicates the predicted classification. This plot shows visually the degree to which the predicted classification matches the actual classification.
The Naive Bayes classification method can be represented in algebraic form [123]. The probability of the target class (category) given the observed data is calculated as:
$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$
where P(C | X) is the probability of target class (category) C given data X, P(X | C) is the probability of data X occurring if target class (category) C is true, P(C) is the probability of target class (category) C as a whole, and P(X) is the probability of data X occurring as a whole. The probability of data X occurring if target class (category) C is true is calculated as:
$$P(X \mid C) = \prod_{i=1}^{n} P(X_i \mid C)$$
where P(Xi | C) is the probability of attribute value (variable) Xi given class C. The probability of data X occurring overall is calculated as:
$$P(X) = \prod_{i=1}^{n} P(X_i)$$
where P(X) is the probability of data X occurring as a whole, P(Xi) is the probability of attribute value (variable) Xi in data X as a whole, and n is the number of attributes (variables) in data X. These formulas are used in the Naive Bayes method to classify the test data based on the probabilities calculated from the training data [43], [44], [124]. Visualization of the classification results using classification plots and the confusion matrix provides a visual understanding of the model's performance in classifying the test data, as well as the degree to which the predictions match the actual classification.
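The Bayes rule above can be sketched in Python as follows. This is an illustrative sketch rather than the study's implementation. It assumes Gaussian class-conditional densities for numeric attributes, and it computes P(X) by summing P(X | C) P(C) over the classes instead of using the product of P(Xi). Since P(X) is the same for every class, this choice does not change which class is selected. The toy rows (UVA, UVB) and all function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate the prior and per-attribute mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # small floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(model, row):
    """Return P(C | X) for every class C."""
    joint = {}
    for label, (prior, means, variances) in model.items():
        likelihood = 1.0
        for x, m, v in zip(row, means, variances):
            likelihood *= gaussian_pdf(x, m, v)  # product over attributes
        joint[label] = likelihood * prior        # P(X | C) * P(C)
    evidence = sum(joint.values())               # P(X) by total probability
    return {label: value / evidence for label, value in joint.items()}

# Hypothetical toy training data: (UVA, UVB) with a UV level category.
rows = [(1.0, 0.20), (1.2, 0.25), (0.9, 0.22), (2.0, 0.40), (2.2, 0.45), (1.9, 0.42)]
labels = ["Very Low", "Very Low", "Very Low", "Low", "Low", "Low"]
model = fit_gaussian_nb(rows, labels)
probs = posterior(model, (1.1, 0.23))
predicted = max(probs, key=probs.get)
print(predicted, probs)
```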

Decision tree method
The research uses a decision tree modeling method based on the rpart algorithm [14], [56]. A decision tree is used to predict UV Index values from relevant features. The approach partitions the data on important attributes such as UVB, UVA, Month, Year, and Day. Each node in the decision tree represents a partition of the data on one of these attributes. The decision tree starts with a root node that contains a number of observations and the average value of the UV Index [56], [125], [126]. The data is divided into two branches based on the UVB feature values. The left and right branches show different mean values and Mean Squared Errors (MSE) [127]. The process of division and branching continues at each node, taking into account the complexity of the tree and the importance of the features used [128], [129]. This aids in classifying observations and predicting the UV Index value based on relevant features.
The decision tree involves observing the distribution of data at each node [56], [130]-[132]. Starting from the root node, data is partitioned based on the condition of a particular variable such as UVB. Each node provides information such as the number of observations, the deviance, and the predicted value [133], [134]. Deviance measures the discrepancy between the model's predictions and the actual values, and the primary objective is to minimize deviance during the division process [135]. The average value of the UV Index and the Mean Squared Error (MSE) are used to evaluate the accuracy of the model in predicting the target value. The target value, ŷ, is predicted from the decision tree as [56], [130], [136]:
$$\hat{y} = T(x)$$
where T(x) is a function that generates predictions based on splitting the data at each node in the decision tree. Each node in the decision tree is represented as:
$$T(x) = \sum_{i} yval_i \, I(x \in R_i)$$
where yvali is the predicted value at node i, Ri is the region defined by the separation condition at node i, and I(x ∈ Ri) is an indicator function that takes the value 1 if x is in region Ri and 0 otherwise. The splitting at each node in the decision tree is based on a splitting condition that divides the data on a particular feature value [125], [137]. If splitting is done on feature F with a splitting boundary c, then the splitting condition can be expressed as:
$$x_F < c \qquad (7)$$
At each node, the mean of the target value y and the Mean Squared Error (MSE) are also computed. In these expressions, yi is the actual target value, pi is the proportion of observations falling into the Ri region, and yvali is the predicted value at node i.
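The splitting rule and the node-level mean and MSE can be sketched in Python. This is a simplified illustration, not the rpart algorithm itself. It assumes the usual regression-tree convention that a node predicts the mean of its target values and that the best threshold minimizes the summed squared error of the two child nodes. The toy values and function names are hypothetical.

```python
def node_stats(values):
    """Return (mean, MSE) of the target values that fall in one node."""
    mean = sum(values) / len(values)
    mse = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, mse

def best_split(feature, target):
    """Find the threshold c on one feature that minimizes the children's squared error."""
    pairs = sorted(zip(feature, target))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        c = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < c]
        right = [y for x, y in pairs if x >= c]
        sse = sum(len(side) * node_stats(side)[1] for side in (left, right))
        if best is None or sse < best[1]:
            best = (c, sse)
    return best

def predict(x, c, left_mean, right_mean):
    """T(x) for a single split: sum of yval_i * I(x in R_i)."""
    return left_mean * (x < c) + right_mean * (x >= c)

# Hypothetical toy data: UVB as the feature, UV Index as the target.
uvb = [0.20, 0.25, 0.30, 0.40, 0.45, 0.50]
uv_index = [1.4, 1.5, 1.6, 2.1, 2.2, 2.3]
c, sse = best_split(uvb, uv_index)
left_mean, _ = node_stats([y for x, y in zip(uvb, uv_index) if x < c])
right_mean, _ = node_stats([y for x, y in zip(uvb, uv_index) if x >= c])
print(c, sse, predict(0.22, c, left_mean, right_mean))
```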

Neural Network Method
The method used is the development and training of artificial neural network models [138]-[140]. The model was designed with 5 input variables and 1 output variable, using observed data consisting of responses and covariates for training purposes. The model was built by incorporating appropriate error and activation functions, as well as non-linear outputs [141]-[143].
The data used in the study is organized in a data.frame format, which consists of 6 columns. The neural network model was then used to predict outcomes based on the data [144], [145]. The results showed a tendency for the model to predict the same outcome for most of the test data. However, there were notable differences in the range of values between the predicted and actual values in the test data. The study reported the minimum, average, and maximum values for the predicted and actual results. The difference in the range of values between the predicted and actual results highlights the potential for improving the accuracy of the model [146], [147]. During the construction of the neural network, the number and size of the hidden layers in the model are determined, and the activation functions used for the neurons in the hidden layer and output layer can also be observed [148]-[150]. A graphical representation of the structure and architecture of a neural network helps in understanding how the model works and the flow of information within it, including the connections between neurons [151], [152].
A neural network diagram was created to represent the structure and interconnections among the layers in the model [151], [152]. Each node in the plot represents a neuron or unit in the model, and the lines connecting the nodes describe the relationships between neurons in different layers [153], [154]. The plot also depicts the weights or parameters that connect the neurons. From the visualization, a better understanding of the flow of information through the network and the interconnections between the neurons can be gained [155], [156].
The method involves developing and training an artificial neural network model using mathematical equations. The model is structured with an input layer consisting of five variables, namely x1, x2, x3, x4, and x5. These input variables are processed through the first hidden layer, which contains five neurons, namely h1, h2, h3, h4, and h5. The neurons in the first hidden layer are connected to the neurons in the second hidden layer, which consists of three neurons, namely h6, h7, and h8. The connection between the first and second hidden layers is determined by a set of weights, namely w1, w2, w3, w4, w5, w6, w7, and w8. The neurons in the second hidden layer are then connected to the output layer, which generates the final output variable, y [148], [157]. The weights connecting the neurons in the second hidden layer to the output layer [158], [159] are represented by v1, v2, v3, v4, v5, v6, v7, and v8. Through training, the neural network model adjusts these weights to minimize the error between the predicted output and the actual output, thus optimizing its performance in predicting y from the given input variables [160], [161].
In the development of artificial neural network models, the activation function σ(x) plays a very important role [143], [149]. The function can be sigmoid, ReLU, or another type of non-linear activation function, depending on the model used [162], [163]. The output (y) of the neural network model is obtained by applying the activation function to a weighted sum of the values in the second hidden layer (h6, h7, h8). Each value in the first hidden layer (h1, h2, h3, h4, h5) is obtained by applying the activation function to a weighted sum of the input values (x1, x2, ..., x5). For a value in the second hidden layer (h6, h7, h8), the following formula is used, shown here for h6:
$$h_6 = \sigma(w_1 h_1 + w_2 h_2 + w_3 h_3 + w_4 h_4 + w_5 h_5)$$
The weights (w1, w2, ..., v7, v8) used in this calculation are obtained through the model training process using the observed data, and the input values (x1, x2, ..., x5) are taken from that data. In the process, the neural network model learns to adjust these weights so that it can make predictions based on the input data [164], [165]. The interplay between the activation function and the weights allows neural networks to learn and make predictions [143], [149].
After the formulas were implemented, the artificial neural network model was developed and used to generate predictions. Evaluation of the prediction results showed a tendency to predict similar results for most of the test data used [166]. The difference in the range of values between the predicted and actual values indicates the potential to improve the accuracy of the model in making predictions [167].
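The layer-by-layer calculation described above can be sketched as a forward pass in Python. This is a minimal illustration under stated assumptions, not the trained model from the study: the activation is taken to be the sigmoid, bias terms are omitted, the weights are random placeholders rather than trained values, and the inputs are hypothetical values scaled to the range 0 to 1.

```python
import math
import random

def sigmoid(z):
    """Logistic activation, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights):
    """One fully connected layer: sigmoid of each weighted sum (bias terms omitted)."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in weights]

def forward(x, w_hidden1, w_hidden2, w_output):
    """Propagate five inputs through a 5-5-3-1 architecture."""
    h_first = layer(x, w_hidden1)          # h1..h5
    h_second = layer(h_first, w_hidden2)   # h6..h8
    y = layer(h_second, w_output)[0]       # single output
    return h_first, h_second, y

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def random_weights(n_out, n_in):
    """Placeholder weights; a real model would learn these during training."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

w_hidden1 = random_weights(5, 5)
w_hidden2 = random_weights(3, 5)
w_output = random_weights(1, 3)

x = [0.5, 0.5, 0.5, 0.6, 0.4]  # hypothetical scaled inputs
h_first, h_second, y = forward(x, w_hidden1, w_hidden2, w_output)
print(len(h_first), len(h_second), y)
```

With a sigmoid on the output neuron, y always lies strictly between 0 and 1, so a target on a wider scale would need to be rescaled or modeled with a linear output.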

Support Vector Machine Method
The research uses the Support Vector Machine (SVM) method with predetermined parameters [72], [168]. The SVM model type is eps-regression with a radial kernel, with cost, gamma, and epsilon parameters [169]. The predictions of the SVM model on the test data are analyzed with a plot. In the resulting plot, the x-axis (horizontal) shows the actual values of the test data, while the y-axis (vertical) shows the values predicted by the SVM model. Each point on the plot represents one test observation, reflecting the actual value and the corresponding predicted value. The points are marked in blue, indicating the mapping between the actual and predicted values. To facilitate the interpretation of the performance of the SVM model in predicting actual values, a red diagonal line and a dashed line are used as reference lines. The lines indicate the expected position of the points if the actual and predicted values are similar. If the points tend to approach the diagonal line, it can be concluded that the SVM model has a high level of accuracy. However, if the points are widely scattered around the plot, it indicates a significant difference between the actual values and the values predicted by the SVM model.

IV. Results and Discussion
The Relationship between UV Index and Ultraviolet A Radiation Measurements using ANOVA
The analysis uses the aov function with the formula UVA ~ UV Index on the train data set. There are two terms in the model, namely UV Index and Residual. The Sum of Squares for UV Index is 13901.489, and for Residual it is 988.621. The Degrees of Freedom for UV Index is 1, and for Residual it is 2811. The residual standard error is 0.5930407. The results show that there is a relationship between UV Index and UVA. UV Index represents the index of ultraviolet (UV) light at the surface, while UVA is the actual UV light measurement, so the aov function and formula examine the relationship between the UV index and the actual UV light measurement. The Sum of Squares measures the variation explained by each term in the model. The Sum of Squares of the UV Index indicates the extent to which the variability in the actual UV light measurement can be explained by the UV light index. The Sum of Squares of the Residual describes the variation that cannot be explained by the UV light index, and is therefore considered the "error" in the model. Degrees of Freedom is the number of values that can vary in a statistical test. In this case, the Degrees of Freedom for the UV Index is 1, which indicates that the UV light index has one degree of freedom in explaining the variation in the actual UV light measurement.
The results of the analysis showed a significant effect between the two variables. To summarize the results, the summary(anova_result) function was used. The summary lists the degrees of freedom (Df) as 1 for UV Index and 2811 for Residual. The sum of squares (Sum Sq) is 13901 for UV Index and 989 for Residual. The Mean Sq is 13901 for UV Index, while for Residual it is printed as 0 after rounding (988.621 / 2811 ≈ 0.352). The F value is 39527 with a very low probability value (Pr(>F) of <2e-16), indicating that the UV Index variable has a significant influence on the UVA variable. The significance code (***) indicates a very high level of significance. The high F value and very low significance level indicate a strong relationship between the two variables in this model. In other words, there is a strong relationship between the ultraviolet light index and the detected ultraviolet radiation level, and changes in the ultraviolet light index are associated with significant changes in the level of ultraviolet radiation detected at the surface.
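The reported F value and residual standard error follow directly from the sums of squares and degrees of freedom given above. The short Python check below uses only those reported figures, and the variable names are illustrative.

```python
import math

# Values reported for the model UVA ~ UV Index.
ss_uv_index, ss_residual = 13901.489, 988.621
df_uv_index, df_residual = 1, 2811

ms_uv_index = ss_uv_index / df_uv_index   # mean square of the UV Index term
ms_residual = ss_residual / df_residual   # mean square of the residuals
f_value = ms_uv_index / ms_residual
resid_std_error = math.sqrt(ms_residual)

print(round(ms_residual, 4), round(f_value), round(resid_std_error, 7))
```

The computed F value rounds to 39527 and the residual standard error agrees with the reported 0.5930407 to within rounding, so the reported summary is internally consistent.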

Figure 2. Analysis graph for ANOVA test of SFC UVB against UV Index
An Analysis of Variance (ANOVA) test was conducted to assess the effect of the UV Index variable on the UVB variable. The results showed that there were significant differences in UVB among the groups formed by the UV Index variable. The UV Index variable had a sum of squares of 14.724 and a mean square of 14.72, with an F value of 411,821 and a very small p value (<2e-16), indicating a significant difference between groups. Further analysis was done by examining the summary of the ANOVA test results. The summary shows that the UV Index variable has 1 degree of freedom and a sum of squares of 14.724, while the residuals have 2811 degrees of freedom with a sum of squares of 0.101. The residual standard error is 0.005979357.

The Relationship between UV Index and Ultraviolet B Radiation Measurements using ANOVA
Analysis of variance (ANOVA) showed a significant relationship between the predictor variable "UVA" and the target variable "UVB". This is evident from the very high F value (51,708) and very low p value (<2e-16), indicating a significant difference between the groups distinguished by the predictor variable. In the ANOVA model, the predictor variable "UVA" contributed significantly to the variation in the target variable "UVB". The predictor explained most of the variation in the target variable, as indicated by the high Sum of Squares value (14.060) compared to the Sum of Squares value for the Residual (0.764). The evaluation results show that the predictor variable "UVA" has a strong influence on the target variable "UVB" in the dataset. This information can be useful in further modeling and analysis.

Evaluation and Interpretation of Naive Bayes Classification Predictions
The research results include several prediction values generated by the model for the tested target variable. The dataset contains prediction values such as 2.12, 1.95, 1.73, and others, and the count of each prediction value is also reported. The predictions come from a previously trained model, which learns patterns and relationships in the training data so that it can make relevant predictions for new data. The predicted values produced by the model for the same target variable vary across observations. This variation can be caused by several factors, such as data complexity, sample size, and the method used in making predictions.
The study examined the "test data", which included several observed variables: "Year", "Month", "Day", "UVA", "UVB", "UV Index", and "Prediction". For each variable, summary statistics such as the minimum value, first quartile, median, mean, third quartile, and maximum value were calculated. In particular, for the variable "Year", the data ranges from a minimum of 2010 to a maximum of 2023, with a median year of 2015. The analysis also counts specific values of the "Prediction" variable; for example, there are 116 observations where the prediction is 2.12.
The plot illustrates the distribution of classification results across the test data. The x-axis represents the actual value of the UV Index variable, while the bars are color-coded to reflect the predictions generated by the Naive Bayes model. Each color corresponds to a different prediction category, and the height of each bar indicates the frequency of predictions in the test dataset. This depiction allows a direct comparison between predicted and actual values, and thereby an assessment of the performance of the model. Instances where predictions align well with actual UV Index values indicate good model accuracy, whereas significant differences prompt further examination of the model or the quality of the underlying data.
The confusion matrix illustrates the performance of the Naive Bayes classification model on the test data set. The matrix shows that no predictions were made for the "Medium", "High", "Very High", and "Extremely High" categories, reflecting the absence of corresponding data in the test set. In contrast, the model was effective in predicting instances categorized as "Very Low" and "Low". Specifically, the model correctly predicted 579 instances as "Very Low" and 588 instances as "Low". Some discrepancies were noted, with 29 instances from the "Very Low" category incorrectly predicted as "Low" and 10 instances from the "Low" category incorrectly predicted as "Very Low".
To visualize these findings, classification plots were created using the ggplot2 library. The plot depicts the distribution of actual and predicted classifications, where the x-axis shows the actual classification categories and the y-axis shows their proportions. The colors on the plot indicate the predicted classification. Titled "Naive Bayes Classification", the plot uses a minimalist theme with adjusted text size to improve clarity.
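The confusion-matrix counts above can be turned into summary rates with a few lines of Python. The sketch uses only the four counts reported in the text. The derived accuracy and per-class recall are computed here for illustration and are not figures stated by the study.

```python
# Keys are (actual class, predicted class); counts are those reported in the text.
confusion = {
    ("Very Low", "Very Low"): 579,
    ("Very Low", "Low"): 29,
    ("Low", "Low"): 588,
    ("Low", "Very Low"): 10,
}

total = sum(confusion.values())
correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
accuracy = correct / total

def recall(label):
    """Share of observations with this actual class that were predicted correctly."""
    actual_total = sum(n for (actual, _), n in confusion.items() if actual == label)
    return confusion[(label, label)] / actual_total

print(total, round(accuracy, 4), round(recall("Very Low"), 4), round(recall("Low"), 4))
```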

Decision tree classification
The tree was fitted with n = 4019 observations. The first node (root) contains all 4019 observations, with a mean value of 1.957345 and an MSE (Mean Squared Error) of 0.1529194. The node divides the data into two branches based on the UVB value. If the UVB value is less than 0.365, the observation is assigned to the left branch (node 2), which has a mean value of 1.554443 and an MSE of 0.08087049. If the UVB value is greater than or equal to 0.365, the observation is assigned to the right branch (node 3), which has a mean value of 2.187958 and an MSE of 0.0480621. The results provide an understanding of how the measured features affect the UV index values at the surface. Using the decision tree, it is possible to classify observations based on relevant features and predict UV index values from those features.
In a decision tree plot, each node represents a split of the data based on the condition of one of the variables. Each node contains information such as the node number, the number of observations included in the node (n), the deviance, and the predicted value (yval). When interpreting this plot, one can start from the root node, which has 4019 observations. The root node has the first split, on the UVB variable with a threshold of <0.365. The number of observations that meet this condition is 1463, while 2556 do not. The root node then branches into two child nodes (left and right) according to the splitting condition. This splitting and branching process continues at each node, where each split is based on the variable that provides the largest reduction in deviance. Deviance measures how much the predictions in the model differ from the actual values, and the goal is to minimize deviance as much as possible.
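The reported node statistics can be checked for consistency, and the effect of the first split quantified, with a short calculation. The Python sketch below uses only the node sizes, means, and MSE values reported above. It treats each node's deviance as its size times its MSE, which is an assumption about how the reported MSE relates to deviance.

```python
# Reported statistics for the root and its two child nodes.
n_root, mean_root, mse_root = 4019, 1.957345, 0.1529194
n_left, mean_left, mse_left = 1463, 1.554443, 0.08087049
n_right, mean_right, mse_right = 2556, 2.187958, 0.0480621

# The size-weighted mean of the children should reproduce the root mean.
weighted_mean = (n_left * mean_left + n_right * mean_right) / n_root

# Deviance (sum of squared errors) before and after the split on UVB < 0.365.
sse_root = n_root * mse_root
sse_children = n_left * mse_left + n_right * mse_right
reduction = sse_root - sse_children
share_explained = reduction / sse_root

print(round(weighted_mean, 6), round(reduction, 2), round(share_explained, 3))
```

Under this assumption, the first split alone removes roughly 60% of the root deviance, which is consistent with UVB being chosen as the first splitting variable.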

Neural network analysis
The results show that the neural network model has a structure consisting of 5 input variables and 1 output variable. The model was trained using 2813 response values and 14065 covariate values (2813 observations × 5 input variables). The model uses appropriate error and activation functions, and uses nonlinear outputs. The data used consisted of 6 columns in data.frame format. The predictions from the neural network model show that most of the predictions have a value of 1 [170], indicating that the model tends to predict the same result for most of the test data. The predicted and actual values on the test data have different value ranges. The minimum predicted value is 1, while the minimum actual value is 0.390. The average predicted value is close to 1, while the average actual value is about 1.968. The maximum values also differ, with a maximum predicted value of 1 and a maximum actual value of 3.210. The model therefore still has the potential to be developed further in order to provide predictions that are more accurate and closer to the actual values.
The research produces a plot in the form of a neural network diagram, which illustrates the structure of and relationships between the layers in the neural network model [152]. Each sphere in the plot represents a neuron or unit in the neural network model. The lines connecting these spheres represent the relationships between neurons in different layers. Labeled lines indicate the weights or parameters that connect neurons between layers. The number and size of hidden layers in the model is determined by the "hidden" argument during the construction of the network model [148]. There are two hidden layers with 5 and 3 neurons respectively. A sigmoid activation function is used for the neurons in the hidden layers and the output layer [171]. This can be observed from the setting linear.output = FALSE during model construction. Because a sigmoid output cannot exceed 1 while most actual UV Index values are above 1, this setting likely explains why the predictions saturate at 1. The plot provides a visual representation of how information flows through the network and how each neuron is interconnected. This helps in understanding and interpreting the results of the artificial neural network model.

Support vector machine analysis
The research uses the Support Vector Machine (SVM) model with the following parameters: SVM-Type: eps-regression, SVM-Kernel: radial, cost: 1, gamma: 0.2, and epsilon: 0.1. The number of support vectors used in this model is 508. The predictions of the SVM model on the test data show a minimum value of 0.5836, a first quartile of 1.7478, a median of 1.9943, a mean of 1.9695, a third quartile of 2.2345, and a maximum of 2.9193. The SVM model with predetermined parameters thus predicts UV index values with an average of 1.9695 on the test data.
The SVM model is a method used for classification and regression [172]. For classification, SVM works by constructing a hyperplane, or dividing surface, that separates two different classes of data with a maximum margin [73], [74]. For regression, SVM is used to predict continuous values based on the given data. In this study, an SVM model of the epsilon regression (eps-regression) type is used. The SVM kernel used is a radial kernel, which maps the data to a higher dimensional space to facilitate separation [173], [174]. The cost parameter controls the trade-off between the margin and the prediction error, while the gamma parameter controls how much influence each data sample has in the formation of the hyperplane [175], [176]. The epsilon parameter determines the level of error tolerance in prediction. In this study, the SVM model with predetermined parameters provided predictions for the UV index value in the test data with an average of 1.9695.
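The roles of gamma and epsilon can be illustrated with a small Python sketch. It is not the fitted model from the study. It shows the standard radial (RBF) kernel and the epsilon-insensitive loss used in epsilon regression, with the reported gamma = 0.2 and epsilon = 0.1 as defaults. The example inputs are hypothetical.

```python
import math

def rbf_kernel(a, b, gamma=0.2):
    """Radial kernel: similarity decays with squared distance, at a rate set by gamma."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def epsilon_insensitive_loss(actual, predicted, epsilon=0.1):
    """Errors smaller than epsilon cost nothing; larger errors are penalized linearly."""
    return max(0.0, abs(actual - predicted) - epsilon)

# Hypothetical feature vectors (for example UVA and UVB) and UV Index values.
near = rbf_kernel((3.0, 0.40), (3.1, 0.42))
far = rbf_kernel((3.0, 0.40), (6.0, 0.80))
inside_tube = epsilon_insensitive_loss(2.00, 2.05)
outside_tube = epsilon_insensitive_loss(2.00, 2.30)
print(near, far, inside_tube, outside_tube)
```

A larger gamma makes the kernel value fall off faster with distance, so each sample influences a smaller neighborhood, while a larger epsilon widens the band of errors that the model ignores.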

Figure 8. SVM Prediction
In the analysis, a graph is used in which the x-axis represents the actual values of the test data and the y-axis represents the values predicted by the SVM model for the same data. Each point on the graph represents one test observation, with its position reflecting the actual and predicted values, respectively. The points are colored blue to highlight the relationship between the actual and predicted values. The graph includes two reference lines, a red diagonal line and a dashed line. The red diagonal line shows where the points would lie if the actual and predicted values were identical. The dashed line serves as an additional point of comparison. If the points are close to the diagonal line, it indicates that the SVM model performed well in predicting the actual values. Conversely, if the points are widely scattered, it indicates a significant difference between the actual values and the values predicted by the SVM model.

K-Means clustering analysis
The study used the k-means clustering method to analyze the UV Index, UVA, and UVB data, which resulted in the formation of six distinct groups. The analysis revealed a varied distribution of the data across the groups. Specifically, Group 1 consists of 437 data points, Group 2 of 180, Group 3 of 784, Group 4 of 1043, Group 5 of 972, and Group 6 of 603. The k-means output includes several components, namely cluster, centers, totss (total sum of squares), withinss (sum of squares within each cluster), tot.withinss (total sum of squares within clusters), betweenss (sum of squares between clusters), size (size of each group), iter (number of iterations), and ifault (exit condition of the algorithm). The k-means clustering method groups data based on similarity or distance [177], [178]. The process starts with the random selection of an initial center for each group. Each data point is then assigned a group label based on its proximity to the nearest center. The group centers are recalculated by averaging the data points within each group. This iterative process continues until the group centers become stable or show minimal change. The method grouped the UV Index, UVA, and UVB data into six groups with different distributions. The volume of data within each group provides an understanding of the relative size of the groups. The components listed above provide information on the sum of squares within each group, the total sum of squares within groups, the sum of squares between groups, the size of each group, and other metrics that facilitate a comprehensive understanding of the clustering results.
The graph visualizes the results of the k-means clustering analysis on the UV Index, UVA, and UVB data. The x-axis represents the UV Index, while the y-axis shows the UVA value (μW/cm²). Each point on the graph corresponds to one observation, with its position determined by the respective UV Index and UVA values. The points are colored according to the groups identified by the k-means analysis, as indicated by the color legend to the right of the graph. For example, a blue point belongs to the blue cluster. The graph shows the grouping of the data into six different clusters, allowing the observation of patterns and relationships between the UV Index and UVA values within each cluster.
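The assign-and-update loop described above can be sketched in Python. This is a minimal illustration, not the routine used in the study. The initial centers are fixed instead of random so that the result is reproducible, two clusters are used instead of six, and the (UV Index, UVA) points are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, max_iter=100):
    """Alternate between assigning points to the nearest center and re-averaging centers."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            dists = [squared_distance(p, c) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # centers are stable, so the algorithm has converged
            break
        centers = new_centers
    return centers, clusters

# Hypothetical toy observations: (UV Index, UVA).
points = [(1.0, 2.0), (1.2, 2.2), (0.8, 1.8), (3.0, 6.0), (3.2, 6.2), (2.8, 5.8)]
centers, clusters = kmeans(points, [(1.0, 2.0), (3.0, 6.0)])

sizes = [len(members) for members in clusters]
withinss = [sum(squared_distance(p, c) for p in members)
            for members, c in zip(clusters, centers)]
print(centers, sizes, withinss, sum(withinss))
```

The lists `sizes` and `withinss` and the sum of `withinss` play the roles of the size, withinss, and tot.withinss components described in the text.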

Discussion
After conducting various statistical analyses and modeling using various methods, this study revealed some significant findings regarding the relationship between the UV Index and measured ultraviolet (UV) radiation levels.
Analysis using ANOVA showed a significant relationship between the UV Index and Ultraviolet A (UVA) radiation measurements. The ANOVA results showed that variations in the UV Index significantly explained variations in UVA levels, with high F values and very low p values (<2e-16). This indicates that changes in the UV Index substantially affect the level of UVA radiation detected at the surface. The analysis showed that the ANOVA model had good accuracy in predicting UVA values based on the UV Index, as indicated by the low residual standard error. ANOVA analysis of Ultraviolet B (UVB) radiation measurements yielded similar findings. The analysis showed that significant differences in UVB levels existed between the groups formed by the UV Index variable. The high F value and very low p value confirmed that the UV Index significantly influenced the variation in UVB levels. The results support the use of the UV Index as a robust indicator for predicting UVB radiation levels at the surface.
Classification analysis using Naive Bayes shows variation in predicted values based on the trained model [45], [120]. The model uses machine learning algorithms to identify patterns in the training data and applies those patterns to predict target values in the test data [179], [180]. The visualization and summary statistics of the predictions give an idea of the accuracy of the model in predicting the UV Index values.
Analysis using decision trees and neural networks demonstrated effective approaches in classifying and predicting UV Index values based on relevant features in the data. The decision tree model provides a deep understanding of how the measured features affect the UV Index values, while the neural network demonstrates a complex model structure with the ability to predict the UV Index values well, albeit with variation in the observed predictions [181].
Analysis using Support Vector Machine (SVM) and k-means clustering highlighted the methods' ability to model and cluster data based on the UV Index and associated UV measurements. SVM was successful in predicting UV Index values with a high degree of accuracy, while k-means clustering effectively clustered the data into different groups based on their similarity in the context of UV Index and UV measurements.

V. Conclusion
Research theoretically confirmed that the UV Index is an effective indicator in predicting and understanding ultraviolet (UV) radiation levels at the surface.The statistical analyses performed, including ANOVA, Naive Bayes, decision trees, neural networks, SVM, and k-means clustering, consistently showed significant relationships between the UV Index and UV A and UV B measurements.These results provide strong support for the use of the UV Index as an important tool in monitoring and predicting UV exposure.Practically, the research has important implications in the context of public policy and public health.The use of the UV Index can help in informing the public about the risk of high UV exposure, which can contribute to health prevention efforts such as the use of sun protection and planning outdoor activities.The findings also provide a solid basis for the development of early warning systems and more effective UV protection strategies.
Future research should extend the study in several directions. First, further development of UV Index prediction models is needed to improve the accuracy and precision of predicted UV radiation levels. Integrating more features and more advanced modeling techniques can yield models that are more reliable and better suited to varying environmental conditions. Second, temporal variations in the UV Index and UV radiation should be investigated in a broader context, including seasonal changes and long-term trends. Such studies would provide a deeper understanding of how weather and climate dynamics contribute to UV exposure levels in different regions. Third, environmental effects such as clouds and pollution should be analyzed further. These factors can significantly affect the distribution and intensity of UV radiation at the surface, so understanding their interactions will aid the development of more effective mitigation and early warning strategies. Finally, the long-term impacts of UV exposure on public health are an important area for further exploration. Such studies could evaluate the risk of skin cancer and other impacts related to UV exposure, providing a basis for formulating sustainable public health policies.
… The algorithm works by dividing the …

Ervianto, et al. Assessing the Efficacy of the UV Index in Predicting … p-ISSN: 2621-3761 e-ISSN: 2621-2889

… measurement. As for the Residual, the Degrees of Freedom are 2811, which is the number of degrees of freedom remaining after accounting for the variability explained by the UV Index. The standard error of the Residual describes how far the actual UV light measurement values tend to deviate from the values estimated by the model. A lower standard error indicates a more precise model.

Figure 1. Analysis graph for ANOVA test of SFC UVA against UV Index

Figure 3. ANOVA test analysis graph of UVA against UVB

Figure 7. Neural network plot analysis results

Figure 9. Clustering of UV data

Indonesian Review of Physics (IRIP) Vol. 6, No. 2, December 2023, pp. 99-121

… >F)) were calculated to assess the significance of the effect of the UV Index variable on the UVA variable. The study extended this ANOVA test by testing the effect of the UV Index variable on the UVB variable. The test results are presented in summary form, which includes the degrees of freedom (Df), sum of squares (Sum Sq), and mean square (Mean Sq) for each term in the model.
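The ANOVA summary columns named here (Df, Sum Sq, Mean Sq) can be computed as in the sketch below. It assumes the UV Index enters the model as a single numeric predictor in a simple linear regression, which the text does not state explicitly, and it uses invented readings; the function name `regression_anova` is hypothetical.

```python
def regression_anova(x, y):
    """ANOVA table for a simple linear regression of y on x (assumed model form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]

    # Variability explained by the predictor and variability left in the residuals.
    ss_model = sum((fi - mean_y) ** 2 for fi in fitted)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    df_model, df_resid = 1, n - 2
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    return {
        "Df": (df_model, df_resid),
        "Sum Sq": (ss_model, ss_resid),
        "Mean Sq": (ms_model, ms_resid),
        "F value": ms_model / ms_resid,
        "Residual standard error": ms_resid ** 0.5,
    }


# Hypothetical UV Index and UVA readings, invented for illustration only.
uv_index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
uva = [5.1, 9.8, 15.2, 19.9, 25.1, 29.9]

table = regression_anova(uv_index, uva)
for name, value in table.items():
    print(name, value)
```

Each tuple lists the predictor term first and the residual second. The mean square is the sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares.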