Machine Learning and Sustainable Mobility: The Case of the University of Foggia (Italy)

Increasingly sophisticated machine-learning techniques make it possible to improve predictions of a given phenomenon. In this paper, after analyzing data on the mobility habits of University of Foggia (UniFG) community members and determining their emissions of pollutants, we applied machine-learning techniques to these data to estimate the quantities of pollutants produced (in a given time period) by new subjects not present in the data sets, using very little information. In this way, we developed a method that the university could apply to inform new students of their likely pollutant emissions in the near future, based on several easily obtainable features. This method could allow the UniFG Rectorate to improve its sustainable mobility policies by encouraging the use of transport modes that are as appropriate as possible to the users’ needs. In addition, because it requires little information, the method can be used by any public or private organization outside the academic environment.


Introduction
The problem of sustainable mobility choices involves more and more public and private organizations. As defined by Banister (2008), implementing the sustainable-mobility paradigm requires intervention by a series of people who can develop policies capable of reducing car dependence.
Sustainable mobility plays a fundamental role (Jeon and Amekudzi 2005) in community development, from economic, environmental, and social points of view. In particular, a sustainable transport system should guarantee efficient travel to all users and an ecological means of transport while maintaining certain levels of economy and public health (Weichenthal et al.).
Decision-makers' choices recently have been oriented toward reducing pollutant emissions in the most congested urban areas by introducing green vehicles such as bicycles or scooters; however, these means of transport represent only a part of cities' environmental development (Simons et al. 2014). The decisions taken include the introduction of car-sharing systems, integrated transport systems, bike-sharing systems, bus prioritization, and free public transport.
The objective of this paper is to develop a system for evaluating the mobility choices of community members (in this specific case, the members of the University of Foggia) based on the distance covered and the types of vehicles used. Such a system would allow each member to know whether he or she is respecting the objectives set by the community policy-maker and, consequently, to try to meet the established parameters. In the particular case in question, we develop a system based on machine-learning (ML) algorithms with which to evaluate how sustainable UniFG members' means of transport are and to deliver the evaluation directly through a green/red sticker in the management software used by the university.
The present paper is structured as follows. In the following section, the variables present in the data set collected from the academic community are described, and the most important features are analyzed through factor analysis and machine learning. Section 3 presents a possible application of these features to determine the emission levels of each UniFG member. In Section 4, some conclusions are drawn.

Application of Machine Learning
Researchers recently have used ML algorithms to solve many problems, ranging from the industrial to the economic spheres. Within the world of sustainable mobility, several authors have proposed models whose objectives range from determining the shortest route with a means of transport to determining which means to choose based on certain variables (called features). For example, using a Random ...
In this paper, we use ML algorithms to predict some of the missing data necessary to estimate the emissions of UniFG members. In particular, this method allows us to exploit data that are easily obtainable through a very short questionnaire in order to estimate other missing data, which would be more difficult to obtain for each community member in practice.

Emissions description
The first step in understanding the levels of total annual atmospheric emissions among UniFG members is to consider the largest possible number of variables detected during the survey (Cappelletti et al. 2021). In particular, we will consider the following parameters: the distance in kilometers that the respondents claim to travel; the frequency at which the respondents go to their reference facility per week; whether it is the hot or cold season, which affects both the number of weeks of activity and the modes of travel; and the mode of transport used. Our goal was to understand which variables to use (reducing their number) so that the UniFG can improve its mobility services while reducing emissions.
The first step of the analysis was to transform the categorical variables into numeric variables, which we did with LabelEncoder from sklearn (in Python). The analyses that follow require numerical, standardized data. We therefore used StandardScaler, also from sklearn, to eliminate the "force" linked to the scale of each variable, since we did not need to maintain the ordering between the points (the data were not ordinal). Analyses were conducted regarding the quantity of atmospheric emissions by the entire UniFG academic community. For this reason, trip duration from one's home to the department (Min) was eliminated a priori, as were the second, third, and fourth choices of alternative sustainable mobility solutions (Second-, Third-, and Fourth-choice rent), since we assumed that the first choice was the most representative. At this point, the data set comprised 16 features describing transportation by the academic community.
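The preprocessing described above can be illustrated with a minimal sketch. The column names and values below are hypothetical and do not come from the UniFG survey; only the use of LabelEncoder and StandardScaler follows the text.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical survey rows; column names and values are illustrative only.
survey = pd.DataFrame({
    "role": ["student", "professor", "staff", "student"],
    "transport": ["car", "bus", "bicycle", "car"],
    "distance_km": [12.0, 3.5, 1.2, 45.0],
})


def encode_and_scale(df, categorical_cols):
    """Label-encode the categorical columns, then standardize every column."""
    encoded = df.copy()
    for col in categorical_cols:
        # LabelEncoder maps each distinct category to an integer code.
        encoded[col] = LabelEncoder().fit_transform(encoded[col])
    # StandardScaler rescales each column to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(encoded.astype(float))
    return pd.DataFrame(scaled, columns=encoded.columns, index=encoded.index)


features = encode_and_scale(survey, ["role", "transport"])
```

After this step every column is numeric and on a comparable scale, which is the form assumed by the factor analysis and the classifiers discussed later.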
We first tried to reduce the dimensionality of the data set through factor analysis, considering the 16 features. This analysis also allowed us to highlight the links between the different features (described through the loadings). This factor analysis was carried out with IBM SPSS using principal component analysis. We obtained eight factors, which explained 79.14% of the total variance (with eigenvalues > 0.8). As we can see in Table 2 (which highlights the component matrix), the variables with the highest loadings (> 0.5) were related to the means of transport, the characteristics of these means, and the number of passengers, especially for the first factor. For the second factor, the variables with the highest loadings were related to the respondents' characteristics. In this way, we replaced the features with a reduced number of factors that describe the entire data set, while accepting a certain level of information loss. However, the factors resulting from factor analysis are difficult to interpret, especially for possible analyses directly linked to mobility systems.
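The retention rule used above (keeping the components whose eigenvalue exceeds 0.8 and reading the loadings) can be sketched in Python. This is not the SPSS procedure used in the paper: it substitutes sklearn's PCA, runs on synthetic data rather than the survey, and the function name retain_components is an assumption made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for 16 standardized features (NOT the UniFG data).
X = StandardScaler().fit_transform(rng.normal(size=(200, 16)))


def retain_components(X, eigenvalue_threshold=0.8):
    """Keep the principal components whose eigenvalue exceeds the threshold."""
    pca = PCA().fit(X)
    eigenvalues = pca.explained_variance_
    keep = eigenvalues > eigenvalue_threshold
    # Share of total variance explained by the retained components.
    explained = float(pca.explained_variance_ratio_[keep].sum())
    # Loadings: component weights scaled by the square root of the eigenvalue.
    loadings = pca.components_[keep].T * np.sqrt(eigenvalues[keep])
    return int(keep.sum()), explained, loadings


n_kept, explained, loadings = retain_components(X)
print(f"{n_kept} components retained, {explained:.1%} of variance explained")
```

On real survey data, the rows of `loadings` correspond to the original features, so entries with absolute value above 0.5 would indicate the variables most associated with each retained component.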
The main goals for the university are to understand mobility habits and to improve them, where possible, by moving toward sustainable means of transport. Given the data set's composition and the presence of variables of different types, we can carry out further analyses by manually selecting some variables of interest and applying ML techniques.

Analysis through Machine Learning
The data set includes variables that answer different questions about the type of respondent (such as the respondent's role and department), his or her transport habits, and his or her eating habits (lunch). Classification was the reference task for the analyses. Because these are multi-class classifications, we used logistic regression with the one-vs-rest (OvR) training algorithm in Python. In all cases, the training and test sets were divided into 70% and 30% portions, and a Stratified 10-fold Cross-Validation was performed to avoid over-fitting problems. The predictive analyses were as follows:
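The classification setup just described (OvR logistic regression, a 70%/30% split, and stratified 10-fold cross-validation) can be sketched as follows. The data are synthetic, and the pipeline, the random seeds, and the choice to run cross-validation on the training portion are assumptions made for illustration, not details reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the encoded survey features.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# 70% training / 30% test, preserving the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# One binary logistic regression per class (one-vs-rest).
model = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)

# Stratified 10-fold cross-validation on the training portion.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv)

# Final fit on the training set and accuracy on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"CV accuracy: {cv_scores.mean():.3f}, test accuracy: {test_accuracy:.3f}")
```

Comparing the mean cross-validation accuracy with the held-out test accuracy gives a simple check for over-fitting: a large gap between the two would suggest that the model does not generalize.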

Results
We combined the previous elements to try to improve sustainable mobility at UniFG. Using the previous results, we developed a system through which the university can determine the amount of CO2 equivalent for a given semester, both for new subjects at registration and for those already enrolled, based on small amounts of data entered in the registration system, as follows.
Information on age, gender, department, role, and residence for each subject can be acquired through the UniFG member management system (ESSE3 platform). However, the city of residence is not a functional variable, since a person residing in a certain city could decide to move to Foggia. One solution could be to periodically update the ESSE3 section relating to the domicile, in order to track how the quantity of emissions can vary by subject over a certain time period. In this way, because we know the department to which each subject belongs and his or her domicile, the distance between these two points represents the distance in km that the subject will travel with a certain vehicle. With regard to the means of transport used, if the person uses a means other than a car, the subject has chosen (whether out of necessity or by preference) a sustainable mobility system. On the other hand, if the subject uses a car, then we must calculate the expected emissions.
To obtain information on the use of the car, it is possible to add a very short questionnaire to ESSE3 with a single question, such as, "Do you use a car to go to the university? Yes/No." This question may be mandatory for new subjects wishing to enrol at the university but voluntary for those already enrolled, thus minimizing requests in order to guarantee a large number of answers. After obtaining information on car use, we could use the ML algorithms defined above to predict the vehicle's type of power supply and year of registration with good accuracy. In this way, we could determine the CO2-equivalent emissions produced by subjects who use a car simply as the product of the distance travelled and the emission value returned by GaBi for each of the EURO classes, to which we would assign the subjects according to the vehicle's year of registration and type of power supply. The only variable on which we must hypothesize (always with a view to avoiding burdening the questionnaire to be submitted through ESSE3) is the number of days on which the subject goes to the university. However, on the basis of the training data set, we can assume that subjects who live in Foggia tend to go to the university four times a week, on average, while for those who do not live in Foggia, the number of trips tends to decrease as the distance increases (we can assume that for a distance of up to 50 km, the number of trips is three times a week, versus twice a week for a distance over 50 km). We can determine the trips per semester as the product of the weekly trips and the number of weeks in a semester (known a priori) and multiply this number of trips by the previous CO2-equivalent value, to obtain the emissions expected in a semester.
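The calculation described above can be sketched in a few lines of Python. The emission factors below are hypothetical placeholders, not GaBi outputs; the function names, the 12-week semester in the example, and the choice to report zero for subjects who do not use a car are likewise assumptions made for illustration. The trip-frequency rule follows the text.

```python
# HYPOTHETICAL emission factors in kg CO2-eq per km, keyed by
# (power supply, EURO class). Real values would come from GaBi.
EMISSION_FACTORS = {
    ("petrol", "EURO 4"): 0.20,
    ("diesel", "EURO 5"): 0.18,
    ("petrol", "EURO 6"): 0.15,
}


def weekly_trips(lives_in_foggia, distance_km):
    """Assumed trips per week: 4 for residents of Foggia, otherwise
    3 up to 50 km and 2 beyond 50 km, as stated in the text."""
    if lives_in_foggia:
        return 4
    return 3 if distance_km <= 50 else 2


def semester_emissions(uses_car, lives_in_foggia, distance_km,
                       power_supply, euro_class, weeks_in_semester):
    """Expected kg CO2-eq per semester for one subject."""
    if not uses_car:
        # Assumption: non-car users are not assigned car emissions.
        return 0.0
    per_trip = distance_km * EMISSION_FACTORS[(power_supply, euro_class)]
    trips = weekly_trips(lives_in_foggia, distance_km) * weeks_in_semester
    return per_trip * trips


# Example: a commuter living 30 km away, petrol EURO 4 car, 12-week semester.
example = semester_emissions(True, False, 30.0, "petrol", "EURO 4", 12)
```

In a deployment, the power supply and EURO class would be the values predicted by the ML models rather than values supplied by the user, so the result is an estimate rather than a measurement.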
Through this calculation, the university could exploit the ESSE3 platform to sensitize members of the academic community to the use of alternatives to cars, where possible. In particular, after setting an upper bound on emissions, we could display a sticker upon access to ESSE3 that expresses the user's expected emissions value for the following semester: green if the emissions are below the upper bound (as shown in Figure 2), yellow if they are higher than the upper bound but within a certain deviation from this limit, and red if the expected emissions are much higher. This calculation of prospective emissions, albeit probabilistic, could allow the UniFG to sensitize its members to replacing their cars with more sustainable means of transportation. Furthermore, through the latest ML model, it also would be possible to predict the means of transportation used in the following semester (the summer semester, if the first use occurs in the winter one), so that these probabilistic forecasts can be improved by asking users seasonally whether they still use a car.
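The sticker rule can be expressed as a simple threshold function. The text does not specify the size of the allowed deviation, so the 10% tolerance and the upper bound of 150 kg used below are purely illustrative assumptions, as is the function name.

```python
def sticker_colour(expected_emissions, upper_bound, tolerance=0.10):
    """Map expected semester emissions to a sticker colour.

    `tolerance` is the assumed relative deviation above the upper
    bound that still yields a yellow sticker.
    """
    if expected_emissions <= upper_bound:
        return "green"
    if expected_emissions <= upper_bound * (1 + tolerance):
        return "yellow"
    return "red"


# Illustrative upper bound of 150 kg CO2-eq per semester.
for value in (100.0, 160.0, 200.0):
    print(value, sticker_colour(value, upper_bound=150.0))
```

Both the upper bound and the tolerance would be set by the policy-maker, so the function takes them as parameters rather than fixing them.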

Conclusions
In this paper, we have developed a system that will allow the UniFG to inform its members of the level of pollutants they will emit into the atmosphere, based on their transportation habits. After training an ML system, and by submitting a single question to the academic staff, it will be possible to inform each member of how his or her means of transport affects the environment. In this way, it will be possible to sensitize the members of the academic community to the use of sustainable means of transport and to direct the rectorate's choices toward new types of incentives. Such a mechanism is not limited to the UniFG; it could be extended to any type of public or private organization. It would be sufficient for the organization's management to submit a few simple questions to its members (in order to guarantee the greatest number of answers) to determine the respondents' transportation habits and emissions levels. With this knowledge, it would be possible to better regulate the incentive policies for using certain types of sustainable means of transportation.
Figure 1. Heat map of correlations.
Figure 2. Example of display of the equivalent CO2 level via the ESSE3 platform.