Identification of Information Security Threats Using Data Mining Approach in Campus Network

Implementing a comprehensive risk assessment in an organization is crucial for safeguarding valuable organizational assets and minimizing information security threats. Inadequate information security risk assessment may compromise the confidentiality, integrity, and availability of the information system through unauthorized access, particularly in the education domain. The objective of this paper is therefore to identify information security threats related to the University Information System. To this end, data from an intrusion prevention system (IPS) was collected from the campus network of a selected university. The collected data was analysed in Python, using Anaconda as the machine learning environment. The analysis of the university campus network data identified various types of information security threats, such as database-related attacks. The contribution of this research is to guide the network administrator in developing an appropriate incident response plan based on the threats identified in the risk assessment activity.


Introduction
Information security is critical for organizations seeking to protect their business information. Information security is defined as the protection of information, hardware, storage and network systems from vulnerabilities and threats (1). Implementing information security in an organization is challenging. The data inside the systems is valuable and critical to the business. As discussed by (2), the core principles of information security are Confidentiality, Integrity, and Availability (CIA), which form the basis of asset protection, authenticity, accountability, reliability, and non-repudiation. The authors also state that CIA constitutes the three major principles of information protection. Compromising these principles leaves systems at risk. (3) emphasize that colleges and universities are first among the targets in cyberspace because they contain resources with open access. Therefore, a study of the university information system (UIS) is needed. As mentioned by (4), by organizing knowledge of system behaviour, network administrators have a better chance of identifying the origins of risk events, whether they come from within (5). A study by (7) emphasized that implementing information security in higher education is difficult because of a lack of system integration and understanding across different departments and systems. Therefore, a systematic approach is needed to assess the risks in the UIS, so that the risks can be mitigated in time to protect the assets. University information security threats are also discussed in the report by (8). The author emphasizes that many university users do not understand basic information security threats. University users neglect the fact that a compromised system can be used to attack another system through a computer network. They also overlook password protection, which can lead to identity theft.
In a university network, campus users' awareness of the confidentiality of system and network content has to be increased. As discussed by (9), a threat model can help network administrators understand what kind of adversary they are securing the network against. Security is not only about developing technical solutions; users also have to understand which information is critical to the campus network and what is important to protect.
Higher education institutions are susceptible to cyber-attacks. Elements such as open networks, large volumes of data and freedom of public access expose them to a variety of cyber threats and risks.
According to (8), the Computer Emergency Response Team reported that the number of network incidents on the university campus in 2001 was 52,658, up from 21,756 the previous year. Threats to today's universities are increasing as a result of technological advancement (10). The authors also note that access technology on the university campus results in a vulnerable computing environment with more security threats. University campuses demonstrate this advancement by providing facilities such as extensive Wi-Fi support, online learning with lecture capture software, digital libraries, classroom virtualization and web conferencing. All these advancements make a university's computing environment particularly vulnerable because, in contrast to hacking targets such as banks, college and university computing environments are often large open networks. (11) state that, to define and identify critical infrastructures at the national level, a risk assessment can be conducted so that organizations can manage their own infrastructures and plan their decision-making. Figure 2 shows data breaches that affected the university information system. UIS threats are also discussed by (10), who explain that campus networks mainly suffer from the following groups of security threats:

1. Phishing, ransomware, and malware: Cybercriminals use emails or Web accounts that spoof official mailings for financial gain. Young university students are the most likely to fall victim to a phishing attack that results in malware or ransomware downloads.

2. Wi-Fi: Providing Wi-Fi access on the university campus is a welcome technological advancement, but it can cause security problems.

3. Viruses spreading through social media: Young adults at university are the most passionate users of social media such as Facebook, Twitter and YouTube. This implies that malware can spread through social media sites in the university's network.

4. Mobile devices: Students are early adopters of technology, and new devices are frequently seen on campus. From iPads to new Android phones, newly launched devices run upgraded versions of operating systems that a skilled attacker can easily infect and that can then infect the university's network.

5. Embedded devices: Embedded connectivity increases the risk of viruses and other threats to the network.

Therefore, it is important to conduct research on risk assessment in the university information system. It is also important to develop and implement proper security controls based on the results of internal risk assessment and vulnerability assessment. Both approaches can be used to improve efficiency in achieving the desired security levels.
Information security risk assessment is an ongoing process of discovering, correcting and preventing security problems. Risk assessment is part of a risk management process designed to provide appropriate levels of security for information systems. (13) stated that data is important in the risk assessment process for identifying and implementing decisions. Quantitative description of specific data using statistical models provides tools for translation and a method within the conventional risk assessment paradigm. Other researchers (14), who studied risk assessment using data analytics, stated that data analytics is a promising way to turn information into outcomes, enhance decision-making, make data-driven discoveries, minimize risk, and gain valuable insights that would otherwise remain hidden. The adoption of data analytics in business has a major impact on risk assessment. As discussed by (15), data analytics is a set of techniques for processing relevant data, which are to be tied to a suitable capability to deal with unexpected events and to provide the right support for risk management. Their research shows how data analytics is relevant to risk assessment as a tool for making accurate decisions when securing a campus network. (16) claimed that network infrastructure is secure. However, rather than wondering whether the infrastructure will be attacked by malicious actions, IT administrators have shifted towards trying to understand when an attack will happen and what its consequences will be, so that it can be prevented. In their study, (17) moved from unified risk assessment to personalized risk prediction, shifting the process from fixed information to relationships between human factors. They found that risk prediction helped them obtain a high-accuracy model for conducting a risk assessment.
In our research, we applied a data mining approach to analyse the data and assess the risk in the campus network. As mentioned by (18), data mining turns a large collection of data into knowledge. In data mining, researchers sort through massive data sets to identify patterns and establish relationships in order to solve problems through data analysis. We capture raw data from the IPS and analyse it into information that is useful to the network administrator. Figure 3 presents a general road map of pattern mining research (18). As mentioned by (18), most research studies mainly address three pattern mining aspects: the patterns mined, mining methodologies, and applications. Some studies, however, integrate multiple aspects; for example, different applications may need to mine different patterns, which naturally leads to the development of new mining methodologies. In our research, we integrate more than one of these aspects. According to (19), data mining techniques are capable of finding hidden patterns in secondary data held in large databases. This leads to the creation of a prediction model for the desired output. Data mining also helps in determining the accuracy of the algorithms. (20) explained that they used three layers of analysis before deciding on any plan for their network. Their aim was to develop a structured firewall analysis method that could be reused in other firewall projects and presented for use by others with similar challenges. With this approach, they learned the behaviour of attacks from their analysis, and the method has been useful in both their perimeter and their secondary data centre firewall projects.

Methodology
We conducted an experiment on IPS packet data. The data was collected from the IPS of a local university network in order to analyse the behaviour of attacks. It comprises about 5,000 threat log entries from January 2017 until December 2017. Python was used as the tool for the experiment, with Anaconda as the machine learning environment for the data analysis. Anaconda is an open-source distribution of the Python and R programming languages and a data science platform used in machine learning applications, which aims to simplify package management and deployment. During the experiment, we used libraries available in the Anaconda distribution such as pandas, matplotlib and seaborn, together with the Python standard library modules urllib.request, json, and socket. We used the tracking patterns technique, one of the most basic techniques in data mining, which consists of learning to recognize patterns in data sets. This usually means recognizing an abnormality in the data that occurs at regular intervals. In the following sections we present the output of the data analysis conducted on the IPS log. Figure 4 below shows the research procedure followed in this research.
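The tracking patterns step described above can be sketched with pandas as follows. This is only an illustration under stated assumptions, not the authors' actual script: the column names `timestamp` and `attack_name`, the sample rows, and the rule of flagging months above 1.5 times the median are all hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample standing in for an IPS log export; the column names
# ("timestamp", "attack_name") are assumptions, not the authors' schema.
SAMPLE_LOG = """timestamp,attack_name
2017-01-03 01:15:00,Attack A
2017-01-09 02:40:00,Attack B
2017-02-11 03:05:00,Attack A
2017-03-02 10:30:00,Attack A
2017-03-05 01:10:00,Attack A
2017-03-07 02:20:00,Attack B
2017-03-21 04:45:00,Attack A
"""

def load_ips_log(source):
    """Read an IPS log export into a DataFrame with parsed timestamps."""
    return pd.read_csv(source, parse_dates=["timestamp"])

def monthly_counts(log):
    """Count logged events per calendar month."""
    months = log["timestamp"].dt.to_period("M")
    return log.groupby(months).size()

def unusual_months(counts, factor=1.5):
    """Return months whose event count exceeds `factor` times the median.

    The threshold rule is an assumed, deliberately simple stand-in for
    'recognizing an abnormality'; it is not taken from the paper.
    """
    return counts[counts > factor * counts.median()]

log = load_ips_log(io.StringIO(SAMPLE_LOG))
counts = monthly_counts(log)
print(counts)
print(unusual_months(counts))
```

With a real export, `load_ips_log` would be given a file path instead of the in-memory sample, and the threshold would need tuning to the volume of the log.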

Attack Analysis
From the attacks captured by the IPS, we identified the attacks directed at the university network. The university network hosts several applications: a financial portal, staff portal, leave portal, e-mail system, complaints system, facilities system, support system, student information management system, student affairs system, library system and learning management system. We analysed packets from the network. Below is the list of the top 15 attacks captured by the university IPS. The highest number of attacks targets the Linux Kernel nfsd. This is a remote denial-of-service vulnerability that exists in the Linux Kernel. The vulnerability is due to an implementation flaw that may result in a buffer overflow in the NFS subsystem of the Linux Kernel. The first 7 attacks in the graph are database-related attacks, which targeted both MySQL and Oracle databases. This local university network uses MySQL and Oracle databases to store its users' data.
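A ranking such as the top 15 list can be derived by counting how often each attack signature appears in the log. The sketch below assumes a hypothetical `attack_name` column and uses placeholder signature names rather than the signatures observed in the study.

```python
import pandas as pd

def top_attacks(log, n=15, column="attack_name"):
    """Return the n most frequent attack signatures with their hit counts."""
    return log[column].value_counts().head(n)

# Placeholder records; real signature names would come from the IPS log.
log = pd.DataFrame({
    "attack_name": [
        "Signature A", "Signature B", "Signature A",
        "Signature C", "Signature A", "Signature B",
    ]
})

ranking = top_attacks(log, n=2)
print(ranking)
```

Because `value_counts` sorts in descending order, the first rows are the most frequent signatures; calling `ranking.plot(kind="barh")` would draw them as a horizontal bar chart when matplotlib is installed.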

Attack Category Analysis
All of the captured attacks are of high or medium severity to the university network, and most of them attempt to exploit the network. An exploit takes advantage of a weakness in an operating system, application or any other software code, including application plug-ins or software libraries.
Exploits ultimately rely on errors in the software development process that leave holes in the software's built-in security, which cybercriminals can then use to access the software and, by extension, the entire computer. Exploits are commonly classified according to the type of vulnerability they exploit, such as zero-day, DoS, spoofing and Cross-Site Scripting (XSS). The graph also shows that most of the suspicious packets captured by the IPS fall into the code-execution group. Code execution vulnerabilities occur where the output or content served from a Web application can be manipulated in such a way that it triggers server-side code execution. In some poorly written web applications that allow users to modify server-side files, such as by posting to a message board or guestbook, it is sometimes possible to inject code in the scripting language of the application itself.

Fig. 5 Categories of Attack
Figure 5 shows graphs of the categories describing attacks in the campus network. The attacks are examined at 3 levels: severity, attack category and attack subcategory. The first graph shows the severity level of the attacks based on the policy set up in the university environment. The second graph shows that exploit is the largest attack category in the campus network, and the last graph shows the attack subcategories, of which code execution contributes the most in this campus network.
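The three breakdowns plotted in Figure 5 can be computed by taking the relative frequency of each value at each level. The following is a minimal sketch; the column names `severity`, `category` and `subcategory` and the sample rows are assumptions for illustration only.

```python
import pandas as pd

def category_breakdown(log, columns=("severity", "category", "subcategory")):
    """Map each column to the percentage share of events per value."""
    return {
        col: (log[col].value_counts(normalize=True) * 100).round(1)
        for col in columns
    }

# Illustrative rows only; the values are not taken from the study's data.
log = pd.DataFrame({
    "severity": ["high", "high", "medium", "high"],
    "category": ["exploit", "exploit", "dos", "exploit"],
    "subcategory": ["code-execution", "code-execution", "flood", "buffer-overflow"],
})

breakdown = category_breakdown(log)
for level, shares in breakdown.items():
    print(level)
    print(shares)
```

Each entry of `breakdown` is a pandas Series sorted from the largest share to the smallest, so the dominant value at each level is its first row.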

Network Port Analysis
We analysed the network ports through which attackers attempted to reach the server. The attacks were attempted through ports 443 and 3306. Port 443 is usually associated with TLS (HTTPS), and port 3306 with the MySQL/MariaDB database. The campus network uses Secure Sockets Layer (SSL) and Transport Layer Security (TLS) to encrypt its internet communications. These encryption protocols are used to ensure privacy and data integrity. Unfortunately, the encryption protocols secure all application data, whether it is legitimate or malicious. While the university encrypts network traffic to protect data from potential attacks or exposure, attackers use SSL/TLS encryption to hide their malicious activities. MySQL is the world's most popular open-source database system and MariaDB is the world's fastest growing open-source database system. A common attack on port 3306 is brute forcing the root password of the MySQL database.
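A port analysis of this kind amounts to counting events per destination port and labelling the ports with the services usually bound to them. The sketch below assumes a hypothetical `dest_port` column and a small hand-written lookup table covering only the two ports named in the text; it does not reproduce the authors' code.

```python
import pandas as pd

# Assumed lookup limited to the ports discussed in the text; any other
# port is reported as "unknown" rather than guessed.
KNOWN_SERVICES = {443: "HTTPS (TLS)", 3306: "MySQL/MariaDB"}

def port_summary(log, column="dest_port"):
    """Count events per destination port and attach a service label."""
    counts = log[column].value_counts()
    summary = counts.rename_axis("port").reset_index(name="hits")
    summary["service"] = summary["port"].map(KNOWN_SERVICES).fillna("unknown")
    return summary

# Illustrative destination ports only.
log = pd.DataFrame({"dest_port": [443, 3306, 443, 3306, 443, 8080]})

summary = port_summary(log)
print(summary)
```

A fixed dictionary keeps the result independent of the local services database; `socket.getservbyport` from the standard library is an alternative, but its answers depend on the host system.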

Time Analysis
In this section, we discuss the timing of attacks throughout 2017. Figure 6 shows the distribution of attacks over the 24 hours of the day. From the graph, we can conclude that attacks mostly happen from 12 midnight to 6 in the morning. The university network is not active at this time. Attackers prefer to operate during this period to avoid being noticed by the network administrator. Another period in which attackers target the university's network is from 7 in the morning to 12 noon. During this period, insider users may be potential attackers contributing to the number of attacks. From this graph, we can see that attacks happen during the semester lecture period. One reason for the high frequency in this period is that insider users attack, or that attackers spoof legitimate campus users. An insider attack is a malicious attack committed on a network or computer system by a person with authorized system access. Insiders who perform attacks have a distinct advantage over external attackers because they have authorized system access and are familiar with the network architecture and with system policies and procedures. In addition, there is less security against insider attacks because organizations focus on protection from external attacks. Among the top five attacks observed on the 4 different dates, attacks associated with Oracle and MySQL databases account for the highest number of attempts. We advised the administrator to apply software patches, since attackers exploit the vulnerabilities of these databases. We also found an attack called EsteemAudit with the highest number of hits. This attack targets the Windows Remote Desktop system on Windows XP and Windows Server 2003. It is possible because Microsoft has stopped releasing security updates for these two operating systems.
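The hourly distribution behind a graph like Figure 6 can be obtained by extracting the hour from each event timestamp and counting events per hour. The sketch below uses a hypothetical `timestamp` column and invented sample times; the two windows match the periods discussed above.

```python
import pandas as pd

def hourly_counts(log, column="timestamp"):
    """Count events for each hour of the day (0 to 23), including empty hours."""
    hours = pd.to_datetime(log[column]).dt.hour
    return hours.value_counts().reindex(range(24), fill_value=0)

def share_in_window(counts, start, end):
    """Fraction of all events whose hour h satisfies start <= h < end."""
    return counts.loc[start:end - 1].sum() / counts.sum()

# Invented timestamps for illustration; not taken from the study's log.
log = pd.DataFrame({
    "timestamp": [
        "2017-03-01 00:30:00", "2017-03-01 02:10:00", "2017-03-01 05:59:00",
        "2017-03-02 09:00:00", "2017-03-02 14:20:00",
    ]
})

counts = hourly_counts(log)
night_share = share_in_window(counts, 0, 6)     # midnight to 6 in the morning
morning_share = share_in_window(counts, 7, 12)  # 7 in the morning to noon
print(night_share, morning_share)
```

Reindexing to all 24 hours keeps quiet hours visible as zeros, which matters when the counts are plotted as a bar chart.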

Conclusions
A University Information System holds a massive amount of personal information, such as lecturer, student, staff and exam information, and the open nature of the higher education online learning environment also makes it a likely target for a security breach. For a network administrator, it is important to develop a comprehensive incident response plan by conducting a risk assessment activity. It is important to understand a university's potential risks. In this paper, we proposed using data from the IPS as one of the steps in creating a library of risks. After narrowing down the list of threats, we used machine learning software to take the next step of data mining and data analysis. By examining the data from the IPS, we used a machine learning tool to track data patterns and analyse trends that point to new or emerging risks.
Risk assessment is an important part of a risk management process for securing information systems. However, protecting the network system has become one of the challenges for organizations, especially with the increase of cybercrime in the university environment. Institutions are exposed to cyberattacks, and this becomes a source of financial liability for universities. It is therefore important to analyse the current security posture of the campus network.