Machine Learning in Cyber Security
It is almost impossible for a data analyst to look at a time-series chart of network traffic and draw a reliable conclusion from it. Even an analyst who can do so cannot keep it up 24x7, and too many false alerts lead analysts to simply ignore some of what they are seeing.
Machine learning makes it possible to analyze very large data sets with many dimensions, which supports a variety of security use cases: blocking attackers who try to steal company data, stopping the hijacking of computers, and more.
K-means clustering
We use the k-means clustering algorithm, which partitions data points into k groups (clusters) using any number of features (dimensions). It is an unsupervised model because there are no labels, only features. We therefore do not train it on labeled examples, because there is no target to predict. Instead, we observe how the data groups together.
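A minimal sketch of the idea in plain Python (Lloyd's algorithm, no external libraries). The two per-host features (connections per minute and megabytes sent per minute) and the sample values are hypothetical, chosen only to show that k-means groups unlabeled points without being told what any of them are.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on unlabeled points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical per-host features: [connections per minute, MB sent per minute].
traffic = [
    [12, 0.5], [15, 0.7], [11, 0.4], [14, 0.6],  # typical workstations
    [240, 35.0], [260, 38.0], [250, 36.5],       # unusually heavy senders
]
centroids, labels = kmeans(traffic, k=2)
print(labels)
```

On this toy data the four low-volume hosts share one label and the three heavy senders share the other; which group gets label 0 depends on the random initialisation. Real features would normally be scaled first, because k-means uses Euclidean distance and a feature with large values would otherwise dominate.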
Traditional security systems such as Intrusion Detection Systems (IDS) cannot process such rapidly growing volumes of data in real time. Cybersecurity analytics offers an alternative: it applies big data analytics techniques to provide a faster, scalable framework for handling large amounts of security-related data in real time.
K-means clustering is one of the most commonly used clustering algorithms in cybersecurity analytics. It divides security-related data into groups of similar entities, which can yield important insights into both known and unknown attack patterns and lets a security analyst focus the analysis on the data in specific clusters only. To improve performance, k-means can use the triangle inequality to skip many point-to-center distance computations without changing the clustering result.
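The triangle-inequality shortcut can be sketched as follows. It relies on one lemma used in Elkan's accelerated k-means: if the distance between centres b and c is at least twice the distance from point x to b, then c cannot be closer to x than b. This is a simplified sketch of a single assignment step under that lemma, not the full algorithm (which also keeps per-point distance bounds across iterations), and the points and centres are made up.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids):
    """One k-means assignment step that skips provably useless distance checks.

    Lemma (from Elkan's accelerated k-means): if d(b, c) >= 2 * d(x, b), then
    d(x, c) >= d(x, b), so centre c cannot be closer to x than centre b.
    Returns (labels, number_of_skipped_distance_computations).
    """
    k = len(centroids)
    # Centre-to-centre distances are computed once per step, not once per point.
    cc = [[euclidean(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, euclidean(x, centroids[0])
        for j in range(1, k):
            if cc[best][j] >= 2 * best_d:
                skipped += 1  # pruned: by the lemma, centre j cannot beat `best`
                continue
            d = euclidean(x, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def assign_brute_force(points, centroids):
    """Reference assignment that computes every point-to-centre distance."""
    return [
        min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
        for p in points
    ]


# Made-up 2-D points and centres, for illustration only.
points = [[1.0, 1.0], [1.5, 0.5], [0.5, 1.2], [9.0, 9.0], [9.5, 8.5],
          [20.0, 1.0], [19.0, 2.0]]
centroids = [[1.0, 1.0], [9.0, 9.0], [20.0, 1.0]]

labels, skipped = assign_with_pruning(points, centroids)
assert labels == assign_brute_force(points, centroids)  # same result as brute force
print(labels, skipped)
```

On this toy data the pruned step produces the same assignment as the brute-force version while skipping 8 of the 21 point-to-centre distance computations. The saving grows when clusters are well separated, because more centres are ruled out early.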
Advantages of clustering for crime pattern analysis
- Analyze historical crime rates to improve the resolution rate of present-day crimes.
- Take action to prevent future incidents by applying preventive measures based on observed patterns.
- Reduce the training time for officers who are assigned to a new location and have no prior knowledge of site-specific crimes.
- Increase operational efficiency by optimally redeploying limited resources to the right places at the right times.
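As a hypothetical illustration of the resource-deployment point, the sketch below clusters incident locations (x/y coordinates in kilometres on an invented city grid) into hotspots and ranks them by incident count. It is a self-contained plain-Python k-means that swaps in deterministic farthest-first seeding instead of random initial centres, so every run gives the same result; the data and the `find_hotspots` helper are illustrative assumptions, not an established tool.

```python
import math
from collections import Counter


def dist(a, b):
    """Straight-line distance between two (x, y) locations."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    return [list(c) for c in centres]


def find_hotspots(incidents, k, max_iter=50):
    """Hypothetical helper: cluster incident locations with k-means and return
    (incident_count, centre) pairs, busiest hotspot first."""
    centres = farthest_first(incidents, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in incidents]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Move each hotspot centre to the mean location of its incidents.
        for j in range(k):
            members = [p for p, lbl in zip(incidents, labels) if lbl == j]
            if members:
                centres[j] = [sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members)]
    counts = Counter(labels)
    return sorted(((counts[j], centres[j]) for j in range(k)), reverse=True)


# Invented incident locations (km on a city grid) forming three rough hotspots.
incidents = [
    (2.0, 3.0), (2.2, 3.1), (1.9, 2.8), (2.1, 3.3), (2.3, 2.9),
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1),
    (5.0, 0.5), (5.1, 0.4),
]
for count, centre in find_hotspots(incidents, k=3):
    print(count, [round(v, 2) for v in centre])
```

Ranking purely by count is only one possible priority; adding the hour of each incident as another feature would be one way to address the "right times" part as well as the "right places".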
Limitations of crime pattern detection
- Crime pattern analysis can only assist detectives, not replace them. It is up to the human experts to interpret what the clusters are telling us.
- Data mining is sensitive to the quality of the input data, which can sometimes be inaccurate. Missing information can also cause errors.
- Mapping data mining attributes is a difficult task, so it requires both a skilled data miner and a crime data analyst with good domain knowledge.
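To illustrate the last two limitations, here is a hypothetical sketch of mapping raw crime records to numeric features before clustering. The field names (`crime_type`, `hour`) and the `encode_incidents` helper are assumptions. Records with a missing field are set aside instead of guessed, the categorical crime type is one-hot encoded, and the hour is placed on a circle so that 23:00 and 01:00 come out close together, a domain-specific choice that a plain 0 to 23 number would miss.

```python
import math


def encode_incidents(records, crime_types):
    """Hypothetical helper: map raw incident records to numeric feature vectors.

    Records with a missing or unknown field are set aside rather than guessed,
    since k-means needs a value for every feature of every point.
    Returns (features, rejected_records).
    """
    features, rejected = [], []
    for rec in records:
        if rec.get("crime_type") not in crime_types or rec.get("hour") is None:
            rejected.append(rec)
            continue
        # One-hot encode the categorical crime type.
        one_hot = [1.0 if rec["crime_type"] == t else 0.0 for t in crime_types]
        # Place the hour on a circle so 23:00 and 01:00 end up close together.
        angle = 2 * math.pi * rec["hour"] / 24
        features.append(one_hot + [math.cos(angle), math.sin(angle)])
    return features, rejected


# Invented records; the field names are assumptions for illustration.
records = [
    {"crime_type": "burglary", "hour": 23},
    {"crime_type": "burglary", "hour": 1},
    {"crime_type": "vehicle_theft", "hour": 14},
    {"crime_type": "burglary", "hour": None},  # missing time of incident
]
features, rejected = encode_incidents(records, ["burglary", "vehicle_theft"])
print(len(features), "encoded,", len(rejected), "set aside")
```

With a naive encoding, 23:00 and 01:00 would be 22 units apart; on the circle they are neighbours. Whether such a mapping is appropriate depends on the data, which is exactly where the crime data analyst's domain knowledge comes in.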