Thesis Details

Efficient and Robust feature selection technique for detecting DDoS & Port Scan Traffic

Abed Naboulsi

Submission Year : 2020

Abstract

The rapidly growing demand for protecting and securing systems against attack and exploitation requires dedicated, efficient, and effective software and hardware to counter, contain, and prevent attacks. Intrusion Detection Systems (IDS) are intended to find attacks, remove their impact, and lower the likelihood of future attacks. An IDS is the core of any security system and plays an increasingly essential role in preventing sophisticated and ever-increasing malicious activity. Among the key aspects of machine learning, dimensionality reduction is a crucial step in managing the "curse of dimensionality", as Bellman called it. Feature extraction and feature selection are the most widely used dimensionality reduction techniques. The objective of feature selection is to minimize redundancy and maximize feature relevance, resulting in improved accuracy, reduced risk of overfitting, reduced testing time, improved data visualization, and improved explainability of the model. In addition, it has been shown statistically that each specific machine learning task has an optimal number of features. If more variables are added than strictly necessary, the performance of the model actually decreases because of the added noise. Determining the optimal number of variables is the real challenge. Our objective in this research is to propose a powerful feature selection technique capable of detecting and classifying attacks, particularly when dealing with big, real-time, and dynamic data. We also aim to determine the best metrics for evaluating model performance, in terms of both detection performance and computational performance, and to develop a feature selection strategy and propose a methodology for future research.
In our experiment we conduct a standard supervised classification task using the labels provided in the CICIDS2017 DDoS and Port Scan datasets: the model is trained to predict the labels from the features chosen by a feature selection process. We first calculate statistics and plot figures to find patterns, anomalies, trends, and relationships in the data. We then extract the most important and relevant features and evaluate them using a random forest machine learning model on the DDoS dataset and on the Port Scan dataset. The model's results are then compared with some results from a previous similar experiment. The proposed model achieves high scores on detection metrics, with a small number of the most relevant and important features identified and evaluated. The proposed model significantly improved overall model performance.
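
The goal stated above, keeping relevant features while discarding redundant ones, can be illustrated with a minimal sketch. This is not the thesis's method: the thesis evaluates features with a random forest on CICIDS2017, whereas the sketch below swaps in a simple correlation-based filter and runs on synthetic data. The feature names ("strong", "strong_copy", "moderate", "noise"), the thresholds, and the helper functions are all hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(vx * vy)


def select_features(columns, labels, min_relevance=0.2, max_redundancy=0.9):
    """Greedy filter: keep relevant features, skip near-duplicates of kept ones.

    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            break  # every remaining feature is even less relevant
        redundant = any(
            abs(pearson(columns[name], columns[kept])) > max_redundancy
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected, relevance


def make_synthetic_data(n=500, seed=0):
    """Synthetic stand-in for labelled traffic (0 = benign, 1 = attack)."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    strong = [y + rng.gauss(0, 0.3) for y in labels]
    columns = {
        "strong": strong,                                          # tracks the label
        "strong_copy": [x + rng.gauss(0, 0.05) for x in strong],   # near-duplicate
        "moderate": [-y + rng.gauss(0, 0.6) for y in labels],      # weaker signal
        "noise": [rng.gauss(0, 1) for _ in labels],                # unrelated
    }
    return columns, labels


columns, labels = make_synthetic_data()
selected, relevance = select_features(columns, labels)
print("selected:", selected)
```

On this synthetic data the filter keeps one of the two near-duplicate columns plus the moderately informative one, and drops the noise column. A model-based ranking such as random forest importance, as used in the thesis, would replace the correlation score with importances learned from the trained model.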
