Efficient and Robust Feature Selection Technique for Detecting DDoS & Port Scan Traffic
The rapidly growing demand for protecting and securing systems
against attack and exploitation requires dedicated, efficient, and
effective software and hardware to counter, contain, and
prevent attacks. Intrusion Detection Systems (IDS) are intended to
detect attacks, remove their impact, and lower the likelihood of
future attacks. IDS are at the core of any security system, where
they play an increasingly essential role in preventing high-level,
sophisticated, and ever-increasing malicious activities. Among the
key aspects of machine learning, dimensionality reduction
is a crucial step in managing the "curse of dimensionality", as
Bellman called it. Feature extraction and feature selection are
the most widely used dimensionality reduction techniques. The objective
of feature selection is to minimize redundancy and maximize
feature relevance, resulting in improved accuracy, reduced risk
of overfitting, reduced testing time, improved data visualization, and
improved explainability of the model. In addition, it has been shown
statistically that there is an optimal number of features for
each specific machine learning task. If more variables are added
than strictly necessary, model performance actually decreases
because of the added noise. Determining
the optimal number of variables is the real challenge. Our objective
in this research is threefold: to propose a powerful feature selection
technique capable of detecting and classifying attacks, particularly
when dealing with big, real-time, and dynamic data; to determine
the best metrics for evaluating model performance in terms of both
detection performance and computational performance; and
to develop a feature selection strategy and propose a methodology for
future research. In our experiment, we conduct a standard supervised
classification task on the CICIDS2017 DDoS and Port Scan datasets:
the labels provided in the training data are used to train the model
to predict labels from the features chosen by a feature selection
process. We first compute statistics and plot
figures to explore the data and find patterns, anomalies, trends, and
relationships within it. We then extract the most
important and relevant features and evaluate them using a random
forest machine learning model on the DDoS dataset and on the Port
Scan dataset. The model results are then compared with those of previous
similar experiments. The proposed model achieves high
scores on detection metrics using only a small number of the most
relevant and important features, which have been identified and evaluated.
The proposed approach significantly improved overall
model performance.
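
As a rough illustration of the workflow summarized above, the following minimal sketch ranks features, keeps a small subset, and compares a random forest trained on all features against one trained on the subset, reporting both detection metrics and timings. It is not the method of this work: the abstract does not name the selection criterion, so impurity-based random forest importance is assumed here purely for illustration. The data is synthetic (generated with scikit-learn's make_classification) rather than CICIDS2017, and the names fit_and_score and TOP_K, as well as the value of TOP_K, are hypothetical.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for a labeled traffic table (benign vs. attack).
# It is NOT the CICIDS2017 data; only a few of the 40 columns are informative.
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=6, n_redundant=4,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def fit_and_score(columns):
    """Train a random forest on the given columns; return it with its metrics."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)

    start = time.perf_counter()
    model.fit(X_train[:, columns], y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test[:, columns])
    test_seconds = time.perf_counter() - start

    report = {
        "n_features": len(columns),
        "accuracy": accuracy_score(y_test, predictions),  # detection metric
        "f1": f1_score(y_test, predictions),              # detection metric
        "train_seconds": train_seconds,                   # computational metric
        "test_seconds": test_seconds,                     # computational metric
    }
    return model, report


# Step 1: fit on all features and rank them by impurity-based importance.
# ASSUMPTION: this ranking criterion is chosen only for illustration.
all_columns = np.arange(X.shape[1])
full_model, full_report = fit_and_score(all_columns)
ranking = np.argsort(full_model.feature_importances_)[::-1]

# Step 2: keep the TOP_K highest-ranked features (hypothetical cutoff).
TOP_K = 8
selected = np.sort(ranking[:TOP_K])

# Step 3: retrain on the reduced feature set and compare both reports.
_, reduced_report = fit_and_score(selected)

print("all features:     ", full_report)
print("selected features:", reduced_report)
```

In a sketch like this, the cutoff TOP_K would normally be tuned, for example by repeating steps 2 and 3 for several values and keeping the smallest subset whose detection scores remain acceptable. The ranking is computed on the training split only, so the test split stays unseen during selection.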