Project Details

Machine Learning

Development Tools:

The project was done using Python and Jupyter Notebook. Some libraries include, but are not limited to:

Project-related information:

For in-depth information on the complete project, click the "Visit" button above.

In this project I wrote my own code for nested cross-validation and the k-nearest neighbour (k-NN) algorithm, for building confusion matrices, and for computing distances between data samples.
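To give a rough idea of two of these pieces, here is a minimal sketch of a confusion matrix and of how nested cross-validation splits the data. It is not the project's actual code: the function names, the fold counts, and the assumption that class labels are integers starting at 0 are all illustrative choices of mine.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions: rows are true classes, columns are predicted classes.

    Assumes labels are integers in the range 0..n_classes-1 (an assumption
    made for this sketch).
    """
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


def nested_cv_splits(n_samples, outer_folds=5, inner_folds=3, seed=0):
    """Yield (train_idx, val_idx, test_idx) for every outer/inner fold pair.

    The fold counts and the seed are hypothetical defaults.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    outer = np.array_split(indices, outer_folds)
    for i, test_idx in enumerate(outer):
        # Everything outside the outer test fold is available for tuning
        rest = np.concatenate([f for j, f in enumerate(outer) if j != i])
        inner = np.array_split(rest, inner_folds)
        for m, val_idx in enumerate(inner):
            # Train on the remaining inner folds, validate on the held-out one
            train_idx = np.concatenate(
                [f for n, f in enumerate(inner) if n != m]
            )
            yield train_idx, val_idx, test_idx


# Made-up labels, purely for illustration
cm = confusion_matrix([0, 1, 2, 2], [0, 2, 2, 2], n_classes=3)
splits = list(nested_cv_splits(n_samples=30))
```

The inner folds are where a parameter such as K would be chosen. The outer test fold is kept aside so that the final performance estimate is not influenced by that choice.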

The purpose of this project was to help me:

Get familiar with common Python modules and functions used for ML
Get practical experience implementing ML methods in Python
Get practical experience with parameter selection for ML methods
Get practical experience in evaluating ML methods and applying cross-validation

The k-nearest neighbour algorithm is a supervised machine learning algorithm: it relies on labelled input data to learn a function that assigns the appropriate label to new, unlabelled data.

The algorithm works in a few basic steps:

Calculate the distance between the unlabelled sample and each labelled sample (Euclidean distance in my implementation).
Add each distance to an ordered collection.
Sort the collection by distance, from smallest to largest.
Pick the first K entries from the ordered collection.
Get the labels of the selected points and assign the most common one to the unlabelled sample.
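The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the function names and the toy data are made up, and the final majority vote is the usual way of turning the K selected labels into a single prediction, which I am assuming here.

```python
from collections import Counter

import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` from its k nearest labelled points."""
    # Steps 1-2: compute the distance to every labelled point and collect it
    distances = [
        (euclidean_distance(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Step 3: sort by distance, closest first
    distances.sort(key=lambda pair: pair[0])
    # Step 4: keep the first k entries
    nearest = distances[:k]
    # Step 5: read off their labels and take a majority vote (assumed here)
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]


# Toy data (made up for illustration, not the wine dataset)
train_X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]]
train_y = ["a", "a", "b", "b"]
prediction = knn_predict(train_X, train_y, [1.1, 0.9], k=3)
```

With this toy data, the query point sits next to the two "a" samples, so two of its three nearest neighbours carry the label "a" and that label wins the vote.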

Below is a screenshot of the grid plot with the different properties of the wine dataset that was used in the project.