[1] AMPL-Intro


Preamble
Supervised learning:
Learn to predict target values from labeled data
- classification (target values are discrete classes)
- regression (target values are continuous values)
Unsupervised learning:
Find structure in unlabeled data
- clustering (find groups of similar instances in the data)
- outlier detection (find unusual patterns)
Machine Learning Workflow
- Representation
- Evaluation
- Optimization
An example of a machine learning problem
The input data is a table
A KNN algorithm is specified by:
- A distance metric
Euclidean (Minkowski with p = 2)
- Number of neighbors to look at
- Optional weighting function
- How to aggregate the classes of neighbor points
Simple majority vote
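
The four ingredients above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the function names `minkowski_distance` and `knn_predict`, the `weights` option values, and the toy (mass, width, height) points are all assumptions made up for this example.

```python
from collections import Counter

def minkowski_distance(a, b, p=2):
    """Minkowski distance between two points; p=2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_points, train_labels, query, k=5, weights="uniform", p=2):
    """Predict the label of `query` from its k nearest training points."""
    # Distance metric: measure how far the query is from every training point.
    distances = [
        (minkowski_distance(point, query, p), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Number of neighbors: keep only the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Optional weighting: "uniform" counts each neighbor once,
    # "distance" lets closer neighbors count more.
    votes = Counter()
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-9) if weights == "distance" else 1.0
    # Aggregation: the label with the largest (weighted) vote wins.
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (mass, width, height) for two kinds of fruit.
points = [
    (192, 8.4, 7.3), (180, 8.0, 6.8), (176, 7.4, 7.2),
    (86, 6.2, 4.7), (84, 6.0, 4.6), (80, 5.8, 4.3),
]
labels = ["apple", "apple", "apple", "mandarin", "mandarin", "mandarin"]

print(knn_predict(points, labels, (82, 6.0, 4.5), k=3))
```

With k = 3 the three closest points to the query are all mandarins, so the majority vote returns that class. Sorting every distance is fine for a handful of points; real libraries use faster neighbor searches.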
KNN process:
- Import required modules and load data file
- Create train-test split
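from sklearn.model_selection import train_test_split
# Select the feature columns with a list (not a set) so their order is fixed
X = fruits[['mass', 'width', 'height']]
y = fruits['fruit_label']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)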
- Create classifier object
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
- Train the classifier (fit the estimator) using the training data
knn.fit(X_train,y_train)
- Estimate the accuracy of the classifier
knn.score(X_test,y_test)
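- Use the trained classifier to predict the class of a new, unseen instance
# One sample with features in the order mass, width, height (note the nested list)
fruit_prediction = knn.predict([[20, 4.3, 5.5]])
lookup_fruit_name[fruit_prediction[0]]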
- Plot decision boundary
from adspy_shared_utilities import plot_fruit_knn
plot_fruit_knn(X_train, y_train, 5, 'uniform')
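
The steps above can be run end to end with a sketch like the one below. Because the fruit data file is not part of these notes, this example swaps in scikit-learn's built-in iris dataset as a stand-in (an assumption for illustration only); the sample passed to `predict` is likewise made up.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load data (stand-in for the fruit table: 150 rows, 4 numeric features)
iris = load_iris()
X, y = iris.data, iris.target

# Create train-test split (default is 75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create classifier object
knn = KNeighborsClassifier(n_neighbors=5)

# Train the classifier (fit the estimator) using the training data
knn.fit(X_train, y_train)

# Estimate the accuracy of the classifier on the held-out test data
accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Predict the class of a new sample (hypothetical measurements)
prediction = knn.predict([[5.0, 3.4, 1.5, 0.2]])
print(iris.target_names[prediction[0]])
```

Fixing `random_state` makes the split reproducible, so the accuracy is the same on every run; changing it gives a different split and possibly a slightly different score.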
Reference: https://www.coursera.org/learn/python-machine-learning/home/week/1