Visual Programming using Orange Tool

Vandita Chapadia
3 min read · Sep 21, 2021


This blog gives a better understanding of visual programming with the Orange tool: we load a dataset, split it into training and test data for the model, and apply various learning algorithms to find the best fit, with the goal of good accuracy and strong evaluation scores on the output data.

Creating the Workflow

First, we add the File widget to the canvas and load the built-in heart disease dataset into the workflow.

  • Next, send the input data to the Data Sampler widget. Data Sampler selects a subset of data instances from an input dataset. It outputs a sampled dataset and a complementary dataset.
  • Now send the sampled data from Data Sampler to the Test and Score widget. This widget tests learning algorithms. Different sampling schemes are available, including using separate test data. The widget does two things. First, it shows a table with different classifier performance measures, such as classification accuracy and area under the curve. Second, it outputs evaluation results, which can be used by other widgets for analyzing the performance of classifiers, such as ROC Analysis or Confusion Matrix.
  • Finally, connect three different learners, Neural Network, Naive Bayes, and Logistic Regression, to the Test and Score widget so that all three are evaluated on the sampled data. A scripting sketch of this workflow follows the figure below.

Workflow created in Orange tool
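
The same workflow can be reproduced with Orange's Python scripting API. The sketch below is a minimal, illustrative example, not the exact widget internals: it assumes Orange3 is installed, that the built-in heart_disease dataset is available, that Table.from_table_rows can be used to select rows, and that the learner class names match your Orange version (names have occasionally changed between releases). The 70% sample proportion is only illustrative.

import numpy as np
import Orange

# Load the built-in heart disease dataset (the File widget's role).
data = Orange.data.Table("heart_disease")

# Mimic the Data Sampler widget: keep a random 70% of the rows,
# the remaining rows form the complementary dataset.
rng = np.random.default_rng(42)
idx = rng.permutation(len(data))
n_sample = int(0.70 * len(data))
sample = Orange.data.Table.from_table_rows(data, idx[:n_sample])
remaining = Orange.data.Table.from_table_rows(data, idx[n_sample:])

# The three learners connected to Test and Score.
learners = [
    Orange.classification.NNClassificationLearner(),    # Neural Network widget
    Orange.classification.NaiveBayesLearner(),           # Naive Bayes widget
    Orange.classification.LogisticRegressionLearner(),   # Logistic Regression widget
]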

Sampling using Cross Validation in Orange

Cross-validation splits the data into a given number of folds (usually 5 or 10). It is primarily used in applied machine learning to estimate the skill of a model on unseen data, that is, to use a limited sample to estimate how the model is expected to perform in general when making predictions on data that was not used during training.
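
To make the idea concrete, here is a small, library-free sketch of how 10-fold cross-validation partitions the rows: every row is used for testing exactly once and for training in the other nine folds. The row count is only illustrative.

import numpy as np

n_rows, k = 303, 10                 # illustrative dataset size; 10 folds
rng = np.random.default_rng(0)
indices = rng.permutation(n_rows)
folds = np.array_split(indices, k)  # 10 roughly equal groups of row indices

for i, test_idx in enumerate(folds):
    # All other folds together form the training set for this round.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(f"fold {i}: train={len(train_idx)} rows, test={len(test_idx)} rows")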

Effect of using Cross Validation on our models

With cross-validation and a k value of 10, all of the input sample data is used for both training and testing across ten different model fits, which gives more reliable estimates of the performance of our learning models. Cross-validation also provides more metrics and lets us draw firmer conclusions about both our algorithm and our data. The image above also shows the comparison of the three models based on their AUC scores.
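
A scripting equivalent of the Test and Score widget with 10-fold cross-validation might look like the sketch below, reusing data and learners from the earlier example. It is hedged: the CrossValidation call signature varies slightly between Orange releases (older versions accept CrossValidation(data, learners, k=10) directly).

from Orange.evaluation import CrossValidation, CA, AUC

# 10-fold cross-validation over all three learners at once.
cv = CrossValidation(k=10)
results = cv(data, learners)

# AUC and classification accuracy, one value per learner.
for learner, auc, ca in zip(learners, AUC(results), CA(results)):
    print(f"{learner.name:25s}  AUC={auc:.3f}  CA={ca:.3f}")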

Splitting data into training and test data in Orange

To split the data into training and test sets, we send 80% of the sampled data from Data Sampler as the training data and the remaining 20% as the test data.
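
In scripting terms, the Data Sampler step corresponds to drawing a random 80% of the rows for training and holding out the remaining 20% for testing. This is a sketch under the same assumptions as before (Orange3 installed, Table.from_table_rows available):

import numpy as np
import Orange

data = Orange.data.Table("heart_disease")

rng = np.random.default_rng(42)
idx = rng.permutation(len(data))
n_train = int(0.80 * len(data))

train_data = Orange.data.Table.from_table_rows(data, idx[:n_train])  # 80% for training
test_data = Orange.data.Table.from_table_rows(data, idx[n_train:])   # remaining 20% for testing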

Now we get the comparison scores of the three algorithms by testing on the training data. To do so, double-click the Test and Score widget, choose the Test on train data option, and read off the scores for all three algorithms.

Evaluation Results on Train Data
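
The Test on train data option also has a scripting counterpart. The sketch below evaluates the same three learners on the training sample itself and, for comparison, on the held-out 20% test data; as before, exact call signatures may differ slightly between Orange versions.

from Orange.evaluation import TestOnTrainingData, TestOnTestData, CA, AUC

# Evaluate on the training data itself (the "Test on train data" option).
res_train = TestOnTrainingData()(data=train_data, learners=learners)

# Evaluate on the held-out 20% test sample.
res_test = TestOnTestData()(data=train_data, test_data=test_data, learners=learners)

for name, res in (("train", res_train), ("test", res_test)):
    print(name,
          "AUC:", [f"{a:.3f}" for a in AUC(res)],
          "CA:", [f"{c:.3f}" for c in CA(res)])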
