Introducing ProgramMatek’s WordPress Decision Tree Plugin

Plugin Overview

Version: 1.1.0
Author: Dataiku (Agathe Guillemot)
Released: 2019-11-21
Last Updated: 2023-05-11
License: MIT License
Source Code: GitHub
Issue Tracker: GitHub

Description

This plugin provides a web application for building decision trees by hand. It also ships two visual recipes: one for scoring datasets and one for evaluating datasets against decision trees created in the web application.

With this versatile plugin, you can:

  • Build and save decision trees using the web application
  • Score datasets using saved decision trees
  • Evaluate datasets using saved decision trees

Please note that this plugin supports both binary and multiclass classification, but it does not currently support regression.

The web application is useful for exploratory analysis, for generating feature-engineering ideas, and for getting familiar with a new dataset. With the two recipes, you can also implement specific business rules as a tree and compare their results against those of machine learning algorithms.

The web application also supports interactive binning of data.

Installation Instructions

You have two options to install the plugin:

  1. Download it directly from the Plugin Store.
  2. Download the plugin’s zip file and install it manually.

How to Use

Interactive Decision Tree Builder Web Application

Settings

This web application lets you create and visualize decision trees.

Getting Started

Upon launching the web application, you are presented with two options: creating a new tree or loading a previously saved one.


Create or Load a Tree

  • Creating a new tree involves selecting a dataset and a target. Please note that datasets from other projects cannot be used.
  • Loading an existing tree involves selecting the file where the tree was saved. The file must be saved within the specified folder.

Sampling Parameters

The web application provides several sampling methods for datasets:

  • Head: Select the first N rows
  • Random: Choose a random sample of approximately N rows
  • Full: Select all rows (Note: This is not recommended for larger datasets)

You can also specify the sampling size (N) when using the Head or Random methods.
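The three sampling methods can be sketched with pandas. This is an illustrative helper, not the plugin's actual backend implementation; the function name and random seed are assumptions.

```python
import pandas as pd

def sample_dataset(df: pd.DataFrame, method: str, n: int = 1000) -> pd.DataFrame:
    """Return a sample of df according to the chosen sampling method.

    Hypothetical helper illustrating the three modes; the plugin's
    backend may implement them differently.
    """
    if method == "head":
        return df.head(n)                                      # first N rows
    if method == "random":
        return df.sample(n=min(n, len(df)), random_state=42)   # ~N random rows
    if method == "full":
        return df                                              # all rows (costly on large datasets)
    raise ValueError(f"unknown sampling method: {method}")
```

For example, `sample_dataset(df, "head", 1000)` keeps only the first 1,000 rows, which is the cheapest option for a quick first look at a large dataset.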

Building a Decision Tree

The web application operates in three modes: edit mode, tree visualization mode, and sunburst visualization mode.

Edit Mode

In edit mode, the decision tree builder opens with only the root node visible. You can gradually build the tree by creating splits.

Edit Mode - Root Node

Selecting a Node

Once a node is selected, the “Selected Node” section provides valuable information, including the probabilities of different target classes (in descending order) and the number of samples. For non-root nodes, the decision rule is also displayed. Additionally, if the selected node is a leaf (i.e., it has no child nodes), you can assign a label to it. Labels can be used for tagging, grouping predictions, or providing additional information.
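The information shown in the "Selected Node" section can be pictured as a small record per node. The field names below are illustrative only, not the plugin's actual JSON schema:

```python
# Hypothetical in-memory representation of a selected node; field names
# and values are illustrative, not the plugin's actual JSON schema.
node = {
    "rule": "age < 40",             # decision rule (absent on the root node)
    "samples": 412,                 # number of rows reaching this node
    "probabilities": [              # target-class probabilities, descending
        ("no_churn", 0.71),
        ("churn", 0.29),
    ],
    "label": "young customers",     # optional tag, assignable on leaves
    "children": [],                 # empty list => this node is a leaf
}

def is_leaf(n: dict) -> bool:
    """A node with no children is a leaf and may carry a label."""
    return not n["children"]
```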

Available Actions for Leaf Nodes

If the selected node is a leaf, you can view the list of features below the “Selected Node” section. You can filter this list using the search bar. Furthermore, you can add a split by selecting a feature.

Available Actions for Nodes with Splits

If the selected node already has splits, the feature used for the splits and the corresponding split list are displayed below the “Selected Node” section. You can add, edit, or delete splits as needed.

Creating Splits

Splits are rules that partition the data based on the value of a specific feature. The web application differentiates between categorical and numerical features, allowing you to create corresponding splits.


For categorical features, you can create multiple splits for various values. Each split creates a different node based on the selected values.

For numerical features, splits can be defined based on specific value ranges. This allows you to create multiple nodes based on different ranges.
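The two split types can be thought of as predicates over a feature value. This is a sketch of the idea; the function names are assumptions, not part of the plugin:

```python
def categorical_split(value, selected_values) -> bool:
    """True if the row is routed to the node defined by selected_values."""
    return value in selected_values

def numerical_split(value, lower, upper) -> bool:
    """True if value falls in the half-open range [lower, upper)."""
    return lower <= value < upper
```

Each child node of a split corresponds to one such predicate, and a row follows the branch whose predicate it satisfies.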

Please note that missing values in numerical features are replaced with the mean.

Split Editors

The split editor provides an intuitive interface for creating and editing splits. The interface varies based on the feature type (categorical or numerical).

  • Categorical Features: The upper part of the split editor displays a stacked histogram representing the target distribution across different feature values. You can select values from the list to create or edit a split.
  • Numerical Features: The split editor includes a stacked histogram of the target distribution across feature values. Additionally, you can switch between treating the feature as categorical or numerical. To create or edit a split, simply enter a numerical value.

Auto-creation of splits is also available for both categorical and numerical features: the web application backend automatically computes splits up to a specified maximum number of splits.

Please note that auto-creation is not available under certain conditions, such as when the node already has splits or when the target distribution is the same across feature values.
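One simple way auto-creation could work for a numerical feature is to place boundaries at evenly spaced quantiles, up to the requested maximum. This is a hypothetical sketch; the plugin's backend may use a different split criterion entirely:

```python
import pandas as pd

def auto_numeric_splits(values: pd.Series, max_splits: int) -> list:
    """Hypothetical auto-split: boundaries at evenly spaced quantiles.

    Returns at most max_splits boundary values (fewer if quantiles
    coincide, e.g. when the feature has few distinct values).
    """
    qs = [i / (max_splits + 1) for i in range(1, max_splits + 1)]
    return sorted(set(values.quantile(qs).tolist()))
```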

Tree Visualization

The tree visualization mode allows you to view the entire decision tree in a full-screen format. Each node in the tree is accompanied by a tooltip displaying key information, including probabilities and sample sizes.

Tree Visualization Mode

Sunburst Visualization

In the sunburst visualization mode, the decision tree is represented as a sunburst diagram, with each arc corresponding to a node in the tree. Hovering over an arc displays the decision rule, and clicking on an arc centers it in the sunburst. The size of each arc is proportional to the number of samples it represents.


Sunburst Visualization Mode

Score Recipe

Use the score recipe to apply the decision tree as a prediction model to a dataset. This recipe requires a dataset that includes all the columns used by the decision tree. If the dataset contains missing values for a numerical feature, they will be replaced with the mean of that feature as calculated from the dataset used to build the tree.

The output of this recipe is a scored dataset that includes the following additional columns:

  • Prediction
  • Probability of each class
  • (Optional) Label
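Conceptually, scoring a row amounts to walking the tree from the root until a leaf is reached, then emitting that leaf's prediction, class probabilities, and optional label. The node format below is illustrative, not the plugin's actual JSON schema:

```python
def score_row(row: dict, node: dict) -> dict:
    """Walk the tree until a leaf, then return its prediction info.

    Sketch only: each child carries a 'match' predicate over the
    parent's split feature (hypothetical structure).
    """
    while node.get("children"):
        value = row[node["feature"]]
        # follow the child whose split predicate matches the feature value
        node = next(c for c in node["children"] if c["match"](value))
    return {
        "prediction": node["prediction"],
        "probabilities": node["probabilities"],
        "label": node.get("label"),
    }
```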

Score Recipe

Settings

  • Decision Tree: Specify the name of the JSON file where the decision tree is stored (remember to include the .json extension).
  • Chunk Size: Choose the number of rows to process in each scoring batch.
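Processing in chunks keeps memory usage bounded on large datasets. A sketch of the idea with pandas (the recipe's internals may differ; `score_fn` stands in for whatever function scores one chunk):

```python
import pandas as pd

def score_in_chunks(df: pd.DataFrame, chunk_size: int, score_fn) -> pd.DataFrame:
    """Score df chunk_size rows at a time and concatenate the results."""
    parts = [
        score_fn(df.iloc[start:start + chunk_size])
        for start in range(0, len(df), chunk_size)
    ]
    return pd.concat(parts, ignore_index=True)
```

A smaller chunk size lowers peak memory at the cost of more per-batch overhead.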

Evaluate Recipe

The evaluate recipe allows you to evaluate a dataset using the decision tree as the prediction model. Similar to the score recipe, this recipe requires a dataset that includes all the columns used by the decision tree and the target column. If there are missing values in the dataset for a numerical feature, they will be replaced with the mean of that feature, as computed from the dataset used to build the tree.

The output of this recipe is an evaluated dataset that includes the following additional columns:

  • Prediction
  • (Optional) Probability of each class
  • Whether the prediction was correct
  • (Optional) Label

Evaluate Recipe

Settings

  • Decision Tree: Specify the name of the JSON file where the decision tree is stored (remember to include the .json extension).
  • Chunk Size: Choose the number of rows to process in each evaluation batch.
  • Output Probabilities: Toggle whether to include the probabilities of each class in the evaluated dataset. This option is selected by default.
  • Filter Metrics: Choose whether to compute a subset of metrics in the metrics dataset. If unchecked, all available metrics will be computed.

Available Metrics

The metrics dataset includes AUC, recall, precision, accuracy, log loss, Hamming loss, and calibration loss.
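For intuition, two of the simpler metrics can be computed by hand. This is a sketch; the recipe most likely relies on a library such as scikit-learn rather than code like this:

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true target."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred) -> float:
    """Fraction of mismatches; for single-label classification
    this equals 1 - accuracy."""
    return 1 - accuracy(y_true, y_pred)
```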

For more information on ProgramMatek and to access this incredible WordPress Decision Tree Plugin, visit ProgramMatek.