# Creating Distributions Using Machine Learning

This method of creating a distribution is available from Simul8 2022 onwards, allowing you to use machine learning algorithms to create a distribution for use in your simulation.

To create a distribution using Machine Learning algorithms, you can use either R or Python.

If using R, the ML algorithm must be part of a function that is saved as an .RDS file. The function must accept a data frame with the parameter name in column 1 and the parameter value in column 2. If using Python, the algorithm must be saved in a .py file, the function must be called prediction, and it must take two lists as its arguments.

Note: for both, your Machine Learning algorithm must return a number.
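As a minimal sketch of the Python contract described above, the function below is named prediction, takes two lists, and returns a single number. It assumes (this is not confirmed by the text above) that the first list carries the parameter names and the second the matching values, mirroring the two columns of the R data frame. The fixed formula is a hypothetical stand-in for a trained model.

```python
# prediction.py -- hypothetical stand-in, not a trained model.

def prediction(list1, list2):
    # Assumption: list1 holds parameter names, list2 the matching values.
    params = dict(zip(list1, list2))
    # Placeholder rule standing in for a real model's predict() call.
    timing = 30.0 + 5.0 * float(params["Items"]) + 20.0 * float(params["Return"])
    # Return a plain number, as the note above requires.
    return float(timing)


if __name__ == "__main__":
    print(prediction(["Return", "Items"], [0, 4]))
```

Calling the function directly like this, before involving Simul8, is a quick way to confirm that it returns a number rather than a list or array.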

### Setup

To create a distribution, navigate to the ‘Data and Rules’ tab and select ‘Create Distribution’. Next, give your distribution a name, then select the option ‘Machine Learning’ and click ‘Next’. By default, Simul8 will use R. If you want to use Python, click on Advanced Settings.

In the Setup tab, click on Browse and select the file that contains your algorithm. Then add the simulation parameters the algorithm needs: click Add to open a new dialog, then give your parameter a name and a value. The value will usually be a label, spreadsheet location or object property.

The name should match the variable used to train the algorithm. It is important that the spelling is exactly the same as in the R or Python code used for training. As always in Simul8, this distribution can now be selected in many places in your simulation, e.g., for timings, breakdowns, batching out, etc.
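Because a misspelled parameter name is easy to miss, it can help to compare the names you plan to enter in Simul8 with the column names used for training. The sketch below is a hypothetical helper (check_parameter_names is not part of Simul8 or any library) and assumes the comparison should be exact and case-sensitive.

```python
def check_parameter_names(simul8_names, training_features):
    """Return the names that do not match a training feature exactly."""
    known = set(training_features)
    return [name for name in simul8_names if name not in known]


# Feature names as used in the checkout tutorials on this page.
training_features = ["Return", "Items"]

# "items" differs only by case, but it is still reported as a mismatch.
mismatches = check_parameter_names(["Return", "items"], training_features)
print(mismatches)  # ['items']
```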

### Tutorial using R

In this tutorial we will show you how you can use Machine Learning to control the timing of a checkout counter in a simulation.

What you will need to complete this tutorial:

R

#### Step 1: Create your Machine Learning algorithm

We will use a Decision Tree to create a ML algorithm based on the data in the CheckOut_Data file. Open R and copy and paste the script below into your R console, making sure you update the directory to where you saved the CheckOut_Data.xlsx file, then Run the script.

Note: make sure each folder in your directory is separated by two backslashes (\\)

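library(rpart)
library(rpart.plot)
library(readxl)

#change this directory to one where you have saved CheckOut_Data.xlsx
directory = "C:\\Users\\yourname\\Desktop"

path = paste(directory, "\\CheckOut_Data.xlsx", sep = "")
#read the training data (reading with readxl's read_excel is assumed here)
DTData = read_excel(path)
set.seed(1234)
tree = rpart(Time ~ ., data = DTData)
rpart.plot(tree)

#save the trained tree so the prediction function can load it in Step 2
path = paste(directory, "\\GetTimeDT.rds", sep = "")
saveRDS(tree, path)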

#### Step 2: Create a prediction function

Open a new R Script, copy and paste the below script into the console, and update the directories. Now run the script.

Timing = function(df){
  #load rpart so predict() can handle the saved tree
  library(rpart)
  Return = (df[1,2])
  Items = (df[2,2])
  #change this rds file to the same .RDS you have just created
  algorithm = readRDS("C:\\Users\\yourname\\Desktop\\GetTimeDT.rds")
  data = data.frame(Return, Items)
  return(predict(algorithm, data))
}

#change this directory to a location on your machine. this is the file you will use for Simul8
saveRDS(Timing, "C:\\Users\\yourname\\Desktop\\GetTimeRF.rds")

This creates your .RDS file that you will choose in your simulation (see next step).

#### Step 3: Apply the Machine Learning algorithm in the simulation

Open the Check Out simulation, navigate to the ‘Data and Rules’ tab and select ‘Create Distribution’. Next, give your distribution a name, then select the option ‘Machine Learning’ and click ‘Next’.

In the Setup tab, click on Browse and select the .RDS file you saved in Step 2. Click on Add and enter the parameters. Type the Name (Return, spelled as in the training data), then click on the Value field and on the button to its right to open the Formula Editor. Choose Labels and double-click on lbl_return, then click OK.

Now do the same for Items. Enter the name as Items, open the Formula Editor from the Value field, choose Labels and double-click on lbl_items. Click OK. Then go to the Check Out Activity and, for its timing, select the new Machine Learning-based distribution.

Reset and run your simulation. The timing of the Check Out Activity will now follow the ML algorithm we have created.

### Tutorial using Python

In this tutorial we will show you how you can use Machine Learning to control the timing of a checkout counter in a simulation.

What you will need to complete this tutorial:

Python

#### Step 1: Create your Machine Learning algorithm

Open your preferred Python interface, such as Jupyter Notebook or Visual Studio. In this example we will use Jupyter. We will use a Decision Tree to create an ML algorithm based on the data in the CheckOut_Data file. Copy and paste the script below into your notebook.

#First, we need to load in the packages needed for a decision tree
import os
import pickle
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

#Then we lock in the file path. The quotes around "__file__" are only needed in a Jupyter notebook; remove them when saving this as a .py file using the download option
filepath = os.path.dirname(os.path.abspath("__file__"))

# Read in the data and save as a name, in this case df for dataframe was used
# (this assumes CheckOut_Data.xlsx is saved in the same folder as this script)
df = pd.read_excel(filepath + r'\CheckOut_Data.xlsx')
features = ['Return','Items']
X = df[features].values
y = df['Time']

# Then make the model based on the X and y data arrays, in this case a decision tree
dtree = DecisionTreeClassifier(max_depth = 2,max_leaf_nodes=10)
dtree = dtree.fit(X,y)

# Save the tree as a sav file which we will use in the prediction function
filename = filepath+r'\TimeDT.sav'
pickle.dump(dtree,open(filename,'wb'))
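The folder lookup in the script above needs a small change between a notebook and a saved .py file, because the `__file__` variable is only defined when Python runs a script file. One way to avoid editing that line by hand is sketched below. Here script_dir is a hypothetical helper name, and falling back to the current working directory in a notebook is an assumption about where you keep your files.

```python
import os


def script_dir(module_globals):
    """Return the folder of the running script, or the working directory in a notebook."""
    script_path = module_globals.get("__file__")
    if script_path is None:
        # No __file__: most likely an interactive session such as a notebook.
        return os.getcwd()
    return os.path.dirname(os.path.abspath(script_path))


filepath = script_dir(globals())
# os.path.join picks the right separator, so no hand-written backslash is needed.
filename = os.path.join(filepath, "TimeDT.sav")
print(filename)
```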

#### Step 2: Create a prediction function

Now open a new script, copy and paste the script below into it, and run it. Remember to change (`__file__`) to (`"__file__"`) if running in a Jupyter notebook. For Simul8 to be able to run it, save the script as a .py file (Step 3 refers to it as prediction.py).

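import pickle
import os

def prediction(list1,list2):
    # Locate the saved model next to this script
    filepath = os.path.dirname(os.path.abspath(__file__))
    filename = filepath+r'\TimeDT.sav'
    # Load the decision tree saved in Step 1
    loaded_model = pickle.load(open(filename,'rb'))
    # list2 is assumed to hold the parameter values in training order (Return, Items)
    result = loaded_model.predict([list2])
    return result[0]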

When using Python, the script that creates the prediction function is shorter than in R. However, for it to work correctly, Python must be installed directly on the machine, and all required packages (in this tutorial, pandas and scikit-learn) must be installed through the Command Prompt.
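Before selecting the script in Simul8, it can be worth calling it once from plain Python to confirm that it loads and returns a number. The sketch below is one way to do that with the standard library. Here smoke_test is a hypothetical helper, and the example path and input values are placeholders you would replace with your own.

```python
import importlib.util
import numbers


def smoke_test(script_path, list1, list2):
    """Load a .py file, call its prediction() function, and check the result is a number."""
    spec = importlib.util.spec_from_file_location("prediction_script", script_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    result = module.prediction(list1, list2)
    if not isinstance(result, numbers.Number):
        raise TypeError(f"prediction() returned {type(result).__name__}, expected a number")
    return result


# Example call (placeholder path and values):
# smoke_test(r"C:\Users\yourname\Desktop\prediction.py", ["Return", "Items"], [0, 4])
```

If this call fails with an import error, the missing package is most likely not installed for the Python installation in use.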

#### Step 3: Apply the Machine Learning algorithm in the simulation

Open the Check Out simulation, navigate to the ‘Data and Rules’ tab and select ‘Create Distribution’. Next, give your distribution a name, then select the option ‘Machine Learning’ and click ‘Next’. Select Advanced and choose Python.

Go back to Setup, click on Browse and find the prediction.py script file you saved in Step 2. Click on Add and enter the parameters. Type the Name (Return, spelled as in the training data), then click on the Value field and on the button to its right to open the Formula Editor. Choose Labels and double-click on lbl_return, then click OK.

Now do the same for Items. Enter the name as Items, open the Formula Editor from the Value field, choose Labels and double-click on lbl_items. Click OK. Then go to the Check Out Activity and, for its timing, select the new Machine Learning-based distribution. Click OK.

Reset and run your simulation. The timing of the Check Out Activity will now follow the machine learning algorithm we have created.

Having trouble setting up Distribution By ML? Check out our Machine Learning Troubleshooting page for more help.

Note: creating a distribution using ML is not available in Simul8 Online.