Practical Machine Learning with Ruby

The following outline is from a talk I gave at the Utah Ruby Users Group (URUG) on machine learning and how it can be applied in the Ruby programming language. It has since been updated for a talk I gave to the AZ Ruby Users Group, where I added the Naive Bayes example.

What is Machine Learning?

TechTarget Definition

“Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.”

Dead simple definition:

The data determines the output.

Practical Applications of Machine Learning

  • Analyzing when a vehicle may need to be removed from a fleet based on historical trends.
  • Checking whether an email is spam.
  • Verifying bank transactions.
  • Verifying medical diagnoses.
  • Product recommendations.
  • Google's search engine.
  • And many more.

Why Ruby Isn’t Typically Used for Machine Learning

  • Misconceptions about its speed
  • A relative lack of libraries
  • The ease of handing this work off to other services

Popular Algorithms

  • Decision Trees
  • Naive Bayes
  • K-Means Clustering
  • Support Vector Machines (SVM)
  • Apriori
  • Linear Regression
  • Logistic Regression
  • Artificial Neural Networks
  • Random Forests
  • Nearest Neighbours

Supervised vs Unsupervised Learning

  • Supervised – The algorithm learns from labeled examples. It's like teaching a child that an electrical outlet is dangerous, so the child learns not to touch other electrical hazards.
  • Unsupervised – The algorithm finds structure in unlabeled data on its own. It's like a child learning what's dangerous through trial and error, eventually learning not to touch electrical hazards.
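The distinction can be sketched in plain Ruby with no gems. Everything here is made up for illustration: the temperature readings, the 1-nearest-neighbour `predict` helper (supervised), and the tiny one-dimensional k-means loop (unsupervised).

```ruby
# Supervised: labeled examples teach the model the answer directly.
labeled = { 98.6 => 'healthy', 99.0 => 'healthy', 101.0 => 'sick', 103.0 => 'sick' }

# 1-nearest-neighbour: predict the label of the closest known example.
def predict(labeled, temp)
  labeled.min_by { |known, _| (known - temp).abs }.last
end

puts predict(labeled, 98.4)   # prints: healthy
puts predict(labeled, 102.0)  # prints: sick

# Unsupervised: no labels -- group the raw readings into two clusters
# (a bare-bones 1-D k-means, seeded with the min and max readings).
readings = [98.6, 99.0, 101.0, 103.0, 98.4, 102.0]
centroids = [readings.min, readings.max]
5.times do
  groups = readings.group_by { |r| centroids.min_by { |c| (c - r).abs } }
  centroids = groups.values.map { |g| g.sum / g.size }
end
p centroids.sort # two cluster centres, roughly "normal" vs "feverish"
```

The supervised helper needs the answers up front; the k-means loop discovers the two groups from the raw numbers alone.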

Ruby Machine Learning Examples

Continuous vs Discrete

  • Continuous – Values measured on a scale, e.g. height, weight, or temperature.
  • Discrete – Values drawn from a fixed set of options, e.g. gender or a dice roll.

Basic Decision Tree

require 'decisiontree'

# A single continuous attribute: body temperature
attributes = ['Temp']

# Each training row is [temperature, label]
training = [
  [98.7, 'healthy'],
  [99.1, 'healthy'],
  [99.5, 'sick'],
  [100.5, 'sick'],
  [102.5, 'crazy sick'],
  [107.5, 'dead'],
]

# 'sick' is the default class; :continuous tells the tree to split on numeric thresholds
dec_tree = DecisionTree::ID3Tree.new(attributes, training, 'sick', :continuous)
dec_tree.train

# The last element is the actual label, handy for comparing against the prediction
test = [98.5, 'healthy']

puts dec_tree.predict(test)

More Advanced Decision Tree

require 'decisiontree'

attributes = ['Age', 'Education', 'Income', 'Marital Status']

# Each training row ends with the label to predict (1 or 0)
training = [
  ['36-55', 'Masters', 'High', 'Single', 1],
  ['18-35', 'High School', 'Low', 'Single', 0],
  ['36-55', 'Masters', 'High', 'Single', 1],
  ['18-35', 'PhD', 'Low', 'Married', 1],
  ['< 18', 'High School', 'Low', 'Single', 1],
  ['55+', 'High School', 'High', 'Married', 0],
  ['55+', 'High School', 'High', 'Married', 1],
  ['55+', 'High School', 'High', 'Married', 1],
  ['55+', 'High School', 'High', 'Married', 1],
  ['< 18', 'Masters', 'Low', 'Single', 0],
]

# 1 is the default class; :discrete means each attribute takes one of a fixed set of values
dec_tree = DecisionTree::ID3Tree.new(attributes, training, 1, :discrete)
dec_tree.train

test = ['18-35', 'PhD', 'High', 'Married']

puts dec_tree.predict(test)

Naive Bayes (adapted from the nbayes gem's official guide)

require 'nbayes'

# Create a new classifier instance
nbayes = NBayes::Base.new

# Train it -- note that split is used to tokenize each message into words
nbayes.train( "You need to buy some Viagra".split(/\s+/), 'SPAM' )
nbayes.train( "buy Viagra".split(/\s+/), 'SPAM' )
nbayes.train( "buy Viagra".split(/\s+/), 'SPAM' )
nbayes.train( "buy Viagra".split(/\s+/), 'SPAM' )
nbayes.train( "This is not spam, just a letter to Bob.".split(/\s+/), 'HAM' )
nbayes.train( "Hey Oasic, Do you offer consulting?".split(/\s+/), 'HAM' )
nbayes.train( "Hey Oasic, Do you offer consulting?".split(/\s+/), 'HAM' )
nbayes.train( "Hey Oasic, Do you offer consulting?".split(/\s+/), 'HAM' )
nbayes.train( "Hey Oasic, Do you offer consulting?".split(/\s+/), 'HAM' )
nbayes.train( "Hey Oasic, Do you offer consulting?".split(/\s+/), 'HAM' )
nbayes.train( "You should buy this stock".split(/\s+/), 'SPAM' )

# Tokenize the message to classify
tokens = "Now is the time to buy Viagra cheaply and discreetly".split(/\s+/)
result = nbayes.classify(tokens)

# Print the most likely class (SPAM or HAM)
puts result.max_class
# Print the probability of the message being SPAM
puts result['SPAM']
# Print the probability of the message being HAM
puts result['HAM']
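To show what a classifier like this does under the hood, here is a hand-rolled sketch of Naive Bayes in plain Ruby. The TinyBayes class, its add-one (Laplace) smoothing, and the toy training data are my own illustrative assumptions, not the nbayes gem's internals.

```ruby
# A minimal Naive Bayes classifier from scratch -- illustration only.
class TinyBayes
  def initialize
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) }  # class => token => count
    @class_totals = Hash.new(0)                        # class => document count
  end

  def train(tokens, klass)
    @class_totals[klass] += 1
    tokens.each { |t| @counts[klass][t.downcase] += 1 }
  end

  # Score each class by log P(class) + sum of log P(token | class),
  # with add-one smoothing so unseen tokens don't zero out a class.
  def classify(tokens)
    total_docs = @class_totals.values.sum.to_f
    vocab = @counts.values.flat_map(&:keys).uniq.size
    @class_totals.keys.max_by do |klass|
      token_total = @counts[klass].values.sum
      score = Math.log(@class_totals[klass] / total_docs)
      tokens.each do |t|
        score += Math.log((@counts[klass][t.downcase] + 1.0) / (token_total + vocab))
      end
      score
    end
  end
end

nb = TinyBayes.new
nb.train("buy Viagra now".split, 'SPAM')
nb.train("cheap Viagra buy".split, 'SPAM')
nb.train("lunch with Bob tomorrow".split, 'HAM')
nb.train("notes from the meeting with Bob".split, 'HAM')

puts nb.classify("buy cheap Viagra".split)  # prints: SPAM
puts nb.classify("meeting with Bob".split)  # prints: HAM
```

Working in log space avoids multiplying many tiny probabilities together and underflowing to zero, which is also why production libraries do it this way.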

Jordan Hudgens