AGM 2018

Hello, UTMIST Fam! Thank you for coming to our AGM or joining the FB live stream. We hope you all enjoyed the event 😊. Here are some follow-ups in case you missed the live AGM:



Video recording



Join us (bit.ly/utmistsignup) if you have not signed up for our mailing list. Apply here for executive positions if you want to grow and learn with us. We really need people who are passionate about machine learning to help us make UTMIST a better community!


Stay tuned for our upcoming events in early October! We will post all event updates and other ML-related content on Facebook & other social media.

Lastly, if you have any questions, email us at utorontomist@gmail.com or ping us on Facebook @UofT.MIST. We will get back to you as soon as possible.


Have a good weekend!

MIST101 #6: Reinforcement Learning Event Review

MIST101 Workshop 6: Reinforcement Learning was held this Thursday, November 23rd.

In this workshop, we began with some preliminary mathematical models for reinforcement learning and then dove into its approaches and applications.


A Markov Decision Process (MDP) is a mathematical framework useful for studying sequential optimization problems. Dynamic programming can be used to compute the value functions and infer an optimal policy, and Monte Carlo methods estimate the value function by random sampling. Under the Markov property, the decision-making process is based only on the current state. Iterative approaches such as policy iteration and value iteration act greedily with respect to the current value estimate and improve that estimate until convergence.
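To make the value-iteration idea concrete, here is a minimal sketch on a tiny MDP. The three-state chain, its rewards, and the discount factor are all invented for illustration and are not taken from the workshop slides.

```python
# Hypothetical 3-state chain MDP, invented for illustration only.
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},  # absorbing state
}
GAMMA = 0.9  # discount factor (assumed value)


def q_value(values, state, action):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in TRANSITIONS[state][action])


def value_iteration(tol=1e-8):
    """Repeat the greedy Bellman backup until the value estimate stops changing."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        new_values = {
            s: max(q_value(values, s, a) for a in TRANSITIONS[s]) for s in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    # The policy is read off greedily from the converged values.
    policy = {
        s: max(TRANSITIONS[s], key=lambda a: q_value(values, s, a)) for s in TRANSITIONS
    }
    return values, policy


values, policy = value_iteration()
print(values, policy)
```

Note that each backup looks only at the current state and its successors, which is the Markov property at work.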


Three approaches were illustrated in detail: value-based, policy-based, and actor-critic. Actor-critic is a form of generalised policy iteration: it alternates between a policy evaluation step and a policy improvement step. Instead of relying on sampled returns alone, it uses a trained model (the critic) to evaluate the policy. The results of the different approaches on the value function and policy are summarized in the table below.


Afterwards, we dove into applications of reinforcement learning in AlphaGo, gaming bots, and self-driving cars. AlphaGo uses a value network to reduce search depth and a policy network to reduce search breadth; together they shrink the search space so that it can find the next best move by maximizing the estimated reward. Gaming bots for Dota 2 and StarCraft II utilise multi-agent reinforcement learning. Self-driving cars use an architecture that includes a state space and an action space to develop the best driving policy.


The slides of workshop #6 can be found at: https://docs.google.com/presentation/d/1ED_iHBrR7PYTgOrxy8yUqZflpIIVi1d5ox2v4PWjCTc/edit?usp=sharing

This concludes our MIST101 Machine Learning Introductory Workshop Series.

It has been a great pleasure to be part of your machine learning journey. We would love to have your continued support and look forward to seeing you at our future events!


University of Toronto Machine Intelligence Student Team

Guest Speakers Series Event Review


We want to thank Lauren Erdman, Oren Kraus, Eleni Triantafillou and Marc Law from U of T’s Machine Learning Group in the Department of Computer Science for giving us great talks on their latest research progress last Tuesday and Thursday!

Oren Kraus talked about Microscopy in Biology. He presented a few innovative ideas and cutting-edge techniques that he has been trying out in his research. For example, in Yeast Proteome Dynamics, he uses methods such as computer vision, multiple instance learning, and fully-connected CNNs to extract features from yeast cells and perform cell recognition and classification. These techniques can be used in tissue identification and cancer diagnosis.





Lauren Erdman emphasised understanding the training dataset. She gave three examples in which a misused dataset led to unexpected outcomes. The first example came from the perspective of making ethical decisions in health care: degenerate ER risk assessments may be biased by non-random treatment and misdiagnose the condition. The second example revealed that a machine learning model can learn the wrong information during training when there is, for instance, structural bias in image data. Moreover, a model may also learn bias from historical data. Thus, it is important to know where a data set comes from and what it represents, so that we can mitigate its bias.





Eleni’s research is on few-shot learning, which aims to learn new concepts from only a few examples. To make the most of the small amount of data, each data point is viewed as a “query” that ranks the other points by their predicted relevance to it. This structured prediction framework defines a model that optimizes Mean Average Precision and performs just as well as other algorithms.






Marc Law taught us about clustering, which groups examples so that similar ones fall into the same cluster and dissimilar ones fall into different clusters. Two problem-dependent factors affect the quality of clustering: the chosen similarity metric and the data representation. Supervised clustering approaches try to find the metric that optimizes clustering performance.




Thank you to those who came to our events and asked great questions! We look forward to having everyone join us and get a broader perspective on machine learning!


MIST101 #4 Event Review & Slides

MIST101 Workshop 4: Recurrent Neural Networks (RNN) was held this Thursday, October 26th. Following the sessions on Supervised Learning and Convolutional Neural Networks (CNN), this workshop covered the basic modeling principles, variants and applications of RNNs.

RNN Models and Training Methods

We first defined an RNN as a neural network in which the connections between units form a directed cycle. The model takes in a sequence of data inputs and produces a sequence of predicted outputs, feeding information from each step back into the model itself. Computations are done in a series of hidden cells that are chained together and share the same parameters.
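The recurrence described above can be sketched in a few lines. This is a scalar toy version with a tanh hidden cell; the weight names `w_xh`, `w_hh`, `w_hy` and their values are assumptions made for illustration, not the model from the slides.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, w_hy, h0=0.0):
    """Run a scalar vanilla RNN over `inputs`, reusing the same weights at every step."""
    h = h0
    hidden_states, outputs = [], []
    for x in inputs:
        # The new hidden state depends on the current input and the previous state.
        h = math.tanh(w_xh * x + w_hh * h)
        hidden_states.append(h)
        outputs.append(w_hy * h)
    return hidden_states, outputs


# A single non-zero input followed by zeros: the hidden state carries it forward.
hidden_states, outputs = rnn_forward([1.0, 0.0, 0.0], w_xh=0.5, w_hh=0.8, w_hy=2.0)
print(hidden_states, outputs)
```

The later hidden states stay non-zero even though the later inputs are zero, which is the "memory" that the cycle in the network provides.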

To train an RNN, we showed its unfolded diagram. The steps involve defining a proper loss function that represents the task at hand and using the standard method, backpropagation through time (BPTT), to perform gradient descent and minimize the loss across time steps. An example of generating a sentence was introduced.

We then briefly introduced the variants of RNN, including the bidirectional RNN, the stacked RNN and the Encoder-Decoder model. A commonly used architecture called Long Short-Term Memory (LSTM) was covered as well. LSTM is augmented by gating mechanisms that help prevent backpropagated errors from vanishing or exploding, and hence captures long-term dependencies in sequential data more effectively.
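As a rough sketch of the gating idea, the following scalar LSTM step uses the common textbook formulation (forget, input and output gates plus a tanh candidate). The parameter layout and the `REMEMBER_PARAMS` values are hypothetical, chosen only to show a cell that holds its state.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell. `params` maps gate name -> (w_x, w_h, bias)."""
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre_activation("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much of the candidate to write
    o = sigmoid(pre_activation("output"))       # how much of the cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate value to write
    c = f * c_prev + i * g                      # additive update of the cell state
    h = o * math.tanh(c)
    return h, c


# Hypothetical parameters: forget gate saturated open, input gate saturated shut.
REMEMBER_PARAMS = {
    "forget": (0.0, 0.0, 20.0),
    "input": (0.0, 0.0, -20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}

h, c = 0.0, 0.7
for _ in range(50):
    h, c = lstm_step(1.0, h, c, REMEMBER_PARAMS)
print(c)  # stays very close to the initial 0.7
```

Because the cell state is updated additively and the forget gate is near one, the stored value survives many time steps, which is the mechanism that lets gradients flow over long ranges.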


RNNs excel in machine translation and language modeling, which are commonly used by search engines. As mentioned in previous workshops, RNNs can also be combined with CNNs for image captioning.

Hands-on TensorFlow Tutorial

The practical session for workshop #4 worked through an example of modeling a sine curve. Students were given historical data of the process (the training dataset) and asked to predict the values for the next K days. You can access the tutorial here: https://github.com/ColinQiyangLi/MIST101/blob/master/Tensorflow_RNN%20(Part%201).ipynb
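The linked notebook is the reference for the actual exercise. As a hedged sketch of how such a dataset might be prepared, the snippet below slices a sine curve into (history, next K values) pairs; the sampling step of 0.1, the window length of 20 and K = 5 are assumed values, not those used in the tutorial.

```python
import math


def make_windows(series, history, horizon):
    """Split a series into (past `history` values, next `horizon` values) pairs."""
    pairs = []
    for start in range(len(series) - history - horizon + 1):
        past = series[start:start + history]
        future = series[start + history:start + history + horizon]
        pairs.append((past, future))
    return pairs


# Assumed sampling: 100 points of sin(0.1 * t).
series = [math.sin(0.1 * t) for t in range(100)]
pairs = make_windows(series, history=20, horizon=5)  # horizon plays the role of K
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```

Each pair can then be fed to a sequence model, with the first element as input and the second as the prediction target.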

If you would like to learn more about workshop #4, the slides can be found under this link: https://docs.google.com/presentation/d/1zXyLmsSHlwy8KQpYvrw5WquxFKAciw0BU5QYktKO4kQ/edit?usp=sharing

MIST101 #3 Event Review & Slides

Event Review #3

MIST101 Workshop 3: Convolutional Neural Networks (CNN) was held this Thursday, October 12th. This workshop covered the building blocks, architectures, training methods, variants and applications of CNN.

Typical Architecture

A simple CNN consists of stacks of specialized layers, and every layer acts like a function that transforms one set of activations into another. Three main types of layers are used, and they were introduced in detail during the workshop.

Fully connected Layers

As a recap from the last workshop, neurons in a fully connected layer have full connections to all possible activations in the previous layer, without any cycle. However, neural nets with only fully connected layers are computationally expensive to train and are limited in application.

Convolution Layers

Neurons in convolution layers read the pixel values of a receptive field, and the selected kernels act as filters that process the input data. The convolution between pixel values and kernels makes up the new output of the layer. Stride and padding are two useful quantities to tune the convolution operation.
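To illustrate how stride and padding change the output, here is a minimal pure-Python sketch of a single-channel 2D convolution. As in most deep learning libraries, it is implemented as cross-correlation (the kernel is not flipped); the square kernel and zero padding are assumptions of this sketch.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a zero-padded `image` (lists of lists) and return the output map."""
    k = len(kernel)  # square kernel assumed
    height, width = len(image), len(image[0])
    padded = [[0.0] * (width + 2 * padding) for _ in range(height + 2 * padding)]
    for r in range(height):
        for c in range(width):
            padded[r + padding][c + padding] = image[r][c]
    # Standard output-size formula: (input + 2 * padding - kernel) // stride + 1
    out_h = (height + 2 * padding - k) // stride + 1
    out_w = (width + 2 * padding - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # corner of the receptive field
            row.append(sum(
                padded[top + a][left + b] * kernel[a][b]
                for a in range(k) for b in range(k)
            ))
        output.append(row)
    return output


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
ones_2x2 = [[1, 1], [1, 1]]
print(conv2d(image, ones_2x2, stride=2))  # 4x4 input shrinks to 2x2

identity_3x3 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
same = conv2d(image, identity_3x3, stride=1, padding=1)  # padding keeps the 4x4 size
```

A larger stride shrinks the output, while padding of one pixel with a 3x3 kernel preserves the input size.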

Pooling Layers

Pooling progressively reduces the spatial size of the image representations to reduce the number of parameters and the amount of computation in the network.
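A minimal sketch of max pooling, one common pooling variant, is shown below; the 2x2 window with stride 2 is an assumed (though typical) choice.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each `size` x `size` window of a 2D feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool2d(feature_map)
print(pooled)  # a 4x4 map becomes 2x2
```

Halving each spatial dimension leaves a quarter of the activations for the following layers to process.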


Techniques such as gradient descent, weight initialization, Dropout and Batch Normalization were introduced in detail to make training more effective. Dropout disables random nodes during training, and Batch Normalization normalizes the activations of each batch (to zero mean and unit variance, followed by a learned scale and shift). Both have a regularizing effect, and Dropout can be viewed as a form of model ensembling. The learnt features of a CNN can be reused, which makes Transfer Learning possible across different computer vision domains.
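The two techniques can be sketched on a plain list of activations. This uses the common "inverted dropout" formulation and a batch normalization over scalars; `gamma`, `beta` and `eps` are the usual names for the scale, shift and numerical-stability constant, with default values assumed here.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability `drop_prob`, rescale the survivors."""
    if not training or drop_prob == 0.0:
        return list(activations)  # no units are dropped at evaluation time
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars to zero mean and unit variance, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
dropped = dropout([1.0] * 1000, drop_prob=0.5, rng=rng)
normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print(sum(dropped) / len(dropped), normalized)
```

Rescaling the surviving units keeps the expected activation unchanged, so nothing needs to be adjusted when dropout is switched off at test time.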

Variants and Applications

To extend the topic and give everyone a clear sense of real-world applications of CNNs, a few state-of-the-art architectures such as GoogLeNet and ResNet were presented in the workshop.

CNNs are typically used in areas like object detection and segmentation, image processing and style transfer. They can also be combined with Recurrent Neural Networks (RNN), which will be introduced in detail in the next workshop, to handle more complicated tasks such as image captioning and neural translation.

Tensorflow Tutorial

The tutorial shows how to use TensorFlow to train and evaluate a simple neural network for handwritten digit classification using the MNIST data set. Material used in this practical session can be found at: https://github.com/ColinQiyangLi/MIST101/blob/master/Tensorflow_Intro_2.ipynb

Thank you all for coming to MIST101 workshop #3. We hope you have become familiar with Convolutional Neural Networks by now! MIST101 #4, from 7:00 to 9:00 p.m. on October 26th in GB119, will cover Recurrent Neural Networks (RNN). RNNs are widely applied in language modeling, translation, text classification and robotics.

If you would like to learn more about workshop #3, the slides can be found under this link: https://docs.google.com/presentation/d/13RBeyETvgSJ_2V8zDz1E2fVCNydGyllyMe6WSG4o5bU/edit?usp=sharing

Meanwhile, please take the time to fill out our feedback survey for the event if you haven’t done so!

MIST101 #2 Event Review & Slides


The second of our MIST101 workshop series, MIST101 #2 Supervised Learning and Neural Network, was successfully held last Thursday. The workshop went in depth into the major components of Supervised Learning problems and Neural Networks, and we are glad to have received positive feedback from the audience.

During the workshop, we presented the Linear/Logistic Regression models and Neural Network architecture, and introduced a learning algorithm called Gradient Descent, which searches for a minimum of a loss function to improve the model. At the end, we summarized a typical training pipeline as:




  1. Preprocess the data and split it into training (80%), validation (10%), and test (10%) sets.
  2. Choose a model architecture and optimize the model to minimize the loss function with a learning algorithm on the training set.
  3. Evaluate and fine-tune the model on the validation set.
  4. Repeat 2 and 3 until an optimized model is obtained.
  5. Evaluate the model on the test set to get a final performance score.

(Slides 60)
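Step 1 of the pipeline above can be sketched as follows. The shuffling and the fixed seed are assumptions of this sketch; the 80/10/10 proportions are the ones listed in the pipeline.

```python
import random


def split_dataset(samples, seed=0, train_frac=0.8, val_frac=0.1):
    """Shuffle `samples` and split them into training, validation and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The test set is touched only once, at step 5, so that the final score is not influenced by the tuning done in steps 2 to 4.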

Introduction to Neural Networks

Supervised learning constructs the model that best represents the underlying target function of the given input/output pairs. Supervised learning problems are typically categorized into two types: Regression Problems and Classification Problems.

A loss function, which measures how far a model’s predictions are from the given data set and serves as the learning objective to be minimized, can take various forms such as Mean Square Error (MSE) and Cross Entropy, depending on the task at hand and the modeling principle (MLE, MAP, Bayesian).
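The two losses named above can be written out directly. This is a minimal sketch using the standard definitions; the binary form of cross entropy and the clamping constant `eps` are choices made for this example.

```python
import math


def mean_squared_error(targets, predictions):
    """Average squared difference, typically used for regression."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Average negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
bce_unsure = binary_cross_entropy([1, 0], [0.5, 0.5])
bce_confident = binary_cross_entropy([1, 0], [0.9, 0.1])
print(mse, bce_unsure, bce_confident)
```

Confident correct predictions give a lower cross entropy than uncertain ones, which is what makes it a useful objective for classification.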

Artificial Neural Network Models

A computational graph consists of Nodes and Edges that act as functions and input/output of a neural network. A Linear Regression Model,

[Image: Linear Regression Model equation]

and Logistic Regression Model, where σ is a nonlinear function,

[Image: Logistic Regression Model equation]

were shown and explained in detail.
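As a rough sketch of the two models named above, assuming the standard textbook forms y = w·x + b and y = σ(w·x + b), with σ taken to be the logistic sigmoid (the workshop's exact notation is not reproduced here):

```python
import math


def linear_model(x, w, b):
    """Linear regression prediction: weighted sum of the inputs plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z):
    """Logistic sigmoid, an assumed choice for the nonlinear function sigma."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_model(x, w, b):
    """Logistic regression prediction: the linear model passed through sigma."""
    return sigmoid(linear_model(x, w, b))


print(linear_model([1.0, 2.0], [3.0, 4.0], 0.5))
print(logistic_model([0.0, 0.0], [1.0, 1.0], 0.0))
```

The only difference between the two is the nonlinearity applied to the output, which squashes it into the range (0, 1) so that it can be read as a probability.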

There are three major types of Artificial Neural Networks: Feed-forward Neural Networks (FNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). This workshop went in depth into the fully-connected FNN. An FNN consists of layers of neurons with no cycles, where each neuron encapsulates a linear transformation followed by a nonlinear activation. CNN and RNN will be introduced in workshops #3 and #4 respectively.

Gradient Descent

The gradient is a multi-variable generalization of the derivative and points in the direction of the greatest rate of increase of the function. Moving against the gradient lets the observing point descend towards a valley of the function to find a local (or global) minimum. Usually we use a method called Back-propagation to compute gradients on a computational graph. Batch GD sums the loss across the whole training set, while Stochastic/Mini-batch GD accumulates the loss on only one training sample, or a mini-batch of samples, chosen randomly. The latter is normally considered standard practice in applications. The concepts of Momentum and Adaptive Learning Rate were also introduced to augment Gradient Descent methods.
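Here is a minimal sketch of mini-batch gradient descent with momentum, fitting a line to noiseless synthetic data. The learning rate, momentum coefficient, batch size and the data itself are assumed values for illustration; the gradients are written out by hand rather than obtained through back-propagation.

```python
import random


def sgd_momentum(data, lr=0.02, momentum=0.9, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w * x + b by mini-batch gradient descent with momentum on the MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    vw, vb = 0.0, 0.0  # velocity terms that accumulate past gradients
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # mini-batches are drawn in random order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of the mean squared error over this mini-batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            vw = momentum * vw - lr * grad_w
            vb = momentum * vb - lr * grad_b
            w += vw  # step against the gradient, smoothed by the velocity
            b += vb
    return w, b


# Synthetic data from y = 3x + 1 on x in [-1, 1] (invented for this sketch).
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd_momentum(data)
print(w, b)
```

Setting `batch_size` to the size of the data set turns this into Batch GD, and setting it to 1 gives Stochastic GD.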

Model Evaluation

Lastly, when we evaluate the model, there could be scenarios where the model is not powerful enough, or is too powerful, for the data; these are called Underfitting and Overfitting. To improve the model, we can tune the neural network architecture, training schedule, model regularization, etc. After many iterations of evaluation and tuning, we can settle on the model that meets our needs.

Hands-on TensorFlow Tutorial

A hands-on tutorial on TensorFlow was given after the lecture session, in which some simple, representative examples were demonstrated. You can gain some hands-on experience yourself by reading the quick tutorial at the following link: https://github.com/ColinQiyangLi/MIST101

Thank you all for coming to our workshop #2. We hope you have gained some insight into supervised learning and Neural Networks! MIST101 #3, from 7:00 to 9:00 p.m. on October 12th in GB119, will cover Convolutional Neural Networks (CNN). CNNs are especially effective in image processing, and they are what enables machines to achieve unprecedented success, from distinguishing simple cat and dog images to demonstrating super-human performance in object recognition, segmentation, etc.

If you would like to learn more about workshop #2, the slides can be found under this link: https://docs.google.com/presentation/d/1guDvX8jy461qH8SmtdOYj_2BU76QHcugW32MQfwXhQU/edit?usp=sharing


MIST101 #2: Supervised Learning & Neural Networks

Following our first introductory MIST101 workshop, MIST101 #2: Supervised Learning and Neural Networks is coming up this Thursday!

For the impatient, if you would like to:
– Get insights on neural networks and supervised learning,
– Familiarize yourself with typical supervised learning processes,
– Learn how to construct, train, and evaluate a learning model…

JOIN US at GB119, 7:00-9:00 p.m. on Thursday, Oct. 5th! Refreshments will be served.

Make sure to RSVP through Eventbrite. The $5 deposit will be fully refunded after the event for all attendees! Click here to get your ticket: https://www.eventbrite.ca/e/mist101-2-getting-started-with-supervised-learning-registration-38152621518

More details:

As a major subdivision of machine learning, supervised learning is a method for approximating the underlying relation between a given dataset and its labels. It has so far achieved unprecedented success in various areas such as database marketing, bioinformatics, pattern/speech recognition, and handwriting recognition. Powered by large amounts of data, it has shown promising ability to solve an even wider range of tasks.

MIST101 is a series of workshops on machine learning and data science hosted by UTMIST on a bi-weekly basis. It starts from the basic theories and extends to the most cutting-edge research.