
Transfer learning is an important topic. As a civilization, we have been passing knowledge from one generation to the next, enabling the technological advancement we enjoy today. It's the edifice that supports most of the state-of-the-art models going full steam today, powering many of the services we take for granted.
Transfer learning is about having a good starting point for the downstream task we’re interested in solving.
In this article, we’re going to discuss how to piggyback on transfer learning to get a warm start on solving an image classification task. …
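To make the idea concrete, here is a minimal sketch of a warm start in Keras (not the article’s code; the ResNet50 backbone and the 10-class head are assumptions for illustration):

```python
# A minimal sketch of transfer learning (assumed: ResNet50 backbone, 10 classes).
import tensorflow as tf

# Reuse a backbone pretrained on ImageNet as the warm start.
base = tf.keras.applications.ResNet50(
    include_top=False, pooling="avg", weights="imagenet")
base.trainable = False  # freeze the transferred knowledge

# Attach a fresh classification head for the downstream task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```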

In the last article, we looked at models that deal with non-time-series data. Time to turn our attention to some other models. Here we will be discussing deep sequential models, which are predominantly used to process and predict time-series data.
Link to Part 1, in case you missed it.
Simple recurrent neural networks (also referred to as RNNs) are to time-series problems what CNNs are to computer vision. In a time-series problem, you feed a sequence of values to a model and ask it to predict the next n values of that sequence. RNNs go through each value of the sequence while building up a memory of what they have seen, which helps them predict what the future will look like. …
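As a rough illustration of that loop (a minimal sketch, not the article’s code; the layer sizes and toy data are assumptions), a Keras RNN that reads a sequence and predicts the next value looks like this:

```python
# A minimal sketch: an RNN reads a sequence of values and predicts the next one.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    # The RNN walks through the sequence, building up memory as it goes.
    tf.keras.layers.SimpleRNN(32, input_shape=(None, 1)),
    # A dense head predicts the next value from that memory.
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Toy usage: 16 sequences of length 10, one feature each.
x = np.random.rand(16, 10, 1)
y = np.random.rand(16, 1)
model.fit(x, y, epochs=1, verbose=0)
```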

If you thought machine learning was the crush you didn’t have the guts to talk to, deep learning is the dad of your crush! Due to unprecedented advances in hardware and researchers’ appetite for bigger and better models, deep learning is becoming more intimidating and elusive by the day. The more research bubbles up every day, the higher it pushes the bar for the basic knowledge you should have. So, for all those folks who are hesitant to dive straight into the murky, tacky goo-iness of deep learning, I hope this article boosts your confidence. …

The idea of Seq2Seq, or sequence-to-sequence, models was introduced in a paper by Ilya Sutskever et al., “Sequence to Sequence Learning with Neural Networks”. They are essentially a particular arrangement of deep sequential models (a.k.a. RNN-based models, e.g. LSTMs/GRUs) [1] (discussed later). The main type of problem addressed by these models is,
mapping an arbitrary length sequence to another arbitrary length sequence
Where might we come across such problems? Pretty much anywhere. Applications such as machine translation are a few examples that can capitalize on such a model. These applications share a very distinctive problem formulation: they require the ability to map an arbitrarily long source sequence to an arbitrary-length target sequence. For example, in English-to-French translation, there is no one-to-one mapping between the words of the two languages. Often, translating from one language to another requires learning copious complex features (one-to-many, many-to-one and many-to-many mappings, lexical dependencies, word alignment [2], etc.). …
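A minimal encoder-decoder sketch of that arrangement (not the paper’s exact setup; the vocabulary sizes and dimensions are assumptions) could look like this in Keras:

```python
# A minimal Seq2Seq sketch: an LSTM encoder compresses the source sequence
# into a state, and an LSTM decoder unrolls the target from that state.
import tensorflow as tf
from tensorflow.keras import layers

src_vocab, tgt_vocab, dim = 5000, 5000, 128  # assumed sizes

# Encoder: consume the source sequence, keep only the final state.
enc_in = layers.Input(shape=(None,))
enc_emb = layers.Embedding(src_vocab, dim)(enc_in)
_, h, c = layers.LSTM(dim, return_state=True)(enc_emb)

# Decoder: generate the target sequence, seeded with the encoder's state.
dec_in = layers.Input(shape=(None,))
dec_emb = layers.Embedding(tgt_vocab, dim)(dec_in)
dec_out = layers.LSTM(dim, return_sequences=True)(dec_emb, initial_state=[h, c])
probs = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = tf.keras.Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Note how the source and target lengths are independent: each input has shape (None,), which is what lets the model map arbitrary-length sequences to arbitrary-length sequences.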

As a data scientist, I grapple with Docker on a daily basis. Creating images and spinning up containers have become as common for me as writing Python scripts. And this journey has had its achievements as well as its “I wish I knew that before” moments.
This article discusses some best practices for using Docker in your data science projects. By no means is this an exhaustive checklist, but it covers most things I’ve come across as a data scientist.
This article assumes basic-to-moderate knowledge of Docker. For example, you should know what Docker is used for and should be able to comfortably write a Dockerfile and understand Docker commands like RUN, CMD, etc. If not, have a read through this article from the official Docker site. …

GloVe implementation with Keras: [here]
In this article, you will learn about GloVe, a very powerful word vector learning technique. The article will focus on explaining why GloVe is better and the motivation behind GloVe’s cost function, which is the most crucial part of the algorithm. The code will be discussed in detail in a later article.
To visit my previous articles in this series, use the following letters.
A B C D* E F G H I J K L* M N O P Q R S T U V W X Y Z
GloVe is a word vector technique that rode the wave of word vectors after a brief silence. Just to refresh: word vectors put words into a nice vector space, where similar words cluster together and different words repel. The advantage of GloVe is that, unlike Word2vec, it does not rely only on local statistics (the local context information of words) but also incorporates global statistics (word co-occurrence) to obtain word vectors. But keep in mind that there’s quite a bit of synergy between GloVe and Word2vec. …
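For reference, the cost function in question, as published in the original GloVe paper by Pennington et al., is a weighted least-squares objective tying the word vectors to the global co-occurrence counts $X_{ij}$:

$$
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
$$

where $w_i$ and $\tilde{w}_j$ are the word and context vectors, $b_i$ and $\tilde{b}_j$ are their biases, and $f$ is a weighting function that caps the influence of very frequent co-occurrences. The motivation behind this form is exactly what the article walks through.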

This story introduces you to a GitHub repository containing an atomic, up-to-date attention layer implemented using Keras backend operations, available at attention_keras.
With the unveiling of TensorFlow 2.0, it is hard to ignore the conspicuous attention (no pun intended!) given to Keras, with a greater focus on advocating it for implementing deep networks. Keras in TensorFlow 2.0 comes with three powerful APIs for building models:
1. Sequential API: model = Sequential() and keep adding layers, e.g. model.add(Dense(...))
2. Functional API: model = Model(inputs=[...], outputs=[...])
…
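As a quick illustration of the first two (a minimal sketch, not code from the repository; the layer sizes are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

# 1. Sequential API: stack layers one after the other.
seq_model = tf.keras.Sequential()
seq_model.add(layers.Dense(32, activation="relu", input_shape=(10,)))
seq_model.add(layers.Dense(1))

# 2. Functional API: wire inputs to outputs explicitly,
#    which allows multiple inputs/outputs and branching.
inputs = tf.keras.Input(shape=(10,))
x = layers.Dense(32, activation="relu")(inputs)
outputs = layers.Dense(1)(x)
func_model = tf.keras.Model(inputs=inputs, outputs=outputs)
```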
Neural style transfer (NST) is a very neat idea. NST builds on the key idea that,
it is possible to separate the style representation and the content representation in a CNN learnt during a computer vision task (e.g. an image recognition task).
Following this concept, NST employs a pretrained convolutional neural network (CNN) to transfer styles from a given image to another. This is done by defining a loss function that tries to minimise the differences between a content image, a style image and a generated image, which will be discussed in detail later. …
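As a preview of that loss, the original formulation (Gatys et al.) combines a content term and a style term in a weighted sum:

$$
\mathcal{L}_{total}(C, S, G) = \alpha \, \mathcal{L}_{content}(C, G) + \beta \, \mathcal{L}_{style}(S, G)
$$

where $C$, $S$ and $G$ denote the content, style and generated images, and the weights $\alpha$ and $\beta$ trade off preserving the content against matching the style.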

This article aims at introducing decision trees, a popular building block of highly praised models such as xgboost. A decision tree is simply a set of cascading questions. When you get a data point (i.e. a set of features and values), you use each attribute (i.e. the value of a given feature of the data point) to answer a question. The answer to each question decides the next question. At the end of this sequence of questions, you end up with the probability of the data point belonging to each class.
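A minimal sketch of that idea in practice (not the article’s code; the iris dataset and depth limit are arbitrary choices) using scikit-learn:

```python
# Fit a shallow decision tree on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# predict_proba returns the probability of belonging to each class,
# i.e. the end result of the cascading questions described above.
print(tree.predict_proba(X[:1]))
```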
Note: This article is behind the Medium paywall. However, the code is open source and can be accessed from this link. …

We see thousands of articles about data science being published every day. Why is writing so popular and sought-after among data scientists? This is because,
But how many of the articles you read actually stood out by teaching you the method instead of gushing out an incomprehensible torrent of technical jargon? How many of them made you appreciate the author, or recommend the piece to a friend? How many times have you come across a click-baity title claiming the “Best explanation of topic x” that actually lived up to the standard? Quite a few, I suppose. …
