Take a look at our latest articles

First of all, let's clarify some terminology you need in order to understand the concept of an activation function.

Backpropagation is another supervised learning optimization algorithm. The main task of the backpropagation algorithm is to find optimal weights in a neural network by applying an optimization technique.

Gradient Descent is a popular optimization technique in machine learning. It aims to find the minimum value of a function.
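To make the idea concrete, here is a minimal sketch of gradient descent on a one-dimensional function. The function `gradient_descent`, its parameter names, the learning rate and the example function f(x) = (x - 3)^2 are all assumptions chosen for illustration and are not taken from the article itself.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0.

    grad is a function returning the derivative at x (hypothetical helper).
    """
    x = x0
    for _ in range(steps):
        # Move a small step in the direction of steepest descent.
        x = x - learning_rate * grad(x)
    return x


# Illustrative example: f(x) = (x - 3)^2 has derivative f'(x) = 2 * (x - 3)
# and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

With these settings the result ends up very close to 3. A learning rate that is too large can make the steps overshoot the minimum, so in practice this value usually has to be tuned.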

In this third and last part of the series "Introduction to Statistics" we will cover questions such as what probability is and what its types are, as well as the three probability axioms on which the entire probability theory is built.

In the following three parts we will cover basic terminology as well as the core concepts of statistics. In Part I you are going to learn about measures of central tendency (mean, median and mode). In Part II you will read about measures of variability and different distributions. Finally, in Part III you will get to know types of probability such as marginal, joint and conditional. Enjoy!

In this part we will continue our talk about descriptive statistics and the measures of variability such as range, standard deviation and variance, as well as different types of distributions. Feel free to read Part I of this series to deepen your knowledge about the measures of central tendency.

Logit regression is another name for logistic regression; "logit" is short for "logistic unit". Logistic regression is a popular statistical model that generates probabilities for binary classification tasks. Its output lies in the range [0.0, 1.0] and is thresholded to obtain a discrete value. A discrete value is an outcome (dependent variable) that has only a limited number of possible values.
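As a minimal sketch of how a logistic model turns a weighted sum into a probability and then into a class label, consider the following. The function names, the weights and the threshold of 0.5 are assumptions made for illustration only; a real model would learn its weights from data.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a discrete label (0 or 1)."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical weights for a model with two features.
weights, bias = [1.5, -2.0], 0.25
print(predict_proba([2.0, 0.5], weights, bias))
print(predict_class([2.0, 0.5], weights, bias))
```

The probability itself is continuous; only the thresholding step produces the discrete outcome.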

We try to optimize the algorithm so that it gives the best possible output. This optimization needs a loss function to compute the error (loss) of the model.

In this article we will gain a general picture of Squared Error, Mean Squared Error (MSE), Maximum Likelihood Estimation (MLE) and Cross Entropy.

In this article we will delve into the magic behind one of the most popular Deep Learning frameworks: TensorFlow. We will look at the crucial terminology and some core computation principles we need to grasp the real power of TensorFlow.

describes the probability of some event, based on some conditions that might be related to that event.

With Neural Networks (NNs) we try to create a program that is able to learn from experience with respect to some task. The program is said to learn if its performance on that task, as judged by a performance measure, improves with experience.

Principal Component Analysis or PCA is a technique for extracting relevant variables (also called components or sometimes features) from a larger data set. From this high-dimensional data set, PCA tries to extract a low-dimensional representation. The idea is that with PCA we can retain as much information as feasible in a low-dimensional space. If we have fewer variables, the data visualization becomes easier to understand. PCA is often used in problems where we have to work with high-dimensional data.

A Recurrent Neural Network (RNN) is a type of neural network where the output from the previous step is given as an input to the current step. RNNs are designed to take an input series with no fixed size limit. RNNs remember past states and are influenced by them. This type of network is well suited for time series data.
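The recurrence described above can be sketched with a single scalar hidden state. This is a toy illustration and not a trainable network: the function `rnn_forward`, the fixed weights `w_x`, `w_h`, `b` and the `tanh` activation are assumptions chosen for the example.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence of any length.

    Each new state depends on the current input and on the previous state.
    """
    h = 0.0  # initial hidden state
    states = []
    for x in inputs:
        # The previous state h feeds back into the current step.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# The same function handles sequences of different lengths.
short = rnn_forward([1.0, 0.0])
long = rnn_forward([1.0, 0.0, 0.0, 0.0, 0.0])
print(short)
print(long)
```

Although the second input is zero, the second state is not: the network still carries a trace of the first input, which is the "memory" mentioned in the text.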

If you happen to have a classification, regression or outlier detection task, you might want to consider using Support Vector Machines (SVMs), a supervised learning model that builds a line (hyperplane) to separate data into groups. SVMs can perform not only linear classification but also effectively separate data in a non-linear, multi-dimensional space using the so-called kernel trick, which maps data points into a high-dimensional space. SVM is not a new, fancy machine learning technique; it has existed since 1963.

Although SVM is also applied to regression tasks, it is still more often used for classification.

Matrix decomposition is another name for matrix factorization. This method is a nice example of applied linear algebra in machine learning and similar fields.

The Jacobian matrix is a special kind of matrix that consists of the first-order partial derivatives of a vector function. The shape of the Jacobian matrix can vary: the number of rows and columns may or may not be equal, so in one case it is a square matrix and in the other it is not.
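To illustrate how the shape of a Jacobian follows from the function, here is a small sketch that approximates it numerically with central differences. The helper `jacobian`, the step size `eps` and the example function `f` are assumptions made for this illustration; in practice the derivatives would often be computed analytically or by automatic differentiation.

```python
def jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x with central differences.

    f maps a list of n numbers to a list of m numbers; the result has
    m rows (one per output) and n columns (one per input).
    """
    rows, cols = len(f(x)), len(x)
    J = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(rows):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return J


def f(v):
    # Example function with 2 inputs and 3 outputs (not square).
    x, y = v
    return [x * x * y, 5.0 * x + y, x - y]


J = jacobian(f, [1.0, 2.0])
for row in J:
    print(row)
```

Here the function has two inputs and three outputs, so the Jacobian has three rows and two columns and is not square. Analytically its entries at (1, 2) are [[4, 1], [5, 1], [1, -1]], and the numerical result should agree with these values up to a small approximation error.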

As a general rule, weights and biases are initialized with random numbers. Weights and biases are extremely important model parameters and play a pivotal role in the training of every neural network. Therefore, one should have a firm understanding of how they work and why one needs them. In this article we will see how we can initialize and optimize weights to achieve better performance in a neural network.

Word embedding is a popular vocabulary representation model. Such a model is able to capture the context and semantics of a word in a document. So what is it exactly?

In the second part of Word Embeddings we will talk about the downsides of the Word2Vec model (Skip-Gram in particular).

If you do data analysis, machine learning or some other data-driven research, you will probably encounter high-dimensional data. High dimensions in a data set mean that we have a high number of features (sometimes also called input variables). Our goal is to find good features and use them in our model training.

In this article, we will talk about a dimensionality reduction technique known as t-Distributed Stochastic Neighbor Embedding or t-SNE for short.

Copyright © 2019 by Richard Siegel at siegel.work
