
It's AI Against Corona

by Elvira Siegel
(Published: Tue Mar 17, 2020)


There has been a lot of talk about the new corona virus going around the world. Let's clear up a few things about it first, and then see how data science and AI can help us fight 2019-nCoV.

The Origin

The Coronavirus disease (COVID-19) was first reported from Wuhan, China, on 31 December 2019.

The Naming

The new corona virus is called "2019-nCoV". All corona viruses have a series of crown-like spikes on their surface, hence the name "corona", derived from Latin and meaning "crown" or "wreath".


SARS, Corona?

Often you might hear SARS referred to as Corona. "Severe Acute Respiratory Syndrome corona virus 2" (SARS-CoV-2) is the official name of the new virus, while 2019-nCoV was its provisional name. The name "SARS" was chosen because the virus is genetically related to the corona virus responsible for the SARS outbreak of 2003. The two viruses are related but still different.

COVID-19 is the disease caused by the new corona virus called 2019-nCoV (2019 novel Corona Virus).

So why COVID-19 and not just SARS?

In 2003 the SARS outbreak affected mostly Asia. To avoid uncertainty and prevent 2019-nCoV from being confused with SARS in the media, the WHO refers to the virus as “the virus responsible for COVID-19” or “the COVID-19 virus”. There is no intention to replace the official name of the virus, namely SARS-CoV-2.

How Long Does 2019-nCoV Survive on Surfaces?

The new corona virus may survive for up to three days on plastic or stainless steel, and can remain active on cardboard for up to 24 hours. This is shown by initial experiments involving the US Centers for Disease Control and Prevention (CDC).


Risk Groups

Exactly who belongs to the risk group is one of the many details that are still being researched. But from what we know so far, COVID-19 seems to be similar to other infectious diseases: anyone who is already weakened will not easily be able to cope with the new corona virus.

Smokers also seem to be more at risk.

In general, elderly COVID-19 patients face a higher mortality rate, and it increases with age. This can be seen in data from China. Professor Uwe Liebert, director of Virology at Leipzig University Hospital, says: "People of age 70 and above are particularly at risk of developing a severe corona virus course of illness". According to the data from China, everyone aged 60 or older is at elevated risk when dealing with COVID-19. Most of the deaths in China were among those over 80 years old.

Just another form of influenza?

Stop saying that Corona is "something like the flu". It is not. COVID-19 is more dangerous than influenza. In a typical winter, 10 to 15 percent of the population is infected with the flu virus. Around one percent of these people have to be hospitalized, and about one in a thousand dies. Depending on the flu season, the flu death rate is sometimes slightly higher.

COVID-19 has a much more severe course than flu. About 20 percent of all people who tested positive had to be hospitalized, five percent had to be given mechanical ventilation. Experience so far has shown that the disease is fatal in at least one percent of all patients. COVID-19 thus has a death rate that is at least ten times higher than for the common flu.
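The comparison above is simple arithmetic, and it can help to see it written out. The following sketch only restates the rough percentages quoted in this article (they are early estimates, not fixed constants); the function and variable names are our own.

```python
from fractions import Fraction

# Rough figures quoted in the text above (early estimates, not constants).
FLU_DEATH_RATE = Fraction(1, 1000)      # about one in a thousand infected
COVID_DEATH_RATE = Fraction(1, 100)     # at least one percent of patients
COVID_HOSPITAL_RATE = Fraction(20, 100) # about 20 percent hospitalized
COVID_VENTILATOR_RATE = Fraction(5, 100)  # about 5 percent ventilated

# How many times higher is the COVID-19 death rate than the flu death rate?
death_rate_ratio = COVID_DEATH_RATE / FLU_DEATH_RATE


def expected_outcomes(positive_cases):
    """Scale the quoted percentages to a given number of positive tests."""
    return {
        "hospitalized": positive_cases * COVID_HOSPITAL_RATE,
        "ventilated": positive_cases * COVID_VENTILATOR_RATE,
        "deaths_at_least": positive_cases * COVID_DEATH_RATE,
    }


outcomes = expected_outcomes(1000)
print("death rate ratio:", death_rate_ratio)
print("per 1000 positive tests:", {k: int(v) for k, v in outcomes.items()})
```

Exact fractions are used instead of floats so that the ratio comes out as a clean number without rounding noise.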

However, this rate also depends on how well the healthcare system is prepared. If the number of cases grows rapidly, it can overwhelm intensive care capacities. Countries that take drastic containment measures at an early stage of the virus spread can expect a lower death rate.

It is not easy to see why some countries suffer more from the corona virus than others. Many different factors may play a role, such as: How old is the population? How good are the healthcare system and nursing in general? How quickly were quarantine measures put in place? And so on.

AI against Corona

Data Science in Epidemiology

The immense complexity of modern mobility makes reducing the spread of a virus extremely challenging. For example, the global air traffic network links more than 4,000 airports worldwide through more than 25,000 connections. More than 3 billion passengers are transported every year, and together they travel more than 14 billion km a day.

In 14th-century Europe, mobility was almost exclusively local. As a result, the "black death" spread from south to north at a speed of approx. 4-5 km per day. Modern epidemics are much faster, covering between 100 and 400 km per day.

Highly developed computer simulations are a crucial tool for predicting the path of an epidemic. Such simulations are rather complex and require precise knowledge of, and data on, viruses transmitted by humans.

Fortunately, there are some concepts which might help to fight virus outbreaks.

The physicists Dirk Brockmann from Humboldt University Berlin and Dirk Helbing from ETH Zurich have developed an approach for network-driven contagion phenomena. Their mathematical theory says that in the highly networked world of the 21st century, geographical distances are no longer relevant. They must be replaced by so-called "effective distances". "From the perspective of Frankfurt, for example, other metropolises like London, New York and Tokyo are effectively no further away than geographically close places like Bremen, Leipzig or Kiel," says Brockmann, naming three cities in his relatively small home country, Germany, as examples of close places.

But how do you calculate the effective distances?

Here the researchers show that those distances can be determined directly from the air traffic network. That means that if many people travel from city A to city B, the effective distance from A to B is small. If only a few people travel from A to B, the effective distance is large. This is also why flights are canceled and borders or even cities are closed in times of the global corona virus spread.
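To make this more concrete, here is a minimal sketch of how such distances could be computed. As far as we understand the paper, the effective length of a direct link from A to B is 1 - ln(P), where P is the fraction of travelers leaving A who go to B, and the effective distance between two places is the length of the shortest path through the network. The passenger numbers and city names below are invented purely for illustration, and the function names are our own.

```python
import heapq
import math

# Hypothetical daily passenger counts: flux[a][b] = travelers from a to b.
flux = {
    "A": {"B": 900, "C": 100},
    "B": {"A": 900, "C": 50},
    "C": {"A": 100, "B": 50},
}


def link_lengths(flux):
    """Turn passenger counts into effective link lengths 1 - ln(P)."""
    lengths = {}
    for src, targets in flux.items():
        total = sum(targets.values())
        lengths[src] = {
            dst: 1.0 - math.log(count / total)
            for dst, count in targets.items()
            if count > 0
        }
    return lengths


def effective_distances(flux, origin):
    """Shortest-path (Dijkstra) effective distance from origin to all nodes."""
    lengths = link_lengths(flux)
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, math.inf):
            continue  # stale queue entry, a shorter path was already found
        for nxt, length in lengths.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(nxt, math.inf):
                best[nxt] = candidate
                heapq.heappush(queue, (candidate, nxt))
    return best


distances_from_a = effective_distances(flux, "A")
for city, dist in sorted(distances_from_a.items(), key=lambda item: item[1]):
    print(f"{city}: {dist:.2f}")
```

In this toy network most travelers leaving A go to B, so B ends up effectively closer to A than C does, regardless of where the cities are on the map.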

If we look at complex geographical epidemic patterns, such as the global spread of SARS in 2003 or influenza A (H1N1) in 2009, in terms of effective distances, the spread patterns become regular. We see the same circular waves in locally infected areas as we narrow down to specific geographical regions. The spread stays circular, which is easier to describe mathematically and to visualize graphically.


The researchers claim that modern epidemic spread patterns are not fundamentally different from historical ones. They have only become more difficult to recognize, because modern mobility makes the patterns ever more complex.

An important aspect of the theory for applications is the fact that the “effective” distance patterns only have a simple geometry if they are visualized from the real place of origin of the virus. This means that you can determine the origin of the virus by looking at the current spread pattern from the perspective of every possible location and quantifying how circular the pattern is in each case. Consequently, the true place of origin gives the highest score, meaning that the data is most clearly spread in circles around it.
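The idea of testing every candidate origin can be sketched in a few lines. Note that this is a simplified stand-in and not the circularity measure used in the paper: here we simply score each candidate by how well its effective distances to all places correlate with the day on which the disease arrived there (Pearson correlation). All effective distances and arrival days below are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical effective distances: eff[candidate][city].
eff = {
    "A": {"A": 0.0, "B": 1.1, "C": 3.3, "D": 4.0},
    "B": {"A": 1.1, "B": 0.0, "C": 4.0, "D": 2.5},
    "C": {"A": 3.3, "B": 4.0, "C": 0.0, "D": 1.5},
    "D": {"A": 4.0, "B": 2.5, "C": 1.5, "D": 0.0},
}

# Hypothetical day on which the first case was observed in each city.
arrival_day = {"A": 0, "B": 3, "C": 10, "D": 12}


def score_candidates(eff, arrival_day):
    """Score every candidate origin by distance/arrival-time correlation."""
    cities = sorted(arrival_day)
    days = [arrival_day[c] for c in cities]
    return {
        candidate: pearson([dists[c] for c in cities], days)
        for candidate, dists in eff.items()
    }


scores = score_candidates(eff, arrival_day)
likely_origin = max(scores, key=scores.get)
print("most likely origin:", likely_origin)
```

The candidate whose effective distances line up best with the observed arrival order wins, which mirrors the reasoning above: only from the true origin does the spread look like a regular wave.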

Here is the original paper: "The Hidden Geometry of Complex, Network-Driven Contagion Phenomena".

Nvidia is calling on gaming PC owners to put their systems to work fighting COVID-19

If you happen to have a gaming-ready PC with an Nvidia graphics card, you should think about lending some graphical power to help fight the COVID-19 outbreak.

Nvidia is calling on PC gamers to download the Folding@home application and donate their spare clock cycles to advance the world's knowledge of the corona virus. The program links your computer to an international network that harnesses distributed processing power to cope with massive computing tasks, something that gaming GPUs are good at. This power is used to run simulations of protein dynamics that help researchers better understand and analyze the new disease.

No worries! You can turn off the application and take back the GPU's full power at any time.

The download of the application is available on foldingathome.org.

Chinese supercomputer uses Artificial Intelligence to diagnose patients from chest scans

The system analyses hundreds of images in seconds. The supercomputer can quickly distinguish between patients infected with the corona virus and those with common pneumonia or another lung disease. The accuracy of the analysis is at least 80%. The system also reports the areas of the patient’s lungs that require special attention.

It also provides a likelihood estimate of the person having contracted COVID-19 in a range from zero to ten.

The system's performance improves as the number of samples for training increases. It was and is used to help medical teams fighting the corona virus in more than 30 hospitals in Wuhan and other Chinese cities.

Btw, China has offered free use of the machine around the world.

The Canadian startup BlueDot tracks infectious disease outbreaks

BlueDot, a startup based in Toronto, uses artificial intelligence, machine learning and big data to predict the outbreak and spread of infectious diseases. On Dec. 31, 2019, the company alerted its customers, including government clients, to a potential cluster of unusual new lung disease cases around a market in Wuhan, China.

Nine days passed before the WHO alerted the public to the danger of a novel corona virus.

BlueDot is proprietary software created to locate, track and predict the spread of viruses. It gathers data on diseases around the world 24 hours a day. Part of this data comes from organizations like the Centers for Disease Control and Prevention or the WHO.

However, most of BlueDot's data comes from outside the official health care sources: travelers' movements on commercial flights worldwide, human, animal and insect population data, climate data, and information from journalists and healthcare workers that can be found on the Internet.

BlueDot’s employees classify the data manually, then filter it for relevant keywords and apply natural language processing and machine learning to train their systems.
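BlueDot's pipeline is proprietary, so we can only illustrate the general idea of the keyword filtering step. The sketch below is our own toy version: the keyword list, the sample headlines and the function names are all invented, and a real system would use far more sophisticated language processing.

```python
import re

# Hypothetical keyword list, for illustration only.
KEYWORDS = {"pneumonia", "outbreak", "virus", "quarantine"}


def matched_keywords(text, keywords=KEYWORDS):
    """Return the keywords that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & set(keywords)


def filter_reports(reports, keywords=KEYWORDS, min_hits=1):
    """Keep reports that mention at least min_hits distinct keywords."""
    return [
        report
        for report in reports
        if len(matched_keywords(report, keywords)) >= min_hits
    ]


# Made-up headlines standing in for collected news items.
reports = [
    "Hospital reports cluster of unusual pneumonia cases near market",
    "Local football team wins regional cup",
    "Officials consider quarantine after virus outbreak",
]

relevant = filter_reports(reports)
for report in relevant:
    print(sorted(matched_keywords(report)), "->", report)
```

Raising `min_hits` makes the filter stricter, which trades missed reports for fewer false alarms.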

BlueDot also sends out alerts to government, business and public health clients.

At the beginning of the COVID-19 outbreak, the BlueDot program flagged Chinese-language articles that reported 27 new pneumonia cases associated with the Wuhan market. Moreover, BlueDot could analyze which cities were highly connected to Wuhan. It used global airline ticketing data to help predict where potentially infected people might be traveling.

BlueDot predicted that the following international destinations would receive the highest number of travelers from Wuhan: Bangkok, Hong Kong, Tokyo, Phuket, Seoul and Singapore. Just as predicted, the cities at the top of this list had the first corona virus cases outside of China.
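Ranking destinations by traveler volume is, at its core, a counting exercise. The following sketch shows the principle with made-up ticket records and placeholder city names. It says nothing about BlueDot's real data or code, which are not public.

```python
from collections import Counter

# Hypothetical ticket records as (origin, destination) pairs. Not real data.
tickets = [
    ("Origin", "City A"),
    ("Origin", "City B"),
    ("Origin", "City A"),
    ("Origin", "City C"),
    ("Origin", "City A"),
    ("Origin", "City B"),
    ("Elsewhere", "City C"),
]


def top_destinations(tickets, origin, k=3):
    """Return the k most frequent destinations for trips starting at origin."""
    counts = Counter(dst for src, dst in tickets if src == origin)
    return counts.most_common(k)


ranking = top_destinations(tickets, "Origin")
for city, travelers in ranking:
    print(f"{city}: {travelers} travelers")
```

The cities at the top of such a ranking are the ones where imported cases would be expected first.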


What to expect?

In the current corona virus outbreak, we can use artificial intelligence in at least the following three ways:

  1. support in the development of vaccines against COVID-19
  2. scan quickly through existing drugs to help see if any of them may be repurposed
  3. help find a medicine to fight both the current and future corona virus outbreaks

Let's still be realistic about what AI might do for us. As for the timing, the earliest that something like this might be accomplished is likely 18 to 24 months away. Manufacturing scale-up, safety testing and so on will delay the results we need.

Where to find data to work on?

Johns Hopkins University recently made two datasets available: one containing SARS-CoV-2 reports from different countries and regions, and another containing time series of confirmed cases, deaths and recoveries. They are available for download in a GitHub repository, which is updated daily.
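As a starting point, here is a small sketch of how such a time-series file could be processed. It assumes the wide layout the repository used at the time of writing, as far as we know it: one row per region with the columns "Province/State", "Country/Region", "Lat", "Long", followed by one column per date holding cumulative counts. The sample rows are invented placeholders, not real figures, and the layout of the real files may change.

```python
import csv
import io

# Invented sample in the assumed wide layout (cumulative counts per date).
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Exampleland,10.0,20.0,1,3,6
North,Samplestan,30.0,40.0,0,2,2
South,Samplestan,31.0,41.0,5,5,9
"""


def totals_by_country(csv_text):
    """Sum the cumulative counts of all provinces for each country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = reader.fieldnames[4:]  # everything after Lat/Long
    totals = {}
    for row in reader:
        series = totals.setdefault(
            row["Country/Region"], [0] * len(date_columns)
        )
        for i, date in enumerate(date_columns):
            series[i] += int(row[date])
    return date_columns, totals


def daily_new(series):
    """Convert a cumulative series into day-to-day increases."""
    return [today - before for before, today in zip(series, series[1:])]


dates, totals = totals_by_country(SAMPLE)
for country, series in totals.items():
    print(country, "cumulative:", series, "new per day:", daily_new(series))
```

To work with the real data, download one of the CSV files from the repository and pass its contents to `totals_by_country` instead of `SAMPLE`.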

Other useful links:

nCoV2019.live map

Kaggle corona virus dataset

Coronavirus Map: Tracking the Spread of the Outbreak

Centers for Disease Control and Prevention (CDC)


Copyright © 2024 by Richard Siegel at siegel.work