Physicists Probe The Deep Earth For A Fifth Fundamental Force

In general, people tend to use the phrase “force of nature” loosely, as in “she’s a real force of nature.” But physicists are pickier–they reserve the phrase for just four separate, universal forces they call the “fundamental forces”: gravity, electromagnetism, and the strong and weak nuclear forces, which hold the nucleus together and are involved in radioactive decay, respectively.

That doesn’t mean physicists rule out the possibility that other forces exist. Since the models they have for explaining everything are incomplete, there’s a fairly good chance that there’s something else out there, pulling matter apart or pushing it together in a different way than all the forces identified so far.

A team of physicists at Amherst College and the University of Texas is looking for one potential fifth force–one that might arise from the spin of electrons interacting with the spins of other subatomic particles. We experience the short-range effects of electron spin every time we snap two magnets together–a result of the fact that iron contains electrons that line up with each other–but scientists think that particles’ spin may cause them to interact over very long distances, too.

The problem is, even if this long-range force, called “spin-spin force,” does exist, it’s incredibly weak, and therefore extremely difficult to detect. Larry Hunter, chief scientist on the Amherst team, calculates that the long-range spin-spin force must be at least a million times weaker than the gravitational attraction between a neutron and an electron. Since gravity is the most obvious force in our lives, it seems like a strong one, but on the scale of individual sub-atomic particles, it’s almost completely insignificant (the electrostatic force between two electrons is a million trillion trillion trillion times stronger than their gravitational attraction).
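As a back-of-the-envelope check on that last figure, compare Coulomb's law with Newton's law of gravitation for two electrons; the separation distance cancels out of the ratio:

\frac{F_{\text{electric}}}{F_{\text{gravity}}} = \frac{e^2 / (4\pi\varepsilon_0)}{G\, m_e^2} \approx 4 \times 10^{42}

which is indeed on the order of a million trillion trillion trillion (10^6 \times 10^{36} = 10^{42}).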

But Hunter’s team has come up with a clever way to address the difficulty: instead of trying to detect the incredibly faint force acting between two particles, they are aiming to detect the net force of all the spinning electrons locked up in iron atoms in Earth’s deep interior.

Since the electrons in iron tend to line up with the magnetic field, Hunter realized it might be possible to estimate the net spin from all the iron atoms in the mantle–the thick layer between Earth’s crust and core that makes up the bulk of our planet’s volume.

Hunter worked with a geophysicist, Jung-Fu Lin of the University of Texas at Austin, to calculate the strength of the magnetic field in different regions of Earth’s interior and the net spin of electrons throughout the mantle. The researchers realized they could use the planet’s quarry of iron atoms to increase the long-range spin-spin force on the neutrons inside Hunter’s lab by more than a quadrillion-fold.

Still, there are lots of challenges: the test chamber containing the atoms of mercury gas the team is studying has to be completely isolated from all interference from the four known fundamental forces, and the tools the team uses to detect changes in the gas need to be perfectly calibrated. “It’s a little like playing whack-a-mole,” Hunter jokes. “You fix one problem and then another one pops up.”

Hunter thinks he and his team can increase the sensitivity of their set-up by at least a couple orders of magnitude. Whether that increase will be enough for the physicists to make a detection–if this force really exists–remains to be seen.


Cyber Monday Star Ocean: The Divine Force Deals 2023


All the best deals for Star Ocean during Cyber Monday!

Cyber Monday Star Ocean: The Divine Force deals are most likely going to be all over the place during this year’s sale.

Star Ocean: The Divine Force is the sixth main game in a series that started back in 1996.

READ NOW: The Last of Us Part 1 Cyber Monday deals

It is a classic jRPG filled with diverse, colorful characters and fast-paced combat built around swords and blades.

Now without further ado, let’s jump into Cyber Monday Star Ocean: The Divine Force deals!

Best Star Ocean: The Divine Force Deals 2023

Star Ocean: The Divine Force was released on October 27, 2022. It is available on PC, Xbox One, Xbox Series X/S, PS4, and PS5.

If you are looking for some Cyber Monday Star Ocean: The Divine Force deals or discounts on any other jRPG then this sale might be just for you. Here you can find some of the best deals on games during Cyber Monday 2023.

Where to Find the Best Star Ocean: The Divine Force Cyber Monday Deals?

You can find all the discounts on Star Ocean: The Divine Force during this year’s Cyber Monday sale. All you need to do is tune in during the sale itself later on in November.

Visit your local retail stores and the official digital shops depending on the platform. Look through Steam on PC, PS Store on PlayStation, or Microsoft Store on Xbox consoles.

In addition to that you can see some of the stores we have gathered just for you down below!

Star Ocean: The Divine Force Cyber Monday Sales Shortlist:

Amazon – Deals on everything from games to PCs

Steam – Big deals every year for Cyber Monday

Green Man Gaming – Save big on AAA titles this Cyber Monday

Cyber Monday Star Ocean: The Divine Force Deals

Here you will be able to find a plethora of Cyber Monday Star Ocean: The Divine Force deals we have collected specially for you. Go through these deals on the game on various platforms and you might just find something for yourself!

Deals below were gathered from the US, UK, and Canada storefronts.

Editor’s picks:

iBUYPOWER TraceMR Gaming Desktop – Save $300 @ Best Buy

CyberPowerPC Gamer Master Gaming Desktop (RTX 3060) – Save $300

iBUYPOWER TraceMR Gaming Desktop (RTX 3080) – Save $350

Lenovo Legion Tower 5i Gaming Desktop (NVIDIA GeForce RTX 3070, Intel i7-12700F, 16GB RAM, 1TB SSD, Win 11, Black, mouse & keyboard, free 3-month Xbox Game Pass) – Save 29% @ Amazon

More deals:

Acer Predator Orion 3000 (RTX 3070) – Save 9% @ Amazon

Skytech Chronos Gaming PC Desktop – Save 20% @ Amazon

iBUYPOWER Element Mini 9300 Gaming PC Desktop (AMD Ryzen 3 3100 3.6GHz, AMD Radeon RX 550 2GB, 8GB DDR4 RAM, 240GB SSD, WiFi Ready, Windows 10 Home)

CyberpowerPC Gamer Supreme Liquid Cool Gaming PC (Intel Core i7-9700K 3.6GHz, NVIDIA GeForce RTX 2070 Super 8GB, 16GB DDR4, 1TB PCI-E NVMe SSD, WiFi Ready, Win 10 Home, SLC8260A2, Black)

Thermaltake LCGS Glacier 100 Gaming PC (AMD Ryzen 5 3600 6-core, ToughRam DDR4 3000MHz 16GB RGB Memory, GTX 1660 Super 6GB, 500GB SATA III, Win 10 Home, S1WT-B450-G10-LCS, White)

Lenovo ThinkCentre M90n-1 Nano Desktop PC (Intel Core i3, 8GB RAM, Windows 10 Pro)

Cyber Monday 2023 will take place on November 27, 2023. It is the last Monday of November, falling right after Black Friday 2023. There are also going to be some Cyber Monday Star Ocean: The Divine Force deals to go through!

If a game is not on sale during Black Friday 2023, make sure to wait a couple of days and check whether the title you were craving goes on sale on Cyber Monday 2023, just a few days later!

How to Get the Best Cyber Monday Star Ocean: The Divine Force Deals in 2023?

If you want to get the best Cyber Monday Star Ocean: The Divine Force deals, make sure to check out your local gaming stores on that day, as well as online and digital retailers, whether official storefronts like Steam or major shops like Amazon and Best Buy.

Features to Consider When Looking for Cyber Monday Star Ocean: The Divine Force Deals?

The main feature to consider is the game itself. The story follows two protagonists whose paths cross on Aster IV, where they are attacked by the Federation. It is an emotional tale that explores the stories of, and the relationship between, the two main characters, while fast-paced combat keeps the gameplay smooth.

More Cyber Monday Deals

Check out all our Cyber Monday Deals here.

Cyber Monday Star Ocean: The Divine Force FAQs

Is Star Ocean: The Divine Force Going to be on Sale During Cyber Monday 2023?

Probably yes. Even though the game is still quite new, Cyber Monday is a great time to remind players about a title and show it to a wider audience.

Are Cyber Monday Star Ocean: The Divine Force Deals 2023 Actually Good?

Yeah. The game is quite new so any discount on it will be a real steal.

Is Cyber Monday 2023 a Good Time to Buy Games?

It is one of the best times to purchase games; there are going to be huge discounts on some of the most popular titles.

5 Best Desktops For Machine Learning & Deep Learning


Machine learning and deep learning require powerful computers. Amazon has a fair amount of choices for people and budgets of all kinds.

This article will help you decide what to buy by showing you a list of different computers with their pros and cons.

Looking for more computer-related articles? Check out our detailed Computers Hub for more buying guides and information.

Visit our thorough Buying Guides Section if you want more help regarding your choice of gadgets and devices.


Deep & machine learning are tools that try to replicate the brain’s neural network in machines. They introduce self-learning techniques that teach AI to behave under certain conditions. Usually, the AI has to do a certain task and learn from its own mistakes in order to fulfill it.

If you want to get into machine learning and deep learning, you might need to take a look at your current computer. Even if your desktop can perform everyday tasks with ease, that doesn’t mean it will have the computing power to run machine learning & deep learning programs.

The GPU and CPU are crucial: you need a graphics card with plenty of memory, and your processor should have many cores. In addition, you need a good amount of RAM as well, around 8 GB or more.

Because these processes run for long periods of time, the computer you’re looking for needs to be able to run them for as long as possible without problems. Consequently, a powerful cooler is required to stop your components from overheating and causing thermal throttling.
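As a quick sanity check of a machine you already own, a short script along these lines reports the GPU, CPU core count, and RAM; it assumes PyTorch and psutil are installed, and is only a rough sketch:

import os
import psutil  # assumed installed
import torch   # assumed installed

# GPU: deep learning frameworks lean heavily on a CUDA-capable card
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
else:
    print('No CUDA GPU detected - training would fall back to the CPU')

# CPU: more cores help with data loading and preprocessing
print('CPU cores:', os.cpu_count())

# RAM: the guideline above suggests roughly 8 GB or more
ram_gb = psutil.virtual_memory().total / (1024 ** 3)
print('RAM: %.1f GB' % ram_gb)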

What are the best desktops for machine & deep learning?

The RTX 2080 Super has 8GB of dedicated memory

The Intel Core i9-9900K has an ideal 8 cores and can be turbo boosted

32GB of HyperX DDR4 3000MHz RAM

Liquid cooling keeps your temps as low as possible even during intensive use

The product is expensive, with prices starting from $2,000


HP Obelisk Omen is the most powerful item on our list. Geared with the latest hardware, such as the 9th generation Intel Core i9-9900K processor and the NVIDIA GeForce RTX 2080 Super, it is perfect for machine learning and deep learning.

If you want speed, power, customization, and the best quality products out there, this is the choice for you.

The GTX 1660 Ti offers 6GB of dedicated memory

The i5-9400F has 6 cores

16GB of DDR4 memory

The processor does not support overclocking


Another midrange choice is the HP Pavilion. It is close in performance to the Skytech Shiva (covered below) while also being a bit cheaper.

The GeForce GTX 1660 Ti is about 10% weaker than the RTX 2060 found in the Skytech Shiva, but it is also less expensive. In addition, the i5-9400F is still capable of handling deep learning & machine learning workloads.

The Ryzen 5 2600 offers 6 cores

The video card has 6GB of GDDR6 memory

3x RGB ring fans for maximum airflow, powered by an 80 Plus Certified 500-watt power supply

It only has 8GB of RAM



Equipped with a Ryzen 5 2600 processor and a GTX 1660 Ti graphics card, it is capable of running data-parsing algorithms. Furthermore, both the GPU and the CPU can be overclocked.

The Intel Core i7-9700K has an ideal 8 cores

The NVIDIA RTX 2070 Super comes with 8GB of dedicated memory

Liquid cooling keeps the temperatures low

16GB of DDR4 RAM is enough for deep learning & machine learning

You need to update your firmware if you want to get the proper CPU speed boosts working


CyberpowerPC Gamer Supreme is our next recommendation on the list. Coming close to the HP Omen mentioned above, this desktop trades a bit of power for a lower price.

The Intel Core i7-9700k and GeForce RTX 2070 Super still offer cutting-edge performance for more affordable prices. Moreover, this desktop can also be overclocked with no problem.

The Ryzen 5 2600 processor has 6 cores

It has 16GB of DDR4 RAM

The Graphics Card has 6GB of dedicated memory

Equipped with 3x RGB Ring Fans ensuring good airflow

Lacks a USB Type-C port


Skytech Shiva is our first budget-oriented choice. Significantly cheaper than the other two desktops from above, yet still holding strong in terms of performance, this computer is perfect for those who want some balance between price and power.

This product is geared with an AMD processor, precisely the Ryzen 5 2600, and an RTX 2060 non-Super version. The CPU is about 20% slower than an i9-9900K, but it is much cheaper. Moreover, both the CPU and GPU can be easily overclocked.

This list covers all you need to buy a brand new desktop for deep learning & machine learning.


Essentials Of Deep Learning: Introduction To Unsupervised Deep Learning (With Python Codes)

Introduction

In one of my early projects, I was working with the Marketing Department of a bank. The Marketing Director called me for a meeting. The subject said – “Data Science Project”. I was excited, completely charged and raring to go. I was hoping to get a specific problem, where I could apply my data science wizardry and benefit my customer.

The meeting started on time. The Director said “Please use all the data we have about our customers and tell us the insights about our customers, which we don’t know. We really want to use data science to improve our business.”

I was left thinking “What insights do I present to the business?”

Data scientists use a variety of machine learning algorithms to extract actionable insights from the data they’re provided. The majority of them are supervised learning problems, because you already know what you are required to predict. The data you are given comes with a lot of details to help you reach your end goal.

I am planning to write a series of articles focused on Unsupervised Deep Learning applications. This article specifically aims to give you an intuitive introduction to what the topic entails, along with an application of a real life problem. In the next few articles, I will focus more on the internal workings of the techniques involved in deep learning.

Note – This article assumes a basic knowledge of Deep Learning and Machine learning concepts. If you want to brush up on them, you can go through these resources:

So let’s get started!

Table of Contents

Why Unsupervised Learning?

Case Study of Unsupervised Deep Learning

Defining our Problem – How to Organize a Photo Gallery?

Approach 1 – Arrange on the basis of time

Approach 2 – Arrange on the basis of location

Approach 3 – Extract Semantic meaning from the image and use it to organize the photos

Code Walkthrough of Unsupervised Deep Learning on the MNIST dataset

Why Unsupervised Learning?

A typical workflow in a machine learning project is designed in a supervised manner. We tell the algorithm what to do and what not to do. This generally gives a structure for solving a problem, but it limits the potential of that algorithm in two ways:

It is bound by the biases under which it is supervised. Of course, it learns how to perform the task on its own, but it is prevented from considering other corner cases that could occur when solving the problem.

As the learning is supervised, there is a huge manual effort involved in creating the labels for our algorithm. So the fewer manual labels you create, the less training you can perform for your algorithm.

To solve this issue in an intelligent way, we can use unsupervised learning algorithms. These algorithms derive insights directly from the data itself and work by summarizing the data or grouping it, so that we can use these insights to make data-driven decisions.

Let’s take an example to better understand this concept. Let’s say a bank wants to divide its customers so that they can recommend the right products to them. They can do this in a data-driven way – by segmenting the customers on the basis of their ages and then deriving insights from these segments. This would help the bank give better product recommendations to their customers, thus increasing customer satisfaction.
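To make that concrete, here is a minimal sketch of age-based segmentation using scikit-learn’s KMeans; the ages and the choice of three segments are made-up values for illustration:

import numpy as np
from sklearn.cluster import KMeans

# hypothetical customer ages (made-up data for illustration)
ages = np.array([22, 25, 31, 35, 41, 48, 53, 60, 64, 70]).reshape(-1, 1)

# segment the customers into three age groups
km = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = km.fit_predict(ages)

for age, segment in zip(ages.ravel(), segments):
    print('age %d -> segment %d' % (age, segment))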

Case Study of Unsupervised Deep Learning

In this article, we will take a look at a case study of unsupervised learning on unstructured data. As you might be aware, Deep Learning techniques are usually most impactful where a lot of unstructured data is present. So we will take an example of Deep Learning being applied to the Image Processing domain to understand this concept.

Defining our Problem – How to Organize a Photo Gallery?

I have 2000+ photos in my smartphone right now. If I had been a selfie freak, the photo count would easily be 10 times more. Sifting through these photos is a nightmare, because every third photo turns out to be unnecessary and useless for me. I’m sure most of you will be able to relate to my plight!

Ideally, what I would want is an app that organizes the photos in such a manner that I can browse through most of them and take a closer peek whenever I want. That would give me a real sense of the different kinds of photos I have right now.

To get a clearer perspective of the problem, I went through my mobile and tried to identify the categories of the images by myself. Here are the insights I gathered:

First and foremost, I found that one-third of my photo gallery is filled with memes (Thanks to my lovely friends on WhatsApp).

I personally collect interesting quotes / shares I come across on Reddit. These are mostly motivational or funny, depending on which subreddit I downloaded them from.

There are at least 200 images I captured, or my colleagues shared, of the famous DataHack Summit and the subsequent AV outing we had in Kerala.

There are a few photos of whiteboard discussions that happen frequently during meetings.

Then there are a few images/screenshots of code tracebacks/bugs that require internal team discussions. They are a necessary evil that has to be purged after use.

I also found dispersed “private & personal” images, such as selfies, group photos and a few objects/sceneries. They are few, but they are my prized possessions.

Last but not the least – there were numerous “good morning”, “happy birthday” and “happy diwali” posts that I desperately want to delete from my gallery. No matter how much I exterminate them, they just keep coming back!

Now that you know the scenario, can you think of the different ways to better organize my photos through an automated algorithm? You can discuss your thoughts on this discussion thread.

In the below sections, we will discuss a few approaches I have come up with to solve this problem.

Approach 1 – Arrange on the basis of time

The simplest way is to arrange the photos on the basis of time. Each day could have a different folder for itself. Typically, most of the photo viewing apps use this approach (eg. Google Photos app).

The upside of this will be that all the events that happened on that day will be stored together. The downside of this approach is that it is too generic. Each day, I could have photos that are from an outing, or a motivational quote, etc. Both of them will be mixed together – which defeats the purpose altogether.

Approach 2 – Arrange on the basis of location

Another simple option is to arrange the photos on the basis of the location where they were captured, using the GPS metadata stored with each image. The downside of this approach is the simplistic idea on which it is built: how can we define the location of a meme, or a cartoon – which take up a fair share of my image gallery? So this approach lacks ingenuity as well.

Approach 3 – Extract Semantic meaning from the image and use it to define my collection

The approaches we have seen so far were mostly dependent on the metadata that is captured along with the image. A better way to organize the photos would be to extract semantic information from the image itself and use that information intelligently.

Let’s break this idea down into parts. Suppose we have a similar variety of photos (as mentioned above). What trends should our algorithm capture?

Is the captured image of a natural scene or is it an artificially generated image?

Is there textual material in the photograph? If there is – can we identify what it is?

What are the different kinds of objects present in the photograph? Do they combine to define the aesthetics of the image?

Are there people present in the photograph? Can we recognize them?

Are there similar images on the web which can help us identify the context of the image?

So our algorithm should ideally capture this information without explicitly tagging what is present and what is not, and use it to organize and segment our photos. Ideally, our final organized app could look like this:

This approach is what we call an “unsupervised way” to solve problems. We did not directly define the outcome that we want. Instead, we trained an algorithm to find those outcomes for us! Our algorithm summarizes the data in an intelligent manner, and then tries to solve the problem on the basis of these inferences. Pretty cool, right?

Now you may be wondering – how can we leverage Deep Learning for unsupervised learning problems?

As we saw in the case study above, by extracting semantic information from the image, we can get a better view of the similarity of images. Thus, our problem can be formulated as – how can we reduce the dimensions of an image so that we can reconstruct the image back from these encoded representations?

Here, we can use a deep learning architecture called Auto Encoders. 

Let me give you a high level overview of Auto Encoders. The idea behind this algorithm is that you train it to recreate its own input. But the catch is that it has to pass through a much smaller intermediate representation to do so.

For example, an Auto Encoder with encoding set to 10 is trained on images of cats, each of size 100×100. So the input dimension is 10,000, and the Auto Encoder has to represent all this information in a vector of size 10 (as seen in the image below).

An auto encoder can be logically divided into two parts: an encoder and a decoder. The task of the encoder is to convert the input to a lower dimensional representation, while the task of the decoder is to recreate the input from this lower dimensional representation.

This was a very high level overview of auto encoders. In the next article – we will look at them in more detail.

Note – This is more of a forewarning: the current state-of-the-art methods still aren’t mature enough to handle industry-level problems with ease. Although research in this field is booming, it will take a few more years for these algorithms to become “industrially accepted”.

Code Walkthrough of Unsupervised Deep Learning on MNIST data

Now that you have an intuition of solving unsupervised learning problems using deep learning – we will apply our knowledge on a real life problem. Here, we will take an example of the MNIST dataset – which is considered as the go-to dataset when trying our hand on deep learning problems. Let us understand the problem statement before jumping into the code.

The original problem statement is to identify individual digits from an image. You are given the labels of the digit that the image contains. But for our case study, we will try to figure out which of the images are similar to each other and cluster them into groups. As a proxy, we will check the purity of these groups by inspecting their labels. You can find the data on AV’s DataHack platform – the Identify the Digits practice problem.

We will perform three Unsupervised Learning techniques and check their performance, namely:

KMeans directly on the images

KMeans + Autoencoder (a simple deep learning architecture)

DEC (Deep Embedded Clustering), a state-of-the-art model covered at the end of this walkthrough

We will look into the details of these algorithms in another article. For the purposes of this post, let’s see how we can attempt to solve this problem.

Before starting this experiment, make sure you have Keras installed in your system. Refer to the official installation guide. We will use TensorFlow for the backend, so make sure you have this in your config file. If not, follow the steps given here.

We will then use an open source implementation of the DEC algorithm by Xifeng Guo. To set it up on your system, type the below commands in the command prompt (the repository URL is inferred from the author and folder name):

git clone https://github.com/XifengGuo/DEC-keras
cd DEC-keras

You can then fire up a Jupyter notebook and follow along with the code below.

First we will import all the necessary modules for our code walkthrough.

%pylab inline
import os
import keras
import metrics  # metrics.py helper module from the DEC-keras repository
import numpy as np
import pandas as pd
import keras.backend as K

from time import time
from keras import callbacks
from keras.models import Model
from keras.optimizers import SGD
from keras.layers import Dense, Input
from keras.initializers import VarianceScaling
from keras.engine.topology import Layer, InputSpec
from scipy.misc import imread
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, normalized_mutual_info_score

Populating the interactive namespace from numpy and matplotlib

We will then set a seed value to restrict randomness.

# To stop potential randomness
seed = 128
rng = np.random.RandomState(seed)

Now set the working path of your data, so that you can access it later on.

root_dir = os.path.abspath('.')
data_dir = os.path.join(root_dir, 'data', 'mnist')

Read the train and test files.

train = pd.read_csv(os.path.join(data_dir, 'train.csv'))
test = pd.read_csv(os.path.join(data_dir, 'test.csv'))

train.head()

   filename  label
0     0.png      4
1     1.png      9
2     2.png      1
3     3.png      7
4     4.png      3

Note that in this dataset, you have also been given the labels for each image. This is generally not seen in an unsupervised learning scenario. Here, we will use these labels to evaluate how our unsupervised learning models perform.

Now let us plot an image to view what our data looks like.

img_name = rng.choice(train.filename)
filepath = os.path.join(data_dir, 'train', img_name)

img = imread(filepath, flatten=True)

pylab.imshow(img, cmap='gray')
pylab.axis('off')
pylab.show()

We will then read all the images and store them in a numpy array to create a train and test file.

temp = []
for img_name in train.filename:
    image_path = os.path.join(data_dir, 'train', img_name)
    img = imread(image_path, flatten=True)
    img = img.astype('float32')
    temp.append(img)

train_x = np.stack(temp)
train_x /= 255.0
train_x = train_x.reshape(-1, 784).astype('float32')

temp = []
for img_name in test.filename:
    image_path = os.path.join(data_dir, 'test', img_name)
    img = imread(image_path, flatten=True)
    img = img.astype('float32')
    temp.append(img)

test_x = np.stack(temp)
test_x /= 255.0
test_x = test_x.reshape(-1, 784).astype('float32')

train_y = train.label.values

We will then divide our training data into training and validation sets.

split_size = int(train_x.shape[0] * 0.7)

train_x, val_x = train_x[:split_size], train_x[split_size:]
train_y, val_y = train_y[:split_size], train_y[split_size:]

We will first apply K-Means directly to our images and divide them into 10 clusters.

km = KMeans(n_jobs=-1, n_clusters=10, n_init=20)
km.fit(train_x)

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300, n_clusters=10, n_init=20, n_jobs=-1, precompute_distances='auto', random_state=None, tol=0.0001, verbose=0)

 Now that we have trained our model, let’s see how it performs on the validation set.

pred = km.predict(val_x)

We will use Normalized Mutual Information (NMI) score to evaluate our model.

Mutual information is a symmetric measure of the degree of dependency between the clustering and the manual classification. It is based on the notion of cluster purity p_i, which measures the quality of a single cluster C_i as the largest number of objects that C_i has in common with a manual class M_j, after comparing C_i with all manual classes in M. Because NMI is normalized, we can use it to compare clusterings with different numbers of clusters.

The formula for NMI (in one common normalization) is:

NMI(Y, C) = \frac{2 \, I(Y; C)}{H(Y) + H(C)}

where I(Y; C) is the mutual information between the class labels Y and the cluster assignments C, and H(\cdot) denotes entropy.
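As a tiny illustration with made-up labels: NMI is 1.0 when the clustering matches the classes perfectly (even if the cluster IDs are permuted), and 0.0 when the clustering carries no information about the classes:

from sklearn.metrics import normalized_mutual_info_score

y_true = [0, 0, 1, 1, 2, 2]

# same grouping with relabeled cluster IDs -> 1.0
print(normalized_mutual_info_score(y_true, [2, 2, 0, 0, 1, 1]))

# everything in one cluster -> 0.0
print(normalized_mutual_info_score(y_true, [0, 0, 0, 0, 0, 0]))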

normalized_mutual_info_score(val_y, pred)

0.4978202313979692

Now instead of directly applying K-Means on the problem, we will first use an autoencoder to decrease the dimensionality of the data and extract useful information. This will then pass on the information to the K-Means algorithm.

# this is our input placeholder
input_img = Input(shape=(784,))

# "encoded" is the encoded representation of the input
encoded = Dense(500, activation='relu')(input_img)
encoded = Dense(500, activation='relu')(encoded)
encoded = Dense(2000, activation='relu')(encoded)
encoded = Dense(10, activation='sigmoid')(encoded)

# "decoded" is the lossy reconstruction of the input
decoded = Dense(2000, activation='relu')(encoded)
decoded = Dense(500, activation='relu')(decoded)
decoded = Dense(500, activation='relu')(decoded)
decoded = Dense(784)(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)
autoencoder.summary()

Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 784)               0
_________________________________________________________________
dense_2 (Dense)              (None, 500)               392500
_________________________________________________________________
dense_3 (Dense)              (None, 500)               250500
_________________________________________________________________
dense_4 (Dense)              (None, 2000)              1002000
_________________________________________________________________
dense_5 (Dense)              (None, 10)                20010
_________________________________________________________________
dense_6 (Dense)              (None, 2000)              22000
_________________________________________________________________
dense_7 (Dense)              (None, 500)               1000500
_________________________________________________________________
dense_8 (Dense)              (None, 500)               250500
_________________________________________________________________
dense_9 (Dense)              (None, 784)               392784
=================================================================
Total params: 3,330,794
Trainable params: 3,330,794
Non-trainable params: 0

# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)

autoencoder.compile(optimizer='adam', loss='mse')

Now let us train our autoencoder model.

train_history = autoencoder.fit(train_x, train_x,
                                epochs=500,
                                batch_size=2048,
                                validation_data=(val_x, val_x))

Train on 34300 samples, validate on 14700 samples
Epoch 1/500
34300/34300 [==============================] - 2s 60us/step - loss: 0.0805 - val_loss: 0.0666
...
Epoch 494/500
34300/34300 [==============================] - 0s 11us/step - loss: 0.0103 - val_loss: 0.0138
Epoch 495/500
34300/34300 [==============================] - 0s 10us/step - loss: 0.0103 - val_loss: 0.0138
Epoch 496/500
34300/34300 [==============================] - 0s 11us/step - loss: 0.0103 - val_loss: 0.0138
Epoch 497/500
34300/34300 [==============================] - 0s 11us/step - loss: 0.0103 - val_loss: 0.0139
Epoch 498/500
34300/34300 [==============================] - 0s 11us/step - loss: 0.0103 - val_loss: 0.0137
Epoch 499/500
34300/34300 [==============================] - 0s 11us/step - loss: 0.0103 - val_loss: 0.0139
Epoch 500/500
34300/34300 [==============================] - 0s 11us/step - loss: 0.0104 - val_loss: 0.0138

pred_auto_train = encoder.predict(train_x)
pred_auto = encoder.predict(val_x)

km.fit(pred_auto_train)
pred = km.predict(pred_auto)

normalized_mutual_info_score(val_y, pred)

0.7435578557037037

We see that our combined Autoencoder and K-Means model performs better than an individual K-Means model.

Finally, we will see the implementation of a state-of-the-art model – known as the DEC algorithm. This algorithm trains both the clustering and autoencoder models to get better performance. You can go through this paper to get a better perspective – Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. ICML 2016.

"""

Keras implementation for Deep Embedded Clustering (DEC) algorithm:

Original Author:

Xifeng Guo. 2023.1.30

"""

def

autoencoder

(

dims

,

act

=

'relu'

,

init

=

'glorot_uniform'

):

"""

Fully connected auto-encoder model, symmetric.

Arguments:

dims: list of number of units in each layer of encoder. dims[0] is input dim, dims[-1] is units in hidden layer.

The decoder is symmetric with encoder. So number of layers of the auto-encoder is 2*len(dims)-1

act: activation, not applied to Input, Hidden and Output layers

return:

(ae_model, encoder_model), Model of autoencoder and model of encoder

"""

n_stacks

=

len

(

dims

)

-

1

# input

x

=

Input

(

shape

=

(

dims

[

0

],),

name

=

'input'

)

h

=

x

# internal layers in encoder

for

i

in

range

(

n_stacks

-

1

):

h

=

Dense

(

dims

[

i

+

1

],

activation

=

act

,

kernel_initializer

=

init

,

name

=

'encoder_

%d

'

%

i

)(

h

)

# hidden layer

h

=

Dense

(

dims

[

-

1

],

kernel_initializer

=

init

,

name

=

'encoder_

%d

'

%

(

n_stacks

-

1

))(

h

)

# hidden layer, features are extracted from here

y

=

h

# internal layers in decoder

for

i

in

range

(

n_stacks

-

1

,

0

,

-

1

):

y

=

Dense

(

dims

[

i

],

activation

=

act

,

kernel_initializer

=

init

,

name

=

'decoder_

%d

'

%

i

)(

y

)

# output

y

=

Dense

(

dims

[

0

],

kernel_initializer

=

init

,

name

=

'decoder_0'

)(

y

)

return

Model

(

inputs

=

x

,

outputs

=

y

,

name

=

'AE'

),

Model

(

inputs

=

x

,

outputs

=

h

,

name

=

'encoder'

)

class ClusteringLayer(Layer):
    """
    Clustering layer converts input sample (feature) to soft label, i.e. a vector that represents the probability of the
    sample belonging to each cluster. The probability is calculated with student's t-distribution.

    # Example
    ```
        model.add(ClusteringLayer(n_clusters=10))
    ```
    # Arguments
        n_clusters: number of clusters.
        weights: list of Numpy array with shape `(n_clusters, n_features)` which represents the initial cluster centers.
        alpha: parameter in Student's t-distribution. Default to 1.0.
    # Input shape
        2D tensor with shape: `(n_samples, n_features)`.
    # Output shape
        2D tensor with shape: `(n_samples, n_clusters)`.
    """

    def __init__(self, n_clusters, weights=None, alpha=1.0, **kwargs):
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super(ClusteringLayer, self).__init__(**kwargs)
        self.n_clusters = n_clusters
        self.alpha = alpha
        self.initial_weights = weights
        self.input_spec = InputSpec(ndim=2)

    def build(self, input_shape):
        assert len(input_shape) == 2
        input_dim = input_shape[1]
        self.input_spec = InputSpec(dtype=K.floatx(), shape=(None, input_dim))
        self.clusters = self.add_weight((self.n_clusters, input_dim), initializer='glorot_uniform', name='clusters')
        if self.initial_weights is not None:
            self.set_weights(self.initial_weights)
            del self.initial_weights
        self.built = True

    def call(self, inputs, **kwargs):
        """ student t-distribution, as same as used in t-SNE algorithm.
            q_ij = 1/(1+dist(x_i, u_j)^2), then normalize it.
        Arguments:
            inputs: the variable containing data, shape=(n_samples, n_features)
        Return:
            q: student's t-distribution, or soft labels for each sample. shape=(n_samples, n_clusters)
        """
        q = 1.0 / (1.0 + (K.sum(K.square(K.expand_dims(inputs, axis=1) - self.clusters), axis=2) / self.alpha))
        q **= (self.alpha + 1.0) / 2.0
        q = K.transpose(K.transpose(q) / K.sum(q, axis=1))
        return q

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) == 2
        return input_shape[0], self.n_clusters

    def get_config(self):
        config = {'n_clusters': self.n_clusters}
        base_config = super(ClusteringLayer, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
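For reference, the soft assignment computed in call() above is the Student's t-distribution kernel from the DEC paper cited earlier: the similarity between an embedded point z_i and a cluster centre \mu_j is

q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha + 1}{2}}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha + 1}{2}}}

with \alpha = 1.0 by default, exactly mirroring the three K.* operations in the code.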

class DEC(object):
    def __init__(self, dims, n_clusters=10, alpha=1.0, init='glorot_uniform'):
        super(DEC, self).__init__()

        self.dims = dims
        self.input_dim = dims[0]
        self.n_stacks = len(self.dims) - 1
        self.n_clusters = n_clusters
        self.alpha = alpha
        self.autoencoder, self.encoder = autoencoder(self.dims, init=init)

        # prepare DEC model
        clustering_layer = ClusteringLayer(self.n_clusters, name='clustering')(self.encoder.output)
        self.model = Model(inputs=self.encoder.input, outputs=clustering_layer)

    def pretrain(self, x, y=None, optimizer='adam', epochs=200, batch_size=256, save_dir='results/temp'):
        print('...Pretraining...')
        self.autoencoder.compile(optimizer=optimizer, loss='mse')

        csv_logger = callbacks.CSVLogger(save_dir + '/pretrain_log.csv')
        cb = [csv_logger]
        if y is not None:
            class PrintACC(callbacks.Callback):
                def __init__(self, x, y):
                    self.x = x
                    self.y = y
                    super(PrintACC, self).__init__()

                def on_epoch_end(self, epoch, logs=None):
                    if epoch % int(epochs / 10) != 0:
                        return
                    feature_model = Model(self.model.input,
                                          self.model.get_layer('encoder_%d' % (int(len(self.model.layers) / 2) - 1)).output)
                    features = feature_model.predict(self.x)
                    km = KMeans(n_clusters=len(np.unique(self.y)), n_init=20, n_jobs=4)
                    y_pred = km.fit_predict(features)
                    # the print format string was lost in extraction; restored from the DEC-keras source
                    print(' ' * 8 + '|==> acc: %.4f, nmi: %.4f <==|'
                          % (metrics.acc(self.y, y_pred), metrics.nmi(self.y, y_pred)))

            cb.append(PrintACC(x, y))

        # begin pretraining
        t0 = time()
        self.autoencoder.fit(x, x, batch_size=batch_size, epochs=epochs, callbacks=cb)
        print('Pretraining time: ', time() - t0)
        self.autoencoder.save_weights(save_dir + '/ae_weights.h5')
        print('Pretrained weights are saved to %s/ae_weights.h5' % save_dir)
        self.pretrained = True

    def load_weights(self, weights):
        # load weights of DEC model
        self.model.load_weights(weights)

    def extract_features(self, x):
        return self.encoder.predict(x)

    def predict(self, x):
        # predict cluster labels using the output of clustering layer
        q = self.model.predict(x, verbose=0)
        return q.argmax(1)

    @staticmethod
    def target_distribution(q):
        weight = q ** 2 / q.sum(0)
        return (weight.T / weight.sum(1)).T

    def compile(self, optimizer='sgd', loss='kld'):
        self.model.compile(optimizer=optimizer, loss=loss)

    def fit(self, x, y=None, maxiter=2e4, batch_size=256, tol=1e-3,
            update_interval=140, save_dir='./results/temp'):

        print('Update interval', update_interval)
        save_interval = x.shape[0] / batch_size * 5  # 5 epochs
        print('Save interval', save_interval)

        # Step 1: initialize cluster centers using k-means
        t1 = time()
        print('Initializing cluster centers with k-means.')
        kmeans = KMeans(n_clusters=self.n_clusters, n_init=20)
        y_pred = kmeans.fit_predict(self.encoder.predict(x))
        y_pred_last = np.copy(y_pred)
        self.model.get_layer(name='clustering').set_weights([kmeans.cluster_centers_])

        # Step 2: deep clustering
        # logging file
        import csv
        logfile = open(save_dir + '/dec_log.csv', 'w')
        logwriter = csv.DictWriter(logfile, fieldnames=['iter', 'acc', 'nmi', 'ari', 'loss'])
        logwriter.writeheader()

        loss = 0
        index = 0
        index_array = np.arange(x.shape[0])
        for ite in range(int(maxiter)):
            if ite % update_interval == 0:
                q = self.model.predict(x, verbose=0)
                p = self.target_distribution(q)  # update the auxiliary target distribution p

                # evaluate the clustering performance
                y_pred = q.argmax(1)
                if y is not None:
                    acc = np.round(metrics.acc(y, y_pred), 5)
                    nmi = np.round(metrics.nmi(y, y_pred), 5)
                    ari = np.round(metrics.ari(y, y_pred), 5)
                    loss = np.round(loss, 5)
                    logdict = dict(iter=ite, acc=acc, nmi=nmi, ari=ari, loss=loss)
                    logwriter.writerow(logdict)
                    print('Iter %d: acc = %.5f, nmi = %.5f, ari = %.5f' % (ite, acc, nmi, ari), ' ; loss=', loss)

                # check stop criterion (the condition itself was lost in extraction;
                # restored from the DEC-keras source)
                delta_label = np.sum(y_pred != y_pred_last).astype(np.float32) / y_pred.shape[0]
                y_pred_last = np.copy(y_pred)
                if ite > 0 and delta_label < tol:
                    print('delta_label ', delta_label, '< tol ', tol)
                    print('Reached tolerance threshold. Stopping training.')
                    logfile.close()
                    break

            # train on batch
            # if index == 0:
            #     np.random.shuffle(index_array)
            idx = index_array[index * batch_size: min((index + 1) * batch_size, x.shape[0])]
            self.model.train_on_batch(x=x[idx], y=p[idx])
            index = index + 1 if (index + 1) * batch_size <= x.shape[0] else 0

            # save intermediate model
            if ite % save_interval == 0:
                print('saving model to:', save_dir + '/DEC_model_' + str(ite) + '.h5')
                self.model.save_weights(save_dir + '/DEC_model_' + str(ite) + '.h5')

            ite += 1

        # save the trained model
        logfile.close()
        print('saving model to:', save_dir + '/DEC_model_final.h5')
        self.model.save_weights(save_dir + '/DEC_model_final.h5')
        return y_pred
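The target_distribution method above is the auxiliary distribution from the DEC paper: it sharpens the soft assignments by squaring them and normalizing by the soft cluster frequencies,

p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \qquad f_j = \sum_i q_{ij}

so that high-confidence assignments are emphasized and the KL-divergence loss pulls the embedding toward them.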

# setting the hyper parameters
init = 'glorot_uniform'
pretrain_optimizer = 'adam'
dataset = 'mnist'
batch_size = 2048
maxiter = 2e4
tol = 0.001
save_dir = 'results'

import os
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

update_interval = 200
pretrain_epochs = 500
init = VarianceScaling(scale=1. / 3., mode='fan_in', distribution='uniform')  # [-limit, limit], limit=sqrt(1./fan_in)
# pretrain_optimizer = SGD(lr=1, momentum=0.9)

# prepare the DEC model
dec = DEC(dims=[train_x.shape[-1], 500, 500, 2000, 10], n_clusters=10, init=init)

dec.pretrain(x=train_x, y=train_y, optimizer=pretrain_optimizer,
             epochs=pretrain_epochs, batch_size=batch_size, save_dir=save_dir)

...Pretraining...
Epoch 1/500
...
Epoch 494/500
34300/34300 [==============================] - 0s 8us/step - loss: 0.0086
Epoch 495/500
34300/34300 [==============================] - 0s 8us/step - loss: 0.0086
Epoch 496/500
34300/34300 [==============================] - 0s 9us/step - loss: 0.0085
Epoch 497/500
34300/34300 [==============================] - 0s 9us/step - loss: 0.0085
Epoch 498/500
34300/34300 [==============================] - 0s 9us/step - loss: 0.0086
Epoch 499/500
34300/34300 [==============================] - 0s 8us/step - loss: 0.0085
Epoch 500/500
34300/34300 [==============================] - 0s 8us/step - loss: 0.0085
Pretraining time:  183.56538462638855
Pretrained weights are saved to results/ae_weights.h5

dec.model.summary()

Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 784)               0
_________________________________________________________________
encoder_0 (Dense)            (None, 500)               392500
_________________________________________________________________
encoder_1 (Dense)            (None, 500)               250500
_________________________________________________________________
encoder_2 (Dense)            (None, 2000)              1002000
_________________________________________________________________
encoder_3 (Dense)            (None, 10)                20010
_________________________________________________________________
clustering (ClusteringLayer) (None, 10)                100
=================================================================
Total params: 1,665,110
Trainable params: 1,665,110
Non-trainable params: 0

dec.compile(optimizer=SGD(0.01, 0.9), loss='kld')

y_pred = dec.fit(train_x, y=train_y, tol=tol, maxiter=maxiter,
                 batch_size=batch_size, update_interval=update_interval, save_dir=save_dir)

...
Iter 3400: acc = 0.79621, nmi = 0.77514, ari = 0.71296 ; loss= 0
delta_label  0.0007288629737609329 < tol  0.001
Reached tolerance threshold. Stopping training.
saving model to: results/DEC_model_final.h5

pred_val = dec.predict(val_x)

normalized_mutual_info_score(val_y, pred_val)

0.7651617433541817

As you can see, this gives us the best performance compared to the methods covered above. Researchers have found that further training of a DEC model can yield even higher performance (an NMI as high as 0.87, to be exact!).

End Notes

In this article, we saw an overview of the concept of unsupervised deep learning with an intuitive case study. In the next series of articles, we will get into the details of how we can use these techniques to solve more real life problems.


How John Glenn Landed After Becoming The First American To Orbit The Earth

On Monday, July 18, 2016, former NASA astronaut, U.S. senator, and military pilot John Glenn celebrated his 95th birthday. To celebrate the occasion, Popular Science shares below the first time Glenn was ever mentioned in our magazine. The article, “How To Get Down From 100 Miles Up,” originally penned by Wesley S. Griswold, looks at the technology that was required for Glenn, the first American to orbit the earth, to safely return from his mission.

WHEN a U. S. astronaut returns from orbit, he must slow down from 17,500 miles an hour to about 20.

While doing this, his spacecraft plunges forward a few thousand miles and downward around 100 miles. It all takes an astonishingly short time—in Lt. Col. John Glenn’s case, 23 minutes.

Step 1

A kick from backward-firing rockets begins an elaborate series of maneuvers that ends when a manned capsule splashes gently into the sea. Three retrorockets, fired one at a time, reduce speed of capsule to divert it from orbit and start it on its way down.

How does anybody make this most spectacular of human falls and come out alive and unharmed? The accompanying drawings show the sequence of actions that makes the awesome descent practical and safe.

The astronaut’s homeward dive begins when he fires three retrorockets attached to the base of his little black spacecraft. Timing is of the utmost importance. He’s traveling five miles per second. An error of one second in firing the retros will land him five miles off target.

As he zooms toward California on his final swing across the Pacific, he makes a time check with Point Arguello, Calif. He then sets a timer to trigger the retrorockets, and braces himself for a hair-raising deceleration.

(If he should be disabled or unconscious, Point Arguello can and will fire the retrorockets by radio command.)

Remember, as this crucial instant nears, that the 12-foot, bell-shaped spacecraft is hurtling along with its six-foot-wide bottom foremost. Its occupant is riding backward, facing the craft’s 28-inch neck. Packed inside one half of that neck lies the eventual means of saving his own life.

The retrorockets fire in sequence, at five-second intervals. Their effect is that of applying enormously powerful brakes. When this happened to Col. Glenn’s “Friendship 7” as it rushed toward California from out of the west, he exclaimed into his mike, “It feels like I’m going clear back to Hawaii.”

The dramatically slowed spacecraft now starts descending in a long, flat, continent-spanning arc that ends with a splash in the Atlantic Ocean, off the Bahamas. Normally the retrorockets, in a package attached to the capsule by titanium straps, are dropped off as soon as they have been fired.

The fact that the craft is falling blunt-end forward makes it an aerodynamically rough object, and in itself helps to slow the descent.

Meanwhile, as it plunges deeper through the ever-denser layers of earth’s atmosphere, the friction of its passage threatens to consume it.

The fiber-glass-lined heat shield, a false bottom, turns cherry-red as its temperature rises to 3,000 degrees. Its outer coating peels and burns away. The spacecraft’s own bottom, despite its protective shield, heats to 350-400 degrees. The air inside the craft warms noticeably—in Glenn’s case, to 108 degrees. The astronaut himself, however, remains comfortable in his air-conditioned spacesuit.

Steps 2 and 3

2. As capsule hits earth’s air, its outside briefly glows red-hot. Heat shield protects the astronaut within. 3. At 21,000-foot altitude, a mortar ejects small drogue chute (to slow capsule and steady it) and cloud of aluminum foil (to aid tracking the astronaut by radio).

Braking. By the time the craft has fallen from an altitude of more than 500,000 feet to 21,000 feet, it has slowed from its fantastic orbital speed to around 300 m.p.h. The instant it reaches 21,000 feet, a remarkable landing system, developed and built by Northrop’s Ventura Division, goes into action.

At that altitude, a sensitive aneroid switch triggers a tiny mortar. A charge fires a small white drogue parachute out the top of the spacecraft. It’s a ribbon chute. It looks like a bunch of rectangular holes stitched together. If it weren’t full of holes, it would be as useless as an umbrella in a hurricane. The drogue is on a 30-foot tether that places it out of the turbulent wake of the plunging spacecraft. It’s only a six-foot parachute at best. The fierce tug on the lines puckers its mouth to four-foot width.

This seems a puny deterrent to speed and gravity, but it steadies the craft and slows it to about 185 m.p.h.

At the moment the drogue chute is flung out, a four-ounce package of shredded aluminum foil is tossed into the air with it. This bursts open and forms a target that looks as big as an airplane to search radars. Now they know about where the spacecraft will come down.

Releasing the main chute. At 10,000 feet, the antenna cover, which crowns the neck of the craft like the cap on a bottle, is automatically released. The drogue chute is anchored to the cover, which, in turn, is attached to the big main parachute. Drogue and antenna cover, together, yank out the main chute. Then they disconnect themselves and fall away.

The main chute, a 63-foot-diameter orange-and-white-striped job, streams out of its packing in reefed condition. It might be torn to pieces if unfurled too soon. For four precisely timed seconds, its mouth is not allowed to open more than 10 feet.

Then cutters attached to opposite sides of the big chute automatically knife through the reefing lines. The parachute’s mouth widens abruptly to a 42-foot opening.

“The prettiest ol’ sight you ever saw in your life,” Col. Glenn called it. This great nylon wind catcher hauls back on the spacecraft until it is dropping toward the sea at a gentle 30 feet a second, or 20 1/2 m.p.h.

(If the drogue should fail to open, a red light would warn the astronaut. He then could release the main chute. In the unlikely event that it, too, should fail, he has an identical reserve chute to rely on.)

Steps 4 and 5

4. At 10,000 feet, main chute is yanked out by drogue, which falls away. Depth charge drops to sea to signal where capsule will descend. 5. Four seconds later, cutters slash reefing lines, and main chute fully unfurls.

Catching the Navy’s ears. As the main chute is pulled out at 10,000 feet, a 3 1/2-pound depth charge, hardly bigger than a man’s fist and wrist, plummets from the spacecraft to the ocean. It plunges 4,000 feet below the surface, and there explodes. Listening sonars on Navy vessels near and far hear the sound and promptly fix its location.

At 8,000 feet above the ocean, the spacecraft’s heat shield drops four feet, pulling down a circular, silicone-coated, glass curtain, nearly its own size—a curtain with diagonal rows of two-inch holes in it.

This is the spacecraft’s impact bag. Air fills the bag on the way down but can’t get out as easily as it got in. Momentarily, as the heat shield strikes the waves, the bag acts like an air cushion.

Steps 6 and 7

6. At 8,000 feet, heat shield drops four feet and capsule descends with impact bag, beneath it, extended to absorb shock of striking sea. 7. Impact in sea automatically detaches chute and releases dye marker. Whip antenna emerges, and radio signals guide searchers to capsule.

On the water. An inertia switch in the spacecraft senses the landing shock and cuts loose the main chute. If left attached, it might overturn the craft and drown its occupant.

As the spacecraft gently bobs in the ocean swells, the impact bag fills with water. Inside it are powdered dyes–bright green to attract air searchers, black to repel sharks. These dissolve and flood out through the holes in the bag.

The astronaut jettisons his reserve chute. A whip antenna pokes itself out of the half-empty neck of the spacecraft and starts broadcasting sea-air-rescue and homing signals.

Then it’s up to the Navy.

The Challenge Of Vanishing/Exploding Gradients In Deep Neural Networks

This article was published as a part of the Data Science Blogathon

In this article, we will understand the problem of vanishing/exploding gradients that arises while training a deep neural network and the techniques that can be used to cleverly get past this impediment.

Table of Contents 

1. A Glimpse of the Backpropagation Algorithm

2. Understanding the problems

Vanishing gradients 

Exploding gradients

Why do gradients even vanish/explode?

How do we know if our model is suffering from the exploding/vanishing gradients problem?

3. Solutions

Proper weight initialization

Using non-saturating activation functions

Batch normalization

Gradient Clipping

4. Endnote

A glimpse of the Backpropagation Algorithm

We know that the backpropagation algorithm is the heart of neural network training. Let's take a quick glimpse at this algorithm, which has been the driving force behind both the evolution and the revolution of Deep Learning.


After the input features are propagated forward to the output layer through the various hidden layers (each with its own activation function), we end up with a predicted probability of a sample belonging to the positive class (in a typical binary classification task).

Now, the backpropagation algorithm propagates backward from the output layer to the input layer calculating the error gradients on the way.

Once the gradients of the cost function w.r.t. each parameter (weights and biases) in the network have been computed, the algorithm takes a gradient descent step towards the minimum, using these gradients to update the value of each parameter.
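To make this concrete, here is a minimal sketch of a single training step in TensorFlow/Keras (the model, data, and learning rate are illustrative, not from the original article):

import tensorflow as tf
from tensorflow import keras

# An illustrative binary classifier; any differentiable model works the same way.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
loss_fn = keras.losses.BinaryCrossentropy()
optimizer = keras.optimizers.SGD(learning_rate=0.01)

x = tf.random.normal((32, 4))                      # a mini-batch of 32 samples
y = tf.cast(tf.random.uniform((32, 1)) > 0.5, tf.float32)

with tf.GradientTape() as tape:
    y_pred = model(x, training=True)               # forward pass
    loss = loss_fn(y, y_pred)

# Backward pass: gradients of the loss w.r.t. every weight and bias
grads = tape.gradient(loss, model.trainable_variables)

# One gradient descent step: parameter <- parameter - learning_rate * gradient
optimizer.apply_gradients(zip(grads, model.trainable_variables))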

 

Understanding the Problems

As the backpropagation algorithm progresses down from the output layer to the lower layers, the gradients often get smaller and smaller. As a result, the weights of the lower layers receive vanishingly small updates and remain virtually unchanged, so training never converges to a good solution. This is known as the vanishing gradients problem.

On the contrary, in some cases, the gradients keep on getting larger and larger as the backpropagation algorithm progresses. This, in turn, causes very large weight updates and causes the gradient descent to diverge. This is known as the exploding gradients problem.

Why do the gradients even vanish/explode?

Certain activation functions, like the logistic (sigmoid) function, squash a very large input space into a very small output space. In simpler words, they shrink and transform the entire real line into the narrow range [0, 1].


For large inputs (negative or positive), the sigmoid saturates at 0 or 1 with a derivative very close to zero. Thus, when the backpropagation algorithm kicks in, it has virtually no gradient to propagate backward through the network, and whatever little gradient does exist keeps getting diluted as the algorithm progresses down from the top layers, leaving nothing for the lower layers.
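To see this numerically, consider the following small NumPy sketch (the values are illustrative): the sigmoid's derivative peaks at 0.25, and a chain of such factors shrinks exponentially with depth.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25: the largest the derivative ever gets
print(sigmoid_grad(10.0))  # ~4.5e-05: saturated, almost no gradient

# Even in the best case, backpropagating through n sigmoid layers
# multiplies the gradient by at most 0.25 per layer:
print(0.25 ** 10)          # ~9.5e-07: the signal has all but vanished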


How do we know if our model is suffering from the exploding/vanishing gradients problem?

Following are some signs that can indicate that our gradients are exploding or vanishing (a sketch for measuring these symptoms directly follows this list):

Exploding:

There is an exponential growth in the model parameters.

The model weights may become NaN during training.

The model experiences avalanche learning.

Vanishing:

The parameters of the higher layers change significantly, whereas the parameters of the lower layers change very little (or not at all).

The model weights may become 0 during training.

The model learns very slowly, and training may stagnate at a very early stage, just after a few iterations.
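One practical way to catch both problems early is to log the per-layer gradient norms during training. Below is a minimal sketch using TensorFlow's GradientTape; the helper name log_gradient_norms and the model/data passed to it are illustrative, not from the original article.

import tensorflow as tf

def log_gradient_norms(model, loss_fn, x, y):
    # Compute gradients for one mini-batch and print each variable's L2 norm.
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    for var, grad in zip(model.trainable_variables, grads):
        # Tiny norms in the lower layers hint at vanishing gradients;
        # huge or NaN norms hint at exploding gradients.
        print(f"{var.name}: grad norm = {tf.norm(grad).numpy():.3e}")

Calling such a helper every few hundred steps makes the symptoms listed above directly visible: norms shrinking toward zero layer by layer, or blowing up toward NaN.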

Clearly, we want our signal neither to explode or saturate nor to die out. The signal needs to flow properly, both in the forward direction when making predictions and in the backward direction when calculating gradients.

Solutions

Now that we are well aware of the vanishing/exploding gradients problems, it’s time to learn some techniques that can be used to fix the respective problems.

1. Proper Weight Initialization 

In their 2010 paper, Understanding the Difficulty of Training Deep Feedforward Neural Networks, researchers Xavier Glorot and Yoshua Bengio proposed a way to remarkably alleviate this problem.

For the proper flow of the signal, the authors argue that:

The variance of outputs of each layer should be equal to the variance of its inputs.

The gradients should have equal variance before and after flowing through a layer in the reverse direction.

Both conditions cannot hold simultaneously for a layer unless the number of inputs to the layer (fan_in) equals the number of neurons in the layer (fan_out). As a compromise, the authors proposed a well-proven strategy that works incredibly well in practice: randomly initialize the connection weights of each layer as described below. This scheme is popularly known as Xavier initialization (after the author's first name) or Glorot initialization (after his last name).

where fan_avg = (fan_in + fan_out) / 2, the weights of each layer are drawn from either:

a normal distribution with mean 0 and variance σ² = 1 / fan_avg,

or a uniform distribution between -r and +r, with r = sqrt(3 / fan_avg).
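As a quick illustration of these formulas (not code from the article), here is how the two variants can be sampled in NumPy:

import numpy as np

def glorot_init(fan_in, fan_out, distribution="uniform"):
    # Sample a (fan_in, fan_out) weight matrix with Glorot scaling.
    fan_avg = (fan_in + fan_out) / 2.0
    if distribution == "normal":
        # Normal distribution with mean 0 and variance 1 / fan_avg
        return np.random.randn(fan_in, fan_out) * np.sqrt(1.0 / fan_avg)
    # Uniform distribution on [-r, +r], whose variance r^2 / 3 equals 1 / fan_avg
    r = np.sqrt(3.0 / fan_avg)
    return np.random.uniform(-r, r, size=(fan_in, fan_out))

W = glorot_init(256, 128)
print(W.var(), 1.0 / ((256 + 128) / 2))  # the two values should be close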

Following are some more very popular weight initialization strategies for different activation functions; they differ only in the scale of the variance and in whether they use fan_avg or fan_in:

Glorot initialization (no activation, tanh, sigmoid, softmax): σ² = 1 / fan_avg

He initialization (ReLU and its variants): σ² = 2 / fan_in

LeCun initialization (SELU): σ² = 1 / fan_in

For a uniform distribution, calculate r as r = sqrt(3σ²).

Source: Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow

Using the above initialization strategies can significantly speed up the training and increase the odds of gradient descent converging at a lower generalization error. 

Wait, but how do we put these strategies into code?

Relax! We don't need to hardcode anything; Keras does it for us.

By default, Keras uses Xavier's (Glorot) initialization strategy with a uniform distribution.

If we wish to use a different strategy than the default one, this can be done using the kernel_initializer parameter while creating the layer. For example :

keras.layers.Dense(25, activation="relu", kernel_initializer="he_normal")

or

keras.layers.Dense(25, activation="relu", kernel_initializer="he_uniform")

If we wish to use an initialization based on fan_avg rather than fan_in, we can use the VarianceScaling initializer like this:

he_avg_init = keras.initializers.VarianceScaling(scale=2., mode='fan_avg', distribution='uniform')
keras.layers.Dense(20, activation="sigmoid", kernel_initializer=he_avg_init)

2. Using Non-saturating Activation Functions

In an earlier section, while studying the sigmoid activation function, we observed that its tendency to saturate for large inputs (negative or positive) is a major cause of vanishing gradients, which makes it inadvisable to use in the hidden layers of the network.

So, to tackle the saturation of activation functions like sigmoid and tanh, we must use non-saturating functions like ReLU and its variants.

      

ReLU ( Rectified Linear Unit )


ReLU(z) = max(0, z)

It outputs 0 for any negative input.

Range: [0, ∞)

ReLU is not perfect, though: it suffers from the dying ReLUs problem, where a neuron whose weighted input becomes negative for all training instances outputs 0 forever and, with a zero gradient, stops learning entirely.

Some popular alternatives to ReLU that mitigate the vanishing gradients problem when used as activations for the intermediate layers of the network are LReLU, PReLU, ELU, and SELU:

LReLU (Leaky ReLU)


LeakyReLU_α(z) = max(αz, z)

The amount of "leak" is controlled by the hyperparameter α, which is the slope of the function for z < 0.

The small slope of the leak ensures that neurons powered by Leaky ReLU never die; although they might slip into a long coma during training, they always have a chance to eventually wake up.

α can also be trained; that is, the model learns its value during training. This variant, wherein α is a parameter rather than a hyperparameter, is called Parametric Leaky ReLU (PReLU).

ELU (Exponential Linear Unit)

ELU_α(z) = α(exp(z) - 1) for z < 0, and z for z ≥ 0. For z < 0, it takes on negative values, which allows the unit to have an average output closer to 0, thus alleviating the vanishing gradients problem.

For z < 0, the gradients are non-zero. This avoids the dead neurons problem.

For α = 1, the function is smooth everywhere, including around z = 0; this speeds up gradient descent, since it does not bounce left and right of z = 0.

A scaled version of this function (SELU: Scaled ELU) is also used very often in Deep Learning; a Keras sketch showing how to use all these activations follows.
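For reference, here is one way to wire these activations into a Keras model (the layer sizes are illustrative; note that, depending on the Keras version, LeakyReLU's argument is named alpha or negative_slope):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    # Leaky ReLU is applied as its own layer after a linear Dense layer
    keras.layers.Dense(300, kernel_initializer="he_normal"),
    keras.layers.LeakyReLU(alpha=0.2),
    # PReLU: the leak coefficient is learned during training
    keras.layers.Dense(100, kernel_initializer="he_normal"),
    keras.layers.PReLU(),
    # ELU can be passed directly as the activation argument
    keras.layers.Dense(50, activation="elu", kernel_initializer="he_normal"),
    # SELU pairs with LeCun initialization
    keras.layers.Dense(50, activation="selu", kernel_initializer="lecun_normal"),
    keras.layers.Dense(10, activation="softmax"),
])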

3. Batch Normalization

 

Using He initialization along with any variant of the ReLU activation function can significantly reduce the chances of vanishing/exploding problems at the beginning. However, it does not guarantee that the problem won’t reappear during training.

In a 2015 paper, Sergey Ioffe and Christian Szegedy introduced a technique known as Batch Normalization (BN) to address the problem of vanishing/exploding gradients.

The following key points explain the intuition behind BN and how it works:

It consists of adding an operation in the model just before or after the activation function of each hidden layer.

This operation simply zero-centers and normalizes each input, then scales and shifts the result using two new parameter vectors per layer: one for scaling, the other for shifting.

In other words, the operation lets the model learn the optimal scale and mean of each of the layer’s inputs.

To zero-center and normalize the inputs, the algorithm needs to estimate each input’s mean and standard deviation.

It does so by evaluating the mean and standard deviation of the input over the current mini-batch (hence the name “Batch Normalization”).
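In code, the training-time transform boils down to the following minimal NumPy sketch (γ, β, and ε are named as in the Batch Normalization paper; the data is illustrative):

import numpy as np

def batch_norm_forward(X, gamma, beta, eps=1e-5):
    # Training-time batch normalization for a (batch, features) input.
    mu = X.mean(axis=0)                     # per-feature mean over the mini-batch
    var = X.var(axis=0)                     # per-feature variance over the mini-batch
    X_hat = (X - mu) / np.sqrt(var + eps)   # zero-center and normalize
    return gamma * X_hat + beta             # scale and shift (learned vectors)

X = np.random.randn(32, 100) * 5 + 3        # a badly scaled mini-batch
out = batch_norm_forward(X, np.ones(100), np.zeros(100))
print(out.mean(), out.std())                # ~0 and ~1

In Keras, this is just the BatchNormalization layer: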

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])

Here, we simply added a BatchNormalization layer after each layer (dataset: Fashion-MNIST).

model.summary()

4. Gradient Clipping

Another popular technique to mitigate the exploding gradients problem is to clip the gradients during backpropagation so that they never exceed some threshold. This is called Gradient Clipping.

In Keras, this amounts to setting the clipvalue or clipnorm argument when creating an optimizer. The optimizer below will clip every component of the gradient vector to a value between -1.0 and 1.0; that is, every partial derivative of the loss w.r.t. each trainable parameter is clipped to that range:

optimizer = keras.optimizers.SGD(clipvalue = 1.0)

The threshold is a hyperparameter we can tune.

Clipping by value may change the orientation of the gradient vector. For example, let the original gradient vector be [0.9, 100.0], pointing mostly in the direction of the second axis; once we clip each component to 1.0, we get [0.9, 1.0], which points roughly along the diagonal between the two axes.

To ensure that the orientation remains intact even after clipping, we should clip by norm rather than by value.

optimizer = keras.optimizers.SGD(clipnorm = 1.0)

Now the whole gradient vector will be rescaled if the threshold we picked is less than its ℓ2 norm. For example, with clipnorm=1, the vector [0.9, 100.0] is clipped to approximately [0.009, 0.99996], thus preserving its orientation.
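A quick NumPy check of that arithmetic (illustrative, not Keras internals):

import numpy as np

def clip_by_norm(g, clipnorm):
    # Rescale gradient g so that its L2 norm does not exceed clipnorm.
    norm = np.linalg.norm(g)
    return g * (clipnorm / norm) if norm > clipnorm else g

g = np.array([0.9, 100.0])
clipped = clip_by_norm(g, clipnorm=1.0)
print(clipped)                  # ~[0.009, 0.99996]
print(np.linalg.norm(clipped))  # 1.0: at the threshold
print(clipped / np.linalg.norm(clipped) - g / np.linalg.norm(g))  # ~[0, 0]: direction preserved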

Endnote

I hope this article helped you understand the problem of exploding/vanishing gradients thoroughly, so that you can identify whether your model suffers from it and, if so, tackle it and train your model efficiently.

Thanks for your time. HAPPY LEARNING :))

ABOUT ME:

I am a final-year undergraduate student pursuing a Bachelor of Engineering in Computer Engineering with a specialization in Artificial Intelligence. I am an enthusiastic self-learner who strives to solve real-world problems using the power of mathematics and computer programming.

Feel free to connect (LinkedIn) with me, we can learn from each other and perhaps collaborate on some interesting projects too.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

 
