Data Visualization Guide For Multi-Dimensional Data


Viewing the Data

So, surprising no one, it's useful to view the data. Straight away, using head(), we can see that this dataset uses 0 to represent missing values – unless some poor unfortunate soul really has a skin thickness of 0.
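As a quick sketch (assuming the Pima diabetes data is saved locally; the filename here is a placeholder):

import pandas as pd

# Placeholder filename; point this at your copy of the diabetes dataset
df = pd.read_csv("diabetes.csv")
df.head()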

If we want to do more than inspect the data, we can use the describe() function we talked about in the previous section.

df.describe()

Output:

Scatter Matrix

A scatter matrix is one of the best plots for examining the (mostly linear) relationships between multiple variables in a dataset. When you see a roughly linear pattern between two or more variables, it indicates a high correlation between those features, which can be either positive or negative.

pd.plotting.scatter_matrix(df, figsize=(10, 10));

Output:

Inference: This plot alone is quite descriptive, as it shows the pairwise relationships between all the variables in the dataset. For example, we can see that SkinThickness and BMI share a linear tendency.

Note: Because of the long column names, the axis labels are a bit hard to read. This can be improved, though it is out of the scope of this article.

df2 = df.dropna()
colors = df2["Outcome"].map(lambda x: "#44d9ff" if x else "#f95b4a")
pd.plotting.scatter_matrix(df2, figsize=(10, 10), color=colors);

Output:

Correlation Plots

Before going into a deep discussion of correlation plots, we first need to understand correlation itself. For that we use pandas' corr() method, which returns the Pearson correlation coefficient between pairs of columns. In a nutshell, these plots quantify which variables or attributes are correlated with each other.

df.corr()

Output:

sb.set(rc={'figure.figsize': (11, 6)})
sb.heatmap(df.corr());

Output:

Inference: In a seaborn (or matplotlib) correlation heatmap, we can compare higher and lower correlations between the variables using its color palette and scale. In the above graph, the lighter the color, the higher the correlation, and vice versa. This plot has some drawbacks, which we get rid of in the very next graph.

sb.heatmap(df.corr(), annot=True, cmap="viridis", fmt="0.2f");

Output:

Inference: Now one can see that this is a symmetric matrix too, but it immediately allows us to point out the most correlated and anti-correlated attributes. Some might just be common sense – Pregnancies vs Age, for example – but others might give us real insight into the data.

Here we have also used parameters such as annot=True so that the correlation values are printed in each cell, along with cmap and fmt for formatting.

2D Histograms

2D histograms are mainly used in image processing, to show the intensity of pixels at each position in an image. We can also use them for other problem statements where we need to analyze two or more variables together as two-dimensional or three-dimensional histograms, which suit multi-dimensional data.

For the rest of this section, we’re going to use a different dataset with more data.

Note: 2-D histograms are very useful when you have a lot of data. See here for the API.

df2 = pd.read_csv("height_weight.csv")
df2.info()
df2.describe()

Output:

plt.hist2d(df2["height"], df2["weight"], bins=20, cmap="magma")
plt.xlabel("Height")
plt.ylabel("Weight");

Output:

Inference: We have previously worked with one-dimensional histograms, but those serve univariate analysis. If we want the joint distribution of two features, we shift our focus to 2-D histograms. In the graph above, height and weight are plotted against each other, with the colormap set to magma.

Contour plots

A bit hard to get information from the 2D histogram, isn't it? There is too much noise in the image. Every alternative comes into the picture when the original has drawbacks, so since the 2-D histogram makes it hard to extract information, we will now try a contour plot. We will have to bin the data ourselves.

Here is a resource that can help you dive deeper into this plot; the contour API is here.

hist, x_edge, y_edge = np.histogram2d(df2["height"], df2["weight"], bins=20)
x_center = 0.5 * (x_edge[1:] + x_edge[:-1])
y_center = 0.5 * (y_edge[1:] + y_edge[:-1])
plt.contour(x_center, y_center, hist, levels=4)
plt.xlabel("Height")
plt.ylabel("Weight");

Output:

Inference: Now we can see that this contour plot is much better than the complex, noisy 2-D histogram, as it clearly shows the joint distribution of height and weight. There is still room for improvement: if we use the KDE plot from seaborn, the same contours will be smoothed and even more informative.
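For reference, a minimal sketch of that smoothed alternative (assuming a reasonably recent seaborn, imported as sb as in the correlation plots above):

sb.kdeplot(x=df2["height"], y=df2["weight"], fill=True)
plt.xlabel("Height")
plt.ylabel("Weight");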

Conclusion

From the very beginning of the article, we have focused primarily on data visualization for multi-dimensional data, and along the way we covered the important graphs/plots that can derive business-related insights from numeric data across multiple features at once. In this last section, we summarize these graphs in a nutshell.

First, we were introduced to the scatter matrix, which shows the relationship of every variable with every other variable. Then the seaborn heatmap was used as a better approach to multi-variable correlation analysis.

Then came 2-D histograms, which support bivariate analysis, i.e., two variables can be viewed simultaneously and insights drawn from them.

Finally, we looked at the contour plot, which gives us a better version of the 2-D histogram, as it removes the noise from the image and is easier to interpret.

Connect with me on LinkedIn for further discussion.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


Must-Know Data Visualization Techniques For Data Science

This article was published as a part of the Data Science Blogathon

Introduction

In applied Statistics and Machine Learning, Data Visualization is one of the most important skills.

Data visualization provides an important suite of tools for gaining a qualitative understanding of the data. This can be helpful when we explore a dataset and want to extract information from it, and can help with identifying patterns, corrupt data, outliers, and much more.

With a little domain knowledge, data visualizations can be used to express and identify key relationships in plots and charts that are more helpful to you and your stakeholders than measures of association or significance.

In this article, we will be discussing some of the basic charts or plots that you can use to better understand and visualize your data.

Table of Contents

1. What is Data Visualization?

2. Benefits of Good Data Visualization

3. Different Types of Analysis for Data Visualization

4. Univariate Analysis Techniques for Data Visualization

Distribution Plot

Box and Whisker Plot

Violin Plot

5. Bivariate Analysis Techniques for Data Visualization

Line Plot

Bar Plot

Scatter Plot

What is Data Visualization?

Data visualization is defined as the graphical representation of information and data.

By using visual elements like charts, graphs, and maps, data visualization techniques provide an accessible way to see and understand trends, outliers, and patterns in data.

These days we have a lot of data on our hands; in the world of Big Data, data visualization tools and technologies are crucial for analyzing massive amounts of information and making data-driven decisions.

It is used in many areas such as:

To model complex events.

Visualize phenomena that cannot be observed directly, such as weather patterns, medical conditions, or mathematical relationships.

Benefits of Good Data Visualization

Data visualization is a form of visual art that grabs our interest and keeps our focus on the message that the data conveys.

Whenever we visualize a chart, we quickly identify the trends and outliers present in the dataset.

The basic uses of the Data Visualization technique are as follows:

It is a powerful technique to explore the data with presentable and interpretable results.

In the data mining process, it acts as a primary step in the pre-processing portion.

It supports the data cleaning process by finding incorrect data and corrupted or missing values.

It also helps to construct and select variables, which means we have to determine which variable to include and discard in the analysis.

In the process of Data Reduction, it also plays a crucial role while combining the categories.


Different Types of Analysis for Data Visualization

Mainly, there are three different types of analysis for Data Visualization:

Univariate Analysis: In the univariate analysis, we will be using a single feature to analyze almost all of its properties.

Bivariate Analysis: When we compare the data between exactly 2 features then it is known as bivariate analysis.

Multivariate Analysis: In the multivariate analysis, we will be comparing more than 2 variables.

NOTE:

In this article, our main goal is to understand the following concepts:

How do we draw inferences from the data visualization techniques?

Under which conditions is one technique more useful than the others?

We are not going to dive deep into the coding/implementation of each technique on a particular dataset; instead, we try to answer the above questions and understand just the code snippet, with the help of a sample plot for each data visualization technique.

Now, let’s started with the different Data Visualization techniques:

 

Univariate Analysis Techniques for Data Visualization

1. Distribution Plot

It is one of the best univariate plots to know about the distribution of data.

When we want to analyze the impact on the target variable(output) with respect to an independent variable(input), we use distribution plots a lot.

This plot gives us a combination of the probability density function (PDF) and the histogram in a single plot.

Implementation:

The distribution plot is present in the Seaborn package.

The code snippet is as follows:

Python Code:
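A minimal sketch (the dataset here is assumed to be the Haberman survival data, with the same 'Age' and 'SurvStat' column names used in the box and violin plots later in this article; the filename and column order are placeholders):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder path and column names for the Haberman survival dataset
hb = pd.read_csv("haberman.csv", names=['Age', 'op_yr', 'axil_nodes', 'SurvStat'])

# Histogram with the smoothed PDF (kde) overlaid, colored by survival status
sns.histplot(data=hb, x='Age', hue='SurvStat', kde=True)
plt.show()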



From the above distribution plot, we can draw the following observations:

We created the distribution plot on the feature 'Age' (input variable) and used different colors for the survival status (output variable), since that is the class to be predicted.

There is a huge overlapping area between the PDFs for different combinations.

In this plot, the sharp block-like structures are called histograms, and the smoothed curve is known as the Probability density function(PDF).

NOTE: 

The Probability density function(PDF) of a curve can help us to capture the underlying distribution of that feature which is one major takeaway from Data visualization or Exploratory Data Analysis(EDA).

2. Box and Whisker Plot

This plot can be used to obtain more statistical details about the data.

The straight lines at the maximum and minimum are also called whiskers.

Points that lie outside the whiskers will be considered as an outlier.

The box plot also gives us a description of the 25th, 50th, and 75th percentiles (the quartiles).

With the help of a box plot, we can also determine the Interquartile range(IQR) where maximum details of the data will be present. Therefore, it can also give us a clear idea about the outliers in the dataset.

Fig. General Diagram for a Box-plot

Implementation:

Boxplot is available in the Seaborn library.

Here x is considered as the dependent variable and y is considered as the independent variable. These box plots come under univariate analysis, which means that we are exploring data only with one variable.

Here we are trying to check the impact of a feature named “axil_nodes” on the class named “Survival status” and not between any two independent features.

The code snippet is as follows:

sns.boxplot(x='SurvStat',y='axil_nodes',data=hb)

From the above box and whisker plot, we can draw the following observations:

How much data is present in the 1st quartile and how many points are outliers etc.

For class 1, we can see that there is very little or no data between the median and the 1st quartile.

There are more outliers for class 1 in the feature named axil_nodes.

NOTE:

We can get details about outliers that will help us to well prepare the data before feeding it to a model since outliers influence a lot of Machine learning models.

3. Violin Plot

The violin plots can be considered as a combination of Box plot at the middle and distribution plots(Kernel Density Estimation) on both sides of the data.

This can give us a description of the distribution of the data, such as whether the distribution is multimodal, its skewness, etc.

It also gives us useful information like a 95% confidence interval.

Fig. General Diagram for a Violin-plot

Implementation:

The Violin plot is present in the Seaborn package.

The code snippet is as follows:

sns.violinplot(x='SurvStat',y='op_yr',data=hb,size=6)

From the above violin plot, we can draw the following observations:

The median of both classes is close to 63.

The largest number of persons in class 2 have an op_yr value around 65, whereas for persons in class 1 the peak is around 60.

Also, the 3rd quartile to median has a lesser number of data points than the median to the 1st quartile.

Bivariate Analysis Techniques for Data Visualization

1. Line Plot

This is the plot you will see in every nook and corner of any analysis between two variables.

A line plot is nothing but a series of data points whose values are connected with straight lines.

The plot may seem very simple but it has more applications not only in machine learning but in many other areas.

Implementation:

The line plot is present in the Matplotlib package.

The code snippet is as follows:

plt.plot(x,y)

From the above line plot, we can draw the following observations:

Line plots are used everywhere, from distribution comparison using Q-Q plots to CV tuning using the elbow method.

They are also used to analyze the performance of a model using the ROC-AUC curve.

2. Bar Plot

This is one of the most widely used plots; we see it many times, not just in data analysis, but wherever there is trend analysis in many fields.

Though it may seem simple it is powerful in analyzing data like sales figures every week, revenue from a product, Number of visitors to a site on each day of a week, etc.

Implementation:

The bar plot is present in the Matplotlib package.

The code snippet is as follows:

plt.bar(x,y)

From the above bar plot, we can draw the following observations:

We can visualize the data in a neat plot and convey the details to others in a straightforward way.

This plot may be simple and clear, but it is not used very frequently in data science applications.

3. Scatter Plot

It is one of the most commonly used plots used for visualizing simple data in Machine learning and Data Science.

This plot gives us a representation in which each point of the dataset is placed with respect to any 2 or 3 features (columns).

Scatter plots are available in both 2-D as well as in 3-D. The 2-D scatter plot is the common one, where we will primarily try to find the patterns, clusters, and separability of the data.

Implementation:

The scatter plot is present in the Matplotlib package.

The code snippet is as follows:

plt.scatter(x,y)

From the above scatter plot, we can draw the following observations:

Colors are assigned to the data points based on the target column of the dataset, i.e., their class representation.

We can color the data points as per their class label given in the dataset.
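A runnable sketch of such a class-colored scatter plot (the data here is made up purely for illustration, not taken from any dataset in this article):

import numpy as np
import matplotlib.pyplot as plt

# Made-up 2-D points and binary class labels, for illustration only
x = np.random.randn(100)
y = np.random.randn(100)
labels = np.random.randint(0, 2, size=100)

plt.scatter(x, y, c=labels, cmap='coolwarm')
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()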

This completes today’s discussion!

Endnotes

Thanks for reading!

I hope you enjoyed the article and increased your knowledge about Data Visualization Techniques.

Please feel free to contact me on Email ([email protected])

For the remaining articles, refer to the link.

About the Author Aashi Goyal

Currently, I am pursuing my Bachelor of Technology (B.Tech) in Electronics and Communication Engineering from Guru Jambheshwar University(GJU), Hisar. I am very enthusiastic about Statistics, Machine Learning and Deep Learning.


A Beginner's Guide To Multi-Processing In Python

This article was published as a part of the Data Science Blogathon

In the era of Big Data, Python has become the most sought-after language. In this article, let us concentrate on one particular aspect of Python that makes it one of the most powerful Programming languages- Multi-Processing.

Now before we dive into the nitty-gritty of Multi-Processing, I suggest you read my previous article on Threading in Python, since it can provide a better context for the current article.

Let us say you are an elementary school student who is given the mind-numbing task of multiplying 1200 pairs of numbers as your homework. Let us say you are capable of multiplying a pair of numbers within 3 seconds. Then on a total, it takes 1200*3 = 3600 seconds, which is 1 hour to solve the entire assignment.  But you have to catch up on your favorite TV show in 20 minutes.

What would you do? An intelligent student, though dishonest, will call up three more friends of similar capacity and divide the assignment. So you'll get 300 multiplication tasks on your plate, which you'll complete in 300*3 = 900 seconds, that is, 15 minutes. Thus you, along with your 3 friends, will finish the task in 15 minutes, leaving you 5 minutes to grab a snack and sit down for your TV show. The task took just 15 minutes when 4 of you worked together, whereas it would otherwise have taken 1 hour.

This is the basic idea of multi-processing. If you have an algorithm that can be divided among different workers (processors), then you can speed up the program. Machines nowadays come with 4, 8, or 16 cores, which can be deployed in parallel.

Multi-Processing in Data Science-

Multi-Processing has two crucial applications in Data Science.

1. Input-Output processes-

Any data-intensive pipeline has input and output processes through which millions of bytes of data flow. Generally, reading the data (input) does not take much time, but writing data to data warehouses takes significant time. The writing process can be done in parallel, saving a huge amount of time.

2. Training models

Though not all models can be trained in parallel, few models have inherent characteristics that allow them to get trained using parallel processing. For example, the Random Forest algorithm deploys multiple Decision trees to take a cumulative decision. These trees can be constructed in parallel. In fact, the sklearn API comes with a parameter called n_jobs, which provides an option to use multiple workers.

Multi-Processing in Python using Process class-

Now let us get our hands on the multiprocessing library in Python.

Take a look at the following code

Python Code:
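A minimal sketch of the plain, serial version being described (reconstructed to be consistent with the multiprocessing variants shown below):

import time

def sleepy_man():
    print('Starting to sleep')
    time.sleep(1)
    print('Done sleeping')

tic = time.time()
sleepy_man()
sleepy_man()
toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))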



The above code is simple. The function sleepy_man sleeps for a second and we call the function two times. We record the time taken for the two function calls and print the results. The output is as shown below.

Starting to sleep
Done sleeping
Starting to sleep
Done sleeping
Done in 2.0037 seconds

This is expected as we call the function twice and record the time. The flow is shown in the diagram below.

Now let us incorporate Multi-Processing into the code.

import multiprocessing
import time

def sleepy_man():
    print('Starting to sleep')
    time.sleep(1)
    print('Done sleeping')

tic = time.time()
p1 = multiprocessing.Process(target=sleepy_man)
p2 = multiprocessing.Process(target=sleepy_man)
p1.start()
p2.start()
toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))

Here multiprocessing.Process(target=sleepy_man) defines a multiprocessing instance. We pass the required function to be executed, sleepy_man, as an argument. We trigger the two instances with p1.start() and p2.start().

The output is as follows-

Done in 0.0023 seconds
Starting to sleep
Starting to sleep
Done sleeping
Done sleeping

Now notice one thing: the time-log print statement got executed first. This is because, along with the multiprocessing instances triggered for the sleepy_man function, the main code of the script continued executing separately in parallel. The flow diagram given below will make things clear.

In order to execute the rest of the program only after the multiprocessing functions have finished, we need to call the join() method.

import multiprocessing
import time

def sleepy_man():
    print('Starting to sleep')
    time.sleep(1)
    print('Done sleeping')

tic = time.time()
p1 = multiprocessing.Process(target=sleepy_man)
p2 = multiprocessing.Process(target=sleepy_man)
p1.start()
p2.start()
p1.join()
p2.join()
toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))

Now the rest of the code block will only get executed after the multiprocessing tasks are done. The output is shown below.

Starting to sleep
Starting to sleep
Done sleeping
Done sleeping
Done in 1.0090 seconds

The flow diagram is shown below.

Since the two sleep functions are executed in parallel, together they take around 1 second.

We can define any number of multiprocessing instances. Look at the code below; it defines 10 different multiprocessing instances using a for loop.

import multiprocessing
import time

def sleepy_man():
    print('Starting to sleep')
    time.sleep(1)
    print('Done sleeping')

tic = time.time()
process_list = []
for i in range(10):
    p = multiprocessing.Process(target=sleepy_man)
    p.start()
    process_list.append(p)

for process in process_list:
    process.join()

toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))

The output for the above code is as shown below.

Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done in 1.0117 seconds

Here the ten function executions are processed in parallel, and thus the entire program takes just about one second. Now, my machine doesn't have 10 processors. When we define more processes than our machine has cores, the multiprocessing library has logic to schedule the jobs, so you don't have to worry about it.

We can also pass arguments to the Process function using args.

import multiprocessing
import time

def sleepy_man(sec):
    print('Starting to sleep')
    time.sleep(sec)
    print('Done sleeping')

tic = time.time()
process_list = []
for i in range(10):
    p = multiprocessing.Process(target=sleepy_man, args=[2])
    p.start()
    process_list.append(p)

for process in process_list:
    process.join()

toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))

The output for the above code is as shown below.

Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Starting to sleep Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done sleeping Done in 2.0161 seconds

Since we passed an argument, the sleepy_man function slept for 2 seconds instead of 1 second.

Multi-Processing in Python using Pool class-

In the last code snippet, we executed 10 different processes using a for loop. Instead of that, we can use the Pool class to do the same.

import multiprocessing
import time

def sleepy_man(sec):
    print('Starting to sleep for {} seconds'.format(sec))
    time.sleep(sec)
    print('Done sleeping for {} seconds'.format(sec))

tic = time.time()
pool = multiprocessing.Pool(5)
pool.map(sleepy_man, range(1, 11))
pool.close()
toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))

multiprocessing.Pool(5) defines the number of workers; here we set it to 5. pool.map() is the method that triggers the function execution. We call pool.map(sleepy_man, range(1, 11)): sleepy_man is the function that will be called, with the parameters for each execution taken from range(1, 11) (generally a list is passed). The output is as follows-

Starting to sleep for 1 seconds Starting to sleep for 2 seconds Starting to sleep for 3 seconds Starting to sleep for 4 seconds Starting to sleep for 5 seconds Done sleeping for 1 seconds Starting to sleep for 6 seconds Done sleeping for 2 seconds Starting to sleep for 7 seconds Done sleeping for 3 seconds Starting to sleep for 8 seconds Done sleeping for 4 seconds Starting to sleep for 9 seconds Done sleeping for 5 seconds Starting to sleep for 10 seconds Done sleeping for 6 seconds Done sleeping for 7 seconds Done sleeping for 8 seconds Done sleeping for 9 seconds Done sleeping for 10 seconds Done in 15.0210 seconds

The Pool class is a better way to deploy multiprocessing because it distributes tasks to the available processors using a First In, First Out schedule. It is almost similar to the map-reduce architecture: in essence, it maps the input to the different processors and collects the output from all of them as a list. The processes currently executing are held in memory, and the non-executing processes are kept out of memory.

Whereas with the Process class, all the processes are kept in memory and their execution is scheduled using a FIFO policy.

Comparing the time performance for calculating perfect numbers-

 

Using a regular for loop-

import time

def is_perfect(n):
    sum_factors = 0
    for i in range(1, n):
        if (n % i == 0):
            sum_factors = sum_factors + i
    if (sum_factors == n):
        print('{} is a Perfect number'.format(n))

tic = time.time()
for n in range(1, 100000):
    is_perfect(n)
toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))

The output for the above program is shown below.

6 is a Perfect number
28 is a Perfect number
496 is a Perfect number
8128 is a Perfect number
Done in 258.8744 seconds

Using a Process class-

import time
import multiprocessing

def is_perfect(n):
    sum_factors = 0
    for i in range(1, n):
        if (n % i == 0):
            sum_factors = sum_factors + i
    if (sum_factors == n):
        print('{} is a Perfect number'.format(n))

tic = time.time()
processes = []
for i in range(1, 100000):
    p = multiprocessing.Process(target=is_perfect, args=(i,))
    processes.append(p)
    p.start()

for process in processes:
    process.join()

toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))

The output for the above program is shown below.

6 is a Perfect number
28 is a Perfect number
496 is a Perfect number
8128 is a Perfect number
Done in 143.5928 seconds

As you could see, we achieved a 44.4% reduction in time when we deployed Multi-Processing using Process class, instead of a regular for loop.

Using a Pool class-

import time
import multiprocessing

def is_perfect(n):
    sum_factors = 0
    for i in range(1, n):
        if (n % i == 0):
            sum_factors = sum_factors + i
    if (sum_factors == n):
        print('{} is a Perfect number'.format(n))

tic = time.time()
pool = multiprocessing.Pool()
pool.map(is_perfect, range(1, 100000))
pool.close()
toc = time.time()
print('Done in {:.4f} seconds'.format(toc-tic))

The output for the above program is shown below.

6 is a Perfect number
28 is a Perfect number
496 is a Perfect number
8128 is a Perfect number
Done in 74.2217 seconds

As you could see, compared to a regular for loop we achieved a 71.3% reduction in computation time, and compared to the Process class, we achieve a 48.4% reduction in computation time.

Thus, it is very well evident that by deploying a suitable method from the multiprocessing library, we can achieve a significant reduction in computation time.

The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion. 

Cloud Computing Data Storage: Buying Guide

Although many enterprises are moving applications and other processes to the cloud, data backups and storage often remain local. On-site storage seems easier to control and secure, yet it is costly to administer and leaves organizations vulnerable should a natural disaster hit.

Moreover, bottlenecks often form at the local level, especially when integrating remote applications and assets with those managed on-site.

Most organizations with a cloud presence have an eye toward using the cloud for data storage. But with a variety of cloud storage vendors to choose from, it’s important to ask the following questions to assure valuable digital assets are managed efficiently and effectively:

If your business is looking to the cloud as a primary location for file storage and backups, availability is key. For Nhan Nguyen, Chief Scientist and CTO at CIC, being able to access files quickly and at any time is the cornerstone of providing customers with the quality of service they expect.

CIC provides electronic signature solutions for the time-sensitive financial services industry. So its technologists needed to know that their cloud storage solution would maintain the same level of availability and speed as an on-site option.

Nguyen explained, “We support a very high number of concurrent users. Maintaining very high uptime and guaranteed document load performance of less than three seconds are our main goals.”

CIC needed a solution that would meet these goals and satisfy customer SLAs. After some research, they decided to deploy Gluster’s File System (GlusterFS), which complemented their existing cloud technology infrastructure.

Using GlusterFS, CIC was able to pool, aggregate, and virtualize their existing Amazon Web Services Elastic Block Storage (EBS). By utilizing both synchronous and asynchronous replication, files are retrieved quickly–even surpassing customer expectations.

For Stanley Kania, CEO of Software Link, a hosted ERP provider, looking to the cloud was a way to meet expanding storage needs.

“Using local disc storage on servers became unmanageable and unsustainable, especially as we began to virtualize our infrastructure. At the same time, we need to store more and more data,” Kania said. With over 2,000 customers and growing, Kania and his team found a solution in Coraid.

Coraid’s EtherDrive platform enabled faster performance and allowed for adding new storage as-needed, scaling to meet Software Link’s growing storage needs.

Software Link is now able to host far more applications and has seen an increase in spindle speed and performance for high-demand I/O. When additional storage is needed, additional EtherDrive shelves can be configured and deployed in a matter of minutes.

For both Software Link and CIC, integrating a storage solution with existing applications and infrastructures was a key requirement. Both companies were looking for complementary solutions that would accommodate existing workflows.

“The solution we were looking for had to fit with current applications,” says Kania, whose business primarily provides hosted ERP solutions from Sage and SMB solutions from QuickBooks.

For Nguyen and his staff at CIC, a streamlined transition from their preexisting storage to the cloud was a main requirement. The staff at CIC had already selected the RightScale Cloud Management Platform as the foundation for their operations and had decided on Amazon's EBS.

As Nguyen says, "It was crucial to select a cloud storage solution that required no change to our existing infrastructure."

A Comprehensive Guide To Conditional Statements In Python For Data Science Beginners

This article was published as a part of the Data Science Blogathon

Introduction

Decision-making is as important in any programming language as it is in life. Decision-making in a programming language is automated using conditional statements, in which Python evaluates the code to see if it meets the specified conditions.

The conditions are evaluated and processed as true or false. If a condition is found to be true, the corresponding block is run as needed. If the condition is found to be false, the statement following the if block is executed.

Python has six conditional statements that are used in decision-making:-

1. If statement

2. If else statement

3. Nested if statement

4. If…Elif ladder

5. Short Hand if statement

6. Short Hand if-else statement


Let’s take a glance at how each of those works.

If Statement

The If statement is the most fundamental decision-making statement, in which the code is executed based on whether it meets the specified condition. It has a code body that only executes if the condition in the if statement is true. The statement can be a single line or a block of code.

The if statement in Python has the subsequent syntax:

if expression:
    statement

#If the condition is true, the statement will be executed.

Examples for better understanding:

Example – 1

num = 5
if num > 0:
    print(num, "is a positive number.")
print("This statement is true.")

# When we run the program, the output will be:
# 5 is a positive number.
# This statement is true.



Example – 2

a = 25
b = 170
if b > a:
    print("b is greater than a")

output: b is greater than a

If Else Statement

This statement is used when both the true and false parts of a given condition are specified to be executed. When the condition is true, the statement inside the if block is executed; if the condition is false, the statement outside the if block is executed.

The if…Else statement in Python has the following syntax:

if condition:
    # Will execute this block if the condition is true
else:
    # Will execute this block if the condition is false

Example for better understanding:

num = 5
if num >= 0:
    print("Positive or Zero")
else:
    print("Negative number")

output: Positive or Zero

If…Elif..else Statement

In this case, the if condition is evaluated first. If it is false, the elif statement will be executed; if that also turns out to be false, the else statement will be executed.

The If…Elif..else statement in Python has the subsequent syntax:

if condition:
    Body of if
elif condition:
    Body of elif
else:
    Body of else

Example for better understanding:

We will check if the number is positive, negative, or zero.

num = 7
if num > 0:
    print("Positive number")
elif num == 0:
    print("Zero")
else:
    print("Negative number")

output: Positive number

Nested IF Statement

A nested if statement is one in which an if statement is nestled inside another if statement. This is used when a variable must be processed more than once. If, if-else, and if…elif…else statements can all be used inside it. In nested if statements, the indentation (whitespace at the beginning of a line) determines the scope of each statement, so it should be given particular attention.

The Nested if statement in Python has the following syntax:

if (condition1):
    # Executes if condition 1 is true
    if (condition2):
        # Executes if condition 2 is true
    # Condition 2 ends here
# Condition 1 ends here

Examples for better understanding:

Example-1

num = 8
if num >= 0:
    if num == 0:
        print("zero")
    else:
        print("Positive number")
else:
    print("Negative number")

output: Positive number

Example-2

price = 100
quantity = 10
amount = price * quantity
if amount > 1000:
    print("The amount is greater than 1000")
elif amount > 400:
    if amount > 800 and amount < 1000:
        print("The amount is between 800 and 1000")
    elif amount > 600 and amount < 800:
        print("The amount is between 600 and 1000")
    else:
        print("The amount is between 400 and 1000")
elif amount == 200:
    print("Amount is 200")
else:
    print("Amount is less than 200")

The output: "The amount is between 400 and 1000."

Short Hand if statement

Short Hand if statement is used when only one statement needs to be executed inside the if block. This statement can be mentioned in the same line which holds the If statement.

The Short Hand if statement in Python has the following syntax:

if condition: statement

Example for better understanding:

i = 15
if i > 11: print("i is greater than 11")  # One line if statement

The output of the program: "i is greater than 11."

Short Hand if-else statement

It is used to mention If-else statements in one line in which there is only one statement to execute in both if and else blocks. In simple words, If you have only one statement to execute, one for if, and one for else, you can put it all on the same line.

Examples for better understanding:

#single line if-else statement

a = 3
b = 5
print("A") if a > b else print("B")

output: B

#single line if-else statement, with 3 conditions

a = 3
b = 5
print("A is greater") if a > b else print("Equal") if a == b else print("B is greater")

output: B is greater

To summarise,

· The if condition is used to print a result only when the listed condition is true.

· When one of the conditions is false, the If-else condition is used to print the statement.

· When there is a third possible outcome, the Elif statement is used. In a program, any number of Elif conditions can be used.

· By declaring all of the conditions in a single statement, we can reduce the amount of code that must be written.

· Nested if statements can be used to nest one If condition inside another.

Conclusion

If you’re reading this, you’re most likely learning Python or trying to become a Python developer. Learning Python or another programming language begins with understanding the fundamental concepts that form its foundation.

By the end of this text, you should understand the various If else conditions used in python.

About The Author Prashant Sharma

Currently, I Am pursuing my Bachelors of Technology( B.Tech) from Vellore Institute of Technology. I am very enthusiastic about programming and its real applications including software development, machine learning, and data science.

Hope you like the article. If you want to connect with me then you can connect on:

Linkedin

or for any other doubts, you can send a mail to me also

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


A Comprehensive Guide To Learn Data Exploration In Python!

This article was published as a part of the Data Science Blogathon

Introduction

This article will help you get hands-on with python and introduces the preliminary data analysis techniques to get to know your data better.

Often, the day-to-day work of data scientists involves learning multiple algorithms and finding the apt ML solution for varied business problems. They also need to keep themselves updated with the programming language used to implement their solutions.

Hence, I am writing this article with the intent to cover the basics of the much sought-after programming language these days — python.

So, let’s get started.

Imports

Let’s make all necessary imports:

Normally we have to type multiple print statements to see the corresponding outputs of a cell in a Jupyter notebook. The import below lets you see multiple outputs from the same cell and saves us from multiple print and display statements.
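One common way to do this (a sketch of the usual IPython setting; the other imports are the standard ones assumed throughout this article):

import pandas as pd
import numpy as np

# Show the output of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"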

Data exploration:

Let’s start with basic data exploration to check the summary of the data, how it looks like, the size of data, etc.

The next important step is to check the data types of each of the columns and see how many null values exist in the data.
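A minimal sketch of these first checks (assuming the data is already loaded into a DataFrame named df; the source file is not shown in this article):

df.shape           # size of the data (rows, columns)
df.head()          # what the data looks like
df.describe()      # summary statistics
df.dtypes          # data type of each column
df.isnull().sum()  # number of null values per column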

Once we know different data types, we might want to check the data for each category. e.g., the below code lets you see the data frame filtered for only the ‘object’ type:

In order to get all the categorical column names to perform encoding techniques, you can write ‘.columns’ like below:

It is important to understand the data feature by feature, e.g. what range of values each feature takes and the count of each value:
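The last three steps can be sketched as follows (df as above; the column name in the last line is purely illustrative):

# Keep only the 'object' (categorical/string) columns
df.select_dtypes(include='object')

# Just the names of those categorical columns, e.g. for encoding later
df.select_dtypes(include='object').columns

# Range of values a feature takes and the count of each value
df['Purpose'].value_counts()  # 'Purpose' is a hypothetical column name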

When we want to make some transformation on the original dataframe, we make a copy of it. Making a deep copy of a dataframe prevents changes made in the new dataframe from being reflected in the original dataframe.
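A one-line sketch:

df_copy = df.copy(deep=True)  # changes to df_copy will not affect df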

Columns — rename and drop columns
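A small sketch of both operations (the column names are hypothetical):

# Rename a column
df = df.rename(columns={'old_name': 'new_name'})

# Drop columns that are not needed
df = df.drop(columns=['unused_column'])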

 

Filtering a dataframe
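For instance, a boolean-mask filter (a sketch, with column names and values that are assumptions):

# Rows where Age is above 30 and Gender is 'female'
df[(df['Age'] > 30) & (df['Gender'] == 'female')]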

Great, so we have seen basic techniques of how to use python to get a better understanding of our data.

Datetime:

Next, we will learn how to handle DateTime features.

Pandas has timestamp limitations and can represent a timestamp only if it lies within a certain range (timestamps are stored using 64 bits), as shown below. Thus it will either return the input date as it is or return NaT, depending on the value we pass for the input parameter 'errors'.
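A small sketch of this behaviour (the second date is chosen to fall outside pandas' representable range, roughly 1677 to 2262):

import pandas as pd

pd.to_datetime('2021-05-01', errors='coerce')  # valid -> Timestamp('2021-05-01 00:00:00')
pd.to_datetime('2500-01-01', errors='coerce')  # out of range -> NaT
pd.to_datetime('2500-01-01', errors='ignore')  # out of range -> returned as the input string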

dateutil.parser:

A date string is passed as input to obtain the datetime format. If dayfirst=True is passed, it will assume the DD/MM/YY format.

One good thing is that if we pass a date string whose second number (corresponding to the month) is greater than 12, along with dayfirst=True, the parser automatically infers that a month cannot be greater than 12; hence the first number is marked as the month, keeping the second number as the day.

It is highlighted below:
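A sketch of both behaviours:

from dateutil import parser

# dayfirst=True -> DD/MM/YY interpretation
parser.parse('05/01/2021', dayfirst=True)  # datetime(2021, 1, 5, 0, 0)

# Second number is greater than 12, so it cannot be the month;
# the parser swaps the interpretation automatically
parser.parse('05/13/2021', dayfirst=True)  # datetime(2021, 5, 13, 0, 0)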

strptime and strftime:

A string date can be converted to a datetime using strptime, with the format of the input date specified. More can be learned about the formats from here.

Conversely, strftime is used to convert datetime to string date.
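A small sketch of both directions:

from datetime import datetime

dt = datetime.strptime('21/03/2021', '%d/%m/%Y')  # string -> datetime
dt.strftime('%Y-%m-%d')                           # datetime -> '2021-03-21'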

Sorting

The dataframe is sorted by the feature ‘Age’ below:
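A one-line sketch:

df.sort_values(by='Age')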

 

Setting Index

The dataframe can be indexed by one or more columns as below:

How to filter the Indexed dataframe?

E.g. I have filtered the dataframe by filtering on ‘Age’ index (accessed via ‘0’ level) greater than 30, as highlighted below:
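A sketch of both steps (indexing on two columns, then filtering on the first index level):

# Index the dataframe by one or more columns
df_indexed = df.set_index(['Age', 'Gender'])

# Filter on the 'Age' level of the index (level 0) for values greater than 30
df_indexed[df_indexed.index.get_level_values(0) > 30]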

Shifting the dataframe indexed by date:
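A sketch, assuming the dataframe has been re-indexed by a date column (the column name 'Date' is hypothetical):

df_dated = df.set_index('Date')

df_dated.shift(1)         # shift the values down by one row
df_dated.shift(freq='D')  # shift the date index itself forward by one day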

Dropping NaNs:

Null values can be dropped from the rows by specifying axis=0 and from the columns by specifying axis=1.

how='any' and how='all' are used to drop the rows/columns where any value is NaN vs. those where all values are NaN.
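A short sketch of the four combinations:

df.dropna(axis=0, how='any')  # drop rows with at least one NaN
df.dropna(axis=0, how='all')  # drop rows where every value is NaN
df.dropna(axis=1, how='any')  # drop columns with at least one NaN
df.dropna(axis=1, how='all')  # drop columns where every value is NaN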

Groupby

 

The top 2 observations based on ‘Credit amount’ per ‘Gender’ is shown below:
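One way to sketch this (the column names come from the text; the sort-then-head approach is an assumption, not necessarily the author's exact code):

(df.sort_values('Credit amount', ascending=False)
   .groupby('Gender')
   .head(2))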

np.select

It is used when multiple conditions are specified, e.g. binning the 'Age' variable as per chosen criteria to get a new feature 'Age_binned':
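A sketch of binning 'Age' with np.select (the bin edges and labels are illustrative, not the article's exact criteria):

import numpy as np

conditions = [
    df['Age'] < 30,
    (df['Age'] >= 30) & (df['Age'] < 50),
    df['Age'] >= 50,
]
choices = ['young', 'middle-aged', 'senior']
df['Age_binned'] = np.select(conditions, choices, default='unknown')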

pivot_table
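A sketch of a pivot table on this data (the value, index, and column choices are illustrative):

pd.pivot_table(df, values='Credit amount', index='Gender',
               columns='Age_binned', aggfunc='mean')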

Thanks for reading this guide to basic data exploration !!!

