Data Analysis Using Python

This post helps beginners learn how to analyze data in Python, or, put simply, how to do Data Science with Python.

I have been learning Python for Data Science for a long time, and I would like to thank @Rollademy for encouraging learners like me to write a blog about it.

We are all aware of how much data the world generates every day, and how much of it has to be manipulated. Python is increasingly popular in the data science field.

Today I will show you how to analyze data using Python. We will analyze the Big Weather Data, and to do this we need to run some DataFrame commands on the data using Pandas. First we install Pandas, and then we load the data into a DataFrame.

You can download the data from drive.google.com/file/d/1JvD4Ss2yS3d9X36YkW..

After installing the Pandas library and loading the data, we analyze it according to our needs. Here we use the big weather data from the source listed above.

First of all, we should know what the data is: how many records there are, what the fields are, and so on. Then we can analyze it accordingly.

Let's begin...

First install Pandas (for example with pip install pandas). Then import it with the command import pandas as pd, where pd is the alias conventionally used for pandas for quick reference.

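Now we load our big weather data file into a Pandas DataFrame using the command

data = pd.read_csv("C:\Users\newuser\weather.csv"), where inside the quotation marks you write the path of the file where the data is stored.

Written like this, the Windows path causes a SyntaxError (a "unicode error") in Python, because \U in a normal string starts a Unicode escape sequence. To avoid it, put r in front of the opening quote to make it a raw string: data = pd.read_csv(r"C:\Users\newuser\weather.csv"). Note that r goes outside the quotes, and there should be no extra spaces inside them.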
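Below is a minimal, runnable sketch of the loading step. The real weather.csv is not available here, so a tiny made-up CSV held in memory (through io.StringIO) stands in for the file. Its column names are assumptions and may not match the real dataset.

```python
import io

import pandas as pd

# A raw string (r"...") keeps backslashes literally, so "\U" in "C:\Users"
# is not parsed as a Unicode escape sequence.
path = r"C:\Users\newuser\weather.csv"
print(path)

# With the real file you would call: data = pd.read_csv(path)
# Here a tiny made-up CSV in memory stands in for weather.csv;
# the column names are assumptions.
csv_text = """Date/Time,Wind Speed_km/h,Visibility_km,Weather
2012-01-01 00:00:00,4,8.0,Fog
2012-01-01 01:00:00,7,25.0,Clear
2012-01-01 02:00:00,4,48.3,Clear
"""
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)  # (3, 4)
```

pd.read_csv accepts a file-like object as well as a path. That is why the same call works on the in-memory text.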

Now let's look at our data by simply displaying the DataFrame, data.

Every time, before we move on to analyzing data, we should explore it and get familiar with its rows, columns and values. For this purpose we use very simple commands like data.info(), data.head() and data['Weather'].unique(), where data is the name of the DataFrame into which we imported the weather data.

Look at the following commands and what they show: info(), head(), columns, dtypes, count(), shape, value_counts(), unique() and nunique().
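As a rough sketch, here are the exploration commands above applied to a small made-up DataFrame that stands in for the weather data. The column names 'Wind Speed_km/h', 'Visibility_km' and 'Weather' are assumptions, so check the real ones with data.head(2).

```python
import pandas as pd

# Small made-up sample standing in for the weather data;
# the column names are assumptions.
data = pd.DataFrame({
    "Wind Speed_km/h": [4, 7, 4, 0],
    "Visibility_km": [8.0, 25.0, 48.3, 25.0],
    "Weather": ["Fog", "Clear", "Clear", "Snow"],
})

data.info()                            # index range, columns, non-null counts, dtypes
print(data.head(2))                    # first 2 rows (head() shows 5 by default)
print(data.columns.tolist())           # column names
print(data.dtypes)                     # data type of each column
print(data.count())                    # non-null values per column
print(data.shape)                      # (rows, columns) -> (4, 3)
print(data["Weather"].value_counts())  # each unique value with its count
print(data["Weather"].unique())        # unique values of ONE column
print(data.nunique())                  # number of unique values per column
```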

Now that you are familiar with the data (its number of rows, its columns and their values), we can move on to analyzing it. For this purpose, we have a set of questions to answer.

This way you will learn how to work on a real Data Analysis project with Python: each question is given first and then solved with Python.

Q. 1) Find all the unique 'Wind Speed' values in the data.

There are multiple ways to do this. Every time before analyzing, check the data by displaying even just 2 rows (data.head(2)). It helps you write the commands correctly.

Then we can count the unique wind speed values, for example with nunique(). We find that 34 values are unique, and to see what they are, we use unique().
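A minimal sketch of both steps follows, on made-up values. The column name 'Wind Speed_km/h' is an assumption. Note that unique() returns the values in the order they first appear, not sorted.

```python
import pandas as pd

# Made-up sample; the real column name may differ (assumed here).
data = pd.DataFrame({"Wind Speed_km/h": [4, 7, 4, 0, 7, 24]})

n_unique = data["Wind Speed_km/h"].nunique()      # how many distinct values
unique_values = data["Wind Speed_km/h"].unique()  # the distinct values themselves
print(n_unique)       # 4
print(unique_values)  # [ 4  7  0 24]
```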

Q. 2) Find the number of times when the 'Weather is exactly Clear'.

Again, we first list our data and then analyze it. The list shows how many times the weather is Clear.

We can also find the Clear records using filtering. Always remember to write the condition in square brackets with the DataFrame name outside, as in data[data['Weather'] == 'Clear']. This is a very important piece of syntax.

We can also find this by using the groupby command.
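Here is a sketch of the three approaches on a made-up sample. The groupby version uses size(), which is one possible form and not necessarily the command in the original screenshot. The made-up value 'Mainly Clear' shows that == matches only the exact text 'Clear'.

```python
import pandas as pd

# Made-up sample; "Weather" is the column name used in Q5.
data = pd.DataFrame({
    "Weather": ["Clear", "Fog", "Clear", "Mainly Clear", "Clear"],
    "Visibility_km": [25.0, 4.0, 48.3, 25.0, 25.0],  # assumed column name
})

# 1) value_counts(): counts of every weather value; pick the "Clear" entry
by_value_counts = data["Weather"].value_counts()["Clear"]

# 2) Filtering: condition inside the brackets, DataFrame name outside
clear_rows = data[data["Weather"] == "Clear"]
by_filter = len(clear_rows)

# 3) groupby: group rows by weather value, then take the size of the "Clear" group
by_groupby = data.groupby("Weather").size()["Clear"]

print(by_value_counts, by_filter, by_groupby)  # 3 3 3
```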

Q. 3) Find the number of times when the 'Wind Speed was exactly 4 km/h'.
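A minimal sketch, assuming the column is named 'Wind Speed_km/h' and using made-up values. It counts the rows with an exact match, both by filtering and by reading the count from value_counts().

```python
import pandas as pd

# Made-up sample; the column name is an assumption.
data = pd.DataFrame({"Wind Speed_km/h": [4, 7, 4, 0, 4, 11]})

# Rows where the wind speed equals exactly 4
four_kmh = data[data["Wind Speed_km/h"] == 4]
print(len(four_kmh))  # 3

# The same count read from value_counts() (label lookup with .loc)
print(data["Wind Speed_km/h"].value_counts().loc[4])  # 3
```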

Q. 4) Find out all the Null Values in the data.

Here the result shows False in every cell, because there is no null value in the data. To count the nulls per column instead, we can append .sum() to the command (or .sum(axis=1) to count them per row).

We can also check this by using the notnull() function.

notnull() marks the non-null values, and there are 8784 of them in each column. That is the same as the number of rows, which means there are no null values.
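Here is a sketch of these checks on a made-up sample that deliberately contains missing values, so the output is not all False. On the real data, the notnull().sum() line would instead show the full row count for every column. Column names are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up sample with missing values; column names are assumptions.
data = pd.DataFrame({
    "Weather": ["Clear", "Fog", None],
    "Visibility_km": [25.0, np.nan, 48.3],
})

print(data.isnull())               # True where a value is missing
print(data.isnull().sum())         # number of nulls per column
print(data.notnull().sum())        # number of non-null values per column
print(data.isnull().values.any())  # True if any null exists anywhere
```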

Q. 5) Rename the column name 'Weather' of the dataframe to 'Weather Condition'.

Always check the exact column name first, otherwise it is easy to make mistakes. Display just the first two rows with data.head(2), and then rename the column with rename().

However, a plain rename() call only changes the name in the result it returns, and the DataFrame itself stays unchanged. For a permanent change to the column, you need to pass inplace=True (or assign the result back to data).
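The difference between the temporary and the permanent rename can be sketched like this, using a made-up two-column sample ('Visibility_km' is an assumed name).

```python
import pandas as pd

# Made-up sample; "Visibility_km" is an assumed column name.
data = pd.DataFrame({"Weather": ["Clear", "Fog"], "Visibility_km": [25.0, 4.0]})

# Without inplace, rename returns a renamed COPY; data itself is unchanged
renamed = data.rename(columns={"Weather": "Weather Condition"})
print(renamed.columns.tolist())  # ['Weather Condition', 'Visibility_km']
print(data.columns.tolist())     # ['Weather', 'Visibility_km']

# Permanent change: modify data in place (or use data = data.rename(...))
data.rename(columns={"Weather": "Weather Condition"}, inplace=True)
print(data.columns.tolist())     # ['Weather Condition', 'Visibility_km']
```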

Q. 6) What is the mean 'Visibility'?
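A minimal sketch, assuming the visibility column is named 'Visibility_km' and using made-up values.

```python
import pandas as pd

# Made-up sample; the column name is an assumption.
data = pd.DataFrame({"Visibility_km": [25.0, 10.0, 48.0, 25.0]})

mean_visibility = data["Visibility_km"].mean()  # sum of values / number of non-null values
print(mean_visibility)  # 27.0
```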

Q. 7) What is the Standard Deviation of 'Pressure' in this data?
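Here is a sketch with an assumed column name 'Press_kPa' and made-up values. pandas std() computes the sample standard deviation (ddof=1) by default. Passing ddof=0 gives the population version, so the two can differ noticeably on small data.

```python
import pandas as pd

# Made-up sample; the column name is an assumption.
data = pd.DataFrame({"Press_kPa": [101.0, 102.0, 103.0]})

# pandas std() is the SAMPLE standard deviation (ddof=1) by default
sample_std = data["Press_kPa"].std()
# ddof=0 gives the population standard deviation instead
population_std = data["Press_kPa"].std(ddof=0)
print(sample_std, population_std)
```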

Q. 8) What is the Variance of 'Relative Humidity' in this data?
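A minimal sketch, assuming the humidity column is named 'Rel Hum_%' and using made-up values. Like std(), var() uses ddof=1 by default, so it equals the square of std().

```python
import pandas as pd

# Made-up sample; the column name is an assumption.
data = pd.DataFrame({"Rel Hum_%": [60, 70, 80, 90]})

variance = data["Rel Hum_%"].var()  # sample variance (ddof=1), like std()
print(variance)
# Variance is the square of the standard deviation
print(data["Rel Hum_%"].std() ** 2)
```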

Q. 9) Find all instances when 'Snow' was recorded.

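There are three ways you can do this:

  • value_counts(), which counts every weather condition, including 'Snow'

  • Filtering, which returns only the rows where the condition is exactly 'Snow'

  • str.contains(), which also returns the rows where 'Snow' is only part of the condition text

For a more detailed view, display the matching records themselves rather than only their count.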
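The three approaches are sketched below on a made-up sample. It assumes the column has already been renamed to 'Weather Condition' (Q5). The made-up value 'Snow Showers' shows how str.contains() differs from an exact match.

```python
import pandas as pd

# Made-up sample; assumes the column was already renamed in Q5.
data = pd.DataFrame({
    "Weather Condition": ["Snow", "Clear", "Snow Showers", "Fog", "Snow"],
})

# 1) value_counts(): count of the exact value "Snow"
snow_count = data["Weather Condition"].value_counts()["Snow"]

# 2) Filtering: rows whose condition is exactly "Snow"
exact_snow = data[data["Weather Condition"] == "Snow"]

# 3) str.contains: rows whose condition CONTAINS "Snow" anywhere in the text
any_snow = data[data["Weather Condition"].str.contains("Snow")]

print(snow_count, len(exact_snow), len(any_snow))  # 2 2 3
print(any_snow)  # the full matching records, for a more detailed view
```

If the column contained missing values, str.contains() would return NaN for them. Passing na=False treats them as non-matches.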

Q. 10) Find all instances when 'Wind Speed is above 24' and 'Visibility is 25'.

Both conditions must be met, so we use the AND operator. In pandas this is &, and each condition goes in its own parentheses.
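A minimal sketch with made-up values and assumed column names. Python's and keyword does not work here, because pandas cannot reduce a whole Series to a single True or False. The element-wise & is used instead, and the parentheses are required because & binds more tightly than > and ==.

```python
import pandas as pd

# Made-up sample; column names are assumptions.
data = pd.DataFrame({
    "Wind Speed_km/h": [30, 20, 28, 35],
    "Visibility_km": [25.0, 25.0, 9.7, 25.0],
})

# & is the element-wise AND; each condition needs its own parentheses
result = data[(data["Wind Speed_km/h"] > 24) & (data["Visibility_km"] == 25)]
print(result)  # rows 0 and 3
```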

Q. 11) What is the Mean value of each column against each 'Weather Condition'?

We can easily get the result by using the groupby command with the mean function.
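Here is a sketch on a made-up sample with assumed column names. It includes a text column ('Date/Time') because recent pandas versions raise an error when asked to average strings. Passing numeric_only=True averages only the numeric columns.

```python
import pandas as pd

# Made-up sample; column names are assumptions.
data = pd.DataFrame({
    "Date/Time": ["2012-01-01 00:00", "2012-01-01 01:00",
                  "2012-01-01 02:00", "2012-01-01 03:00"],
    "Temp_C": [-1.0, 0.0, 2.0, 4.0],
    "Visibility_km": [8.0, 25.0, 25.0, 48.0],
    "Weather Condition": ["Fog", "Clear", "Fog", "Clear"],
})

# One row per weather condition, holding the mean of each numeric column;
# numeric_only=True skips text columns such as "Date/Time"
means = data.groupby("Weather Condition").mean(numeric_only=True)
print(means)
```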

Q. 12) What is the Minimum & Maximum value of each column against each 'Weather Condition'?

This is like the previous question. There we were asked for the mean value against each weather condition, and here we are asked for the minimum and maximum values, so the same groupby approach is used with the min and max functions.
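A minimal sketch on made-up data with assumed column names. The agg() call is an optional alternative that returns both statistics in one table.

```python
import pandas as pd

# Made-up sample; column names are assumptions.
data = pd.DataFrame({
    "Temp_C": [-1.0, 0.0, 2.0, 4.0],
    "Visibility_km": [8.0, 25.0, 25.0, 48.0],
    "Weather Condition": ["Fog", "Clear", "Fog", "Clear"],
})

grouped = data.groupby("Weather Condition")
print(grouped.min())  # minimum of each column per weather condition
print(grouped.max())  # maximum of each column per weather condition

# Both at once: agg() with a list of function names
print(grouped.agg(["min", "max"]))
```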

Q. 13) Show all the Records where Weather Condition is Fog.
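Here is a sketch using the same filtering syntax on a made-up sample (assumed column names). An exact match returns only rows whose condition is exactly 'Fog'. To also catch combined conditions that merely contain the word, str.contains('Fog') from Q9 would be needed.

```python
import pandas as pd

# Made-up sample; column names are assumptions.
data = pd.DataFrame({
    "Visibility_km": [0.8, 25.0, 4.0],
    "Weather Condition": ["Fog", "Clear", "Fog"],
})

# Condition inside the brackets, DataFrame name outside
fog_records = data[data["Weather Condition"] == "Fog"]
print(fog_records)  # rows 0 and 2
```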

Q. 14) Find all instances when 'Weather is Clear' or 'Visibility is above 40'.

Either of the two conditions should be met, so we use the OR operator (| in pandas). As mentioned above, don't forget to write the whole condition in square brackets with the DataFrame name (data in this case) outside.

You can also add .tail() to the same command to check the last rows. For example, .tail(50) shows the last 50 records, or you can choose as many as you need.
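A minimal sketch with made-up values and assumed column names. Row 3 is kept even though its visibility is exactly 40, because the Clear condition alone is enough.

```python
import pandas as pd

# Made-up sample; column names are assumptions.
data = pd.DataFrame({
    "Visibility_km": [25.0, 48.3, 9.7, 40.0],
    "Weather Condition": ["Clear", "Fog", "Snow", "Clear"],
})

# | is the element-wise OR: a row is kept if EITHER condition is True
result = data[(data["Weather Condition"] == "Clear") | (data["Visibility_km"] > 40)]
print(result)          # rows 0, 1 and 3
print(result.tail(2))  # last 2 matching rows
```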

Q. 15) Find all instances when:

A. 'Weather is Clear' and 'Relative Humidity is greater than 50', or B. 'Visibility is above 40'

Check the column names every time using data.head(2), which returns only two rows along with the column names. Then write the combined condition.

On its own, the combined condition returns True or False for every row according to the three conditions. To get the actual records, we again wrap it in square brackets with the DataFrame name.
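Here is a sketch of the combined condition on a made-up sample with assumed column names. The extra parentheses make the intended grouping, (A1 & A2) | B, explicit. The first print shows the True/False result, and the second shows the records.

```python
import pandas as pd

# Made-up sample; column names are assumptions.
data = pd.DataFrame({
    "Rel Hum_%": [55, 40, 80, 30],
    "Visibility_km": [25.0, 25.0, 48.3, 48.3],
    "Weather Condition": ["Clear", "Clear", "Fog", "Snow"],
})

# (A) Clear AND humidity above 50, OR (B) visibility above 40
condition = (
    ((data["Weather Condition"] == "Clear") & (data["Rel Hum_%"] > 50))
    | (data["Visibility_km"] > 40)
)
print(condition)        # True/False for every row
print(data[condition])  # only the matching records: rows 0, 2 and 3
```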

I hope this helps you take your first steps in data analysis. For your convenience, I have also listed the commands used for exploring the data.

The commands that we used in this project:

  • head() - It shows the first N rows in the data (by default, N=5).

  • shape - It shows the total number of rows and columns of the dataframe.

  • index - This attribute provides the index of the dataframe.

  • columns - It shows the name of each column.

  • dtypes - It shows the data type of each column.

  • unique() - It shows all the unique values in a column. It can be applied on a single column only, not on the whole dataframe.

  • nunique() - It shows the total number of unique values in each column. It can be applied on a single column as well as on the whole dataframe.

  • count() - It shows the total number of non-null values in each column. It can be applied on a single column as well as on the whole dataframe.

  • value_counts() - It shows all the unique values in a column with their count. It is normally applied on a single column (on a whole dataframe, it counts unique rows instead).

  • info() - It prints a concise summary of the dataframe: index range, column names, non-null counts, data types and memory usage.

Data Source: youtube.com/watch?v=4hYOkHijtNw&list=PL..

I hope this project helps you a lot in analyzing different types of data. More is coming up soon...