Function Application Continued Aggregation Groupby...
Aggregation (groupby) - A groupby operation involves some combination of splitting the object, applying a function, and combining the results. It can be used to group large amounts of data and compute operations on each group.
Groupby is most useful when you have categorical data, as in the weather data below. Here 'Weather', 'Year' and 'State' are all categorical.
A groupby operation involves one or more of the following steps:
- Splitting the Object
- Applying a function
- Combining the results
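To make the three steps concrete, here is a minimal sketch that performs them by hand (a dict for the split, a mean for the apply, a Series for the combine) and compares the result with a single groupby call. It reuses the Weather and Humidity columns of the weather data below; the names `pieces`, `means`, `manual` and `builtin` are only illustrative.

```python
import pandas as pd

# Weather and Humidity columns of the weather data used in this post.
weather = ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny',
           'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny']
humidity = [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3]
df = pd.DataFrame({'Weather': weather, 'Humidity': humidity})

# 1. Split: collect the Humidity values belonging to each Weather label.
pieces = {}
for label, value in zip(df['Weather'], df['Humidity']):
    pieces.setdefault(label, []).append(value)

# 2. Apply: compute the mean of each piece.
means = {label: sum(values) / len(values) for label, values in pieces.items()}

# 3. Combine: gather the per-group results into one Series, sorted by label.
manual = pd.Series(means).sort_index()

# groupby performs all three steps in one call (groups are sorted by default).
builtin = df.groupby('Weather')['Humidity'].mean()

print(manual)
print(builtin)
```

Both Series hold one mean per weather label; the groupby version is shorter and avoids the Python-level loop.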
1. Splitting the object -
import pandas as pd
weather_data = {'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
                'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
                'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
                'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3]}
df = pd.DataFrame(weather_data)
gp = df.groupby('Weather')  # split the rows by the value in the Weather column
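At this point gp is a groupby object: the split has been defined, but no function has been applied yet. A quick way to inspect it is sketched below, assuming the same weather data trimmed to two columns; `ngroups` and `size()` are standard groupby members.

```python
import pandas as pd

df = pd.DataFrame({
    'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny',
                'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
    'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3],
})

gp = df.groupby('Weather')  # split only

print(gp.ngroups)   # number of distinct Weather labels: 4
print(gp.size())    # rows per group: Cloudy 3, Rainy 3, Stormy 2, Sunny 4
```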
Iterating through groups -
for name, group in gp:
    print(name)   # the group label, e.g. 'Cloudy'
    print(group)  # the sub-DataFrame holding that group's rows
Output:-
Cloudy
Weather State Year Humidity
3 Cloudy MP 2015 4.7
6 Cloudy CG 2016 3.2
9 Cloudy MP 2014 1.1
Rainy
Weather State Year Humidity
0 Rainy CG 2014 3.4
4 Rainy HY 2014 5.8
7 Rainy HP 2017 3.5
Stormy
Weather State Year Humidity
1 Stormy AP 2015 2.3
8 Stormy AP 2016 7.3
Sunny
Weather State Year Humidity
2 Sunny HP 2014 3.2
5 Sunny DH 2015 8.1
10 Sunny CG 2015 1.2
11 Sunny AP 2017 2.3
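Iterating only shows the split. The apply and combine steps, which give this section its name (aggregation), can be done in one call with `agg`. The sketch below, assuming the same weather data trimmed to two columns, computes several statistics of Humidity per weather label; the choice of statistics is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny',
                'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
    'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3],
})
gp = df.groupby('Weather')

# Apply three aggregations to the Humidity column of every group and combine
# the results into one DataFrame with one row per Weather label.
summary = gp['Humidity'].agg(['count', 'mean', 'max'])
print(summary)
```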
View Groups
print(df.groupby('Weather').groups)
Output:-
{'Cloudy': Int64Index([3, 6, 9], dtype='int64'),
'Rainy': Int64Index([0, 4, 7], dtype='int64'),
'Stormy': Int64Index([1, 8], dtype='int64'),
'Sunny': Int64Index([2, 5, 10, 11], dtype='int64')}
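The dict returned by `.groups` maps each label to the index labels of its rows, so it can be passed straight to `df.loc` to pull those rows back out. A small sketch, assuming the same weather data trimmed to three columns; `stormy_index` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny',
                'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
    'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
    'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3],
})

groups = df.groupby('Weather').groups
stormy_index = groups['Stormy']   # index labels of the Stormy rows: 1 and 8
print(df.loc[stormy_index])       # the two Stormy rows
```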
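Grouping by multiple columns
gp = df.groupby(['Weather', 'State'])
for name, group in gp:
    print(name)   # a (Weather, State) tuple
    print(group)
Output:-
('Cloudy', 'CG')
Weather State Year Humidity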
6 Cloudy CG 2016 3.2
('Cloudy', 'MP')
Weather State Year Humidity
3 Cloudy MP 2015 4.7
9 Cloudy MP 2014 1.1
('Rainy', 'CG')
Weather State Year Humidity
0 Rainy CG 2014 3.4
('Rainy', 'HP')
Weather State Year Humidity
7 Rainy HP 2017 3.5
('Rainy', 'HY')
Weather State Year Humidity
4 Rainy HY 2014 5.8
('Stormy', 'AP')
Weather State Year Humidity
1 Stormy AP 2015 2.3
8 Stormy AP 2016 7.3
('Sunny', 'AP')
Weather State Year Humidity
11 Sunny AP 2017 2.3
('Sunny', 'CG')
Weather State Year Humidity
10 Sunny CG 2015 1.2
('Sunny', 'DH')
Weather State Year Humidity
5 Sunny DH 2015 8.1
('Sunny', 'HP')
Weather State Year Humidity
2 Sunny HP 2014 3.2
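With several grouping columns each group is identified by a tuple, so `get_group` takes a tuple and aggregations return a Series indexed by a (Weather, State) MultiIndex. A sketch assuming the same weather data without the Year column; `pair_groups`, `cloudy_mp` and `pair_means` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny',
                'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
    'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
    'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3],
})

pair_groups = df.groupby(['Weather', 'State'])

# A tuple selects one (Weather, State) group.
cloudy_mp = pair_groups.get_group(('Cloudy', 'MP'))
print(cloudy_mp)

# Mean Humidity for every (Weather, State) pair, indexed by a MultiIndex.
pair_means = pair_groups['Humidity'].mean()
print(pair_means)
```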
Select a group
gp = df.groupby('State')
print(gp.get_group('HP'))
Output:-
Weather State Year Humidity
2 Sunny HP 2014 3.2
7 Rainy HP 2017 3.5
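For a single group, `get_group('HP')` returns the same rows as a boolean filter on the State column; groupby pays off when you need many groups at once. A quick check, assuming the full weather data from above; `by_group` and `by_mask` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny',
                'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
    'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
    'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3],
})

by_group = df.groupby('State').get_group('HP')   # select via groupby
by_mask = df[df['State'] == 'HP']                # select via a boolean mask

print(by_group.equals(by_mask))  # same rows, same index (2 and 7)
```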
Source:- https://www.tutorialspoint.com/python_pandas/python_pandas_groupby.htm
import pandas as pd
import numpy as np
stut_data={'name':['raj','ravi','neha','raveena'],'gender':['m','m','f','f'],'marks':[45,23,56,22]}
df = pd.DataFrame(stut_data)
grouped=df.groupby('gender')
print(grouped['marks'].apply(np.sum))
gender
f 78
m 68
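Passing `np.sum` to `apply` works, but the groupby object also has a built-in `sum()` that gives the same per-gender totals. A sketch reusing the commenter's data; `via_apply` and `via_sum` are illustrative names.

```python
import numpy as np
import pandas as pd

stut_data = {'name': ['raj', 'ravi', 'neha', 'raveena'],
             'gender': ['m', 'm', 'f', 'f'],
             'marks': [45, 23, 56, 22]}
df = pd.DataFrame(stut_data)
grouped = df.groupby('gender')

via_apply = grouped['marks'].apply(np.sum)  # the commenter's version
via_sum = grouped['marks'].sum()            # built-in groupby aggregation
print(via_sum)                              # f 78, m 68
```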
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name':['Raj','Ravi','Deepa','neha'], 'Age':[25,26,30,52],'Marks':[92,95,12,98],'Gender':['M','M','F','F']})
gp = df.groupby('Gender')
fem = gp.get_group('F')
print(fem['Marks'].pipe(np.sum))
Ans
110
Sum of the marks of all female students
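The same total can also be read off a single groupby sum, which computes every gender's total at once instead of selecting one group first. A sketch reusing the commenter's DataFrame; `totals` and `via_pipe` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Ravi', 'Deepa', 'neha'],
                   'Age': [25, 26, 30, 52],
                   'Marks': [92, 95, 12, 98],
                   'Gender': ['M', 'M', 'F', 'F']})

# The commenter's approach: select the 'F' group, then sum its Marks.
via_pipe = df.groupby('Gender').get_group('F')['Marks'].pipe(np.sum)

# One groupby sum gives the total for every gender.
totals = df.groupby('Gender')['Marks'].sum()
print(totals['F'])  # 110
```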