Function Application Continued Aggregation Groupby...

Aggregation (group by) - A groupby operation involves some combination of splitting the object, applying a function, and combining the results. It can be used to group large amounts of data and compute operations on each group.

Groupby is most useful when the data contains categorical columns, as in the weather data below, where 'Weather', 'Year' and 'State' are all categorical.
A groupby operation involves one or more of the following steps.
  • Splitting the Object
  • Applying a function   
  • Combining the results
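The three steps above can be sketched by hand and then compared with the groupby one-liner. This is only an illustration, using the weather data defined later in this post; the names pieces, means, combined and one_liner are made up for the example.

```python
import pandas as pd

# Same weather data as used in the rest of this post.
weather_data = {'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy',
                            'Sunny', 'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
                'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
                'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
                'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3]}
df = pd.DataFrame(weather_data)

# 1. Split: one sub-DataFrame per distinct 'Weather' value.
pieces = {name: group for name, group in df.groupby('Weather')}

# 2. Apply: compute the mean humidity of each piece.
means = {name: group['Humidity'].mean() for name, group in pieces.items()}

# 3. Combine: collect the per-group results into a single Series.
combined = pd.Series(means, name='Humidity')

# groupby performs the same three steps in one expression.
one_liner = df.groupby('Weather')['Humidity'].mean()

print(combined)
print(one_liner)
```

Both Series should hold the same mean humidity for each weather type; the manual version is only there to make the split, apply and combine steps visible.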
Groups can be formed on a single column, e.g. df.groupby('Weather'), or on several columns at once, e.g. df.groupby(['Weather','State']).

1. Splitting the object - 
import pandas as pd
weather_data = {'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy',
'Sunny', 'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
'State': ['CG', 'AP', 'HP', 'MP', 'HY','DH' ,'CG' ,'HP','AP' , 'MP','CG','AP'],
'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
'Humidity':[3.4,2.3,3.2,4.7,5.8,8.1,3.2,3.5,7.3,1.1,1.2,2.3]}
df = pd.DataFrame(weather_data)
gp = df.groupby('Weather')

Iterating through groups-
for name,group in gp:
    print(name)
    print(group)
Output:-
Cloudy
  Weather State  Year  Humidity
3  Cloudy    MP  2015       4.7
6  Cloudy    CG  2016       3.2
9  Cloudy    MP  2014       1.1
Rainy
  Weather State  Year  Humidity
0   Rainy    CG  2014       3.4
4   Rainy    HY  2014       5.8
7   Rainy    HP  2017       3.5
Stormy
  Weather State  Year  Humidity
1  Stormy    AP  2015       2.3
8  Stormy    AP  2016       7.3
Sunny
   Weather State  Year  Humidity
2    Sunny    HP  2014       3.2
5    Sunny    DH  2015       8.1
10   Sunny    CG  2015       1.2
11   Sunny    AP  2017       2.3

View Groups
print(df.groupby('Weather').groups)

Output:-
{'Cloudy': Int64Index([3, 6, 9], dtype='int64'), 
'Rainy': Int64Index([0, 4, 7], dtype='int64'), 
'Stormy': Int64Index([1, 8], dtype='int64'), 
'Sunny': Int64Index([2, 5, 10, 11], dtype='int64')}
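The groups attribute maps each group name to the row labels that belong to it. Recent pandas releases (2.0 and later) print these labels as Index rather than Int64Index, so the exact text of the output depends on the installed version. A small sketch that turns the mapping into plain lists and also shows the group count and sizes (the variable name members is made up for the example):

```python
import pandas as pd

# Same weather data as used in the rest of this post.
weather_data = {'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy',
                            'Sunny', 'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
                'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
                'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
                'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3]}
df = pd.DataFrame(weather_data)
gp = df.groupby('Weather')

# Convert each group's index object into a plain list of row labels.
members = {name: list(labels) for name, labels in gp.groups.items()}
print(members)

# Number of groups, and number of rows in each group.
print(gp.ngroups)
print(gp.size())
```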

Grouping by multiple columns
gp = df.groupby(['Weather','State'])
for name,group in gp:
    print(name)
    print(group)
Output:-
('Cloudy', 'CG')
  Weather State  Year  Humidity
6  Cloudy    CG  2016       3.2
('Cloudy', 'MP')
  Weather State  Year  Humidity
3  Cloudy    MP  2015       4.7
9  Cloudy    MP  2014       1.1
('Rainy', 'CG')
  Weather State  Year  Humidity
0   Rainy    CG  2014       3.4
('Rainy', 'HP')
  Weather State  Year  Humidity
7   Rainy    HP  2017       3.5
('Rainy', 'HY')
  Weather State  Year  Humidity
4   Rainy    HY  2014       5.8
('Stormy', 'AP')
  Weather State  Year  Humidity
1  Stormy    AP  2015       2.3
8  Stormy    AP  2016       7.3
('Sunny', 'AP')
   Weather State  Year  Humidity
11   Sunny    AP  2017       2.3
('Sunny', 'CG')
   Weather State  Year  Humidity
10   Sunny    CG  2015       1.2
('Sunny', 'DH')
  Weather State  Year  Humidity
5   Sunny    DH  2015       8.1
('Sunny', 'HP')
  Weather State  Year  Humidity
2   Sunny    HP  2014       3.2
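When grouping by several columns, each group name is a tuple with one entry per grouping column, as the output above shows. A short sketch of how such a group can be picked out and aggregated, assuming the same weather data (mean_humidity is a name made up for the example):

```python
import pandas as pd

# Same weather data as used in the rest of this post.
weather_data = {'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy',
                            'Sunny', 'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
                'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
                'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
                'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3]}
df = pd.DataFrame(weather_data)
gp = df.groupby(['Weather', 'State'])

# One group per distinct (Weather, State) pair that occurs in the data.
print(gp.ngroups)

# A single group is selected with a tuple key.
print(gp.get_group(('Cloudy', 'MP')))

# Aggregating gives a Series indexed by (Weather, State).
mean_humidity = gp['Humidity'].mean()
print(mean_humidity.loc[('Cloudy', 'MP')])
```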

Select a group
gp = df.groupby('State')
print(gp.get_group('HP'))

Output:-
  Weather State  Year  Humidity
2   Sunny    HP  2014       3.2
7   Rainy    HP  2017       3.5
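For a single key, get_group returns the same rows as an ordinary boolean filter on that column, and asking for a key that does not exist raises a KeyError. A minimal sketch, assuming the same weather data; the key 'XX' is a made-up state code that is not in the data:

```python
import pandas as pd

# Same weather data as used in the rest of this post.
weather_data = {'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy',
                            'Sunny', 'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
                'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
                'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
                'Humidity': [3.4, 2.3, 3.2, 4.7, 5.8, 8.1, 3.2, 3.5, 7.3, 1.1, 1.2, 2.3]}
df = pd.DataFrame(weather_data)
gp = df.groupby('State')

# Rows of the 'HP' group, selected in two equivalent ways.
hp = gp.get_group('HP')
filtered = df[df['State'] == 'HP']
print(hp)
print(filtered)

# A key that is not present in the data raises KeyError.
missing = False
try:
    gp.get_group('XX')
except KeyError:
    missing = True
    print("no group named 'XX'")
```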
Source:- https://www.tutorialspoint.com/python_pandas/python_pandas_groupby.htm

Comments

  1. import pandas as pd
    import numpy as np
    stut_data={'name':['raj','ravi','neha','raveena'],'gender':['m','m','f','f'],'marks':[45,23,56,22]}
    df = pd.DataFrame(stut_data)
    grouped=df.groupby('gender')
    print(grouped['marks'].apply(np.sum))

  2. import pandas as pd
    import numpy as np
    df = pd.DataFrame({'Name':['Raj','Ravi','Deepa','neha'], 'Age':[25,26,30,52],'Marks':[92,95,12,98],'Gender':['M','M','F','F']})
    gp = df.groupby('Gender')
    fem = gp.get_group('F')
    fem['Marks'].pipe(np.sum)
