Posts

Showing posts from 2019

Descriptive Statistics - count & sum

F. count() - counts the non-NA entries in each column (axis=0, the default) or in each row (axis=1). The values None, NaT and NaN are treated as NA in pandas.

Example:-

1.
    import pandas as pd
    df2 = pd.DataFrame({2016:{'q1':500,'q2':500,'q3':47000,'q4':49000},
                        2017:{'q1':'A','q2':'A','q3':'A','q4':'D'},
                        2018:{'q1':54500,'q2':51000},
                        2019:{'q1':True,'q2':'False'}})
    print(df2.count())

Output:-
    2016    4
    2017    4
    2018    2
    2019    2

2. Using the same df2:
    print(df2.count(numeric_only=True))

Output:-
    2016    4
    2018    2

G. sum() - Returns the sum of the values for
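To see how count() treats the three kinds of missing value, and how axis changes the direction of counting, here is a minimal sketch. The column names and data are made up for this illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every column has one missing entry of a different kind.
df = pd.DataFrame({
    "sales": [500, np.nan, 47000],                                # NaN
    "grade": ["A", None, "D"],                                    # None
    "date": pd.to_datetime(["2019-01-01", None, "2019-03-01"]),   # NaT
})

col_counts = df.count()        # non-NA entries per column (axis=0 is the default)
row_counts = df.count(axis=1)  # non-NA entries per row

print(col_counts)
print(row_counts)
```

Each column should report 2 non-NA entries, and the middle row should report 0 because all of its values are missing.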

Descriptive statistics - mode(), mean() and median()

C. mode() - returns the value(s) that appear most often in a set of values.

Example:-

1.
    import pandas as pd
    df2 = pd.DataFrame({2016:{'q1':500,'q2':500,'q3':47000,'q4':49000},
                        2017:{'q1':'A','q2':'A','q3':'A','q4':'D'},
                        2018:{'q1':54500,'q2':51000}})
    df2.mode()

Output:-
        2016 2017     2018
    0  500.0    A  51000.0
    1    NaN  NaN  54500.0

Explanation:- By default axis=0, so the mode is calculated down the rows (index), i.e. for each column.

2.
    df2.mode(axis=1)

Output:-
            0  1        2
    q1    500  A  54500.0
    q2    500  A  51000.0
    q3  47000  A      NaN
    q4  49000  D      NaN

Explanation:- Since axis=1, the mode is calculated across the columns, i.e. for each row.

3.
    df2.mode(numeric_only=True)

Output:-
        2016     2018
    0  500.0  51000.0
    1    NaN  54500.0

Explanation:- With numeric_only=True only numeric columns are included in the mode calculation; string/text values are not considered. By
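The NaN entries in the mode() outputs above come from ties: when one column has more modes than another, the shorter columns are padded with NaN. A minimal sketch with made-up column names "a" and "b":

```python
import pandas as pd

# Hypothetical data: "a" has a single mode, "b" has two values tied for most frequent.
df = pd.DataFrame({
    "a": [500, 500, 47000, 49000],
    "b": [54500, 51000, 54500, 51000],
})

result = df.mode()  # one row per mode, sorted; shorter columns are padded with NaN
print(result)
```

Here "b" contributes two rows (51000 and 54500), so "a" gets a NaN in the second row.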

Descriptive Statistics - min(), max()

A. min() - Finds the minimum of a given set of data (max() likewise finds the maximum).

Example:-

    import pandas as pd
    df = pd.DataFrame({2016:{'q1':34500,'q2':56000,'q3':47000,'q4':49000},
                       2017:{'q1':44900,'q2':46100,'q3':57000,'q4':59000},
                       2018:{'q1':54500,'q2':51000}})
    print(df)

Output:-
         2016   2017     2018
    q1  34500  44900  54500.0
    q2  56000  46100  51000.0
    q3  47000  57000      NaN
    q4  49000  59000      NaN

1.
    print(df.min())

Output:-
    2016    34500.0
    2017    44900.0
    2018    51000.0

Explanation:- min() finds the minimum down the index for each column; axis=0 is the default.

2.
    print(df.min(axis=1))

Output:-
    q1    34500.0
    q2    46100.0
    q3    47000.0
    q4    49000.0

Explanation:- min(axis=1) finds the minimum across the columns for each index (row).

    import pandas as pd
    df2 = pd.DataFrame({2016:{'q1':34500,'q2':56000,'q3':47000,'q4':49000},2017:{'q1':'A','q2':'B
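The excerpt above only shows min(). As a hedged sketch, max() can be called the same way on the same quarterly data; the skipna=False call is an extra illustration that is not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    2016: {"q1": 34500, "q2": 56000, "q3": 47000, "q4": 49000},
    2017: {"q1": 44900, "q2": 46100, "q3": 57000, "q4": 59000},
    2018: {"q1": 54500, "q2": 51000},
})

col_max = df.max()        # maximum of each column (axis=0 is the default)
row_max = df.max(axis=1)  # maximum of each row; NaN is skipped by default

# With skipna=False a row containing NaN yields NaN instead of a number.
row_max_strict = df.max(axis=1, skipna=False)

print(col_max)
print(row_max)
print(row_max_strict)
```

Rows q3 and q4 have no 2018 value, so they are the ones affected by skipna=False.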

Python Tokens - Operators

Operators - Tokens that trigger computation when applied to variables and other objects in an expression. The variables and expressions to which the operators are applied are called operands. Example:- in 2+3, 2 and 3 are operands and + is the operator.

1. Unary Operators - Operate on only one operand. Example:-

    Operator   Name                 Example
    +          Unary plus           +10, 23E+2
    -          Unary minus          -4.3, -2.3E-1
    ~          Bitwise complement   ~3
    not        Logical negation     not 2

2. Binary Operators - Require two operands to operate upon.

    Operator   Name             Example
    +          Addition         2+3 = 5
    -          Subtraction      2-3 = -1
    *          Multiplication   2*3 = 6
    /          Division         2/3 = 0.6666666666666666
    %          Remainder        2%3 = 2
    **         Exponent         2**3 = 8
    //         Floor division   55//2 = 27
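The operator tables above can be checked directly in the interpreter. A small sketch; the variable names are chosen only for this example, and the negative floor division is an extra case not listed in the tables.

```python
a, b = 2, 3

# Unary operators take a single operand.
unary = (+a, -a, ~a, not a)
print(unary)                    # (2, -2, -3, False)

# Binary operators take two operands.
arithmetic = (a + b, a - b, a * b, a % b, a ** b)
print(arithmetic)               # (5, -1, 6, 2, 8)

true_division = a / b           # always returns a float
floor_division = 55 // 2        # rounds down to 27
negative_floor = -55 // 2       # rounds down (towards negative infinity) to -28
print(true_division, floor_division, negative_floor)
```

Note that // rounds towards negative infinity rather than towards zero, which only shows up with negative operands.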

Python Tokens

Tokens, or lexical units, are the smallest building blocks of a program. Tokens are to a program what the alphabet, grammar and tenses are to the English language. Python has the following tokens:- Keywords, Identifiers (Names), Literals, Operators and Punctuators.

a. Keywords - are the reserved words that have a special meaning and should be used only where they are meant to be. Example:- False, True, for, while, None, break, if, elif, else

b. Identifiers (Names) - The names given to different parts of the program like variables, functions, objects, classes etc. eg:- in a = 2 and add(), a and add are identifier names.

Rules for framing identifier names:

1. An identifier name should be a combination of letters (a-z, A-Z), digits (0-9) and underscores (_). eg:- Valid names - abc123, abc, xy_123, xy_ Invalid names - abc#12, age$
2. The first character must be a letter or an underscore. eg:- Valid names - _123, a_123 Invalid names - 1abc
3. Upper and lower case are different. eg:- ABC and abc are two different identifiers. Xyz and
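The identifier rules above can be tested with the built-in str.isidentifier() method and the standard keyword module. The helper name is_valid_identifier is hypothetical and used only for this sketch; note also that isidentifier() accepts non-ASCII letters, which goes beyond the a-z, A-Z rule stated above.

```python
import keyword


def is_valid_identifier(name: str) -> bool:
    """Return True if name follows the identifier rules and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Names taken from the examples above, plus the keyword "for".
for name in ["abc123", "xy_123", "_123", "abc#12", "age$", "1abc", "for"]:
    print(name, is_valid_identifier(name))
```

The first three names pass; abc#12 and age$ contain illegal characters, 1abc starts with a digit, and for is a reserved keyword.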

Python class 11 'IP' Basics

Python is a programming language used to create software and websites and for scientific computing. A programming language is a language that can be used to write programs that instruct a computer to perform a specific task. Other popular programming languages are C, C++, Java, VB, C# and JavaScript. The popularity of Python can be attributed to its huge library (a collection of modules and functions) that programmers can use to build programs, often with less time and effort than in other programming languages. Data science and machine learning are two buzzwords that have been around for quite some time, and Python is the tool of choice for many programmers implementing their techniques.

Advantages of the Python language are:-

Easy to use - Programmers find it very easy to use because the rules (syntax) for writing instructions are very similar to those of other high-level languages (C, C++, Java).

Expressive Language - Requires fewer lines of code to

Descriptive Statistics

A number of statistical operations can be performed on DataFrame and Series objects. These operations are useful in data science to evaluate your data from different perspectives. Some common operations are:-

a. abs() - Returns a Series/DataFrame with the absolute numeric value of each element. Only works on numeric elements.

Example:-

    import pandas as pd
    import numpy as np
    s = pd.Series([-1.2,-2.2,3.2])
    print(s.abs())

Output:-
    0    1.2
    1    2.2
    2    3.2

Explanation:- Converts negative values to positive values.

Example:-

    import pandas as pd
    import numpy as np
    s = pd.Series([-1.2,-2.2,1+1j])
    print(s.abs())

Output:-
    0    1.200000
    1    2.200000
    2    1.414214

Explanation:- Converts negative values to positive and a complex number a + bj to its absolute value $\sqrt{a^2 + b^2}$.

Example:-

    import pandas as pd
    import numpy as np
    #s = pd.Series([-1.2,-2.2,3.2])
    df = pd.DataFrame({'Age':[-23,-44,24],'Name':['tim','henry','jerry']})
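Because abs() only works on numeric elements, calling it on a whole DataFrame that contains a text column such as Name is likely to raise an error. One possible workaround, offered as a sketch and not necessarily what the original post goes on to show, is to apply abs() to the numeric columns only. The complex-number check at the end simply confirms the formula from the explanation above.

```python
import math

import pandas as pd

df = pd.DataFrame({"Age": [-23, -44, 24], "Name": ["tim", "henry", "jerry"]})

# Pick out the numeric columns and take absolute values of those only.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].abs()
print(df)

# For a complex number a + bj, abs() gives sqrt(a**2 + b**2).
magnitude = abs(1 + 1j)
print(magnitude, math.isclose(magnitude, math.sqrt(1**2 + 1**2)))
```

The Name column is left untouched, while Age becomes 23, 44, 24.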

Groupby - Transformations & Filtrations

b. Transformations - transform() applies a function to each group and returns a result with the same index as the original data, so each data element is changed into some other value.

Example:-

    import pandas as pd
    import numpy as np
    weather_data = {'Weather': ['Rainy', 'Stormy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Cloudy', 'Rainy', 'Stormy', 'Cloudy', 'Sunny', 'Sunny'],
                    'State': ['CG', 'AP', 'HP', 'MP', 'HY', 'DH', 'CG', 'HP', 'AP', 'MP', 'CG', 'AP'],
                    'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
                    'Humidity': [3.4,2.3,3.2,4.7,5.8,8.1,3.2,3.5,7.3,1.1,1.2,2.3]}
    df = pd.DataFrame(weather_data)
    gp = df.groupby('Year')
    print(gp['Humidity'].transform(lambda x: x*100))

Output:-
    0    340.0
    1    230.0
    2    320.0
    3    470.0
    4    580.0
    5    810.0
    6    320.0
    7    350.0
    8    730.0
    9 1
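Multiplying by 100 gives the same result with or without grouping, so the effect of groupby is easier to see with a function that depends on the group. A minimal sketch with a small made-up table, where each reading is compared with the mean of its own year:

```python
import pandas as pd

# Hypothetical data: two readings per year.
df = pd.DataFrame({
    "Year": [2014, 2014, 2015, 2015],
    "Humidity": [3.0, 5.0, 2.0, 8.0],
})

# transform("mean") returns one value per original row: the mean of that row's group.
df["YearMean"] = df.groupby("Year")["Humidity"].transform("mean")

# Deviation of each reading from its own year's mean.
df["Deviation"] = df["Humidity"] - df["YearMean"]
print(df)
```

Unlike an aggregation, the result has one row per original row, which is why it can be assigned straight back as a new column.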