DataFrame: Concept and Usage

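A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Size-mutable means rows and columns can be added or removed; heterogeneous means different columns can hold different types of data.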
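To make the terms in this definition concrete, here is a small sketch that reuses the Name/Age data from the example below. It shows columns of different types, reading and writing a value through its row and column labels, and adding and removing columns and rows. The AgeNextYear column is a hypothetical addition for illustration only.

```python
import pandas as pd

# Same data as the example in this post, with row labels 1, 2, 3
df = pd.DataFrame({'Name': ['sunny', 'rocky', 'tim'], 'Age': [32, 22, 41]},
                  index=[1, 2, 3])

print(df.shape)    # (3, 2): 3 rows and 2 columns
print(df.dtypes)   # heterogeneous: Name holds strings, Age holds integers

# Labeled axes: read one value by its row label and column label...
print(df.loc[2, 'Name'])   # rocky
# ...and write through the same labels
df.loc[2, 'Age'] = 23

# Size-mutable: add a column (hypothetical, derived from Age) and drop a row by its label
df['AgeNextYear'] = df['Age'] + 1
df = df.drop(index=3)
print(df)
```

By default `df.drop` returns a new DataFrame rather than changing the original, which is why the result is assigned back to `df`.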

a. Two-dimensional: data is stored in rows and columns. The example below has 3 rows and 2 columns.
Output:
        Name  Age
    1  sunny   32
    2  rocky   22
    3    tim   41

Python Code:
    import pandas as pd
    df = pd.DataFrame({'Name': ['sunny', 'rocky', 'tim'], 'Age': [32, 22, 41]}, index=[1, 2, 3])
    print(df)
Code Explanation:
First line: imports the pandas library under the short alias pd. pandas is a Python library with many built-in functions for data analysis. Series and DataFrame are its main data structures (older releases also had Panel, which was removed in pandas 0.25). For more info refer https://pandas.pydata.org/ .
Second line: creates the DataFrame. Each dictionary key ('Name', 'Age') becomes a column name, each list becomes that column's values, and index=[1,2,3] labels the rows.
Third line: prints the DataFrame to the console.

DataFrame is a class in pandas used to store two-dimensional data. The syntax for creating one is:
    class pandas.DataFrame(data, index, columns, dtype, copy)
The D and F in DataFrame must be capitals, because Python names are case-sensitive. There are 5 input arguments to the DataFrame class, and all of them are optional.
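Before going through the arguments one by one, a short sketch of how they work together: the dictionary example from the start of this post can be rebuilt from a list of rows, passing the arguments by keyword. With a list of rows there are no dictionary keys, so the column names have to come from columns.

```python
import pandas as pd

# From a dictionary: keys become column names, lists become column values
from_dict = pd.DataFrame(data={'Name': ['sunny', 'rocky', 'tim'], 'Age': [32, 22, 41]},
                         index=[1, 2, 3])

# From a list of rows: column names must be supplied through `columns`
from_rows = pd.DataFrame(data=[['sunny', 32], ['rocky', 22], ['tim', 41]],
                         index=[1, 2, 3],
                         columns=['Name', 'Age'])

# Compare the column labels and the stored values of the two DataFrames
print(list(from_dict.columns) == list(from_rows.columns))       # True
print(from_dict.values.tolist() == from_rows.values.tolist())   # True
```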
  1. Data:- The values to store. They can be given as a dictionary (as in the example above), a NumPy array, or a list of lists. See the examples below:
      Example: Array
      import pandas as pd
      import numpy as np
      # a 3 x 4 array filled with 1.0, rows labelled 1-3, columns labelled A-D
      df = pd.DataFrame(np.ones((3, 4)), index=[1, 2, 3], columns=['A', 'B', 'C', 'D'])
      print(df)
      Example: List
      import pandas as pd
      # each inner list becomes one row; rows and columns get the default labels 0, 1, 2, ...
      df = pd.DataFrame([[1, 1, 1], [2, 2, 2]])
      print(df)
   2. Index:- Labels for the rows, much like the roll numbers that identify each student in a class. They are used later to access (read/write) rows of the DataFrame. By default the index is 0, 1, 2, ..., but you can supply any labels instead, such as numbers starting from another value, or letters. pandas does not force index labels to be unique, though unique labels keep lookups unambiguous.

       Example:
      import pandas as pd
      # rows are labelled 'a' and 'b' instead of the default 0 and 1
      df = pd.DataFrame([[1, 1, 1], [2, 2, 2]], index=['a', 'b'])
      print(df)
   3. Columns:- Just as with the index, a label can be assigned to each column. A column label should usually be a name that describes the data the column holds. Example:
      import pandas as pd
      # two rows labelled 'a' and 'b', two columns labelled 'Name' and 'Age'
      df = pd.DataFrame([['tim', 23], ['Ruth', 24]], index=['a', 'b'], columns=['Name', 'Age'])
      print(df)
   4. dtype:- Short for data type. It optionally forces the data type of the stored values. Only a single dtype can be given, and it applies to every column; if none is given, pandas infers each column's type from the values it holds. Example:

      import pandas as pd
      # both columns are stored as floats, so 44 becomes 44.0
      df = pd.DataFrame([[44, 23], [55, 24]], index=['a', 'b'], columns=['Marks', 'Age'], dtype=float)
      print(df)
Try changing it to dtype=int and compare the output.
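A minimal comparison sketch for the dtype argument: the same two rows (with the illustrative column names Marks and Age) are built once without dtype, so pandas infers the type from the values, and once with dtype=float, which forces a single type onto every column.

```python
import pandas as pd

rows = [[44, 23], [55, 24]]

# No dtype: pandas infers an integer type for both columns from the values
inferred = pd.DataFrame(rows, index=['a', 'b'], columns=['Marks', 'Age'])

# dtype=float: one type is forced onto every column, so 44 is stored as 44.0
forced = pd.DataFrame(rows, index=['a', 'b'], columns=['Marks', 'Age'], dtype=float)

print(inferred.dtypes)
print(forced.dtypes)
print(forced)
```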