DataFrame Concept, Usage
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
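Each property named in this definition can be checked on a small table. The sketch below is only illustrative: the values, and the extra 'City' column, are made up for the demonstration, and it relies on the pandas library that the example further down introduces.

```python
import pandas as pd

# Illustrative data; the names, ages and cities are assumptions for this sketch.
df = pd.DataFrame({'Name': ['sunny', 'rocky', 'tim'], 'Age': [32, 22, 41]},
                  index=[1, 2, 3])

# Two-dimensional: shape is (number of rows, number of columns).
print(df.shape)  # prints (3, 2)

# Labeled axes: rows and columns each carry their own labels.
print(list(df.index))
print(list(df.columns))

# Potentially heterogeneous: each column can hold a different data type.
print(df.dtypes)

# Size-mutable: a column can be added after the DataFrame is created.
df['City'] = ['pune', 'delhi', 'goa']
print(df.shape)  # prints (3, 3)
```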
Example :
a. Two Dimensional - data is stored in rows and columns. The table below, for example, has 3 rows and 2 columns.
Output:
| | Name | Age |
|---|---|---|
| 1 | sunny | 32 |
| 2 | rocky | 22 |
| 3 | tim | 41 |
Python Code:
- import pandas as pd
- df = pd.DataFrame({'Name':['sunny','rocky','tim'],'Age':[32,22,41]},index=[1,2,3])
- print(df)
First Line : pandas is a Python library with many built-in functions for data analysis. DataFrame and Series are two of the data structures available in pandas (Panel, a third, has been removed from recent pandas versions). For more info, refer to https://pandas.pydata.org/ .
Second Line: DataFrame is a class in pandas used to store 2D (two-dimensional) data. Here it is built from a dictionary whose keys become the column names, while index supplies the row labels.
Third Line: Prints the DataFrame to the console.
The syntax of the DataFrame class looks like this:
- class pandas.DataFrame(data,index,columns,dtype,copy)
The D and F in DataFrame must be capitalized, because Python names are case-sensitive. There are 5 input arguments to the DataFrame class.
1. data:- The first argument is the data, which can be given as a dictionary (as in the example above), as an array, or as a list. See the examples below:
From an array:
- import pandas as pd
- import numpy as np
- df = pd.DataFrame(np.ones((3,4)),index=[1,2,3],columns=['A','B','C','D'])
- print(df)
From a list:
- import pandas as pd
- df = pd.DataFrame([[1,1,1],[2,2,2]])
- print(df)
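As a rough check that the three input forms are interchangeable, the sketch below builds the same small table from a dictionary, an array and a list, then compares the stored values. The column labels 'A', 'B', 'C' and the variable names are assumptions chosen for this illustration.

```python
import numpy as np
import pandas as pd

cols = ['A', 'B', 'C']  # hypothetical column labels for this sketch

# 1) Dictionary: keys become column labels, values become the column data.
from_dict = pd.DataFrame({'A': [1, 2], 'B': [1, 2], 'C': [1, 2]})

# 2) NumPy array: each row of the array becomes a row of the DataFrame.
from_array = pd.DataFrame(np.array([[1, 1, 1], [2, 2, 2]]), columns=cols)

# 3) List of lists: each inner list becomes a row.
from_list = pd.DataFrame([[1, 1, 1], [2, 2, 2]], columns=cols)

# Compare the stored values row by row.
same_dict = from_dict.values.tolist() == from_list.values.tolist()
same_array = from_array.values.tolist() == from_list.values.tolist()
print(same_dict, same_array)
```

Note that a dictionary describes the table column by column, while an array or a list of lists describes it row by row.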
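2. index:- A label can be assigned to each row through index. If no index is given, the rows are labeled 0, 1, 2, ... by default. Example :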
- import pandas as pd
- import numpy as np
- df = pd.DataFrame([[1,1,1],[2,2,2]],index=['a','b'])
- print(df)
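The practical use of row labels is that a row can be looked up by its label. A minimal sketch, reusing the data from the example above (the variable names default_df and row_b are only illustrative):

```python
import pandas as pd

# No index given: pandas labels the rows 0, 1, 2, ... by default.
default_df = pd.DataFrame([[1, 1, 1], [2, 2, 2]])
print(list(default_df.index))

# Custom index: the rows are labeled 'a' and 'b'.
df = pd.DataFrame([[1, 1, 1], [2, 2, 2]], index=['a', 'b'])

# .loc selects a row by its label and returns it as a Series.
row_b = df.loc['b']
print(row_b.tolist())
```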
3. columns:- As with index, a label can be assigned to each column. A column label should usually be a name that describes the data the column holds. Example
- import pandas as pd
- import numpy as np
- df = pd.DataFrame([['tim',23],['Ruth',24]],index=['a','b'],columns=['Name','Age'])
- print(df)
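Once columns have names, a column (or a single cell) can be picked out by name. The sketch below assumes the same data as the example above; the variable names ages and unnamed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([['tim', 23], ['Ruth', 24]],
                  index=['a', 'b'], columns=['Name', 'Age'])

# Selecting by column name returns that column as a Series.
ages = df['Age']
print(ages.tolist())

# A row label and a column name together address a single cell.
print(df.loc['b', 'Name'])

# Without columns=..., pandas numbers the columns 0, 1, ... by default.
unnamed = pd.DataFrame([['tim', 23], ['Ruth', 24]])
print(list(unnamed.columns))
```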
4. dtype:- Short for data type; it lets you specify the data type of the data being stored. Only a single data type can be given. If none is given, pandas infers the data type of each column from the values stored in it. Example
- df = pd.DataFrame([[44,23],[55,24]],index=['a','b'],columns=['Name','Age'],dtype=float)
- print(df)
Try to change to dtype=int and see the difference.
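To see the difference side by side, the sketch below builds the same data three times: with dtype=float, with dtype=int, and with no dtype so that pandas infers it. The helper dictionary labels and the variable names are assumptions made for this illustration.

```python
import pandas as pd

data = [[44, 23], [55, 24]]
# Shared keyword arguments, kept in one place to avoid repetition.
labels = dict(index=['a', 'b'], columns=['Name', 'Age'])

as_float = pd.DataFrame(data, dtype=float, **labels)  # every column forced to float
as_int = pd.DataFrame(data, dtype=int, **labels)      # every column forced to int
inferred = pd.DataFrame(data, **labels)               # pandas picks the type itself

# .dtypes lists the data type of each column.
print(as_float.dtypes)
print(as_int.dtypes)
print(inferred.dtypes)

# With dtype=float the values are stored, and printed, with a decimal point.
print(as_float)
```

Because all the values here are whole numbers, the inferred type is an integer type, so only the dtype=float version prints differently.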