Basic Data Inspection

Display Top Rows (df.head())

The df.head() method displays the first five rows of the DataFrame by default, giving a quick look at the column names and a sample of the values.

    A         B    C          D         E
0  81  0.692744  Yes 2023-01-01 -1.082325
1  54  0.316586  Yes 2023-01-02  0.031455
2  57  0.860911  Yes 2023-01-03 -2.599667
3   6  0.182256   No 2023-01-04 -0.603517
4  82  0.210502   No 2023-01-05 -0.484947
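As a minimal sketch of how the row count can be changed, the example below rebuilds part of the table above by hand. The construction of df is an assumption (the section does not show how the original DataFrame was created), and only columns A, C, and D are reproduced.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame used in this section,
# typed in from the values shown in the output above.
df = pd.DataFrame({
    "A": [81, 54, 57, 6, 82, 73, 13, 23, 87, 39],
    "C": ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No"],
    "D": pd.date_range("2023-01-01", periods=10, freq="D"),
})

first_five = df.head()           # default: n=5
first_three = df.head(3)         # explicit row count
all_but_last_two = df.head(-2)   # negative n: every row except the last 2

print(first_three)
```

Passing a negative n is handy when the goal is to drop a known number of trailing rows rather than to pick a fixed number of leading ones.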

Display Bottom Rows (df.tail())

The df.tail() method shows the last five rows of the DataFrame by default, which is useful for checking how the dataset ends.

    A         B    C          D         E
5  73  0.463415   No 2023-01-06 -0.442890
6  13  0.513276   No 2023-01-07 -0.289926
7  23  0.528147  Yes 2023-01-08  1.521620
8  87  0.138674  Yes 2023-01-09 -0.026802
9  39  0.005347   No 2023-01-10 -0.159331
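The following sketch shows that df.tail() also accepts a row count and that the returned rows keep their original index labels. The df here is a hypothetical one-column stand-in built from the A values shown in this section.

```python
import pandas as pd

# Hypothetical stand-in holding only column A from the output above.
df = pd.DataFrame({"A": [81, 54, 57, 6, 82, 73, 13, 23, 87, 39]})

last_five = df.tail()    # default: n=5
last_two = df.tail(2)    # explicit row count

# The original index labels are preserved, not renumbered from 0.
print(last_five.index.tolist())
print(last_two)
```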

Display Data Types (df.dtypes)

The df.dtypes attribute (accessed without parentheses, since it is not a method) returns the data type of each column in the DataFrame. This shows what kind of data (integers, floats, strings, datetimes, etc.) each column holds.

A             int64
B           float64
C            object
D    datetime64[ns]
E           float64
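A short sketch of working with the result of df.dtypes follows. The three-row df is a hypothetical stand-in, and the use of select_dtypes to pick out numeric columns is an addition for illustration rather than something this section describes.

```python
import pandas as pd

# Hypothetical three-row stand-in with the same kinds of columns as above.
df = pd.DataFrame({
    "A": [81, 54, 57],
    "B": [0.692744, 0.316586, 0.860911],
    "C": ["Yes", "Yes", "Yes"],
    "D": pd.date_range("2023-01-01", periods=3, freq="D"),
})

column_types = df.dtypes       # attribute access: no parentheses
a_type = df["A"].dtype         # dtype of a single column

# Names of the columns whose dtype is numeric (integers and floats).
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(column_types)
print(numeric_cols)
```

The exact dtype reported for a text column such as C can differ between pandas versions, so the sketch does not rely on it.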

Summary Statistics (df.describe())

The df.describe() method computes descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset's distribution, excluding NaN values. In the output below, only the numeric columns (A, B, and E) are summarized, giving a quick statistical overview.

               A          B          E
count  10.000000  10.000000  10.000000
mean   51.500000   0.391186  -0.413633
std    29.963867   0.267698   1.024197
min     6.000000   0.005347  -2.599667
25%    27.000000   0.189317  -0.573874
50%    55.500000   0.390001  -0.366408
75%    79.000000   0.524429  -0.059934
max    87.000000   0.860911   1.521620
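The sketch below reads individual statistics out of the describe() result and summarizes a text column as well. The df is a hypothetical stand-in built from the A and C values shown earlier in this section. Columns are selected explicitly so the result does not depend on which column types a given pandas version includes by default.

```python
import pandas as pd

# Hypothetical stand-in built from the A and C values shown earlier.
df = pd.DataFrame({
    "A": [81, 54, 57, 6, 82, 73, 13, 23, 87, 39],
    "C": ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes", "Yes", "No"],
})

# Numeric summary: the result is itself a DataFrame indexed by statistic name.
numeric_summary = df[["A"]].describe()
mean_a = numeric_summary.loc["mean", "A"]
median_a = numeric_summary.loc["50%", "A"]

# For a text column, describe() reports count, unique, top, and freq instead.
text_summary = df["C"].describe()

print(numeric_summary)
print(text_summary)
```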

Display Index, Columns, and Data (df.info())

The df.info() method prints a concise summary of the DataFrame: the index range, each column's name, non-null count, and dtype, and the total memory usage. It is a useful first check for missing values and unexpected data types.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       10 non-null     int64
 1   B       10 non-null     float64
 2   C       10 non-null     object
 3   D       10 non-null     datetime64[ns]
 4   E       10 non-null     float64
dtypes: datetime64[ns](1), float64(2), int64(1), object(1)
memory usage: 528.0 bytes
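Because df.info() prints its report rather than returning it, one way to keep the text is to pass a buffer. The sketch below assumes a hypothetical three-row df with one missing value in B, so the non-null count differs from the row count. The memory_usage="deep" option asks pandas to measure the actual size of text data instead of estimating it.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in; B contains one missing value.
df = pd.DataFrame({
    "A": [81, 54, 57],
    "B": [0.692744, None, 0.860911],
    "C": ["Yes", "Yes", "Yes"],
})

# Capture the report in a string instead of printing it to stdout.
buffer = io.StringIO()
df.info(buf=buffer, memory_usage="deep")
report = buffer.getvalue()

# Per-column count of missing values, complementing the non-null counts.
null_counts = df.isna().sum()

print(report)
print(null_counts)
```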