Data Analysis using SQL

2 min readFeb 1, 2023

One of the tasks of a Data Analyst is to analyze the data, and there are different tools to do that. But this time I want to try new things which is creating tables from our data using SQL. And to do that we need to follow these steps:

Step1: Import the libraries

import numpy as np
import pandas as pd 
import sqlite3
import matplotlib.pyplot as plt
#You need to add the path of your data in this section
path = "../your path"  
database = path + 'database.sqlite'

Step2: Create the connection to the DataBase and create variables for each sheet

data = sqlite3.connect(database)
tables = pd.read_sql("""SELECT *
                        FROM sqlite_master
                        WHERE type='table';""", data)

And we do the same thing for the other sheet:

countries = pd.read_sql("""SELECT *
                        FROM Country;""", data)
leagues = pd.read_sql("""SELECT *
                        FROM League
                        JOIN Country ON Country.id = League.country_id;""", data)

Depending on your logic and your analysis you can modify the code. If you want the same data or you want to have only the 10 first elements or choose some columns you are interested in and to do this you can use “join” to select the elements you are interested in. Here is an example of a Join

detailed_matches = pd.read_sql("""SELECT Match.id, 
                                        Country.name AS country_name, 
                                        League.name AS league_name, 
                                        season, 
                                        stage, 
                                        date,
                                        HT.team_long_name AS  home_team,
                                        AT.team_long_name AS away_team,
                                        home_team_goal, 
                                        away_team_goal                                        
                                FROM Match
                                JOIN Country on Country.id = Match.country_id
                                JOIN League on League.id = Match.league_id
                                LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id
                                LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id
                                WHERE country_name = 'Spain'
                                ORDER by date
                                LIMIT 10;""", data)

Step3: Analysis of the Data

Finally, we can do our analysis of our data using the library Matplotlib to draw graphs to make our analysis.

df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()), columns=leages_by_season['country_name'].unique())

df.loc[:,'Germany'] = list(leages_by_season.loc[leages_by_season['country_name']=='Germany','avg_goals'])
df.loc[:,'Spain']   = list(leages_by_season.loc[leages_by_season['country_name']=='Spain','avg_goals'])
df.loc[:,'France']   = list(leages_by_season.loc[leages_by_season['country_name']=='France','avg_goals'])
df.loc[:,'Italy']   = list(leages_by_season.loc[leages_by_season['country_name']=='Italy','avg_goals'])
df.loc[:,'England']   = list(leages_by_season.loc[leages_by_season['country_name']=='England','avg_goals'])

df.plot(figsize=(12,5),title='Over Time Goals Per Game Average')

df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()), columns=leages_by_season['country_name'].unique())

df.loc[:,'Germany'] = list(leages_by_season.loc[leages_by_season['country_name']=='Germany','avg_goal_dif'])
df.loc[:,'Spain']   = list(leages_by_season.loc[leages_by_season['country_name']=='Spain','avg_goal_dif'])
df.loc[:,'France']   = list(leages_by_season.loc[leages_by_season['country_name']=='France','avg_goal_dif'])
df.loc[:,'Italy']   = list(leages_by_season.loc[leages_by_season['country_name']=='Italy','avg_goal_dif'])
df.loc[:,'England']   = list(leages_by_season.loc[leages_by_season['country_name']=='England','avg_goal_dif'])

df.plot(figsize=(12,5),title='Average Goals Difference In vs Out')

Conclusion

To sum up, the purpose of this article is to show that we have different ways to use our data either with SQL Queries or by reading a CSV file depending on your need and requirements. However, as a data Analyst, you should know different ways and tools to manipulate your data, so you can make good conclusions and decisions.

Data Analysis using SQL

Conclusion

Written by mahmoud chami