Top Scorers in Soccer Leagues

mahmoud chami
8 min read · Mar 1, 2023

Introduction

This dataset covers the top scorers of the top football leagues, with each player's Country, Club, matches played, substitutions, minutes, Goals, xG,…

Note: xG and xG Per Avg Match are statistical values provided by the website I scraped the data from (Infogol).

On the website (Infogol.net) there are league tables and statistics from some of the top competitions around the world, including the English Premier League, English Championship, Spanish La Liga, Italian Serie A, German Bundesliga, French Ligue 1, US MLS and Brazilian Série A. Choose the competition you are interested in to get the current league table.

For those who are not familiar with football, each country has a domestic league in which teams from that country compete against one another to earn points.

Analysis ideas 💡

In this article, we will find out who the best player is across all the leagues in our dataset.

The top players selected for our analysis are those who are featured for all 5 years [2016–2020]. The winner will be decided based on:

  1. Goals 🥅⚽
  2. Minutes Per On-target Shot ⏳ ⏱
  3. Minutes Per Goal ⏳
  4. Goals - xG
  5. Target Accuracy 🎯
  6. Shot Efficiency

Explore Data

1) Import the libraries

The libraries we are going to use in the project are:

Pandas, seaborn, matplotlib and, for the interactive charts, plotly

2) Import Data

import pandas as pd

data = pd.read_csv("your_path/top-football-leagues-scorers/Data.csv")
data.head()

Here we have a view of our dataset. We usually use data.head() to show the first 5 rows of our data.

3) Explore and analyze the data

To learn more about our data (the number of rows and columns, the null values, and the type of each column), we can use these lines:

data.shape  # Number of rows and columns
data.info()  # Type and non-null count of each column
data.isnull().sum()  # Number of null values per column
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Country                  660 non-null    object
 1   League                   660 non-null    object
 2   Club                     660 non-null    object
 3   Player Names             660 non-null    object
 4   Matches_Played           660 non-null    int64
 5   Substitution             660 non-null    int64
 6   Mins                     660 non-null    int64
 7   Goals                    660 non-null    int64
 8   xG                       660 non-null    float64
 9   xG Per Avg Match         660 non-null    float64
 10  Shots                    660 non-null    int64
 11  OnTarget                 660 non-null    int64
 12  Shots Per Avg Match      660 non-null    float64
 13  On Target Per Avg Match  660 non-null    float64
 14  Year                     660 non-null    int64

Before we begin our analysis, we need to understand the meaning of each column:

Country: country of the league
League: league name
Club: club name; "(None)" means the value is not defined, so you can treat it as a NaN value
Player Names: the player's name
Matches_Played: number of matches played
Substitution: number of substitutions
Mins: number of minutes the player played
Goals: number of goals scored
xG: expected goals, a statistical estimate
xG Per Avg Match: expected goals per average match
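The analysis that follows relies on `df_top`, `df_topplayers`, `players_5years` and several derived columns whose construction is not shown. Here is a minimal sketch of how they might be built. The formulas (for example, on-target shots divided by shots for target accuracy, and goals divided by on-target shots for shot conversion) are assumptions, not necessarily the exact definitions behind the charts, and the sample rows are made up.

```python
import pandas as pd

# Made-up rows using the dataset's column names (values are illustrative only).
data = pd.DataFrame({
    "Player Names": ["Player A", "Player A", "Player B", "Player B", "Player C"],
    "Year": [2016, 2017, 2016, 2017, 2016],
    "Mins": [900, 1800, 1000, 2000, 500],
    "Goals": [10, 20, 5, 15, 3],
    "xG": [8.0, 18.5, 6.0, 12.0, 2.5],
    "Shots": [40, 80, 30, 60, 20],
    "OnTarget": [20, 40, 10, 30, 8],
})

# Keep only players who appear in every season of the data
# (5 seasons in the real dataset, 2 in this toy sample).
n_seasons = data["Year"].nunique()
seasons_per_player = data.groupby("Player Names")["Year"].nunique()
players_5years = seasons_per_player[seasons_per_player == n_seasons].index.tolist()

# Per-season rows with the derived columns (assumed definitions).
df_topplayers = data[data["Player Names"].isin(players_5years)].copy()
df_topplayers["Goals_xG_Diff"] = df_topplayers["Goals"] - df_topplayers["xG"]
df_topplayers["Target_Accuracy_Per_Game"] = (
    df_topplayers["OnTarget"] / df_topplayers["Shots"] * 100
)
df_topplayers["Shots_to_Goal_conversion"] = (
    df_topplayers["Goals"] / df_topplayers["OnTarget"] * 100
)

# One row per player, totals over all seasons, then the per-minute ratios.
df_top = df_topplayers.groupby("Player Names", as_index=False)[
    ["Mins", "Goals", "Shots", "OnTarget"]
].sum()
df_top["Mins_per_Target"] = df_top["Mins"] / df_top["OnTarget"]
df_top["Mins_per_goal"] = df_top["Mins"] / df_top["Goals"]

print(players_5years)
print(df_top)
```

In this toy sample "Player C" is dropped because they appear in only one of the two seasons, which mirrors the "featured in all 5 years" rule.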

Let’s analyze our data based on the criteria we stated earlier:

1) Goals

Goals are the basic parameter you can use to determine the contribution of a player and to know how many goals he scored during the period he played.

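import matplotlib.pyplot as plt
import seaborn as sns

# Horizontal bar chart of total goals, highest scorer first
plt.figure(figsize=(12, 8))
df_top_by_goals = df_top.sort_values(by=['Goals'], ascending=False)
ax = sns.barplot(x='Goals', y='Player Names', data=df_top_by_goals)
ax.set_xlabel('Goals')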

In this section, our winner is “Lionel Messi” with 135 goals.

2) Robert Lewandowski (2nd)
3) Cristiano Ronaldo (3rd)
4) Ciro Immobile (4th)
5) Luis Suarez (5th)

2) Minutes Per On-Target Shot

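import plotly.graph_objects as go

# Descending sort: plotly draws the first row at the bottom,
# so the player with the fewest minutes per on-target shot ends up on top
df_top_by_mins_shot = df_top.sort_values(by=['Mins_per_Target'], ascending=False)

fig = go.Figure(go.Bar(
    x=df_top_by_mins_shot['Mins_per_Target'],
    y=df_top_by_mins_shot['Player Names'],
    orientation='h'))

fig.show()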

In this section, our winner is again “Lionel Messi”, and then we have:

2) Cristiano Ronaldo
3) Robert Lewandowski
4) Ciro Immobile
5) Luis Suarez

3) Minutes Per Goal

This metric is extremely important in determining how frequently a player finds the back of the net. Any figure below 90 minutes means the player is expected to score at least once per game. However, with an average of about 4 minutes of added time per match, the overall length of a game is roughly 94 minutes.
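To make the 90- versus 94-minute point concrete, minutes per goal can be turned into goals per match by dividing the match length by it. The helper below is a hypothetical illustration; its name and default value are not part of the original analysis.

```python
def goals_per_match(mins_per_goal: float, match_length: float = 90.0) -> float:
    """Convert minutes per goal into goals per match of the given length."""
    return match_length / mins_per_goal


# A player scoring every 90 minutes averages one goal per 90-minute match.
print(goals_per_match(90))
# A player scoring every 120 minutes averages 0.75 goals per 90-minute match.
print(goals_per_match(120))
# Counting roughly 4 minutes of added time, the threshold moves to 94 minutes.
print(goals_per_match(94, match_length=94))
```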

df_mins_per_goal = df_top.sort_values(by=['Mins_per_goal'], ascending=False)

fig = go.Figure(go.Bar(
    x=df_mins_per_goal['Mins_per_goal'],
    y=df_mins_per_goal['Player Names'],
    # One color value per bar, mapped onto the viridis scale
    marker=dict(color=[10 * i for i in range(1, len(df_mins_per_goal) + 1)],
                colorscale='viridis'),
    orientation='h'))

fig.show()

In this section, our winner is “Robert Lewandowski”, and then we have:

2) Lionel Messi
3) Cristiano Ronaldo
4) Luis Suarez
5) Ciro Immobile

4) Goals-xG Score

The graph below depicts the difference between Goals and Expected Goals (xG) over the course of five years. The winner of this category is the one who has outperformed expectations the most over the years.

fig = go.Figure()
Goals_diff_sum = {}
Goals_diff_mean = {}
for i in players_5years:
    # Rows for this player, one per season
    df_player = df_topplayers[df_topplayers['Player Names'].isin([i])]
    Goals_diff_sum[i] = df_player['Goals_xG_Diff'].sum()
    Goals_diff_mean[i] = df_player['Goals_xG_Diff'].mean()
    fig.add_trace(go.Scatter(x=df_player['Year'], y=df_player["Goals_xG_Diff"],
                             mode='lines+markers',
                             name=i))
fig.update_layout(
    autosize=False,
    width=900,
    height=600,)

fig.show()

Observations based on the graph:
1) Immobile and Aspas, who have never been star strikers at any of the big teams, have outperformed expectations across these seasons.
2) Robert Lewandowski, the top player in 2020, has not been the most consistent at exceeding expectations throughout this period.

df1 = pd.Series(Goals_diff_sum).sort_values(ascending = False).reset_index()
df2 = pd.Series(Goals_diff_mean).sort_values(ascending = False).reset_index()

df_final = pd.merge(df1, df2, on="index")
df_final.columns = ["Player", "Sum","Mean"]

df_final
   Player                Sum   Mean
0  Lionel Messi        23.23  4.646
1  Ciro Immobile       22.04  4.408
2  Timo Werner         16.02  3.204
3  Iago Aspas          13.12  2.624
4  Andrea Belotti       6.77  1.354
5  Andrej Kramaric      6.57  1.314
6  Luis Suarez          3.64  0.728
7  Cristiano Ronaldo    3.04  0.608
8  Robert Lewandowski   1.89  0.378
9  Fabio Quagliarella   0.26  0.052

To sum up, in this section the winner is “Lionel Messi”, and then we have:

2) Ciro Immobile
3) Timo Werner
4) Iago Aspas
5) Andrea Belotti
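The sum and mean of the Goals - xG difference per player can also be computed in a single pandas groupby instead of a loop followed by a merge. The sketch below shows this alternative on made-up numbers for two hypothetical players; it is not the code used to produce the table above.

```python
import pandas as pd

# Made-up season rows for two hypothetical players (not values from the dataset).
df_topplayers = pd.DataFrame({
    "Player Names": ["Player A", "Player A", "Player B", "Player B"],
    "Year": [2016, 2017, 2016, 2017],
    "Goals": [30, 25, 20, 22],
    "xG": [25.0, 26.0, 21.5, 18.5],
})
df_topplayers["Goals_xG_Diff"] = df_topplayers["Goals"] - df_topplayers["xG"]

# Named aggregation: one output column per statistic, one row per player.
summary = (
    df_topplayers.groupby("Player Names")["Goals_xG_Diff"]
    .agg(Sum="sum", Mean="mean")
    .sort_values("Sum", ascending=False)
    .reset_index()
    .rename(columns={"Player Names": "Player"})
)
print(summary)
```

This yields the same three columns (Player, Sum, Mean) without relying on the automatically generated "index" column name.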

5) Target Accuracy

The average number of shots on target per match is a critical metric. Given the goal scorers’ finishing abilities, a higher target accuracy should translate into a higher likelihood of scoring. The final score, computed as the mean of the average percentage change in target accuracy and the mean target accuracy over the 2016–2020 period, will be used to determine the winner of this category.

Average_TA_deviation = {}
Mean_TA = {}
fig = go.Figure()

for i in players_5years:
    df_player = df_topplayers[df_topplayers['Player Names'].isin([i])]
    # pct_change compares each season with the previous row,
    # so this assumes the rows are in chronological order
    Average_TA_deviation[i] = df_player['Target_Accuracy_Per_Game'].pct_change().mean()*100
    Mean_TA[i] = df_player['Target_Accuracy_Per_Game'].mean()
    fig.add_trace(go.Scatter(x=df_player['Year'], y=df_player["Target_Accuracy_Per_Game"],
                             mode='lines+markers',
                             name=i))
fig.update_layout(
    autosize=False,
    width=900,
    height=600,)

fig.show()

What we can notice from the graph:

1. Cristiano Ronaldo and Iago Aspas may not have the best proportion of shots on target every year, but they have been the most consistent over the period.

2. Players later in their careers, such as Ciro Immobile and Fabio Quagliarella, struggle to hit the target consistently, though this may be down to the teams they play for, which may create fewer chances for them because of a lack of creativity in midfield.

3. Andrej Kramaric wins this category with the highest mean target accuracy and a reasonable positive percentage change over the period.

df1 = pd.Series(Average_TA_deviation).sort_values(ascending = False).reset_index()
df2 = pd.Series(Mean_TA).sort_values(ascending = False).reset_index()


df_final = pd.merge(df1, df2, on="index")
df_final.columns = ["Player","Average Percentage Change", "Mean Target Accuracy"]
df_final['Score'] = (df_final["Average Percentage Change"] + df_final["Mean Target Accuracy"])/2
df_final = df_final.sort_values(by = "Score", ascending = False)
df_final
   Player              Average Percentage Change  Mean Target Accuracy      Score
3  Andrej Kramaric                      3.255496             51.243822  27.249659
1  Iago Aspas                           5.304528             48.613196  26.958862
2  Lionel Messi                         4.253122             46.665014  25.459068
0  Cristiano Ronaldo                    8.985494             41.205547  25.095520
5  Robert Lewandowski                  -2.288147             47.051357  22.381605
6  Luis Suarez                         -4.807514             46.894190  21.043338
7  Timo Werner                         -6.366987             48.034306  20.833659
4  Andrea Belotti                      -2.173794             43.112528  20.469367
9  Ciro Immobile                      -13.020654             49.264986  18.122166
8  Fabio Quagliarella                  -9.209771             43.411389  17.100809

In this section, the winner is “Andrej Kramaric”, and then we have:

2) Iago Aspas
3) Lionel Messi
4) Cristiano Ronaldo
5) Robert Lewandowski
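To see how the Score column is built, here is a small worked example on a made-up accuracy series for one hypothetical player. With five seasons there are four year-over-year changes; the score is the plain average of their mean (in percent) and the mean accuracy. Note that this mixes a rate of change with a level, so it is best read as a simple heuristic ranking rather than a calibrated measure.

```python
import pandas as pd

# Made-up target accuracy (%) for one hypothetical player, 2016 to 2020.
accuracy = pd.Series([40.0, 50.0, 45.0, 54.0, 54.0],
                     index=[2016, 2017, 2018, 2019, 2020])

# Year-over-year changes: +25%, -10%, +20%, 0%.
# The first season has no previous value (NaN) and is skipped by mean().
avg_pct_change = accuracy.pct_change().mean() * 100
mean_accuracy = accuracy.mean()

# Same formula as the Score column: average of the two figures.
score = (avg_pct_change + mean_accuracy) / 2
print(round(avg_pct_change, 3), round(mean_accuracy, 3), round(score, 3))
```

For this series the average change works out to about 8.75, the mean accuracy to 48.6, and the score to about 28.675.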

6) Shots Efficiency

This category measures how well each player converts shots into goals. The winner of this category will be determined by the final score, which is computed as the mean of the average percentage change in shot efficiency and the mean shot efficiency over the 2016–2020 period.

Shots_efficiency_Avg_PC = {}
Mean_SE = {}

fig = go.Figure()

for i in players_5years:
    df_player = df_topplayers[df_topplayers['Player Names'].isin([i])]
    # Average year-over-year change (in %) and mean conversion rate
    Shots_efficiency_Avg_PC[i] = df_player['Shots_to_Goal_conversion'].pct_change().mean()*100
    Mean_SE[i] = df_player['Shots_to_Goal_conversion'].mean()
    fig.add_trace(go.Scatter(x=df_player['Year'], y=df_player["Shots_to_Goal_conversion"],
                             mode='lines+markers',
                             name=i))

fig.update_layout(
    autosize=False,
    width=900,
    height=600,)

fig.show()

Observations based on the graph:

  1. Fabio Quagliarella’s shooting efficiency has gradually grown from 17.14 percent in 2016 to 55.55 percent in 2020, and that at 37, an age usually considered the tail end of a football career.
  2. Lionel Messi, widely recognized as the game’s GOAT (Greatest of All Time), has declined in shot-to-goal conversion, dropping to the bottom of the list.
  3. All of the so-called “underrated players,” such as Quagliarella, Immobile, Kramaric, and Belotti, have a higher shot-to-goal conversion rate than two absolute football titans, Ronaldo and Messi.

df1 = pd.Series(Shots_efficiency_Avg_PC).sort_values(ascending = False).reset_index()
df2 = pd.Series(Mean_SE).sort_values(ascending = False).reset_index()


df_final = pd.merge(df1, df2, on="index")
df_final.columns = ["Player","Average Percentage Change", "Mean Shot Efficiency"]
df_final['Score'] = (df_final["Average Percentage Change"] + df_final["Mean Shot Efficiency"])/2
df_final = df_final.sort_values(by = "Score", ascending = False)
df_final
   Player              Average Percentage Change  Mean Shot Efficiency      Score
1  Ciro Immobile                       47.145872             44.196486  45.671179
0  Fabio Quagliarella                  49.261196             38.209073  43.735135
2  Robert Lewandowski                  22.753646             50.611837  36.682741
3  Andrej Kramaric                     21.200261             40.305215  30.752738
4  Andrea Belotti                      20.462291             40.724653  30.593472
6  Luis Suarez                         10.366390             46.275897  28.321144
5  Cristiano Ronaldo                   14.066269             41.507703  27.786986
8  Iago Aspas                          -1.175195             48.515649  23.670227
7  Timo Werner                          0.486757             39.994017  20.240387
9  Lionel Messi                       -15.853799             36.523540   10.33487

For this last section, the winner is “Ciro Immobile”, and then we have:

2) Fabio Quagliarella
3) Robert Lewandowski
4) Andrej Kramaric
5) Andrea Belotti

Conclusion

At the end of this amazing article and project, we can sum the points from each section to get this final ranking:

  1. Lionel Messi (22/30) (The one and only)
  2. Robert Lewandowski (16/30)
  3. Ciro Immobile (14/30)
  4. Cristiano Ronaldo (12/30)
  5. Andrej Kramaric (7/30)
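These totals are consistent with awarding 5 points for first place in a category down to 1 point for fifth, which gives a maximum of 30 over the six categories. That scoring rule is an assumption inferred from the totals, not something spelled out earlier; the sketch below tallies the top-five lists from each section under it.

```python
from collections import Counter

# Top five per category, copied from the sections of this article.
rankings = {
    "Goals": ["Lionel Messi", "Robert Lewandowski", "Cristiano Ronaldo",
              "Ciro Immobile", "Luis Suarez"],
    "Minutes per on-target shot": ["Lionel Messi", "Cristiano Ronaldo",
                                   "Robert Lewandowski", "Ciro Immobile", "Luis Suarez"],
    "Minutes per goal": ["Robert Lewandowski", "Lionel Messi", "Cristiano Ronaldo",
                         "Luis Suarez", "Ciro Immobile"],
    "Goals - xG": ["Lionel Messi", "Ciro Immobile", "Timo Werner",
                   "Iago Aspas", "Andrea Belotti"],
    "Target accuracy": ["Andrej Kramaric", "Iago Aspas", "Lionel Messi",
                        "Cristiano Ronaldo", "Robert Lewandowski"],
    "Shot efficiency": ["Ciro Immobile", "Fabio Quagliarella", "Robert Lewandowski",
                        "Andrej Kramaric", "Andrea Belotti"],
}

# Assumed rule: 5 points for 1st place, 4 for 2nd, ... 1 for 5th.
points = Counter()
for top5 in rankings.values():
    for place, player in enumerate(top5):
        points[player] += 5 - place

max_points = 5 * len(rankings)
for player, total in points.most_common(5):
    print(f"{player}: {total}/{max_points}")
```

Under this rule the five highest totals are 22, 16, 14, 12 and 7, matching the ranking listed above.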

No more words, ladies and gentlemen: in the period 2016–2020 the best player is MEEEEEESSIIIIII, and more recently, in 2022, he won the World Cup.
To conclude, in this article we used our dataset to find out who the best player was in 2016–2020, and also to enjoy doing some data analysis. I hope you liked this article; if you have any questions or remarks, comment below.

Datasource: www.Infogol.net

I am Mahmoud Chami, I am an international polyvalent engineering student at the Institute of Advanced Industrial Technologie.