Premier League Last 5 Seasons

The project aims to uncover patterns and trends in recent Premier League matches. For further details, see the project's GitHub repository.

This project analyzes Premier League match data from the last 5 seasons. The dataset includes match scores, expected goals (xG), attendances, referees, and more. The analysis is implemented in Python using popular data analysis and visualization libraries.

Dataset Information

The dataset, "English Premier League Last 5 Season Match Scores And Expected Goals (xG)", covers matches from the last 5 seasons of the English Premier League. For each match it records the date, the local kickoff time, the result, the expected goals for both teams, the referee, the total attendance, and the venue. It contains the following columns:

  • Week
  • Date
  • Time
  • Home_Team
  • Home_xG
  • Score
  • Away_xG
  • Away_Team
  • Attendance
  • Venue
  • Referee

Project Tasks and Analysis

Data Preprocessing

  • Handling missing data.
  • Converting date and time columns to DateTime objects.
  • Ensuring appropriate data types for analysis.
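
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the score format ("2–1" with an en dash or a hyphen) and the comma-separated attendance strings are assumptions about the raw file, and the names `preprocess`, `Kickoff`, `Home_Goals`, and `Away_Goals` are hypothetical.

```python
import pandas as pd

# Made-up sample rows using a subset of the columns listed above.
raw = pd.DataFrame({
    "Date": ["2022-08-05", "2022-08-06", "2022-08-07"],
    "Time": ["20:00", "12:30", "14:00"],
    "Home_Team": ["A", "B", "C"],
    "Score": ["0–2", "2-2", None],          # assumed "home–away" format
    "Attendance": ["25,286", "22,207", None],
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy with typed columns (hypothetical helper)."""
    out = df.copy()
    # Combine date and time into one datetime column; bad values become NaT.
    out["Kickoff"] = pd.to_datetime(out["Date"] + " " + out["Time"], errors="coerce")
    # Split the score into two numeric columns; accepts an en dash or a hyphen.
    goals = out["Score"].str.extract(r"^\s*(\d+)\s*[\u2013-]\s*(\d+)\s*$")
    out["Home_Goals"] = pd.to_numeric(goals[0], errors="coerce")
    out["Away_Goals"] = pd.to_numeric(goals[1], errors="coerce")
    # Attendance may be stored as text with thousands separators.
    out["Attendance"] = pd.to_numeric(
        out["Attendance"].astype(str).str.replace(",", "", regex=False),
        errors="coerce",
    )
    # Drop matches without a parsable score, then store goals as integers.
    return (
        out.dropna(subset=["Home_Goals", "Away_Goals"])
        .astype({"Home_Goals": "int64", "Away_Goals": "int64"})
        .reset_index(drop=True)
    )

clean = preprocess(raw)
print(clean.dtypes)
```

Missing attendances are kept as NaN rather than dropped, so that matches without a recorded attendance still count toward the score-based analyses.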

Overall Team Performance

  • Analyzing the overall win-loss-draw distribution of teams over the last 5 seasons.
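
One way to compute a win-draw-loss table is to reshape the data so that each match contributes one row per team. The sketch below assumes the score has already been split into hypothetical `Home_Goals` and `Away_Goals` columns; the sample matches and the `team_records` helper are illustrative only.

```python
import pandas as pd

# Made-up matches; Home_Goals/Away_Goals are assumed to be parsed from Score.
matches = pd.DataFrame({
    "Home_Team": ["A", "B", "C", "A"],
    "Away_Team": ["B", "C", "A", "C"],
    "Home_Goals": [2, 1, 0, 1],
    "Away_Goals": [0, 1, 3, 2],
})

def team_records(df: pd.DataFrame) -> pd.DataFrame:
    """Count wins, draws, and losses per team across home and away matches."""
    # One row per team per match: goals for (GF) and goals against (GA).
    home = pd.DataFrame({"Team": df["Home_Team"], "GF": df["Home_Goals"], "GA": df["Away_Goals"]})
    away = pd.DataFrame({"Team": df["Away_Team"], "GF": df["Away_Goals"], "GA": df["Home_Goals"]})
    long = pd.concat([home, away], ignore_index=True)
    long["Win"] = (long["GF"] > long["GA"]).astype(int)
    long["Draw"] = (long["GF"] == long["GA"]).astype(int)
    long["Loss"] = (long["GF"] < long["GA"]).astype(int)
    return long.groupby("Team")[["Win", "Draw", "Loss"]].sum()

records = team_records(matches)
print(records)
```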

Seasonal Analysis

  • Exploring how the average attendance varied across seasons.
  • Identifying the season with the highest number of total goals scored.
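
The column list does not include a season column, so a seasonal breakdown needs one derived from the match date. The sketch below assumes a season runs from August to the following summer, which should be checked against the actual dates in the data; `season_label`, `Season`, and `Total_Goals` are hypothetical names and the rows are made up.

```python
import pandas as pd

# Made-up matches with parsed dates and goals.
matches = pd.DataFrame({
    "Date": pd.to_datetime(["2021-08-14", "2022-05-22", "2022-08-06", "2023-01-03"]),
    "Home_Goals": [2, 3, 0, 1],
    "Away_Goals": [0, 1, 0, 1],
    "Attendance": [50000.0, 40000.0, 30000.0, None],
})

def season_label(date: pd.Timestamp) -> str:
    """Map a date to a label such as '2021-22' (assumes an August cutoff)."""
    start = date.year if date.month >= 8 else date.year - 1
    return f"{start}-{str(start + 1)[-2:]}"

matches["Season"] = matches["Date"].apply(season_label)
matches["Total_Goals"] = matches["Home_Goals"] + matches["Away_Goals"]

# Average attendance (missing values are skipped) and total goals per season.
by_season = matches.groupby("Season").agg(
    Avg_Attendance=("Attendance", "mean"),
    Total_Goals=("Total_Goals", "sum"),
)
top_season = by_season["Total_Goals"].idxmax()
print(by_season)
print("Season with most goals:", top_season)
```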

Team Performance

  • Determining which team had the best overall win percentage at home and away.
  • Investigating the correlation between a team’s xG and their performance in terms of wins.
  • Identifying the team with the highest total xG over the last 5 seasons.
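
As a rough sketch of the team-level metrics above, home and away win percentages can be taken as the mean of a boolean "won" flag per team, and total xG as the sum of home and away xG. The sample rows are made up, and `Home_Goals` and `Away_Goals` are assumed to have been parsed from `Score` beforehand.

```python
import pandas as pd

# Made-up matches between two teams.
matches = pd.DataFrame({
    "Home_Team": ["A", "B", "A", "B"],
    "Away_Team": ["B", "A", "B", "A"],
    "Home_xG": [1.8, 0.9, 1.2, 1.5],
    "Away_xG": [0.7, 1.4, 1.1, 0.5],
    "Home_Goals": [2, 0, 1, 2],
    "Away_Goals": [0, 1, 1, 0],
})

# Share of home matches won, per home team (in percent).
home_win = matches["Home_Goals"] > matches["Away_Goals"]
home_win_pct = home_win.groupby(matches["Home_Team"]).mean() * 100

# Share of away matches won, per away team (in percent).
away_win = matches["Away_Goals"] > matches["Home_Goals"]
away_win_pct = away_win.groupby(matches["Away_Team"]).mean() * 100

# Total xG per team: xG created at home plus xG created away.
total_xg = (
    matches.groupby("Home_Team")["Home_xG"].sum()
    .add(matches.groupby("Away_Team")["Away_xG"].sum(), fill_value=0)
)

print(home_win_pct, away_win_pct, total_xg, sep="\n\n")
```

With per-team win counts in a second Series sharing the same index, `total_xg.corr(wins)` would give a Pearson correlation between xG and wins.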

Referees Analysis

  • Identifying the top referees in terms of the number of matches officiated.
  • Analyzing the relationship between the referee and the number of goals scored in a match.
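
Because the referee is a categorical variable, one simple approach is to compare the average number of goals per match across referees instead of computing a numeric correlation. The sketch below uses made-up rows and assumes hypothetical `Home_Goals` and `Away_Goals` columns parsed from `Score`.

```python
import pandas as pd

# Made-up matches with referee names replaced by placeholders.
matches = pd.DataFrame({
    "Referee": ["R1", "R2", "R1", "R3", "R1", "R2"],
    "Home_Goals": [2, 0, 1, 3, 0, 1],
    "Away_Goals": [1, 0, 1, 2, 0, 3],
})
matches["Total_Goals"] = matches["Home_Goals"] + matches["Away_Goals"]

# Matches officiated and average goals per match, busiest referees first.
referee_stats = (
    matches.groupby("Referee")["Total_Goals"]
    .agg(Matches="count", Avg_Goals="mean")
    .sort_values("Matches", ascending=False)
)
print(referee_stats)
```

Averages for referees with only a few matches are noisy, so a minimum-matches filter may be worth applying before comparing them.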

Match Analysis

  • Analyzing the distribution of match scores across all matches.
  • Identifying any stadiums where significantly more goals are scored on average.
  • Investigating trends in match outcomes based on the day of the week or time of day.
  • Analyzing attendance variation for matches held on weekdays vs. weekends.
  • Exploring the correlation between a team’s performance and attendance at their home matches.
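
Two of the match-level questions above, weekday versus weekend attendance and average goals per venue, can be sketched with simple group-bys. The rows are made up, and `Kickoff`, `Day_Type`, and `Total_Goals` are hypothetical column names; `Kickoff` is assumed to be a datetime built from `Date` and `Time`.

```python
import pandas as pd

# Made-up matches: two weekend fixtures and two midweek fixtures.
matches = pd.DataFrame({
    "Kickoff": pd.to_datetime([
        "2022-08-06 15:00",  # Saturday
        "2022-08-07 16:30",  # Sunday
        "2022-08-30 19:45",  # Tuesday
        "2022-08-31 20:00",  # Wednesday
    ]),
    "Venue": ["V1", "V2", "V1", "V2"],
    "Attendance": [60000, 40000, 50000, 30000],
    "Home_Goals": [2, 1, 3, 0],
    "Away_Goals": [1, 0, 2, 0],
})

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
is_weekend = matches["Kickoff"].dt.dayofweek >= 5
matches["Day_Type"] = is_weekend.map({True: "Weekend", False: "Weekday"})
attendance_by_day_type = matches.groupby("Day_Type")["Attendance"].mean()

# Average total goals per match at each venue, highest first.
matches["Total_Goals"] = matches["Home_Goals"] + matches["Away_Goals"]
goals_by_venue = (
    matches.groupby("Venue")["Total_Goals"].mean().sort_values(ascending=False)
)

print(attendance_by_day_type)
print(goals_by_venue)
```

The same pattern extends to time of day by grouping on `matches["Kickoff"].dt.hour`.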

Top Performers and Underperformers

  • Identifying teams that consistently overperformed or underperformed based on their xG.
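
A common way to measure over- or underperformance is the difference between goals actually scored and xG. The sketch below sums both per team over all matches; checking for consistency would mean repeating this per season. The rows are made up, `Home_Goals` and `Away_Goals` are assumed to be parsed from `Score`, and `Goals_Minus_xG` is a hypothetical column name.

```python
import pandas as pd

# Made-up matches between two teams.
matches = pd.DataFrame({
    "Home_Team": ["A", "B"],
    "Away_Team": ["B", "A"],
    "Home_xG": [1.5, 1.0],
    "Away_xG": [0.5, 2.0],
    "Home_Goals": [3, 0],
    "Away_Goals": [0, 1],
})

# One row per team per match with that team's goals and xG.
home = pd.DataFrame({"Team": matches["Home_Team"], "Goals": matches["Home_Goals"], "xG": matches["Home_xG"]})
away = pd.DataFrame({"Team": matches["Away_Team"], "Goals": matches["Away_Goals"], "xG": matches["Away_xG"]})
per_team = pd.concat([home, away], ignore_index=True).groupby("Team")[["Goals", "xG"]].sum()

# Positive values: scored more than expected; negative: fewer than expected.
per_team["Goals_Minus_xG"] = per_team["Goals"] - per_team["xG"]
print(per_team.sort_values("Goals_Minus_xG", ascending=False))
```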

Known Team Traits

  • Identifying teams known for early leads or late comebacks.

Code Implementation

The code for this data analysis project is implemented in Python, utilizing the following libraries:

  • Pandas: Data manipulation and analysis.
  • NumPy: Numerical computations.
  • Matplotlib and Seaborn: Data visualization.

Each analysis task is represented by Python code snippets, which can be found in the project’s main script.

Running the Project

To run this data analysis project, make sure you have Python installed, along with the required libraries mentioned above. Clone or download this project repository and run the main script, following the comments and code snippets for each analysis task.

Acknowledgments

The dataset was sourced from Kaggle.

Feel free to explore and modify the project to gain insights from the “Premier League Last 5 Seasons Match Scores and xGs Dataset.”

Author

Feel free to contact me for any questions or additional information about this project.