European Football
All European football match results since 2005.
European Football Analysis
Match Outcome Analysis
1. Data Preparation
-
Load the dataset from the provided SQLite database (european_database.sqlite).
1 2 3 4 5 6 7 8 9 10 11 12
import sqlite3 conn = sqlite3.connect('european_database.sqlite') cursor = conn.cursor() cursor.execute("SELECT * FROM matchs LIMIT 10") rows = cursor.fetchall() for row in rows: print(row) conn.close()
The Output loot like this:
|
|
-
Explore the ‘matchs’ table schema to understand its structure and columns.
1 2 3 4 5 6 7 8 9 10 11 12
import sqlite3 conn = sqlite3.connect('european_database.sqlite') cursor = conn.cursor() cursor.execute("PRAGMA table_info(matchs)") table_info = cursor.fetchall() for column_info in table_info: print(f"Column: {column_info[1]}, Type: {column_info[2]}") conn.close()
The Output loot like this:
|
|
2. Data Cleaning
-
Check for missing values in the ‘FTR’ column:
1 2 3 4 5 6 7 8 9 10 11
import sqlite3 conn = sqlite3.connect('european_database.sqlite') cursor = conn.cursor() cursor.execute("SELECT COUNT(*) FROM matchs WHERE FTR IS NULL") missing_values = cursor.fetchone()[0] print(f"Missing values in 'FTR': {missing_values}") conn.close()
The Output loot like this:
|
|
-
Validate values in the ‘FTR’ column:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
import sqlite3 conn = sqlite3.connect('european_database.sqlite') cursor = conn.cursor() cursor.execute("SELECT DISTINCT FTR FROM matchs WHERE FTR NOT IN ('H', 'A', 'D')") invalid_values = cursor.fetchall() if invalid_values: print("Invalid FTR values found:") for value in invalid_values: print(value[0]) else: print("No invalid values in FTR") conn.close()
The Output loot like this:
|
|
3. Data Analysis
-
Aggregate data by season and league:
1 2 3 4 5 6 7 8 9
SELECT season, Div AS League, COUNT(CASE WHEN FTR = 'H' THEN 1 END) AS HomeWins, COUNT(CASE WHEN FTR = 'A' THEN 1 END) AS AwayWins, COUNT(CASE WHEN FTR = 'D' THEN 1 END) AS Draws FROM matchs GROUP BY season, League ORDER BY season, League
The rendered output looks like this:
- SQL query to calculate home win percentages for teams
1 2 3 4 5 6 7 8 9
SELECT HomeTeam AS Team, COUNT(CASE WHEN FTR = 'H' THEN 1 END) AS HomeWins, COUNT(*) AS TotalMatches, (COUNT(CASE WHEN FTR = 'H' THEN 1 END) * 100.0 / COUNT(*)) AS HomeWinPercentage FROM matchs GROUP BY HomeTeam HAVING TotalMatches >= 50 -- Optional: Filter for teams with a minimum number of matches played ORDER BY HomeWinPercentage DESC;
- SQL query to calculate away win percentages for teams
1 2 3 4 5 6 7 8 9
SELECT AwayTeam AS Team, COUNT(CASE WHEN FTR = 'A' THEN 1 END) AS AwayWins, COUNT(*) AS TotalMatches, (COUNT(CASE WHEN FTR = 'A' THEN 1 END) * 100.0 / COUNT(*)) AS AwayWinPercentage FROM matchs GROUP BY AwayTeam HAVING TotalMatches >= 50 -- Optional: Filter for teams with a minimum number of matches played ORDER BY AwayWinPercentage DESC;
- SQL query to calculate win percentages for each team over multiple seasons (both home and away)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
SELECT Team, COUNT(CASE WHEN FTR = 'H' THEN 1 END) AS HomeWins, COUNT(CASE WHEN FTR = 'A' THEN 1 END) AS AwayWins, COUNT(*) AS TotalMatches, (COUNT(CASE WHEN FTR = 'H' THEN 1 END) * 100.0 / COUNT(*)) AS HomeWinPercentage, (COUNT(CASE WHEN FTR = 'A' THEN 1 END) * 100.0 / COUNT(*)) AS AwayWinPercentage FROM ( SELECT HomeTeam AS Team, FTR FROM matchs UNION ALL SELECT AwayTeam AS Team, FTR FROM matchs ) AS AllMatches GROUP BY Team HAVING TotalMatches >= 100 -- Optional: Filter for teams with a minimum number of matches played ORDER BY HomeWinPercentage DESC, AwayWinPercentage DESC;
4. Visualization
-
Visualize match outcome trends over seasons for each league.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
# Execute the query and fetch the results into a DataFrame df = pd.read_sql_query(query, conn) # Identify any patterns or trends in match outcomes over the years df['TotalMatches'] = df['HomeWins'] + df['AwayWins'] + df['Draws'] df['HomeWinPercentage'] = (df['HomeWins'] / df['TotalMatches']) * 100 df['AwayWinPercentage'] = (df['AwayWins'] / df['TotalMatches']) * 100 df['DrawPercentage'] = (df['Draws'] / df['TotalMatches']) * 100 # Filter data for a specific league (e.g., 'B1') filtered_df = df[df['League'] == 'B1'] # Create a line chart for Home win percentages plt.figure(figsize=(10, 6)) plt.plot(filtered_df['season'], filtered_df['HomeWinPercentage'], marker='o', linestyle='-', label='Home Win Percentage') plt.xlabel('season') plt.ylabel('Percentage') plt.title('Home Win Percentage Over the Years (League B1)') plt.legend() plt.grid(True) plt.show()
-
List teams with highest home and away win percentages.
-
Present historical analysis of teams with consistent performance.
-
Summarize key insights and findings from the analysis.
Goal Analysis
1. Average goals per match
- Calculate the average number of goals per match in each league and season:
1 2 3 4 5 6 7
SELECT season, Div AS League, AVG(FTHG + FTAG) AS AverageGoalsPerMatch FROM matchs GROUP BY season, League ORDER BY season, League;
In this SQL query:
- We select the ‘season’ and ‘Div’ columns to group the data by season and league.
- We calculate the average number of goals per match by taking the average of the sum of ‘FTHG’ (final-time home-team goals) and ‘FTAG’ (final-time away-team goals).
- We use the GROUP BY clause to group the data by ‘season’ and ‘Div’ to calculate the average goals for each league and season.
- Finally, we order the results by ‘season’ and ‘Div’ for clarity.
2. Teams scoring in matches
- Calculate the average number of goals scored in matches involving each team::
1 2 3 4 5 6 7 8 9 10 11 12 13
SELECT Team, AVG(TotalGoals) AS AverageGoalsPerMatch FROM ( SELECT HomeTeam AS Team, (FTHG + FTAG) AS TotalGoals FROM matchs UNION ALL SELECT AwayTeam AS Team, (FTHG + FTAG) AS TotalGoals FROM matchs ) AS AllMatches GROUP BY Team HAVING COUNT(*) >= 100 -- Optional: Filter for teams involved in a minimum number of matches ORDER BY AverageGoalsPerMatch DESC;
3. Home vs. Away goals
-
Difference in goal statistics between home and away matches: To determine if there is a significant difference in goal statistics between home and away matches, we can perform a statistical analysis. Specifically, we can conduct a hypothesis test to compare the means of goals scored in home matches and away matches. A commonly used test for this purpose is the independent samples t-test. Here’s a step-by-step approach we made:
-
Step 1: Prepare two datasets:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
import sqlite3 # Connect to the SQLite database conn = sqlite3.connect('european_database.sqlite') cursor = conn.cursor() # Select goals scored in home matches cursor.execute("SELECT FTHG FROM matchs") home_goals = [row[0] for row in cursor.fetchall()] # Select goals scored in away matches cursor.execute("SELECT FTAG FROM matchs") away_goals = [row[0] for row in cursor.fetchall()] # Close the database connection conn.close()
-
Step 2: Hypothesis Testing: Perform a two-sample independent t-test to compare the means of the two datasets (home goals and away goals). The null hypothesis (H0) is that there is no significant difference in the means, while the alternative hypothesis (H1) is that there is a significant difference.
1 2 3 4 5 6 7 8 9 10 11 12 13
import scipy.stats as stats # Perform the t-test t_stat, p_value = stats.ttest_ind(home_goals, away_goals) # Set the significance level (alpha) alpha = 0.05 # Check the p-value against alpha if p_value < alpha: print("Reject the null hypothesis: There is a significant difference in goal statistics between home and away matches.") else: print("Fail to reject the null hypothesis: There is no significant difference in goal statistics between home and away matches.")
The result:
1
Reject the null hypothesis: There is a significant difference in goal statistics between home and away matches.
Based on the analysis, we have rejected the null hypothesis, indicating that there is a significant difference in goal statistics between home and away matches in our dataset. This result suggests that there is a statistically significant distinction between the average number of goals scored in home matches and away matches.
Team Performance Analysis
1. Team ranking by performance
- Ranking teams based on their overall performance, assuming a simple point system:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
WITH TeamPoints AS ( SELECT HomeTeam AS Team, SUM(CASE WHEN FTR = 'H' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS Points FROM matchs GROUP BY Team UNION ALL SELECT AwayTeam AS Team, SUM(CASE WHEN FTR = 'A' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS Points FROM matchs GROUP BY Team ) SELECT t.Team, SUM(tp.Points) AS TotalPoints, SUM(t.FTHG) - SUM(t.FTAG) AS GoalDifference FROM TeamPoints tp JOIN ( SELECT DISTINCT HomeTeam AS Team, FTHG, FTAG FROM matchs UNION ALL SELECT DISTINCT AwayTeam AS Team, FTAG, FTHG FROM matchs ) t ON tp.Team = t.Team GROUP BY t.Team ORDER BY TotalPoints DESC, GoalDifference DESC;
The rendered output looks like this:
In this query:
- We first calculate the points earned by each team based on their match results, assuming 3 points for a win, 1 point for a draw, and 0 points for a loss. We use a common table expression (CTE) called TeamPoints for this purpose.
- Then, we calculate the goal difference for each team based on goals scored (‘FTHG’ for home matches and ‘FTAG’ for away matches).
- Finally, we join the two sets of calculated data (points and goal differences), calculate the total points, and rank teams based on total points and goal differences.
Visualize the top 10 result:
2. Evolution performance
-
To analyze how the performance of a specific team, such as Real Madrid, has evolved over the years, we can create a line chart that shows relevant performance metrics (e.g., total points or goal difference) for Real Madrid in each season. Here’s how we can do it using Python and Matplotlib:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
# Define the team name (e.g., 'Real Madrid') team_name = 'Real Madrid' # Execute an SQL query to retrieve relevant data for Real Madrid cursor.execute(''' SELECT season, SUM(CASE WHEN HomeTeam = ? THEN FTHG ELSE FTAG END) - SUM(CASE WHEN AwayTeam = ? THEN FTHG ELSE FTAG END) AS GoalDifference, SUM(CASE WHEN HomeTeam = ? AND FTR = 'H' THEN 3 WHEN AwayTeam = ? AND FTR = 'A' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS TotalPoints FROM matchs WHERE HomeTeam = ? OR AwayTeam = ? GROUP BY season ORDER BY season ''', (team_name, team_name, team_name, team_name, team_name, team_name)) # Fetch the results results = cursor.fetchall()
-
Now, we can create a line chart to visualize how Real Madrid’s performance has evolved over the years based on the retrieved data. Here’s an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
# Extract season, goal difference, and total points from the results seasons = [row[0] for row in results] goal_differences = [row[1] for row in results] total_points = [row[2] for row in results] # Create a line chart to visualize Real Madrid's goal difference over the years plt.figure(figsize=(12, 6)) plt.plot(seasons, goal_differences, marker='o', linestyle='-', label='Goal Difference') plt.xlabel('Season') plt.ylabel('Goal Difference') plt.title('Real Madrid Performance: Goal Difference Over the Years') plt.legend() plt.grid(True) plt.xticks(rotation=45) # Rotate x-axis labels for readability # Create a line chart to visualize Real Madrid's total points over the years plt.figure(figsize=(12, 6)) plt.plot(seasons, total_points, marker='o', linestyle='-', label='Total Points', color='orange') plt.xlabel('Season') plt.ylabel('Total Points') plt.title('Real Madrid Performance: Total Points Over the Years') plt.legend() plt.grid(True) plt.xticks(rotation=45) plt.show()
3. Performance changes
-
To identify teams that have experienced dramatic changes in performance in specific seasons, you can analyze their point differentials or goal differences between consecutive seasons.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
# Define the team name (e.g., 'Real Madrid') team_name = 'Real Madrid' # Execute an SQL query to retrieve relevant data for the specified team cursor.execute(''' SELECT season, SUM(CASE WHEN HomeTeam = ? THEN FTHG ELSE FTAG END) - SUM(CASE WHEN AwayTeam = ? THEN FTHG ELSE FTAG END) AS GoalDifference, SUM(CASE WHEN HomeTeam = ? AND FTR = 'H' THEN 3 WHEN AwayTeam = ? AND FTR = 'A' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS TotalPoints FROM matchs WHERE (HomeTeam = ? OR AwayTeam = ?) GROUP BY season ORDER BY season ''', (team_name, team_name, team_name, team_name, team_name, team_name)) # Fetch the results results = cursor.fetchall() #%% # Extract season and goal difference from the results seasons = [row[0] for row in results] goal_differences = [row[1] for row in results] # Calculate changes in goal difference between consecutive seasons goal_difference_changes = [goal_differences[i] - goal_differences[i - 1] for i in range(1, le(goal_differences))] # Identify seasons with dramatic changes in goal difference dramatic_changes = [] for i, change in enumerate(goal_difference_changes): if abs(change) >= 20: # Adjust the threshold as needed dramatic_changes.append((seasons[i], seasons[i + 1], change)) # Display seasons with dramatic changes in performance for change in dramatic_changes: print(f"Season {change[0]} to {change[1]}: Change in Goal Difference = {change[2]}")
-
The result:
1 2 3 4 5 6
Season 2007 to 2008: Change in Goal Difference = 22.0 Season 2009 to 2010: Change in Goal Difference = 36.0 Season 2011 to 2012: Change in Goal Difference = 20.0 Season 2012 to 2013: Change in Goal Difference = -28.0 Season 2018 to 2019: Change in Goal Difference = -33.0 Season 2019 to 2020: Change in Goal Difference = 28.0
Seasonal Trends
1. Team Performances
-
Step 1: Define the Team (e.g., Real Madrid)
1 2
# Define the team name (e.g., 'Real Madrid') team_name = 'Real Madrid'
-
Step 2: Retrieve Data for the Team
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
import sqlite3 # Connect to the SQLite database conn = sqlite3.connect('european_database.sqlite') cursor = conn.cursor() # Execute an SQL query to retrieve relevant data for the specified team cursor.execute(''' SELECT season, strftime('%Y-%m', Date) AS Month, SUM(CASE WHEN HomeTeam = ? THEN FTHG ELSE FTAG END) - SUM(CASE WHEN AwayTeam = ? THEN FTHG ELSE FTAG END) AS GoalDifference, SUM(CASE WHEN HomeTeam = ? AND FTR = 'H' THEN 3 WHEN AwayTeam = ? AND FTR = 'A' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS TotalPoints FROM matchs WHERE (HomeTeam = ? OR AwayTeam = ?) GROUP BY season, Month ORDER BY season, Month ''', (team_name, team_name, team_name, team_name, team_name, team_name)) # Fetch the results results = cursor.fetchall() results
1 2 3 4 5 6 7 8 9
[(2006, '2005-08', 1.0, 3), (2006, '2005-09', 3.0, 6), (2006, '2005-10', 6.0, 9), (2006, '2005-11', -2.0, 4), ... (2021, '2021-02', 6.0, 12), (2021, '2021-03', 3.0, 8), (2021, '2021-04', 6.0, 11), (2021, '2021-05', 7.0, 13)]
-
Step 3: Analyze Seasonal Trends
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
import matplotlib.pyplot as plt # Extract season, month, and performance metrics from the results seasons = [row[0] for row in results] months = [row[1] for row in results] goal_differences = [row[2] for row in results] total_points = [row[3] for row in results] # Create line charts to visualize the team's performance metrics over time plt.figure(figsize=(12, 6)) plt.plot(seasons, goal_differences, marker='o', linestyle='-', label='Goal Difference') plt.xlabel('Season') plt.ylabel('Goal Difference') plt.title(f'{team_name} Performance: Goal Difference Over Time') plt.legend() plt.grid(True) plt.xticks(rotation=45) # Rotate x-axis labels for readability plt.show() plt.figure(figsize=(12, 6)) plt.plot(seasons, total_points, marker='o', linestyle='-', label='Total Points', color='orange') plt.xlabel('Season') plt.ylabel('Total Points') plt.title(f'{team_name} Performance: Total Points Over Time') plt.legend() plt.grid(True) plt.xticks(rotation=45) plt.show()
2. Yearly Performance
-
Step 1: Define Performance Metrics
1 2
# Define the performance metric to analyze (e.g., 'TotalPoints' or 'GoalDifference') performance_metric = 'TotalPoints'
-
Step 2: Retrieve Data for All Teams
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
# Execute an SQL query to retrieve relevant data for all teams and seasons cursor.execute(''' SELECT season, HomeTeam AS Team, SUM(CASE WHEN FTR = 'H' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS PointsHome, SUM(CASE WHEN FTR = 'H' THEN FTHG ELSE 0 END) AS GoalsForHome, SUM(CASE WHEN FTR = 'H' THEN FTAG ELSE 0 END) AS GoalsAgainstHome, SUM(CASE WHEN FTR = 'A' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS PointsAway, SUM(CASE WHEN FTR = 'A' THEN FTAG ELSE 0 END) AS GoalsForAway, SUM(CASE WHEN FTR = 'A' THEN FTHG ELSE 0 END) AS GoalsAgainstAway FROM matchs GROUP BY season, Team ORDER BY season, Team ''') # Fetch the results results = cursor.fetchall()
-
Step 3: Analyze Consistent Performance
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
# Extract season, team, and performance metrics from the results seasons = [row[0] for row in results] teams = [row[1] for row in results] points_home = [row[2] for row in results] goals_for_home = [row[3] for row in results] goals_against_home = [row[4] for row in results] points_away = [row[5] for row in results] goals_for_away = [row[6] for row in results] goals_against_away = [row[7] for row in results] # Calculate the total points and goal differences for each team in each season if performance_metric == 'TotalPoints': total_points = [points_home[i] + points_away[i] for i in range(len(points_home))] performance_values = total_points else: goal_differences = [(goals_for_home[i] + goals_for_away[i]) - (goals_against_home[i] + goals_against_away[i]) for i in range(len(goals_for_home))] performance_values = goal_differences # Identify teams that consistently perform better or worse consistent_teams = {} for i, team in enumerate(teams): if team not in consistent_teams: consistent_teams[team] = {'Seasons': [seasons[i]], 'PerformanceValues': [performance_values[i]]} else: consistent_teams[team]['Seasons'].append(seasons[i]) consistent_teams[team]['PerformanceValues'].append(performance_values[i]) # Analyze and print teams with consistent better or worse performance for team, data in consistent_teams.items(): min_performance = min(data['PerformanceValues']) max_performance = max(data['PerformanceValues']) if min_performance < max_performance: print(f"{team}: Consistently performs better with a minimum {performance_metric} of {min_performance} and maximum {performance_metric} of {max_performance}.") elif min_performance > max_performance: print(f"{team}: Consistently performs worse with a minimum {performance_metric} of {min_performance} and maximum {performance_metric} of {max_performance}.") else: print(f"{team}: Performance varies across seasons with a consistent {performance_metric} of {min_performance}.")
Home Advantage Analysis
1. Investigating “home advantage”
-
Step 1: Define Performance Metrics
1 2
# Define the performance metric to analyze (e.g., 'TotalPoints', 'GoalsScored', 'GoalsConceded') performance_metric = 'TotalPoints'
-
Step 2: Retrieve Data for All Matches
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
# Execute an SQL query to retrieve relevant data for all matches and venues cursor.execute(''' SELECT HomeTeam AS Team, SUM(CASE WHEN FTR = 'H' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS PointsHome, SUM(FTHG) AS GoalsScoredHome, SUM(FTAG) AS GoalsConcededHome FROM matchs GROUP BY Team ORDER BY Team ''') # Fetch the results for matches played at home home_results = cursor.fetchall() # Execute another SQL query to retrieve relevant data for all matches and venues cursor.execute(''' SELECT AwayTeam AS Team, SUM(CASE WHEN FTR = 'A' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS PointsAway, SUM(FTAG) AS GoalsScoredAway, SUM(FTHG) AS GoalsConcededAway FROM matchs GROUP BY Team ORDER BY Team ''') # Fetch the results for matches played away away_results = cursor.fetchall()
-
Step 3: Calculate Home Advantage Metrics
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
# Create dictionaries to store home and away performance metrics for each team home_performance = {row[0]: (row[1], row[2], row[3]) for row in home_results} away_performance = {row[0]: (row[1], row[2], row[3]) for row in away_results} # Calculate home advantage metrics for each team home_advantage = {} for team in home_performance: if team in away_performance: home_points, home_goals_scored, home_goals_conceded = home_performance[team] away_points, away_goals_scored, away_goals_conceded = away_performance[team] if performance_metric == 'TotalPoints': home_advantage[team] = home_points - away_points elif performance_metric == 'GoalsScored': home_advantage[team] = home_goals_scored - away_goals_scored elif performance_metric == 'GoalsConceded': home_advantage[team] = away_goals_conceded - home_goals_conceded # Sort teams by home advantage sorted_teams = sorted(home_advantage.items(), key=lambda x: x[1], reverse=True) # Display teams with the strongest home advantage print(f"Teams with the strongest home advantage in terms of {performance_metric}:") for team, advantage in sorted_teams[:10]: # Display the top 10 teams print(f"{team}: Home Advantage = {advantage}")
2. Examining seasonal variations
-
Step 1: Retrieve Data for All Matches
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
# Execute an SQL query to retrieve relevant data for all seasons and venues cursor.execute(''' SELECT season, HomeTeam AS Team, SUM(CASE WHEN FTR = 'H' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS PointsHome FROM matchs GROUP BY season, Team ORDER BY season, Team ''') # Fetch the results for matches played at home home_results = cursor.fetchall() # Execute another SQL query to retrieve relevant data for all seasons and venues cursor.execute(''' SELECT season, AwayTeam AS Team, SUM(CASE WHEN FTR = 'A' THEN 3 WHEN FTR = 'D' THEN 1 ELSE 0 END) AS PointsAway FROM matchs GROUP BY season, Team ORDER BY season, Team ''') # Fetch the results for matches played away away_results = cursor.fetchall()
-
Step 2: Calculate Seasonal Home Advantage Metrics
1 2 3 4 5 6 7 8 9 10 11 12 13
# Create dictionaries to store home and away performance metrics for each team by season home_performance = {(row[0], row[1]): row[2] for row in home_results} away_performance = {(row[0], row[1]): row[2] for row in away_results} # Calculate seasonal home advantage metrics for each team seasonal_home_advantage = {} for season, team in home_performance: if (season, team) in away_performance: home_points = home_performance[(season, team)] away_points = away_performance[(season, team)] if performance_metric == 'TotalPoints': seasonal_home_advantage[(season, team)] = home_points - away_points
-
Step 3: Visualize Home Advantage Across Seasons
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
# Extract unique seasons for analysis unique_seasons = sorted(set(season for season, _ in seasonal_home_advantage.keys())) # Calculate average home advantage per season average_home_advantage = [] for season in unique_seasons: season_home_advantages = [advantage for (s, _), advantage in seasonal_home_advantage.items() if s == season] average_home_advantage.append(sum(season_home_advantages) / len(season_home_advantages)) # Create a line chart to visualize seasonal home advantage plt.figure(figsize=(12, 6)) plt.plot(unique_seasons, average_home_advantage, marker='o', linestyle='-', label=f'Average Home Advantage ({performance_metric})') plt.xlabel('Season') plt.ylabel('Average Home Advantage') plt.title(f'Seasonal Home Advantage Variation ({performance_metric})') plt.legend() plt.grid(True) plt.xticks(rotation=45) plt.show()
Country Comparison Analysis
1. Outcomes and Goals
-
Step 1: Retrieve Data for Teams and Matches
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
import sqlite3 # Connect to the SQLite database conn = sqlite3.connect('european_database.sqlite') cursor = conn.cursor() # Execute an SQL query to join the 'matchs' and 'divisions' tables to retrieve relevant data cursor.execute(''' SELECT d.country AS Country, m.HomeTeam AS Team, AVG(CASE WHEN m.FTR = 'H' THEN 3 WHEN m.FTR = 'D' THEN 1 ELSE 0 END) AS AveragePoints, AVG(m.FTHG) AS AverageGoalsScored, AVG(m.FTAG) AS AverageGoalsConceded FROM matchs AS m JOIN divisions AS d ON m.Div = d.division GROUP BY d.country, m.HomeTeam ORDER BY Country, Team ''') # Fetch the results results = cursor.fetchall()
-
Step 2: Analyze and Visualize Data by Country/League
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
import pandas as pd import seaborn as sns import matplotlib.pyplot as plt # Convert the results to a pandas DataFrame for easier manipulation and visualization df = pd.DataFrame(results, columns=['Country', 'Team', 'AveragePoints', 'AverageGoalsScored', 'AverageGoalsConceded']) # Plot the data using seaborn plt.figure(figsize=(12, 6)) sns.set(style="whitegrid") sns.scatterplot(data=df, x='AverageGoalsScored', y='AverageGoalsConceded', hue='Country', s=100) plt.xlabel('Average Goals Scored') plt.ylabel('Average Goals Conceded') plt.title('Comparison of Teams by Country/League (Goals Scored vs. Goals Conceded)') plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left') plt.show() plt.figure(figsize=(12, 6)) sns.set(style="whitegrid") sns.scatterplot(data=df, x='AveragePoints', y='AverageGoalsScored', hue='Country', s=100) plt.xlabel('Average Points Earned') plt.ylabel('Average Goals Scored') plt.title('Comparison of Teams by Country/League (Points vs. Goals Scored)') plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left') plt.show()
3. Competitive Leagues
-
Step 1: Retrieve Data for Leagues
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
# Execute an SQL query to join the 'matchs' and 'divisions' tables to retrieve relevant data cursor.execute(''' SELECT d.country AS Country, d.name AS League, COUNT(*) AS TotalMatches, AVG(ABS(FTHG - FTAG)) AS AverageGoalDifference, AVG(CASE WHEN FTR = 'D' THEN 1 ELSE 0 END) AS DrawPercentage FROM matchs AS m JOIN divisions AS d ON m.Div = d.division GROUP BY d.country, d.name ORDER BY Country, League ''') # Fetch the results results = cursor.fetchall()
-
Step 2: Analyze and Identify Leagues with Competitive Matches
1 2 3 4 5 6 7 8 9 10 11 12
# Convert the results to a pandas DataFrame for easier manipulation and visualization df = pd.DataFrame(results, columns=['Country', 'League', 'TotalMatches', 'AverageGoalDifference', 'DrawPercentage']) # Plot the data using seaborn plt.figure(figsize=(12, 6)) sns.set(style="whitegrid") sns.scatterplot(data=df, x='AverageGoalDifference', y='DrawPercentage', hue='Country', s=100) plt.xlabel('Average Goal Difference') plt.ylabel('Draw Percentage') plt.title('Comparison of Leagues by Competitiveness (Goal Difference vs. Draw Percentage)') plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left') plt.show()
Time Series Analysis
-
Step 1: Retrieve Historical Data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
import sqlite3 import pandas as pd # Connect to the SQLite database conn = sqlite3.connect('european_database.sqlite') # Query to retrieve historical data for match outcomes and goal statistics query = ''' SELECT strftime('%Y', Date) AS Year, COUNT(*) AS TotalMatches, AVG(CASE WHEN FTR = 'H' THEN 1 ELSE 0 END) AS HomeWinPercentage, AVG(CASE WHEN FTR = 'A' THEN 1 ELSE 0 END) AS AwayWinPercentage, AVG(CASE WHEN FTR = 'D' THEN 1 ELSE 0 END) AS DrawPercentage, AVG(FTHG) AS AverageHomeGoals, AVG(FTAG) AS AverageAwayGoals FROM matchs GROUP BY Year ORDER BY Year ''' # Fetch the data into a pandas DataFrame df = pd.read_sql_query(query, conn)
-
Step 2: Analyze and Visualize Trends
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
import matplotlib.pyplot as plt # Convert the 'Year' column to datetime format df['Year'] = pd.to_datetime(df['Year'], format='%Y') # Create subplots for different metrics fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(14, 10)) fig.suptitle('Time Series Analysis of Match Outcomes and Goal Statistics Over the Years', fontsize=16) # Plot Total Matches Over the Years axes[0, 0].plot(df['Year'], df['TotalMatches'], marker='o', linestyle='-') axes[0, 0].set_title('Total Matches Over the Years') axes[0, 0].set_xlabel('Year') axes[0, 0].set_ylabel('Total Matches') # Plot Home Win Percentage Over the Years axes[0, 1].plot(df['Year'], df['HomeWinPercentage'], marker='o', linestyle='-', color='orange') axes[0, 1].set_title('Home Win Percentage Over the Years') axes[0, 1].set_xlabel('Year') axes[0, 1].set_ylabel('Home Win Percentage') # Plot Away Win Percentage Over the Years axes[1, 0].plot(df['Year'], df['AwayWinPercentage'], marker='o', linestyle='-', color='green') axes[1, 0].set_title('Away Win Percentage Over the Years') axes[1, 0].set_xlabel('Year') axes[1, 0].set_ylabel('Away Win Percentage') # Plot Draw Percentage Over the Years axes[1, 1].plot(df['Year'], df['DrawPercentage'], marker='o', linestyle='-', color='red') axes[1, 1].set_title('Draw Percentage Over the Years') axes[1, 1].set_xlabel('Year') axes[1, 1].set_ylabel('Draw Percentage') # Adjust layout plt.tight_layout(rect=[0, 0, 1, 0.95]) # Show the plots plt.show()
Author
- Name: Mohammed Mebarek Mecheter
- Email: mohammedmecheter@gmail.com
- GitHub: Mohammed Mebarek Mecheter
- GitHub Repository: https://github.com/Mohammed-Mebarek-Mecheter/European-Football-Analysis