The 2024-25 NBA season officially started a month ago, and we’ve had plenty of great performances since then, including from our own Tyrese Maxey and Paul George. Though the league’s best players seem to look better every year, league-wide stats haven’t changed much in the last decade, apart from a large increase in 3-point attempts that can be attributed to the influence of Stephen Curry’s Warriors over that span.
Even as attempts rose from 112 3PA in the 2014-15 season to as many as 156 3PA in the 2022-23 season, three-point percentage has generally stayed around 30%, peaking at 32.5% in the same year the league set its record for three-point attempts.
In trying to determine who the best players in the NBA are, there are many different metrics you can use, but I’m going to focus on these eight: PTS, TRB, AST, STL, BLK, TOV, FG%, and 3P%.
Here’s an equation that I feel combines these metrics in a way that sheds light on who the best NBA players are:
0.3*PTS+0.15*TRB+0.15*AST+0.1*STL+0.1*BLK-0.075*TOV+200*FG%+200*3P%
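To make the weighting concrete, here is a minimal Python sketch of the equation. The weights come straight from the formula above; the function name `player_score`, the dictionary layout, and the sample stat line are made up for illustration. The sketch also assumes FG% and 3P% are stored as fractions (0.5 rather than 50).

```python
# Weights copied from the equation above; everything else is illustrative.
WEIGHTS = {
    "PTS": 0.3, "TRB": 0.15, "AST": 0.15, "STL": 0.1, "BLK": 0.1,
    "TOV": -0.075, "FG%": 200, "3P%": 200,
}

def player_score(stats):
    """Return the weighted sum of a player's stats (hypothetical helper)."""
    return sum(weight * stats[name] for name, weight in WEIGHTS.items())

# Made-up stat line, not a real player. Percentages are assumed to be fractions.
example = {
    "PTS": 2000, "TRB": 500, "AST": 400, "STL": 100,
    "BLK": 50, "TOV": 200, "FG%": 0.5, "3P%": 0.4,
}
print(round(player_score(example), 2))
```

With these weights, the two shooting percentages contribute 100 and 80 points to the made-up player’s score, which is more than the rebounds and assists combined.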
Based on this equation, Luka Doncic would have won MVP over Nikola Jokic last year, but barely. Nikola Jokic is the betting favorite to win MVP again, but I wouldn’t count Luka out.
Using Random Forest Regression in Python, I developed a predictive model that generates player statistics for the current season based on trends from the last ten seasons, taking into account the fact that a player’s production tends to decline with age. You can compare the results from the model to last season’s stats on this page.
Players averaged approximately 491 total points last season, while the model has them averaging just under 509 total points this season. The stars score far more than that, of course, but the model actually has most of the best players playing worse this season, which hasn’t panned out so far. Applying my equation to the model’s predicted stats, Luka Doncic would be the best player in the league again this season, though the gap between him and Nikola Jokic would be a bit narrower.
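Ranking players by the equation takes only a few lines once the predictions are in a CSV. The sketch below is a rough illustration rather than the exact script I ran: `rank_players` is a hypothetical helper, the two rows are made-up stand-ins for real predictions, and it uses the standard-library `csv` module to stay dependency-free.

```python
import csv
import io

# Weights from the equation earlier in the post.
WEIGHTS = {
    "PTS": 0.3, "TRB": 0.15, "AST": 0.15, "STL": 0.1, "BLK": 0.1,
    "TOV": -0.075, "FG%": 200, "3P%": 200,
}

def rank_players(rows):
    """Score each row with the equation and return (player, score) pairs, best first."""
    scored = []
    for row in rows:
        score = sum(weight * float(row[stat]) for stat, weight in WEIGHTS.items())
        scored.append((row["Player"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up rows standing in for a predictions file; these are not real projections.
sample_csv = io.StringIO(
    "Player,Age,PTS,TRB,AST,STL,BLK,TOV,FG%,3P%\n"
    "Player A,26,2000,600,650,90,40,250,0.49,0.37\n"
    "Player B,30,1800,900,700,100,60,220,0.58,0.35\n"
)
ranking = rank_players(csv.DictReader(sample_csv))
for player, score in ranking:
    print(f"{player}: {score:.1f}")
```

To run it on real output, open the predictions file (the model code saves it as predicted_nba_2025_totals.csv) with `newline=''` and pass `csv.DictReader(f)` to `rank_players` in place of the sample.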
Here’s the code for the model:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_squared_error
#Function to scrape data for a specific season
def scrape_nba_totals_for_year(year):
    pass # Not replicating this code just to stay safe
#Loop through multiple seasons and scrape data
for year in range(2015, 2025): # From 2014-15 to 2023-24 seasons
    scrape_nba_totals_for_year(year)
    time.sleep(2) # Delay between requests to avoid overwhelming the server
#Read the CSVs
dfs = []
for year in range(2015, 2025): # 10 years
    df = pd.read_csv(f'nba_{year}_totals.csv')
    df['Season'] = year
    dfs.append(df)
#Combine all into one DataFrame
all_data = pd.concat(dfs, ignore_index=True)
#Drop irrelevant columns and handle missing data
all_data.drop(columns=['Rk'], inplace=True)
all_data.fillna(0, inplace=True)
#Create columns for the previous season's stats for multiple categories
all_data['Prev_PTS'] = all_data.groupby('Player')['PTS'].shift(1)
all_data['Prev_TRB'] = all_data.groupby('Player')['TRB'].shift(1)
all_data['Prev_AST'] = all_data.groupby('Player')['AST'].shift(1)
all_data['Prev_STL'] = all_data.groupby('Player')['STL'].shift(1)
all_data['Prev_BLK'] = all_data.groupby('Player')['BLK'].shift(1)
all_data['Prev_TOV'] = all_data.groupby('Player')['TOV'].shift(1)
all_data['Prev_FG%'] = all_data.groupby('Player')['FG%'].shift(1)
all_data['Prev_3P%'] = all_data.groupby('Player')['3P%'].shift(1)
#Select features (previous season stats)
features = ['Age', 'Prev_PTS', 'Prev_TRB', 'Prev_AST', 'Prev_STL', 'Prev_BLK', 'Prev_TOV', 'Prev_FG%', 'Prev_3P%']
#Select target variables (current season stats you want to predict)
targets = ['PTS', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'FG%', '3P%']
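#Prepare feature and target data
#A player's first season in the data has no previous-season stats (NaN after shift),
#which many scikit-learn versions reject, so drop those rows before training
train_data = all_data.dropna(subset=features)
X = train_data[features] # Select features
y = train_data[targets] # Select targets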
#Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
#Initialize the model (Random Forest or another model)
model = RandomForestRegressor(n_estimators=100, random_state=42)
#MultiOutputRegressor handles multiple targets
multi_output_model = MultiOutputRegressor(model)
#Train the model
multi_output_model.fit(X_train, y_train)
#Make predictions on the test set
y_pred = multi_output_model.predict(X_test)
#Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
#Load the 2023-24 season data for prediction
current_season_data = pd.read_csv('nba_2024_totals.csv')
#Adjust the player's age for the next season (add 1 year)
current_season_data['Age_Next_Season'] = current_season_data['Age'] + 1
#Rename the current season's stats as "previous" for use in predictions for the next season
current_season_data['Age'] = current_season_data['Age_Next_Season']
current_season_data['Prev_PTS'] = current_season_data['PTS']
current_season_data['Prev_TRB'] = current_season_data['TRB']
current_season_data['Prev_AST'] = current_season_data['AST']
current_season_data['Prev_STL'] = current_season_data['STL']
current_season_data['Prev_BLK'] = current_season_data['BLK']
current_season_data['Prev_TOV'] = current_season_data['TOV']
current_season_data['Prev_FG%'] = current_season_data['FG%']
current_season_data['Prev_3P%'] = current_season_data['3P%']
#Prepare the features for the next season with new age and renamed stats
X_new_season = current_season_data[features].fillna(0) # Same feature columns used for training
#Make predictions for the 2024-25 season
predicted_stats = multi_output_model.predict(X_new_season)
#Convert the predictions into a DataFrame
predicted_df = pd.DataFrame(predicted_stats, columns=targets)
#Add player name and age to the DataFrame
predicted_df['Player'] = current_season_data['Player'].values
predicted_df['Age'] = current_season_data['Age'].values
#Reorder columns to place 'Player' and 'Age' first
predicted_df = predicted_df[['Player', 'Age'] + targets]
#Save the predictions to a CSV file
predicted_df.to_csv('predicted_nba_2025_totals.csv', index=False)
print("Predicted stats for the 2024-25 season saved to 'predicted_nba_2025_totals.csv'.")
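One thing to keep in mind when reading the single Mean Squared Error printed by the code above: it averages across all eight targets, so season totals like PTS (in the hundreds) swamp percentages like FG% (below 1). A rough way to see how each stat fares on its own is to compute the error column by column. The helper below is a hypothetical addition, not part of the model code, written in plain Python with made-up numbers. scikit-learn’s `mean_squared_error` can also return per-target values via `multioutput='raw_values'`.

```python
import math

def per_target_rmse(y_true, y_pred, names):
    """Return {target name: RMSE} for row-aligned lists of actual and predicted values."""
    errors = {}
    for col, name in enumerate(names):
        squared = [(actual[col] - pred[col]) ** 2 for actual, pred in zip(y_true, y_pred)]
        errors[name] = math.sqrt(sum(squared) / len(squared))
    return errors

# Made-up values for two players and two targets, purely for illustration.
actual = [[1500, 0.48], [900, 0.44]]
predicted = [[1400, 0.46], [1000, 0.47]]
rmse = per_target_rmse(actual, predicted, ["PTS", "FG%"])
for name, value in rmse.items():
    print(f"{name}: {value:.3f}")
```

With the model’s own variables, the call would look like `per_target_rmse(y_test.values.tolist(), y_pred.tolist(), targets)`.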