In this article, we examine the relationship between an e-commerce site's morning sales and its end-of-day sales, and build a model that predicts end-of-day sales from the morning figure.
Regression Models
What is regression?
Regression analysis expresses the relationship between one dependent variable and one or more independent variables as a mathematical equation. Because the equation captures how the variables relate, it can be used to build prediction models. Regression models fall into two broad groups: linear models and nonlinear models.
Linear Regression Models
What is linearity?
Linearity describes a relationship between two numerical variables in which a change in one is accompanied by a proportional change in the other, so the relationship can be drawn as a straight line. For example, during childhood, height tends to increase as age increases. The relationship can also run the other way: as one variable increases, the other may decrease. As long as the change is proportional, the relationship is linear whether the variables move in the same direction or in opposite directions. Regression also includes nonlinear models, but this article uses a linear model.
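To make the idea concrete: when one variable is an exact linear function of another, every one-unit step in the input changes the output by the same amount, and the Pearson correlation is +1 for an increasing relationship and -1 for a decreasing one. The minimal sketch below uses made-up values (not data from this article) and adds a quadratic series for contrast.

```python
import numpy as np

# Made-up illustrative values for the independent variable
x = np.arange(1, 11, dtype=float)

y_pos = 3.0 * x + 5.0    # linear, increases as x increases
y_neg = -2.0 * x + 40.0  # linear, decreases as x increases
y_quad = x ** 2          # nonlinear, for contrast

# A linear relationship changes by the same amount at every step
print(np.diff(y_pos))   # every step is 3.0
print(np.diff(y_neg))   # every step is -2.0
print(np.diff(y_quad))  # 3, 5, 7, ...: the step keeps growing

# Pearson correlation is +1 or -1 for perfectly linear data
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
print(round(r_pos, 6), round(r_neg, 6))
```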
Simple Linear Regression
A simple linear regression model is used to examine and explain the relationship between one dependent variable and one independent variable.
Formula:
$\tilde{y} = \alpha + \beta x + \epsilon$
$\tilde{y}$ = dependent variable
$\alpha$ = intercept (constant)
$\beta$ = slope (multiplier coefficient)
$x$ = independent variable
$\epsilon$ = error term
The $\beta$ coefficient is the key term in the equation: it gives the expected change in $\tilde{y}$ for a one-unit increase in $x$.
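For simple linear regression, the least-squares estimates of the two parameters have a closed form: $\hat{\beta} = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$. The sketch below applies these formulas to synthetic data (the true values $\alpha = 2$, $\beta = 0.5$ and the noise level are arbitrary choices for illustration), cross-checks them against `np.polyfit`, and confirms the interpretation of $\beta$: raising $x$ by one unit moves the prediction by exactly $\hat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from y = alpha + beta * x + noise
# (alpha = 2.0 and beta = 0.5 are arbitrary illustrative values)
x = rng.uniform(0, 100, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 3.0, size=200)

# Closed-form ordinary least squares estimates
x_mean, y_mean = x.mean(), y.mean()
beta_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
alpha_hat = y_mean - beta_hat * x_mean

def predict(x_value):
    """Fitted value alpha_hat + beta_hat * x_value (the error term is omitted)."""
    return alpha_hat + beta_hat * x_value

# Cross-check with NumPy's degree-1 polynomial fit (returns slope, then intercept)
slope_np, intercept_np = np.polyfit(x, y, 1)

print(f"alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
# A one-unit increase in x changes the prediction by beta_hat
print(predict(51) - predict(50))
```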
Import Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as smf
np.random.seed(777)
sns.set(rc={'figure.figsize':(11.7,8.27)})
%config InlineBackend.figure_format = 'retina'
import warnings
warnings.filterwarnings('ignore')
Load Data
In this model, we test the hypothesis that revenue earned between midnight and 9 am explains the day's total revenue at midnight. To do this, we create the appropriate segments in Google Analytics and export the revenue accumulated by 9 am and the revenue accumulated by midnight as two separate columns.
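df = pd.read_excel('Morning.xlsx')  # daily export from Google Analytics, one row per day
df = df.drop(columns=['Day Index'])  # drop the date column; the model only needs the revenue columns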
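The Google Analytics export itself is not included with the article. If you want to run the remaining steps anyway, a hypothetical stand-in with the same column names can be generated as below. The revenue ranges, noise level and linear relationship are invented for illustration and will not reproduce the article's results.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(777)
n_days = 89  # same row count as the article's model output; the values are synthetic

# Hypothetical revenue earned between midnight and 9 am
morning = rng.uniform(20_000, 80_000, size=n_days)
# Hypothetical end-of-day revenue: an arbitrary linear relation plus noise
day = 10_000 + 8.0 * morning + rng.normal(0, 40_000, size=n_days)

df = pd.DataFrame({
    "Day Index": pd.date_range("2021-01-01", periods=n_days, freq="D"),
    "morning_revenue": morning.round(2),
    "day_revenue": day.round(2),
})
# Mirror the article's preprocessing step
df = df.drop(columns=["Day Index"])
print(df.shape)  # (89, 2)
```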
Summary of Dataframe
df.head()

Distributions and Descriptive Statistics
df.describe().T

Kernel Density Estimation: Morning Revenue
sns.histplot(data = df, x = 'morning_revenue', bins = 20, kde = True,
stat = 'density', color = 'darkblue', edgecolor = 'black',
line_kws = {'linewidth': 4})

Kernel Density Estimation: Day Revenue
sns.histplot(data = df, x = 'day_revenue', bins = 20, kde = True,
stat = 'density', color = 'darkblue', edgecolor = 'black',
line_kws = {'linewidth': 4})
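A kernel density estimate places a small Gaussian bump on every observation and averages them, which gives a smooth version of the histogram. The sketch below implements a one-dimensional Gaussian KDE in plain NumPy, using Scott's rule of thumb for the bandwidth, $h = s \cdot n^{-1/5}$, which to our understanding is also seaborn's default rule. The sample values are made up (a right-skewed lognormal draw) and the function name is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Return (density evaluated at grid, bandwidth) for a Gaussian KDE of samples."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth in one dimension
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance between every grid point and every sample
    u = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Sum the kernels and rescale so the density integrates to 1
    return kernels.sum(axis=1) / (n * h), h

rng = np.random.default_rng(1)
revenue = rng.lognormal(mean=10, sigma=0.4, size=200)  # made-up, right-skewed values

spread = revenue.std()
grid = np.linspace(revenue.min() - 3 * spread, revenue.max() + 3 * spread, 2000)
density, bandwidth = gaussian_kde_1d(revenue, grid)

# A proper density should integrate to (approximately) 1 over a wide enough grid
dx = grid[1] - grid[0]
print(round(density.sum() * dx, 3))
```

Widening the bandwidth smooths the curve and narrowing it makes it follow individual observations more closely; seaborn exposes this scaling through its `bw_adjust` parameter.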

Correlation Exploration on Scatter Chart
sns.scatterplot(data = df, x = 'morning_revenue', y = 'day_revenue')
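The scatter chart can be summarized by a single number, the Pearson correlation coefficient $r$, available in pandas as `Series.corr`. For a simple linear regression with an intercept, the model's R-squared equals $r^2$, so the R-squared of 0.762 reported below corresponds to $r \approx \sqrt{0.762} \approx 0.873$. The sketch below checks that identity on synthetic data; `demo` is a hypothetical stand-in for `df` that reuses the article's column names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical stand-in for df with the article's column names
morning = rng.uniform(20_000, 80_000, size=100)
demo = pd.DataFrame({
    "morning_revenue": morning,
    "day_revenue": 10_000 + 8.0 * morning + rng.normal(0, 40_000, size=100),
})

# Pearson correlation between morning and end-of-day revenue
r = demo["morning_revenue"].corr(demo["day_revenue"])

# R-squared of the least-squares line: 1 - SSE / SST
slope, intercept = np.polyfit(demo["morning_revenue"], demo["day_revenue"], 1)
fitted = intercept + slope * demo["morning_revenue"]
sse = ((demo["day_revenue"] - fitted) ** 2).sum()
sst = ((demo["day_revenue"] - demo["day_revenue"].mean()) ** 2).sum()
r_squared = 1 - sse / sst

print(round(r, 3), round(r ** 2, 3), round(r_squared, 3))
```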

Linear Regression Model and Hypothesis Testing
linear_model = smf.ols('day_revenue ~ morning_revenue', data = df).fit()
linear_model.summary()
| Dep. Variable: | day_revenue | R-squared: | 0.762 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.759 |
| Method: | Least Squares | F-statistic: | 278.1 |
| Date: | Tue, 06 Apr 2021 | Prob (F-statistic): | 7.84e-29 |
| Time: | 08:57:08 | Log-Likelihood: | -1177.8 |
| No. Observations: | 89 | AIC: | 2360. |
| Df Residuals: | 87 | BIC: | 2365. |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 1.284e+04 | 3.5e+04 | 0.367 | 0.715 | -5.67e+04 | 8.24e+04 |
| morning_revenue | 8.4715 | 0.508 | 16.676 | 0.000 | 7.462 | 9.481 |

| Omnibus: | 11.027 | Durbin-Watson: | 1.177 |
|---|---|---|---|
| Prob(Omnibus): | 0.004 | Jarque-Bera (JB): | 13.963 |
| Skew: | 0.600 | Prob(JB): | 0.000929 |
| Kurtosis: | 4.526 | Cond. No. | 1.66e+05 |
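In this output, the test that matters is the one on the slope, whose null hypothesis is $\beta = 0$ (morning revenue has no linear effect on day revenue). Its t-statistic is the coefficient divided by its standard error ($8.4715 / 0.508 \approx 16.68$) and its p-value is reported as 0.000, so the null hypothesis is rejected at the usual 5% level. The intercept's p-value of 0.715 means it cannot be distinguished from zero. With a single predictor, the F-statistic is simply the slope's t-statistic squared ($16.676^2 \approx 278.1$). The NumPy sketch below recomputes the slope's standard error, t-statistic and F-statistic on synthetic data to show where these numbers come from; on the fitted statsmodels model the same quantities are available as `linear_model.bse`, `linear_model.tvalues`, `linear_model.pvalues` and `linear_model.fvalue`.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic data for illustration only
x = rng.uniform(20_000, 80_000, size=89)
y = 10_000 + 8.0 * x + rng.normal(0, 40_000, size=89)
n = x.size

# Least-squares slope and intercept
x_c = x - x.mean()
beta_hat = np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
fitted = alpha_hat + beta_hat * x
residuals = y - fitted

# Residual variance uses n - 2 degrees of freedom (two estimated parameters)
sigma2 = np.sum(residuals ** 2) / (n - 2)
se_beta = np.sqrt(sigma2 / np.sum(x_c ** 2))
t_beta = beta_hat / se_beta

# F-statistic: explained mean square over residual mean square (one predictor)
ssr = np.sum((fitted - y.mean()) ** 2)
f_stat = (ssr / 1) / sigma2

print(f"slope={beta_hat:.4f}, se={se_beta:.4f}, t={t_beta:.2f}, F={f_stat:.2f}")
```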
Parameters of Model
linear_model.params['Intercept']  # intercept (alpha)
>> 12836.998104752289
linear_model.params['morning_revenue']  # slope (beta)
>> 8.471494381108705
Model
Theoretical equation: $\text{Day Revenue} = \alpha + \beta \times \text{Morning Revenue}$
Fitted model: $\text{Day Revenue} = 12836.998105 + 8.471494 \times \text{Morning Revenue}$
Equation and Prediction
test_morning_revenue = 50000  # revenue observed by 9 am
predict_daily_revenue = 12836.998105 + (8.471494 * test_morning_revenue)
'Prediction: ' + str(round(predict_daily_revenue)) + ' USD'
>> 'Prediction: 436412 USD'
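Hard-coding the coefficients works for a single check, but a small helper makes it easy to score several mornings at once. The sketch below wraps the fitted equation in a function (the name `predict_day_revenue` is hypothetical) whose defaults are the coefficients from the model output above. With statsmodels, `linear_model.predict(pd.DataFrame({'morning_revenue': [50000]}))` gives the same point forecast directly from the fitted model, and `linear_model.get_prediction(...).summary_frame(alpha=0.05)` adds confidence and prediction intervals, which are worth checking before acting on a single number.

```python
import numpy as np

def predict_day_revenue(morning_revenue, intercept=12836.998105, slope=8.471494):
    """Point forecast of end-of-day revenue from the revenue observed by 9 am.

    The defaults are the intercept and slope reported in the model output.
    Accepts a single number or a sequence of numbers.
    """
    return intercept + slope * np.asarray(morning_revenue, dtype=float)

single = predict_day_revenue(50_000)
several = predict_day_revenue([30_000, 50_000, 70_000])
print(round(float(single)))  # 436412
print(np.round(several))
```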
Conclusion Notes
- This is only one example. You can construct many other hypotheses in the same way and build forecast models for end-of-day, end-of-month or end-of-year results from the values observed in previous calendar periods.
- By running the forecast model in the morning, digital marketers can estimate the day's end-of-day revenue early and plan that day's marketing effort and budget accordingly.