Product Philosophy

E-commerce Daily Revenue Prediction with Python and Machine Learning

Optimize your intraday marketing actions with predictive analytics models for e-commerce projects. Predict end-of-day revenue with linear regression and make decisions ahead of your competitors.

Murat Ova

In this article, we examine the relationship between an e-commerce site's morning sales performance and its end-of-day sales performance, and build a model that predicts end-of-day sales from the morning figures.

Regression Models

What is regression?

Regression analysis expresses the relationship between one dependent variable and one or more independent variables as a mathematical equation. It quantifies how the variables are related and uses that equation as the basis for prediction models. Regression models fall into two broad groups: linear models and nonlinear models.

Linear Regression Models

What is linearity?

Linearity describes a relationship between two numerical variables in which a change in one is accompanied by a proportional change in the other. For example, during childhood height tends to increase as age increases, so the relationship between age and height can be treated as linear. The relationship can also run the other way: one variable may decrease as the other increases. As long as the change is proportional, the relationship is still linear, whether the variables affect each other positively or negatively. Regression also includes nonlinear models, but this article uses a linear model.
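A quick way to check that a proportional relationship counts as linear in either direction: for an exact linear relationship, Pearson's correlation coefficient is +1 when both variables rise together and -1 when one falls as the other rises. The sketch below uses made-up values and NumPy's `corrcoef`; it illustrates the idea and is not computed from this article's data.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)  # hypothetical values 1..10
y_up = 5 + 3 * x                   # rises proportionally as x rises
y_down = 100 - 4 * x               # falls proportionally as x rises

# Pearson correlation is +1 or -1 for an exact linear relationship
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]
print(round(r_up, 6), round(r_down, 6))  # → 1.0 -1.0
```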

Simple Linear Regression

A simple linear regression model is used to examine and explain the relationship between one dependent variable and one independent variable.

Formula:
$\tilde{y} = \alpha + \beta x + \epsilon$

$\tilde{y}$ = dependent variable
$\alpha$ = intercept (constant)
$\beta$ = slope (multiplier coefficient)
$x$ = independent variable
$\epsilon$ = error term

The $\beta$ coefficient is the key quantity in the equation: it shows how much $\tilde{y}$ changes for a one-unit change in $x$.
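To make the roles of $\alpha$ and $\beta$ concrete, here is a minimal sketch of the closed-form least-squares estimates for simple linear regression, written with NumPy only. The function name `fit_simple_ols` and the toy data are hypothetical; in the rest of the article statsmodels does this fitting.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Least-squares estimates of alpha and beta in y = alpha + beta * x + error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_c = x - x.mean()
    # beta: sample covariance of x and y over sample variance of x (the 1/n factors cancel)
    beta = np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)
    # alpha: forces the fitted line through the point (mean of x, mean of y)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

# Hypothetical toy data generated from alpha = 3, beta = 2 with no noise
x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]
alpha, beta = fit_simple_ols(x, y)
print(alpha, beta)  # → 3.0 2.0

def predict(v):
    return alpha + beta * v

# A one-unit increase in x changes the prediction by exactly beta
print(predict(4) - predict(3))  # → 2.0
```

The last line is the interpretation of $\beta$ stated above: moving $x$ by one unit moves the prediction by $\beta$.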

Import Libraries

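import numpy as np
import pandas as pd

import seaborn as sns
import statsmodels.formula.api as smf

# Fix the random seed so any random steps are reproducible
np.random.seed(777)
# Default figure size (in inches) for all seaborn plots
sns.set(rc={'figure.figsize':(11.7,8.27)})
# Jupyter-only magic: render high-resolution inline plots
%config InlineBackend.figure_format = 'retina'

# Hide library warnings to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')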

Load Data

In this model, we test the hypothesis that sales made between midnight and 9 am affect the day's total revenue at midnight. To do this, we create the appropriate segments in Google Analytics and export two separate columns: revenue up to 9 am and revenue up to midnight.

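# Reading .xlsx files with pandas requires the openpyxl package
df = pd.read_excel('Morning.xlsx')
# Drop the day index column from the export; it is not a model input
df = df.drop(columns=['Day Index'])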
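The article does not show the exported sheet, so the sketch below assumes it has the column names used later (`morning_revenue`, `day_revenue`) plus an optional `Day Index` column. The helper `prepare_revenue_frame` is hypothetical: it drops `Day Index` if present, coerces the revenue columns to numbers, removes incomplete rows, and, assuming `day_revenue` is the cumulative total up to midnight, rejects rows where it is lower than the morning figure.

```python
import pandas as pd

def prepare_revenue_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step for a morning/day revenue export."""
    # Drop the day index column only if the export contains it
    df = raw.drop(columns=['Day Index'], errors='ignore')
    expected = ['morning_revenue', 'day_revenue']
    missing = [c for c in expected if c not in df.columns]
    if missing:
        raise ValueError(f'missing columns: {missing}')
    # Coerce to numbers; unparseable or empty cells become NaN and are dropped
    df = df[expected].apply(pd.to_numeric, errors='coerce').dropna()
    # Assumption: day_revenue is cumulative to midnight, so it cannot be below morning_revenue
    bad = df['day_revenue'] < df['morning_revenue']
    if bad.any():
        raise ValueError(f'{int(bad.sum())} rows have day_revenue < morning_revenue')
    return df.reset_index(drop=True)

# Hypothetical toy export, not the article's data
raw = pd.DataFrame({
    'Day Index': ['2021-01-01', '2021-01-02', '2021-01-03'],
    'morning_revenue': [1000, '1500', None],
    'day_revenue': [9000, 14000, 12000],
})
df = prepare_revenue_frame(raw)
print(df.shape)  # → (2, 2)
```

In the notebook it would wrap the loaded sheet, for example `df = prepare_revenue_frame(pd.read_excel('Morning.xlsx'))`.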

Preview of the DataFrame

df.head()

Distributions and Descriptive Statistics

df.describe().T

Kernel Density Estimation: Morning Revenue

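# histplot replaces the deprecated distplot; stat='density' keeps the y-axis on
# the same density scale as the KDE curve
sns.histplot(df['morning_revenue'], bins=20, kde=True, stat='density',
             color='darkblue', edgecolor='black',
             line_kws={'linewidth': 4})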
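To show what the KDE curve on these plots represents, here is a minimal NumPy sketch of a one-dimensional Gaussian kernel density estimate: each observation contributes a normal "bump" of width `bandwidth`, and the bumps are averaged. The bandwidth defaults to Scott's rule (sample standard deviation times $n^{-1/5}$), the default bandwidth method in both SciPy's `gaussian_kde` and seaborn's `kdeplot`. The function name and sample values are hypothetical, and this illustrates the technique rather than reproducing seaborn's plot exactly.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated at each point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb for one dimension: h = std * n^(-1/5)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # Standard normal kernel, averaged over samples and rescaled by the bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / bandwidth

# Hypothetical sample; a density should integrate to about 1
samples = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
grid = np.linspace(-10, 20, 3001)
density = gaussian_kde_1d(samples, grid)
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))  # → 1.0
```

A larger `bandwidth` gives a smoother curve, a smaller one follows individual observations more closely.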

Kernel Density Estimation: Daily Revenue

# Same plot for the full-day revenue
sns.histplot(df['day_revenue'], bins=20, kde=True, stat='density',
             color='darkblue', edgecolor='black',
             line_kws={'linewidth': 4})

Correlation Exploration on Scatter Chart

sns.scatterplot(data = df, x = 'morning_revenue', y = 'day_revenue')
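The scatter chart can be backed by a number: pandas offers `df['morning_revenue'].corr(df['day_revenue'])`, which defaults to Pearson's r. In simple linear regression with an intercept, the model's R-squared equals r squared, so the R-squared of 0.762 reported for this model corresponds to r of about 0.87 (positive, since the slope is positive). The sketch below checks that identity on hypothetical generated data; the helper names are illustrative.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

def r_squared_simple_ols(x, y):
    """R-squared of a straight-line least-squares fit with an intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # coefficients come highest degree first
    residuals = y - (intercept + slope * x)
    return 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Hypothetical noisy data loosely shaped like morning vs. daily revenue
rng = np.random.default_rng(0)
x = rng.uniform(10_000, 60_000, size=50)
y = 13_000 + 8.5 * x + rng.normal(0, 40_000, size=50)

r = pearson_r(x, y)
print(np.isclose(r ** 2, r_squared_simple_ols(x, y)))  # → True
print(round(float(np.sqrt(0.762)), 3))                 # r implied by the article's R-squared
```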

Linear Regression Model and Hypothesis Testing

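# Fit an ordinary least squares model: day_revenue explained by morning_revenue
linear_model = smf.ols('day_revenue ~ morning_revenue', data = df).fit()
print(linear_model.summary())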

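                            OLS Regression Results
==============================================================================
Dep. Variable:            day_revenue   R-squared:                       0.762
Model:                            OLS   Adj. R-squared:                  0.759
Method:                 Least Squares   F-statistic:                     278.1
Date:                Tue, 06 Apr 2021   Prob (F-statistic):           7.84e-29
Time:                        08:57:08   Log-Likelihood:                -1177.8
No. Observations:                  89   AIC:                             2360.
Df Residuals:                      87   BIC:                             2365.
Df Model:                           1
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
Intercept        1.284e+04    3.5e+04      0.367      0.715   -5.67e+04    8.24e+04
morning_revenue     8.4715      0.508     16.676      0.000       7.462       9.481
==============================================================================
Omnibus:                       11.027   Durbin-Watson:                   1.177
Prob(Omnibus):                  0.004   Jarque-Bera (JB):               13.963
Skew:                           0.600   Prob(JB):                     0.000929
Kurtosis:                       4.526   Cond. No.                     1.66e+05
==============================================================================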
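The t statistic and the 95% confidence interval in the coefficient table follow from two formulas: $t = \text{coef} / \text{std err}$ and $\text{coef} \pm t_{0.975,\,87} \cdot \text{std err}$, where 87 is the residual degrees of freedom. The sketch below re-derives them from the values reported above. The critical value 1.9876 is hardcoded as an approximation (in practice it could come from `scipy.stats.t.ppf(0.975, 87)`), and because the table rounds the standard errors, the results match only to the printed precision.

```python
# Values taken from the regression output above; std err is rounded in the table
COEF = {'Intercept': 12836.998104752289, 'morning_revenue': 8.471494381108705}
STD_ERR = {'Intercept': 3.5e4, 'morning_revenue': 0.508}
# Assumed approximate two-sided 95% critical value of Student's t with 87 df
T_CRIT = 1.9876

def t_and_ci(coef, se, t_crit=T_CRIT):
    """Return the t statistic and the (low, high) confidence interval for one coefficient."""
    t = coef / se
    return t, (coef - t_crit * se, coef + t_crit * se)

results = {name: t_and_ci(COEF[name], STD_ERR[name]) for name in COEF}
for name, (t, (low, high)) in results.items():
    print(f'{name}: t = {t:.3f}, 95% CI = [{low:.4g}, {high:.4g}]')
```

The intercept's interval spans zero (p = 0.715), so the data do not distinguish it from zero, while the slope's interval stays well above zero; that is the evidence that morning revenue is related to daily revenue.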

Model Parameters

linear_model.params['Intercept']  # intercept (alpha)
>> 12836.998104752289
linear_model.params['morning_revenue']  # slope (beta)
>> 8.471494381108705

Model

Theoretical equation: $\text{DayRevenue} = \text{Intercept} + \text{Slope} \times \text{MorningRevenue}$

Fitted model: $\text{DayRevenue} = 12836.998105 + 8.471494 \times \text{MorningRevenue}$

Equation and Prediction

test_morning_revenue = 50000
# Plug the morning value into the fitted equation (coefficients rounded to 6 decimals)
predict_daily_revenue = 12836.998105 + (8.471494 * test_morning_revenue)
print('Prediction: ' + str(round(predict_daily_revenue)) + ' TL')
>> Prediction: 436412 TL
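Retyping rounded coefficients by hand is easy to get wrong, so a small helper that keeps the full-precision parameters in one place may be handier when predicting several days at once. This is a sketch: the constants are the values printed by `linear_model.params` in this article, the function name and morning figures are hypothetical, and in a live notebook the fitted statsmodels results object also provides its own `predict` method.

```python
import numpy as np

# Full-precision parameters printed by linear_model.params above
INTERCEPT = 12836.998104752289
SLOPE = 8.471494381108705

def predict_day_revenue(morning_revenue):
    """Apply the fitted line to a single morning revenue or an array of them."""
    return INTERCEPT + SLOPE * np.asarray(morning_revenue, dtype=float)

# Hypothetical morning figures for three days
mornings = [30_000, 50_000, 70_000]
print(np.round(predict_day_revenue(mornings)))
```

Passing 50000 gives the same rounded prediction as the hand calculation, 436412.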

Conclusion Notes

Keep yourself poet.
