Linear Regression

Setting Up Linear Regression

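Note: Before fitting the model, make sure your training and test data (the train and test DataFrames used below) are already prepared. The predictor columns must be assembled into a single vector column named features, and the target must be in a numeric column named label.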
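The sketch below shows this layout in plain Python. Each row ends up with one features vector and one numeric label, and the rows are then split at random into training and test sets. In PySpark this step is usually done with pyspark.ml.feature.VectorAssembler (setting inputCols and outputCol="features"), followed by DataFrame.randomSplit. The helper names, the 80/20 split, and the sqft/rooms/price columns are hypothetical and only illustrate the shape of the data.

```python
import random

def assemble(rows, feature_cols, label_col):
    """Hypothetical helper: build one 'features' vector and one 'label' per row,
    mirroring the column layout LinearRegression expects."""
    return [
        {"features": [float(row[c]) for c in feature_cols], "label": float(row[label_col])}
        for row in rows
    ]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Hypothetical helper: seeded per-row random split,
    similar in spirit to DataFrame.randomSplit."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (test if rng.random() < test_fraction else train).append(row)
    return train, test

# Hypothetical raw records with two predictors and a target column "price"
raw = [{"sqft": 1000 + 10 * i, "rooms": 2 + i % 3, "price": 200 + 2 * i} for i in range(50)]
data = assemble(raw, ["sqft", "rooms"], "price")
train, test = train_test_split(data)
print(len(train), len(test))
```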

Load the required libraries

from pyspark.ml.regression import LinearRegression
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import RegressionEvaluator

Initialize a LinearRegression object

# Read the target from "label" and the assembled feature vector from "features"
lr = LinearRegression(labelCol="label", featuresCol="features")
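LinearRegression is regularized through two parameters, which the grid below tunes. regParam sets the overall strength (often written lambda), and elasticNetParam sets the L1/L2 mix (alpha). Spark's documentation describes the penalty as alpha * lambda * ||w||_1 + (1 - alpha) * lambda / 2 * ||w||_2^2. So alpha = 0 is ridge regression and alpha = 1 is lasso.

The sketch below evaluates that penalty for a given weight vector. The helper name elastic_net_penalty is hypothetical.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Hypothetical helper computing
    lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    l1 = sum(abs(w) for w in weights)   # ||w||_1
    l2_sq = sum(w * w for w in weights)  # ||w||_2 squared
    return reg_param * (elastic_net_param * l1 + (1 - elastic_net_param) / 2 * l2_sq)

# Same weights, same lambda, moving from pure L2 (ridge) to pure L1 (lasso)
for alpha in [0.0, 0.5, 1.0]:
    print(alpha, elastic_net_penalty([1.0, -2.0], 0.1, alpha))
```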

Create a parameter grid for tuning the model

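lrparamGrid = (ParamGridBuilder()
    # Regularization strength (lambda)
    .addGrid(lr.regParam, [0.001, 0.01, 0.1, 0.5, 1.0, 2.0])
    # .addGrid(lr.regParam, [0.01, 0.1, 0.5])  # smaller alternative
    # Elastic-net mixing: 0.0 = pure L2 (ridge), 1.0 = pure L1 (lasso)
    .addGrid(lr.elasticNetParam, [0.0, 0.25, 0.5, 0.75, 1.0])
    # .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])  # smaller alternative
    # Maximum number of optimizer iterations
    .addGrid(lr.maxIter, [1, 5, 10, 20, 50])
    # .addGrid(lr.maxIter, [1, 5, 10])  # smaller alternative
    .build())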
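The grid is the Cartesian product of the listed values. CrossValidator fits one model per combination per fold, then refits the best combination once on the full training set. The sketch below counts those fits for the grid above and for the smaller commented-out alternative, which shows why the smaller grid runs much faster.

```python
from itertools import product

reg_params = [0.001, 0.01, 0.1, 0.5, 1.0, 2.0]
elastic_net_params = [0.0, 0.25, 0.5, 0.75, 1.0]
max_iters = [1, 5, 10, 20, 50]
num_folds = 5

combinations = list(product(reg_params, elastic_net_params, max_iters))
fits_during_cv = len(combinations) * num_folds
# CrossValidator then refits the best combination once on the full training set
total_fits = fits_during_cv + 1
print(len(combinations), fits_during_cv, total_fits)  # 150 750 751

# The smaller commented-out grid
small_combinations = list(product([0.01, 0.1, 0.5], [0.0, 0.5, 1.0], [1, 5, 10]))
print(len(small_combinations), len(small_combinations) * num_folds)  # 27 135
```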

Define how you want the model to be evaluated

# Score predictions by root-mean-square error (lower is better)
lrevaluator = RegressionEvaluator(predictionCol="prediction", labelCol="label", metricName="rmse")
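With metricName="rmse", the evaluator reports the root-mean-square error, sqrt(mean((label - prediction)^2)). RMSE is in the same units as the label, and lower is better. The evaluator passes this to CrossValidator: isLargerBetter() returns False for RMSE.

A plain-Python version of the metric, for intuition only (the function name rmse is hypothetical):

```python
import math

def rmse(labels, predictions):
    """Root-mean-square error: sqrt(mean((label - prediction)^2))."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("need equal-length, non-empty inputs")
    squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))

# Errors of 1, 0 and -2 give sqrt(5 / 3)
print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))  # about 1.291
```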

Define the type of cross-validation you want to perform

# Create a 5-fold CrossValidator that tries every combination in lrparamGrid
lrcv = CrossValidator(estimator=lr,
                      estimatorParamMaps=lrparamGrid,
                      evaluator=lrevaluator,
                      numFolds=5)
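Conceptually, 5-fold cross-validation splits the training data into five parts. For each parameter combination, it trains on four parts and scores on the fifth, rotates through all five, and averages the metric.

Spark assigns rows to folds at random, so its fold sizes are only approximately equal. The sketch below instead deals shuffled row indices round-robin. The function name k_fold_indices is hypothetical.

```python
import random

def k_fold_indices(n_rows, num_folds=5, seed=0):
    """Hypothetical helper: randomly assign each row index to one fold and
    return one (train_indices, validation_indices) pair per fold."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin into num_folds folds
    folds = [indices[i::num_folds] for i in range(num_folds)]
    splits = []
    for k in range(num_folds):
        validation = sorted(folds[k])
        # Train on every fold except the k-th
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, validation))
    return splits

splits = k_fold_indices(n_rows=23, num_folds=5)
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Every row is used for validation exactly once, so each combination's averaged metric reflects all of the training data.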

Fit the cross-validated model to the training data

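lrcvModel = lrcv.fit(train)
print(lrcvModel)
# Mean RMSE per parameter combination, in lrparamGrid order; the lowest is the best
print("Best cross-validated RMSE:", min(lrcvModel.avgMetrics))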

Get information about the best model

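# Training summary of the best model (refit on the full training set)
lrcvSummary = lrcvModel.bestModel.summary
# Only available when the best model was fit with the "normal" solver; otherwise this may raise an error
print("Coefficient Standard Errors: " + str(lrcvSummary.coefficientStandardErrors))
print("P Values: " + str(lrcvSummary.pValues))  # Last element is the intercept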
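The summary arrays are easier to read when paired with names. The last element is the intercept when fitIntercept is true (the default), so each array has one more entry than there are features.

The sketch below makes these assumptions:
- You pass the coefficients as list(bestModel.coefficients) + [bestModel.intercept].
- You pass the feature names in the same order you assembled them.

The ratio coef / std_err is the t-statistic, which the summary also exposes as tValues. The helper label_summary and all numbers are hypothetical.

```python
def label_summary(feature_names, coefficients, std_errors):
    """Hypothetical helper: pair per-coefficient statistics with names.
    The intercept's entry is assumed to be the last element of each array."""
    names = list(feature_names) + ["intercept"]
    if len(names) != len(coefficients) or len(names) != len(std_errors):
        raise ValueError("expected one value per feature plus the intercept")
    return {
        name: {"coef": c, "std_err": se, "t": c / se}
        for name, c, se in zip(names, coefficients, std_errors)
    }

# Hypothetical values for illustration only
table = label_summary(["sqft", "rooms"], [0.5, 2.0, 10.0], [0.25, 4.0, 5.0])
for name, stats in table.items():
    print(name, stats)
```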

Score the test dataset with the fitted model for evaluation

# transform() uses the best model and adds a "prediction" column
lrpredictions = lrcvModel.transform(test)

Evaluate the model

print('RMSE:', lrevaluator.evaluate(lrpredictions))
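Note: The model returned by CrossValidator (a CrossValidatorModel) has two relevant parts:
- avgMetrics: the average cross-validation metric for every parameter combination.
- bestModel: the best combination, refit on the full training set.

When you call transform on it, as above, only bestModel is used, and evaluating those predictions therefore evaluates the best model. The individual models fit during cross-validation are kept only if you set collectSubModels=True on the CrossValidator.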
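The selection step works as follows. lrcvModel.avgMetrics lists one averaged metric per entry of lrparamGrid, in the same order. The best entry is the smallest for RMSE, or the largest for a metric such as r2.

The helper best_index, the parameter maps, and the metric values below are hypothetical.

```python
def best_index(avg_metrics, larger_is_better=False):
    """Hypothetical helper: index of the best average metric
    (first occurrence on ties)."""
    positions = range(len(avg_metrics))
    if larger_is_better:
        return max(positions, key=avg_metrics.__getitem__)
    return min(positions, key=avg_metrics.__getitem__)

# Hypothetical average RMSEs for three parameter combinations
param_maps = [{"regParam": 0.01}, {"regParam": 0.1}, {"regParam": 1.0}]
avg_rmse = [4.2, 3.9, 4.6]
i = best_index(avg_rmse)
print(param_maps[i], avg_rmse[i])  # {'regParam': 0.1} 3.9
```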