Model Evaluation

Multiclass classification evaluator
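The snippet below evaluates the predictions generated by the classification models from the earlier pages (here assumed to be the DataFrames lrpredictions, nbpredictions, and rfpredictions, each containing prediction and label columns). For each model it reports accuracy from the DataFrame-based MulticlassClassificationEvaluator alongside weighted precision, recall, and F1 from the RDD-based MulticlassMetrics.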

from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.mllib.evaluation import MulticlassMetrics

# DataFrame-based evaluator (its default metric is "f1", so request "accuracy" explicitly)
evaluator = MulticlassClassificationEvaluator(predictionCol="prediction",
                                              labelCol="label",
                                              metricName="accuracy")

for model in ["lrpredictions", "nbpredictions", "rfpredictions"]:

    # Look up the predictions DataFrame by name
    df = globals()[model]

    # Compute raw (prediction, label) pairs on the test set
    predictionAndLabels = (df.select("prediction", "label")
                             .rdd.map(lambda row: (float(row.prediction), float(row.label))))

    # Instantiate the RDD-based metrics object
    metrics = MulticlassMetrics(predictionAndLabels)

    # Overall statistics (weighted averages across labels)
    precision = metrics.weightedPrecision
    recall = metrics.weightedRecall
    f1Score = metrics.weightedFMeasure()
    print("Summary Stats for:", model)
    # print(metrics.confusionMatrix())
    print("Accuracy = %s" % evaluator.evaluate(df))
    print("Precision = %s" % precision)
    print("Recall = %s" % recall)
    print("F1 Score = %s" % f1Score)

    # Other weighted stats
    # print("Weighted F(0.5) Score = %s" % metrics.weightedFMeasure(beta=0.5))
    # print("Weighted false positive rate = %s" % metrics.weightedFalsePositiveRate)
    print("\n")