
Sparklyr2PMML

R library for converting Apache Spark ML pipelines to PMML.

Features

This package is a thin R wrapper for the JPMML-SparkML library.

Prerequisites

  • Apache Spark 3.0.X, 3.1.X, 3.2.X, 3.3.X, 3.4.X or 3.5.X.
  • R 3.3 or newer.

Installation

Install from GitHub using the devtools package:

library("devtools")

install_github("jpmml/sparklyr2pmml")

Configuration and usage

Sparklyr2PMML must be paired with a JPMML-SparkML version according to the following compatibility matrix:

Active development branches:

Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version
3.4.X                | 3.0.X                | 3.0.0
3.5.X                | master               | 3.1.0

Stale development branches:

Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version
3.0.X                | 2.0.X                | 2.0.6
3.1.X                | 2.1.X                | 2.1.6
3.2.X                | 2.2.X                | 2.2.6
3.3.X                | 2.3.X                | 2.3.5
3.4.X                | 2.4.X                | 2.4.4
3.5.X                | 2.5.X                | 2.5.3
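The compatibility matrix can also be encoded as a small lookup table, for example to derive the JPMML-SparkML package coordinate from the Apache Spark version in a setup script. This is a minimal sketch: jpmml_sparkml_versions and jpmml_sparkml_coordinate are hypothetical names, not part of Sparklyr2PMML, and the version numbers are copied from the tables above, so they will go stale as new releases appear. Spark 3.4.X and 3.5.X appear in both tables; the sketch prefers the active branch by default, which is a judgment call rather than a rule stated by the project.

```r
# Hypothetical lookup table: the compatibility matrix, keyed by the
# Apache Spark major.minor version. Values are copied from the README
# tables and will go stale as new JPMML-SparkML versions are released.
jpmml_sparkml_versions = list(
	active = c("3.4" = "3.0.0", "3.5" = "3.1.0"),
	stale = c("3.0" = "2.0.6", "3.1" = "2.1.6", "3.2" = "2.2.6",
		"3.3" = "2.3.5", "3.4" = "2.4.4", "3.5" = "2.5.3")
)

# Hypothetical helper: returns a Maven coordinate such as
# "org.jpmml:pmml-sparkml:3.1.0" for the given Apache Spark version.
jpmml_sparkml_coordinate = function(spark_version, module = "pmml-sparkml", prefer_active = TRUE) {
	# Reduce e.g. "3.5.1" to the "3.5" key used by the lookup table
	parts = strsplit(as.character(spark_version), ".", fixed = TRUE)[[1]]
	key = paste(parts[1:2], collapse = ".")
	# Search the active branches first unless told otherwise
	tables = if (prefer_active) jpmml_sparkml_versions else rev(jpmml_sparkml_versions)
	for (versions in tables) {
		if (key %in% names(versions)) {
			return(paste0("org.jpmml:", module, ":", versions[[key]]))
		}
	}
	stop("No JPMML-SparkML version listed for Apache Spark ", spark_version)
}
```

For instance, jpmml_sparkml_coordinate("3.5.1") yields "org.jpmml:pmml-sparkml:3.1.0", which could then be assigned to the sparklyr.connect.packages configuration option described below.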

Launch Sparklyr, using the sparklyr.connect.packages configuration option to specify the Maven coordinates of the relevant JPMML-SparkML modules:

  • org.jpmml:pmml-sparkml:${version} - Core module.
  • org.jpmml:pmml-sparkml-lightgbm:${version} - LightGBM extension module (via SynapseML).
  • org.jpmml:pmml-sparkml-xgboost:${version} - XGBoost extension module (via XGBoost4J-Spark).
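The module coordinates differ only in the artifact name, so loading the core module together with an extension module comes down to building one comma-separated list, which is the format spark-submit's --packages option expects. This sketch assumes, as the shared ${version} placeholder suggests, that the core and extension modules are used at the same JPMML-SparkML version, and that sparklyr forwards the sparklyr.connect.packages value to --packages; jpmml_sparkml_packages is a hypothetical helper, not part of Sparklyr2PMML.

```r
# Hypothetical helper: builds the comma-separated package list for the
# sparklyr.connect.packages option from one JPMML-SparkML version and
# one or more module (artifact) names.
jpmml_sparkml_packages = function(version, modules = "pmml-sparkml") {
	stopifnot(is.character(version), length(version) == 1)
	# Each module becomes a Maven coordinate "group:artifact:version"
	coordinates = paste0("org.jpmml:", modules, ":", version)
	# spark-submit's --packages option takes a single comma-separated string
	paste(coordinates, collapse = ",")
}

# Core module plus the XGBoost extension module, both at the same version
packages = jpmml_sparkml_packages("3.1.0", modules = c("pmml-sparkml", "pmml-sparkml-xgboost"))
print(packages)
```

The result would then be assigned with config[["sparklyr.connect.packages"]] = packages before calling spark_connect(). This only covers the JPMML-SparkML side; the XGBoost4J-Spark or SynapseML libraries that the extension modules build on presumably need to be available to Spark as well.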

Launching with the core module:

library("sparklyr")

# Replace ${version} with a JPMML-SparkML version from the compatibility matrix above
config = spark_config()
config[["sparklyr.connect.packages"]] = "org.jpmml:pmml-sparkml:${version}"

sc = spark_connect(master = "local", config = config)

Fitting a Spark ML pipeline:

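library("dplyr")
library("sparklyr")

data(iris)

# Copy the local iris data frame into Spark
iris_df = copy_to(sc, iris)

# RFormula feature preparation followed by a decision tree classifier
iris_pipeline = ml_pipeline(sc) %>%
	ft_r_formula(Species ~ .) %>%
	ml_decision_tree_classifier()

# Fit the pipeline on the Spark DataFrame
iris_pipeline_model = ml_fit(iris_pipeline, iris_df)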
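To see which feature columns the formula Species ~ . selects, base R's terms() can expand the dot against the local iris data frame. This is only a local analogy, not what Spark executes: Spark's RFormula transformer, which ft_r_formula() wraps, interprets . as "all columns except the label", and it additionally indexes the string label. Note also that copy_to() typically sanitizes column names in Spark (for example Sepal.Length becomes Sepal_Length), so the names differ while the selection is the same.

```r
# Local, base-R illustration of the formula used in the pipeline:
# expand the "." in Species ~ . against the columns of iris.
data(iris)

formula_terms = terms(Species ~ ., data = iris)

# Left-hand side is the label; the expanded term labels are the features
label = deparse(formula_terms[[2L]])
features = attr(formula_terms, "term.labels")

print(label)
print(features)
```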

Exporting the fitted Spark ML pipeline to a PMML file:

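library("sparklyr2pmml")

# Pass the Spark connection, the Spark DataFrame the pipeline was fitted on, and the fitted pipeline model
pmmlBuilder = PMMLBuilder(sc, iris_df, iris_pipeline_model)

# Write the PMML document to a local file
buildFile(pmmlBuilder, "DecisionTreeIris.pmml")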
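After buildFile() returns, a quick sanity check is to confirm that the output file exists and carries a PMML root element. The sketch below is a hypothetical, dependency-free helper (pmml_version is not part of Sparklyr2PMML) that reads the file and extracts the version attribute of the root PMML element with a regular expression; a real XML parser would be more robust, and the PMML version it reports depends on the JPMML-SparkML release in use.

```r
# Hypothetical helper: returns the version attribute of the root <PMML>
# element in a PMML file, or NA if no such attribute is found.
# Uses a regular expression instead of an XML parser to avoid dependencies.
pmml_version = function(path) {
	stopifnot(file.exists(path))
	text = paste(readLines(path, warn = FALSE), collapse = "\n")
	# Capture the value of version="..." inside the opening <PMML ...> tag
	groups = regmatches(text, regexec("<PMML[^>]*[[:space:]]version=\"([^\"]+)\"", text))[[1]]
	if (length(groups) < 2) NA_character_ else groups[2]
}
```

Called as pmml_version("DecisionTreeIris.pmml") after the export above, it should return the PMML schema version as a string, or NA if the file does not look like a PMML document.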

License

Sparklyr2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use Sparklyr2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes Sparklyr2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

Sparklyr2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io
