Scale Data Science with Spark and R

sparklyr makes it easy to use Apache Spark from R, providing access to core functionality such as installing, connecting to, and managing Spark, and to Spark's MLlib, Structured Streaming, and Spark Pipelines from R.
To connect to a local cluster, install R and Java 8, then run:
# Run once
install.packages("sparklyr")
sparklyr::spark_install()

# Connect to Spark local
library(sparklyr)
sc <- spark_connect(master = "local")

# Disconnect from Spark
spark_disconnect(sc)
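Once connected, Spark DataFrames can be manipulated from R using familiar dplyr verbs, which are translated to Spark SQL and executed in the cluster. A minimal sketch, assuming the dplyr package is installed (it is not part of the snippet above):

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Copy a built-in R data frame into Spark
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# dplyr verbs run inside Spark; collect() brings the result back to R
mtcars_tbl %>%
  filter(cyl == 4) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

spark_disconnect(sc)
```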
To connect to any other Spark cluster:
# Connect to Hadoop YARN
sc <- spark_connect(master = "yarn")

# Connect to Mesos
sc <- spark_connect(master = "mesos://host:port")

# Connect to Kubernetes
sc <- spark_connect(master = "k8s://https://server")

# Connect to Apache Livy
sc <- spark_connect(master = "http://server/livy", method = "livy")
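Cluster connections can also be tuned through a `spark_config()` object passed to `spark_connect()`. A sketch of overriding driver and executor memory; the "4G" values are illustrative assumptions, not defaults:

```r
library(sparklyr)

# Build a configuration object and override resource settings
conf <- spark_config()
conf$`sparklyr.shell.driver-memory`   <- "4G"  # illustrative value
conf$`sparklyr.shell.executor-memory` <- "4G"  # illustrative value

sc <- spark_connect(master = "yarn", config = conf)
```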
To connect through specific distributions, cloud providers, and tools, use the following resources: