Install Python Libraries

A Databricks cluster provides a preloaded set of Python libraries. In some cases, your Databricks administrator might determine that the workspace requires additional third-party libraries or modules. Install them with the pip installer for Databricks, run from an init script during cluster creation. For more information, see the Databricks documentation.
Perform the following tasks to install third-party Python libraries:
  1. Write an init script that includes the Python libraries to install.
  2. Upload the script to a DBFS directory. If you use AWS Databricks, you can upload the script to an S3 directory instead.
When you create an ephemeral cluster using a cluster workflow, include the init script file location in the advanced properties for the Create Cluster task.
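The first task above can be sketched in Python as a small helper that generates the init script text. This is a minimal illustration, not a definitive implementation. The function names, the package list, and the output file name are hypothetical. The pip location /databricks/python/bin/pip is an assumption about the cluster image, so confirm it against the Databricks documentation for your runtime version.

```python
import shlex


# Assumption: the cluster's Python environment exposes pip at this path.
# Verify the location for your Databricks runtime before relying on it.
PIP_PATH = "/databricks/python/bin/pip"


def build_init_script(packages):
    """Return the text of a bash init script that pip-installs the packages."""
    if not packages:
        raise ValueError("at least one package is required")
    # Quote each requirement so characters such as '<' or '>' in a version
    # specifier are not interpreted by the shell.
    quoted = " ".join(shlex.quote(p) for p in packages)
    lines = [
        "#!/bin/bash",
        "set -e",  # stop the script if the install fails
        f"{PIP_PATH} install {quoted}",
    ]
    return "\n".join(lines) + "\n"


def write_init_script(packages, path):
    """Write the generated script to a local file with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_init_script(packages))


if __name__ == "__main__":
    # Hypothetical package list and file name, for illustration only.
    write_init_script(["requests==2.31.0", "scikit-learn"], "install-libs.sh")
```

The helper only produces a local file. Uploading the file to DBFS or S3, and entering its location in the advanced properties of the Create Cluster task, remain separate steps as described above.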
For more information about the installed Python libraries that come with Databricks, refer to the Databricks documentation.

