Delta Lake Clones: A Systematic Approach to Testing and Sharing Data

  1. Creation of Testing Environment: Creating a test database in a staging environment for development and testing is usually a tedious process. It takes a large amount of time, and if the process fails partway through, it has to be restarted from scratch. In some cases, views, triggers, constraints, or the table schema may not be copied at all. Cloning is very beneficial here because it reduces the overall time and effort this process requires.
  2. Backup Database for Disaster Recovery: For disaster recovery (DR), data governance, and auditing, production tables can be cloned in a few lines of code instead of creating a separate database instance.
  3. Sharing/Archiving Data: Production data can easily be shared with other teams through a clone, without worrying that their changes will alter the source data.
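The three use cases above map onto the two clone types described in the Databricks documentation: a shallow clone copies only the table metadata and keeps referencing the source table's data files, while a deep clone also copies the data files. The following is a minimal sketch, not the author's method, that builds the corresponding `CREATE TABLE ... CLONE` statements. The helper `build_clone_sql` and all table names (`prod.orders`, `staging.orders_test`, `dr.orders_backup`, `archive.orders_v42`) are hypothetical; the syntax follows the Databricks SQL reference for `CLONE`, and which clone types are available depends on the Delta Lake or Databricks version you run.

```python
from typing import Optional


def build_clone_sql(
    target: str,
    source: str,
    shallow: bool = False,
    version: Optional[int] = None,
    timestamp: Optional[str] = None,
    replace: bool = False,
    if_not_exists: bool = False,
) -> str:
    """Hypothetical helper: return a Delta Lake CREATE TABLE ... CLONE statement.

    Table names are interpolated as-is (no quoting or escaping), so only pass
    trusted identifiers.
    """
    if version is not None and timestamp is not None:
        raise ValueError("use either version or timestamp, not both")
    if replace and if_not_exists:
        raise ValueError("replace and if_not_exists are mutually exclusive")

    head = "CREATE OR REPLACE TABLE" if replace else "CREATE TABLE"
    if if_not_exists:
        head += " IF NOT EXISTS"
    # SHALLOW copies metadata only; DEEP (the default) also copies data files.
    kind = "SHALLOW" if shallow else "DEEP"
    sql = f"{head} {target} {kind} CLONE {source}"

    # Optional time travel: clone the source as of a past version or timestamp.
    if version is not None:
        sql += f" VERSION AS OF {version}"
    elif timestamp is not None:
        sql += f" TIMESTAMP AS OF '{timestamp}'"
    return sql


# 1. Test environment: a cheap shallow clone of production.
print(build_clone_sql("staging.orders_test", "prod.orders", shallow=True))
# CREATE TABLE staging.orders_test SHALLOW CLONE prod.orders

# 2. DR backup: a deep clone that can be re-run to refresh the copy.
print(build_clone_sql("dr.orders_backup", "prod.orders", replace=True))
# CREATE OR REPLACE TABLE dr.orders_backup DEEP CLONE prod.orders

# 3. Archiving: a deep clone pinned to a specific table version.
print(build_clone_sql("archive.orders_v42", "prod.orders", version=42))
# CREATE TABLE archive.orders_v42 DEEP CLONE prod.orders VERSION AS OF 42
```

On Databricks, each generated statement would be executed with `spark.sql(...)`. Using `CREATE OR REPLACE` for the DR copy is a deliberate choice: according to the Databricks documentation, re-running a deep clone into an existing target copies only the files that changed since the previous clone, which keeps periodic backups cheap.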
[Figure: Shallow clone creation]
[Figure: S3 contents of the shallow-cloned table]
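The S3 listing of a shallow-cloned table is expected to show only a `_delta_log` directory and no Parquet files of its own, because, as we understand it, the clone's commit records point at the source table's data files by absolute path. The following is a minimal sketch that replays the JSON commit files of a `_delta_log` directory (for example, one copied locally with `aws s3 cp --recursive`) to list the data files the latest version references, and flags those recorded as absolute URIs. The function names are hypothetical. The sketch assumes the early JSON commits have not been removed after a checkpoint (checkpoint Parquet files are ignored), and it handles only `add` and `remove` actions.

```python
import json
from pathlib import Path
from typing import Iterable, List
from urllib.parse import urlparse


def replay_add_paths(commit_texts: Iterable[str]) -> List[str]:
    """Replay add/remove actions from Delta commit files (given in version order)
    and return the data-file paths still referenced by the latest version."""
    live = {}  # dict used as an insertion-ordered set of paths
    for text in commit_texts:
        for line in text.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # each line of a commit holds one action
            if "add" in action:
                live[action["add"]["path"]] = None
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
            # metaData, protocol, commitInfo, etc. are ignored here.
    return list(live)


def live_data_files(delta_log_dir: str) -> List[str]:
    """Read every NNNN.json commit in a local _delta_log directory.
    Assumes no checkpoint has replaced the early JSON commits."""
    commits = sorted(
        p for p in Path(delta_log_dir).glob("*.json") if p.stem.isdigit()
    )  # zero-padded file names sort in version order
    return replay_add_paths(p.read_text() for p in commits)


def is_external(path: str) -> bool:
    """True if the path carries a URI scheme (s3://..., dbfs:/...), i.e. it is
    absolute and may point outside the table root; relative paths resolve
    under the table's own directory."""
    return bool(urlparse(path).scheme)
```

For a shallow clone, one would expect `is_external` to be true for every path returned by `live_data_files`, while a deep clone's own commits would record relative paths under its own directory.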
