
Paxata Backup Basics

In this article, you will learn about the basics of Paxata backup tasks.


Three components require backup to protect against data loss on the running servers:
  1. ​Metadata Storage (MongoDB)
  2. Data Library Storage (HDFS)
  3. Properties Files (particularly pes.properties)
Notably, Pipeline cache files on executors do not need to be backed up: if the cache is lost, it is rebuilt automatically on the next retrieval.

Basic Tools

For each component, many backup tools are available. Here we recommend the most basic tools that can accomplish the backup on their own; more advanced tools may offer better reliability and manageability.

Metadata Storage (MongoDB)

mongodump --out /tmp/mongobackup_`date +"%m-%d-%y"`
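A backup is only useful if it can be restored. A minimal restore sketch, assuming a dump directory created by the command above (the guard around mongorestore is just so the sketch degrades gracefully where the MongoDB tools are not installed):

```shell
# Rebuild the date-stamped path used by the mongodump command above
BACKUP_DIR="/tmp/mongobackup_$(date +"%m-%d-%y")"

if command -v mongorestore >/dev/null 2>&1; then
  # --drop replaces existing collections with the backed-up versions
  mongorestore --drop "$BACKUP_DIR"
else
  echo "mongorestore not installed; skipping restore"
fi
```

In practice you would point mongorestore at whichever dated dump you want to roll back to, not necessarily today's.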


Data Library Storage (HDFS)

DistCp lets you copy a directory from HDFS to another cluster or to an S3 bucket.

hadoop distcp hdfs://CDH5-nameservice/user/paxata/library s3a://bucket/librarybackup
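For scheduled backups, a full copy every run is wasteful; DistCp's -update flag copies only files that differ between source and target. A sketch using the same paths as above (the guard simply skips the copy where the hadoop CLI is unavailable):

```shell
# Source and target from the full-copy example above
SRC=hdfs://CDH5-nameservice/user/paxata/library
DST=s3a://bucket/librarybackup

if command -v hadoop >/dev/null 2>&1; then
  # -update skips files whose size/checksum already match the target
  hadoop distcp -update "$SRC" "$DST"
else
  echo "hadoop CLI not available; skipping distcp"
fi
```

Restoring is the same command with source and target reversed.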


Cloudera BDR is an enterprise solution built on DistCp.

Properties Files (particularly pes.properties)

Upload the files from the server's local file system to an S3 bucket:

cd /usr/local/paxata/server/config
aws s3 sync . s3://bucket/propertybackup
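To restore the properties files, sync in the opposite direction. A sketch using the same bucket and config path as above; the --dryrun flag previews what would be copied back without changing any files, which is worth doing before overwriting a live config directory:

```shell
# Config directory and backup bucket from the commands above
CONFIG_DIR=/usr/local/paxata/server/config

if command -v aws >/dev/null 2>&1; then
  # Preview the restore first; drop --dryrun to actually copy files
  aws s3 sync "s3://bucket/propertybackup" "$CONFIG_DIR" --dryrun
else
  echo "aws CLI not available; skipping restore preview"
fi
```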
