Is it possible to only keep the most current version of a dataset?

I have an automated dataset that runs daily. I do not need the old versions. Is there a way to automate their removal, or to keep only the current version or the n most recent versions?

Answers

  • I'm looking for an automated way, not a manual method.
  • The best way would be a script that calls the REST API, using the DELETE action of the /library/data/DatasetID/version endpoint.

    This is how it would look in Python (the code should work if you save it as a Python file and replace the variables with your own values):
    import requests
    from requests.auth import HTTPBasicAuth
    
    # change the URL to whatever your Paxata environment is
    paxata_url_source = "https://datarobot.paxata.com"
    # generate a token from your Paxata instance
    paxata_rest_token = "d44278bdd12ab6d58a591a8a6fc6344f"
    # insert the datasetID you want to delete the previous versions of
    datasetID = "70c184cf8b43ab9812f8947c96249209"
    # build the authorization token (empty username, REST token as password)
    authorization_token = HTTPBasicAuth("", paxata_rest_token)
    
    # get how many versions there are for the datasetID
    url_request = paxata_url_source + "/rest/library/data/" + datasetID
    my_response = requests.get(url_request, auth=authorization_token)
    # stop here if the lookup failed, so number_of_versions is never left undefined
    my_response.raise_for_status()
    # the script takes the first entry of the response to be the latest version
    number_of_versions = my_response.json()[0].get('version')
    
    # start with version 1 and loop, removing every previous version until you reach the latest
    current_version = 1
    while current_version < number_of_versions:
        url_delete_request = paxata_url_source + "/rest/library/data/" + datasetID + "/" + str(current_version)
        my_response = requests.delete(url_delete_request, auth=authorization_token)
        current_version += 1

  • The easiest way to do this would be through the REST API, specifically using the DELETE call for the endpoint /library/data/datasetID/version.

    Below is some working Python code. If you paste it into a Python file and run it (after editing the variables), it should do what you're trying to achieve:
    import requests
    from requests.auth import HTTPBasicAuth
    
    # change the URL to whatever your Paxata environment is
    paxata_url_source = "https://datarobot.paxata.com"
    # generate a token from your Paxata instance
    paxata_rest_token = "8a6fc6349abd44278bdd12ab6d58a54f"
    # insert the datasetID you want to delete the previous versions of
    datasetID = "9812703e2c184cf8b4f8947c96249209"
    # build the authorization token (empty username, REST token as password)
    authorization_token = HTTPBasicAuth("", paxata_rest_token)
    
    # get how many versions there are for the datasetID
    url_request = paxata_url_source + "/rest/library/data/" + datasetID
    my_response = requests.get(url_request, auth=authorization_token)
    # stop here if the lookup failed, so number_of_versions is never left undefined
    my_response.raise_for_status()
    # the script takes the first entry of the response to be the latest version
    number_of_versions = my_response.json()[0].get('version')
    
    # start with version 1 and loop, removing every previous version until you reach the latest
    current_version = 1
    while current_version < number_of_versions:
        url_delete_request = paxata_url_source + "/rest/library/data/" + datasetID + "/" + str(current_version)
        my_response = requests.delete(url_delete_request, auth=authorization_token)
        current_version += 1
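
The question also asks about keeping the n most recent versions rather than only the latest. Here is a minimal sketch of that variation, assuming (as the script above does) that versions are numbered consecutively from 1 up to the latest. The names `versions_to_delete`, `delete_url` and `keep` are hypothetical and introduced only for illustration, and the host and dataset ID in the demo are placeholders. The sketch only works out which versions and URLs to target, so it makes no network calls.

```python
def versions_to_delete(latest_version, keep=1):
    """Return the version numbers to remove so only the newest `keep` remain.

    Assumes versions are numbered 1..latest_version with no gaps.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Versions 1 .. latest_version - keep are older than the ones we retain.
    return list(range(1, latest_version - keep + 1))


def delete_url(base_url, dataset_id, version):
    """Build the DELETE URL in the same shape the script above uses."""
    return base_url + "/rest/library/data/" + dataset_id + "/" + str(version)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for version in versions_to_delete(latest_version=10, keep=3):
        print(delete_url("https://example.paxata.com", "DATASET_ID", version))
```

Each URL produced this way would then be passed to requests.delete with the same auth object as in the script above. Keeping the selection logic separate from the HTTP calls makes it easy to check what would be deleted before running it against a real library.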
    

