Incremental Refresh / Appending / Stacking Data in Paxata


Quick question. I was wondering if Paxata has the capability to perform an incremental refresh. I receive a set of Excel files every month and would like to stack them on a monthly basis. This would be considered an incremental refresh. Unfortunately, I haven't figured out a way to complete this task in Paxata.

It doesn't seem Paxata supports this. I tried creating a standard Excel file and then adding a version, but that didn't work. I then tried creating a foundation dataset plus a second dataset set to automate (new data), and appending the second onto the foundation dataset within a project, hoping the project would keep all the appended data. No dice there either.

Any thoughts?


  • Hello,
    I may need a bit of clarification in order to answer your question: are you importing the Excel files locally or from a shared area like SharePoint?  Paxata does not support automating local file imports into the Library (we don't want the server reaching out to individual desktops), but it does support automated import from enterprise sources (SFTP, SharePoint, S3, WASB, databases, etc.).  If you load the latest version of the Excel file into the Library, either on demand for a local file or on a schedule for an enterprise source, you can create a project that starts with the original file and appends the updated file.  When there is a new updated file in the Library, the "refresh datasets" button at the bottom of the Steps panel in the project turns green.  You can select this option and then choose to update any datasets in the project to their latest version.  You can also set automation options to use "latest version".  Does this help?  Please let me know if I can provide additional clarification.

  • Morning Martha,

    I am importing the Excel files locally; I receive them via email. Is there a way to do it with a local Excel file?
  • No, it is not possible to automate the import of local files from your desktop into the Paxata Library.  If you click on a dataset in the Library, you do have the option to add a version. This keeps the Library simpler, because each new file shows up as a version of one dataset rather than as a separate, distinct dataset.
  • Interesting. Would it work from a shared drive? I could load it to a network share.

  • Akshay Posts: 111 admin
    edited March 27, 2019 9:25PM
    Hello Ychamb,

    Paxata does support the Network Share (SMB) connector, so automation can be done if the files are on a network share. I hope this helps! 

  • Is it possible to have an incremental refresh from an Enterprise data source such as a Hive table or SQL Server database? 
  • marthamiller_SE Posts: 12 mod
    edited May 10, 2019 2:25PM
    Import can be based on either the entire table or a query.  If you use a query, you can include a condition in the WHERE clause, based on the current day or some other flag, that captures only the updated rows.  You can then automate this import to run on a schedule.  I hope this helps!
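
As a rough illustration of the query-based approach described above, here is a minimal Python sketch using the standard-library `sqlite3` module in place of a real Hive or SQL Server source. The `sales` table, its columns, and the `fetch_incremental` helper are all hypothetical names chosen for the example; they are not part of Paxata, and the exact query syntax for your source database may differ.

```python
import sqlite3


def fetch_incremental(conn, since):
    """Return only the rows updated on or after `since` (ISO date string).

    The WHERE clause is what makes the import incremental: rows older
    than the cutoff are never pulled.
    """
    query = (
        "SELECT id, amount, updated_on FROM sales "
        "WHERE updated_on >= ? ORDER BY id"
    )
    return conn.execute(query, (since,)).fetchall()


# Hypothetical source table, built in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, updated_on TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 10.0, "2019-04-30"),
        (2, 20.0, "2019-05-10"),
        (3, 30.0, "2019-05-10"),
    ],
)

# Only the two rows stamped 2019-05-10 pass the cutoff.
rows = fetch_incremental(conn, "2019-05-10")
print(rows)
```

In a scheduled import the cutoff would typically come from the current date or from a "last loaded" marker rather than a hard-coded string.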

  • Hey Martha,  we came up with a question related to the start of this thread. Is there any documentation that explains how to incrementally append data to a dataset?  This becomes circular at some point and we're having trouble figuring it out.  It seems like you would have to be able to modify a current dataset with a project without having the project create a new dataset?
  • sayyar Posts: 25 admin
    edited October 10, 2019 8:11PM
    Hi @bella21,
    Here's one way to accomplish this in Paxata:
    1. Import version 1 of the dataset
    2. Create a project with the dataset imported in step 1
    3. Perform necessary actions like shaping, computed columns, etc. 
    4. Add a lens, and publish the output of the lens to the Library
    5. Add a new version of the dataset imported in step 1
    6. Edit the steps to add an Append step just before the lens you published
    7. In the Append step, select the lens output you published in step 4 to bring it back into the project
    8. Use the refresh datasets button to update the dataset you started with to its latest version (the version imported in step 5 will replace the version imported in step 1)
    9. If the incremental data tends to contain duplicates, add a Deduplicate step to address them
    10. Automate the project
    11. Select "Use Latest Version" for both input datasets
    Please let me know if you need additional help with this based on the specific use-case, Aaron.  

    Shyam Ayyar
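
To make the circular flow in the steps above easier to follow, here is a small Python sketch of the same idea outside Paxata: each run appends the newest input version to the previously published output, deduplicates, and publishes the result for the next run. The `refresh` function, the `id` key, and the sample rows are assumptions made for illustration; Paxata's own Append and Deduplicate steps may behave differently (for example, in which duplicate they keep).

```python
def refresh(published, new_version, key):
    """Append the new input version to the previously published output,
    then drop duplicate keys, keeping the most recently appended row."""
    merged = {}
    for row in published + new_version:
        merged[row[key]] = row  # a later row overwrites an earlier one
    return list(merged.values())


published = []  # nothing has been published before the first run

# Two monthly input versions; id 2 appears in both.
january = [{"id": 1, "month": "Jan"}, {"id": 2, "month": "Jan"}]
february = [{"id": 2, "month": "Feb"}, {"id": 3, "month": "Feb"}]

# Each run feeds the last published output back in alongside the new data.
published = refresh(published, january, key="id")
published = refresh(published, february, key="id")

print([row["id"] for row in published])
```

The point of the sketch is that the published output is both the result of one run and an input to the next, which is why both inputs are set to use their latest version.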
  • Hi @bella21 ,

    To add to @sayyar response, please let me know if it helps to have a quick Zoom session and build a working prototype. I will be happy to set it up.

    With Best Regards
    Sudheer Kumar

  • Thanks @sayyar,

    That seems like a pretty challenging multi-tiered process for a simple stacking. However, thanks for figuring this out. Instead of doing it that way with multiple steps, I created a historical master view to be refreshed as the foundation, then stacked it with new data pulled from SharePoint.

    On a side note, other programs have the ability to recognize changes in data per a specific data element. Maybe this would be a great idea to institute in Paxata?

  • @ychamb
    I am glad that you are able to perform this in another manner as well. Would you please share the steps you took to accomplish it so that other users will also benefit? 

    Thank you for the feedback on the ability to recognize changes in data. We will look into adding the capability. 

    Shyam Ayyar

  • Hey morning @sayyar,

    As mentioned above.

    I created a historical master view to be refreshed as the foundation, then stacked it with new data pulled from SharePoint.

    Basically, I automated the historical/foundational piece, then automated the SharePoint data, and created a project that stacks them together.

    The manual part: when my SharePoint data reaches its record limit, I have to export half the records, paste them into the historical master, and delete them manually from SharePoint (to eliminate dupes).

  • I have a series of 7 files I append together on a daily basis to populate a dashboard.  At the end of the month I want to append these files together, add a timestamp for the current date, and append the result to a history file.  My current history file resides on SharePoint, but security does not allow us to write from Paxata to SharePoint.  Is there a way to manage the history file solely in Paxata?  I currently have 2 months of history and want to automate the process going forward.