How do I do sampling on a specific subset of my dataset?

A customer asked about this in April 2017. She has a 45 million row dataset and wants to sample it, but only from the rows where Column Name = "Value".

To do this, first create a new dataset that is a subset of the 45 million row dataset, and then perform the sampling on that subset.

Step 1: Use a Filtergram to select the values that define the subset you want to sample. For example, to sample only California records, select all the data where HQ STATE = CA.



Step 2: Create and publish a Lens that stores this filtered view of the dataset so it can be reused.



Step 3: Once this dataset has been created, bring it into a new Paxata Project and use the Sampling tool.
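
For readers who think in code, the same filter-then-sample order can be sketched in plain Python. This is only an illustration of the logic, not Paxata's API: the function name sample_subset, the sample rows, and the seed parameter are all hypothetical, and Paxata's Sampling tool may select rows differently.

```python
import random


def sample_subset(rows, column, value, n, seed=0):
    """Keep only rows where row[column] == value, then draw up to n of them.

    Hypothetical helper for illustration; it mirrors the order of the steps
    above (filter first, sample second), not Paxata's internals.
    """
    # Steps 1 and 2: build the subset, as the Filtergram and Lens do.
    subset = [row for row in rows if row[column] == value]
    # Step 3: sample from the subset only, never from the full dataset.
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(subset, min(n, len(subset)))


# Made-up rows standing in for the 45 million row dataset.
rows = [
    {"NAME": "Acme", "HQ STATE": "CA"},
    {"NAME": "Borealis", "HQ STATE": "NY"},
    {"NAME": "Cobalt", "HQ STATE": "CA"},
    {"NAME": "Dune", "HQ STATE": "TX"},
    {"NAME": "Ember", "HQ STATE": "CA"},
]

sampled = sample_subset(rows, "HQ STATE", "CA", n=2)
```

Filtering before sampling guarantees that every sampled row satisfies the condition. Sampling the full dataset first and filtering afterwards would return an unpredictable number of matching rows.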
