
How do I do sampling on a specific subset of my dataset?

A customer asked about this in April 2017. She has a 45 million row dataset and wants to sample it, but only from the rows where Column Name = "Value".

To do this, first create a new dataset that is a subset of the 45 million row dataset, and then perform the sampling on that subset.

Step 1: Use a Filtergram to select the values that define the subset to be sampled. For example, select all the data where HQ STATE = CA.

Step 2: Create and publish a Lens that stores this view of the dataset for reusability.

Step 3: Once this dataset has been created, bring it into a new Paxata Project and use the Sampling tool.
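
For readers who want to see the filter-then-sample logic outside the Paxata UI, here is a minimal sketch in plain Python. It is not Paxata code and does not use any Paxata API; the function name `sample_subset`, the example rows, and the fixed seed are all assumptions made for illustration. It only mirrors the order of operations in the steps above: filter to the subset first, then sample from it.

```python
import random


def sample_subset(rows, column, value, n, seed=0):
    """Keep only rows where row[column] == value, then draw up to n of them at random.

    Hypothetical helper for illustration only; not part of Paxata.
    """
    # Steps 1 and 2 analogue: build the filtered view of the data.
    subset = [row for row in rows if row.get(column) == value]
    # Step 3 analogue: sample from the filtered view, not from the full dataset.
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(subset, min(n, len(subset)))


# Made-up example rows standing in for the large dataset.
rows = [
    {"COMPANY": "A", "HQ STATE": "CA"},
    {"COMPANY": "B", "HQ STATE": "NY"},
    {"COMPANY": "C", "HQ STATE": "CA"},
    {"COMPANY": "D", "HQ STATE": "TX"},
    {"COMPANY": "E", "HQ STATE": "CA"},
]

sampled = sample_subset(rows, "HQ STATE", "CA", n=2)
print(sampled)
```

Filtering before sampling guarantees that every sampled row satisfies the condition. Sampling the full dataset first and filtering afterwards would return an unpredictable number of matching rows.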

