

How do I do sampling on a specific subset of my dataset?

A customer asked about this in April 2017. She has a 45-million-row dataset and wants to sample it, but only from the rows where Column Name = "Value".

To do this, first create a new dataset that is a subset of the 45-million-row dataset, and then perform the sampling on that subset.

Step 1: Use a Filtergram to select the values that define the subset to be sampled. For example, select all the data where HQ STATE = CA.



Step 2: Create and publish a Lens that stores this filtered view of the dataset so that it can be reused.



Step 3: Once this dataset has been created, bring it into a new Paxata Project and use the Sampling tool.
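
For readers who think in code, the filter-then-sample idea behind the steps above can be sketched in plain Python. This is only an illustration of the concept and is not a Paxata API: the function name `sample_subset`, the `COMPANY` column, and the tiny stand-in dataset are all hypothetical.

```python
import random

def sample_subset(rows, column, value, sample_size, seed=0):
    """Keep rows where row[column] == value, then draw a random sample."""
    # Equivalent of Step 1: keep only the rows that match the filter.
    subset = [row for row in rows if row.get(column) == value]
    # Equivalent of Step 3: sample from the filtered subset, not the full dataset.
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    k = min(sample_size, len(subset))  # never ask for more rows than exist
    return rng.sample(subset, k)

# Tiny stand-in for the 45-million-row dataset (hypothetical values).
states = ["CA", "NY", "CA", "TX", "CA", "WA", "CA", "NY"]
rows = [
    {"COMPANY": f"Company {i}", "HQ STATE": state}
    for i, state in enumerate(states)
]

sample = sample_subset(rows, "HQ STATE", "CA", sample_size=2)
print(sample)
```

Filtering first matters because sampling the full dataset and filtering afterwards would leave an unpredictable number of matching rows in the sample.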
