


Unwanted observations in my dataset: how do I remove them?

Melanie Posts: 70 admin
Removing unwanted observations, such as duplicate or irrelevant values, is simple in Paxata. Three tools cover most cases:
  • Deduplicate: when you have multiple rows for the same data and want to remove the duplicates. For example, if you have a housing model and your dataset has several records for the same address, then the Deduplicate tool allows you to remove those duplicates and condense the data into a single row. See the Deduplicate documentation for details.

  • Remove Rows: when you want to remove unwanted rows of data from your dataset, the Remove Rows tool is the one for you. For example, you have a feature for housing data that includes values for single family homes and apartments, but you don't want the apartments in your data. In this example, you'll first use a Filtergram to select the "apartment" values, and then the Remove Rows tool to remove them. When you're ready to start removing those rows, see the official documentation for the Remove Rows tool.

  • Columns management tool: in Paxata, your variables are managed as columns in the dataset. The Columns management tool allows you to rename, remove and reorder any columns in your dataset. We have another Community article on this topic that gives you an overview of how to locate and use this tool. The official documentation covers it as well: Columns management tool.
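
For readers who want to see the logic behind these three operations spelled out, here is a minimal sketch in plain Python. It is not Paxata's API (Paxata does all of this through its UI), and every name in it (`deduplicate`, `remove_rows`, `manage_columns`, the sample `homes` data) is hypothetical. It also assumes that deduplicating means keeping the first row seen per address, which may differ from how the Deduplicate tool condenses rows.

```python
from typing import Any, Dict, List, Optional


def deduplicate(rows: List[Dict[str, Any]], key_columns: List[str]) -> List[Dict[str, Any]]:
    """Keep the first row seen for each combination of key column values."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_rows(rows: List[Dict[str, Any]], column: str, unwanted: Any) -> List[Dict[str, Any]]:
    """Drop every row whose value in `column` equals `unwanted`."""
    return [row for row in rows if row[column] != unwanted]


def manage_columns(
    rows: List[Dict[str, Any]],
    order: List[str],
    rename: Optional[Dict[str, str]] = None,
) -> List[Dict[str, Any]]:
    """Keep only the columns in `order`, in that order, applying any renames."""
    rename = rename or {}
    return [{rename.get(column, column): row[column] for column in order} for row in rows]


# Made-up housing records: one duplicated address and one apartment.
homes = [
    {"address": "12 Oak St", "type": "single family", "price": 310000, "notes": ""},
    {"address": "12 Oak St", "type": "single family", "price": 310000, "notes": "dup"},
    {"address": "4 Elm Ave", "type": "apartment", "price": 150000, "notes": ""},
    {"address": "9 Pine Rd", "type": "single family", "price": 275000, "notes": ""},
]

cleaned = deduplicate(homes, ["address"])            # one row per address
cleaned = remove_rows(cleaned, "type", "apartment")  # drop the apartments
cleaned = manage_columns(                            # drop "notes", reorder, rename
    cleaned, ["address", "price", "type"], rename={"type": "home_type"}
)
```

The order of the steps mirrors the list above: collapse duplicates first, filter out the unwanted category second, and tidy the columns last.
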
If you're not sure where or how to begin looking for duplicates or irrelevant values in your data, we recommend our Community article on Exploratory Data Analysis: histograms to help you better understand your data. You can also check out our official docs on Filtergrams, the tool of choice for understanding exactly what's buried in your variables.
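
As a rough illustration of the kind of summary that helps you spot duplicates and irrelevant values, the sketch below counts how often each value appears in a column. It is plain Python, not a Paxata feature, and the helper names (`value_counts`, `duplicated_keys`) and the sample `listings` data are made up for this example.

```python
from collections import Counter
from typing import Any, Dict, List


def value_counts(rows: List[Dict[str, Any]], column: str) -> Counter:
    """Count how many rows carry each distinct value in `column`."""
    return Counter(row[column] for row in rows)


def duplicated_keys(rows: List[Dict[str, Any]], column: str) -> Dict[Any, int]:
    """Return only the values in `column` that appear more than once."""
    return {value: count for value, count in value_counts(rows, column).items() if count > 1}


# Made-up housing records for illustration.
listings = [
    {"address": "12 Oak St", "type": "single family"},
    {"address": "12 Oak St", "type": "single family"},
    {"address": "4 Elm Ave", "type": "apartment"},
    {"address": "9 Pine Rd", "type": "single family"},
]

type_counts = value_counts(listings, "type")            # which categories exist, and how many rows each
repeat_addresses = duplicated_keys(listings, "address")  # candidates for deduplication
```

Here `type_counts` shows whether an unwanted category such as "apartment" is present at all, and `repeat_addresses` lists the addresses worth a closer look before removing anything.
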
