Unwanted observations in my dataset: how do I remove them?
Removing unwanted observations, such as duplicate or irrelevant values, is straightforward in Paxata. Three tools cover the common cases:
- Deduplicate: use this when several rows represent the same record and you want to remove the duplicates. For example, if you have a housing model and your dataset has several records for the same address, the Deduplicate tool lets you remove those duplicates and condense the data into a single row. See the Deduplicate documentation for details.
- Remove Rows: use this when you want to remove unwanted rows of data from your dataset. For example, suppose a feature in your housing data includes values for single family homes and apartments, but you don't want the apartments in your data. In this case, you first use a Filtergram to select the "apartment" values, and then the Remove Rows tool to remove them. When you're ready to start removing those rows, see the official documentation for the Remove Rows tool.
- Columns Management tool: in Paxata, your variables are managed as columns in the dataset. The Columns Management tool lets you rename, remove, and reorder any columns in your dataset. Another Community article gives an overview of how to locate and use this tool, and the official Columns Management tool documentation covers the details.
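If it helps to see the logic behind the three tools, here is a rough sketch in plain Python. It is not Paxata code and does not call any Paxata API; the records, the column names, and the helper functions `deduplicate`, `remove_rows`, and `manage_columns` are all made up for illustration. It also assumes the simplest deduplication rule (keep the first row seen for each address), which may differ from how the Deduplicate tool condenses rows.

```python
# Hypothetical housing records; the column names are illustrative only.
rows = [
    {"address": "12 Oak St", "home_type": "single family", "price": 350000},
    {"address": "12 Oak St", "home_type": "single family", "price": 350000},
    {"address": "7 Elm Ave", "home_type": "apartment", "price": 180000},
    {"address": "3 Pine Rd", "home_type": "single family", "price": 410000},
]


def deduplicate(records, key):
    """Keep the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def remove_rows(records, column, unwanted):
    """Drop every record whose `column` equals `unwanted`."""
    return [r for r in records if r[column] != unwanted]


def manage_columns(records, order, rename=None):
    """Keep only the columns in `order`, in that order, renaming via `rename`."""
    rename = rename or {}
    return [{rename.get(c, c): r[c] for c in order} for r in records]


# 1. Deduplicate: one row per address.
deduped = deduplicate(rows, "address")

# 2. Filter, then remove rows: select the "apartment" values and drop them.
houses = remove_rows(deduped, "home_type", "apartment")

# 3. Columns: drop home_type, put price first, and rename it to sale_price.
cleaned = manage_columns(houses, ["price", "address"], rename={"price": "sale_price"})

print(cleaned)
```

The order of the steps mirrors the list above: duplicates go first, then the filtered rows, and the column cleanup happens last so that the filter can still use the `home_type` column before it is dropped.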