Identifying Patterns

ernzdl Posts: 13
edited March 5, 2019 11:05PM in Data Prep Q&A
How do I find out whether a dataset of telephone numbers contains entries that have the same phone number but different area codes?
For example, "+1 415 123 12 12" vs "+90 415 123 12 12".


  • Akshay Posts: 110 admin

    Hello Eren,

    The solution to your problem is as follows:

    Step 1: Start the project with the phone number dataset.

    Now there are two ways to go about this:

    To check whether a phone number has more than one country code associated with it:

    Step 2: Use the shape function in Paxata to perform a deduplicate and a group by operation.

    Step 2.1: Perform a deduplicate on Country code, Phone number to remove any repeated occurrence of the same country code and phone number pair.

    Step 2.2: Do a group by on Phone number and a count on Country code.

    Step 3: Use an if statement via the compute feature to check whether the count in this column is greater than 1. If it is, that phone number appears with more than one country code.
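
    For readers who want to check the logic outside Paxata, the steps above can be sketched in plain Python. This is only an illustration and not Paxata syntax: the helper names (`split_number`, `numbers_with_multiple_country_codes`) are hypothetical, the sample rows other than the two from the question are made up, and the split assumes the country code is the first space-separated token of each number.

```python
from collections import defaultdict


def split_number(raw):
    """Split '+1 415 123 12 12' into ('+1', '4151231212').

    Assumption: the country code is the first space-separated token.
    """
    country_code, _, rest = raw.strip().partition(" ")
    return country_code, rest.replace(" ", "")


def numbers_with_multiple_country_codes(raw_numbers):
    # Step 2.1: deduplicate (country code, phone number) pairs.
    pairs = {split_number(raw) for raw in raw_numbers}

    # Step 2.2: group by phone number and collect the distinct country codes.
    codes_by_number = defaultdict(set)
    for country_code, number in pairs:
        codes_by_number[number].add(country_code)

    # Step 3: keep only phone numbers whose country code count exceeds 1.
    return {
        number: sorted(codes)
        for number, codes in codes_by_number.items()
        if len(codes) > 1
    }


# Hypothetical sample data; the first two rows come from the question.
sample = [
    "+1 415 123 12 12",
    "+90 415 123 12 12",
    "+1 415 123 12 12",  # exact repeat, removed by the deduplicate step
    "+1 212 555 01 00",
]

result = numbers_with_multiple_country_codes(sample)
print(result)
```

    Deduplicating before counting matters here: without it, a number that is simply listed twice with the same country code would be counted twice and flagged by mistake.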

    I hope this answers your question!




  • Thank you so much Akshay!