Home Paxata Fundamentals

Paxata Community Members: Something special in a community experience is coming your way. Stay tuned to this space.
In the meantime, check out the brand new Data Prep for Data Science topic here and the new DataRobot Community.

Visit the official Paxata Documentation portal for all of your doc needs.

Joining two datasets using multiple join conditions

Joining two datasets on multiple conditions in Paxata – [email protected]

1.  Create the project and select dataset 1:

2. Select “attach” -> “lookup” and select the dataset

3. First I am going to join on “NAME” and “CUSTOMER NAME”:

4. Next, I am going to perform a second lookup and select the same dataset:


5. This time I am joining on “PHONE” and “TELEPHONE”:

6. In my “SOURCES” column, I can see all 4 possibilities, where the NAME joined, where nothing joined, where PHONE joined and where both NAME AND PHONE joined. Depending on what the desired output is will depend on the next steps, however I am going to assume we want to keep all joins, I am going to now reduce the number of columns (created by joining the same dataset twice).

7. Using a filtergram on the Sources, I select the data I want to export. (Note the blank values in the screenshot)

8. Using the ability to “hide” columns (top right of the screen)

9. I publish this data with 23/34 columns in view.

10. I now DELETE or MUTE the most recent join step (I am going to MUTE this in this example). Note how I now only have 23 columns again (at the bottom of the steps editor):

11. Finally, I am going to APPEND the dataset I just exported

12. I align the columns using the drop down menu.

13. I now have a dataset which is joined on an “or” condition on NAME or PHONE.

Sign In or Register to comment.