How to find any sequence of 3+ repeating non-alphabetic characters?


  • ernzdl Posts: 11
    edited December 4, 2018 11:01PM
    I came up with this:
    LEN(STR(@FIRST [email protected])) - LEN(REGEXP(STR(@FIRST [email protected]),"[^a-zA-Z]+",""))
    But this only counts how many non-alphabetic characters I have. I would like to find the ones that repeat more than 3 times. For example, "Eren$$$" is allowed, but "Eren$$$$" is not. I want to find the rows where the same non-alphabetic character repeats more than 3 times.
  • Hello Eren, 

    To find 3+ consecutive non-alphabetic characters, you can use the following regular expression: if(regexp(@[email protected] , ".*[/[^A-Za-z]]{3}.*", "true"),"true","false"). This flags values that contain consecutive non-alphabetic characters as true. Once this is done, you can apply a filter to find the rows where there are consecutive occurrences of non-alphabetic characters.
  • @ernzdl This should give you what you are looking for.

    Based on the image below, the regular expression identifies values that contain 2 or more repeated characters. For your use case, you would modify it to identify any values that have 4 repeated consecutive characters:

    regexp(@[email protected]), "(\\w)\\1{3,}","") 

    To identify the proper rows, we compare the length of the string before and after the regexp; if the lengths differ (meaning the value has repeated consecutive characters), we flag it as "Invalid".

    I hope this helps.
  • I would like to find non-alphabetic characters that are repeating more than 3 times.

    "aaaa" is valid.
    "+++" is valid.
    "????" is invalid.

    Based on your answers, I wrote this, but it still needs to be modified:

    if(len(regexp(@Test [email protected], "[^A-Za-z]\\1{3,}","")) <> len(@Test [email protected]),"invalid","valid")

    Can you help me to modify this?
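
    One thing that stands out in the expression above: the backreference \1 has no capturing group to refer to, because [^A-Za-z] is not wrapped in parentheses. Here is a minimal sketch of the same length-comparison idea in Python rather than Paxata, so the function name and the use of Python's re module are assumptions for illustration only. Whether the pattern behaves identically inside Paxata's regexp function has not been verified here.

```python
import re

# Same non-alphabetic character, four or more times in a row.
# The parentheses create group 1, which \1 then refers back to;
# \1{3,} means "three or more further copies of that character".
CONSECUTIVE_REPEAT = re.compile(r"([^A-Za-z])\1{3,}")


def flag_consecutive(value: str) -> str:
    """Hypothetical helper: compare lengths before and after removing runs."""
    stripped = CONSECUTIVE_REPEAT.sub("", value)
    return "invalid" if len(stripped) != len(value) else "valid"


for sample in ("aaaa", "+++", "????"):
    print(sample, flag_consecutive(sample))
```

    Letters are excluded by the character class, so "aaaa" is never captured, and a run of only three identical symbols is too short for the quantifier.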

  • ernzdl Posts: 11
    edited December 5, 2018 10:18PM
    Sorry for the confusion, but the repeats do not have to be consecutive.
    I want to find the rows in which the same non-alphabetic character appears more than 3 times.
    FirstName   Flag
    E?E!E+      valid
    E?E?E       valid
    E?E?E?      valid
    E?E?E?E?    invalid
    Eren????    invalid
    Eren!$%^    valid
    $$$$Eren    invalid
    $Er$$n      valid
    $$$Eren     valid
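
For the non-consecutive requirement, one possible approach is to capture a non-alphabetic character and then require three more copies of it anywhere later in the value. The sketch below is in Python, not Paxata; the function name is hypothetical, and carrying the pattern over to Paxata's regexp function is an untested assumption.

```python
import re

# Group 1 captures one non-alphabetic character. (?:.*?\1){3} then requires
# that same character three more times, with anything (or nothing) in between,
# for a total of four or more occurrences.
NON_CONSECUTIVE_REPEAT = re.compile(r"([^A-Za-z])(?:.*?\1){3}", re.DOTALL)


def flag_repeats(value: str) -> str:
    """Hypothetical helper: 'invalid' if any non-alphabetic character occurs 4+ times."""
    return "invalid" if NON_CONSECUTIVE_REPEAT.search(value) else "valid"


samples = ["E?E!E+", "E?E?E", "E?E?E?", "E?E?E?E?", "Eren????",
           "Eren!$%^", "$$$$Eren", "$Er$$n", "$$$Eren"]
for sample in samples:
    print(sample, flag_repeats(sample))
```

Because the gaps between copies may be empty, this pattern also covers the consecutive case such as "Eren????".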