RegEx on a string for a match AND a match REGARDLESS of order

Regexes are one of the most seductive features of many languages. However, just because they're cool and sexy and look very powerful doesn't mean they're the right tool for every job. For something like this, a simple state machine suffices and is likely to be MUCH faster, since it makes a single pass over the string and looks at each character once. The code below finds the longest substring consisting only of c and g characters, and can easily be adapted to keep multiple substrings by adding each one to a collection.

    public class LongestCgRun
    {
        public static void main(String[] args)
        {
            String data = "acgtcgcgagagagggggcccataatggg";
            int longestPos = 0;  // Start of the longest c/g substring found so far
            int longestLen = 0;  // Length of that substring
            int p = -1;          // Start of the current c/g substring, or -1 if not in one
            for (int i = 0; i < data.length(); i++)
            {
                char c = data.charAt(i);
                if (c == 'c' || c == 'g')  // Is this the droid you're looking for?
                {
                    if (p == -1)  // Are we not yet in an interesting string?
                        p = i;    // If so, save the start position of this substring.
                }
                else  // Not a c or g
                {
                    // Are we in an interesting string longer than the previous longest?
                    if (p != -1 && i - p > longestLen)
                    {
                        longestPos = p;      // Save the starting position
                        longestLen = i - p;  // Save the length
                    }
                    p = -1;  // We're no longer inside an interesting string
                }
            }

            // Handle the case where the last substring was 'interesting'.
            // The loop variable i is out of scope here; the substring ends at data.length().
            int end = data.length();
            if (p != -1 && end - p > longestLen)
            {
                longestPos = p;          // Save the starting position
                longestLen = end - p;    // Save the length
            }

            System.out.printf("Longest string is at position %d for length %d%n",
                    longestPos, longestLen);
        }
    }
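The same state machine can be adapted to keep every interesting substring instead of only the longest, by adding each one to a collection when it ends. Below is a minimal sketch of that adaptation, assuming you want every maximal run of c/g characters in order of appearance; the class name `CgRuns` and the method `findRuns` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CgRuns
{
    // Hypothetical helper: returns every maximal run of 'c'/'g' characters in data,
    // in the order they appear, using the same "remember the start position" state machine.
    static List<String> findRuns(String data)
    {
        List<String> runs = new ArrayList<>();
        int p = -1;  // Start of the current run, or -1 when not inside one
        for (int i = 0; i < data.length(); i++)
        {
            char c = data.charAt(i);
            if (c == 'c' || c == 'g')
            {
                if (p == -1)
                    p = i;  // A new run starts here
            }
            else if (p != -1)
            {
                runs.add(data.substring(p, i));  // The run ended just before index i
                p = -1;
            }
        }
        if (p != -1)
            runs.add(data.substring(p));  // The last run extends to the end of the string
        return runs;
    }

    public static void main(String[] args)
    {
        System.out.println(findRuns("acgtcgcgagagagggggcccataatggg"));
    }
}
```

For the sample string this prints `[cg, cgcg, g, g, gggggccc, ggg]`; the longest entry, `gggggccc`, is the same substring the original loop reports by position and length. Swapping the `List` for another collection (or storing start/length pairs instead of copies) only changes the two `runs.add(...)` lines.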

For the canonical response to "let's use a regex where it does not apply", see this post.





© Copyright 2018 w3hello.com Publishing Limited. All rights reserved.