How can I find the start and end of a regex match using a python pandas dataframe?

I don't think this exists in pandas, but it would be a great addition. Go to https://github.com/pydata/pandas/issues and open a new issue, explaining that it's an enhancement you'd like to see.

As for the .start() and .end() methods, those probably make more sense as proposed kwargs to the extract() method. With str.extract(pat, start_index=True), it would return a Series or DataFrame of start indexes rather than the values of the capture groups. The same goes for end_index=True. The two would probably need to be mutually exclusive.
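Until something like that exists, here is a rough sketch of getting the indexes with the standard re module. The match_span helper, the sample strings, and the start/end column names are all made up for illustration; only the first match per row is reported, and rows with no match get a missing value.

```python
import re

import pandas as pd


def match_span(text, pattern):
    """Hypothetical helper: (start, end) of the first match, or (None, None)."""
    m = re.search(pattern, text)
    return (m.start(), m.end()) if m else (None, None)


# Sample data, assumed for this example only.
df = pd.DataFrame({"string": ["abc123def", "no digits", "x9"]})
pattern = r"\d+"

spans = [match_span(s, pattern) for s in df["string"]]

# Nullable Int64 keeps the indexes as integers even when a row has no match.
df["start"] = pd.Series([s for s, _ in spans], index=df.index, dtype="Int64")
df["end"] = pd.Series([e for _, e in spans], index=df.index, dtype="Int64")
```

This is plain Python under the hood, so it won't be any faster than a lambda with apply, but it keeps the no-match case explicit.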

I also like your suggestion of

df.sliced = df.string[df.start:df.end]

Pandas already has a str.slice method:

df["sliced"] = df["string"].str.slice(1, -1)

But the start and stop arguments have to be plain ints, so every row gets the same slice. Open a separate issue on GitHub asking for str.slice to accept Series objects and apply them element-wise.
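In the meantime, per-row slicing can be sketched with a list comprehension over the three columns. The column names and sample values here are assumptions for the example, and it presumes start and end hold integers with no missing values.

```python
import pandas as pd

# Sample data, assumed for this example only.
df = pd.DataFrame({
    "string": ["abc123def", "x9", "hello"],
    "start": [3, 1, 0],
    "end": [6, 2, 2],
})

# Slice each string with its own row's bounds.
df["sliced"] = [
    s[a:b] for s, a, b in zip(df["string"], df["start"], df["end"])
]
```

It does the same thing as a lambda with apply(axis=1), just without building a row object for every record.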

Sorry not to have a better solution than your lambda hack, but it's use cases like this that help drive pandas to get better.




