Optimize Regex to extract content between two tags (or How to select content between two tags with Jsoup selector API?)

You can achieve the same with Jsoup's CSS selector syntax:


h2:has(span#JDK_contents) ~
*:not(h2:has(span#Ambiguity_between_a_JDK_and_an_SDK) ~ *):not(h2)



For clarity, let's call h2Start an h2 element containing at least one span with id JDK_contents, and h2End an h2 element containing at least one span with id Ambiguity_between_a_JDK_and_an_SDK.

h2:has(span#JDK_contents)  /* Select an h2Start */
~ *                        /* Select any sibling element preceded by this h2Start... */
:not(h2:has(span#Ambiguity_between_a_JDK_and_an_SDK) ~ *) /* ...but not preceded by an h2End */
:not(h2)                   /* Remove h2End itself */

Note: For the JDK wiki page, the last line, :not(h2), is enough, since the only h2 left in the selection at that point is h2End. To be more rigorous, replace it with :not(h2:has(span#Ambiguity_between_a_JDK_and_an_SDK)), which excludes h2End and nothing else.
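
Here is a minimal sketch of how the selector could be used from Java. It assumes Jsoup is on the classpath. The class name BetweenHeadings, the helper tagsBetween, and the sample markup are made up for illustration; only the two span ids come from the wiki page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BetweenHeadings {
    // The selector from this answer, joined into a single string.
    static final String SELECTOR =
        "h2:has(span#JDK_contents) ~ "
        + "*:not(h2:has(span#Ambiguity_between_a_JDK_and_an_SDK) ~ *)"
        + ":not(h2)";

    // Hypothetical markup that only mimics the layout of the wiki page.
    static final String SAMPLE_HTML =
        "<div>"
        + "<p>intro</p>"
        + "<h2><span id=\"JDK_contents\">JDK contents</span></h2>"
        + "<p>first paragraph</p>"
        + "<ul><li>item</li></ul>"
        + "<h2><span id=\"Ambiguity_between_a_JDK_and_an_SDK\">Ambiguity</span></h2>"
        + "<p>after</p>"
        + "</div>";

    // Returns the tag names of the elements sitting between h2Start and h2End.
    static List<String> tagsBetween(String html) {
        Document doc = Jsoup.parse(html);
        List<String> tags = new ArrayList<>();
        for (Element e : doc.select(SELECTOR)) {
            tags.add(e.tagName());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Only the siblings between the two headings should be listed.
        System.out.println(tagsBetween(SAMPLE_HTML));
    }
}
```

On this sample, the intro paragraph, both h2 headings, and the paragraph after h2End are left out. The nested li is not selected either, because ~ only matches siblings of h2Start.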

