I need to match all of these opening tags:
<p>
<a href="foo">
But not self-closing tags:
<br />
<hr class="foo" />
HTML is not a regular language: its nested, recursive structure cannot be reliably parsed with regular expressions (Regex). A true regular expression has no stack, so it cannot track arbitrarily deep nesting or check that tags are balanced. It also has no built-in awareness of context such as comments or quoted attribute values. Using Regex for this task leads to brittle code, logic that fails on edge cases, and potential security holes, for example when a Regex-based filter is used to sanitize untrusted input. Use a dedicated HTML/DOM parser library instead.
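A minimal sketch of the parser approach, assuming Python and its standard-library `html.parser` module. This is a tokenizing parser rather than a full DOM builder, but it is enough for this task. `HTMLParser` reports XHTML-style self-closing tags such as `<br />` through `handle_startendtag` and ordinary opening tags through `handle_starttag`, which maps directly onto the question. The class and function names below are hypothetical.

```python
from html.parser import HTMLParser


class OpeningTagCollector(HTMLParser):
    """Collects opening tags, skipping self-closing ones like <br />."""

    def __init__(self):
        super().__init__()
        self.opening_tags = []

    def handle_starttag(self, tag, attrs):
        # Called for ordinary opening tags such as <p> or <a href="foo">.
        self.opening_tags.append((tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Called for tags ending in "/>", such as <br /> or <hr class="foo" />.
        # The default implementation forwards to handle_starttag,
        # so override it with a no-op to skip self-closing tags.
        pass


def find_opening_tags(html):
    """Return (tag_name, attributes) for every non-self-closing opening tag."""
    parser = OpeningTagCollector()
    parser.feed(html)
    parser.close()
    return parser.opening_tags


sample = '<p> <a href="foo"> <br /> <hr class="foo" />'
for tag, attrs in find_opening_tags(sample):
    print(tag, attrs)
# p []
# a [('href', 'foo')]
```

Because the split is based on the trailing `/>` syntax the question cares about, an HTML5 void element written without the slash, such as `<br>`, is still reported as an opening tag. If that matters, filter on the tag name as well. Markup inside comments is never reported, since the parser routes it to `handle_comment` instead.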
I know it’s tempting to grab a Regex for a quick HTML fix, but trust me: don't do it. HTML is way too complex for Regex to understand. Even 'advanced' expressions can't handle the way HTML tags nest inside each other. If you try, you’ll end up with a mess of 'spaghetti code' that breaks the moment the input changes slightly. Save yourself the headache and use a tool built for the job, like a proper HTML parser.
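To make "breaks the moment the input changes slightly" concrete, here is a small comparison. It assumes a typical quick-fix pattern (a hypothetical one written for this illustration, not taken from the question) that matches a tag whose last character before `>` is not `/`. The pattern is run side by side with Python's standard-library `html.parser` on input containing a comment and a `>` inside a quoted attribute value.

```python
import re
from html.parser import HTMLParser

# Hypothetical "quick fix": a tag name, then anything except '>',
# as long as the character right before the closing '>' is not '/'.
NAIVE_OPEN_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*(?<!/)>")


class OpeningTagNames(HTMLParser):
    """Records the names of opening tags, ignoring self-closing ones."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # skip tags ending in "/>"


def regex_tags(html):
    """Return the raw text of every match of the naive pattern."""
    return [m.group(0) for m in NAIVE_OPEN_TAG.finditer(html)]


def parser_tags(html):
    """Return the names of opening tags as seen by html.parser."""
    parser = OpeningTagNames()
    parser.feed(html)
    parser.close()
    return parser.names


simple = '<p> <a href="foo"> <br /> <hr class="foo" />'
tricky = '<!-- <p> --><a title="x > y">link</a>'

print(regex_tags(simple))   # ['<p>', '<a href="foo">']
print(parser_tags(simple))  # ['p', 'a']
print(regex_tags(tricky))   # ['<p>', '<a title="x >']
print(parser_tags(tricky))  # ['a']
```

On the question's own examples both approaches agree. On the tricky input, the pattern picks up a `<p>` that only exists inside a comment and cuts the `<a>` tag off at the `>` in its attribute value. The parser handles both cases because it tracks comment and quoting context.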
Trying to parse HTML with Regex is the fastest way to lose your mind. You aren't just writing a bad pattern; you're inviting chaos into your codebase. HTML is a recursive labyrinth that Regex wasn't built to navigate. Every time you try to 'hack' a solution with a regular expression, a bug is born that will eventually crash your app and ruin your weekend. For the sake of your sanity and your security, put down the Regex and pick up a DOM library. The abyss is real, and it doesn't want your code.