If you're using regex.h in your code, you can search for the HTML tags instead. For each match, regexec() fills in a regmatch_t with the start and end offsets of the match (rm_so and rm_eo), so the length of the match is rm_eo - rm_so. From those offsets, you can grab the portions of the string that weren't matched.
This pattern matches a tag, the text that follows it, and the next tag (the <[^>]+> part on its own matches a single tag):
<[^>]+>[^<]+<[^>]+>
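Whether that catches tags that span multiple lines depends on the flags you pass to regcomp(). With REG_NEWLINE set, a negated bracket expression such as [^>] will not match a newline, so a tag broken across lines is missed. Without that flag, [^>] matches newlines too.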
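As a rough sketch of the approach above, here is one way to loop over the matches and keep only the unmatched text. The helper name strip_matches is made up for illustration and is not part of regex.h. It assumes POSIX extended syntax (REG_EXTENDED) and leaves REG_NEWLINE off so that tags spanning lines are still caught.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of regex.h): returns a newly allocated
 * copy of src with every match of `pattern` removed, or NULL on failure.
 * The caller is responsible for free()ing the result. */
char *strip_matches(const char *src, const char *pattern)
{
    regex_t re;
    /* REG_NEWLINE is deliberately not set, so [^>] can match newlines. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    /* The output can never be longer than the input. */
    char *out = malloc(strlen(src) + 1);
    if (out == NULL) {
        regfree(&re);
        return NULL;
    }

    size_t used = 0;
    const char *cursor = src;
    regmatch_t m;
    int eflags = 0;

    while (regexec(&re, cursor, 1, &m, eflags) == 0) {
        /* rm_so and rm_eo are offsets relative to cursor. */
        memcpy(out + used, cursor, (size_t)m.rm_so); /* text before the match */
        used += (size_t)m.rm_so;
        cursor += m.rm_eo;                           /* skip past the match */
        if (m.rm_eo == m.rm_so)
            break;                                   /* empty match: stop to avoid looping forever */
        eflags = REG_NOTBOL;                         /* cursor is no longer at the start of the string */
    }

    /* Copy whatever is left after the last match. */
    size_t rest = strlen(cursor);
    memcpy(out + used, cursor, rest);
    out[used + rest] = '\0';

    regfree(&re);
    return out;
}
```

Calling strip_matches(text, "<[^>]+>") removes each tag individually and keeps the text between them. Passing the longer pattern from above instead removes the enclosed text along with the tags.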