Stripping HTML Special Characters from a String
When creating an RSS feed, you need to remove HTML tags and HTML entities so that the content stays valid and readable. While strip_tags() effectively removes tags, it leaves HTML entities such as &nbsp; and &amp; untouched.
To address this issue, there are two potential solutions:
html_entity_decode():
This function decodes HTML entities and replaces them with their corresponding characters. For instance, &nbsp; would be converted to a non-breaking space.
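As a minimal sketch of the decoding approach, the snippet below strips the tags first and then decodes what is left. The sample string and variable names are made up for illustration, and it assumes UTF-8 content on PHP 7 or later.

```php
<?php
// Hypothetical feed snippet; names and content are illustrative only.
$html = '<p>Fish&nbsp;&amp;&nbsp;chips &copy; 2022</p>';

// Remove the tags first, then turn the remaining entities into characters.
$text = strip_tags($html);
$text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// &nbsp; decodes to U+00A0 (a non-breaking space), not an ASCII space,
// so replace it explicitly if plain spaces are wanted.
$text = str_replace("\u{00A0}", ' ', $text);

echo $text, PHP_EOL; // Fish & chips © 2022
```

Note that the decoded text can contain a raw & or <. Before writing it into the feed XML it would likely need escaping again, for example with htmlspecialchars() and the ENT_XML1 flag.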
preg_replace():
Using regular expressions, preg_replace() allows you to remove specific sequences of characters. The following pattern matches and removes HTML special characters:
/&#?[a-z0-9]+;/i
This pattern matches sequences that start with &, optionally followed by #, then one or more letters or digits, and end with a semicolon.
To implement this solution:
$content = preg_replace("/&#?[a-z0-9]+;/i", "", $content);
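The following sketch shows what the removal does to a sample string. The input and the variable names ($removed, $spaced) are hypothetical. It also shows one possible variation, replacing each entity with a space, which is not part of the original suggestion.

```php
<?php
// Hypothetical input: tags are already gone, entities remain.
$content = 'Fish&nbsp;and&nbsp;chips &#169; 2022';

// Delete anything shaped like an entity: "&", optional "#", letters/digits, ";".
$removed = preg_replace('/&#?[a-z0-9]+;/i', '', $content);

// Variation (an assumption, not from the original answer): replace each
// entity with a space, then collapse runs of whitespace into one space.
$spaced = preg_replace('/&#?[a-z0-9]+;/i', ' ', $content);
$spaced = preg_replace('/\s+/', ' ', $spaced);

echo $removed, PHP_EOL; // Fishandchips  2022
echo $spaced, PHP_EOL;  // Fish and chips 2022
```

Because removal drops the character entirely, words separated only by &nbsp; run together. Whether that matters depends on the feed content.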
Jacco's Alternative:
Another option, as suggested by Jacco in the comment section, is to use the following pattern:
/&#?[a-z0-9]{2,8};/i
This pattern only matches entities whose name or number is 2 to 8 characters long, reducing the risk of accidentally removing text that follows an unencoded & character in an ordinary sentence.
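To see the difference between the two patterns, the sketch below runs both against a made-up sentence that contains an unencoded & followed by a semicolon. The variable names $loose and $strict are illustrative labels, not terms from the original discussion.

```php
<?php
$loose  = '/&#?[a-z0-9]+;/i';      // any length between "&" and ";"
$strict = '/&#?[a-z0-9]{2,8};/i';  // only 2 to 8 characters

// Hypothetical sentence: "R&D;" contains an unencoded "&", "&gt;" is a real entity.
$sentence = 'R&D; spending &gt; sales';

$withLoose  = preg_replace($loose, '', $sentence);
$withStrict = preg_replace($strict, '', $sentence);

echo $withLoose, PHP_EOL;  // R spending  sales
echo $withStrict, PHP_EOL; // R&D; spending  sales
```

The length limit is still a heuristic. Plain text such as "Q&As;" would match the stricter pattern too, so it reduces false matches rather than eliminating them.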