How to remove special characters from a string in JavaScript
Removing special characters from strings is crucial for data sanitization, URL slug generation, filename cleaning, and implementing features like username validation or search query normalization in JavaScript applications.
With over 25 years of experience in software development and as the creator of CoreUI, I’ve implemented special character removal extensively in components like form validation, file upload systems, and search functionality where clean, standardized text improves data quality and user experience.
In my experience, the most powerful and flexible solution is the replace() method with a regular expression that targets specific character patterns.
This approach provides precise control over which characters to remove and handles complex filtering scenarios efficiently.
Use replace() with a regular expression to remove special characters from a string.
const text = 'Hello@#$% World!!!'
const cleaned = text.replace(/[^a-zA-Z0-9\s]/g, '')
// Result: 'Hello World'
The replace() method with the regular expression /[^a-zA-Z0-9\s]/g removes all characters that are not ASCII letters, digits, or whitespace. The ^ immediately after the opening bracket creates a negated character class, meaning it matches anything NOT in the specified ranges. The g flag makes it global, replacing all occurrences rather than just the first. In this example, the special characters @, #, $, %, and ! are removed, leaving only 'Hello World'.
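To make the effect of the g flag concrete, here is a minimal sketch comparing a global and a non-global replace on the same input. `removeSpecialChars` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: keep only ASCII letters, digits, and whitespace
const removeSpecialChars = (text) => text.replace(/[^a-zA-Z0-9\s]/g, '')

const input = 'Hello@#$% World!!!'

// Without the g flag, replace() stops after the first match
const firstOnly = input.replace(/[^a-zA-Z0-9\s]/, '')

console.log(firstOnly)                 // 'Hello#$% World!!!'
console.log(removeSpecialChars(input)) // 'Hello World'
```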
1. Keeping Specific Special Characters
Often you need to remove most special characters but keep certain ones like hyphens, underscores, or periods. Customize the character class to whitelist exactly what you need.
// Keep hyphens and underscores
const username = 'john_doe-123!@#'
const cleanUsername = username.replace(/[^a-zA-Z0-9_-]/g, '')
// Result: 'john_doe-123'
// Keep periods (for email-like strings)
const email = 'john.doe+news@example.com!!!'
const cleanEmail = email.replace(/[^a-zA-Z0-9.@_-]/g, '')
// Result: 'john.doenews@example.com' (the + was removed along with the !!!)
// Be explicit about every character you want to allow
const cleanEmailFull = email.replace(/[^a-zA-Z0-9.@+_-]/g, '')
// Result: 'john.doe+news@example.com'
// Keep only digits and decimal point (for price inputs)
const priceInput = '$1,234.56 USD'
const numericPrice = priceInput.replace(/[^0-9.]/g, '')
// Result: '1234.56'
// Keep only digits (for phone numbers)
const phone = '+1 (555) 123-4567'
const digitsOnly = phone.replace(/\D/g, '')
// Result: '15551234567'
Place hyphens at the start or end of the character class ([-a-z] or [a-z-]) to avoid ambiguity — a hyphen between characters like [a-z] defines a range, while [-az] matches a literal hyphen, a, or z. This is the approach we use in CoreUI form validation to sanitize user input before processing.
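When the allowed characters come from configuration instead of a hard-coded literal, one option is to build the class with the RegExp constructor and escape the characters that are special inside brackets. `buildWhitelistRegex` is a hypothetical helper, and the escaping below assumes the regex is created with only the g flag, as shown.

```javascript
// Hypothetical helper: returns a global regex that matches every character
// EXCEPT ASCII letters, digits, and the characters listed in `allowed`.
const buildWhitelistRegex = (allowed) => {
  // Escape \ ] ^ - so they are always literal inside the class;
  // an unescaped hyphen between two characters would define a range.
  const escaped = allowed.replace(/[\\\]^-]/g, '\\$&')
  return new RegExp(`[^a-zA-Z0-9${escaped}]`, 'g')
}

const usernamePattern = buildWhitelistRegex('_-')
console.log('john_doe-123!@#'.replace(usernamePattern, '')) // 'john_doe-123'

// Unescaped, '+-.' would be the range + through . (which includes the comma)
const signedNumberPattern = buildWhitelistRegex('+-.')
console.log('-1,000.50'.replace(signedNumberPattern, '')) // '-1000.50'

const bracketPattern = buildWhitelistRegex('[]')
console.log('arr[0]!'.replace(bracketPattern, '')) // 'arr[0]'
```

The signed-number call shows why escaping matters: without it, '+-.' silently becomes a range that also keeps commas, which is exactly the ambiguity described above.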
2. Handling International Characters with Unicode
The basic /[^a-zA-Z0-9]/g pattern only whitelists ASCII letters and digits, so it strips accented characters like é, ñ, and ü. Use Unicode property escapes for proper international support.
const internationalText = 'Café résumé naïve über Straße'
// ASCII-only regex — REMOVES accented characters
const asciiOnly = internationalText.replace(/[^a-zA-Z0-9\s]/g, '')
// Result: 'Caf rsum nave ber Strae' (broken!)
// Unicode property escape — keeps all letters from any language
const unicodeSafe = internationalText.replace(/[^\p{L}\p{N}\s]/gu, '')
// Result: 'Café résumé naïve über Straße' (correct)
// \p{L} matches any Unicode letter (Latin, Cyrillic, CJK, etc.)
// \p{N} matches any Unicode number
// The 'u' flag enables Unicode mode
// Works with non-Latin scripts too
const mixed = 'Hello 你好 مرحبا 🎉 #trending!'
const cleanMixed = mixed.replace(/[^\p{L}\p{N}\s]/gu, '')
// Result: 'Hello 你好 مرحبا  trending' (two spaces remain where 🎉 was)
// Keep letters, numbers, and common punctuation
const keepPunctuation = mixed.replace(/[^\p{L}\p{N}\p{P}\s]/gu, '')
// Result: 'Hello 你好 مرحبا  #trending!' (🎉 is a symbol, not punctuation, so it is still removed)
Unicode property escapes (\p{L}, \p{N}) require the u flag and are supported in all modern browsers and Node.js 10+. Always use this approach instead of \w when handling international text — the \w shorthand is equivalent to [A-Za-z0-9_] and does not include accented or non-Latin characters.
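A related gap, offered as a suggestion rather than part of the pattern above: \p{L} does not cover combining marks (\p{M}). Text that arrives in decomposed form (for example 'e' followed by U+0301 COMBINING ACUTE ACCENT), and scripts such as Devanagari whose vowel signs are combining marks, lose those marks under /[^\p{L}\p{N}\s]/gu. A minimal sketch, assuming you want to keep them, normalizes to NFC and adds \p{M} to the whitelist. `keepInternational` is a hypothetical name.

```javascript
// Hypothetical helper: keep letters (\p{L}), combining marks (\p{M}),
// numbers (\p{N}), and whitespace from any script. normalize('NFC') first
// composes sequences like 'e' + U+0301 into a single 'é' where possible.
const keepInternational = (text) =>
  text.normalize('NFC').replace(/[^\p{L}\p{M}\p{N}\s]/gu, '')

// 'Café!' in decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT
const decomposed = 'Cafe\u0301!'

// \p{L} alone treats the combining accent as a special character and drops it
console.log(decomposed.replace(/[^\p{L}\p{N}\s]/gu, '')) // 'Cafe'
console.log(keepInternational(decomposed))              // 'Café'

// Devanagari vowel signs and the virama are combining marks, so \p{M} keeps them
const hindi = '\u0928\u092E\u0938\u094D\u0924\u0947' // 'नमस्ते'
console.log(keepInternational(hindi + '!') === hindi)         // true
console.log(hindi.replace(/[^\p{L}\p{N}\s]/gu, '') === hindi) // false
```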
3. Generating URL-Friendly Slugs
Slug generation is one of the most common use cases for removing special characters. The goal is a lowercase, hyphen-separated string with only ASCII characters.
// Basic slug generator
const slugify = (text) => {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^\w\s-]/g, '') // remove non-word chars (except whitespace and hyphens)
    .replace(/\s+/g, '-')     // replace whitespace runs with hyphens
    .replace(/-+/g, '-')      // collapse consecutive hyphens
    .replace(/^-|-$/g, '')    // trim leading/trailing hyphens
}
console.log(slugify('How to Remove Special Characters!!!'))
// Result: 'how-to-remove-special-characters'
console.log(slugify(' Hello, World! '))
// Result: 'hello-world'
// Advanced slug with transliteration for accented characters
const translitMap = {
  'à': 'a', 'á': 'a', 'â': 'a', 'ã': 'a', 'ä': 'a', 'å': 'a',
  'è': 'e', 'é': 'e', 'ê': 'e', 'ë': 'e',
  'ì': 'i', 'í': 'i', 'î': 'i', 'ï': 'i',
  'ò': 'o', 'ó': 'o', 'ô': 'o', 'õ': 'o', 'ö': 'o',
  'ù': 'u', 'ú': 'u', 'û': 'u', 'ü': 'u',
  'ñ': 'n', 'ç': 'c', 'ß': 'ss', 'ø': 'o', 'æ': 'ae'
}
const advancedSlugify = (text) => {
  return text
    .toLowerCase()
    .split('')
    .map(char => translitMap[char] || char)
    .join('')
    .replace(/[^\w\s-]/g, '')
    .replace(/\s+/g, '-')
    .replace(/-+/g, '-')
    .replace(/^-|-$/g, '')
}
console.log(advancedSlugify('Crème Brûlée Recipe'))
// Result: 'creme-brulee-recipe'
console.log(advancedSlugify('Straße nach München'))
// Result: 'strasse-nach-munchen'
// For German-style transliteration, change the map entries to 'ü': 'ue', 'ä': 'ae', 'ö': 'oe'
Note that \w in the slug generator is fine here because we want ASCII-only output for URLs. The transliteration map converts accented characters to their ASCII equivalents before the regex strips anything remaining. For more on string replacement, see how to replace all occurrences of a string in JavaScript.
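As an alternative to maintaining a full transliteration map, a common technique (swapped in here, not the approach used above) is to decompose the text with normalize('NFD') and delete the combining marks that carry the accents. Letters that have no decomposition, such as ß, ø, æ, œ, and ł, still need explicit replacements. `normalizeSlugify` and `extraMap` are hypothetical names.

```javascript
// Hypothetical extra map for letters that NFD does not decompose
const extraMap = { 'ß': 'ss', 'ø': 'o', 'æ': 'ae', 'œ': 'oe', 'ł': 'l' }

const normalizeSlugify = (text) => {
  return text
    .normalize('NFD')                              // 'é' becomes 'e' + U+0301
    .replace(/\p{M}/gu, '')                        // delete the combining marks
    .toLowerCase()
    .replace(/[ßøæœł]/g, (char) => extraMap[char]) // letters with no decomposition
    .trim()
    .replace(/[^\w\s-]/g, '')                      // ASCII word chars, whitespace, hyphens
    .replace(/\s+/g, '-')
    .replace(/-+/g, '-')
    .replace(/^-|-$/g, '')
}

console.log(normalizeSlugify('Crème Brûlée Recipe')) // 'creme-brulee-recipe'
console.log(normalizeSlugify('Straße nach München')) // 'strasse-nach-munchen'
console.log(normalizeSlugify('Łódź  Café!'))         // 'lodz-cafe'
```

Like the slug generators above, this keeps only ASCII word characters, so scripts without a Latin decomposition (Chinese, Cyrillic, Arabic) are dropped entirely.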
4. Sanitizing Filenames
File systems have strict rules about allowed characters. Sanitizing filenames prevents errors and security issues when saving user-provided names.
// Remove characters not allowed in most file systems
const sanitizeFilename = (filename) => {
  return filename
    .replace(/[<>:"/\\|?*\x00-\x1F]/g, '') // forbidden chars
    .replace(/\s+/g, '_')                   // spaces to underscores
    .replace(/^\.+/, '')                    // no leading dots (hidden files)
    .slice(0, 255)                          // max filename length
}
console.log(sanitizeFilename('my<file>:name?.txt'))
// Result: 'myfilename.txt'
console.log(sanitizeFilename('...hidden file v2.pdf'))
// Result: 'hidden_file_v2.pdf' (each run of whitespace becomes a single underscore)
// Preserve the extension while cleaning the name
const sanitizeWithExtension = (filename) => {
  const lastDot = filename.lastIndexOf('.')
  if (lastDot === -1) return sanitizeFilename(filename)
  const name = filename.slice(0, lastDot)
  const ext = filename.slice(lastDot)
  const cleanName = name
    .replace(/[<>:"/\\|?*\x00-\x1F]/g, '')
    .replace(/\s+/g, '_')
    .replace(/^\.+/, '')
  const cleanExt = ext
    .toLowerCase()
    .replace(/[^a-z0-9.]/g, '')
  return (cleanName || 'unnamed') + cleanExt
}
console.log(sanitizeWithExtension('My Report (Final!!).PDF'))
// Result: 'My_Report_(Final!!).pdf' (! and parentheses are not in the forbidden set)
console.log(sanitizeWithExtension('<script>.js'))
// Result: 'script.js'
Always sanitize user-provided filenames on the server side as well — client-side sanitization is for convenience, not security. The \x00-\x1F range removes control characters that could cause issues.
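The sanitizer above covers forbidden characters, but a few platform rules remain. The sketch below is an illustration under stated assumptions: Windows reserved device names, Windows' ban on names ending in a dot or space, and a 255-byte name limit as on ext4. `hardenFilename`, `truncateUtf8`, and `RESERVED_NAME` are hypothetical names; `TextEncoder` is a standard global in browsers and Node.js.

```javascript
// Assumptions about target file systems:
// - Windows reserves CON, PRN, AUX, NUL, COM1 to COM9, and LPT1 to LPT9,
//   even with an extension added (CON.txt)
// - Windows does not allow names that end in a dot or a space
// - ext4 and many other Linux file systems limit a name to 255 bytes,
//   so a character-based limit can still be too long for non-ASCII names
const RESERVED_NAME = /^(con|prn|aux|nul|com[1-9]|lpt[1-9])(\..*)?$/i

// Truncate to at most maxBytes of UTF-8 without splitting a code point
const truncateUtf8 = (str, maxBytes) => {
  const encoder = new TextEncoder()
  let result = ''
  let bytes = 0
  for (const char of str) { // for...of iterates by code point
    const size = encoder.encode(char).length
    if (bytes + size > maxBytes) break
    result += char
    bytes += size
  }
  return result
}

const hardenFilename = (filename) => {
  let name = filename
    .replace(/[<>:"/\\|?*\x00-\x1F]/g, '') // same forbidden set as sanitizeFilename
    .replace(/[. ]+$/, '')                  // no trailing dots or spaces
  if (RESERVED_NAME.test(name)) name = `_${name}` // prefix reserved device names
  return truncateUtf8(name, 255) || 'unnamed'
}

console.log(hardenFilename('CON.txt'))   // '_CON.txt'
console.log(hardenFilename('report.  ')) // 'report'
console.log(hardenFilename('...'))       // 'unnamed'
```

Byte-based truncation can cut off an extension, so for real filenames you would probably split off the extension first, as sanitizeWithExtension does, and truncate only the base name.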
5. Cleaning HTML and Preventing XSS
Removing HTML tags and special characters from user input is essential for preventing cross-site scripting (XSS) attacks. While dedicated sanitization libraries are recommended for production, understanding the regex approach is fundamental.
// Strip HTML tags
const stripTags = (html) => html.replace(/<[^>]*>/g, '')
console.log(stripTags('<p>Hello <b>World</b></p>'))
// Result: 'Hello World'
// Escape HTML entities instead of removing
const escapeHtml = (text) => {
  const map = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  }
  return text.replace(/[&<>"']/g, char => map[char])
}
console.log(escapeHtml('<script>alert("XSS")</script>'))
// Result: '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;'
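// Keep only printable ASCII (space through ~) plus tab, newline, and carriage return
// Note: this also removes accented letters, non-Latin scripts, and emoji
const removeNonPrintable = (text) => {
  return text.replace(/[^\x20-\x7E\t\n\r]/g, '')
}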
// Comprehensive input sanitizer
const sanitizeInput = (input) => {
  if (typeof input !== 'string') return ''
  return input
    .replace(/<[^>]*>/g, '')               // strip HTML tags
    .replace(/[^\p{L}\p{N}\p{P}\s]/gu, '') // keep letters, numbers, punctuation, whitespace
    .replace(/\s+/g, ' ')                  // normalize whitespace
    .trim()                                // trim last, so spaces left by removed tags go too
}
console.log(sanitizeInput(' <b>Hello</b> World \n\n '))
// Result: 'Hello World'
For production applications, always use a proven sanitization library like DOMPurify rather than regex alone. Regex-based tag stripping can be bypassed with malformed HTML. The examples above demonstrate the concept but should not be your only line of defense. For more on CoreUI form input handling, see the documentation.
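To see concretely why regex tag stripping is a weak defense, the sketch below runs the stripTags and escapeHtml helpers from this section (escapeHtml with the standard entity map) on a few inputs. It also shows an ordering pitfall if you build escaping from chained replaceAll() calls instead of a single-pass regex; `escapeWrongOrder` is a deliberately broken, hypothetical example.

```javascript
// Helpers from this section (escapeHtml uses the standard entity map)
const stripTags = (html) => html.replace(/<[^>]*>/g, '')

const escapeHtml = (text) => {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }
  return text.replace(/[&<>"']/g, (char) => map[char])
}

// 1. Stripping tags keeps the code that was inside a <script> element
console.log(stripTags('<script>alert(1)</script>Hi')) // 'alert(1)Hi'

// 2. Plain text with < and > is mangled: '< 2 and 3 >' looks like a tag
console.log(stripTags('1 < 2 and 3 > 2')) // '1  2'

// 3. Escaping keeps the text intact and renders it literally in HTML content
console.log(escapeHtml('1 < 2 and 3 > 2')) // '1 &lt; 2 and 3 &gt; 2'

// 4. Hypothetical broken chain: escaping '&' last double-escapes '&lt;'
const escapeWrongOrder = (text) =>
  text.replaceAll('<', '&lt;').replaceAll('&', '&amp;')
console.log(escapeWrongOrder('<b>')) // '&amp;lt;b>'
```

A single regex with a lookup map replaces each character exactly once, so it has no ordering problem; with a chain, '&' must always be escaped first.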
6. Using replaceAll() for Simple Patterns
For removing specific known characters (not patterns), replaceAll() provides a cleaner syntax than regex.
// Remove specific characters without regex
const text = 'Hello... World!!!'
const noExclamation = text.replaceAll('!', '')
// Result: 'Hello... World'
const noDots = text.replaceAll('.', '')
// Result: 'Hello World!!!'
// Chain replaceAll for multiple characters
const cleaned = text
  .replaceAll('!', '')
  .replaceAll('.', '')
// Result: 'Hello World'
// replaceAll with regex — same as replace with /g flag
const noSpecial = text.replaceAll(/[^a-zA-Z0-9\s]/g, '')
// Result: 'Hello World'
// Practical example: clean currency string for parsing
const price = '$1,234,567.89'
const numericString = price
  .replaceAll('$', '')
  .replaceAll(',', '')
// Result: '1234567.89'
const numericValue = parseFloat(numericString)
// Result: 1234567.89
// Remove emoji from text
const withEmoji = 'Hello 👋 World 🌍!'
const noEmoji = withEmoji.replace(/[\p{Emoji_Presentation}\p{Extended_Pictographic}]/gu, '')
// Result: 'Hello  World !' (two spaces remain where 👋 was)
replaceAll() is available in all modern browsers and Node.js 15+. For older environments, use replace() with the g flag. When removing a fixed set of known characters, chaining replaceAll() can be more readable than a complex regex. For the inverse operation of splitting strings, see how to split a string by spaces in JavaScript.
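Removing emoji leaves two kinds of residue: doubled spaces and invisible code points that belong to emoji sequences (zero-width joiners, variation selectors, skin-tone modifiers). The sketch below is one way to clean up both, assuming you want a single space where an emoji used to be. `removeEmoji` is a hypothetical helper; the property names are standard Unicode property escapes.

```javascript
// Hypothetical helper: remove emoji plus the invisible pieces of emoji sequences
const removeEmoji = (text) => {
  return text
    // pictographs, plus characters that display as emoji by default
    .replace(/[\p{Emoji_Presentation}\p{Extended_Pictographic}]/gu, '')
    // skin-tone modifiers, zero-width joiners, and variation selectors
    .replace(/[\p{Emoji_Modifier}\u200D\uFE0E\uFE0F]/gu, '')
    // collapse the doubled spaces left behind, then trim the ends
    .replace(/\s+/g, ' ')
    .trim()
}

console.log(removeEmoji('Hello 👋 World 🌍!')) // 'Hello World !'
console.log(removeEmoji('Love ❤️ it 👍🏽'))     // 'Love it'
console.log(removeEmoji('Room 101 #1'))        // 'Room 101 #1'
```

Digits, # and * carry the Unicode Emoji property because they can start keycap sequences, which is why the sketch avoids \p{Emoji} and uses Emoji_Presentation and Extended_Pictographic instead.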
Best Practice Note:
This is the same approach we use in CoreUI React components for sanitizing user input, creating URL-friendly slugs, and processing form data throughout our component library.
Choose the right regex for your context: use /[^a-zA-Z0-9\s]/g for ASCII-only text, /[^\p{L}\p{N}\s]/gu for international text, and dedicated character lists for specific formats like filenames or URLs. Always remember that \w is equivalent to [A-Za-z0-9_] — it does not match accented or non-Latin characters. For more on string manipulation, see how to convert a string to an array in JavaScript.
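One edge case behind the \w rule: with both the u and i flags, case folding lets \w also match U+017F (ſ, Latin small letter long s) and U+212A (K, Kelvin sign), because they fold to 's' and 'k'. The quick check below illustrates this; it is a demonstration, not a pattern to copy.

```javascript
// Without the i flag, \w is ASCII-only even in Unicode mode
console.log(/\w/.test('é'))        // false
console.log(/\w/u.test('\u017F'))  // false

// With both u and i, \w also matches the two characters whose case folding
// lands in [A-Za-z0-9_]: U+017F (ſ) folds to 's', U+212A (Kelvin sign) to 'k'
console.log(/\w/iu.test('\u017F')) // true
console.log(/\w/iu.test('\u212A')) // true

// \p{L} states the intent directly: any letter in any script
console.log(/\p{L}/u.test('é'))    // true
```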



