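The camelCase function converts a string to camel case. This article walks through its source and touches on the ASCII and Unicode character tables, matching ASCII and Unicode characters with regular expressions, type conversion, and a slice method that always returns dense arrays.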
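ASCII and Unicode
Definitions
Unicode
Unicode is a table of the characters used to write text, each identified by a code point. ASCII is included in Unicode as its first 128 code points.
UTF-8
UTF-8 is an encoding that converts a sequence of code points into bytes. Every Unicode code point can be encoded in UTF-8. ASCII is also an encoding, but it covers only 128 characters.
character
A character is a fairly vague concept. Letters, digits, and punctuation marks are all characters. Roughly speaking, a character is an entry somewhere in the Unicode table.
glyph
A glyph is the visual representation of a symbol, supplied by a font. It may represent a single character, several characters, or both.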
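The difference between code points, JavaScript's UTF-16 code units, and UTF-8 bytes is easier to see with numbers. The sketch below is only an illustration: `measure` is a hypothetical helper, and it assumes a runtime that provides the global `TextEncoder` (modern browsers and recent Node.js versions).

```javascript
// Measure the same text in three different units.
function measure(text) {
  return {
    codePoints: Array.from(text).length,              // Unicode code points
    utf16Units: text.length,                          // what String#length reports
    utf8Bytes: new TextEncoder().encode(text).length  // bytes after UTF-8 encoding
  }
}

console.log(measure('A'))         // 1 code point, 1 UTF-16 unit, 1 byte
console.log(measure('\u00e9'))    // é: 1 code point, 1 UTF-16 unit, 2 bytes
console.log(measure('\u4e2d'))    // 中: 1 code point, 1 UTF-16 unit, 3 bytes
console.log(measure('\u{1F600}')) // emoji: 1 code point, 2 UTF-16 units, 4 bytes
```

The last line is the case that matters for the rest of this article: a single astral code point occupies two positions in a JavaScript string.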
Further reading: Dark corners of Unicode. The author also makes an interesting point in javascript-has-no-string-type.
Related sites
Type conversion
getTag
const toString = Object.prototype.toString
function getTag(value) {
  if (value == null) {
    // For older JavaScript engines, handle null and undefined explicitly
    return value === undefined ? '[object Undefined]' : '[object Null]'
  }
  return toString.call(value)
}
isSymbol
function isSymbol(value) {
  const type = typeof value
  return type == 'symbol' || (type === 'object' && value != null && getTag(value) == '[object Symbol]')
}
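To see what these two helpers return, here is a small self-contained sketch. It repeats the listings above, except that the cached method is named `objectToString` (a name chosen here only to avoid clashing with the `toString` function discussed next).

```javascript
const objectToString = Object.prototype.toString

function getTag(value) {
  if (value == null) {
    return value === undefined ? '[object Undefined]' : '[object Null]'
  }
  return objectToString.call(value)
}

function isSymbol(value) {
  const type = typeof value
  return type == 'symbol' ||
    (type === 'object' && value != null && getTag(value) == '[object Symbol]')
}

console.log(getTag(null))                  // '[object Null]'
console.log(getTag(undefined))             // '[object Undefined]'
console.log(getTag([]))                    // '[object Array]'
console.log(isSymbol(Symbol('a')))         // true
console.log(isSymbol(Object(Symbol('a')))) // true: a boxed symbol has typeof 'object'
console.log(isSymbol('a'))                 // false
```

The second branch of `isSymbol` exists for the boxed case, where `typeof` alone would report `'object'`.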
toString
Handles -0 as a special case and converts arrays recursively (which may overflow the call stack on deeply nested arrays).
const INFINITY = 1 / 0
function toString(value) {
  if (value == null) {
    return ''
  }
  // Exit early for strings to avoid a performance hit in some environments.
  if (typeof value === 'string') {
    return value
  }
  if (Array.isArray(value)) {
    // Recursively convert values (susceptible to call stack limits).
    // null and undefined items are left as-is; every other item is converted by a recursive call
    return `${value.map((other) => other == null ? other : toString(other))}`
  }
  // Symbols cannot be coerced implicitly, so call their own toString method
  if (isSymbol(value)) {
    return value.toString()
  }
  // Preserve the sign of -0
  const result = `${value}`
  return (result == '0' && (1 / value) == -INFINITY) ? '-0' : result
}
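The -0 and array branches are the least obvious parts, so here is a runnable sketch of the same logic. `toStringSketch` is a name chosen for this example, and the symbol check is simplified to `typeof value === 'symbol'` (an assumption that ignores boxed symbols).

```javascript
const INFINITY = 1 / 0

function toStringSketch(value) {
  if (value == null) {
    return ''
  }
  if (typeof value === 'string') {
    return value
  }
  if (Array.isArray(value)) {
    // Array-to-string coercion joins items with ',' and renders null/undefined as ''
    return `${value.map((other) => other == null ? other : toStringSketch(other))}`
  }
  if (typeof value === 'symbol') {
    return value.toString()
  }
  const result = `${value}`
  // 1 / -0 is -Infinity, which is how -0 is told apart from 0
  return (result == '0' && (1 / value) == -INFINITY) ? '-0' : result
}

console.log(String(-0))                           // '0'  (the sign is lost)
console.log(toStringSketch(-0))                   // '-0'
console.log(toStringSketch([1, [2, 3]]))          // '1,2,3'
console.log(toStringSketch([null, 1, undefined])) // ',1,'
console.log(toStringSketch(Symbol('a')))          // 'Symbol(a)'
console.log(toStringSketch(null))                 // ''
```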
Matching ASCII and Unicode characters
unicodeWords
Splits a string containing Unicode characters into words.
/** Used to compose unicode character classes. */
// Surrogate code units (U+D800 to U+DFFF), which encode astral-plane code points in UTF-16
const rsAstralRange = '\\ud800-\\udfff'
// https://www.unicode.org/charts/PDF/U0300.pdf
const rsComboMarksRange = '\\u0300-\\u036f'
// https://www.unicode.org/charts/PDF/UFE20.pdf
const reComboHalfMarksRange = '\\ufe20-\\ufe2f'
// https://www.unicode.org/charts/PDF/U20D0.pdf
const rsComboSymbolsRange = '\\u20d0-\\u20ff'
// https://www.unicode.org/charts/PDF/U1AB0.pdf
const rsComboMarksExtendedRange = '\\u1ab0-\\u1aff'
// https://www.unicode.org/charts/PDF/U1DC0.pdf
const rsComboMarksSupplementRange = '\\u1dc0-\\u1dff'
const rsComboRange = rsComboMarksRange + reComboHalfMarksRange + rsComboSymbolsRange + rsComboMarksExtendedRange + rsComboMarksSupplementRange
// https://www.unicode.org/charts/PDF/U2700.pdf
const rsDingbatRange = '\\u2700-\\u27bf'
// a-z 223-246 248-255
const rsLowerRange = 'a-z\\xdf-\\xf6\\xf8-\\xff'
// 172 ¬, 177 ±, 215 ×, 247 ÷
const rsMathOpRange = '\\xac\\xb1\\xd7\\xf7'
// 0-47 58-64 91-96 123-191
const rsNonCharRange = '\\x00-\\x2f\\x3a-\\x40\\x5b-\\x60\\x7b-\\xbf'
// https://www.unicode.org/charts/PDF/U2000.pdf
const rsPunctuationRange = '\\u2000-\\u206f'
// \t \n \r \f 11 160
// feff https://www.unicode.org/charts/PDF/UFE70.pdf
// 1680 https://www.unicode.org/charts/PDF/U1680.pdf
// 180e https://www.unicode.org/charts/PDF/U1800.pdf
// 2000-200a 2028 2029 202f 205f https://www.unicode.org/charts/PDF/U2000.pdf
// 3000 https://www.unicode.org/charts/PDF/U3000.pdf
const rsSpaceRange = ' \\t\\x0b\\f\\xa0\\ufeff\\n\\r\\u2028\\u2029\\u1680\\u180e\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200a\\u202f\\u205f\\u3000'
// A-Z 192-214 216-222
const rsUpperRange = 'A-Z\\xc0-\\xd6\\xd8-\\xde'
// fe0e fe0f https://www.unicode.org/charts/PDF/UFE00.pdf
const rsVarRange = '\\ufe0e\\ufe0f'
const rsBreakRange = rsMathOpRange + rsNonCharRange + rsPunctuationRange + rsSpaceRange
/** Used to compose unicode capture groups. */
// Apostrophes: ' and ’ (U+2019) https://www.unicode.org/charts/PDF/U2000.pdf
const rsApos = "['\u2019]"
// Break characters: math operators, non-word characters, punctuation, and whitespace
const rsBreak = `[${rsBreakRange}]`
// Combining marks (see the chart links above)
const rsCombo = `[${rsComboRange}]`
// Digits
const rsDigit = '\\d'
// Dingbats (see the chart link above)
const rsDingbat = `[${rsDingbatRange}]`
// Lowercase letters
const rsLower = `[${rsLowerRange}]`
// Any other character not covered by the classes above
const rsMisc = `[^${rsAstralRange}${rsBreakRange + rsDigit + rsDingbatRange + rsLowerRange + rsUpperRange}]`
// Fitzpatrick skin-tone modifiers U+1F3FB to U+1F3FF, encoded as the
// high surrogate \ud83c followed by a low surrogate in \udffb-\udfff
const rsFitz = '\\ud83c[\\udffb-\\udfff]'
// Modifiers
const rsModifier = `(?:${rsCombo}|${rsFitz})`
// Anything that is not a surrogate code unit
const rsNonAstral = `[^${rsAstralRange}]`
// Regional indicator pairs (country flags)
const rsRegional = '(?:\\ud83c[\\udde6-\\uddff]){2}'
const rsSurrPair = '[\\ud800-\\udbff][\\udc00-\\udfff]'
// Uppercase letters
const rsUpper = `[${rsUpperRange}]`
// ZWJ https://www.unicode.org/charts/PDF/U2000.pdf ZERO WIDTH JOINER
const rsZWJ = '\\u200d'
/** Used to compose unicode regexes. */
const rsMiscLower = `(?:${rsLower}|${rsMisc})`
const rsMiscUpper = `(?:${rsUpper}|${rsMisc})`
const rsOptContrLower = `(?:${rsApos}(?:d|ll|m|re|s|t|ve))?`
const rsOptContrUpper = `(?:${rsApos}(?:D|LL|M|RE|S|T|VE))?`
const reOptMod = `${rsModifier}?`
const rsOptVar = `[${rsVarRange}]?`
const rsOptJoin = `(?:${rsZWJ}(?:${[rsNonAstral, rsRegional, rsSurrPair].join('|')})${rsOptVar + reOptMod})*`
const rsOrdLower = '\\d*(?:1st|2nd|3rd|(?![123])\\dth)(?=\\b|[A-Z_])'
const rsOrdUpper = '\\d*(?:1ST|2ND|3RD|(?![123])\\dTH)(?=\\b|[a-z_])'
const rsSeq = rsOptVar + reOptMod + rsOptJoin
const rsEmoji = `(?:${[rsDingbat, rsRegional, rsSurrPair].join('|')})${rsSeq}`
const reUnicodeWords = RegExp([
`${rsUpper}?${rsLower}+${rsOptContrLower}(?=${[rsBreak, rsUpper, '$'].join('|')})`,
`${rsMiscUpper}+${rsOptContrUpper}(?=${[rsBreak, rsUpper + rsMiscLower, '$'].join('|')})`,
`${rsUpper}?${rsMiscLower}+${rsOptContrLower}`,
`${rsUpper}+${rsOptContrUpper}`,
rsOrdUpper,
rsOrdLower,
`${rsDigit}+`,
rsEmoji
].join('|'), 'g')
/**
* Splits a Unicode `string` into an array of its words.
*
* @private
* @param {string} string The string to inspect.
* @returns {Array} Returns the words of `string`.
*/
function unicodeWords(string) {
  return string.match(reUnicodeWords)
}
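The full expression is hard to read in one piece, so it can help to try a single alternative in isolation. The sketch below copies only the lowercase ordinal pattern (`rsOrdLower`) into a standalone regex; `reOrdLower` is a name chosen for this example, and the results describe this one alternative, not the complete `reUnicodeWords`.

```javascript
// The lowercase ordinal alternative, copied from rsOrdLower in the listing above
const reOrdLower = /\d*(?:1st|2nd|3rd|(?![123])\dth)(?=\b|[A-Z_])/g

console.log('21st 4th 3rd'.match(reOrdLower)) // [ '21st', '4th', '3rd' ]
// The lookahead also accepts an uppercase letter right after the ordinal
console.log('5thAvenue'.match(reOrdLower))    // [ '5th' ]
// (?![123])\dth rejects a digit 1, 2, or 3 directly before "th"
console.log('11th'.match(reOrdLower))         // null
```

In the complete regex the other alternatives (for example the plain digit run) still pick up text that this one rejects.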
reUnicodeWords
[Figure: the regular expression generated for reUnicodeWords, with a graphical visualization]
hasUnicode
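Checks whether a string contains characters that need Unicode-aware handling: zero-width joiners, surrogate code units, combining marks, or variation selectors.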
/** Used to compose unicode character classes. */
// Surrogate code units (U+D800 to U+DFFF), which encode astral-plane code points in UTF-16
const rsAstralRange = '\\ud800-\\udfff'
// https://www.unicode.org/charts/PDF/U0300.pdf
const rsComboMarksRange = '\\u0300-\\u036f'
// https://www.unicode.org/charts/PDF/UFE20.pdf
const reComboHalfMarksRange = '\\ufe20-\\ufe2f'
// https://www.unicode.org/charts/PDF/U20D0.pdf
const rsComboSymbolsRange = '\\u20d0-\\u20ff'
// https://www.unicode.org/charts/PDF/U1AB0.pdf
const rsComboMarksExtendedRange = '\\u1ab0-\\u1aff'
// https://www.unicode.org/charts/PDF/U1DC0.pdf
const rsComboMarksSupplementRange = '\\u1dc0-\\u1dff'
const rsComboRange = rsComboMarksRange + reComboHalfMarksRange + rsComboSymbolsRange + rsComboMarksExtendedRange + rsComboMarksSupplementRange
// fe0e fe0f https://www.unicode.org/charts/PDF/UFE00.pdf
const rsVarRange = '\\ufe0e\\ufe0f'
/** Used to compose unicode capture groups. */
// ZWJ https://www.unicode.org/charts/PDF/U2000.pdf ZERO WIDTH JOINER
const rsZWJ = '\\u200d'
/** Used to detect strings with [zero-width joiners or code points from the astral planes](http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/). */
const reHasUnicode = RegExp(`[${rsZWJ + rsAstralRange + rsComboRange + rsVarRange}]`)
/**
* Checks if `string` contains Unicode symbols.
*
* @private
* @param {string} string The string to inspect.
* @returns {boolean} Returns `true` if a symbol is found, else `false`.
*/
function hasUnicode(string) {
return reHasUnicode.test(string)
}
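The name is slightly misleading: the check is not "is this string non-ASCII" but "does this string contain code units that make naive splitting unsafe". A minimal sketch, with the character class from the listing above written out as one regex literal:

```javascript
// ZWJ, surrogates, combining marks, and variation selectors, as in reHasUnicode
const reHasUnicode = /[\u200d\ud800-\udfff\u0300-\u036f\ufe20-\ufe2f\u20d0-\u20ff\u1ab0-\u1aff\u1dc0-\u1dff\ufe0e\ufe0f]/

function hasUnicode(string) {
  return reHasUnicode.test(string)
}

console.log(hasUnicode('abc'))          // false
console.log(hasUnicode('\u4e2d\u6587')) // false: 中文 is non-ASCII but splits safely
console.log(hasUnicode('\ud83d\udc4d')) // true: an emoji stored as a surrogate pair
console.log(hasUnicode('e\u0301'))      // true: "e" followed by a combining acute accent
```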
asciiWords
Matches runs of ASCII letters and digits; compare the ranges against an ASCII table.
/** Used to match words composed of alphanumeric characters. */
// Excluded code points: 0-47, 58-64, 91-96, 123-127 (control characters, punctuation, and symbols)
const reAsciiWord = /[^\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+/g
function asciiWords(string) {
return string.match(reAsciiWord)
}
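A few inputs make the behaviour concrete. This sketch repeats the listing above unchanged and only adds calls:

```javascript
const reAsciiWord = /[^\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+/g

function asciiWords(string) {
  return string.match(reAsciiWord)
}

console.log(asciiWords('fred, barney, & pebbles')) // [ 'fred', 'barney', 'pebbles' ]
console.log(asciiWords('foo_bar'))                 // [ 'foo', 'bar' ]  ('_' is 0x5f, inside 0x5b-0x60)
console.log(asciiWords('fooBar'))                  // [ 'fooBar' ]  (case changes do not split)
console.log(asciiWords('---'))                     // null  (no match at all)
```

The `null` result is why callers have to fall back to an empty array themselves.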
asciiToArray
Converts an ASCII string to an array of characters.
/**
* Converts an ASCII `string` to an array.
*
* @private
* @param {string} string The string to convert.
* @returns {Array} Returns the converted array.
*/
function asciiToArray(string) {
  // Split into individual UTF-16 code units
  return string.split('')
}
unicodeToArray
Converts a string containing Unicode characters to an array of symbols.
/** Used to compose unicode character classes. */
// Surrogate code units (U+D800 to U+DFFF), which encode astral-plane code points in UTF-16
const rsAstralRange = '\\ud800-\\udfff'
// https://www.unicode.org/charts/PDF/U0300.pdf
const rsComboMarksRange = '\\u0300-\\u036f'
// https://www.unicode.org/charts/PDF/UFE20.pdf
const reComboHalfMarksRange = '\\ufe20-\\ufe2f'
// https://www.unicode.org/charts/PDF/U20D0.pdf
const rsComboSymbolsRange = '\\u20d0-\\u20ff'
// https://www.unicode.org/charts/PDF/U1AB0.pdf
const rsComboMarksExtendedRange = '\\u1ab0-\\u1aff'
// https://www.unicode.org/charts/PDF/U1DC0.pdf
const rsComboMarksSupplementRange = '\\u1dc0-\\u1dff'
const rsComboRange = rsComboMarksRange + reComboHalfMarksRange + rsComboSymbolsRange + rsComboMarksExtendedRange + rsComboMarksSupplementRange
const rsVarRange = '\\ufe0e\\ufe0f'
/** Used to compose unicode capture groups. */
const rsAstral = `[${rsAstralRange}]`
const rsCombo = `[${rsComboRange}]`
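// Fitzpatrick skin-tone modifiers (U+1F3FB to U+1F3FF)
const rsFitz = '\\ud83c[\\udffb-\\udfff]'
// Modifiers
const rsModifier = `(?:${rsCombo}|${rsFitz})`
// Anything that is not a surrogate code unit
const rsNonAstral = `[^${rsAstralRange}]`
// Regional indicator pairs (country flags)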
const rsRegional = '(?:\\ud83c[\\udde6-\\uddff]){2}'
// High Surrogate Area https://www.unicode.org/charts/PDF/UD800.pdf
// Low Surrogate Area https://www.unicode.org/charts/PDF/UDC00.pdf
const rsSurrPair = '[\\ud800-\\udbff][\\udc00-\\udfff]'
// ZWJ https://www.unicode.org/charts/PDF/U2000.pdf ZERO WIDTH JOINER
const rsZWJ = '\\u200d'
/** Used to compose unicode regexes. */
// Compose the matching regex
const reOptMod = `${rsModifier}?`
const rsOptVar = `[${rsVarRange}]?`
const rsOptJoin = `(?:${rsZWJ}(?:${[rsNonAstral, rsRegional, rsSurrPair].join('|')})${rsOptVar + reOptMod})*`
const rsSeq = rsOptVar + reOptMod + rsOptJoin
const rsNonAstralCombo = `${rsNonAstral}${rsCombo}?`
const rsSymbol = `(?:${[rsNonAstralCombo, rsCombo, rsRegional, rsSurrPair, rsAstral].join('|')})`
/** Used to match [string symbols](https://mathiasbynens.be/notes/javascript-unicode). */
const reUnicode = RegExp(`${rsFitz}(?=${rsFitz})|${rsSymbol + rsSeq}`, 'g')
/**
* Converts a Unicode `string` to an array.
*
* @private
* @param {string} string The string to convert.
* @returns {Array} Returns the converted array.
*/
function unicodeToArray(string) {
return string.match(reUnicode) || []
}
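To show what the extra machinery buys, the sketch below rebuilds the same regex in condensed form (the five combining-mark ranges are merged into one string, otherwise the pieces follow the listing above) and compares it with `split('')` and `Array.from`. The sample strings are written with escapes so the code units are visible.

```javascript
const rsAstralRange = '\\ud800-\\udfff'
const rsComboRange = '\\u0300-\\u036f\\ufe20-\\ufe2f\\u20d0-\\u20ff\\u1ab0-\\u1aff\\u1dc0-\\u1dff'
const rsVarRange = '\\ufe0e\\ufe0f'
const rsAstral = `[${rsAstralRange}]`
const rsCombo = `[${rsComboRange}]`
const rsFitz = '\\ud83c[\\udffb-\\udfff]'
const rsModifier = `(?:${rsCombo}|${rsFitz})`
const rsNonAstral = `[^${rsAstralRange}]`
const rsRegional = '(?:\\ud83c[\\udde6-\\uddff]){2}'
const rsSurrPair = '[\\ud800-\\udbff][\\udc00-\\udfff]'
const rsZWJ = '\\u200d'
const reOptMod = `${rsModifier}?`
const rsOptVar = `[${rsVarRange}]?`
const rsOptJoin = `(?:${rsZWJ}(?:${[rsNonAstral, rsRegional, rsSurrPair].join('|')})${rsOptVar + reOptMod})*`
const rsSeq = rsOptVar + reOptMod + rsOptJoin
const rsNonAstralCombo = `${rsNonAstral}${rsCombo}?`
const rsSymbol = `(?:${[rsNonAstralCombo, rsCombo, rsRegional, rsSurrPair, rsAstral].join('|')})`
const reUnicode = RegExp(`${rsFitz}(?=${rsFitz})|${rsSymbol + rsSeq}`, 'g')

function unicodeToArray(string) {
  return string.match(reUnicode) || []
}

// 'a', thumbs-up (U+1F44D) with a skin-tone modifier (U+1F3FD), 'b'
const text = 'a\ud83d\udc4d\ud83c\udffdb'
console.log(text.split('').length)       // 6: raw UTF-16 code units
console.log(Array.from(text).length)     // 4: code points, modifier still separate
console.log(unicodeToArray(text).length) // 3: the emoji and its modifier stay together

// Two regional indicators (C + N) form a single flag symbol
console.log(unicodeToArray('\ud83c\udde8\ud83c\uddf3').length) // 1
// A base letter keeps its combining accent
console.log(unicodeToArray('e\u0301').length)                  // 1
```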
Utility methods
stringToArray
Converts a string to an array of characters. It depends on the hasUnicode, asciiToArray, and unicodeToArray methods above.
import asciiToArray from './asciiToArray.js'
import hasUnicode from './hasUnicode.js'
import unicodeToArray from './unicodeToArray.js'
/**
* Converts `string` to an array.
*
* @private
* @param {string} string The string to convert.
* @returns {Array} Returns the converted array.
*/
function stringToArray(string) {
return hasUnicode(string)
? unicodeToArray(string)
: asciiToArray(string)
}
slice
An implementation of Array.prototype.slice. This version ensures that a dense array is returned.
/**
* Creates a slice of `array` from `start` up to, but not including, `end`.
*
* **Note:** This method is used instead of
* [`Array#slice`](https://mdn.io/Array/slice) to ensure dense arrays are
* returned.
*
* @since 3.0.0
* @category Array
* @param {Array} array The array to slice.
* @param {number} [start=0] The start position. A negative index will be treated as an offset from the end.
* @param {number} [end=array.length] The end position. A negative index will be treated as an offset from the end.
* @returns {Array} Returns the slice of `array`.
* @example
*
* var array = [1, 2, 3, 4]
*
* _.slice(array, 2)
* // => [3, 4]
*/
function slice(array, start, end) {
  let length = array == null ? 0 : array.length
  // Return an empty array when length is 0
  if (!length) {
    return []
  }
  // start defaults to 0 when it is null or undefined
  start = start == null ? 0 : start
  // end defaults to the array length when it is undefined
  end = end === undefined ? length : end
  // A negative start is an offset from the end
  if (start < 0) {
    start = -start > length ? 0 : (length + start)
  }
  // end cannot exceed the array length
  end = end > length ? length : end
  // A negative end is an offset from the end
  if (end < 0) {
    end += length
  }
  // The result is empty when start is past end; otherwise
  // >>> 0 coerces the length to an unsigned 32-bit integer
  length = start > end ? 0 : ((end - start) >>> 0)
  // Coerce start to an unsigned 32-bit integer as well
  start >>>= 0
  // Copy every index explicitly so the result has no holes
  let index = -1
  const result = new Array(length)
  while (++index < length) {
    result[index] = array[index + start]
  }
  return result
}
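"Dense" here means that every index of the result is an own property, even where the source array had a hole. The sketch below repeats the function and contrasts it with the native method; the sample arrays are made up for the illustration.

```javascript
function slice(array, start, end) {
  let length = array == null ? 0 : array.length
  if (!length) {
    return []
  }
  start = start == null ? 0 : start
  end = end === undefined ? length : end
  if (start < 0) {
    start = -start > length ? 0 : (length + start)
  }
  end = end > length ? length : end
  if (end < 0) {
    end += length
  }
  length = start > end ? 0 : ((end - start) >>> 0)
  start >>>= 0
  let index = -1
  const result = new Array(length)
  while (++index < length) {
    result[index] = array[index + start]
  }
  return result
}

const sparse = [1, , 3]              // index 1 is a hole, not a stored undefined
console.log(1 in sparse.slice())     // false: the native slice preserves the hole
console.log(1 in slice(sparse))      // true: the copy loop assigns every index

console.log(slice([1, 2, 3, 4], -2))    // [ 3, 4 ]
console.log(slice([1, 2, 3, 4], 1, -1)) // [ 2, 3 ]

// What `>>> 0` does to a few values
console.log(2.7 >>> 0) // 2
console.log(-1 >>> 0)  // 4294967295
```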
castSlice
Depends on the slice function above and adds a check for whether the array needs slicing at all.
import slice from '../slice.js'
/**
* Casts `array` to a slice if it's needed.
*
* @private
* @param {Array} array The array to inspect.
* @param {number} start The start position.
* @param {number} [end=array.length] The end position.
* @returns {Array} Returns the cast slice.
*/
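function castSlice(array, start, end) {
  const { length } = array
  // end defaults to the array length when it is not passed
  end = end === undefined ? length : end
  // Return the array itself when the whole range is requested; otherwise call slice
  return (!start && end >= length) ? array : slice(array, start, end)
}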
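The point of the extra check is to avoid a copy when the requested range is the whole array. In the sketch below, `slice` is a stand-in built on the native method (an assumption made to keep the example short; the real helper is the dense version shown earlier).

```javascript
// Stand-in for the slice helper; native slice is enough for this demonstration
const slice = (array, start, end) => Array.prototype.slice.call(array, start, end)

function castSlice(array, start, end) {
  const { length } = array
  end = end === undefined ? length : end
  return (!start && end >= length) ? array : slice(array, start, end)
}

const symbols = ['a', 'b', 'c']
console.log(castSlice(symbols, 0) === symbols) // true: same reference, no copy
console.log(castSlice(symbols, 1))             // [ 'b', 'c' ]
console.log(castSlice(symbols, 0, 2))          // [ 'a', 'b' ]
```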
createCaseFirst
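Creates a function from the given method name; in essence it calls that method on the first character of the string. It depends on castSlice above (to take everything after the first character), hasUnicode (to detect whether the string contains Unicode symbols), and stringToArray (to split a string containing Unicode symbols into an array correctly).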
import castSlice from './castSlice.js'
import hasUnicode from './hasUnicode.js'
import stringToArray from './stringToArray.js'
/**
* Creates a function like `lowerFirst`.
* Generates the function that corresponds to the given method name.
*
* @private
* @param {string} methodName The name of the `String` case method to use.
* @returns {Function} Returns the new case function.
*/
function createCaseFirst(methodName) {
  return (string) => {
    // Return '' for an empty (or otherwise falsy) string
    if (!string) {
      return ''
    }
    // When the string contains Unicode symbols, split it into an array with stringToArray
    const strSymbols = hasUnicode(string)
      ? stringToArray(string)
      : undefined
    // Without Unicode symbols, take the first code unit of the string;
    // otherwise take the first symbol of the array
    const chr = strSymbols
      ? strSymbols[0]
      : string[0]
    // The rest of the string. With Unicode symbols, take items 1 to the end of the array and join them
    const trailing = strSymbols
      ? castSlice(strSymbols, 1).join('')
      : string.slice(1)
    // Call the named String.prototype method on the first character only,
    // then append the rest unchanged
    return chr[methodName]() + trailing
  }
}
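The factory pattern is easier to follow in a stripped-down form. The sketch below is a simplification, not the listing above: it uses `Array.from` (which splits by code point) as a stand-in for `stringToArray`, so it does not keep modifier or ZWJ sequences together. `lowerFirst` is built here only to show a second method name.

```javascript
function createCaseFirst(methodName) {
  return (string) => {
    if (!string) {
      return ''
    }
    // Assumption: code-point splitting is good enough for this illustration
    const symbols = Array.from(string)
    // Look the method up by name on the first symbol, then append the rest untouched
    return symbols[0][methodName]() + symbols.slice(1).join('')
  }
}

const upperFirst = createCaseFirst('toUpperCase')
const lowerFirst = createCaseFirst('toLowerCase')

console.log(upperFirst('fred')) // 'Fred'
console.log(upperFirst('FRED')) // 'FRED'
console.log(lowerFirst('FRED')) // 'fRED'
console.log(upperFirst(''))     // ''
```

Passing the method name as a string is what lets one factory produce both helpers.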
upperFirst
Calls createCaseFirst with the method name toUpperCase, so String.prototype.toUpperCase is applied to the first character.
import createCaseFirst from './.internal/createCaseFirst.js'
/**
* Converts the first character of `string` to upper case.
*
* @since 4.0.0
* @category String
* @param {string} [string=''] The string to convert.
* @returns {string} Returns the converted string.
* @see camelCase, kebabCase, lowerCase, snakeCase, startCase, upperCase
* @example
*
* upperFirst('fred')
* // => 'Fred'
*
* upperFirst('FRED')
* // => 'FRED'
*/
const upperFirst = createCaseFirst('toUpperCase')
hasUnicodeWord
Checks whether a string needs the Unicode word splitter rather than the simpler ASCII one. The matching rules are:
const hasUnicodeWord = RegExp.prototype.test.bind(
/[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/
)
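Each alternative in that regex corresponds to a kind of word boundary that the ASCII splitter cannot see. The sketch below repeats the definition and tries one input per alternative; the sample strings are made up for the illustration.

```javascript
const hasUnicodeWord = RegExp.prototype.test.bind(
  /[a-z][A-Z]|[A-Z]{2}[a-z]|[0-9][a-zA-Z]|[a-zA-Z][0-9]|[^a-zA-Z0-9 ]/
)

console.log(hasUnicodeWord('foo bar')) // false: only letters and spaces
console.log(hasUnicodeWord('FOO BAR')) // false
console.log(hasUnicodeWord('fooBar'))  // true:  [a-z][A-Z]       ("oB")
console.log(hasUnicodeWord('XMLHttp')) // true:  [A-Z]{2}[a-z]    ("LHt")
console.log(hasUnicodeWord('2nd'))     // true:  [0-9][a-zA-Z]    ("2n")
console.log(hasUnicodeWord('foo2'))    // true:  [a-zA-Z][0-9]    ("o2")
console.log(hasUnicodeWord('foo-bar')) // true:  [^a-zA-Z0-9 ]    ("-")
```

Because the regex has no `g` flag, the bound `test` carries no state between calls.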
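words
Uses the asciiWords function (returns an array of the matched strings), the unicodeWords function (returns an array of the matched strings), and the hasUnicodeWord function (matches patterns such as aA, AAa, 0a, 0A, A0, a0, and any character that is not a letter, digit, or space).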
/**
* Splits `string` into an array of its words.
*
* @since 3.0.0
* @category String
* @param {string} [string=''] The string to inspect.
* @param {RegExp|string} [pattern] The pattern to match words.
* @returns {Array} Returns the words of `string`.
* @example
*
* words('fred, barney, & pebbles')
* // => ['fred', 'barney', 'pebbles']
*
* words('fred, barney, & pebbles', /[^, ]+/g)
* // => ['fred', 'barney', '&', 'pebbles']
*/
function words(string, pattern) {
  if (pattern === undefined) {
    const result = hasUnicodeWord(string) ? unicodeWords(string) : asciiWords(string)
    return result || []
  }
  return string.match(pattern) || []
}
camelCase
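Depends on the toString function (converts the given value to a string), the words function (splits the string into an array of words), and the upperFirst function (capitalizes the first letter of a word).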
import upperFirst from './upperFirst.js'
import words from './words.js'
import toString from './toString.js'
/**
* Converts `string` to [camel case](https://en.wikipedia.org/wiki/CamelCase).
*
* @since 3.0.0
* @category String
* @param {string} [string=''] The string to convert.
* @returns {string} Returns the camel cased string.
* @see lowerCase, kebabCase, snakeCase, startCase, upperCase, upperFirst
* @example
*
* camelCase('Foo Bar')
* // => 'fooBar'
*
* camelCase('--foo-bar--')
* // => 'fooBar'
*
* camelCase('__FOO_BAR__')
* // => 'fooBar'
*/
const camelCase = (string) => (
  /**
   * 1. Convert the input to a string
   * 2. Strip apostrophes (' and ’) with replace
   * 3. Pass the result to words
   * 4. Reduce the array that words returns
   */
  words(toString(string).replace(/['\u2019]/g, '')).reduce((result, word, index) => {
    // Lowercase the current word
    word = word.toLowerCase()
    // For every word except the first, capitalize its first letter
    return result + (index ? upperFirst(word) : word)
  }, '')
)
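The whole pipeline can be tried end to end with simplified stand-ins. In the sketch below, `camelCaseSketch` is a name chosen for this example; it splits words with the ASCII regex only and uses a plain `charAt` based `upperFirst`, so it is an approximation under those assumptions. In particular it does not split at case changes, which the Unicode word splitter would do.

```javascript
// Stand-in for words(): ASCII splitting only
const asciiWords = (string) =>
  string.match(/[^\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+/g) || []

// Stand-in for upperFirst(): no Unicode-aware handling
const upperFirst = (word) => word.charAt(0).toUpperCase() + word.slice(1)

const camelCaseSketch = (value) => {
  const string = value == null ? '' : String(value)
  return asciiWords(string.replace(/['\u2019]/g, '')).reduce((result, word, index) => {
    word = word.toLowerCase()
    return result + (index ? upperFirst(word) : word)
  }, '')
}

console.log(camelCaseSketch('Foo Bar'))     // 'fooBar'
console.log(camelCaseSketch('--foo-bar--')) // 'fooBar'
console.log(camelCaseSketch('__FOO_BAR__')) // 'fooBar'
console.log(camelCaseSketch("don't stop"))  // 'dontStop'
console.log(camelCaseSketch('fooBar'))      // 'foobar' (limitation of the ASCII-only stand-in)
```

Removing the apostrophe before splitting is what keeps "don't" as one word instead of "don" and "t".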