Python Language Summary
4.2. Functions for processing strings (str, unicode, etc.)
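4.2.7. Removing control characters: removeCtlChr
removeCtlChr strips the ASCII control characters that XML 1.0 does not allow (every code point below 32 except tab, LF and CR) and also drops DEL, so the processed string can be embedded in XML safely.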
# Remove control characters from the input string.
# Otherwise the WordPress importer fails to import the WXR file.
# Examples of input that made the import fail:
# 1. content contains some invalid ASCII control chars
# 2. the 165th comment contains an invalid control char: ETX
# 3. title contains control chars: DC1, BS, DLE, DLE, DLE, DC1
def removeCtlChr(inputString):
    # control characters that are kept
    validChrList = [
        9,   # 9=\t=tab
        10,  # 10=\n=LF=Line Feed
        13,  # 13=\r=CR=Carriage Return
    ]
    validContent = ''
    for c in inputString:
        asciiVal = ord(c)
        # filter out the other ASCII control characters, and DEL=delete
        isValidChr = True
        if asciiVal == 0x7F:
            isValidChr = False
        elif (asciiVal < 32) and (asciiVal not in validChrList):
            isValidChr = False
        if isValidChr:
            validContent += c
    return validContent
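The same filtering rule can also be written as a single regular expression. The following is only a minimal sketch, assuming Python 3 and the same policy as removeCtlChr (drop code points 0-31 except tab, LF and CR, plus DEL). The names removeCtlChrRegex and parsesAsXml are hypothetical helpers introduced for illustration and are not part of the original code. The small XML check uses the standard library parser to show why the stripping matters.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumption: same policy as removeCtlChr, i.e. remove 0x00-0x1F
# except tab (0x09), LF (0x0A) and CR (0x0D), and also remove DEL (0x7F).
_CTL_CHR_PATTERN = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')

def removeCtlChrRegex(inputString):
    """Hypothetical regex-based variant of removeCtlChr."""
    return _CTL_CHR_PATTERN.sub('', inputString)

def parsesAsXml(text):
    """Hypothetical helper: True if text can sit inside an XML element."""
    try:
        # escape() handles &, < and >; it does not touch control chars
        ET.fromstring('<title>' + escape(text) + '</title>')
        return True
    except ET.ParseError:
        return False

# A title containing DC1, BS and DLE, like the failing case described above
rawTitle = 'Hello\x11\x08\x10 World\tok'
cleanTitle = removeCtlChrRegex(rawTitle)

print(repr(cleanTitle))         # 'Hello World\tok'
print(parsesAsXml(rawTitle))    # False
print(parsesAsXml(cleanTitle))  # True
```

For long strings the regex variant avoids building the result one character at a time, but the loop version above is easier to adapt if the list of allowed characters needs to change.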
Example 4.11. Usage example of removeCtlChr
# remove the control chars in the title
# e.g.:
# title contains control chars: DC1, BS, DLE, DLE, DLE, DC1
infoDict['title'] = removeCtlChr(infoDict['title'])

[Tip] About control characters
