linux - Remove lines with Japanese characters from a file
First question on here. I've searched around, but my attempts at an answer have come up empty so far.
I have a multi-line text file that I am cleaning up. Part of this is to remove lines that include Japanese characters. I have been using sed for my other operations, but it is not working in this instance.
I was under the impression that using the -r switch and the \p{Han} regular expression would work (from looking at other questions of this kind), but it is not working in my case.
Here is my test string. Running this returns the full string and does not filter out the JP characters as I was expecting.
echo 80岁返老还童的处女: 第3话 | sed -r "s/\\p\{Han\}//g"
Am I missing something? Is there a command I should be using instead?
I think this might work for you:
echo "80岁返老还童的处女: 第3话" | tr -cd '[:print:]\n'
sed doesn't support Unicode classes AFAIK, nor does it support multibyte ranges.
-d deletes the characters in SET1, and -c reverses (complements) the set, so everything not listed is deleted.
[:print:] matches printable characters, including space.
\n is the newline character.
The above will not only remove Japanese characters but all multibyte characters, as well as control characters.
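Whether tr works on bytes or on characters depends on the implementation and the locale, so the result can differ between systems. As a small sketch, assuming it is acceptable to run this step in the C locale, pinning LC_ALL=C makes every byte above 127 non-printable, so all multibyte characters are dropped. The function name strip_non_ascii is made up for illustration.

```shell
# Hypothetical helper: delete everything except printable ASCII and newlines.
# LC_ALL=C makes tr treat its input as single bytes, so every byte of a
# multibyte UTF-8 character counts as non-printable and is deleted.
strip_non_ascii() {
    LC_ALL=C tr -cd '[:print:]\n'
}

echo "80岁返老还童的处女: 第3话" | strip_non_ascii
# prints: 80: 3
```

Note that tabs are control characters too, so they are removed as well.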
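Perl can also be used:
PERLIO=:utf8 perl -pe 's/\p{Han}//g' file
PERLIO=:utf8 tells Perl to treat input and output as UTF-8. The variable name must be uppercase, since environment variables are case-sensitive.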