linux - Remove lines with Japanese characters from a file

- January 15, 2011

First question on here. I've searched around for an answer but have come up empty so far.

I have a multi-line text file that I am cleaning up. Part of this is to remove lines that include Japanese characters. I have been using sed for my other operations, but it is not working in this instance.

I was under the impression that using the -r switch and the \p{Han} regular expression would work (from looking at other questions of this kind), but it is not working in this case.

Here is my test string. Running this returns the full string and does not filter out the JP characters as I was expecting.

echo 80岁返老还童的处女: 第3话 | sed -r "s/\\p\{Han\}//g"

Am I missing something? Is there another command I should be using instead?

I think this might work for you:

echo "80岁返老还童的处女: 第3话" | tr -cd '[:print:]\n'

sed doesn't support Unicode classes AFAIK, nor does it support multibyte ranges.

-d deletes the characters in SET1, and -c complements the set, so everything not in SET1 is deleted.
[:print:] matches all printable characters, including space.
\n is newline.

The above will remove not only Japanese characters but all multibyte characters, as well as control characters.
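To make that side effect concrete, here is a minimal sketch. The function name strip_non_ascii is hypothetical, and LC_ALL=C is an assumption added here so that tr classifies input byte by byte regardless of the caller's locale; the input is assumed to be UTF-8.

```shell
# Hypothetical helper: keep only printable ASCII bytes and newlines.
# LC_ALL=C (an assumption, not in the original answer) forces byte-wise
# classification, so every byte of a multibyte character is deleted.
strip_non_ascii() {
  LC_ALL=C tr -cd '[:print:]\n'
}

# The Japanese text, the tab and the accented "é" are all removed,
# not just the Japanese characters.
printf 'abc 日本語 def\tcafé\n' | strip_non_ascii
```

Note that this deletes characters rather than whole lines, so it only fits when everything that should survive is plain ASCII.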

Perl can also be used:

PERLIO=:utf8 perl -pe 's/\p{Han}//g' file

PERLIO=:utf8 tells Perl to treat input and output as UTF-8.
