linux - Remove lines with Japanese characters from a file


First question on here. I've searched around for an answer but have come up empty so far.

I have a multi-line text file that I am cleaning up. Part of this is to remove lines that include Japanese characters. I have been using sed for my other operations, but it is not working in this instance.

I was under the impression that using the -r switch and the \p{Han} regular expression would work (from looking at other questions of this kind), but it is not working in my case.

Here is my test string. Running this returns the full string and does not filter out the JP characters as I was expecting.

echo 80岁返老还童的处女: 第3话 | sed -r "s/\\p\{Han\}//g"

Am I missing something? Is there another command I should be using instead?

I think this might work for you:

echo "80岁返老还童的处女: 第3话" | tr -cd '[:print:]\n' 

sed doesn't support Unicode classes AFAIK, nor does it support multibyte ranges.

-d deletes the characters in set1, and -c reverses it (so everything not in set1 is deleted).
[:print:] matches all printable characters, including space.
\n is newline.

The above will remove not only Japanese characters but all multibyte characters, as well as control characters.
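To see how broad that filter is, here is a small sketch. It assumes a byte-oriented tr such as GNU tr, and it forces the C locale with LC_ALL=C (my addition, so that every byte outside ASCII counts as non-printable whatever your environment is set to). ascii_only is just a hypothetical helper name.

```shell
# Hypothetical helper: keep only printable ASCII characters and newlines.
# LC_ALL=C is an assumption that makes tr treat the input as single bytes.
ascii_only() {
    LC_ALL=C tr -cd '[:print:]\n'
}

# The accented "é" is dropped along with the CJK characters.
echo "café 第3话" | ascii_only
# prints: caf 3
```

So if the file also contains accented Latin text that you want to keep, this approach is probably too aggressive.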

Perl can also be used:

PERLIO=:utf8 perl -pe 's/\p{Han}//g' file 

PERLIO=:utf8 tells Perl to treat input and output as UTF-8.

