I've got a Perl script that reads HTML from standard input, strips out the tags, and prints the result. However, for UTF-8/Unicode text (I'm still not 100% clear on the difference between the two...), the output comes out garbled. Does anyone have any ideas?
Here's the Perl code:
Code:
#!/usr/bin/perl
$str = "";
while ($line = <STDIN>)
{
    $str .= $line;
}
$str =~ s/<script[^>]*>(.*?)<\/script>//gsi; # remove <script>
$str =~ s/<(?:[^>'"]*|(['"]).*?)*>//gsi; # remove html
print $str;
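
One thing that might be worth trying, assuming the input really is UTF-8: Unicode is the character set, and UTF-8 is one way of encoding those characters as bytes, so Perl has to be told to decode the bytes on the way in and encode them again on the way out. The sketch below does that with binmode and the :encoding(UTF-8) layer. The two regexes are copied unchanged from the script above; the strip_html and main names are made up for illustration. This is only a guess at the cause, not a confirmed fix, since the garbling could also come from the terminal or from input that is in some other encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a decoded character string and
# returns it with the same two substitutions as the original script.
sub strip_html {
    my ($str) = @_;
    $str =~ s/<script[^>]*>(.*?)<\/script>//gsi;   # remove <script> blocks
    $str =~ s/<(?:[^>'"]*|(['"]).*?)*>//gsi;       # remove remaining tags
    return $str;
}

sub main {
    # Assumption: input is UTF-8 and the output should be UTF-8 as well.
    binmode(STDIN,  ':encoding(UTF-8)');   # decode bytes into characters
    binmode(STDOUT, ':encoding(UTF-8)');   # encode characters back to bytes

    # Slurp all of standard input in one read.
    my $str = do { local $/; <STDIN> };
    $str = '' unless defined $str;

    print strip_html($str);
}

# Run only when executed as a script, not when loaded from another file.
main() unless caller;

1;
```

It could be run as something like perl strip.pl < page.html (both file names are placeholders). If the input turns out to be in another encoding, for example Latin-1 or UTF-16, the layer name on STDIN would need to change to match.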