Linux "enc2xs" Command Line Options and Examples

- Perl Encode Module Generator

enc2xs builds a Perl extension for use by Encode from either Unicode Character Mapping files (.ucm) or Tcl Encoding Files (.enc).

Usage:

enc2xs -[options]
enc2xs -M ModName mapfiles...
enc2xs -C

Command Line Options:

-o

    Reading myascii (myascii)
    Writing compiled form
    128 bytes in string tables
    384 bytes (75%) saved spotting duplicates
    1 bytes (0.775%) saved using substrings
    ....
    chmod 644 blib/arch/auto/Encode/My/My.bs
    $

The time it takes varies depending on how fast your machine is and how large your encoding is. Unless you are working on something big like euc-tw, it won't take too long.

5. You can "make install" already but you should test first.

    $ make test
    PERL_DL_NONLAZY=1 /usr/local/bin/perl -Iblib/arch -Iblib/lib \


                            enc2xs -o ...

-e

    $verbose=0; runtests @ARGV;' t/*.t
    t/My....ok
    All tests successful.
    Files=1, Tests=2, 0 wallclock secs
    ( 0.09 cusr + 0.01 csys = 0.09 CPU)

6. If you are content with the test result, just "make install"

7. If you want to add your encoding to Encode's demand-loading list (so you don't have to "use Encode::YourEncoding"), run

    enc2xs -C

to update Encode::ConfigLocal, a module that controls local settings. After that, "use Encode;" is enough to load your encodings on demand.

The Unicode Character Map

Encode uses the Unicode Character Map (UCM) format for source character mappings. This format is used by IBM's ICU package and was adopted by Nick Ing-Simmons for use with the Encode module. Since UCM is more flexible than Tcl's Encoding Map and far more user-friendly, this is the recommended format for Encode now.

A UCM file looks like this.

    #
    # Comments
    #
    <code_set_name> "US-ascii" # Required
    <code_set_alias> "ascii"   # Optional
    <mb_cur_min> 1             # Required; usually 1
    <mb_cur_max> 1             # Max. # of bytes/char
    <subchar> \x3F             # Substitution char
    #
    CHARMAP
    <U0000> \x00 |0 # <control>
    <U0001> \x01 |0 # <control>
    <U0002> \x02 |0 # <control>
    ....
    <U007C> \x7C |0 # VERTICAL LINE
    <U007D> \x7D |0 # RIGHT CURLY BRACKET
    <U007E> \x7E |0 # TILDE
    <U007F> \x7F |0 # <control>
    END CHARMAP

· Anything that follows "#" is treated as a comment.

· The header section continues until a line containing the word CHARMAP. This section has a form of <keyword> value, one pair per line. Strings used as values must be quoted. Barewords are treated as numbers. \xXX represents a byte.

  Most of the keywords are self-explanatory. subchar means substitution character, not subcharacter. When you encode a Unicode sequence to this encoding but no matching character is found, the byte sequence defined here will be used. For most cases, the value here is \x3F; in ASCII, this is a question mark.

· CHARMAP starts the character map section. Each line has a form as follows:

    <UXXXX> \xXX.. |0 # comment
      ^     ^      ^
      |     |      +- Fallback flag
      |     +-------- Encoded byte sequence


                            enc2xs -e ...
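
To make the CHARMAP line format described above more concrete, here is a minimal Python sketch that splits one entry into its parts. It is not part of enc2xs or Encode; the function name parse_charmap_line is hypothetical, and the pattern assumes a single-digit fallback flag (the text above only shows |0) and a code point of four to six hex digits.

```python
import re

# One CHARMAP entry: <UXXXX> \xXX.. |N
# Assumptions (not confirmed by the text above): the fallback flag is a
# single digit, and the code point has 4 to 6 hex digits.
_ENTRY = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)$"
)


def parse_charmap_line(line):
    """Return (code_point, encoded_bytes, fallback_flag), or None.

    None is returned for blank lines, comment-only lines, and anything
    that is not a character map entry (for example CHARMAP itself).
    """
    # Anything that follows "#" is treated as a comment.
    body = line.split("#", 1)[0].strip()
    if not body:
        return None
    match = _ENTRY.match(body)
    if match is None:
        return None
    code_point = int(match.group(1), 16)
    # Turn each \xXX escape into one byte of the encoded sequence.
    hex_pairs = re.findall(r"\\x([0-9A-Fa-f]{2})", match.group(2))
    encoded = bytes(int(pair, 16) for pair in hex_pairs)
    return code_point, encoded, int(match.group(3))


if __name__ == "__main__":
    sample = [
        "# Comments",
        "CHARMAP",
        r"<U007C> \x7C |0 # VERTICAL LINE",
        "END CHARMAP",
    ]
    for text in sample:
        print(repr(text), "->", parse_charmap_line(text))
```

The sketch only handles the character map section; header lines such as <code_set_name> are deliberately returned as None rather than parsed.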