TODO

 - Check how we compare to the HTML5 parsing rules
 - limit the length of markup elements that never end.  Perhaps by
   configurable limits on the length that markup can have and still
   be recognized.  Report stuff as 'text' when this happens?
 - remove 255 char limit on literal argspec strings
 - implement backslash escapes in literal argspec string
 - <![%app1;[...]]> (parameter entities)
 - make literal tags configurable.  The current list is hardcoded to
   be "script", "style", "title", "iframe", "textarea", "xmp", and
   "plaintext".

SGML FEATURES WE WILL PROBABLY IGNORE FOREVER
 - Empty tags: <> </>  (repeat previous start tag)
 - <foo<bar>  (same as <foo><bar>)
 - NET tags <name/.../

MINOR "BUGS" (alias FEATURES)
 - no way to clear "boolean_attribute_value".
 - <style> and <script> do not end with the first "</".

MSIE bug compatibility
 - recognize server side includes as comments; <% ... %>
   if no matching %> found treat "<% ..." as text
 - skip quoted strings when looking for PIC