Web Access Gateway bugs and problems
- 1. About this file
- 2. SSL switching problems
- 3. some URLs could potentially be mis-handled
- 4. ampersands in linked URLs
- 5. Cyrillic stuff (mainly .tbl sortout)
- 6. Update scripts need fixing
- 7. Non-standard colour settings get lost
- 8. Javascript status lines prevent hover background colour
- 9. xhtml stuff
- 10. Need to handle links (and ALTs) that say "here" or "click here"
- 11. Need something about DOCUMENTS IN CAPITALS
- 12. Accessing MIDI etc
- 13. Flash and strings
- 14. Collapsing newlines
- 15. Embedded stylesheets need URL redirection
- 16. CSSA links "news/news"
- 17. Japan2001 viewer.html is wrong
- 18. Need to move to XML (when translators finished)
- 19. ISO decoding bug
- 20. Need a "preferred image style" option
- 21. Using Chieko's proxy
- 22. Replacing BODY tag interferes with scripts (and colours)
- 23. asahi.com images without HEIGHT and WIDTH
- 24. Image server reliability
- 25. Image server speed
- 26. Image server needs more GIFs
- 27. Unicode stuff
- 28. ".gif" isn't always a GIF (can leave gateway)
- 29. Extracting URLs from OPTION VALUE
- 30. Stylesheets / line spacing etc
- 31. iMode and Access Gateway - Notes
- 32. Stylesheets that say display:none
- 33. lynx -trace
- 34. Multipart form encoding
- 35. Cookies minor things
- 36. border=0 to CSS ?
- 37. Finish/document NONLOCAL_PASSWORD
- 38. proper framesets
- 39. Plugins
- 40. More encodings trouble (dating from 1999)
- 41. Tables/Unicode stuff (old)
- 42. More mapping tables todo (old)
- 43. From later-on.txt
- 44. Grep NEEDATTENTION etc
- 45. Korean etc
- 46. Link to home page should go through gateway
- 47. Misc (old) (some may have been fixed)
- 48. Background hover colour needs help text
- 49. Colours needs an "other" button
- 50. Spacer removal doesn't remove <p>
- 51. Inline help errors
- 52. Doesn't work well with online email providers
- 53. more options compression
- 54. banner split bug
- 55. PC HK/TW symbols
- 56. Segfault on bogus HTML
1. About this file
The following is a list of some outstanding gateway bugs, in no particular order. It is mostly in terse note form. The numbering is subject to change.
2. SSL switching problems
Redirect to the SSL version when the user types an https URL - do it with a Location directive (if CAN_SWITCH_SSL is defined in platform.h)
Also, getting images over non-SSL (in an SSL page) is a potential privacy compromise if an unauthorised person is snooping the net (and someone could compromise the _integrity_ of SSL pages by changing the character images) - document or fix
(but no big problem because the browser should warn anyway)
3. some URLs could potentially be mis-handled
protocol://user:pass@host:port - the user:pass bit might sometimes be incorrectly handled (might matter if someone encodes their links like that)
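A minimal sketch of how the user:pass part can be separated from host and port, using Python's standard urllib.parse rather than the gateway's own code. The function name split_authority and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit

def split_authority(url):
    """Return (user, password, host, port) from the authority part of a URL.

    Missing components come back as None.
    """
    parts = urlsplit(url)
    return parts.username, parts.password, parts.hostname, parts.port
```

A link rewriter could use this to check that the user:pass part survives a round trip through the gateway.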
4. ampersands in linked URLs
e.g. on http://www.jython.org/cgi-bin/faqw.py?req=index (try following a link; doesn't work until you press OK) (should probably re-write them somewhere)
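One reading of this note is that an "&" in the target URL clashes with the "&" that separates the gateway's own parameters. The sketch below assumes that reading: it decodes "&amp;" from the page source and then percent-encodes the target before appending it to an Ac=A&Au= style gateway URL. The function name gateway_link and the default prefix are assumptions, not the gateway's actual code.

```python
from html import unescape
from urllib.parse import quote

def gateway_link(href, prefix="/cgi-bin/access?Ac=A&Au="):
    """Build a gateway link whose target cannot be confused with gateway options."""
    target = unescape(href)                    # "&amp;" in the source becomes "&"
    return prefix + quote(target, safe=":/")   # "?", "=", "&" become %3F, %3D, %26
```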
5. Cyrillic stuff (mainly .tbl sortout)
The gateway does not recognise the ISO designators for Cyrillic, Esc - L and Esc - A. This is because I don't know which ISO designator goes with which code page.
fread /other/nobackup/*.count (and *.freqtbl) into an array of ints; find max, max-1 etc; top N in reverse order (to 4d?)
(tbl's: try using .py prototype to get them into text 1st. vice versa?)
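A possible .py prototype for the "find max, max-1 etc; top N" step. It assumes the counts have already been read into a list indexed by character code; the on-disk format of the .count and .freqtbl files is not described here, so reading them is left out. The name top_n is hypothetical.

```python
import heapq

def top_n(counts, n):
    """Return (index, count) pairs for the n largest counts, largest first."""
    return heapq.nlargest(n, enumerate(counts), key=lambda pair: pair[1])
```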
M.T. sent these URLs http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK.html http://web.kyoto-inet.or.jp/people/tomoko-y/biwa/wnn/iso-2022.html
- Get a frequency table for Cyrillic
- Improve auto-detect code (maximise chars that fall within the highest-frequency range?)
- Rename "DOS Russian"
- add Cp866
- IBM
- Cp855
- KOI8-R - some errors in the table; see http://koi8.pp.ru/utf-8.koi8-r.htmlu
- koi8.pp.ru/koi8-r_unicode.txt
- What I will try to do is get the mapping tables into a human-editable form. Then if you like you can edit them. But it may be some time before I can do that.
- charset= stuff (need alias table) (modify .tbl files? or do it separately)
- [ CU Slavonic & East European Society ] ; [ [CU Yugoslav Society] (about 40 members on soc-cuyu) ]
/usr/share/i18n/charmaps could be useful
6. Update scripts need fixing
This file (gateway.bugs) is translated to HTML & updated by the website update script; it should be done by the gateway update script (to pageroot) like the help file is.
Maybe have "The latest version is N.N.N" at top of
access.html (use htp.def?) and rsync it (or Makefile - rsync
won't work due to date stamp problems)
7. Non-standard colour settings get lost
If you put a non-standard colour in the URL and then select the "colours" button, it gets lost because it is not one of the options. Maybe if none of the options match the current value, add a new one that does (quoting the HTML figure or something).
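A sketch of the "add a new one that does" idea, assuming the options are held as a list of HTML colour figures. The function name and the three standard colours are placeholders, not the gateway's real list.

```python
def colour_options(current, standard=("#000000", "#FFFFFF", "#FFFF00")):
    """Return the option list, adding the current value if it is not a standard one."""
    options = list(standard)
    known = {colour.upper() for colour in options}
    if current and current.upper() not in known:
        options.append(current)   # quote the HTML figure as its own option
    return options
```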
8. Javascript status lines prevent hover background colour
Maybe add something to onMouseOver and onMouseOut
9. xhtml stuff
html2xhtml ok but script problem (do *after* proc) (ok for now...) (or just hack it - "write out the comments inside script, *maybe* w/out <!-- -->") (lower pri: put <html> </html> in if not already there) (do we get the ?xml? thing, + this, into mytest itself?) Also it would be nice to upgrade the HTML spec to 4 (esp. tables) (lower pri: integrate it with the C++ HTML filter, & remove the code that's made redundant by it)
10. Need to handle links (and ALTs) that say "here" or "click here"
(Apart from all those links that say "here" - if a blind person, or a mobile phone user, is trying to get a summary of a page by getting the computer to just output the links, they get "here, here, here". Not to worry - one of these days I'll add an option to my web mediator to handle them.)
Meaningless ALT tags ("Click here!") -
http://www.fujitsu.co.jp
Gateway: Need to do something about entire sentences being
in capitals (make them title case instead) (but leave
acronyms etc alone)
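A rough sketch of the capitals idea. In an all-capitals sentence acronyms cannot be told apart from ordinary words, so this assumes a small acronym dictionary (the ACRONYMS set below is a placeholder) and only touches text of at least min_words words; both are assumptions.

```python
ACRONYMS = {"HTML", "URL", "BBC"}   # hypothetical acronym dictionary

def soften_capitals(sentence, min_words=4):
    """Title-case a sentence written entirely in capitals, leaving known acronyms alone."""
    words = sentence.split()
    letters = [c for c in sentence if c.isalpha()]
    if len(words) < min_words or not letters or not all(c.isupper() for c in letters):
        return sentence   # mixed case or too short: leave untouched
    return " ".join(
        w if w.strip(".,;:!?()\"'") in ACRONYMS else w.capitalize()
        for w in words
    )
```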
Option to strip width & height from embed?
<embed src="x.mid" width=2 height=0 autostart=true loop=true>
swf: As predicted the content was minimal, but I could extract
the links to the other subpages which was sufficient to obtain
the required information.
> Without naming the guilty parties I've encountered a fair number
of such sites on the University Societies webserver.....
Lois: If you just want to get the bare text out of a Word doc,
the Unix 'strings' command can be useful. At least on the few
that I've tried.
Feb 9 09:45:24 ssb22 /usr/sbin/imgserver: Error 404 on URL "head"
pc358.nmus.pwf.cam.ac.uk - - [09/Feb/2001:09:45:22 +0000] "GET
/cgi-bin/access?Ac=A&Au=http://perch.tripod.co.jp/ HTTP/1.1"
200 11888 "http://ssb22.joh.cam.ac.uk/cgi-bin/access?Ac=A&Au=http://www.nsknet.or.jp/~m-saito/index.htm"
"Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)"
<ul style="line-height: 150%; list-style-image: url('images/headline01.gif');
margin-left: 35%">
Also: "2000-11-22: NEEDATTENTION"
(stuff that may break Shift-JIS & UTF-8)
Before "alternative base URL": "Preferred image style" (default,
Simplified, Traditional, Korean) (if MULTIPLE_STYLES_SUPPORTED)
ENV_PREFERRED_STYLE needs documenting and adding to the UI
For Traditional, Aeus=t
NB a blank value is OK (default)
ALLOW_USER_PROXY_SETTING maybe ?
Bug: body onLoad doesn't get executed on "enable scripts" (since
body is replaced)
See also NEEDATTENTION in access.c++ / "BODY" re colour override
(mixing author's and user's)
imgserver
Might have been an alarm clock - socket had been registered to
listen, so OS accepted it, but waiting for it to get back to
select()
Blocking write etc?
watch the japan2001 imgserver
Your home directory was unavailable (due to a server
upgrade), hence all the messages. You invoke a cron job
once every minute, so your home directory was probably
inaccessible to Nexus for 142 minutes.
Your "cron" job on nexus
./isitup localhost || (pkill imgserver ; ulimit -n 1024; ./imgserver)
produced the following output:
Alarm Clock
Terminated
(is it "isitup" that does this?)
yes, 10sec timeout
(Does it get stuck anywhere? gdb???)
Sometimes runs out of quota
Compress the data file ? (zcat) (careful...) (or include portable
decompression source..)
DONE added expires and last-modified (does make a difference!)
Ignore net/khttpd (buggy & kernel crash!)
Got "ab" - Apache HTTP server benchmarking tool
/usr/sbin/ab -k -t 60 -c 10
Image server:
Requests per second: 249.74
Transfer rate: 65.45 kb/s received
Apache:
Requests per second: 1045.35
Transfer rate: 3452.46 kb/s received
50 times faster !? Get a profile !
/usr/sbin/ab -k -t 60 -c 10 http://ssb22.joh.cam.ac.uk:7080
From flevit:
Image server:
Requests per second: 74.82
Transfer rate: 19.62 kb/s received
Apache:
Requests per second: 53.47
Transfer rate: 176.63 kb/s received
Still 10 times higher transfer rate, but requests/sec not much
higher (other thing could be a localhost thing)
See "Unicode" section re getting them
Unicode imgs - they're proportional!
zcat -f /var/log/syslog*|grep "Error 404"|sed -e "s/.*URL \"//"
-e "s/\"//"|sort|uniq
Chinese stuff: COULD get it from TeX, if can find a way of auto-cropping
the PostScript & cnvt to a bitmap format
gateway & unicode (multiple "spellings" of accent-add etc)
"Filesystem case-sensitivity (was Re: Picking up hermes mail)"
on ucam.comp.linux
[but might be post-Unicode 3.0]
20000..2A719 : 42,778 : CJK Unified Ideographs, Extension B
(These constitute all remaining unencoded ideographs from the Kangxi Dictionary, the Han Yu Da Zidian, a set of 6356 characters from Japan, 908 Hong Kong government characters, 169 characters from Korea, 29,794 characters from TCA in Taiwan, and 4050 characters from Vietnam.)
:
00-Feb-02 Accepted : 00-Sep-25
etc
About the Online Code Charts
These charts are provided as a convenient online reference to the character contents of the Unicode Standard, Version 3.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. Proper Unicode support requires considerably more than providing glyphs for characters, and requires consulting the Unicode Standard and the Unicode Technical Reports.
You may freely use these code charts for personal or internal business uses only. You may not incorporate them into any product or publication, or otherwise distribute or archive them without express written permission from the Unicode Consortium.
The information on these pages may be updated from time to time. The Unicode Consortium is not liable for errors or omissions in these charts or the standard itself.
Blocks
The Unicode Standard divides its codespace into a number of blocks. The chart index contains a table of most of the blocks; missing are blocks of unassigned characters, and blocks of characters with no visual representation such as the surrogate blocks and private use area. You can also go to a full character chart for each block (except for the Han ideographs and Hangul syllables).
Fonts
The fonts used in these charts were provided to the Unicode Consortium by a number of different font designers. Note that the glyphs in these charts are only representative; there can be wide variation in the glyphs used to represent any particular character, as discussed in the standard.
SOME mapping tables (Windows): http://oss.software.ibm.com
You may embed references to the glyph images on the Unicode site in your own web pages. For example, to display a Euro sign (U+20AC) you can use the following HTML:
<IMG SRC="http://charts.unicode.org
The subdirectory to use within the Glyphs/ directory is the first two hexadecimal digits of the Unicode code point. The set of glyphs available covers all of Unicode 3.0 with the exception of Han ideographs and Hangul syllables.
However, you should only make occasional use of these glyphs. If there is too much web traffic the Unicode Consortium may be forced to discontinue this service.
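The subdirectory rule quoted above can be sketched as follows. It assumes code points are written as at least four hexadecimal digits (so U+00E9 gives "00"); the rest of the image URL is not recoverable from these notes and is left out. The function name is made up.

```python
def glyph_subdirectory(code_point):
    """First two hex digits of a code point written as at least four hex digits."""
    return f"{code_point:04X}"[:2]
```

For the Euro sign, glyph_subdirectory(0x20AC) gives "20".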
(see source of http://www.unicode.org/charts/web.html for codepoints)
http://charts.unicode.org/unihan/unihan.acgi$0x4E95
(generates links to cached images; not permanent; but URLs quite
regular so hit the main page first and then get the cached images,
only if haven't already got the image)
3400-9FFF and F900-FAFF
ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt
http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html
xmbdfed (package installed)
o Export of XBM files from glyph bitmap editors.
(well, can export to HEX, which can probably be converted)
but mgk25's fonts are wrong sizes
http://czyborra.com/unifont/
(Can we start with a HEAD request if we're using own code?
What about the overhead of having to re-connect if no
keep-alive? etc)
<select name="site" size=1 onChange="javascript:formHandler()">
<option selected value="">Worldwide
sites
<option value="http://www.kingston.com
Showcase of Japanese Keitai Culture
http://ssb22.joh.cam.ac.uk/cgi-bin/access?Ac=@&
The HTTP User-Agent: header identifies an i-Mode browser with
a string something like DoCoMo/1.0/F50i for the older 501 models,
and something like DoCoMo/2.0/F502i/c10 for the newer models.
The first part of the string says DoCoMo indicating that it is
an i-Mode client. The next part indicates the supported HTML version
number. The third part indicates the device model number.
The fourth part, only available on certain 502 models, indicates
the current cache size. As with WAP devices, an i-Mode device
can only accept a certain amount of data in one go. The number
is in kilobytes, and the default size is 5KB.
[Does this include the images??]
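A sketch of splitting the User-Agent string as described above, assuming the parts are separated by "/" and that the optional fourth part is "c" followed by the cache size in kilobytes. The function name and the returned dictionary keys are made up for illustration.

```python
def parse_imode_user_agent(ua, default_cache_kb=5):
    """Split a DoCoMo User-Agent string; return None for non-i-Mode clients."""
    parts = ua.split("/")
    if len(parts) < 3 or parts[0] != "DoCoMo":
        return None
    cache_kb = default_cache_kb            # 5KB unless the phone says otherwise
    if len(parts) > 3 and parts[3].startswith("c") and parts[3][1:].isdigit():
        cache_kb = int(parts[3][1:])       # e.g. "c10" means 10KB
    return {"html_version": parts[1], "model": parts[2], "cache_kb": cache_kb}
```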
Screen size is very small, usually no more than 16 (English) characters
by 6-8 lines.
<HTML> <HEAD> <TITLE>Main MENU</TITLE> </HEAD> <BODY> <FONT COLOR=RED>Main
MENU</FONT> <BR> <IMG SRC=ad_small.gif ALIGN=RIGHT> <A HREF=new.tcl
ACCESSKEY="1">News</a> <BR> <A HREF=addr.tcl ACCESSKEY="2">Directory</a>
</BODY> </HTML>
The ACCESSKEY attribute of a hyperlink provides one-key access
to select and follow the URL, from the phone's numeric keypad.
[but is the number included?]
[Don't have to include this - the recommended implementation does
it by default]
The i-Mode phone terminals do not support HTTP Cookies at this
time.
Note that on i-Mode phones, the password field in the HTTP authentication
dialog box which pops up only supports entry of numeric passwords.
Authorization: header is present. If not, it issues a WWW-Authenticate
challenge
WWW-Authenticate: Basic realm="ACS_iMode"
return 401 "text/html; charset=shift_jis" "please login"
(or "incorrect login")
For compressing HTML (& removing unwanted tags), see http://www.w3.org/TR/1998/NOTE-compactHTML-1998
(table in Appendix A of supported tags & attribs)
Images can be nightmarish (esp. large ones; transferred & scaled
down)
Please ensure each page uses less than 5KB of data volume. (Depending
on the tags being used, some pages cannot be displayed even though
they contain less than 5KB of data.)
We recommend a data volume per page of less than 2KB.
The maximum length of a character string is 200 bytes after URL
encoding.
The maximum length of a URL that can be input directly is 100
bytes.
The maximum length of a URL that can be added to the bookmark
list is 100 bytes.
The maximum length of the title of a page/bookmark is 24 bytes.
i-mode users are responding to banner ads and e-mail advertising
to a far greater extent than standard Web users.
I-mode - the "i" is for information
There is a basic data charge per packet, 0.3 YEN (approx. US-cent
0.3) per data packet transmitted of 128 byte. As an example,
looking at the basic imode-Menu, the standard DoCoMo welcome
screen or user interface, will set you back about 2.7 YEN (i.e.
approx. US-cent 2.7). There are no connection time charges for imode.
In addition there are other charges for using email and for premium
subscription services.
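The packet arithmetic, as a sketch: 0.3 yen per 128-byte packet, so the 2.7 yen menu corresponds to 9 packets. Charging a partly filled packet as a whole one is an assumption here, and Decimal is used only to avoid floating-point rounding in the result.

```python
import math
from decimal import Decimal

YEN_PER_PACKET = Decimal("0.3")
PACKET_BYTES = 128

def packet_charge(num_bytes):
    """Charge in yen for num_bytes of data, rounding up to whole packets."""
    packets = math.ceil(num_bytes / PACKET_BYTES)
    return packets * YEN_PER_PACKET
```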
imode emails have to be shorter than 250 Kanji (double byte characters),
or shorter than 500 Roman Characters (single byte characters)
The default email address of imode users is 090xxxxxxxx@docomo.ne.jp,
where "090xxxxxxxx" is the mobile telephone number.
For example  is an icon of a sun shining.
SJIS+imgs; Remove all images (ALT?); disable status line scripts;
don't add "end of web page"; don't put [ ]; don't show date stamp.
Also: Don't add TITLE= to any HR; don't add META tags; compact
space; ’ to '; compress the options; MAYBE compress Au=
in some other way as well (besides removing http://); remove
things like <b> <i> etc that are not supported (don't use colours
instead - it will drive the size up)
geometry is really 16x7, but lynx margins take up 4 more lines
xterm -geometry 16x11 -e lynx -nocolor -nopause -noreverse -nounderline
(formatting can be bad)
Some phones have 20x8 (10 kanji)
xterm -geometry 20x12 -e lynx -nocolor -nopause -noreverse -nounderline
(formatting can be wrong, e.g. centre etc)
in cgilib.c++
CGIEnvironment::tryDecodingMultipart()
See all **** stuff esp. boundary
CONTENT_TYPE=multipart/form-data; boundary=---------------------------10617267281005157210847669114
CONTENT_LENGTH=4339
Input:
-----------------------------10617267281005157210847669114
Content-Disposition: form-data; name="iconid"
5
-----------------------------10617267281005157210847669114
Content-Disposition: form-data; name="message"
test
-----------------------------10617267281005157210847669114
Content-Disposition: form-data; name="A1attachment"; filename="codepoints.html"
Content-Type: text/html
#include .....
-----------------------------10617267281005157210847669114--
Or:
Content-Disposition: form-data; name="A1attachment"; filename="random_seed"
P....\n
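A prototype of the decoding step, not the code in cgilib.c++. It relies on the rule that each delimiter line in the body is "--" followed by the boundary from CONTENT_TYPE (which is why the body lines above carry two more dashes than the header), and that the final delimiter has a further "--" appended. The function name is hypothetical.

```python
def parse_multipart(body, boundary):
    """Split a multipart/form-data body (bytes) into (headers, content) pairs."""
    delimiter = b"--" + boundary
    fields = []
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):          # closing delimiter reached
            break
        head, _, content = chunk.partition(b"\r\n\r\n")
        if content.endswith(b"\r\n"):        # CRLF before the next delimiter
            content = content[:-2]
        headers = {}
        for line in head.strip().split(b"\r\n"):
            name, _, value = line.partition(b":")
            headers[name.strip().lower().decode("ascii")] = value.strip().decode("latin-1")
        fields.append((headers, content))
    return fields
```

Content is kept as bytes because an attachment such as random_seed need not be text.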
Cookies: Need to default the domain (not to everything!) when
setting
(although this would increase the size of the URLs...)
Note: %26 (&) and %3D (=) seem to occur a lot in the cookie -
better % compression system ? (%m for aMpersand and %q for eQuals
? Somehow code all ASCII that must be %-escaped?) (watch we
don't send this fancy stuff to remote servers!)
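A sketch of the %m / %q idea. Since "m" and "q" are not hexadecimal digits, "%m" and "%q" cannot collide with ordinary %-escapes. The function names are made up, and the compressed form is only for the gateway's own URLs (it must be expanded again before anything goes to a remote server).

```python
from urllib.parse import quote, unquote

SHORT_FORMS = {"%26": "%m", "%3D": "%q"}   # aMpersand and eQuals

def compress_cookie(value):
    """Percent-escape a cookie value, then shorten the two commonest escapes."""
    escaped = quote(value, safe="")
    for long_form, short_form in SHORT_FORMS.items():
        escaped = escaped.replace(long_form, short_form)
    return escaped

def expand_cookie(compressed):
    """Undo compress_cookie."""
    for long_form, short_form in SHORT_FORMS.items():
        compressed = compressed.replace(short_form, long_form)
    return unquote(compressed)
```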
How did the cookies get so big anyway?
The problem seems to be unique to Yahoo
Edit the cookies on the form??
Gateway cookies: Should be OK, because not getting *image* cookies.
4K URL limit is a worry! (when user not supporting cookies)
(Temp: clear all cookies when reaches maximum size? cut down?)
(really carry cookies when no longer browsing their source
domain? e.g. search engine cookies)
All FORMS: METHOD="post" (careful; some browsers put
warnings up)
Old notes (might no longer need them) -
<EMBED src="$FILENAME$" width=$WIDTH$ height=$HEIGHT$ type="application/x-Sibelius-Score"
alt="$FILENAME$" codebase="http://www.sibelius.com
codebase and pluginspage should now be substituted
Netscape: Takes codebase and goes ?application/whatever,
ignores pluginspage, changes msg to "click here after
installing". How do you get (or prevent) the adverts
window?
Chinese table problem
Chinese table in Japanese?
Have commented out the
pinchATable("Cp33722","IBM eucJP/5050",f,0);
- REALLY returns max bytes =3 (in EUC)
Need better decompilation
Cp964 (AIX TW) really is 4 bytes max; need to sort out
& comment back in
Need to sort out
//if(neverBelow127) throw(new IOException("Didn't expect neverBelow127
to be true here"));
TEContainer.h: Implement void set
LOG the charsets (and the detected results) as people use web pages?
Need to find official list, really
Some of these may be MIME charsets:
iso-8859-1
Shift_JIS
big5
gb2312
euc-kr
euc-jp
windows-1250
windows-1251
windows-1253
iso-8859-9
utf-8
x-mac-roman
x-mac-ce
ks_c_5601-1987 ?
x-gb2312-11
x-euc-tw
x-cns11643-1
x-x-big5
...
HZ-GB-2312
o iso-2022-jp (see Section 3.1.3)
o iso-2022-jp-2 (see Section 3.1.3)
o iso-2022-kr (see Section 3.1.4)
o iso-2022-cn (see Section 3.1.5)
o iso-2022-cn-ext (see Section 3.1.5)
o iso-8859-1
ISO- ?
-[0-9]?
- UCS-2 0x6F22 0x5B57
- UCS-4 0x00006F22 0x00005B57
UCS-2: FEFF, also escape sequences
(Level 3 = supports all characters)
UCS-2 Level 1 <ESC> % / @ 0x1B252F40
162
UCS-2 Level 2 <ESC> % / C 0x1B252F43
174
UCS-2 Level 3 <ESC> % / E 0x1B252F45
176
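The byte values in the UCS notes above can be checked with a short sketch. It uses Python's utf-16-be and utf-32-be codecs as stand-ins for big-endian UCS-2 and UCS-4 (equivalent for these two characters), and the helper names are made up.

```python
ESC = b"\x1b"
UCS2_LEVEL_FINAL = {1: b"@", 2: b"C", 3: b"E"}

def ucs2_level_escape(level):
    """Escape sequence Esc % / F announcing UCS-2 at the given level."""
    return ESC + b"%/" + UCS2_LEVEL_FINAL[level]

def big_endian_hex(text, codec):
    """Hex dump of text in a big-endian codec such as utf-16-be or utf-32-be."""
    return text.encode(codec).hex().upper()
```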
JIS X 0221-1995 == ISO 10646-1:1993
(based on Unicode 1.1)
See ftp://ftp.tiu.ac.jp/jis/
re JIS X 0213-199X etc
Also get ISO sequences for all the other encodings
(+ MIME charset etc)
MISSING: <OPTION VALUE="Cp1125">Ukraine: IBM PC</OPTION>
EXTRA: pinchATable(new CharToByteCp856());
Cp33722 and Cp942 need Yen substitution
<OPTION VALUE="Cp037">Misc: CP 037</OPTION>
<OPTION VALUE="Cp437">Misc: DOS 437</OPTION>
<OPTION VALUE="Cp850">Misc: DOS Latin-1</OPTION>
<OPTION VALUE="Cp500">Misc: EBCDIC 500V1</OPTION>
<OPTION VALUE="Cp1046">Misc: IBM EBCDIC</OPTION>
<OPTION VALUE="Cp285">Misc: IBM UK</OPTION>
<OPTION VALUE="8859_1">Misc: ISO 8859-1</OPTION>
<OPTION VALUE="8859_2">Misc: ISO 8859-2</OPTION>
<OPTION VALUE="8859_3">Misc: ISO 8859-3</OPTION>
<OPTION VALUE="8859_4">Misc: ISO 8859-4</OPTION>
<OPTION VALUE="8859_9">Misc: ISO 8859-9</OPTION>
<OPTION VALUE="MacDingbat">Misc: Macintosh Dingbat</OPTION>
<OPTION VALUE="MacRoman">Misc: Macintosh Roman</OPTION>
<OPTION VALUE="MacSymbol">Misc: Macintosh Symbol</OPTION>
<OPTION VALUE="Cp1252">Misc: Windows Latin-1</OPTION>
// Need to add more encodings
// See ftp://unicode.org/pub/MappingTables/
Cp936 is GB2312 with some corrections
charset=HZ-GB-2312
(NB One class may have several charsets)
/*
Two-byte ISO codes:
JIS X 0208-1990: Esc & @ before JIS X 0208-1983
@ = JIS C 6226-1978
DONE A = GB2312
DONE B = JIS X 0208-1983
DONE C = KSC5601
D = JIS X 0212-1990
E = ISO-IR-165:1992
(Lunde: "ISO-IR-165:1992 can be considered a
superset of GB 2312-80, GB 6345.1-86, and
GB 8565.2-88" [and more - ssb])
DONE G-M = planes for CNS11643 (1-7)
Look at http://www.fontlab.com/download.htm (check legal permissions)
http://www.unicode.org
CJK images + readings: Use all the data on http://charts.unicode.org
Complete text file ftp://ftp.unicode.org
ftp://ftp.unicode.org
Blocks index http://charts.unicode.org
void set
Unihan database also has definitions
Norway needs a different table from Denmark
Also Finnish & Sweden
"Decoding prior to native decoding" thing: Problem
if HTML sequences are *MEANT*!
Decode only if chars in this range are always in
HTML?
Have a "Language for messages" button?
NB Greek&Russian may be in most CJK,
also Jp in C and K, also C in CJK; also UTF-8 etc
(especially Chinese)
JIS no 'escape' thing? (ie. take $B as Esc $B etc)
Stuff to check:
// NEEDATTENTION Check the following!
if(!shiftOutRequired) isoEncodingInUse=-1; // So resets immediate
ones at end of line
// *** Need to sort out misc folder!
// Sort JIS folder
Packages rsync sftp
<EMBED SRC="jsb.mid" HIDDEN=true AUTOSTART=true>
Remove (or don't) HIDDEN and AUTOSTART
-> Allow background music to start automatically
Images: Give the button text as images
before the button!
Also need to write SELECT as RADIO
problemExtentions[] could be more elegant / less
storage etc
Spam trap: Would it be better with sleep?
<HTML lang="fr">
<EM lang="ja">some Japanese</EM>
<P lang="es">...Interpreted as Spanish...
<P>...Interpreted as French again...
<ABBR title="Idaho">ID</ABBR>
<ACRONYM title="World Wide Web">WWW</ACRONYM>
(have an acronyms dictionary?)
Black = #000000 Green = #008000
Silver = #C0C0C0 Lime = #00FF00
Gray = #808080 Olive = #808000
White = #FFFFFF Yellow = #FFFF00
Maroon = #800000 Navy = #000080
Red = #FF0000 Blue = #0000FF
Purple = #800080 Teal = #008080
Fuchsia= #FF00FF Aqua = #00FFFF
In the near future, browsers will display grouped lists with expanding and collapsing levels of detail. To group items, use the OPTGROUP element (with the SELECT element). For example:
<FORM action="http://somesite.com
The new FIELDSET element groups form controls while the LEGEND element labels each group. For example,
<FORM action="http://somesite.com/adduser" method="post">
<FIELDSET>
<LEGEND>Personal information</LEGEND>
<LABEL for="firstname">First name:</LABEL>
<INPUT type="text" id="firstname" tabindex="1">
<LABEL for="lastname">Last name:</LABEL>
<INPUT type="text" id="lastname" tabindex="2">
...more personal information...
</FIELDSET>
<FIELDSET>
<LEGEND>Medical History</LEGEND>
...medical history information...
</FIELDSET>
</FORM>
Give each frame a title
IFRAME as well as FRAME
Provide alternative text for all image
submit buttons
<INPUT TYPE="image" SRC="bobbylogo.gif" ALT="The bobby logo" WIDTH=200
HEIGHT=200>
Option for button with show URL
Cache-Control: no-cache
Pragma: no-cache
Expires: 0
<META HTTP-EQUIV="Window-target" CONTENT="_top">
Options:
Content-language: en-GB
Window-target: _top
<META HTTP-EQUIV="Set-Cookie"
CONTENT="cookievalue=xxx;expires=Friday, 31-Dec-99 23:59:59 GMT;
path=/">
Can we put <PRE> around text/plain ?
export LS_COLORS=''
alias ls="ls --color=auto"
export PS1="\h:\W\\$ "
(caps W for last part of dir only)
COMPILER_USES_LSB_MSB_INTS:
Perhaps create alternative versions of data files for
other compilers, add to installation instructions
(plus a test)
Haven't tested that it produces the same output!
Sort L_NO_FREQTBL out
arrows consisting of dashes and greater-than signs
--> etc
Korean:
What about ISO-2022-KR and EUC-KR?
And ISO646? QP? How do they relate to
KS-C-5601?
Unreproducible bug report - Korean pages looking like
Japanese - suspecting wrong language selection
gateway: If <FORM> and </FORM> do not match in "banner",
DO NOT MOVE IT!!!
(eg. http://access.adobe.com
HTML4 forms can have "disabled" controls
- option to remove them?
Error: Failed to find help text for option Aefn~ssb22/mytest
<meta http-equiv=Content-typecontent="text/html; charset=utf-8">
(i.e. if there is a missing space before 'content')
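A tolerant way to pull the charset out of such a tag, as a sketch only: it searches for "charset=" anywhere in the tag and returns None when nothing is found, so the caller has a definite "no value" case to test for. This is a regular-expression stand-in, not a description of what HttpHeader::readHttpEquivs() does.

```python
import re

CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)

def charset_from_meta(tag):
    """Return the lower-cased charset named in a META tag, or None if absent."""
    match = CHARSET_RE.search(tag)
    return match.group(1).lower() if match else None
```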
11. Need something about DOCUMENTS IN CAPITALS
12. Accessing MIDI etc
13. Flash and strings
tracttext.cc, libz (zlib1g-dev)
Chris Lightfoot (saved in MiscStuff)
non-HTML and plug-in sort out; strings; swf thing (sep CGI? system
command??); http://www.flashgallery.co.uk/ source
14. Collapsing newlines
natwest.com sortout (collapse newlines option) :
<p><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
<p>If you have remained on this page ...
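A sketch of what a "collapse newlines" option might do with markup like the above: runs of three or more <br> tags are cut down to a fixed number. The threshold of three and the keep count of two are arbitrary choices, and the function name is hypothetical.

```python
import re

BR_RUN = re.compile(r"(?:<br\s*/?>\s*){3,}", re.IGNORECASE)

def collapse_breaks(html, keep=2):
    """Replace each run of three or more <br> tags with `keep` of them."""
    return BR_RUN.sub("<br>" * keep, html)
```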
15. Embedded stylesheets need URL redirection
Access gateway bug (embedded stylesheets need URL redirection!)
(plus mention it in the presentation)
16. CSSA links "news/news"
cssa: can we pick up on these and get down to one word?
news/news
about/about
activity/activity
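Getting these down to one word could look like the sketch below, which assumes the link text is exactly the same word repeated with "/" in between. The function name is made up.

```python
def collapse_repeated(text):
    """Turn "news/news" into "news"; leave anything else unchanged."""
    parts = text.split("/")
    if len(parts) > 1 and parts[0] and len(set(parts)) == 1:
        return parts[0]
    return text
```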
17. Japan2001 viewer.html is wrong
lynx -source http://www.embjapan.org.uk
18. Need to move to XML (when translators finished)
Might be message drift - check carefully (incl. help.htm)
19. ISO decoding bug
This hit the "invalid ISO designator?" thing but it was a space
encoded in ISO-7 or something
27 44 65 32 27 40 66
Esc , A space Esc ( B
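The two lines above are the same byte sequence written as decimal values and as characters. A small helper for producing that kind of dump when debugging designators (the name describe_bytes is hypothetical):

```python
def describe_bytes(values):
    """Render byte values as characters, naming Esc and space explicitly."""
    names = {27: "Esc", 32: "space"}
    return " ".join(names.get(value, chr(value)) for value in values)
```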
20. Need a "preferred image style" option
21. Using Chieko's proxy
22. Replacing BODY tag interferes with scripts (and colours)
23. asahi.com images without HEIGHT and WIDTH
asahi.com: Images without HEIGHT & WIDTH causes Netscape to
load *all* images before displaying any of the page
24. Image server reliability
Check monash & japan2001 img server stats from time to time
25. Image server speed
server.c++ HTTP/1.1 pipelining (o/p buf retry, watch max size
[but could just drop connection when sent current lot], etc)
(How many browsers/proxies/etc implement this anyway?)
(IE *might*)
26. Image server needs more GIFs
(add other gifs 1st; get through Cam proxy;
transformations; remember decompress)
27. Unicode stuff
Unicode has now gone beyond 16-bits (slides need update)
28. ".gif" isn't always a GIF (can leave gateway)
Oh dear, this leaves the gateway:
http://www.askntl.com/adverts/adverts.asp?url=/telephone/great-value-calls/default.asp&image=/adverts/468by60/phone-bill.gif
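In the URL above the ".gif" is at the end of the query string, not the path. One possible check, assuming the gateway decides "is this an image?" from the extension, is to look only at the path component. This is a sketch with a made-up function name, not the gateway's current logic.

```python
from posixpath import splitext
from urllib.parse import urlsplit

def path_extension(url):
    """Lower-cased extension of the URL's path, ignoring any query string."""
    return splitext(urlsplit(url).path)[1].lower()
```

For the askntl.com URL this gives ".asp", so the link would be treated as a page rather than as an image.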
29. Extracting URLs from OPTION VALUE
kingston.com mirrors navigation:
30. Stylesheets / line spacing etc
line spacing etc (stylesheets? gateway "spacing" button??
with text explaining it's only CSS-aware browsers)
P {word-spacing: 10px}
P {letter-spacing: 5px}
P {line-height: 12pt}
31. iMode and Access Gateway - Notes
"Access" oops: [Access Systems America] - http://www.access-us-inc.com/
Provider of a microbrowser which is used in many I-Mode devices.
32. Stylesheets that say display:none
gateway.bugs: <DIV ID="incoming" STYLE="display:none">
(means don't display; stripping the STYLE will cause it to do
so. + don't strip content if JavaScript enabled.)
(Do we really want to take out this text though? But at least
count
it as a banner? Option???)
33. lynx -trace
lynx -trace: outputs stuff to a file called Lynx.trace
34. Multipart form encoding
---------------------------827779986791670271271312593
-----------------------------827779986791670271271312593
35. Cookies minor things
after don't store remote session IDs, have "store remote session
IDs even across servers" (default No)
36. border=0 to CSS ?
border=0 css ? (can it be done? Which browsers need border=0,
do
they all have CSS support, etc)
check "s
37. Finish/document NONLOCAL_PASSWORD
(see localusr.c++)
NB insecure etc (unless using SSL, and even then, watch Location
box,
cache, history, etc)
38. proper framesets
gateway proper framesets (gateway.bugs? fair amount of coding) (but
jp etc)
If a certain var is present, instead of charset, URL box, date
stamp,
etc (or rewind once done), have "[Expand this frame]" (no BR)
Or just call the string [Options]
It links to the page with the var clear & target=_top
Put var in when doing a FRAMESET
Keep it when doing a link iff name is not _top (& it's already
present)
(may still fail if new NAMEs for new windows, but not to worry
- can still get an "expand this frame")
39. Plugins
Plugin: file.swf
[Enable plugins] [Extract text]
(& links; use swf code if necessary)
<P>(plugin: jsb.mid [download] [activate plugins] [hide
plugins])</P>
40. More encodings trouble (dating from 1999)
41. Tables/Unicode stuff (old)
Problem: ¦Y
42. More mapping tables todo (old)
See mapping tables
checkConverter - expand to USE the other mappings
43. From later-on.txt
ssb22:/tmp/brian$ export RSYNC_RSH=ssh
ssb22:/tmp/brian$ rsync -v silas@brian.accu.org:
44. Grep NEEDATTENTION etc
45. Korean etc
Japanese frequency table!
46. Link to home page should go through gateway
The link to the gateway's home page should probably go
through the gateway (but what if the installation is not
working?) Also help.htm links (and it needs more
processing).
47. Misc (old) (some may have been fixed)
favicon.ico redirect to loc of original page ???
48. Background hover colour needs help text
Document AecL (background hover colour) & link into options
(NB say "(read help)")
uses css
In some browsers (e.g. some versions of Konqueror), you have to
also select ``Don't add status line code to links'' (under the
Options button) for this to work.
49. Colours needs an "other" button
- with larger selection of colours? (how organised? rows?)
or some sort of selector?
50. Spacer removal doesn't remove <p>
<p> lots of times is left intact
51. Inline help errors
Error: Failed to find help text for option AeI
52. Doesn't work well with online email providers
email providers PRE, NOBR (zh chars). Also TEXTAREA
53. more options compression
"=on" in the checkbox options can just be "=" in the links,
nothing in the cookies, and "value=1" in hidden form options
54. banner split bug
gateway bug:
http://dmoz.org/cgi-bin/add.cgi?where=Computers
55. PC HK/TW symbols
sometimes detects PC HK/TW rather than Big5 - no great
problem (a few symbols don't display, e.g. cdot (u+2022)
sometimes rendered as u+2027 and image not available).
Might want some kind of detection bias but it won't be easy
(really want a fuzzy logic system of some sort)
56. Segfault on bogus HTML
gateway sig-11 faults in strlen in HttpHeader::readHttpEquivs()
when document HEAD has the following bogus tag:
All material © Silas S. Brown unless otherwise stated.
Apache is a registered trademark of The Apache Software Foundation.
Javascript is a trademark of Oracle Corporation in the US.
Mozilla is a registered trademark of The Mozilla Foundation.
PostScript is a registered trademark of Adobe Systems Inc.
Sibelius is a registered trademark of Avid Technology, Inc. or its subsidiaries.
TeX is a trademark of the American Mathematical Society.
Unicode is a registered trademark of Unicode, Inc. in the United States and other countries.
Unix is a trademark of The Open Group.
Windows is a registered trademark of Microsoft Corp.
Any other trademarks I mentioned without realising are trademarks of their respective holders.