Why VS 2003 keeps changing your HTML and what you can (and cannot) do about it.

Why does VS 2003 keep changing my HTML when I switch between Design and HTML views? Is it a bug? Can it be fixed in a Service Pack?

PSS folks open bugs on customer complaints about this pretty much every other week. What's the problem? Here is a bit of history. Back in Visual InterDev 6.0 (it was actually version 2 of VID), we started using the Internet Explorer core engine (MSHTML.DLL, aka Trident) in editing mode for our Design View. That was IE4 at the time. Why did we do that? Well, HTML rendering is not a simple piece of code, especially when you add styles. Writing our own editor compatible with IE rendering would have been prohibitively expensive. The FrontPage editor did not match IE4 at that time, and we needed a design surface that would be WYSIWYG with IE. It had to support everything that IE did.

Now let's think about how a browser works. It obviously has a parser. Most parsers are based on a tokenizer (lexer) and some sort of grammar analyzer (yacc). What is the first thing pretty much every tokenizer does? It discards all the whitespace, indentation and carriage returns, since they are irrelevant to the language syntax. Even when the lexer does keep whitespace (such as in textual content), tokens typically do not carry information about where they were located in the original document. The stream of tokens and the resulting element tree keep no relation to the original source file. Hence, when you switch views and MSHTML.DLL persists HTML back to the text document, it cannot keep the original formatting, since knowledge of it is long gone. It simply rewrites the document. Capitalization changes, and your formatting and indentation are gone. You can observe the same effect in its raw form in Web Matrix, which uses MSHTML.DLL as well.
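To make the effect concrete, here is a minimal sketch of a parse-and-rewrite round trip. It uses Python's standard html.parser instead of MSHTML, purely as an illustration; the Reserializer class and round_trip function are hypothetical names, and upper-casing the tags is just a stand-in for whatever conventions a real serializer applies.

```python
from html.parser import HTMLParser


class Reserializer(HTMLParser):
    """Parses HTML and writes it back out without remembering source layout."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Only the tag name and attribute pairs arrive here; the original
        # case, quoting style and spacing inside the tag are already gone.
        parts = [tag.upper()]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{value}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag.upper()}>")

    def handle_data(self, data):
        # Whitespace-only runs between tags are dropped, as a typical lexer would.
        if data.strip():
            self.out.append(data.strip())


def round_trip(source: str) -> str:
    """Parse the source and serialize it again from the parsed events only."""
    parser = Reserializer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


original = """<table border='0'>
    <tr>
        <td class=x>Hello</td>
    </tr>
</table>"""

print(round_trip(original))
```

The markup that comes back is equivalent, but the indentation, line breaks, quoting and tag case chosen by the author cannot be reproduced, because none of that information reached the serializer.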

VS 2003 actually has a piece of code that attempts to match the new HTML to the old one, and in many cases it does a relatively good job. There are many cases, though, when it doesn't. The same piece of code exists in VID6 and VS 2002. It was improved between VID6 and VS 2002, but remains pretty much unchanged in VS 2003. VS 2002/2003 mitigates the issue a bit by applying pretty formatting each time you switch views. The problem is that the formatting has very few options that you can customize. You can switch formatting off in Tools | Options, but that will not solve the underlying issue; it will only switch off pretty formatting. You can ease your pain a bit by installing HTML Tidy or any other third-party HTML code formatter that is very customizable and tweaking it to your taste. You can then hook it up to VS so the HTML Editor will use your formatter instead of the internal one. Have a look at this article. Todd works on my team, btw. Even though VS will continue reformatting your code on every view switch, it will be doing it to your taste, so you may perceive that the formatting does not actually change.

We deliberately decided to stop tweaking the old code, so we changed pretty much nothing in VS 2003. Instead, we abandoned the old approach and invested our time and resources in developing a completely different piece of code that should be able to preserve user formatting all the time, no matter what. The result is what you see in Whidbey. We hope you like it better. The basic idea was to stop trying to restore formatting in the new HTML and instead detect incremental changes. In Whidbey we never directly use the HTML that MSHTML outputs. Instead, we transfer changes from it to your document. Therefore, if you only changed one attribute, only its new value will be applied to the original file; everything else will be left alone. You can now probably guess why it cannot be retrofitted into VS 2003: it would require significant changes in the code base, well beyond what is typically acceptable for service packs.
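The idea of transferring changes instead of adopting the regenerated markup can be sketched roughly as follows. This is not the Whidbey implementation, only an illustration under strong assumptions: it handles edited attribute values only, it assumes tags and attributes appear in the same order in both texts, and the regular expressions and function names are hypothetical.

```python
import re

# Start tags only: "<name ...>". End tags begin with "</" and do not match.
TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*)>")
# name="value", name='value' or name=value
ATTR_RE = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))"""
)


def attributes_with_spans(source):
    """Return (tag, attribute, value, start, end) tuples; offsets index into source."""
    found = []
    for tag in TAG_RE.finditer(source):
        for attr in ATTR_RE.finditer(source, tag.start(2), tag.end(2)):
            # Exactly one of the three value groups took part in the match.
            group = next(g for g in (2, 3, 4) if attr.group(g) is not None)
            found.append((tag.group(1).lower(), attr.group(1).lower(),
                          attr.group(group), attr.start(group), attr.end(group)))
    return found


def transfer_attribute_changes(original, regenerated):
    """Copy changed attribute values into the original text, touching nothing else."""
    old = attributes_with_spans(original)
    new = attributes_with_spans(regenerated)
    if [(t, a) for t, a, *_ in old] != [(t, a) for t, a, *_ in new]:
        raise ValueError("structure changed; this sketch only handles value edits")
    patched = original
    # Patch from the end of the file so that earlier offsets stay valid.
    for (_, _, old_value, start, end), (_, _, new_value, _, _) in reversed(
        list(zip(old, new))
    ):
        if old_value != new_value:
            patched = patched[:start] + new_value + patched[end:]
    return patched


original = """<table border='0'>
    <tr>
        <td class=x width="70">Hello</td>
    </tr>
</table>"""
# What a designer might hand back after the user changed the width to 90.
regenerated = '<TABLE border="0"><TR><TD class="x" width="90">Hello</TD></TR></TABLE>'

print(transfer_attribute_changes(original, regenerated))
```

Only the characters of the changed value are replaced in the original text, so the author's indentation, quoting and tag case survive the edit.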

Now the last question you might have: why couldn't the IE team fix the issue? They could, but it would hurt browsing performance and would make the HTML element tree much larger, since it would need to store all the whitespace information. There are multiple issues with whitespace, such as figuring out which element it belongs to, whether it should be removed when an element is deleted, and whether it should move with an element when the element moves in the tree. Since download size and the speed of opening pages were very important, the idea did not get through.
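A rough way to see the size cost is to count how many nodes a tree would hold with and without whitespace-only text. The sketch below uses Python's html.parser as a stand-in for a browser parser; NodeCounter and count_nodes are hypothetical names, and a node count is only a crude proxy for memory use.

```python
from html.parser import HTMLParser


class NodeCounter(HTMLParser):
    """Counts the element and text nodes a tree would need to store."""

    def __init__(self, keep_whitespace):
        super().__init__(convert_charrefs=True)
        self.keep_whitespace = keep_whitespace
        self.nodes = 0

    def handle_starttag(self, tag, attrs):
        self.nodes += 1  # one element node per start tag

    def handle_data(self, data):
        # Whitespace-only runs count only when we choose to keep them.
        if self.keep_whitespace or data.strip():
            self.nodes += 1


def count_nodes(source, keep_whitespace):
    counter = NodeCounter(keep_whitespace)
    counter.feed(source)
    counter.close()
    return counter.nodes


sample = "<table>\n  <tr>\n    <td>Hello</td>\n  </tr>\n</table>\n"

print("without whitespace nodes:", count_nodes(sample, keep_whitespace=False))
print("with whitespace nodes:   ", count_nodes(sample, keep_whitespace=True))
```

Even for this tiny, neatly indented table, keeping the whitespace adds more nodes than the markup itself contains, and each of those nodes raises the ownership questions listed above.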

Comments

  • Anonymous
    May 16, 2004
    Guess what, rendering depends on the way html source is formatted. Quick example:

    xx.html:
    <table cellpadding="0" cellspacing="0" border="0">
    <tr><td bgcolor=blue>
    <img width=40 border=0 height=40 src="image.gif">
    </td></tr>
    </table>

    yy.html:
    <table cellpadding="0" cellspacing="0" border="0">
    <tr><td bgcolor=blue><img width=40 border=0 height=40 src="image.gif"></td></tr>
    </table>
  • Anonymous
    May 16, 2004
    In case it is not clearly readable in the comment: the second example has its tr element written on one line, and the first does not (the extra spaces seem to be .Text's fault).
  • Anonymous
    May 16, 2004
    That's why Whidbey code formatting options respect whitespace rendering rules and ignore settings that will affect rendering. TD is one good example.

    However, there are cases when people don't care about couple of additional spaces even in TDs so we are still debating if we should add a switch. Something like 'ignore whitespace significance'.
  • Anonymous
    May 16, 2004
    There are tabs or spaces in td and img in my post; they are not supposed to be there. The only difference between the examples is the carriage return in the first source. In the second example, tr is written on one line.
  • Anonymous
    May 16, 2004
    My bug doesn't have anything to do with the layout. When you switch back from design view to markup view, the editor will attempt to fill in any missing elements it finds, for example closing tags. When it fills in the missing tags, they are always in upper case, even if you have explicitly set "lower case tags" in the HTML options for VS.

    Filling in the missing elements isn't a big deal (well, it is if you are creating user controls containing HTML fragments, say the top half of an HTML table), but the fact that it ignores the user preferences is troubling. What other options is it ignoring? Why is it ignoring the options at all? Is it a bug in the code that reads the options, or does the HTML formatter just forget to check the options?

    Changing the formatting is annoying; changing the case can break your HTML if you're trying to write XHTML or using a doctype like strict-4.0.
  • Anonymous
    May 16, 2004
    It may be a bug or, as I said, it might be one of the cases that are not covered by the formatting preservation in VS 2003 and earlier. Strictly speaking, if one has to write a bunch of new code to fix an issue, it is difficult to qualify it as a bug :-(

    VS 2003 is not XHTML compliant anyway, it will 'fix' <BR/> and make it <BR>.

    I would recommend trying to hook up HTML Tidy.
  • Anonymous
    May 16, 2004
    http://cadred.net/blog/archive/2004/05/16/161.aspx
  • Anonymous
    May 16, 2004
    great open post !
  • Anonymous
    May 16, 2004
    Excellent post.

    I've been wondering about this for a long time and already heard the legacy-code-issue reason. Getting the full scoop like this is why I read the Weblogs.

    And yes - this has been fixed in VS.NET 2005? Right? :)

    >S
  • Anonymous
    May 17, 2004
    Right. It should be much, much better now. Please try Whidbey and tell us if that is not so. My team owns the issue, we really want to fix this and there is still time.
  • Anonymous
    May 17, 2004
    I stopped using design view altogether because of this issue. Now the issue I run into is when VS.NET 2003 alters my code on copy/paste.

    #1 It always lowercases the <!DOCTYPE> tag for some reason, no matter what case the original tag was.

    #2 When I copy/paste <table> and <form> tags it automatically inserts an ID into them.

    #3 When I copy/paste code with IDs that are in the current document, it totally overwrites them, instead of squiggly-red underlining them for me to change.

    It'd be nice if VS.Net could read the <!DOCTYPE> and validate according to that instead of using whatever arbitrary way it validates stuff now. And it could use the current <!DOCTYPE> to show the correct attributes for that version (and show all when in quirks mode).
  • Anonymous
    May 17, 2004
    #1 is fixed in Whidbey

    #2 is also fixed. In the previous version we used to auto-ID everything that is scriptable. Now we only auto-ID elements that already have an ID attribute.

    #3 is still there (see #2). I will file a change request, but there is no guarantee it will get into the product. However, there are two ways we can provide customization: Tools | Options and registry keys. If tools/options won't have the option in UI, will you agree to tweak a registry key?
  • Anonymous
    May 17, 2004
    As far as #1 and #2, nice work.
    #3 Yeah, I'm fine with the registry hacking, just as long as I can find it easily ;-)
  • Anonymous
    May 18, 2004
    Hi

    VS 2003 cannot color .shtml files (as html). In order to make it work I need to patch the registry :(
  • Anonymous
    May 18, 2004
    Anatoly, this is fixed in Whidbey.
  • Anonymous
    May 20, 2004
    Talking about the way IE handles HTML: if I create an XHTML page and output body.innerHTML via client-side code, it is no longer the same. All attributes that don't have special characters lose their quotation marks, and all tags are capitalized.
  • Anonymous
    May 20, 2004
    Another annoyance is the schizophrenic Page/Register directives that VS simply LOVES to flip-flop... causing a lot of UNNECESSARY VSS check out requests
  • Anonymous
    May 20, 2004
    Does Whidbey bring back the functionality in VS.NET 7.0 where you could view ATL Server stencil files (SRFs) in Design View? That was removed in VS.NET 7.1 due to some unspecified bugs with rendering...I never ran into it but I REALLY miss being able to view my HTML in design view in the SRF files.
  • Anonymous
    May 20, 2004
    To Barry:

    Whidbey is XHTML compatible, so <img /> stays as such. Actually, we generate XHTML by default, so if you drop a button from toolbox, you'll get <input type="button" />. Speaking about validation, we do provide XHTML 1.1 Strict and XHTML 1.0 Transitional schemas. However, another option is to open XHTML file in the new XML editor that is based on System.XML and is able to validate against DTD so you can directly use W3C DTDs if you wish.

    To Eric: I believe this is not an issue anymore (in Whidbey).

    To Todd: no, it doesn't. Unfortunately, the ATL Server team chose the {{ }} syntax, which a standard HTML parser such as the one employed by IE does not recognize. At best you'll see {{ }}; at worst you may lose them. Can you elaborate a bit more on how you expect Design view to render {{ }} expressions?
  • Anonymous
    May 22, 2004
    I'm not asking for it to render what is in the tag handler {{ }}; what I'm asking is for the editor to allow me to view my SRF in Design View. With VS.NET 2003 (7.1), the ATL Server team REMOVED the ability to switch between code and design view for SRF files, even though they are standard HTML markup. Before, with VS.NET 2002 (7.0), you could view your SRFs in Design View, and they would just put the tag handler inline with the HTML. But this functionality was removed with 2003. You can't switch to Design view at all anymore; it says that the editor won't allow SRF viewing. That means I have to create/edit my SRFs outside of VS.NET, which is ridiculous.
  • Anonymous
    May 23, 2004
    To Todd: I found the VS 2003 bug that caused us to disable Design view for ATL Server files. Actually, it was my team (and me personally) who disabled it. Here is what the bug says:
    ----------------------------------------------------
    Repro:
    1. open a .srf page in the editor with srf tags that include such things as parameters that include a long database connection string that cannot have newlines inserted into it
    (opens in design view by default)
    2. switch to HTML view. Notice that the file contents have changed
    3. deploy the .srf file and try to access the page from a browser

    Actual:
    not very useful HTTP 500 error message

    Expected:
    Autoformatting not to break .srf page functionality or to be turned off by default.
    ----------------------------------------------------

    However, it seems the issue may not apply anymore since a) we do not autoformat on view switch and b) the new formatter is better than the old one. I'll see if we can enable Design view again.

    BTW, the VS 2003 workaround is to change the SRF extension to HTM, or to copy/paste content between a temporary scratch HTML page and the SRF file.
  • Anonymous
    May 24, 2004
    Yes, the workaround is similar to what I'm doing now (viewing and editing via UltraEdit in this case). Not as nice as with VS.NET, since I lose the IntelliSense. I would really like to have it enabled again in VS.NET.

    Any chance of there being a quick reg fix or something I might be able to do with VS.NET 2003 to get this functionality back, even if it is not supported by Microsoft?
  • Anonymous
    June 23, 2004
    Can't you make it possible to hook up Mozilla's project HTML parser if they provide you an interface between the calls you do in your code and their's?

    I mean writing HTML for Mozilla brings lesser problems for IE than the other way around.

    Very interesting post anyway. I'll try the HTMLTidy option in the meanwhile.
  • Anonymous
    June 24, 2004
    BTW the tidy HTML solution offered cannot apply to VS.NET 2003 Standard C# Edition (which I own) because it misses the possibility to create Extensibility Projects. Is there any workaround for that?
  • Anonymous
    June 29, 2004
    Good entry.
    Answers one very often asked question.

    It's a lot better to see the same problem after you know the reasons than otherwise :)
  • Anonymous
    July 17, 2004
    I think I see what you are saying, and it is useful for me.
  • Anonymous
    July 20, 2004
    The problem where you paste a piece of HTML code and the IDE resets the ID and NAME attributes is a real problem. It's bad enough having the VB VS2003 IDE messing up the indenting on all your VB code, but to actually take a piece of code and change it during paste, with no option to stop it doing that, is ridiculous. We are taking a direct hit on code quality and productivity. There are lots of circumstances where you need two controls with the same ID, and you might just be pasting from an old block to a new block (intending to delete the old block later). The last thing you want is your ID and NAME attributes reset, especially to something as useless as "Text1".
  • Anonymous
    July 20, 2004
    What about a "leave my code alone" option, where the IDE does not change any code under any circumstances, guaranteed? That's all we want.
  • Anonymous
    July 20, 2004
    Paul, you can switch off VB indenting and pretty formatting in Tools | Options | Text Editor | Basic | VB Specific.

    Auto-ID is different in Whidbey: it only changes the ID when the element already has one; it never adds IDs. If you want it completely off, please submit feedback on the MSDN product feedback site so it gets filed as a bug or work item.
  • Anonymous
    July 21, 2004
    I've been very frustrated with my html being changed.

    Vs removes from .aspx my style tags where:
    width='70px' style='width: 70px;'> 'for Netscape, etc.

    After I compile the style is gone?
    width='70px'>

    Frustrating.

  • Anonymous
    July 22, 2004
    I will forward your notes to the VB team, but please file the request on the MSDN Feedback site, since then it will be entered as an official bug. Thanks!
  • Anonymous
    September 07, 2004
    There have been many online discussions about how Visual Studio messes up the formatting of HTML source code, I must admit I have been involved in a few of these, including Mikhail Arkhipov's Weblog for instance. Which explains that the...
  • Anonymous
    June 14, 2005
    This problem is a real pain. Rumor has it that it's been fixed in VS 2005, but I don't want to have to upgrade just for a bug fix.

    There must be something easier that can be done...
  • Anonymous
    June 23, 2005
    Here is patent that we filed on the method, which we invented in order to solve that old known problem...
  • Anonymous
    July 03, 2005
    Just fix it - please. We don't want to upgrade. The reformatting kills productivity; there is no way this should have been released. An otherwise great product is definitely marred by this problem.

    Remember: coding for the web is all about editing html.
  • Anonymous
    July 05, 2005
    I like the visual effect of Visual Studio 2003 on my HTML code. However, I am upset with the fact that VS2003 does not keep my original formatting. I try to program every web page in compliance with XHTML, but VS keeps removing my original formatting.

    For example, VS is taking out my closing slash for tags with no end tags (i.e. it changes <br /> to <br>, and <img /> to <img >).

    I think I'm going to switch to Crimson Editor or even Notepad.
  • Anonymous
    August 05, 2005
    I need your help. I am using IE 6.0 with Windows XP Home Edition. Trying to use HTML Editor under IE to modify a webpage created by another program. I am able to move object on this html page except to animated objects. (I believe they were created with JAVA). Is there a Tools/Options/Advanced feature that needs to be turn on to enable the editing of animated objects? How can I make it work?

    Please email a copy of your posted reply to aasinc@hotmail.com

    Thank you.
  • Anonymous
    April 27, 2006
    Help!

    I'm facing the same problem as Jibran. In HTML view I changed the code to look like below and now I am not able to go to design view without reverting my changes:

    <link rel="stylesheet" href="<%= (string)Application["CSSBasepath"] + (string)Session["SelectedCss"] %>" type="text/css">

    I tried changing the double-quotes to single quotes but it did not help!

    Please help,

    thanks,

    -Vishal
  • Anonymous
    April 28, 2006
    Inner quotes must be different from outer quotes: either " outside and ' inside or the other way around:

    <link rel="stylesheet" href='<%= (string)Application["CSSBasepath"] + (string)Session["SelectedCss"] %>' type="text/css">