What are schemeless URLs, why did we use them, and why are we getting rid of them? To understand schemeless URLs, you must first understand what the URL scheme is.
The scheme is the part of the URL before the "://" (http or https, for example), and it tells the browser how to download the asset the URL points to. If the page has been loaded securely over SSL, then all assets on the page MUST be downloaded over SSL as well. Otherwise, the browser ends up loading insecure assets during a secure browsing session. Secure downloads are achieved by using the https scheme. Assets may be linked via https on a page downloaded during an insecure session, but not the other way around. Making sure that the links in a page match the state of the browsing session (secure or insecure) requires extra code on the web server generating the HTML, so many developers opt for schemeless URLs instead. A schemeless URL omits the scheme and starts directly with "//", followed by the hostname. With schemeless URLs, the browser decides whether to use http or https to download assets based on the security of the browsing session.
Offloading that decision to the browser seemed like a good idea to us. After all, the browser knows the security of the browsing session. Then came the headaches…
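As a rough illustration of the two approaches described above, the sketch below uses Python's standard urllib.parse.urljoin to mimic how a browser resolves a schemeless URL against the page URL, and a small helper to show the kind of "extra code" a server would need to emit an explicit scheme. The hostnames, the asset path, and the function names are hypothetical and are not taken from MindTouch's code.

```python
from urllib.parse import urljoin

# Hypothetical values, for illustration only.
CDN_HOST = "cdn.example.com"
ASSET_PATH = "/skins/site.css"
SCHEMELESS_ASSET = f"//{CDN_HOST}{ASSET_PATH}"


def browser_resolve(page_url: str, asset_ref: str) -> str:
    """Resolve an asset reference the way a browser would: a reference
    that starts with // inherits the scheme of the page it appears on."""
    return urljoin(page_url, asset_ref)


def explicit_asset_url(session_is_secure: bool) -> str:
    """Server-side alternative: pick the scheme when generating the HTML."""
    scheme = "https" if session_is_secure else "http"
    return f"{scheme}://{CDN_HOST}{ASSET_PATH}"


if __name__ == "__main__":
    # Same schemeless reference, two different pages.
    print(browser_resolve("http://wiki.example.com/Home", SCHEMELESS_ASSET))
    # http://cdn.example.com/skins/site.css
    print(browser_resolve("https://wiki.example.com/Home", SCHEMELESS_ASSET))
    # https://cdn.example.com/skins/site.css

    # Explicit scheme chosen by the server instead of the browser.
    print(explicit_asset_url(session_is_secure=True))
    # https://cdn.example.com/skins/site.css
```

The first function costs the server nothing, which is the appeal of schemeless URLs. The second requires the server to know whether the current session is secure, but it produces a link that any consumer can read without guessing.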
Schemeless URLs can cause stylesheets to be downloaded twice with Internet Explorer 7 and 8
Despite it being one of the most popular browsers worldwide, I still get raised eyebrows when I tell some developers that MindTouch supports Internet Explorer 8. MindTouch takes IE8 compatibility seriously, so we were very intrigued when we discovered a quirk involving IE7, IE8, and schemeless URLs. Both of these browsers download a stylesheet twice if it is linked by a schemeless URL, causing a performance hit when loading a MindTouch page. Steve Souders, web performance grandmaster at Google, explains it in this post.
Microsoft Outlook has no idea how to handle schemeless URLs and freezes up
MindTouch sends out page change notification, password change, and new user registration emails with the MindTouch site logo image embedded in the email HTML. Unfortunately, the logo image was also referenced as a schemeless URL. Because the leading // of a schemeless URL sort of looks like the \\ prefix of a Universal Naming Convention (UNC) network path, Outlook goes on an expedition to find the image and never recovers.
Some HTTP proxies treat schemeless URLs as relative paths
Both of the previous issues are moot if you can guarantee that your HTML will only be consumed by a modern web browser. However, the nail in the coffin for MindTouch’s use of schemeless URLs was the way some HTTP proxies out on the web handle them. Our DevOps team alerted us to thousands of strange requests being sent to our cloud infrastructure, all of them resulting in an HTTP 404 NOT FOUND response.
Upon further investigation, we discovered that our schemeless CDN URLs were being treated as relative paths to the domain hosted by MindTouch. As a result, the CDN hostname was stuck in the middle of the request URL, pointing to an asset on MindTouch’s infrastructure that couldn’t possibly exist.
Who or what upstream from MindTouch could be doing this? The hypothesis here is that an HTTP proxy cache between the client and our infrastructure may be “cleaning up” what it perceives to be relative URL paths and prepending the request hostname to them. There is no way to determine with absolute certainty whether a visitor is intentionally or unintentionally coming through a proxy when visiting a website. However, it’s clear that the handling of schemeless URLs is inconsistent enough to make them a serious weakness in our application.
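To make the hypothesis concrete, here is a minimal sketch that contrasts the correct resolution of a schemeless URL with the kind of mistaken "cleanup" that would produce the requests we saw. The misread_as_relative_path function is a guess at what such an intermediary might be doing, not a description of any known proxy, and the hostnames and paths are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical values, for illustration only.
PAGE_URL = "http://wiki.example.com/Home"
SCHEMELESS_ASSET = "//cdn.example.com/skins/site.css"


def correct_resolution(page_url: str, asset_ref: str) -> str:
    """Standard behavior: // introduces a hostname, and only the
    scheme is borrowed from the page URL."""
    return urljoin(page_url, asset_ref)


def misread_as_relative_path(page_url: str, asset_ref: str) -> str:
    """Assumed faulty behavior: the reference is treated as a path,
    so the leading slashes are dropped and the request hostname is
    prepended to what is left."""
    page = urlsplit(page_url)
    return f"{page.scheme}://{page.netloc}/{asset_ref.lstrip('/')}"


if __name__ == "__main__":
    print(correct_resolution(PAGE_URL, SCHEMELESS_ASSET))
    # http://cdn.example.com/skins/site.css
    print(misread_as_relative_path(PAGE_URL, SCHEMELESS_ASSET))
    # http://wiki.example.com/cdn.example.com/skins/site.css
```

The second URL has the CDN hostname stuck in the middle of the path on the site's own domain, which matches the shape of the requests that were returning 404 responses.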
Sayonara to schemeless URLs.