
Best Practices for Speeding Up Your Web Site

The Exceptional Performance team has identified a number of best practices for making web pages fast. The list includes 35 best practices divided into 7 categories.

Minimize HTTP Requests

tag: content

80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages.

One way to reduce the number of components in the page is to simplify the page’s design. But is there a way to build pages with richer content while also achieving fast response times? Here are some techniques for reducing the number of HTTP requests, while still supporting rich page designs.

Combined files are a way to reduce the number of HTTP requests by combining all scripts into a single script, and similarly combining all CSS into a single stylesheet. Combining files is more challenging when the scripts and stylesheets vary from page to page, but making this part of your release process improves response times.

CSS Sprites are the preferred method for reducing the number of image requests. Combine your background images into a single image and use the CSS background-image and background-position properties to display the desired image segment.

Image maps combine multiple images into a single image. The overall size is about the same, but reducing the number of HTTP requests speeds up the page. Image maps only work if the images are contiguous in the page, such as a navigation bar. Defining the coordinates of image maps can be tedious and error prone, and using image maps for navigation is not accessible, so it's not recommended.

Inline images use the data: URL scheme to embed the image data in the actual page. This can increase the size of your HTML document. Combining inline images into your (cached) stylesheets is a way to reduce HTTP requests and avoid increasing the size of your pages. Inline images are not yet supported across all major browsers.

Reducing the number of HTTP requests in your page is the place to start. This is the most important guideline for improving performance for first time visitors. As described in Tenni Theurer’s blog post Browser Cache Usage – Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these first time visitors is key to a better user experience.


Use a Content Delivery Network

tag: server

The user’s proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user’s perspective. But where should you start?

As a first step to implementing geographically dispersed content, don’t attempt to redesign your web application to work in a distributed architecture. Depending on the application, changing the architecture could include daunting tasks such as synchronizing session state and replicating database transactions across server locations. Attempts to reduce the distance between users and your content could be delayed by, or never pass, this application architecture step.

Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. This is the Performance Golden Rule. Rather than starting with the difficult task of redesigning your application architecture, it’s better to first disperse your static content. This not only achieves a bigger reduction in response times, but it’s easier thanks to content delivery networks.

A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen.

Some large Internet companies own their own CDN, but it’s cost-effective to use a CDN service provider, such as Akamai Technologies, EdgeCast, or Level 3. For start-up companies and private web sites, the cost of a CDN service can be prohibitive, but as your target audience grows larger and becomes more global, a CDN is necessary to achieve fast response times. At Yahoo!, properties that moved static content off their application web servers to a CDN (both 3rd party as mentioned above as well as Yahoo’s own CDN) improved end-user response times by 20% or more. Switching to a CDN is a relatively easy code change that will dramatically improve the speed of your web site.


Add an Expires or a Cache-Control Header

tag: server

There are two aspects to this rule:

  • For static components: implement a “Never expire” policy by setting a far future Expires header
  • For dynamic components: use an appropriate Cache-Control header to help the browser with conditional requests

 

Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components.

Browsers (and proxies) use a cache to reduce the number and size of HTTP requests, making web pages load faster. A web server uses the Expires header in the HTTP response to tell the client how long a component can be cached. This is a far future Expires header, telling the browser that this response won’t be stale until April 15, 2010.

      Expires: Thu, 15 Apr 2010 20:00:00 GMT

 

If your server is Apache, use the ExpiresDefault directive to set an expiration date relative to the current date. This example of the ExpiresDefault directive sets the Expires date 10 years out from the time of the request.

      ExpiresDefault "access plus 10 years"

 

Keep in mind, if you use a far future Expires header you have to change the component’s filename whenever the component changes. At Yahoo! we often make this step part of the build process: a version number is embedded in the component’s filename, for example, yahoo_2.0.6.js.

Using a far future Expires header affects page views only after a user has already visited your site. It has no effect on the number of HTTP requests when a user visits your site for the first time and the browser’s cache is empty. Therefore the impact of this performance improvement depends on how often users hit your pages with a primed cache. (A “primed cache” already contains all of the components in the page.) We measured this at Yahoo! and found the number of page views with a primed cache is 75-85%. By using a far future Expires header, you increase the number of components that are cached by the browser and re-used on subsequent page views without sending a single byte over the user’s Internet connection.


Gzip Components

tag: server

The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It’s true that the end-user’s bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.

Starting with HTTP/1.1, web clients indicate support for compression with the Accept-Encoding header in the HTTP request.

      Accept-Encoding: gzip, deflate

 

If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response.

      Content-Encoding: gzip

 

Gzip is the most popular and effective compression method at this time. It was developed by the GNU project and standardized by RFC 1952. The only other compression format you’re likely to see is deflate, but it’s less effective and less popular.

Gzipping generally reduces the response size by about 70%. Approximately 90% of today’s Internet traffic travels through browsers that claim to support gzip. If you use Apache, the module that handles gzip depends on your version: Apache 1.3 uses mod_gzip while Apache 2.x uses mod_deflate.

There are known issues with browsers and proxies that may cause a mismatch in what the browser expects and what it receives with regard to compressed content. Fortunately, these edge cases are dwindling as the use of older browsers drops off. The Apache modules help out by adding appropriate Vary response headers automatically.

Servers choose what to gzip based on file type, but are typically too limited in what they decide to compress. Most web sites gzip their HTML documents. It’s also worthwhile to gzip your scripts and stylesheets, but many web sites miss this opportunity. In fact, it’s worthwhile to compress any text response including XML and JSON. Image and PDF files should not be gzipped because they are already compressed. Trying to gzip them not only wastes CPU but can potentially increase file sizes.

Gzipping as many file types as possible is an easy way to reduce page weight and accelerate the user experience.


Put Stylesheets at the Top

tag: css

While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages appear to be loading faster. This is because putting stylesheets in the HEAD allows the page to render progressively.

Front-end engineers that care about performance want a page to load progressively; that is, we want the browser to display whatever content it has as soon as possible. This is especially important for pages with a lot of content and for users on slower Internet connections. The importance of giving users visual feedback, such as progress indicators, has been well researched and documented. In our case the HTML page is the progress indicator! When the browser loads the page progressively the header, the navigation bar, the logo at the top, etc. all serve as visual feedback for the user who is waiting for the page. This improves the overall user experience.

The problem with putting stylesheets near the bottom of the document is that it prohibits progressive rendering in many browsers, including Internet Explorer. These browsers block rendering to avoid having to redraw elements of the page if their styles change. The user is stuck viewing a blank white page.

The HTML specification clearly states that stylesheets are to be included in the HEAD of the page: “Unlike A, [LINK] may only appear in the HEAD section of a document, although it may appear any number of times.” Neither of the alternatives, the blank white screen or flash of unstyled content, are worth the risk. The optimal solution is to follow the HTML specification and load your stylesheets in the document HEAD.


Put Scripts at the Bottom

tag: javascript

The problem caused by scripts is that they block parallel downloads. The HTTP/1.1 specification suggests that browsers download no more than two components in parallel per hostname. If you serve your images from multiple hostnames, you can get more than two downloads to occur in parallel. While a script is downloading, however, the browser won’t start any other downloads, even on different hostnames.

In some situations it’s not easy to move scripts to the bottom. If, for example, the script uses document.write to insert part of the page’s content, it can’t be moved lower in the page. There might also be scoping issues. In many cases, there are ways to work around these situations.

An alternative suggestion that often comes up is to use deferred scripts. The DEFER attribute indicates that the script does not contain document.write, and is a clue to browsers that they can continue rendering. Unfortunately, Firefox doesn’t support the DEFER attribute. In Internet Explorer, the script may be deferred, but not as much as desired. If a script can be deferred, it can also be moved to the bottom of the page. That will make your web pages load faster.


Avoid CSS Expressions

tag: css

CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They were supported in Internet Explorer starting with version 5, but were deprecated starting with IE8. As an example, the background color could be set to alternate every hour using CSS expressions:

      background-color: expression( (new Date()).getHours()%2 ? "#B8D4FF" : "#F08A00" );

 

As shown here, the expression method accepts a JavaScript expression. The CSS property is set to the result of evaluating the JavaScript expression. The expression method is ignored by other browsers, so it is useful for setting properties in Internet Explorer needed to create a consistent experience across browsers.

The problem with expressions is that they are evaluated more frequently than most people expect. Not only are they evaluated when the page is rendered and resized, but also when the page is scrolled and even when the user moves the mouse over the page. Adding a counter to the CSS expression allows us to keep track of when and how often a CSS expression is evaluated. Moving the mouse around the page can easily generate more than 10,000 evaluations.

One way to reduce the number of times your CSS expression is evaluated is to use one-time expressions, where the first time the expression is evaluated it sets the style property to an explicit value, which replaces the CSS expression. If the style property must be set dynamically throughout the life of the page, using event handlers instead of CSS expressions is an alternative approach. If you must use CSS expressions, remember that they may be evaluated thousands of times and could affect the performance of your page.
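
A minimal sketch of the one-time expression approach (the helper name altBgcolor and the colors are illustrative, not part of the original rule): the CSS expression calls a JavaScript function once, and that function overwrites the expression with a concrete inline style so it is never evaluated again.

      // Referenced from CSS as:  background-color: expression( altBgcolor(this) );
      // The first evaluation writes an explicit inline style, which replaces
      // the expression, so it never runs again.
      function altBgcolor(el) {
        el.style.backgroundColor = (new Date()).getHours() % 2 ? "#B8D4FF" : "#F08A00";
      }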


Make JavaScript and CSS External

tag: javascript, css

Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself?

Using external files in the real world generally produces faster pages because the JavaScript and CSS files are cached by the browser. JavaScript and CSS that are inlined in HTML documents get downloaded every time the HTML document is requested. This reduces the number of HTTP requests that are needed, but increases the size of the HTML document. On the other hand, if the JavaScript and CSS are in external files cached by the browser, the size of the HTML document is reduced without increasing the number of HTTP requests.

The key factor, then, is the frequency with which external JavaScript and CSS components are cached relative to the number of HTML documents requested. This factor, although difficult to quantify, can be gauged using various metrics. If users on your site have multiple page views per session and many of your pages re-use the same scripts and stylesheets, there is a greater potential benefit from cached external files.

Many web sites fall in the middle of these metrics. For these sites, the best solution generally is to deploy the JavaScript and CSS as external files. The only exception where inlining is preferable is with home pages, such as Yahoo!’s front page and My Yahoo!. Home pages that have few (perhaps only one) page view per session may find that inlining JavaScript and CSS results in faster end-user response times.

For front pages that are typically the first of many page views, there are techniques that leverage the reduction of HTTP requests that inlining provides, as well as the caching benefits achieved through using external files. One such technique is to inline JavaScript and CSS in the front page, but dynamically download the external files after the page has finished loading. Subsequent pages would reference the external files that should already be in the browser’s cache.
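
A rough sketch of that technique, with placeholder file names: once the front page fires onload, the external files are injected so they land in the browser's cache for the pages that follow.

      // After onload, fetch the external files so they are cached for
      // subsequent pages. File names are placeholders for illustration.
      function downloadComponentsForLaterPages() {
        var head = document.getElementsByTagName("head")[0];

        var script = document.createElement("script");
        script.src = "/js/site.js";
        head.appendChild(script);

        var css = document.createElement("link");
        css.rel = "stylesheet";
        css.type = "text/css";
        css.href = "/css/site.css";
        head.appendChild(css);
      }

      // Assigning window.onload directly keeps the sketch simple; a real page
      // may need to chain existing onload handlers instead.
      window.onload = downloadComponentsForLaterPages;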


Reduce DNS Lookups

tag: content

The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people’s names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server’s IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to lookup the IP address for a given hostname. The browser can’t download anything from this hostname until the DNS lookup is completed.

DNS lookups are cached for better performance. This caching can occur on a special caching server, maintained by the user’s ISP or local area network, but there is also caching that occurs on the individual user’s computer. The DNS information remains in the operating system’s DNS cache (the “DNS Client service” on Microsoft Windows). Most browsers have their own caches, separate from the operating system’s cache. As long as the browser keeps a DNS record in its own cache, it doesn’t bother the operating system with a request for the record.

Internet Explorer caches DNS lookups for 30 minutes by default, as specified by the DnsCacheTimeout registry setting. Firefox caches DNS lookups for 1 minute, controlled by the network.dnsCacheExpiration configuration setting. (Fasterfox changes this to 1 hour.)

When the client’s DNS cache is empty (for both the browser and the operating system), the number of DNS lookups is equal to the number of unique hostnames in the web page. This includes the hostnames used in the page’s URL, images, script files, stylesheets, Flash objects, etc. Reducing the number of unique hostnames reduces the number of DNS lookups.

Reducing the number of unique hostnames has the potential to reduce the amount of parallel downloading that takes place in the page. Avoiding DNS lookups cuts response times, but reducing parallel downloads may increase response times. My guideline is to split these components across at least two but no more than four hostnames. This results in a good compromise between reducing DNS lookups and allowing a high degree of parallel downloads.


Minify JavaScript and CSS

tag: javascript, css

Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. When code is minified, all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are JSMin and YUI Compressor. The YUI Compressor can also minify CSS.

Obfuscation is an alternative optimization that can be applied to source code. It’s more complex than minification and thus more likely to generate bugs as a result of the obfuscation step itself. In a survey of ten top U.S. web sites, minification achieved a 21% size reduction versus 25% for obfuscation. Although obfuscation has a higher size reduction, minifying JavaScript is less risky.

In addition to minifying external scripts and styles, inlined <script> and <style> blocks can and should also be minified. Even if you gzip your scripts and styles, minifying them will still reduce the size by 5% or more. As the use and size of JavaScript and CSS increases, so will the savings gained by minifying your code.


Avoid Redirects

tag: content

Redirects are accomplished using the 301 and 302 status codes. Here’s an example of the HTTP headers in a 301 response:

      HTTP/1.1 301 Moved Permanently
      Location: http://example.com/newuri
      Content-Type: text/html

 

The browser automatically takes the user to the URL specified in the Location field. All the information necessary for a redirect is in the headers. The body of the response is typically empty. Despite their names, neither a 301 nor a 302 response is cached in practice unless additional headers, such as Expires or Cache-Control, indicate it should be. The meta refresh tag and JavaScript are other ways to direct users to a different URL, but if you must do a redirect, the preferred technique is to use the standard 3xx HTTP status codes, primarily to ensure the back button works correctly.

The main thing to remember is that redirects slow down the user experience. Inserting a redirect between the user and the HTML document delays everything in the page since nothing in the page can be rendered and no components can start being downloaded until the HTML document has arrived.

One of the most wasteful redirects happens frequently and web developers are generally not aware of it. It occurs when a trailing slash (/) is missing from a URL that should otherwise have one. For example, going to http://astrology.yahoo.com/astrology results in a 301 response containing a redirect to http://astrology.yahoo.com/astrology/ (notice the added trailing slash). This is fixed in Apache by using Alias or mod_rewrite, or the DirectorySlash directive if you’re using Apache handlers.

Connecting an old web site to a new one is another common use for redirects. Others include connecting different parts of a website and directing the user based on certain conditions (type of browser, type of user account, etc.). Using a redirect to connect two web sites is simple and requires little additional coding. Although using redirects in these situations reduces the complexity for developers, it degrades the user experience. Alternatives for this use of redirects include using Alias and mod_rewrite if the two code paths are hosted on the same server. If a domain name change is the cause of using redirects, an alternative is to create a CNAME (a DNS record that creates an alias pointing from one domain name to another) in combination with Alias or mod_rewrite.


Remove Duplicate Scripts

tag: javascript

It hurts performance to include the same JavaScript file twice in one page. This isn’t as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution.

Unnecessary HTTP requests happen in Internet Explorer, but not in Firefox. In Internet Explorer, if an external script is included twice and is not cacheable, it generates two HTTP requests during page loading. Even if the script is cacheable, extra HTTP requests occur when the user reloads the page.

In addition to generating wasteful HTTP requests, time is wasted evaluating the script multiple times. This redundant JavaScript execution happens in both Firefox and Internet Explorer, regardless of whether the script is cacheable.

One way to avoid accidentally including the same script twice is to implement a script management module in your templating system. The typical way to include a script is to use the SCRIPT tag in your HTML page.

      <script type="text/javascript" src="menu_1.0.17.js"></script>

 

An alternative in PHP would be to create a function called insertScript.

      <?php insertScript("menu.js") ?>

 

In addition to preventing the same script from being inserted multiple times, this function could handle other issues with scripts, such as dependency checking and adding version numbers to script filenames to support far future Expires headers.
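
The insertScript example above is server-side PHP; purely as a client-side sketch of the same idea (insertScriptOnce and its bookkeeping object are hypothetical), a loader can record which URLs it has already added so a second call for the same script does nothing.

      // Track inserted script URLs so the same file is never added twice.
      var insertedScripts = {};

      function insertScriptOnce(src) {
        if (insertedScripts[src]) {
          return; // already on the page, skip the duplicate request
        }
        insertedScripts[src] = true;
        var script = document.createElement("script");
        script.src = src;
        document.getElementsByTagName("head")[0].appendChild(script);
      }

      // insertScriptOnce("menu_1.0.17.js");
      // insertScriptOnce("menu_1.0.17.js"); // second call does nothing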


Configure ETags

tag: server

Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser’s cache matches the one on the origin server. (An “entity” is another word for a “component”: images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraint is that the string be quoted. The origin server specifies the component’s ETag using the ETag response header.

      HTTP/1.1 200 OK
      Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT
      ETag: "10c24bc-4ab-457e1c1f"
      Content-Length: 12195

 

Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned, reducing the response by 12195 bytes for this example.

      GET /i/yahoo.gif HTTP/1.1
      Host: us.yimg.com
      If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
      If-None-Match: "10c24bc-4ab-457e1c1f"
      HTTP/1.1 304 Not Modified

 

The problem with ETags is that they typically are constructed using attributes that make them unique to a specific server hosting a site. ETags won’t match when a browser gets the original component from one server and later tries to validate that component on a different server, a situation that is all too common on Web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.

The ETag format for Apache 1.3 and 2.x is inode-size-timestamp. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.

IIS 5.0 and 6.0 have a similar issue with ETags. The format for ETags on IIS is Filetimestamp:ChangeNumber. A ChangeNumber is a counter used to track configuration changes to IIS. It’s unlikely that the ChangeNumber is the same across all IIS servers behind a web site.

The end result is ETags generated by Apache and IIS for the exact same component won’t match from one server to another. If the ETags don’t match, the user doesn’t receive the small, fast 304 response that ETags were designed for; instead, they’ll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn’t a problem. But if you have multiple servers hosting your web site, and you’re using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you’re consuming greater bandwidth, and proxies aren’t caching your content efficiently. Even if your components have a far future Expires header, a conditional GET request is still made whenever the user hits Reload or Refresh.

If you’re not taking advantage of the flexible validation model that ETags provide, it’s better to just remove the ETag altogether. The Last-Modified header validates based on the component’s timestamp. And removing the ETag reduces the size of the HTTP headers in both the response and subsequent requests. This Microsoft Support article describes how to remove ETags. In Apache, this is done by simply adding the following line to your Apache configuration file:

      FileETag none

 


Make Ajax Cacheable

tag: content

One of the cited benefits of Ajax is that it provides instantaneous feedback to the user because it requests information asynchronously from the backend web server. However, using Ajax is no guarantee that the user won’t be twiddling his thumbs waiting for those asynchronous JavaScript and XML responses to return. In many applications, whether or not the user is kept waiting depends on how Ajax is used. For example, in a web-based email client the user will be kept waiting for the results of an Ajax request to find all the email messages that match their search criteria. It’s important to remember that “asynchronous” does not imply “instantaneous”.

To improve performance, it’s important to optimize these Ajax responses. The most important way to improve the performance of Ajax is to make the responses cacheable, as discussed in Add an Expires or a Cache-Control Header. Several of the other rules discussed here also apply to Ajax.

Let’s look at an example. A Web 2.0 email client might use Ajax to download the user’s address book for autocompletion. If the user hasn’t modified her address book since the last time she used the email web app, the previous address book response could be read from cache if that Ajax response was made cacheable with a future Expires or Cache-Control header. The browser must be informed when to use a previously cached address book response versus requesting a new one. This could be done by adding a timestamp to the address book Ajax URL indicating the last time the user modified her address book, for example, &t=1190241612. If the address book hasn’t been modified since the last download, the timestamp will be the same and the address book will be read from the browser’s cache, eliminating an extra HTTP roundtrip. If the user has modified her address book, the timestamp ensures the new URL doesn’t match the cached response, and the browser will request the updated address book entries.
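
A minimal sketch of the timestamp idea (the URL and parameter names are illustrative, not from the original article): the address book's last-modified time is appended to the Ajax URL, so an unchanged address book is served from the browser cache while a modified one produces a new URL.

      // Build the address book URL with the user's last-modified timestamp
      // appended, so unchanged data is served from the browser cache (this
      // assumes the response carries a far future Expires or Cache-Control
      // header). The "/addressbook" endpoint is a placeholder.
      function fetchAddressBook(lastModified, callback) {
        var url = "/addressbook?t=" + lastModified; // e.g. &t=1190241612
        var xhr = new XMLHttpRequest();
        xhr.onreadystatechange = function () {
          if (xhr.readyState === 4 && xhr.status === 200) {
            callback(xhr.responseText);
          }
        };
        xhr.open("GET", url, true);
        xhr.send(null);
      }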

Even though your Ajax responses are created dynamically, and might only be applicable to a single user, they can still be cached. Doing so will make your Web 2.0 apps faster.


Flush the Buffer Early

tag: server

When users request a page, it can take anywhere from 200 to 500ms for the backend server to stitch together the HTML page. During this time, the browser is idle as it waits for the data to arrive. In PHP you have the function flush(). It allows you to send your partially ready HTML response to the browser so that the browser can start fetching components while your backend is busy with the rest of the HTML page. The benefit is mainly seen on busy backends or light frontends.

A good place to consider flushing is right after the HEAD because the HTML for the head is usually easier to produce and it allows you to include any CSS and JavaScript files for the browser to start fetching in parallel while the backend is still processing.

Example:

      ... <!-- css, js -->
    </head>
    <?php flush(); ?>
    <body>
      ... <!-- content -->

 

Yahoo! search pioneered research and real user testing to prove the benefits of using this technique.


Use GET for AJAX Requests

tag: server

The Yahoo! Mail team found that when using XMLHttpRequest, POST is implemented in the browsers as a two-step process: sending the headers first, then sending data. So it’s best to use GET, which only takes one TCP packet to send (unless you have a lot of cookies). The maximum URL length in IE is 2K, so if you send more than 2K data you might not be able to use GET.

An interesting side effect is that POST without actually posting any data behaves like GET. Based on the HTTP specs, GET is meant for retrieving information, so it makes sense (semantically) to use GET when you’re only requesting data, as opposed to sending data to be stored server-side.
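
A short sketch of the distinction, with hypothetical /search and /save endpoints: read-only Ajax goes out as GET with the data in the query string, while state-changing Ajax is sent as POST.

      // Read-only request: GET, parameters in the query string
      // (kept well under IE's 2K URL limit).
      function getData(query, callback) {
        var xhr = new XMLHttpRequest();
        xhr.open("GET", "/search?q=" + encodeURIComponent(query), true);
        xhr.onreadystatechange = function () {
          if (xhr.readyState === 4 && xhr.status === 200) {
            callback(xhr.responseText);
          }
        };
        xhr.send(null);
      }

      // State-changing request: POST, data in the body.
      function saveData(payload) {
        var xhr = new XMLHttpRequest();
        xhr.open("POST", "/save", true);
        xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
        xhr.send("data=" + encodeURIComponent(payload));
      }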

 


Post-load Components

tag: content

You can take a closer look at your page and ask yourself: “What’s absolutely required in order to render the page initially?” The rest of the content and components can wait.

JavaScript is an ideal candidate for splitting before and after the onload event. For example, if you have JavaScript code and libraries that do drag and drop and animations, those can wait, because dragging elements on the page comes after the initial rendering. Other places to look for candidates for post-loading include hidden content (content that appears after a user action) and images below the fold.
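
One hand-rolled way to sketch the image part of this idea (the data-src attribute convention is an assumption of this sketch, not something the article prescribes): below-the-fold images start with a small placeholder and get their real URL after onload.

      // Lazy-load below-the-fold images after onload. Real URLs live in a
      // data-src attribute (a convention assumed for this sketch) while the
      // src initially points at a small placeholder image.
      function loadDeferredImages() {
        var imgs = document.getElementsByTagName("img");
        for (var i = 0; i < imgs.length; i++) {
          var real = imgs[i].getAttribute("data-src");
          if (real) {
            imgs[i].src = real;
          }
        }
      }

      if (window.addEventListener) {
        window.addEventListener("load", loadDeferredImages, false);
      } else if (window.attachEvent) {
        window.attachEvent("onload", loadDeferredImages); // older IE
      }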

Tools to help you out in your effort: YUI Image Loader allows you to delay images below the fold and the YUI Get utility is an easy way to include JS and CSS on the fly. For an example in the wild take a look at Yahoo! Home Page with Firebug’s Net Panel turned on.

It’s good when the performance goals are in line with other web development best practices. In this case, the idea of progressive enhancement tells us that JavaScript, when supported, can improve the user experience but you have to make sure the page works even without JavaScript. So after you’ve made sure the page works fine, you can enhance it with some post-loaded scripts that give you more bells and whistles such as drag and drop and animations.


Preload Components

tag: content

Preload may look like the opposite of post-load, but it actually has a different goal. By preloading components you can take advantage of the time the browser is idle and request components (like images, styles and scripts) you’ll need in the future. This way when the user visits the next page, you could have most of the components already in the cache and your page will load much faster for the user.

There are actually several types of preloading:

  • Unconditional preload – as soon as onload fires, you go ahead and fetch some extra components (see the sketch after this list). Check google.com for an example of how a sprite image is requested onload. This sprite image is not needed on the google.com homepage, but it is needed on the subsequent search results page.
  • Conditional preload – based on a user action you make an educated guess where the user is headed next and preload accordingly. On search.yahoo.com you can see how some extra components are requested after you start typing in the input box.
  • Anticipated preload – preload in advance before launching a redesign. It often happens after a redesign that you hear: “The new site is cool, but it’s slower than before.” Part of the problem could be that users were visiting your old site with a full cache, while the new one is always an empty-cache experience. You can mitigate this side effect by preloading some components before you even launch the redesign: your old site can use the time the browser is idle to request images and scripts that will be used by the new site.
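
As a rough illustration of unconditional preload (the image URLs below are placeholders, not taken from google.com): after onload fires, the page quietly requests images that the next page will need so they are already in the cache.

      // Unconditional preload: after onload, quietly request components the
      // next page will need so they are already in the browser's cache.
      // The URLs below are placeholders for illustration only.
      window.onload = function () {
        var urls = ["/images/results_sprite.png", "/images/nav_icons.png"];
        for (var i = 0; i < urls.length; i++) {
          var img = new Image();
          img.src = urls[i];
        }
      };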


Reduce the Number of DOM Elements

tag: content

A complex page means more bytes to download, and it also means slower DOM access in JavaScript. It makes a difference whether you loop through 500 or 5000 DOM elements on the page when you want to add an event handler, for example.

A high number of DOM elements can be a symptom that there’s something that should be improved with the markup of the page without necessarily removing content. Are you using nested tables for layout purposes? Are you throwing in more <div>s only to fix layout issues? Maybe there’s a better and more semantically correct way to do your markup.

The YUI CSS utilities are a great help with layouts: grids.css can help you with the overall layout, and fonts.css and reset.css can help you strip away the browser’s default formatting. This is a chance to start fresh and think about your markup, for example using <div>s only when it makes sense semantically, and not because it renders a new line.

The number of DOM elements is easy to test, just type in Firebug’s console:
document.getElementsByTagName('*').length

And how many DOM elements are too many? Check other similar pages that have good markup. For example the Yahoo! Home Page is a pretty busy page and still under 700 elements (HTML tags).


Split Components Across Domains

tag: content

Splitting components allows you to maximize parallel downloads. Make sure you’re using no more than two to four domains because of the DNS lookup penalty. For example, you can host your HTML and dynamic content on www.example.org and split static components between static1.example.org and static2.example.org.

For more information check “Maximizing Parallel Downloads in the Carpool Lane” by Tenni Theurer and Patty Chi.


Minimize the Number of iframes

tag: content

Iframes allow an HTML document to be inserted in the parent document. It’s important to understand how iframes work so they can be used effectively.

<iframe> pros:

  • Helps with slow third-party content like badges and ads
  • Security sandbox
  • Download scripts in parallel

<iframe> cons:

  • Costly even if blank
  • Blocks page onload
  • Non-semantic


No 404s

tag: content

HTTP requests are expensive so making an HTTP request and getting a useless response (i.e. 404 Not Found) is totally unnecessary and will slow down the user experience without any benefit.

Some sites have helpful 404s (“Did you mean X?”), which is great for the user experience but also wastes server resources (database lookups, etc.). Particularly bad is when the link to an external JavaScript is wrong and the result is a 404. First, this download will block parallel downloads. Second, the browser may try to parse the 404 response body as if it were JavaScript code, trying to find something usable in it.


Reduce Cookie Size

tag: cookie

HTTP cookies are used for a variety of reasons such as authentication and personalization. Information about cookies is exchanged in the HTTP headers between web servers and browsers. It’s important to keep the size of cookies as low as possible to minimize the impact on the user’s response time.

For more information check “When the Cookie Crumbles” by Tenni Theurer and Patty Chi. The take-home of this research:

  • Eliminate unnecessary cookies
  • Keep cookie sizes as low as possible to minimize the impact on the user response time
  • Be mindful of setting cookies at the appropriate domain level so other sub-domains are not affected
  • Set an Expires date appropriately. An earlier Expires date or none removes the cookie sooner, improving the user response time


Use Cookie-free Domains for Components

tag: cookie

When the browser makes a request for a static image and sends cookies together with the request, the server doesn’t have any use for those cookies. So they only create network traffic for no good reason. You should make sure static components are requested with cookie-free requests. Create a subdomain and host all your static components there.

If your domain is www.example.org, you can host your static components on static.example.org. However, if you’ve already set cookies on the top-level domain example.org as opposed to www.example.org, then all the requests to static.example.org will include those cookies. In this case, you can buy a whole new domain, host your static components there, and keep this domain cookie-free. Yahoo! uses yimg.com, YouTube uses ytimg.com, Amazon uses images-amazon.com and so on.

Another benefit of hosting static components on a cookie-free domain is that some proxies might refuse to cache the components that are requested with cookies. On a related note, if you wonder if you should use example.org or www.example.org for your home page, consider the cookie impact. Omitting www leaves you no choice but to write cookies to *.example.org, so for performance reasons it’s best to use the www subdomain and write the cookies to that subdomain.


Minimize DOM Access

tag: javascript

Accessing DOM elements with JavaScript is slow, so in order to have a more responsive page, you should (a brief sketch follows the list):

  • Cache references to accessed elements
  • Update nodes “offline” and then add them to the tree
  • Avoid fixing layout with JavaScript
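
A brief sketch of the first two points (the element id “results” is a placeholder): the reference is looked up once and new nodes are built in a DocumentFragment, so the live document is updated in a single operation.

      // Cache the DOM reference once and build new nodes "offline" in a
      // DocumentFragment, so the live document is touched only one time.
      // The id "results" is a placeholder for this sketch.
      var list = document.getElementById("results");

      function appendItems(items) {
        var fragment = document.createDocumentFragment();
        for (var i = 0; i < items.length; i++) {
          var li = document.createElement("li");
          li.appendChild(document.createTextNode(items[i]));
          fragment.appendChild(li);
        }
        list.appendChild(fragment); // a single update to the live DOM tree
      }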

For more information check the YUI theatre’s “High Performance Ajax Applications” by Julien Lecomte.


Develop Smart Event Handlers

tag: javascript

Sometimes pages feel less responsive because too many event handlers are attached to different elements of the DOM tree and are executed too often. That’s why using event delegation is a good approach. If you have 10 buttons inside a div, attach only one event handler to the div wrapper instead of one handler for each button. Events bubble up, so you’ll be able to catch the event and figure out which button it originated from.
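
As a minimal sketch of event delegation (the id “toolbar” and the handleButton function are illustrative): a single click handler on the wrapper inspects the event target to decide which button was pressed.

      // Event delegation: one handler on the wrapper <div> instead of one
      // handler per button. The id "toolbar" is a placeholder for this sketch.
      var toolbar = document.getElementById("toolbar");

      toolbar.onclick = function (e) {
        e = e || window.event;                 // older IE
        var target = e.target || e.srcElement; // older IE
        if (target.nodeName === "BUTTON") {
          // the event bubbled up from one of the buttons
          handleButton(target.id);
        }
      };

      function handleButton(id) {
        // route to the right action based on which button was clicked
      }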

You also don’t need to wait for the onload event in order to start doing something with the DOM tree. Often all you need is the element you want to access to be available in the tree. You don’t have to wait for all images to be downloaded. DOMContentLoaded is the event you might consider using instead of onload, but until it’s available in all browsers, you can use the YUI Event utility, which has an onAvailable method.
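
A small sketch of that idea, assuming nothing beyond standard DOM APIs: run the initialization as soon as DOMContentLoaded fires where it is supported, and fall back to onload elsewhere.

      // Run fn as soon as the DOM tree is ready, falling back to onload in
      // browsers without DOMContentLoaded (the guard prevents running twice).
      function initWhenReady(fn) {
        var done = false;
        function run() {
          if (!done) {
            done = true;
            fn();
          }
        }
        if (document.addEventListener) {
          document.addEventListener("DOMContentLoaded", run, false);
        }
        window.onload = run; // fires after all images have downloaded
      }

      // initWhenReady(function () { /* safe to touch the DOM here */ });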

For more information check the YUI theatre’s “High Performance Ajax Applications” by Julien Lecomte.


Choose <link> over @import

tag: css

One of the previous best practices states that CSS should be at the top in order to allow for progressive rendering.

In IE @import behaves the same as using <link> at the bottom of the page, so it’s best not to use it.


Avoid Filters

tag: css

The IE-proprietary AlphaImageLoader filter aims to fix a problem with semi-transparent true color PNGs in IE versions < 7. The problem with this filter is that it blocks rendering and freezes the browser while the image is being downloaded. It also increases memory consumption and is applied per element, not per image, so the problem is multiplied.

The best approach is to avoid AlphaImageLoader completely and use gracefully degrading PNG8 images instead, which work fine in IE. If you absolutely need AlphaImageLoader, use the underscore hack (_filter) so as not to penalize your IE7+ users.


Optimize Images

tag: images

After a designer is done with creating the images for your web page, there are still some things you can try before you FTP those images to your web server.

  • You can check the GIFs and see if they are using a palette size corresponding to the number of colors in the image. Using ImageMagick it’s easy to check:
    identify -verbose image.gif
    When you see an image using 4 colors and 256 color “slots” in the palette, there is room for improvement.
  • Try converting GIFs to PNGs and see if there is a saving. More often than not, there is. Developers often hesitate to use PNGs due to the limited support in browsers, but this is now a thing of the past. The only real problem is alpha-transparency in true color PNGs, but then again, GIFs are not true color and don’t support variable transparency either. So anything a GIF can do, a palette PNG (PNG8) can do too (except for animations). This simple imagemagick command results in totally safe-to-use PNGs:
    convert image.gif image.png
    “All we are saying is: Give PiNG a Chance!”
  • Run pngcrush (or any other PNG optimizer tool) on all your PNGs. Example:
    pngcrush image.png -rem alla -reduce -brute result.png
  • Run jpegtran on all your JPEGs. This tool does lossless JPEG operations such as rotation and can also be used to optimize and remove comments and other useless information (such as EXIF information) from your images.
    jpegtran -copy none -optimize -perfect src.jpg dest.jpg


Optimize CSS Sprites

tag: images

  • Arranging the images in the sprite horizontally as opposed to vertically usually results in a smaller file size.
  • Combining similar colors in a sprite helps you keep the color count low, ideally under 256 colors so it fits in a PNG8.
  • “Be mobile-friendly” and don’t leave big gaps between the images in a sprite. This doesn’t affect the file size much, but it reduces the memory the user agent needs to decompress the image into a pixel map: a 100×100 image is 10 thousand pixels, whereas a 1000×1000 image is 1 million pixels.


Don’t Scale Images in HTML

tag: images

Don’t use a bigger image than you need just because you can set the width and height in HTML. If you need
<img width="100" height="100" src="mycat.jpg" alt="My Cat" />
then your image (mycat.jpg) should be 100x100px rather than a scaled down 500x500px image.


Make favicon.ico Small and Cacheable

tag: images

The favicon.ico is an image that stays in the root of your server. It’s a necessary evil because even if you don’t care about it the browser will still request it, so it’s better not to respond with a 404 Not Found. Also since it’s on the same server, cookies are sent every time it’s requested. This image also interferes with the download sequence, for example in IE when you request extra components in the onload, the favicon will be downloaded before these extra components.

So to mitigate the drawbacks of having a favicon.ico make sure:

  • It’s small, preferably under 1K.
  • Set the Expires header to something you feel comfortable with (since you cannot rename favicon.ico if you decide to change it). You can probably safely set the Expires header a few months in the future. You can check the last modified date of your current favicon.ico to make an informed decision.

ImageMagick can help you create small favicons.


Keep Components under 25K

tag: mobile

This restriction is related to the fact that iPhone won’t cache components bigger than 25K. Note that this is the uncompressed size. This is where minification is important because gzip alone may not be sufficient.

For more information check “Performance Research, Part 5: iPhone Cacheability – Making it Stick” by Wayne Shea and Tenni Theurer.


Pack Components into a Multipart Document

tag: mobile

Packing components into a multipart document is like an email with attachments, it helps you fetch several components with one HTTP request (remember: HTTP requests are expensive). When you use this technique, first check if the user agent supports it (iPhone does not).

Avoid Empty Image src

tag: server

An image with an empty string src attribute occurs more often than one would expect. It appears in two forms:

  1. straight HTML

    <img src="">

  2. JavaScript

    var img = new Image();
    img.src = "";

 

Both forms cause the same effect: the browser makes another request to your server.

  • Internet Explorer makes a request to the directory in which the page is located.
  • Safari and Chrome make a request to the actual page itself.
  • Firefox 3 and earlier versions behave the same as Safari and Chrome, but version 3.5 addressed this issue[bug 444931] and no longer sends a request.
  • Opera does not do anything when an empty image src is encountered.

 

 

Why is this behavior bad?

  1. Cripple your servers by sending a large amount of unexpected traffic, especially for pages that get millions of page views per day.
  2. Waste server computing cycles generating a page that will never be viewed.
  3. Possibly corrupt user data. If you are tracking state in the request, either by cookies or in another way, you have the possibility of destroying data. Even though the image request does not return an image, all of the headers are read and accepted by the browser, including all cookies. While the rest of the response is thrown away, the damage may already be done.

 

 

The root cause of this behavior is the way that URI resolution is performed in browsers. This behavior is defined in RFC 3986 – Uniform Resource Identifiers. When an empty string is encountered as a URI, it is considered a relative URI and is resolved according to the algorithm defined in section 5.2. This specific example, an empty string, is listed in section 5.4. Firefox, Safari, and Chrome are all resolving an empty string correctly per the specification, while Internet Explorer is resolving it incorrectly, apparently in line with an earlier version of the specification, RFC 2396 – Uniform Resource Identifiers (this was obsoleted by RFC 3986). So technically, the browsers are doing what they are supposed to do to resolve relative URIs. The problem is that in this context, the empty string is clearly unintentional.

HTML5 adds to the description of the img tag’s src attribute to instruct browsers not to make an additional request, in section 4.8.2:

The src attribute must be present, and must contain a valid URL referencing a non-interactive, optionally animated, image resource that is neither paged nor scripted. If the base URI of the element is the same as the document’s address, then the src attribute’s value must not be the empty string.

Hopefully, browsers will not have this problem in the future. Unfortunately, there is no such clause for <script src=””> and <link href=””>. Maybe there is still time to make that adjustment to ensure browsers don’t accidentally implement this behavior.

 

This rule was inspired by Yahoo!’s JavaScript guru Nicholas C. Zakas. For more information check out his article “Empty image src can destroy your site“.


from:http://developer.yahoo.com/performance/rules.html

SSO

http://msdn.microsoft.com/en-us/magazine/ff872350.aspx

WIF

http://stackoverflow.com/questions/44509/single-sign-on-across-multiple-domains?rq=1

There are a number of open source cross-domain SSO packages such as JOSSO, OpenSSO, CAS, Shibboleth and others. If you’re using Microsoft Technology throughout (IIS, AD), you can use microsoft federation (ADFS) instead.

 

http://stackoverflow.com/questions/1043111/transparent-user-session-over-several-sites-single-sign-on-single-sign-off?rq=1t

 

Single sign out  http://www.open-open.com/doc/view/bab4731ca3e14408a44b6b32d0d8b314

 

CAS Single sign out http://www.blogjava.net/xmatthew/archive/2008/07/09/213808.html

 

Single Sign On (SSO) for cross-domain ASP.NET applications

http://www.codeproject.com/Articles/106439/Single-Sign-On-SSO-for-cross-domain-ASP-NET-applic

 

Analysis of JD.com (京东) SSO http://www.uml.org.cn/zjjs/201308135.asp

 

http://www.cnblogs.com/alee/archive/2007/02/05/641211.html

http://www.kuqin.com/system-analysis/20110901/264158.html

http://www.blogjava.net/Jack2007/archive/2008/04/10/191795.html

 

OWASP Top 10 – 2013: the Latest Ten Most Critical Security Risks (with ASP.NET Mitigations)

OWASP (the Open Web Application Security Project) is an open-community, non-profit organization with roughly 130 chapters and nearly ten thousand members worldwide. Its main goal is to develop standards, tools, and technical documents that help solve web application security problems, and it has long been committed to helping governments and enterprises understand and improve the security of their web applications and web services.

The left column of the table below shows the 2010 ranking and the right column shows the 2013 ranking. The main changes are:

  1. The 2010 entries Insecure Cryptographic Storage and Insufficient Transport Layer Protection were merged into a single 2013 entry: Sensitive Data Exposure
  2. The 2010 entry Failure to Restrict URL Access was broadened in 2013 into Missing Function Level Access Control
  3. Part of the 2010 Security Misconfiguration entry was split out in 2013 as its own entry: Using Known Vulnerable Components

 

OWASP Top 10 – 2010 (Previous) → OWASP Top 10 – 2013 (New)
A1 – Injection → A1 – Injection
A3 – Broken Authentication and Session Management → A2 – Broken Authentication and Session Management
A2 – Cross-Site Scripting (XSS) → A3 – Cross-Site Scripting (XSS)
A4 – Insecure Direct Object References → A4 – Insecure Direct Object References
A6 – Security Misconfiguration → A5 – Security Misconfiguration
A7 – Insecure Cryptographic Storage (merged with 2010-A9) → A6 – Sensitive Data Exposure
A8 – Failure to Restrict URL Access (broadened) → A7 – Missing Function Level Access Control
A5 – Cross-Site Request Forgery (CSRF) → A8 – Cross-Site Request Forgery (CSRF)
(buried in 2010-A6: Security Misconfiguration) → A9 – Using Known Vulnerable Components
A10 – Unvalidated Redirects and Forwards → A10 – Unvalidated Redirects and Forwards
A9 – Insufficient Transport Layer Protection → merged with 2010-A7 into the new 2013-A6

 

 

A1 – Injection

Cause: application logic depends directly on raw external input.

Variants: SQL injection, OS command injection, XPath injection, LDAP injection, JSON injection, URL injection.

  • SQL injection — Symptom: the program concatenates a user-supplied string directly into a SQL statement, so the user controls the SQL, for example injecting a DELETE or bypassing the password check. Mitigation: call SQL with parameters; use stored procedures (without dynamic SQL string-building inside them); use frameworks such as LINQ or EF (avoiding their raw SQL string-concatenation APIs).
  • OS command injection — Symptom: the program builds a command line (including arguments) to invoke an external program, so a user can use small tricks to escape the intended restrictions and run other external programs. Mitigation: validate the input in the business-logic layer and invoke external programs through System.Diagnostics.Process.
  • XPath injection — Symptom: //Employee[UserName/text()=’aaron’ and password/text()=’password’] becomes //Employee[UserName/text()=’aaron’ or 1=1 or ‘a’=’a’ and password/text()=’password’], exactly like classic SQL injection. Mitigation: as with SQL, parameterize the query, for example: declare variable $userName as xs:string external; declare variable $password as xs:string external; //Employee[UserName/text()=$userName and password/text()=$password]
  • LDAP injection — Symptom: LDAP queries, like SQL queries, can be built by string concatenation, so the same injection vulnerability exists.
  • JSON injection — Symptom: {‘user’: ‘usera’, ‘sex’: ‘boy’} becomes {‘user’: ‘usera’, ‘sex’: ‘bo’y’}, which breaks the JavaScript that consumes it. Mitigation: in classic WebForms generate JSON with JSON.NET; in MVC generate it with JsonResult.
  • URL (parameter) injection — Symptom: for http://www.a.com/a.aspx?p1=a&p1=b, if there is also a cookie named p1 with value c, ASP.NET reports the parameter’s value as a,b,c. Note: the behaviour is platform-specific — ASP.NET merges the values, while other platforms take only one (whether the first or the last depends on the language) — so how to handle it depends heavily on your business logic.

 

 

A2 – Broken Authentication and Session Management

Cause: security problems that arise when session-related data is not completely replaced.

Key points: immediately after a successful login, invalidate the current session state (Session, Cache, Cookie) and store the values that must survive in a freshly created session; on logout, invalidate the current session and also remove any session-related cache entries.

  • Login — In the login validation event, as soon as the credentials have been verified, abandon the current session (Session.Abandon()) so that a new session is created and the client’s session cookie value is reset.
  • Logout — Abandon the session, clear the related cache entries, and clear any additional cookies.

 

 

A3 – Cross-Site Scripting (XSS)

Cause: similar to injection, but focused specifically on HTML and JavaScript injection; because there is so much to it, it has its own category.

Variants: reflected XSS, stored XSS, DOM-based XSS.

Key points: encode HTML input and output, encode JavaScript, encode URLs.

  • Reflected XSS — Symptom: the server echoes client-supplied data back without sanitization, harming other users; for example, stealing all of a user’s cookies for a site lets a malicious user hijack the session and mount further attacks. Mitigation: filter special characters in user input and in data written back to the client; HTML, JavaScript, and URLs each need their own encoding — on the server use Server.HtmlEncode, Server.HtmlDecode, Server.UrlEncode, Server.UrlDecode, and Server.UrlPathEncode, and on the client use the JavaScript functions listed under DOM-based XSS below.
  • Stored XSS — Symptom: reaches further and wider than reflected XSS, because the unsanitized code is saved in the database, so it persists longer and affects more users.
  • DOM-based XSS — Symptom: in Ajax applications, JavaScript uses user-supplied text without filtering or escaping it, affecting the structure or behaviour of DOM elements. Mitigation: escape special characters in JavaScript (element values and attributes must all be encoded); for the use of escape, encodeURI, and encodeURIComponent see goody9807’s article: http://www.cnblogs.com/goody9807/archive/2009/01/16/1376913.html
  • Anti-XSS — a script-filtering library: http://msdn.microsoft.com/zh-cn/library/aa973813.aspx
  • Anti-XSS SRE — SRE stands for Security Runtime Engine, a smarter filtering system; for usage see Syed’s article: http://blogs.msdn.com/b/syedab/archive/2009/07/08/preventing-cross-site-scripting-attacks-using-microsoft-anti-xss-security-runtime-engine.aspx
  • ASP.NET MVC 4 — <%= Html %> does not encode its output while <%: Html %> does; use the [AllowHtml] attribute only where needed and avoid changing the default ValidateRequest setting.

 

 

A4 – Insecure Direct Object References

Cause: with http://www.a.com/userdetail.aspx?userId=1 it is trivial to guess userId=2 and so on; if permissions are not checked, information is easily disclosed. If the URL is http://www.a.com/userdetail.aspx?userId=ABAWEFRA instead, guessing becomes much harder.

Key points: encoding and decoding of URL parameters.

  • Helper class — IndirectReference: String GenerateUIID(string/int/guid) turns a database entity id into a string id for display on the client; the string resembles a hash value, is hard to guess, but can be converted back. String FromUIID<T>(string) converts the client-side id back to the original entity id, with the concrete type determined by T.
  • WebForms — In the .aspx page:

      <a href="product.aspx?productId=<%= IndirectReference.GenerateUIID(this.productID) %>">Product A</a>

    and in Page_Load:

      this.productId = IndirectReference.FromUIID<int>(Request.QueryString["productId"]);

  • MVC — Add an IndirectReferenceID to the entity and the ModelBinder binds it automatically.

 

 

A5 – Security Misconfiguration

Principle: enable only the modules you need and grant only the least privilege required; this applies to the OS, IIS, and the database.

Key points: configure the Error section of Web.config properly (for example, redirects and logging for 404 and 403 errors, and never store log files under the web site’s path), and encrypt sensitive sections of web.config with aspnet_regiis, for example from a command prompt run as administrator:

      C:\Windows\Microsoft.NET\Framework\v4.0.30319\aspnet_regiis -site "VulnerableApp" -app "/" -pe "connectionStrings"

 

A6 – Sensitive Data Exposure

Cause: sensitive data needs to be stored encrypted (in memory, in the database, and on the client), transmitted encrypted (HTTPS), and, as far as practical, not cached.

Key points: pages such as login and payment must be protected in transit with HTTPS.

  • Encryption — Store passwords with a one-way hash such as MD5; data such as credit card numbers must be encrypted reversibly before being stored in the database.
  • Transport-layer protection — Essentially HTTPS; the author notes not being familiar with other options.
  • Client-side cookies — Set the HttpOnly and Secure cookie attributes.
  • Database protection — Besides encrypting sensitive data in the application, also use database-level encryption.

 

 

A7 – Missing Function Level Access Control

Cause: the UI exposes operations the current user may not perform, for example a disabled delete button (which can be re-enabled simply by flipping its disabled attribute); permission checks may not cover every function and UI element; and the server side (business layer) may not verify permissions at all.

Key point: permission checks.

Sample: when reading files, for example a download such as download.aspx?file=a.txt, trouble follows if it is changed to download.aspx?file=http://www.cnblogs.com/config.xml.

  • UI handling — In the navigation bar, hide entries the user has no permission for instead of disabling them; treat buttons on individual pages the same way — if the user cannot perform the action, don’t show it. Also add permission checks in the page itself, so that even if a privileged URL is typed directly, the page still verifies permissions and remains safe.
  • Key business functions — First, have a complete security system; second, tag key business functions with permission demands, for example:

      [PrincipalPermission(SecurityAction.Demand, Role = "Admin")]
      public void RemoveUserFromRole(string userName, string role)
      {
          Roles.RemoveUserFromRole(userName, role);
      }

 

 

 

A8 – Cross-Site Request Forgery (CSRF)

Cause: requests are issued from a legitimate user’s browser under that user’s identity; such requests might, for example, transfer money.

Key points: do not use GET for important operations (for example delete.aspx?id=1) — use POST instead; add a token to every form that performs a POST, and have the server verify the token before carrying out the operation.

  • Classic WebForms — To add a token to every page request, the Anti-CSRF component can be used; see http://anticsrf.codeplex.com for usage. You can also customize the ViewState: by default the ViewState is not encrypted and its values are easy to read, for example with the ViewStateDecoder tool (http://download.csdn.net/detail/skt90u/3974340). Customizing the ViewState mitigates the problem a little, though not as strongly as a per-form token; the code is simply:

      this.ViewStateUserKey = Convert.ToString(Session["UserID"]);

    which salts each page’s ViewState with the value of ViewStateUserKey.

  • MVC — Mark the action with [HttpPost] and [ValidateAntiForgeryToken]:

      [HttpPost]
      [ValidateAntiForgeryToken]
      public ActionResult Login(Usr usr)
      {
          return View();
      }

    and add the following inside the form in the .aspx or Razor template:

      <%= Html.AntiForgeryToken() %>

For details see this article: http://www.cnblogs.com/leleroyn/archive/2010/12/30/1921544.html

 

 

A9 – Using Known Vulnerable Components (使用已知易受攻击组件)

原因:由于系统有意无意间使用了组件(自己的组件和第三方的组件,范围太广…),导致了不可预料的问题

Mitigation focus: for your own components, raise their quality, which is less about code and more about quality and version management, so it is not covered here; for third-party components, choose well-known providers.

 

A10 – Unvalidated Redirects and Forwards

Cause: the system accepts a redirect parameter (most commonly on login pages, e.g. http://www.a.com/login.aspx?returnUrl=default.aspx). Because the URL is visible in the browser's address bar, an attacker only needs to change it to http://www.a.com/login.aspx?returnUrl=http://wwv.a.com/login.aspx. After entering a password the user is redirected to the fake site's login page (and assumes the password was mistyped); when the password is entered again, the fake site captures it…

Mitigation focus: validate parameter values such as returnUrl; only redirect to URLs on a whitelist, and prefer relative paths when redirecting.

Utility class: RedirectForwardDispatcher, with void Config(List<string> whitelist) and bool IsRedirectable(string newUrl) (a sketch follows below).

Usage:

string returnUrl = ...;
if (!RedirectForwardDispatcher.IsRedirectable(returnUrl))
{
  throw new Exception("Unvalidated Redirects and Forwards");
}
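
A minimal sketch of that RedirectForwardDispatcher idea; only the two method signatures come from the table above, the matching rules are assumptions:

using System;
using System.Collections.Generic;
using System.Linq;

public static class RedirectForwardDispatcher
{
  private static List<string> _whitelist = new List<string>();

  // Called once at application start with the absolute URLs (or prefixes) that may be redirected to.
  public static void Config(List<string> whitelist)
  {
    _whitelist = whitelist ?? new List<string>();
  }

  // Relative URLs stay on the site and are allowed; absolute URLs must match the whitelist.
  public static bool IsRedirectable(string newUrl)
  {
    if (string.IsNullOrEmpty(newUrl))
    {
      return false;
    }
    if (!Uri.IsWellFormedUriString(newUrl, UriKind.Absolute))
    {
      return Uri.IsWellFormedUriString(newUrl, UriKind.Relative);
    }
    return _whitelist.Any(allowed => newUrl.StartsWith(allowed, StringComparison.OrdinalIgnoreCase));
  }
}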

 

from:http://www.cnblogs.com/aarond/archive/2013/04/01/OWASP.html

Ref:https://www.owasp.org/images/8/89/OWASP_Top_10_2007_for_JEE.pdf
http://www.owasp.org.cn/OWASP_Conference/2013forum/20138bba575b-53174eac7ad9/04TonyOWASPTOP102013java.pdf
OWASP Top 10 for JavaScript
http://erlend.oftedal.no/blog/?blogid=125

OWASP Top 10 for .NET developers part 1: Injection

There’s a harsh reality web application developers need to face up to; we don’t do security very well. A report from WhiteHat Security last year found that “83% of websites have had a high, critical or urgent issue”. That is, quite simply, a staggeringly high number and it’s only once you start to delve into the depths of web security that you begin to understand just how easy it is to inadvertently produce vulnerable code.

Inevitably a large part of the problem is education. Oftentimes developers are simply either not aware of common security risks at all or they’re familiar with some of the terms but don’t understand the execution and consequently how to secure against them.

Of course none of this should come as a surprise when you consider only 18 percent of IT security budgets are dedicated to web application security yet in 86% of all attacks, a weakness in a web interface was exploited. Clearly there is an imbalance leaving the software layer of web applications vulnerable.

OWASP and the Top 10

Enter OWASP, the Open Web Application Security Project, a non-profit charitable organisation established with the express purpose of promoting secure web application design. OWASP has produced some excellent material over the years, not least of which is The Ten Most Critical Web Application Security Risks – or “Top 10” for short – whose users and adopters include a who’s who of big business.

The Top 10 is a fantastic resource for the purpose of identification and awareness of common security risks. However it’s abstracted slightly from the technology stack in that it doesn’t contain a lot of detail about the execution and required countermeasures at an implementation level. Of course this approach is entirely necessary when you consider the extensive range of programming languages potentially covered by the Top 10.

What I’ve been finding when directing .NET developers to the Top 10 is some confusion about how to comply at the coalface of development so I wanted to approach the Top 10 from the angle these people are coming from. Actually, .NET web applications are faring pretty well in the scheme of things. According to the WhiteHat Security Statistics Report released last week, the Microsoft stack had fewer exploits than the likes of PHP, Java and Perl. But it still had numerous compromised sites so there is obviously still work to be done.

Moving on, this is going to be a 10 part process. In each post I’m going to look at the security risk in detail, demonstrate – where possible – how it might be exploited in a .NET web application and then detail the countermeasures at a code level. Throughout these posts I’m going to draw as much information as possible out of the OWASP publication so each example ties back into an open standard.

Here’s what I’m going to cover:

1. Injection
2. Cross-Site Scripting (XSS)
3. Broken Authentication and Session Management
4. Insecure Direct Object References
5. Cross-Site Request Forgery (CSRF)
6. Security Misconfiguration
7. Insecure Cryptographic Storage
8. Failure to Restrict URL Access
9. Insufficient Transport Layer Protection
10. Unvalidated Redirects and Forwards

Some secure coding fundamentals

Before I start getting into the Top 10 it’s worth making a few fundamentals clear. Firstly, don’t stop securing applications at just these 10 risks. There are potentially limitless exploit techniques out there and whilst I’m going to be talking a lot about the most common ones, this is not an exhaustive list. Indeed the OWASP Top 10 itself continues to evolve; the risks I’m going to be looking at are from the 2010 revision which differs in a few areas from the 2007 release.

Secondly, applications are often compromised by applying a series of these techniques so don’t get too focussed on any single vulnerability. Consider the potential to leverage an exploit by linking vulnerabilities. Also think about the social engineering aspects of software vulnerabilities, namely that software security doesn’t start and end at purely technical boundaries.

Thirdly, the practices I’m going to write about by no means immunise code from malicious activity. There are always new and innovative means of increasing sophistication being devised to circumvent defences. The Top 10 should be viewed as a means of minimising risk rather than eliminating it entirely.

Finally, start thinking very, very laterally and approach this series of posts with an open mind. Experienced software developers are often blissfully unaware of how many of today’s vulnerabilities are exploited and I’m the first to put my hand up and say I’ve been one of these and continue to learn new facts about application security on a daily basis. This really is a serious discipline within the software industry and should not be approached casually.

Worked examples

I’m going to provide worked examples of both exploitable and secure code wherever possible. For the sake of retaining focus on the security concepts, the examples are going to be succinct, direct and as basic as possible.

So here’s the disclaimer: don’t expect elegant code, this is going to be elemental stuff written with the sole intention of illustrating security concepts. I’m not even going to apply basic practices such as sorting SQL statements unless it illustrates a security concept. Don’t write your production ready code this way!

Defining injection

Let’s get started. I’m going to draw directly from the OWASP definition of injection:

Injection flaws, such as SQL, OS, and LDAP injection, occur when untrusted data is sent to an interpreter as part of a command or query. The attacker’s hostile data can trick the interpreter into executing unintended commands or accessing unauthorized data.

The crux of the injection risk centres on the term “untrusted”. We’re going to see this word a lot over coming posts so let’s clearly define it now:

Untrusted data comes from any source – either direct or indirect – where integrity is not verifiable and intent may be malicious. This includes manual user input such as form data, implicit user input such as request headers and constructed user input such as query string variables. Consider the application to be a black box and any data entering it to be untrusted.

OWASP also includes a matrix describing the source, the exploit and the impact to business:

Threat Agents: Consider anyone who can send untrusted data to the system, including external users, internal users, and administrators.

Attack Vectors (Exploitability: EASY): Attacker sends simple text-based attacks that exploit the syntax of the targeted interpreter. Almost any source of data can be an injection vector, including internal sources.

Security Weakness (Prevalence: COMMON; Detectability: AVERAGE): Injection flaws occur when an application sends untrusted data to an interpreter. Injection flaws are very prevalent, particularly in legacy code, often found in SQL queries, LDAP queries, XPath queries, OS commands, program arguments, etc. Injection flaws are easy to discover when examining code, but more difficult via testing. Scanners and fuzzers can help attackers find them.

Technical Impacts (Impact: SEVERE): Injection can result in data loss or corruption, lack of accountability, or denial of access. Injection can sometimes lead to complete host takeover.

Business Impact: Consider the business value of the affected data and the platform running the interpreter. All data could be stolen, modified, or deleted. Could your reputation be harmed?

Most of you are probably familiar with the concept (or at least the term) of SQL injection but the injection risk is broader than just SQL and indeed broader than relational databases. As the weakness above explains, injection flaws can be present in technologies like LDAP or theoretically in any platform that constructs queries from untrusted data.

Anatomy of a SQL injection attack

Let’s jump straight into how the injection flaw surfaces itself in code. We’ll look specifically at SQL injection because it means working in an environment familiar to most .NET developers and it’s also a very prevalent technology for the exploit. In the SQL context, the exploit needs to trick SQL Server into executing an unintended query constructed with untrusted data.

For the sake of simplicity and illustration, let’s assume we’re going to construct a SQL statement in C# using a parameter passed in a query string and bind the output to a grid view. In this case it’s the good old Northwind database driving a product page filtered by the beverages category which happens to be category ID 1. The web application has a link directly to the page where the CategoryID parameter is passed through in a query string. Here’s a snapshot of what the Products and Customers (we’ll get to this one) tables look like:

[image: the Products and Customers tables in the Northwind database]

Here’s what the code is doing:

var catID = Request.QueryString["CategoryID"];
var sqlString = "SELECT * FROM Products WHERE CategoryID = " + catID;
var connString = WebConfigurationManager.ConnectionStrings
["NorthwindConnectionString"].ConnectionString;
using (var conn = new SqlConnection(connString))
{
  var command = new SqlCommand(sqlString, conn);
  command.Connection.Open();
  grdProducts.DataSource = command.ExecuteReader();
  grdProducts.DataBind();
}

And here’s what we’d normally expect to see in the browser:

[image: the products grid rendered in the browser, filtered to the Beverages category]

In this scenario, the CategoryID query string is untrusted data. We assume it is properly formed, we assume it represents a valid category, and we consequently assume the requested URL and the sqlString variable end up looking exactly like this (I’ll show the untrusted data both in the context of the requested URL and the subsequent SQL statement):

Products.aspx?CategoryID=1
SELECT * FROM Products WHERE CategoryID = 1

Of course much has been said about assumption. The problem with the construction of this code is that by manipulating the query string value we can arbitrarily manipulate the command executed against the database. For example:

Products.aspx?CategoryID=1 or 1=1
SELECT * FROM Products WHERE CategoryID = 1 or 1=1

Obviously 1=1 always evaluates to true so the filter by category is entirely invalidated. Rather than displaying only beverages we’re now displaying products from all categories. This is interesting, but not particularly invasive so let’s push on a bit:

Products.aspx?CategoryID=1 or name=''
SELECT * FROM Products WHERE CategoryID = 1 or name=''

When this statement runs against the Northwind database it’s going to fail as the Products table has no column called name. In some form or another, the web application is going to return an error to the user. It will hopefully be a friendly error message contextualised within the layout of the website but at worst it may be a yellow screen of death. For the purpose of where we’re going with injection, it doesn’t really matter as just by virtue of receiving some form of error message we’ve quite likely disclosed information about the internal structure of the application, namely that there is no column called name in the table(s) the query is being executed against.

Let’s try something different:

Products.aspx?CategoryID=1 or productname=''
SELECT * FROM Products WHERE CategoryID = 1 or productname=''

This time the statement will execute successfully because the syntax is valid against Northwind so we have therefore confirmed the existence of the ProductName column. Obviously it’s easy to put this example together with prior knowledge of the underlying data schema but in most cases data models are not particularly difficult to guess if you understand a little bit about the application they’re driving. Let’s continue:

Products.aspx?CategoryID=1 or 1=(select count(*) from products)
SELECT * FROM Products WHERE CategoryID = 1 or 1=(select count(*) from products)

With the successful execution of this statement we have just verified the existence of the Products table. This is a pretty critical step as it demonstrates the ability to validate the existence of individual tables in the database regardless of whether they are used by the query driving the page or not. This disclosure is starting to become serious information leakage we could potentially leverage to our advantage.

So far we’ve established that SQL statements are being arbitrarily executed based on the query string value and that there is a table called Products with a column called ProductName. Using the techniques above we could easily ascertain the existence of the Customers table and the CompanyName column by fairly assuming that an online system facilitating ordering may contain these objects. Let’s step it up a notch:

Products.aspx?CategoryID=1;update products set productname = productname
SELECT * FROM Products WHERE CategoryID = 1;update products set productname = productname

The first thing to note about the injection above is that we’re now executing multiple statements. The semicolon is terminating the first statement and allowing us to execute any statement we like afterwards. The second really important observation is that if this page successfully loads and returns a list of beverages, we have just confirmed the ability to write to the database. It’s about here that the penny usually drops in terms of understanding the potential ramifications of injection vulnerabilities and why OWASP categorises the technical impact as “severe”.

All the examples so far have been non-destructive. No data has been manipulated and the intrusion has quite likely not been detected. We’ve also not disclosed any actual data from the application, we’ve only established the schema. Let’s change that.

Products.aspx?CategoryID=1;insert into products(productname) select companyname from customers
SELECT * FROM Products WHERE CategoryID = 1;insert into products (productname) select companyname from customers

So as with the previous example, we’re terminating the CategoryID parameter then injecting a new statement but this time we’re populating data out of the Customers table. We’ve already established the existence of the tables and columns we’re dealing with and that we can write to the Products table so this statement executes beautifully. We can now load the results back into the browser:

Products.aspx?CategoryID=500 or categoryid is null
SELECT * FROM Products WHERE CategoryID = 500 or categoryid is null

The unfeasibly high CategoryID ensures existing records are excluded and we are making the assumption that the ID of new records defaults to null (obviously no default value on the column in this case). Here’s what the browser now discloses – note the company name of the customer now being disclosed in the ProductName column:

[image: the products grid now showing customer company names in the ProductName column]

Bingo. Internal customer data now disclosed.

What made this possible?

The above example could only happen because of a series of failures in the application design. Firstly, the CategoryID query string parameter allowed any value to be assigned and executed by SQL Server without any parsing whatsoever. Although we would normally expect an integer, arbitrary strings were accepted.

Secondly, the SQL statement was constructed as a concatenated string and executed without any concept of using parameters. The CategoryID was consequently allowed to perform activities well outside the scope of its intended function.

Finally, the SQL Server account used to execute the statement had very broad rights. At the very least this one account appeared to have data reader and data writer rights. Further probing may have even allowed the dropping of tables or running of system commands if the account had the appropriate rights.

Validate all input against a whitelist

This is a critical concept not only this post but in the subsequent OWASP posts that will follow so I’m going to say it really, really loud:

All input must be validated against a whitelist of acceptable value ranges.

As per the definition I gave for untrusted data, the assumption must always be made that any data entering the system is malicious in nature until proven otherwise. The data might come from query strings like we just saw, from form variables, request headers or even file attributes such as the Exif metadata tags in JPG images.

In order to validate the integrity of the input we need to ensure it matches the pattern we expect. Blacklists looking for patterns such as we injected earlier on are hard work both because the list of potentially malicious input is huge and because it changes as new exploit techniques are discovered.

Validating all input against whitelists is both far more secure and much easier to implement. In the case above, we only expected a positive integer and anything outside that pattern should have been immediate cause for concern. Fortunately this is a simple pattern that can be easily validated against a regular expression. Let’s rewrite that first piece of code from earlier on with the help of whitelist validation:

var catID = Request.QueryString["CategoryID"];
var positiveIntRegex = new Regex(@"^0*[1-9][0-9]*$");
if(!positiveIntRegex.IsMatch(catID))
{
  lblResults.Text = "An invalid CategoryID has been specified.";
  return;
}

Just this one piece of simple validation has a major impact on the security of the code. It immediately renders all the examples further up completely worthless in that none of the malicious CategoryID values match the regex and the program will exit before any SQL execution occurs.

An integer is a pretty simple example but the same principle applies to other data types. A registration form, for example, might expect a “first name” form field to be provided. The whitelist rule for this field might specify that it can only contain the letters a-z and common punctuation characters (be careful with this – there are numerous characters outside this range that commonly appear in names), plus it must be no more than 30 characters long. The more constraints that can be placed around the whitelist without resulting in false positives, the better.
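
Purely as an illustration of such a rule (the character set, field name and 30-character limit are assumptions, and real-world names need a wider range):

var firstName = Request.Form["FirstName"] ?? string.Empty;   // illustrative field name
var firstNameRegex = new Regex(@"^[a-zA-Z '\-]{1,30}$");     // letters, spaces, hyphens, apostrophes
if (!firstNameRegex.IsMatch(firstName))
{
  lblResults.Text = "An invalid first name has been specified.";
  return;
}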

Regular expression validators in ASP.NET are a great way to implement field level whitelists as they can easily provide both client side (which is never sufficient on its own) and server side validation plus they tie neatly into the validation summary control. MSDN has a good overview of how to use regular expressions to constrain input in ASP.NET so all you need to do now is actually understand how to write a regex.

Finally, no input validation story is complete without the infamous Bobby Tables:

[image: the “Bobby Tables” comic on sanitising database inputs]

Parameterised stored procedures

One of the problems we had above was that the query was simply a concatenated string generated dynamically at runtime. The account used to connect to SQL Server then needed broad permissions to perform whatever action was instructed by the SQL statement.

Let’s take a look at the stored procedure approach in terms of how it protects against SQL injection. Firstly, we’ll put together the SQL to create the procedure and grant execute rights to the user.

CREATE PROCEDURE GetProducts
  @CategoryID INT
AS
SELECT *
FROM dbo.Products
WHERE CategoryID = @CategoryID
GO
GRANT EXECUTE ON GetProducts TO NorthwindUser
GO

There are a couple of native defences in this approach. Firstly, the parameter must be of integer type or a conversion error will be raised when the value is passed. Secondly, the context of what this procedure – and by extension the invoking page – can do is strictly defined and secured directly to the named user. The broad reader and writer privileges which were earlier granted in order to execute the dynamic SQL are no longer needed in this context.

Moving on to the .NET side of things:

var conn = new SqlConnection(connString);
using (var command = new SqlCommand("GetProducts", conn))
{
  command.CommandType = CommandType.StoredProcedure;
  command.Parameters.Add("@CategoryID", SqlDbType.Int).Value = catID;
  command.Connection.Open();
  grdProducts.DataSource = command.ExecuteReader();
  grdProducts.DataBind();
}

This is a good time to point out that parameterised stored procedures are an additional defence to parsing untrusted data against a whitelist. As we previously saw with the INT data type declared on the stored procedure input parameter, the command parameter declares the data type and if the catID string wasn’t an integer the implicit conversion would throw a System.FormatException before even touching the data layer. But of course that won’t do you any good if the type is already a string!

Just one final point on stored procedures; passing a string parameter and then dynamically constructing and executing SQL within the procedure puts you right back at the original dynamic SQL vulnerability. Don’t do this!

Named SQL parameters

One of the problems with the code in the original exploit is that the SQL string is constructed in its entirety in the .NET layer and the SQL end has no concept of what the parameters are. As far as it’s concerned it has just received a perfectly valid command even though it may in fact have already been injected with malicious code.

Using named SQL parameters gives us far greater predictability about the structure of the query and allowable values of parameters. What you’ll see in the following code block is something very similar to the first dynamic SQL example except this time the SQL statement is a constant with the category ID declared as a parameter and added programmatically to the command object.

const string sqlString = "SELECT * FROM Products WHERE CategoryID = @CategoryID";
var connString = WebConfigurationManager.ConnectionStrings
["NorthwindConnectionString"].ConnectionString;
using (var conn = new SqlConnection(connString))
{
  var command = new SqlCommand(sqlString, conn);
  command.Parameters.Add("@CategoryID", SqlDbType.Int).Value = catID;
  command.Connection.Open();
  grdProducts.DataSource = command.ExecuteReader();
  grdProducts.DataBind();
}

What this will give us is a piece of SQL that looks like this:

exec sp_executesql N'SELECT * FROM Products WHERE CategoryID =
@CategoryID',N'@CategoryID int',@CategoryID=1

There are two key things to observe in this statement:

  1. The sp_executesql command is invoked
  2. The CategoryID appears as a named parameter of INT data type

This statement is only going to execute if the account has data reader permissions to the Products table so one downside of this approach is that we’re effectively back in the same data layer security model as we were in the very first example. We’ll come to securing this further shortly.

The last thing worth noting with this approach is that the sp_executesql command also provides some query plan optimisations which, although not related to the security discussion, are a nice bonus.

LINQ to SQL

Stored procedures and parameterised queries are a great way of seriously curtailing the potential damage that can be done by SQL injection but they can also become pretty unwieldy. The case for using ORM as an alternative has been made many times before so I won’t rehash it here but I will look at this approach in the context of SQL injection. It’s also worthwhile noting that LINQ to SQL is only one of many ORMs out there and the principles discussed here are not limited purely to Microsoft’s interpretation of object mapping.

Firstly, let’s assume we’ve created a Northwind DBML and the data layer has been persisted into queryable classes. Things are now pretty simple syntax wise:

var dc = new NorthwindDataContext();
var catIDInt = Convert.ToInt16(catID);
grdProducts.DataSource = dc.Products.Where(p => p.CategoryID == catIDInt);
grdProducts.DataBind();

From a SQL injection perspective, once again the query string should have already been assessed against a whitelist and we shouldn’t be at this stage if it hasn’t passed. Before we can use the value in the “where” clause it needs to be converted to an integer because the DBML has persisted the INT type in the data layer and that’s what we’re going to be performing our equivalency test on. If the value wasn’t an integer we’d get that System.FormatException again and the data layer would never be touched.
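
If you would rather not lean on the exception at all, a TryParse guard is a gentler way to enforce the same rule (a sketch, not from the original article):

var dc = new NorthwindDataContext();
short catIDInt;
if (!short.TryParse(catID, out catIDInt))
{
  // Mirrors the earlier whitelist failure message; the data layer is never touched.
  lblResults.Text = "An invalid CategoryID has been specified.";
  return;
}
grdProducts.DataSource = dc.Products.Where(p => p.CategoryID == catIDInt);
grdProducts.DataBind();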

LINQ to SQL now follows the same parameterised SQL route we saw earlier, it just abstracts the query so the developer is saved from being directly exposed to any SQL code. The database is still expected to execute what from its perspective, is an arbitrary statement:

exec sp_executesql N'SELECT [t0].[ProductID], [t0].[ProductName], [t0].
[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice],
[t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].
[Discontinued]

FROM [dbo].[Products] AS [t0]

WHERE [t0].[CategoryID] = @p0',N'@p0 int',@p0=1

There was some discussion about the security model in the early days of LINQ to SQL and concern expressed in terms of how it aligned to the prevailing school of thought regarding secure database design. Much of the reluctance related to the need to provide accounts connecting to SQL with reader and writer access at the table level. Concerns included the risk of SQL injection as well as from the DBA’s perspective, authority over the context a user was able to operate in moved from their control – namely within stored procedures – to the application developer’s control. However with parameterised SQL being generated and the application developers now being responsible for controlling user context and access rights it was more a case of moving cheese than any new security vulnerabilities.

Applying the principle of least privilege

The final flaw in the successful exploit above was that the SQL account being used to browse products also had the necessary rights to read from the Customers table and write to the Products table, neither of which was required for the purposes of displaying products on a page. In short, the principle of least privilege had been ignored:

In information security, computer science, and other fields, the principle of least privilege, also known as the principle of minimal privilege or just least privilege, requires that in a particular abstraction layer of a computing environment, every module (such as a process, a user or a program on the basis of the layer we are considering) must be able to access only such information and resources that are necessary to its legitimate purpose.

This was achievable because we took the easy way out and used a single account across the entire application to both read and write from the database. Often you’ll see this happen with the one SQL account being granted db_datareader and db_datawriter roles:

[image: a SQL login granted the db_datareader and db_datawriter roles]

This is a good case for being a little more selective about the accounts we’re using and the rights they have. Quite frequently, a single SQL account is used by the application. The problem this introduces is that the one account must have access to perform all the functions of the application which most likely includes reading and writing data from and to tables you simply don’t want everyone accessing.

Let’s go back to the first example but this time we’ll create a new user with only select permissions to the Products table. We’ll call this user NorthwindPublicUser and it will be used by activities intended for the general public, i.e. not administrative activities such as managing customers or maintaining products.

[image: the NorthwindPublicUser account with select permission on the Products table only]

Now let’s go back to the earlier request attempting to validate the existence of the Customers table:

Products.aspx?CategoryID=1 or 1=(select count(*) from customers)

[image: the internal error message surfaced in the browser with custom errors off]

In this case I’ve left custom errors off and allowed the internal error message to surface through the UI for the purposes of illustration. Of course doing this in a production environment is never a good thing not only because it’s information leakage but because the original objective of verifying the existence of the table has still been achieved. Once custom errors are on there’ll be no external error message hence there will be no verification the table exists. Finally – and most importantly – once we get to actually trying to read or write unauthorised data the exploit will not be successful.

This approach does come with a cost though. Firstly, you want to be pragmatic in the definition of how many logons are created. Ending up with 20 different accounts for performing different functions is going to drive the DBA nuts and be unwieldy to manage. Secondly, consider the impact on connection pooling. Different logons mean different connection strings which mean different connection pools.

On balance, a pragmatic selection of user accounts to align to different levels of access is a good approach to the principle of least privilege and shuts the door on the sort of exploit demonstrated above.

Getting more creative with HTTP request headers

On a couple of occasions above I’ve mentioned parsing input other than just the obvious stuff like query strings and form fields. You need to consider absolutely anything which could be submitted to the server from an untrusted source.

A good example of the sort of implicit untrusted data submission you need to consider is the accept-language attribute in the HTTP request headers. This is used to specify the spoken language preference of the user and is passed along with every request the browser makes. Here’s how the headers look after inspecting them with Fiddler:

[image: the request headers shown in Fiddler, including the accept-language value]

Note the preference Firefox has delivered in this case is “en-gb”. The developer can now access this attribute in code:

var language = HttpContext.Current.Request.UserLanguages[0];
lblLanguage.Text = "The browser language is: " + language;

And the result:

[image: the page output reporting the browser language]

The language is often used to localise content on the page for applications with multilingual capabilities. The variable we’ve assigned above may be passed to SQL Server – possibly in a concatenated SQL string – should language variations be stored in the data layer.
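
If that header value ever does reach a query or any other sensitive sink, it deserves the same whitelist treatment as every other input; a sketch, with the supported culture list being an assumption:

// Only accept languages the site actually supports; fall back to a default otherwise.
var supportedLanguages = new[] { "en-gb", "en-us", "fr-fr" };   // illustrative list
var reportedLanguages = HttpContext.Current.Request.UserLanguages;
var reported = (reportedLanguages != null && reportedLanguages.Length > 0)
  ? reportedLanguages[0].Split(';')[0].Trim().ToLowerInvariant()   // strip any ";q=" quality suffix
  : string.Empty;
var language = Array.IndexOf(supportedLanguages, reported) >= 0 ? reported : "en-gb";
lblLanguage.Text = "The browser language is: " + language;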

But what if a malicious request header was passed? What if, for example, we used the Fiddler Request Builder to reissue the request but manipulated the header ever so slightly first:

[image: the Fiddler Request Builder with a manipulated accept-language header]

It’s a small but critical change with a potentially serious result:

[image: the page output reflecting the manipulated accept-language value]

We’ve already looked at where an exploit can go from here; the main purpose of this section was to illustrate how injection can take different attack vectors on its path to successful execution. In reality, .NET has far more efficient ways of doing language localisation but this just goes to prove that vulnerabilities can be exposed through more obscure channels.

Summary

The potential damage from injection exploits is, indeed, severe: data disclosure, data loss, database object destruction and potentially limitless damage to reputation.

The thing is though, injection is a really easy vulnerability to apply some pretty thorough defences against. Fortunately it’s uncommon to see dynamic, parameterless SQL strings constructed in .NET code these days. ORMs like LINQ to SQL are very attractive from a productivity perspective and the security advantages that come with them are eradicating some of those bad old practices.

Input parsing, however, remains a bit more elusive. Often developers are relying on type conversion failures to detect rogue values which, of course, won’t do much good if the expected type is already a string and contains an injection payload. We’re going to come back to input parsing again in the next part of the series on XSS. For now, let’s just say that not parsing input has potential ramifications well beyond just injection vulnerabilities.

I suspect securing individual database objects to different accounts is not happening very frequently at all. The thing is though, it’s the only defence you have at the actual data layer if you’ve moved away from stored procedures. Applying the least privilege principle here means that in conjunction with the other measures, you’ve now erected injection defences on the input, the SQL statement construction and finally at the point of its execution. Ticking all these boxes is a very good place to be indeed.

References

  1. SQL Injection Attacks by Example
  2. SQL Injection Cheat Sheet
  3. The Curse and Blessings of Dynamic SQL
  4. LDAP Injection Vulnerabilities

from:http://www.troyhunt.com/2010/05/owasp-top-10-for-net-developers-part-1.html