Tag Archives: Architecture

Database Architecture Upgrades and Evolution

SQL Server 2008's technologies for high data security, high performance, and high availability are by now fairly mature. These technologies and solutions were upgraded and evolved step by step as many companies' business and data access pressure grew, and along the way they withstood tests from every direction, proving themselves mature and reliable. Below is an analysis of these technical solutions and the process by which they evolved.

Stage 1:

The "running bare" era (a single database with no protection):

Pros: the biggest benefit of running bare is simplicity, and the cost is low.

Cons: once the server fails, recovery is fairly painful; and if access pressure grows, the server may not hold up under the load.

 

Stage 2:

Single database + Mirror + Backup scheme:

Notes: mirroring works in two modes, synchronous and asynchronous. Synchronous mode guarantees that the principal and the mirror stay consistent, and it does not require Enterprise Edition, but its performance impact on the principal is fairly large. Asynchronous mode requires Enterprise Edition; it keeps the data consistent the vast majority of the time, though a small amount of data can be lost, and its impact on the principal is small.
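
For illustration only, a minimal C# (ADO.NET) sketch that checks which mirroring mode an instance's databases are running in by querying the sys.database_mirroring view; the server name is a placeholder. mirroring_safety_level_desc reports FULL for the synchronous (high-safety) mode and OFF for the asynchronous (high-performance) mode.

using System;
using System.Data.SqlClient;

class MirroringCheck
{
    static void Main()
    {
        using (var conn = new SqlConnection(
            "Server=PRIMARY;Database=master;Integrated Security=true"))
        {
            conn.Open();
            // One row per mirrored database on this instance.
            var cmd = new SqlCommand(
                @"SELECT DB_NAME(database_id), mirroring_role_desc,
                         mirroring_state_desc, mirroring_safety_level_desc
                  FROM sys.database_mirroring
                  WHERE mirroring_guid IS NOT NULL", conn);
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0}: role={1}, state={2}, safety={3}",
                        reader.GetString(0), reader.GetString(1),
                        reader.GetString(2), reader.GetString(3));
        }
    }
}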

 

Pros: this scheme gives the principal database's data reliable protection. If the principal fails, the mirror can be brought up in a fairly short time, which matters especially when the database is large (restoring from backup would take a long time), so the business can resume as quickly as possible. The mirror side can also generate database snapshots for workloads that do not require real-time data.

 

Cons: mirroring costs the principal some performance (the asynchronous mode costs less). After the principal fails, the front end has to switch to a different access IP (or the mirror server has to take over the principal server's IP), and logins, permissions, jobs, and similar information still have to be migrated over.
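
One client-side mitigation worth noting, shown in a hedged sketch below: ADO.NET connection strings support a Failover Partner keyword for database mirroring, so application connections can fail over without an IP change (server and database names are placeholders; logins, permissions, and jobs still have to be migrated by hand).

using System.Data.SqlClient;

class FailoverAwareClient
{
    static void Main()
    {
        // Whichever partner currently holds the principal role serves the
        // connection; if SQLPRIMARY is unreachable, the client tries SQLMIRROR.
        using (var conn = new SqlConnection(
            "Server=SQLPRIMARY;Failover Partner=SQLMIRROR;" +
            "Database=AppDb;Integrated Security=true"))
        {
            conn.Open();
        }
    }
}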

 

Single database + Replication + Backup scheme:

Pros: the replication subscriber can be exposed to front-end traffic, so read operations can be moved to the subscriber to take part of the load off the primary. It can also serve as a form of database backup, though such a backup may well lose some data.

Cons: it does not provide real data protection, and it has some performance impact on the primary.
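
For illustration, a minimal C# sketch of the read/write split this scheme enables; server and database names are placeholders, and real code would also account for subscriber lag on read-after-write paths.

using System.Data.SqlClient;

static class Db
{
    // Writes go to the publisher, reads to a replication subscriber.
    const string WriteConn = "Server=PUBLISHER;Database=AppDb;Integrated Security=true";
    const string ReadConn  = "Server=SUBSCRIBER;Database=AppDb;Integrated Security=true";

    public static SqlConnection OpenForWrite()
    {
        var conn = new SqlConnection(WriteConn);
        conn.Open();
        return conn;
    }

    public static SqlConnection OpenForRead()
    {
        var conn = new SqlConnection(ReadConn);
        conn.Open();
        return conn;
    }
}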

 

Stage 3:

Single database + Replication + Mirror + Backup scheme:

Pros: this scheme combines the previous two; it solves the data protection problem and also provides read/write splitting.

Cons: the primary runs both Mirror and Replication, which weighs on it considerably; and practice has shown that when Mirror and Replication are deployed on the same machine, a problem in one affects the other.

 

Stage 4:

Cluster (active/active) + Backup scheme:

Notes: the rectangles in the diagram represent the storage; the two servers form an active/active cluster.

Pros: when one server in the cluster fails, all data and services switch over to the other machine; the switchover takes very little time, so business access resumes quickly.

Cons: an active/active cluster generally demands well-specified hardware, so the price is high. Because the data all lives on the shared storage, the cluster itself does not protect the data; once the data or the storage fails, it must be restored from backup. Also, SQL Server clustering does not provide load balancing.

 

Stage 5:

Cluster (active/active) + Mirror + Backup scheme:

Notes: the active/active cluster, plus Mirror protection of the databases on two additional servers.

Pros: this scheme protects the data reliably. Whether a server or the storage fails, the data stays safe, and data recovery time is fairly short.

Cons: mirroring consumes part of the principal servers' performance, and the two extra Mirror machines add cost. If the storage fails, the fast recovery path is to bring up the Mirror machines, after which the cluster may need to be rebuilt.

 

Stage 6:

Cluster (active/active) + Mirror + Backup + Replication + single-distributor scheme:

Notes: active/active cluster, Mirror protection, a single distribution machine, and read/write splitting.

Pros: the cluster and the Mirror fully protect the data, and read/write splitting raises the system's overall performance.

Cons: the cost is relatively high, and the single distributor is a single point of failure; if the distribution machine fails it has to be rebuilt, during which time reads and writes both concentrate on the primary and the pressure becomes heavy.

 

Cluster (active/active) + Mirror + Backup + Replication + dual-distributor scheme:

Pros: compared with a single distributor there is no single point of failure; even if one distribution machine fails, the read/write splitting mechanism keeps running.

Cons: added cost and more complex maintenance.

 

Stage 7:

Cluster (active/active) + dual storage + Backup + Replication + dual-distributor scheme:

Pros: the dual-storage arrangement protects the data effectively, avoids the load that running Mirror and Replication together places on the principal servers, saving the principal servers' resources, and makes recovery fairly convenient.

Cons: added cost.

 

Stage 8:

Cluster (active/active) + dual storage + Backup + Replication + dual distributors + asynchronous SSB scheme:

The main advantage of this approach is that data flows are processed asynchronously through SQL Server Service Broker (SSB), easing the pressure on the primary during momentary traffic spikes. Because this scheme is fairly complex it is not described further here; refer to the database architecture documentation.

 

Stage 9:

Split the business and its data, adopt distributed databases, use databases with load-balancing cluster capabilities, and so on.

 

This document has roughly described the stages a database architecture goes through as a company develops and server pressure grows. Naturally, each company should pick from these techniques according to its own situation, and may even skip some stages and move straight to a more capable solution (if the cost is acceptable). The choice of technology and solution should be driven by the actual circumstances, with room to adapt.

from:http://www.cnblogs.com/fygh/archive/2012/03/23/2413164.html

Stack Overflow Architecture Update – Now at 95 Million Page Views a Month

A lot has happened since my first article on the Stack Overflow Architecture. Contrary to the theme of that last article, which lavished attention on Stack Overflow’s dedication to a scale-up strategy, Stack Overflow has both grown up and out in the last few years.

Stack Overflow has grown up by more than doubling in size to over 16 million users and multiplying its number of page views nearly 6 times to 95 million page views a month.

Stack Overflow has grown out by expanding into the Stack Exchange Network, which includes Stack Overflow, Server Fault, and Super User for a grand total of 43 different sites. That’s a lot of fruitful multiplying going on.

What hasn’t changed is Stack Overflow’s openness about what they are doing. And that’s what prompted this update. A recent series of posts talks a lot about how they’ve been handling their growth: Stack Exchange’s Architecture in Bullet Points, Stack Overflow’s New York Data Center, Designing For Scalability of Management and Fault Tolerance, Stack Overflow Search — Now 81% Less, Stack Overflow Network Configuration, Does StackOverflow use caching and if so, how?, Which tools and technologies build the Stack Exchange Network?.

Some of the more obvious differences across time are:

  • Just More. More users, more page views, more datacenters, more sites, more developers, more operating systems, more databases, more machines. Just a lot more of more.
  • Linux. Stack Overflow was known for their Windows stack, now they are using a lot more Linux machines for HAProxy, Redis, Bacula, Nagios, logs, and routers. All support functions seem to be handled by Linux, which has required the development of parallel release processes.
  • Fault Tolerance. Stack Overflow is now being served by two different switches on two different internet connections, they’ve added redundant machines, and some functions have moved to a second datacenter.
  • NoSQL. Redis is now used as a caching layer for the entire network. There wasn’t a separate caching tier before so this is a big change, as is using a NoSQL database on Linux.

Unfortunately, I couldn’t find any coverage on some of the open questions I had last time, like how they were going to deal with multi-tenancy across so many different properties, but there’s still plenty to learn from. Here’s a roll-up of a few different sources:

The Stats

  • 95 Million Page Views a Month
  • 800 HTTP requests a second
  • 180 DNS requests a second
  • 55 Megabits per second
  • 16 Million Users  – Traffic to Stack Overflow grew 131% in 2010, to 16.6 million global monthly uniques.

Data Centers

  • 1 Rack with Peak Internet in OR (Hosts our chat and Data Explorer)
  • 2 Racks with Peer 1 in NY (Hosts the rest of the Stack Exchange Network)

Hardware

  • 10 Dell R610 IIS web servers (3 dedicated to Stack Overflow):
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz Quad Core with 8 threads
    • 16 GB RAM
    • Windows Server 2008 R2
  • 2 Dell R710 database servers:
    • 2x Intel Xeon Processor X5680 @ 3.33 GHz
    • 64 GB RAM
    • 8 spindles
    • SQL Server 2008 R2
  • 2 Dell R610  HAProxy servers:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 4 GB RAM
    • Ubuntu Server
  • 2 Dell R610 Redis servers:
    • 2x Intel Xeon Processor E5640 @ 2.66 GHz
    • 16 GB RAM
    • CentOS
  • 1 Dell R610 Linux backup server running Bacula:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 32 GB RAM
  • 1 Dell R610 Linux management server for Nagios and logs:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 32 GB RAM
  • 2 Dell R610 VMWare ESXi domain controllers:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 16 GB RAM
  • 2 Linux routers
  • 5 Dell Power Connect switches

Dev Tools

  • C#: Language
  • Visual Studio 2010 Team Suite: IDE
  • Microsoft ASP.NET (version 4.0): Framework
  • ASP.NET MVC 3: Web Framework
  • Razor: View Engine
  • jQuery 1.4.2: Browser Framework
  • LINQ to SQL, some raw SQL: Data Access Layer
  • Mercurial and Kiln: Source Control
  • Beyond Compare 3: Compare Tool

Software and Technologies Used

  • Stack Overflow uses a WISC stack via BizSpark
  • Windows Server 2008 R2 x64: Operating System
  • SQL Server 2008 R2 running on Microsoft Windows Server 2008 Enterprise Edition x64: Database
  • Ubuntu Server
  • CentOS
  • IIS 7.0: Web Server
  • HAProxy: for load balancing
  • Redis: used as the distributed caching layer.
  • CruiseControl.NET: for builds and automated deployment
  • Lucene.NET:  for search
  • Bacula: for backups
  • Nagios: (with n2rrd and drraw plugins) for monitoring
  • Splunk: for logs
  • SQL Monitor: from Red Gate – for SQL Server monitoring
  • Bind: for DNS
  • Rovio:  a little robot (a real robot) allowing remote developers to visit the office “virtually.”
  • Pingdom:  an external monitor and alert service.

External Bits

Code that is not included as part of the development tools:

  • reCAPTCHA
  • DotNetOpenId
  • WMD – Now developed as open source. See github network graph
  • Prettify
  • Google Analytics
  • Cruise Control .NET
  • HAProxy
  • Cacti
  • MarkdownSharp
  • Flot
  • Nginx
  • Kiln
  • CDN: none, all static content is served off sstatic.net, which is a fast, cookieless domain intended for static content delivered to the Stack Exchange family of websites.

Developers and System Administrators

  • 14 Developers
  • 2 System Administrators

Content

  • License: Creative Commons Attribution-Share Alike 2.5 Generic
  • Standards: OpenSearch, Atom
  • Host: PEAK Internet

More Architecture and Lessons Learned

  • HAProxy is used instead of Windows NLB because HAProxy is cheap, easy, free, and works great as a 512MB VM “device” on a network via Hyper-V. It also works in front of the boxes so it’s completely transparent to them, and it’s easier to troubleshoot as a different networking layer instead of being intermixed with all your Windows configuration.
  • A CDN is not used because even “cheap” CDNs like Amazon’s are very expensive relative to the bandwidth bundled into their existing host’s plan. The least they could pay is $1k/month based on Amazon’s CDN rates and their bandwidth usage.
  • Backup is to disk for fast retrieval and to tape for historical archiving.
  • Full Text Search in SQL Server is very badly integrated, buggy, deeply incompetent, so they went to Lucene.
  • Mostly interested in peak HTTP request figures as this is what they need to make sure they can handle.
  • All properties now run on the same Stack Exchange platform. That means Stack Overflow, Super User, Server Fault, Meta, WebApps, and Meta Web Apps are all running on the same software.
  • There are separate StackExchange sites because people have different sets of expertise that shouldn’t cross over to different topic sites. You can be the greatest chef in the world, but that doesn’t qualify you for fixing a server.
  • They aggressively cache everything.
  • All pages accessed by (and subsequently served to) anonymous users are cached via Output Caching.
  • Each site has 3 distinct caches: local, site, global.
  • local cache: can only be accessed from 1 server/site pair
    • To limit network latency they use a local “L1” cache, basically HttpRuntime.Cache, of recently set/read values on a server. This would reduce the cache lookup overhead to 0 bytes on the network.
    • Contains things like user sessions, and pending view count updates.
    • This resides purely in memory, no network or DB access.
  • site cache:  can be accessed by any instance (on any server) of a single site
    • Most cached values go here; things like hot question id lists and user acceptance rates are good examples
    • This resides in Redis (in a distinct DB, purely for easier debugging)
    • Redis is so fast that the slowest part of a cache lookup is the time spent reading and writing bytes to the network.
    • Values are compressed before sending them to Redis. They have plenty of CPU and most of their data are strings so they get a great compression ratio.
    • The CPU usage on their Redis machines is 0%.
  • global cache: which is shared amongst all sites and servers
    • Inboxes, API usage quotas, and a few other truly global things live here
    • This resides in Redis (in DB 0, likewise for easier debugging)
  • Most items in the cache expire after a timeout period (a few minutes usually) and are never explicitly removed. When a specific cache invalidation is required they use Redis messaging to publish removal notices to the “L1” caches (see the sketch following this list).
  • Joel Spolsky is not a Microsoft Loyalist, he doesn’t make the technical decisions for Stack Overflow, and considers Microsoft licensing a rounding error. Consider yourself corrected Hacker News commenter.
  • For their IO system they selected a RAID 10 array of Intel X25 solid state drives. The RAID array eased any concerns about reliability and the SSD drives performed really well in comparison to FusionIO at a much cheaper price.
  • The full boat cost for their Microsoft licenses would be approximately $242K. Since Stack Overflow is using Bizspark they are not paying near the full sticker price, but that’s the max they could pay.
  • Intel NICs are replacing Broadcom NICs on their primary production servers. This solved problems they were having with connectivity loss, packet loss, and corrupted ARP tables.
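
To make the caching bullets above concrete, here is a minimal C# sketch of the local-plus-Redis arrangement. This is not Stack Overflow's actual code: IRedis is a hypothetical stand-in for whatever Redis client they use, and key and channel names are invented; only HttpRuntime.Cache and the publish/subscribe invalidation idea come from the description above.

using System;
using System.Web;
using System.Web.Caching;

// Hypothetical minimal Redis client surface, just enough for the sketch.
public interface IRedis
{
    byte[] Get(string key);
    void Subscribe(string channel, Action<string> onMessage);
}

public class TwoLevelCache
{
    readonly IRedis redis;

    public TwoLevelCache(IRedis redis)
    {
        this.redis = redis;
        // Removal notices published over Redis messaging evict entries
        // from this server's in-memory "L1" cache.
        redis.Subscribe("cache-invalidation",
            key => HttpRuntime.Cache.Remove(key));
    }

    public byte[] Get(string key, TimeSpan ttl)
    {
        // L1: recently set/read values on this server, 0 network bytes.
        var local = (byte[])HttpRuntime.Cache[key];
        if (local != null)
            return local;

        // L2: the site cache in Redis; populate L1 on the way back.
        var remote = redis.Get(key);
        if (remote != null)
            HttpRuntime.Cache.Insert(key, remote, null,
                DateTime.UtcNow.Add(ttl), Cache.NoSlidingExpiration);
        return remote;
    }
}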

Related Articles

Stack Overflow Architecture

Update 2: Stack Overflow Architecture Update – Now At 95 Million Page Views A Month

Update: Startup – ASP.NET MVC, Cloud Scale & Deployment shows an interesting alternative approach for a Windows stack using ServerPath/GoGrid for a dedicated database machine, elastic VMs for the front end, and a free load balancer. Stack Overflow is a much loved programmer question and answer site written by two guys nobody has ever heard of before. Well, not exactly. The site was created by top programmer and blog stars Jeff Atwood and Joel Spolsky. In that sense Stack Overflow is like a celebrity owned restaurant, only it should be around for a while. Joel estimates 1/3 of all the programmers in the world have used the site so they must be serving up something good.

I fell in deep like with Stack Overflow for purely selfish reasons, it helped me solve a few difficult problems that were jabbing my eyes out with pain. I also appreciate their no-apologies anthropologically based design philosophy. Use design to engineer in the behaviours you want to encourage and minimize the responses you want to discourage. It’s the conscious awareness of the mechanisms that creates such a satisfying synergy.
What is key about the Stack Overflow story for me is the strong case they make for scale up as a viable solution for a certain potentially large class of problems. The publicity these days is all going scale out using NoSQL databases.
If you need to Google scale then you really have no choice but to go the NoSQL direction. But Stack Overflow is not Google and neither are most sites. When thinking about your design options keep Stack Overflow in mind. In this era of multi-core, large RAM machines and advances in parallel programming techniques, scale up is still a viable strategy and shouldn’t be tossed aside just because it’s not cool anymore. Maybe someday we’ll have the best of both worlds, but for now there’s a big painful choice to be made and that choice decides your fate.
Joel boasts that for 1/10 the hardware they have performance comparable to similarly sized sites. He wonders if these other sites have good programmers. Let’s see how they did it and you be the judge.
Site: http://stackoverflow.com

The Stats

  • 16 million page views a month
  • 3 million unique visitors a month (Facebook reaches 77 million unique visitors a month)
  • 6 million visits a month
  • 86% of traffic comes from Google
  • 9 million active programmers in the world and 30% have used Stack Overflow.
  • Cheaper licensing was attained through Microsoft’s BizSpark program. My impression is they pay about $11K for OS and SQL licensing.
  • Monetization strategy: unobtrusive ads, job placement ads, DevDays conferences, extend the software to target other related niches (Server Fault, Super User), develop StackExchange as a white label and self-hosted version of Stack Overflow, and perhaps develop some sort of programmer rating system.

    Platform

  • Microsoft ASP.NET MVC
  • SQL Server 2008
  • C#
  • Visual Studio 2008 Team Suite
  • JQuery
  • LINQ to SQL
  • Subversion
  • Beyond Compare 3
  • VisualSVN 1.5
  • Web Tier – 2 x Lenovo ThinkServer RS110 1U – 4 cores, 2.83 GHz, 12 MB L2 cache – 500 GB datacenter hard drives, mirrored – 8 GB RAM – 500 GB RAID 1 mirror array
  • Database Tier – 1 x Lenovo ThinkServer RD120 2U – 8 cores, 2.5 GHz, 24 MB L2 cache – 48 GB RAM
  • A fourth server was added to run superuser.com. All together the servers also run Stack Overflow, Server Fault, and Super User.
  • QNAP TS-409U NAS for backups. Decided not to use a cloud solution because the bandwidth costs of transferring 5 GB of data per day becomes prohibitive.
  • Hosting at http://www.peakinternet.com/. Impressed with their detailed technical responses and reasonable hosting rates.
  • SQL Server’s full text search is used extensively for the site search and detecting if a question has already been asked. Lucene.net is considered an attractive alternative.

    Lessons Learned

    This is a mix of lessons taken from Jeff and Joel and comments from their posts.

  • If you’re comfortable managing servers then buy them. The two biggest problems with renting costs were: 1) the insane cost of memory and disk upgrades 2) the fact that they [hosting providers] really couldn’t manage anything.
  • Make larger one time up front investments to avoid recurring monthly costs which are more expensive in the long term.
  • Update all network drivers. Performance went from 2x slower to 2x faster.
  • Upgrading to 48GB RAM required upgrading to MS Enterprise edition.
  • Memory is incredibly cheap. Max it out for almost free performance. At Dell, for example, upgrading from 4G memory to 128G is $4378.
  • Stack Overflow copied a key part of the Wikipedia database design. This turned out to be a mistake which will need massive and painful database refactoring to fix. The refactorings will be to avoid excessive joins in a lot of key queries. This is the key lesson from giant multi-terabyte table schemas (like Google’s BigTable) which are completely join-free. This is significant because Stack Overflow’s database is almost completely in RAM and the joins still exact too high a cost.
  • CPU speed is surprisingly important to the database server. Going from 1.86 GHz, to 2.5 GHz, to 3.5 GHz CPUs causes an almost linear improvement in typical query times. The exception is queries which don’t fit in memory.
  • When renting hardware nobody pays list price for RAM upgrades unless you are on a month-to-month contract.
  • The bottleneck is the database 90% of the time.
  • At low server volume, the key cost driver is not rackspace, power, bandwidth, servers, or software; it is NETWORKING EQUIPMENT. You need a gigabit network between your DB and Web tiers. Between the cloud and your web server, you need firewall, routing, and VPN devices. The moment you add a second web server, you also need a load balancing appliance. The upfront cost of these devices can easily be 2x the cost of a handful of servers.
  • EC2 is for scaling horizontally, that is you can split up your work across many machines (a good idea if you want to be able to scale). It makes even more sense if you need to be able to scale on demand (add and remove machines as load increases / decreases).
  • Scaling out is only frictionless when you use open source software. Otherwise scaling up means paying less for licenses and a lot more for hardware, while scaling out means paying less for the hardware, and a whole lot more for licenses.
  • RAID-10 is awesome in a heavy read/write database workload.
  • Separate application and database duties so each can scale independently of the other. Databases scale up and the applications scale out.
  • Applications should keep state in the database so they scale horizontally by adding more servers.
  • The problem with a scale up strategy is a lack of redundancy. A cluster adds more reliability, but is very expensive when the individual machines are expensive.
  • Few applications can scale linearly with the number of processors. Locks will be taken which serializes processing and ends up reducing the effectiveness of your Big Iron.
  • With larger form factors like 7U power and cooling become critical issues. Using something between 1U and 7U might be easier to make work in your data center.
  • As you add more and more database servers the SQL Server license costs can be outrageous. So by starting scale up and gradually going scale out with non-open source software you can be in a world of financial hurt.
    It’s true there’s not much about their architecture here. We know about their machines, their tool chain, and that they use a two-tier architecture where they access the database directly from the web server code. We don’t know how they implement tags, etc. If interested you’ll be able to glean some of this information from an explanation of their schema.

    Discussion

    As an architecture profile candidate Stack Overflow has earned two important HighScalability badges: the Microsoft Stack Badge and the Scale Up Badge. Both are controversial and interesting topics of discussion.

    Microsoft Stack Badge

    The Microsoft Stack Badge was earned because Stack Overflow uses the entire Microsoft Stack: OS, database, C#, Visual Studio, and ASP .NET. People are always interested in how MS compares to LAMP, but I don’t have many case studies to show them.
    Markus Frind of Plenty of Fish fame is often used as a Microsoft stack poster child, but since he explicitly uses as little of the stack as possible he’s not really a good example. Stack Overflow on the other hand is brash in proclaiming their love for MS, even when that love is occasionally spurned.
    It’s hard to separate out the Microsoft stack and the scale up approach because for licensing reasons they tend to go together. If you find yourself in the position of transitioning from scale up to scale out by adding dozens of cores, MS licensing will bite you.
    Licensing aside I personally find C#, Visual Studio, and .Net a very productive environment. C#/.Net is at least as good as Java/JVM. ASP .NET has always been a confusing mess to me. The knock against SQL Server is you have to pay for it and if that doesn’t bother you then it’s a solid choice. The Windows OS may not be as solid as other alternatives but it works well enough.
    So for a scale up solution a Microsoft stack works, especially if you are already Windows centric.

    Scale Up Badge

    This won’t be a reenactment of the scale out vs scale up vs rent vs buy wars. For a thorough discussion of these issues please take a look at  Scaling Up vs. Scaling Out and Server Hosting — Rent vs. Buy?. If you aren’t confused and if your head doesn’t hurt after reading all that then you haven’t properly understood the material 🙂
    The Scale Up Badge was awarded because Stack Overflow uses a scale up strategy to meet their scaling requirements. When they reach a limit they scale vertically by buying a bigger machine and adding more memory.
    Stack Overflow is in the sweet spot for scale up. It’s not too large, but with an Alexa ranking of 1,666 and 16 million page views a month it’s still a substantial site. Not Google scale, and probably will never have to be, but those are numbers many sites would be thrilled to have. Yet they aren’t uploading large amounts of media. They aren’t dealing with billions of tweets across complex social networks with millions of users. Their number of users is self limiting. And there are still directions they can take if they need to scale (caching, more web servers, faster disks, more denormalization, more memory, some partitioning, etc). All-in-all it’s a well done and very useful two-tier CRUD application.

    NoSQL is Hard

    So should Stack Overflow have scaled out instead of up, just in case?
    What some don’t realize is NoSQL is hard. Relational databases have many many faults, but they make a lot of common tasks simple while hiding both the cost and complexity. If you want to know how many black Prius cars are in inventory, for example, then that’s pretty easy to do.
    Not so with most NoSQL databases (I’ll speak generally here, some NoSQL databases have more features than others). You would have to program a counter of black Prius cars yourself, up front, in code. There are no aggregate operators. You must maintain secondary indexes. There’s no searching. There are no distributed queries across partitions. There’s no Group By or Order By. There are no cursors for easy paging through result sets. Returning even 100 large records at a time may time out. There may be quotas that are very restrictive because they must limit the amount of IO for any one operation. Query languages may lack expressive power.
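
    To see what "program a counter yourself" means in practice, here is a hedged C# sketch against a hypothetical key-value API (IKvStore, the key names, and the Car type are all invented for illustration). In SQL the count is one declarative query; in the key-value store the counter has to be maintained by hand on every write.

    using System;

    // Hypothetical minimal key-value store surface.
    interface IKvStore
    {
        void Put(string key, object value);
        void Increment(string counterKey);
    }

    class Car { public string Id; public string Model; public string Color; }

    class Inventory
    {
        readonly IKvStore store;
        public Inventory(IKvStore store) { this.store = store; }

        // SQL equivalent: SELECT COUNT(*) FROM Cars WHERE Model = 'Prius' AND Color = 'Black'
        public void AddCar(Car car)
        {
            store.Put("car:" + car.Id, car);
            if (car.Model == "Prius" && car.Color == "Black")
                store.Increment("count:black-prius");   // no aggregate operators to lean on
        }
    }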
    The biggest problem of all is that transactions can not span arbitrary boundaries. There are no ACID guarantees beyond a single record or small entity group. Once you wrap your head around what this means for the programmer it’s not a pleasant prospect at all. References must be manually maintained. Relationships must be manually maintained. There are no cascading deletes that act correctly during a failure. Every copy of denormalized data must be manually tracked and updated taking into account the possibility of partial failures and externally visible inconsistency.
    All this functionality must be written manually by you in your code. While flexibility to write your own code is great in an OLAP/map-reduce situation, declarative approaches still cover a lot of ground and make for much less brittle code.
    What you gain is the ability to write huge quantities of data. What you lose is complacency. The programmer must be very aware at all times that they are dealing with a system where it costs a lot to perform distributed operations and failure can occur at any time.
    All this may be the price of building a truly scalable and distributed system, but is this really the price you want to pay?

    The Multitenancy Problem

    With StackExchange Stack Overflow has gone into the multi-tenancy business. They are offering StackExchange either self-hosted or as a hosted white label application.
    It will be interesting to see if their architecture can scale to handle a large number of sites. Salesforce is the king of multitenancy and although it’s true they use Oracle as their database, they basically use very little of Oracle and have written their own table structure, indexing and query processor on top of Oracle. All in order to support multitenancy.
    Salesforce went extreme because supporting a lot of different customers is way more difficult than it seems, especially once you allow customization and support versioning.
    Clearly all customers can’t run in one server for security, customization, and scaling reasons.
    You may think just create a database for each customer, share a server for a certain number of customers, and then add more servers as needed. As long as a customer doesn’t need more than one server you are golden.
    This doesn’t seem to work well in practice. Oddly database managers aren’t optimized for adding or updating databases. Creating databases is a heavyweight operation and can degrade performance for existing customers as system locks are taken. Upgrade issues are also problematic. Adding columns locks tables which causes problems in high traffic situations. Adding new indexes can also take a very long time and degrade performance. Plus each customer will likely have specializations that makes upgrading even more complicated.
    To get around these problems Salesforce’s Craig Weissman, Chief Architect, created an innovative approach where tables are not created for each customer. All data from all customers is mapped into the same data table, including indexes. The schema for that table looks something like orgid, oid, value0, value1…value500. “orgid” is the organization ID and is what keeps different customers’ data from being mixed up. It’s a very wide and sparse table, which Oracle seems to handle well. Hundreds and hundreds of “tables” and custom fields are mapped into the data table.
    With this approach Salesforce has no option other than to build their own infrastructure to interpret what’s in that table. Oracle is left to handle transactions, concurrency, and deadlock detection. The advantage is that, because there’s an interpretation layer in place, handling versions and upgrades is relatively simple, as the handling logic can be baked in. Strange but true.

    Related Articles

    This list includes a number of posts by Jeff as he chronicles their journey with Stack Overflow. Jeff is wonderful about being open about what they are doing and why. The comment threads are often tremendous. There’s a lot to learn.

  • Learning from StackOverflow.com by Joel Spolsky
  • Scaling Up vs. Scaling Out: Hidden Costs by Jeff Atwood
  • What Was Stack Overflow Built With?
  • New Stack Overflow Server Glamour Shots
  • New Stack Overflow Servers Ready
  • Server Hosting — Rent vs. Buy? – this is a very informative discussion of the pros and cons of renting vs buying.
  • Rent vs. Buy (or EC2 vs. building your own iron) by  Michael Friis
  • Oh, You Wanted “Awesome” Edition – We recently upgraded our database server to 48 GB of memory — because hardware is cheap, and programmers are expensive.
  • Our Backup Strategy – Inexpensive NAS
  • The Economics of Bandwidth
  • Understanding the StackOverflow Database Schema by  Brent Ozar
  • Server Speed Tests – new hardware 2x slower – it was the network.
  • ASP.NET MVC: A New Framework for Building Web Applications
  • Three key things to know about moving MySQL into the cloud by morgan
  • NoSQL Conference
  • Decline of the Enterprise Data Warehouse by Bradford Stephens
  • Webinar: Multitenant Magic – Under the Covers of the Force.com Data Architecture by Craig Weissman, Chief Architect, salesforce.com.



A low-level Look at the ASP.NET Architecture

Getting Low Level

This article looks at how Web requests flow through the ASP.NET framework from a very low level perspective, from the Web Server, through ISAPI, all the way up to the request handler and your code. See what happens behind the scenes and stop thinking of ASP.NET as a black box.

By Rick Strahl

 

ASP.NET is a powerful platform for building Web applications that provides a tremendous amount of flexibility and power for building just about any kind of Web application. Most people are familiar only with the high level frameworks like WebForms and WebServices which sit at the very top level of the ASP.NET hierarchy. In this article I’ll describe the lower level aspects of ASP.NET and explain how requests move from the Web Server to the ASP.NET runtime and then through the ASP.NET Http Pipeline to process requests.

 

To me, understanding the innards of a platform always provides a certain satisfaction and level of comfort, as well as insight that helps in writing better applications. Knowing what tools are available and how they fit together as part of the whole complex framework makes it easier to find the best solution to a problem and, more importantly, helps in troubleshooting and debugging problems when they occur. The goal of this article is to look at ASP.NET from the system level and help you understand how requests flow into the ASP.NET processing pipeline. As such we’ll look at the core engine and how Web requests end up there. Much of this information is not something that you need to know in your daily work, but it’s good to understand how the ASP.NET architecture routes requests into your application code, which usually sits at a much higher level.

 

Most people using ASP.NET are familiar with WebForms and WebServices. These high level implementations are abstractions that make it easy to build Web based application logic and ASP.NET is the driving engine that provides the underlying interface to the Web Server and routing mechanics to provide the base for these high level front end services typically used for your applications. WebForms and WebServices are merely two very sophisticated implementations of HTTP Handlers built on top of the core ASP.NET framework.

 

However, ASP.NET provides much more flexibility from a lower level. The HTTP Runtime and the request pipeline provide all the same power that went into building the WebForms and WebService implementations – these implementations were actually built with .NET managed code. And all of that same functionality is available to you, should you decide you need to build a custom platform that sits at a level a little lower than WebForms.

 

WebForms are definitely the easiest way to build most Web interfaces, but if you’re building custom content handlers, or have special needs for processing the incoming or outgoing content, or you need to build a custom application server interface to another application, using these lower level handlers or modules can provide better performance and more control over the actual request process. With all the power that the high level implementations of WebForms and WebServices provide they also add quite a bit of overhead to requests that you can bypass by working at a lower level.

What is ASP.NET

Let’s start with a simple definition: What is ASP.NET? I like to define ASP.NET as follows:

 

ASP.NET is a sophisticated engine using Managed Code for front to back processing of Web Requests.

 

It’s much more than just WebForms and Web Services…

 

ASP.NET is a request processing engine. It takes an incoming request and passes it through its internal pipeline to an end point where you as a developer can attach code to process that request. This engine is actually completely separated from HTTP or the Web Server. In fact, the HTTP Runtime is a component that you can host in your own applications outside of IIS or any server side application altogether. For example, you can host the ASP.NET runtime in a Windows form (check out  http://www.west-wind.com/presentations/aspnetruntime/aspnetruntime.asp for more detailed information on runtime hosting in Windows Forms apps).
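
As a small, hedged illustration of that hosting capability, the sketch below runs an .aspx page from a console application via the documented ApplicationHost.CreateApplicationHost() and SimpleWorkerRequest classes. The physical directory is a placeholder; it needs to contain the page plus a bin folder holding this assembly so the new AppDomain can load the host type.

using System;
using System.Web;
using System.Web.Hosting;

public class AspNetHost : MarshalByRefObject
{
    public void ProcessRequest(string page, string query)
    {
        // SimpleWorkerRequest is a ready-made HttpWorkerRequest that takes
        // the request from its arguments and writes the response to the
        // supplied TextWriter.
        var wr = new SimpleWorkerRequest(page, query, Console.Out);
        HttpRuntime.ProcessRequest(wr);
    }

    public static void Main()
    {
        // Creates a new AppDomain rooted at the given virtual/physical path
        // and returns a proxy to an AspNetHost instance living inside it.
        var host = (AspNetHost)ApplicationHost.CreateApplicationHost(
            typeof(AspNetHost), "/", @"c:\temp\aspnethost");
        host.ProcessRequest("default.aspx", "");
    }
}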

 

The runtime provides a complex yet very elegant mechanism for routing requests through this pipeline. There are a number of interrelated objects, most of which are extensible either via subclassing or through event interfaces at almost every level of the process, so the framework is highly extensible. Through this mechanism it’s possible to hook into very low level interfaces such as the caching, authentication and authorization. You can even filter content by pre or post processing requests or simply route incoming requests that match a specific signature directly to your code or another URL. There are a lot of different ways to accomplish the same thing, but all of the approaches are straightforward to implement, yet provide flexibility in finding the best match for performance and ease of development.

 

The entire ASP.NET engine was completely built in managed code and all extensibility is provided via managed code extensions.

 

The entire ASP.NET engine was completely built in managed code and all of the extensibility functionality is provided via managed code extensions. This is a testament to the power of the .NET framework in its ability to build sophisticated and very performance oriented architectures. Above all though, the most impressive part of ASP.NET is the thoughtful design that makes the architecture easy to work with, yet provides hooks into just about any part of the request processing.

 

With ASP.NET you can perform tasks that previously were the domain of ISAPI extensions and filters on IIS – with some limitations, but it’s a lot closer than say ASP was. ISAPI is a low level Win32-style API that had a very meager interface and was very difficult to work with for sophisticated applications. Since ISAPI is very low level it also is very fast, but fairly unmanageable for application level development. So, ISAPI has been mainly relegated for some time to providing bridge interfaces to other applications or platforms. But ISAPI isn’t dead by any means. In fact, ASP.NET on Microsoft platforms interfaces with IIS through an ISAPI extension that hosts .NET and, through it, the ASP.NET runtime. ISAPI provides the core interface from the Web Server and ASP.NET uses the unmanaged ISAPI code to retrieve input and send output back to the client. The content that ISAPI provides is available via common objects like HttpRequest and HttpResponse that expose the unmanaged data as managed objects with a nice and accessible interface.

From Browser to ASP.NET

Let’s start at the beginning of the lifetime of a typical ASP.NET Web Request. A request starts on the browser where the user types in a URL, clicks on a hyperlink or submits an HTML form (a POST request). Or a client application might make a call against an ASP.NET based Web Service, which is also serviced by ASP.NET. On the server side the Web Server – Internet Information Server 5 or 6 – picks up the request. At the lowest level ASP.NET interfaces with IIS through an ISAPI extension. With ASP.NET this request usually is routed to a page with an .aspx extension, but how the process works depends entirely on the implementation of the HTTP Handler that is set up to handle the specified extension. In IIS .aspx is mapped through an ‘Application Extension’ (aka a script map) that is mapped to the ASP.NET ISAPI dll – aspnet_isapi.dll. Every request that fires ASP.NET must go through an extension that is registered and points at aspnet_isapi.dll.

 

Depending on the extension ASP.NET routes the request to an appropriate handler that is responsible for picking up requests. For example, the .asmx extension for Web Services routes requests not to a page on disk but to a specially attributed class that identifies it as a Web Service implementation. Many other handlers are installed with ASP.NET and you can also define your own. All of these HttpHandlers are mapped to point at the ASP.NET ISAPI extension in IIS, and configured in web.config to get routed to a specific HTTP Handler implementation. Each handler is a .NET class that handles a specific extension, which can range from simple Hello World behavior with a couple of lines of code to very complex handlers like the ASP.NET Page or Web Service implementations. For now, just understand that an extension is the basic mapping mechanism that ASP.NET uses to receive a request from ISAPI and then route it to a specific handler that processes the request.
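
To make that concrete, here is a minimal handler, a hedged sketch rather than anything shipped with ASP.NET; the .hello extension, type name, and message are invented, and in IIS the extension would additionally have to be script-mapped to aspnet_isapi.dll as described above.

using System.Web;

public class HelloHandler : IHttpHandler
{
    // Registered in web.config, for example:
    //   <httpHandlers>
    //     <add verb="*" path="*.hello" type="HelloHandler" />
    //   </httpHandlers>
    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/plain";
        context.Response.Write("Hello World from a custom HttpHandler");
    }

    // Returning true lets ASP.NET reuse a single instance across requests.
    public bool IsReusable
    {
        get { return true; }
    }
}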

 

ISAPI is the first and highest performance entry point into IIS for custom Web Request handling.

The ISAPI Connection

ISAPI is a low level unmanaged Win32 API. The interfaces defined by the ISAPI spec are very simplistic and optimized for performance. They are very low level – dealing with raw pointers and function pointer tables for callbacks – but they provide the lowest and most performance oriented interface that developers and tool vendors can use to hook into IIS. Because ISAPI is very low level it’s not well suited for building application level code, and ISAPI tends to be used primarily as a bridge interface to provide Application Server type functionality to higher level tools. For example, ASP and ASP.NET are both layered on top of ISAPI, as are Cold Fusion, most Perl, PHP and JSP implementations running on IIS, as well as many third party solutions such as my own Web Connection framework for Visual FoxPro. ISAPI is an excellent tool to provide the high performance plumbing interface to higher level applications, which can then abstract the information that ISAPI provides. In ASP and ASP.NET, the engines abstract the information provided by the ISAPI interface in the form of objects like Request and Response that read their content out of the ISAPI request information. Think of ISAPI as the plumbing. For ASP.NET the ISAPI dll is very lean and acts merely as a routing mechanism to pipe the inbound request into the ASP.NET runtime. All the heavy lifting and processing, and even the request thread management, happens inside of the ASP.NET engine and your code.

 

As a protocol ISAPI supports both ISAPI extensions and ISAPI Filters. Extensions are a request handling interface and provide the logic to handle input and output with the Web Server – it’s essentially a transaction interface. ASP and ASP.NET are implemented as ISAPI extensions. ISAPI filters are hook interfaces that make it possible to look at EVERY request that comes into IIS and to modify the content or change the behavior of functionality like Authentication. Incidentally ASP.NET maps ISAPI-like functionality via two concepts: Http Handlers (extensions) and Http Modules (filters). We’ll look at these later in more detail.
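
For comparison with an ISAPI filter, here is a minimal HttpModule sketch; the module, the header name, and the timing logic are invented for illustration, and the class would be registered in web.config under <httpModules>.

using System;
using System.Web;

public class TimingModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        // Modules hook pipeline events and therefore see every request.
        app.BeginRequest += new EventHandler(OnBeginRequest);
        app.EndRequest += new EventHandler(OnEndRequest);
    }

    void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext.Current.Items["start"] = DateTime.UtcNow;
    }

    void OnEndRequest(object sender, EventArgs e)
    {
        DateTime start = (DateTime)HttpContext.Current.Items["start"];
        HttpContext.Current.Response.AppendHeader(
            "X-Elapsed", (DateTime.UtcNow - start).ToString());
    }

    public void Dispose() { }
}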

 

ISAPI is the initial code point that marks the beginning of an ASP.NET request. ASP.NET maps various extensions to its ISAPI extension which lives in the .NET Framework directory:

 

<.NET FrameworkDir>\aspnet_isapi.dll

 

You can interactively see these mappings in the IIS Service Manager, as shown in Figure 1. Look at the root of the Web Site and the Home Directory tab, then Configuration | Mappings.

 

 

Figure 1: IIS maps various extensions like .ASPX to the ASP.NET ISAPI extension. Through this mechanism requests are routed into ASP.NET’s processing pipeline at the Web Server level.

 

You shouldn’t set these extensions manually as .NET requires a number of them. Instead use the aspnet_regiis.exe utility to make sure that all the various scriptmaps get registered properly:

 

cd <.NetFrameworkDirectory>

aspnet_regiis -i

 

This will register the particular version of the ASP.NET runtime for the entire Web site by registering the scriptmaps and setting up the client side scripting libraries used by the various controls for uplevel browsers. Note that it registers the particular version of the CLR that is installed in the above directory. Options on aspnet_regiis let you configure virtual directories individually. Each version of the .NET framework has its own version of aspnet_regiis and you need to run the appropriate one to register a site or virtual directory for a specific version of the .NET framework. Starting with ASP.NET 2.0, an IIS ASP.NET configuration page lets you pick the .NET version interactively in the IIS management console.

IIS 5 and 6 work differently

When a request comes in, IIS checks for the script map and routes the request to the aspnet_isapi.dll. The operation of the DLL and how it gets to the ASP.NET runtime varies significantly between IIS 5 and 6. Figure 2 shows a rough overview of the flow.

 

IIS 5 hosts aspnet_isapi.dll directly in the inetinfo.exe process or in one of its isolated worker processes if you have isolation set to medium or high for the Web site or virtual directory. When the first ASP.NET request comes in, the DLL spawns a new process in another EXE – aspnet_wp.exe – and routes processing to this spawned process. This process in turn loads and hosts the .NET runtime. Every request that comes into the ISAPI DLL is then routed to this worker process via Named Pipe calls.

 

 

Figure 2 – Request flow from IIS to the ASP.NET Runtime and through the request processing pipeline from a high level. IIS 5 and IIS 6 interface with ASP.NET in different ways but the overall process once it reaches the ASP.NET Pipeline is the same.

 

IIS 6, unlike previous servers, is fully optimized for ASP.NET

 

IIS 6 – Viva the Application Pool

IIS 6 changes the processing model significantly in that IIS no longer hosts any foreign executable code like ISAPI extensions directly. Instead IIS 6 always creates a separate worker process – an Application Pool – and all processing occurs inside of this process, including execution of the ISAPI dll. Application Pools are a big improvement for IIS 6, as they allow very granular control over what executes in a given process. Application Pools can be configured for every virtual directory or the entire Web site, so you can isolate every Web application easily into its own process that will be completely isolated from any other Web application running on the same machine. If one process dies it will not affect any others at least from the Web processing perspective.

 

In addition, Application Pools are highly configurable. You can configure their execution security environment by setting an execution impersonation level for the pool which allows you to customize the rights given to a Web application in that same granular fashion. One big improvement for ASP.NET is that the Application Pool replaces most of the ProcessModel entry in machine.config. This entry was difficult to manage in IIS 5, because the settings were global and could not be overridden in an application specific web.config file. When running IIS 6, the ProcessModel setting is mostly ignored and settings are instead read from the Application Pool. I say mostly – some settings, like the size of the ThreadPool and IO threads still are configured through this key since they have no equivalent in the Application Pool settings of the server.

 

Because Application Pools are external executables these executables can also be easily monitored and managed. IIS 6 provides a number of health checking, restarting and timeout options that can detect and in many cases correct problems with an application. Finally IIS 6’s Application Pools don’t rely on COM+ as IIS 5 isolation processes did which has improved performance and stability especially for applications that need to use COM objects internally.

 

Although IIS 6 application pools are separate EXEs, they are highly optimized for HTTP operations by directly communicating with a kernel mode HTTP.SYS driver. Incoming requests are directly routed to the appropriate application pool. InetInfo acts merely as an Administration and configuration service – most interaction actually occurs directly between HTTP.SYS and the Application Pools, all of which translates into a more stable and higher performance environment over IIS 5. This is especially true for static content and ASP.NET applications.

 

An IIS 6 application pool also has intrinsic knowledge of ASP.NET and ASP.NET can communicate with new low level APIs that allow direct access to the HTTP Cache APIs which can offload caching from the ASP.NET level directly into the Web Server’s cache.

 

In IIS 6, ISAPI extensions run in the Application Pool worker process. The .NET Runtime also runs in this same process, so communication between the ISAPI extension and the .NET runtime happens in-process which is inherently more efficient than the named pipe interface that IIS 5 must use. Although the IIS hosting models are very different the actual interfaces into managed code are very similar – only the process in getting the request routed varies a bit.

 

The ISAPIRuntime.ProcessRequest() method is the first entry point into ASP.NET

Getting into the .NET runtime

The actual entry points into the .NET Runtime occur through a number of undocumented classes and interfaces. Little is known about these interfaces outside of Microsoft, and Microsoft folks are not eager to talk about the details, as they deem this an implementation detail that has little effect on developers building applications with ASP.NET.

 

The worker processes ASPNET_WP.EXE (IIS 5) and W3WP.EXE (IIS 6) host the .NET runtime, and the ISAPI DLL calls into a small set of unmanaged interfaces via low level COM that eventually forward calls to an instance of a subclass of the ISAPIRuntime class. The first entry point to the runtime is the undocumented ISAPIRuntime class, which exposes the IISAPIRuntime interface via COM to a caller. These COM interfaces are low level IUnknown based interfaces that are meant for internal calls from the ISAPI extension into ASP.NET. Figure 3 shows the interface and call signatures for the IISAPIRuntime interface as shown in Lutz Roeder’s excellent .NET Reflector tool (http://www.aisto.com/roeder/dotnet/). Reflector is an assembly viewer and disassembler that makes it very easy to look at metadata and disassembled code (in IL, C#, VB) as shown in Figure 3. It’s a great way to explore the bootstrapping process.

 

 

Figure 3 – If you want to dig into the low level interfaces open up Reflector and point it at the System.Web.Hosting namespace. The entry point to ASP.NET occurs through a managed COM interface called from the ISAPI dll, which receives an unmanaged pointer to the ISAPI ECB. The ECB has access to the full ISAPI interface, allowing request data to be retrieved and responses sent back to IIS.

 

The IISAPIRuntime interface acts as the interface point between the unmanaged code coming from the ISAPI extension (directly in IIS 6 and indirectly via the Named Pipe handler in IIS 5). If you take a look at this class you’ll find a ProcessRequest method with a signature like this:

 

[return: MarshalAs(UnmanagedType.I4)]

int ProcessRequest([In] IntPtr ecb,

[In, MarshalAs(UnmanagedType.I4)] int useProcessModel);

 

The ecb parameter is the ISAPI Extension Control Block (ECB) which is passed as an unmanaged resource to ProcessRequest. The method then takes the ECB and uses it as the base input and output interface used with the Request and Response objects. An ISAPI ECB contains all low level request information including server variables, an input stream for form variables as well as an output stream that is used to write data back to the client. The single ecb reference basically provides access to all of the functionality an ISAPI request has access to and ProcessRequest is the entry and exit point where this resource initially makes contact with managed code.

 

The ISAPI extension runs requests asynchronously. In this mode the ISAPI extension immediately returns on the calling worker process or IIS thread, but keeps the ECB for the current request alive. The ECB then includes a mechanism for letting ISAPI know when the request is complete (via ecb.ServerSupportFunction) which then releases the ECB. This asynchronous processing releases the ISAPI worker thread immediately, and offloads processing to a separate thread that is managed by ASP.NET.

 

ASP.NET receives this ecb reference and uses it internally to retrieve information about the current request such as server variables, POST data as well as returning output back to the server. The ecb stays alive until the request finishes or times out in IIS and ASP.NET continues to communicate with it until the request is done. Output is written into the ISAPI output stream (ecb.WriteClient()) and when the request is done, the ISAPI extension is notified of request completion to let it know that the ECB can be freed. This implementation is very efficient as the .NET classes essentially act as a fairly thin wrapper around the high performance, unmanaged ISAPI ECB.

 

Loading .NET – somewhat of a mystery

Let’s back up one step here: I skipped over how the .NET runtime gets loaded. Here’s where things get a bit fuzzy. I haven’t found any documentation on this process and since we’re talking about native code there’s no easy way to disassemble the ISAPI DLL and figure it out.

 

My best guess is that the worker process bootstraps the .NET runtime from within the ISAPI extension on the first hit against an ASP.NET mapped extension. Once the runtime exists, the unmanaged code can request an instance of an ISAPIRuntime object for a given virtual path if one doesn’t exist yet. Each virtual directory gets its own AppDomain and within that AppDomain the ISAPIRuntime exists from which the bootstrapping process for an individual application starts. Instantiation appears to occur over COM as the interface methods are exposed as COM callable methods.

 

To create the ISAPIRuntime instance, the System.Web.Hosting.AppDomainFactory.Create() method is called when the first request for a specific virtual directory comes in. This starts the ‘Application’ bootstrapping process. The call receives parameters for the type and module name and virtual path information for the application, which ASP.NET uses to create an AppDomain and launch the ASP.NET application for the given virtual directory. This HttpRuntime derived object is created in a new AppDomain. Each virtual directory or ASP.NET application is hosted in a separate AppDomain, and they get loaded only as requests hit the particular ASP.NET application. The ISAPI extension manages these instances of the HttpRuntime objects, and routes inbound requests to the right one based on the virtual path of the request.

 

 

Figure 4 – The transfer of the ISAPI request into the HTTP Pipeline of ASP.NET uses a number of undocumented classes and interfaces and requires several factory method calls. Each Web Application/Virtual directory runs in its own AppDomain, with the caller holding a reference to an IISAPIRuntime interface that triggers the ASP.NET request processing.

 

Back in the runtime

At this point we have an instance of ISAPIRuntime active and callable from the ISAPI extension. Once the runtime is up and running the ISAPI code calls into the ISAPIRuntime.ProcessRequest() method which is the real entry point into the ASP.NET Pipeline. The flow from there is shown in Figure 4.

 

Remember ISAPI is multi-threaded, so requests will come in on multiple threads through the reference that was returned by AppDomainFactory.Create(). Listing 1 shows the disassembled code from the ISAPIRuntime.ProcessRequest method, which receives an ISAPI ecb object and a server type as parameters. The method is thread safe, so multiple ISAPI threads can safely call this single returned object instance simultaneously.

 

Listing 1: The ProcessRequest method receives an ISAPI ECB and passes it on to the worker request

public int ProcessRequest(IntPtr ecb, int iWRType)
{
    HttpWorkerRequest request1 =
        ISAPIWorkerRequest.CreateWorkerRequest(ecb, iWRType);

    string text1 = request1.GetAppPathTranslated();
    string text2 = HttpRuntime.AppDomainAppPathInternal;
    if (((text2 == null) || text1.Equals(".")) ||
        (string.Compare(text1, text2, true, CultureInfo.InvariantCulture) == 0))
    {
        HttpRuntime.ProcessRequest(request1);
        return 0;
    }

    HttpRuntime.ShutdownAppDomain("Physical application path changed from " +
                                  text2 + " to " + text1);
    return 1;
}

 

The actual code here is not important, and keep in mind that this is disassembled internal framework code that you’ll never deal with directly and that might change in the future. It’s meant to demonstrate what’s happening behind the scenes. ProcessRequest receives the unmanaged ECB reference and passes it on to the ISAPIWorkerRequest object which is in charge of creating the Request Context for the current request as shown in Listing 2.

 

The System.Web.Hosting.ISAPIWorkerRequest class is an abstract subclass of HttpWorkerRequest, whose job it is to create an abstracted view of the input and output that serves as the input for the Web application. Notice another factory method here: CreateWorkerRequest, which as a second parameter receives the type of worker request object to create. There are three different versions: ISAPIWorkerRequestInProc, ISAPIWorkerRequestInProcForIIS6, ISAPIWorkerRequestOutOfProc. This object is created on each incoming hit and serves as the basis for the Request and Response objects which will receive their data and streams from the data provided by the WorkerRequest.

 

The abstract HttpWorkerRequest class is meant to provide a high-level abstraction around the low-level interfaces, so that regardless of where the data comes from, whether it's a CGI Web server, the Web Browser control or some custom mechanism you use to feed data to the HTTP Runtime, ASP.NET can retrieve the information consistently.
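A good way to see this abstraction at work is the framework's own SimpleWorkerRequest class, which lets you drive the HTTP Runtime without IIS at all. The following is a minimal sketch of hosting ASP.NET from a console application; the physical path and page name are examples only, and the host assembly must be reachable by the new AppDomain (typically by placing it in the application's bin directory):

using System;
using System.Web;
using System.Web.Hosting;

public class AspNetHost : MarshalByRefObject
{
    public void ProcessRequest(string page, string query)
    {
        // SimpleWorkerRequest is the simplest HttpWorkerRequest
        // implementation that ships with the framework; the page's
        // output is written to the supplied TextWriter
        SimpleWorkerRequest swr =
            new SimpleWorkerRequest(page, query, Console.Out);
        HttpRuntime.ProcessRequest(swr);
    }
}

public class HostDemo
{
    public static void Main()
    {
        // Creates a new AppDomain for the application rooted at "/"
        AspNetHost host = (AspNetHost) ApplicationHost.CreateApplicationHost(
            typeof(AspNetHost), "/", @"c:\inetpub\wwwroot\myapp");

        // Executes inside the new AppDomain, just like an IIS request would
        host.ProcessRequest("default.aspx", "id=1");
    }
}

The Cassini sample Web server is built on these same hosting APIs, which demonstrates how completely the HttpWorkerRequest abstraction decouples the pipeline from IIS.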

 

In the case of IIS the abstraction is centered around an ISAPI ECB block. In our request processing, ISAPIWorkerRequest hangs on to the ISAPI ECB and retrieves data from it as needed. Listing 2 shows how the query string value is retrieved for example.

 

Listing 2: An ISAPIWorkerRequest method that uses the unmanaged ECB

// *** Implemented in ISAPIWorkerRequest
public override byte[] GetQueryStringRawBytes()
{
    byte[] buffer1 = new byte[this._queryStringLength];
    if (this._queryStringLength > 0)
    {
        int num1 = this.GetQueryStringRawBytesCore(buffer1, this._queryStringLength);
        if (num1 != 1)
        {
            throw new HttpException("Cannot_get_query_string_bytes");
        }
    }
    return buffer1;
}

// *** Implemented in a specific implementation class: ISAPIWorkerRequestInProcIIS6
internal override int GetQueryStringCore(int encode, StringBuilder buffer, int size)
{
    if (this._ecb == IntPtr.Zero)
    {
        return 0;
    }
    return UnsafeNativeMethods.EcbGetQueryString(this._ecb, encode, buffer, size);
}

 

ISAPIWorkerRequest implements a high level wrapper method, that calls into lower level Core methods, which are responsible for performing the actual access to the unmanaged APIs – or the ‘service level implementation’. The Core methods are implemented in the specific ISAPIWorkerRequest instance subclasses and thus provide the specific implementation for the environment that it’s hosted in. This makes for an easily pluggable environment where additional implementation classes can be provided later as newer Web Server interfaces or other platforms are targeted by ASP.NET. There’s also a helper class System.Web.UnsafeNativeMethods. Many of these methods operate on the ISAPI ECB structure performing unmanaged calls into the ISAPI extension.

HttpRuntime, HttpContext, and HttpApplication – Oh my

When a request hits, it is routed to the ISAPIRuntime.ProcessRequest() method. This method in turn calls HttpRuntime.ProcessRequest that does several important things (look at System.Web.HttpRuntime.ProcessRequestInternal with Reflector):

 

  • Creates a new HttpContext instance for the request
  • Retrieves an HttpApplication instance from the application pool
  • Calls HttpApplication.Init() to set up the pipeline events
  • Init() fires HttpApplication.ResumeProcessing(), which starts the ASP.NET pipeline processing (see the sketch below)
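In simplified pseudo-code the sequence looks roughly like this. Keep in mind this is an illustrative sketch only: HttpApplicationFactory is an internal class, and the real ProcessRequestInternal code does considerably more error handling and housekeeping:

// Illustrative sketch only – not actual framework source
void ProcessRequestInternal(HttpWorkerRequest wr)
{
    // Wrap the low level worker request in a new request context
    HttpContext context = new HttpContext(wr);

    // Pull an HttpApplication instance out of the pool
    // (HttpApplicationFactory is internal to System.Web)
    IHttpHandler app = HttpApplicationFactory.GetApplicationInstance(context);

    // Init() has already hooked up modules and handlers;
    // this starts the pipeline event sequence for the request
    app.ProcessRequest(context);
}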

 

First a new HttpContext object is created and it is passed the ISAPIWorkerRequest that wraps the ISAPI ECB. The Context is available throughout the lifetime of the request and is ALWAYS accessible via the static HttpContext.Current property. As the name implies, the HttpContext object represents the context of the currently active request, as it contains references to all of the vital objects you typically access during the request lifetime: Request, Response, Application, Server, Cache. At any time during request processing HttpContext.Current gives you access to all of these objects.

 

The HttpContext object also contains a very useful Items collection that you can use to store request-specific data. The Context object gets created at the beginning of the request cycle and is released when the request finishes, so data stored in the Items collection is specific to the current request only. A good example use is a request logging mechanism where you want to track the start and end times of a request by hooking the Application_BeginRequest and Application_EndRequest methods in Global.asax, as shown in Listing 3. HttpContext is your friend – you'll use it liberally if you need data in different parts of the request or page processing.

 

Listing 3 – Using the HttpContext.Items collection lets you save data between pipeline events

protected void Application_BeginRequest(Object sender, EventArgs e)
{
    // *** Request Logging
    if (App.Configuration.LogWebRequests)
        Context.Items.Add("WebLog_StartTime", DateTime.Now);
}

protected void Application_EndRequest(Object sender, EventArgs e)
{
    // *** Request Logging
    if (App.Configuration.LogWebRequests)
    {
        try
        {
            TimeSpan span = DateTime.Now.Subtract(
                (DateTime) Context.Items["WebLog_StartTime"]);

            // TotalMilliseconds is a double – truncate for logging
            int milliSecs = (int) span.TotalMilliseconds;

            // do your logging
            WebRequestLog.Log(App.Configuration.ConnectionString,
                              true, milliSecs);
        }
        catch { /* never let logging fail the request */ }
    }
}

 

 

Once the Context has been set up, ASP.NET needs to route your incoming request to the appropriate application/virtual directory by way of an HttpApplication object. Every ASP.NET application must be set up as a Virtual (or Web Root) directory, and each of these ‘applications’ is handled independently.

 

The HttpApplication is like a master of ceremonies – it is where the processing action starts

 

Master of your domain: HttpApplication

Each request is routed to an HttpApplication object. The HttpApplicationFactory class creates a pool of HttpApplication objects for your ASP.NET application, depending on the load on the application, and hands out references for each incoming request. The size of the pool is limited by the MaxWorkerThreads setting in machine.config's ProcessModel section, which defaults to 20.

 

The pool starts out with a smaller number, though, usually one, and then grows as multiple simultaneous requests need to be processed. The pool is monitored, so under load it may grow to its maximum number of instances, and it is later scaled back to a smaller number as the load drops.
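For reference, these knobs live in the processModel element of machine.config. A trimmed-down excerpt (most attributes omitted; the values shown are the v1.1 defaults):

<!-- machine.config excerpt – most attributes omitted -->
<processModel
    enable="true"
    maxWorkerThreads="20"
    maxIoThreads="20" />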

 

HttpApplication is the outer container for your specific Web application and it maps to the class that is defined in Global.asax. It’s the first entry point into the HTTP Runtime that you actually see on a regular basis in your applications. If you look in Global.asax (or the code behind class) you’ll find that this class derives directly from HttpApplication:

 

public class Global : System.Web.HttpApplication

 

HttpApplication’s primary purpose is to act as the event controller of the Http Pipeline and so its interface consists primarily of events. The event hooks are extensive and include:

 

  • BeginRequest
  • AuthenticateRequest
  • AuthorizeRequest
  • ResolveRequestCache
  • AcquireRequestState
  • PreRequestHandlerExecute
  • …Handler Execution…
  • PostRequestHandlerExecute
  • ReleaseRequestState
  • UpdateRequestCache
  • EndRequest

 

Each of these events is also exposed in the Global.asax file via empty methods that start with an Application_ prefix, for example Application_BeginRequest() and Application_AuthorizeRequest(). These handlers are provided for convenience since they are frequently used in applications, and they save you from having to explicitly create the event handler delegates.
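If you prefer explicit wiring over the naming convention, you can hook the same events yourself by overriding HttpApplication.Init() in the Global class. A minimal sketch (the handler body is a placeholder):

using System;
using System.Web;

public class Global : HttpApplication
{
    public override void Init()
    {
        base.Init();
        // Explicit hookup – equivalent to Application_BeginRequest()
        this.BeginRequest += new EventHandler(this.OnBeginRequest);
    }

    private void OnBeginRequest(object sender, EventArgs e)
    {
        HttpApplication app = (HttpApplication) sender;
        // per-request pre-processing goes here
    }
}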

 

It’s important to understand that each ASP.NET virtual application runs in its own AppDomain, and that inside that AppDomain multiple HttpApplication instances run simultaneously, fed out of a pool that ASP.NET manages. This is so that multiple requests can process at the same time without interfering with each other.

 

To see the relationship between the AppDomain, Threads and the HttpApplication check out the code in Listing 4.

 

Listing 4 – Showing the relation between AppDomain, Threads and HttpApplication instances

private void Page_Load(object sender, System.EventArgs e)
{
    // Put user code to initialize the page here
    this.ApplicationId = ((HowAspNetWorks.Global)
        HttpContext.Current.ApplicationInstance).ApplicationId;
    this.ThreadId = AppDomain.GetCurrentThreadId();

    this.DomainId = AppDomain.CurrentDomain.FriendlyName;

    this.ThreadInfo = "ThreadPool Thread: " +
        System.Threading.Thread.CurrentThread.IsThreadPoolThread.ToString() +
        "<br>Thread Apartment: " +
        System.Threading.Thread.CurrentThread.ApartmentState.ToString();

    // *** Simulate a slow request so we can see multiple
    //     requests side by side.
    System.Threading.Thread.Sleep(3000);
}

 

This is part of a demo provided with the code samples for this article, and the running form is shown in Figure 5. To check it out, run two instances of a browser, hit this sample page in both, and watch the various IDs.

 

 

Figure 5 – You can easily check out how AppDomains, Application Pool instances, and Request Threads interact with each other by running a couple of browser instances simultaneously. When multiple requests fire you’ll see the thread and Application IDs change, but the AppDomain stays the same.

 

You’ll notice that the AppDomain ID stays steady while the thread and HttpApplication IDs change on most requests, although they will likely repeat. HttpApplication instances are served out of a pool and are reused for subsequent requests, so the IDs repeat at times. Note though that Application instances are not tied to a specific thread – rather they are assigned to the active executing thread of the current request.

 

Threads are served from the .NET ThreadPool and by default are Multithreaded Apartment (MTA) style threads. You can override this apartment state in ASP.NET pages with the ASPCOMPAT="true" attribute in the @Page directive. ASPCOMPAT is meant to provide COM components a safe environment to run in, and ASPCOMPAT uses special Single Threaded Apartment (STA) threads to service those requests. STA threads are set aside and pooled separately as they require special handling.
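For example, the page directive looks like this (shown here with the C# language attribute for context):

<%@ Page Language="C#" ASPCOMPAT="true" %>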

 

The fact that these HttpApplication objects are all running in the same AppDomain is very important. This is how ASP.NET can guarantee that changes to web.config or individual ASP.NET pages get recognized throughout the AppDomain. Making a change to a value in web.config causes the AppDomain to be shut down and restarted. This makes sure that all instances of HttpApplication see the changes made, because when the AppDomain reloads, the changes from ASP.NET are re-read at startup. Any static references are also reloaded when the AppDomain restarts, so if the application reads values from App Configuration settings, these values also get refreshed.

 

To see this in the sample, hit the ApplicationPoolsAndThreads.aspx page and note the AppDomain ID. Then go in and make a change in web.config (add a space and save) and reload the page. You’ll find that a new AppDomain has been created.

 

In essence the Web Application/Virtual completely ‘restarts’ when this happens. Any requests that are already in the pipeline processing will continue running through the existing pipeline, while any new requests coming in are routed to the new AppDomain. In order to deal with ‘hung requests’ ASP.NET forcefully shuts down the AppDomain after the request timeout period is up even if requests are still pending. So it’s actually possible that two AppDomains exist for the same HttpApplication at a given point in time as the old one’s shutting down and the new one is ramping up. Both AppDomains continue to serve their clients until the old one has run out its pending requests and shuts down leaving just the new AppDomain running.

Flowing through the ASP.NET Pipeline

The HttpApplication is responsible for the request flow by firing events that signal your application that things are happening. This occurs as part of the HttpApplication.Init() method (look at System.Web.HttpApplication.InitInternal and HttpApplication.ResumeSteps() with Reflector) which sets up and starts a series of events in succession including the call to execute any handlers. The event handlers map to the events that are automatically set up in global.asax, and they also map any attached HTTPModules, which are essentially an externalized event sink for the events that HttpApplication publishes.

 

Both HttpModules and HttpHandlers are loaded dynamically via entries in Web.config and attached to the event chain. HttpModules are actual event handlers that hook specific HttpApplication events, while HttpHandlers are an end point that gets called to handle ‘application level request processing’.

 

Both Modules and Handlers are loaded and attached to the call chain as part of the HttpApplication.Init() method call. Figure 6 shows the various events and when they happen and which parts of the pipeline they affect.

 

 

Figure 6 – Events flowing through the ASP.NET HTTP Pipeline. The HttpApplication object’s events drive requests through the pipeline. Http Modules can intercept these events and override or enhance existing functionality.

 

HttpContext, HttpModules and HttpHandlers

The HttpApplication itself knows nothing about the data being sent to the application – it is merely a messaging object that communicates via events. It fires events and passes information via the HttpContext object to the called methods. The actual state data for the current request is maintained in the HttpContext object mentioned earlier. It provides all the request-specific data and follows each request from beginning to end through the pipeline. Figure 7 shows the flow through the ASP.NET pipeline. Notice the Context object, which is your compadre from beginning to end of the request and can be used to store information in one event method and retrieve it in a later event method.

 

Once the pipeline is started, HttpApplication starts firing events one by one as shown in Figure 6. Each of the event handlers is fired, and if events are hooked up those handlers execute and perform their tasks. The main purpose of this process is to eventually call the HttpHandler hooked up to a specific request. Handlers are the core processing mechanism for ASP.NET requests and usually the place where any application level code is executed. Remember that the ASP.NET Page and Web Service frameworks are implemented as HttpHandlers, and that’s where all the core processing of the request is handled. Modules tend to be of a more core nature, used to prepare or post-process the Context that is delivered to the handler. Typical default modules in ASP.NET are Authentication and Caching for pre-processing, and various encoding mechanisms for post-processing.

 

There’s plenty of information available on HttpHandlers and HttpModules so to keep this article a reasonable length I’m going to provide only a brief overview of handlers.

 

HttpModules

As requests move through the pipeline a number of events fire on the HttpApplication object. We’ve already seen that these events are published as event methods in Global.asax. This approach is application specific though which is not always what you want. If you want to build generic HttpApplication event hooks that can be plugged into any Web applications you can use HttpModules which are reusable and don’t require application specific code except for an entry in web.config.

 

Modules are in essence filters – similar in functionality to ISAPI filters, but at the ASP.NET request level. Modules allow hooking events for EVERY request that passes through the ASP.NET HttpApplication object. These modules are stored as classes in external assemblies that are configured in web.config and loaded when the Application starts. By implementing specific interfaces and methods the module gets hooked up to the HttpApplication event chain. Multiple HttpModules can hook the same event, and event ordering is determined by the order they are declared in web.config. Here’s what a module definition looks like in web.config:

 

<configuration>
  <system.web>
    <httpModules>
      <add name="BasicAuthModule"
           type="HttpHandlers.BasicAuth,WebStore" />
    </httpModules>
  </system.web>
</configuration>

 

Note that you need to specify a full typename and an assembly name without the DLL extension.

 

Modules allow you to look at each incoming Web request and perform an action based on the events that fire. Modules are great for modifying request or response content, providing custom authentication, or otherwise providing pre- or post-processing for every request that occurs against ASP.NET in a particular application. Many of ASP.NET’s features like the Authentication and Session engines are implemented as HTTP Modules.

 

While HttpModules feel similar to ISAPI filters in that they look at every request that comes through an ASP.NET application, they are limited to requests mapped to a single specific ASP.NET application or virtual directory, and then only to requests that are mapped to ASP.NET. Thus you can look at all ASPX pages or any of the other custom extensions that are mapped to this application. You cannot, however, look at standard .HTM or image files unless you explicitly map the extension to the ASP.NET ISAPI dll by adding an extension as shown in Figure 1. A common use for a module might be to filter content for JPG images in a special folder and display a ‘SAMPLE’ overlay on top of every image by drawing on top of the returned bitmap with GDI+.

 

Implementing an HTTP Module is very easy: you must implement the IHttpModule interface, which contains only two methods, Init() and Dispose(). Init() is passed a reference to the HttpApplication object, which in turn gives you access to the HttpContext object. In this method you hook up to HttpApplication events. For example, if you want to hook the AuthenticateRequest event with a module you would do what’s shown in Listing 5.

 

Listing 5: The basics of an HTTP Module are very simple to implement

public class BasicAuthCustomModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        // *** Hook up any HttpApplication events
        application.AuthenticateRequest +=
            new EventHandler(this.OnAuthenticateRequest);
    }

    public void Dispose() { }

    public void OnAuthenticateRequest(object source, EventArgs eventArgs)
    {
        HttpApplication app = (HttpApplication) source;
        HttpContext Context = HttpContext.Current;
        // do what you have to do...
    }
}

 

Remember that your module has access to the HttpContext object and from there to all the other intrinsic ASP.NET pipeline objects like Response and Request, so you can retrieve input and so on. But keep in mind that certain things may not be available until later in the chain.

 

You can hook multiple events in the Init() method so your module can manage multiple functionally different operations in one module. However, it’s probably cleaner to separate differing logic out into separate classes to make sure the module is modular. <g> In many cases functionality that you implement may require that you hook multiple events – for example a logging filter might log the start time of a request in Begin Request and then write the request completion into the log in EndRequest.
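As a sketch of that logging example (the module name and the response header are hypothetical, shown only to illustrate hooking two events from one module):

using System;
using System.Web;

// Hypothetical request-timing module: hooks two pipeline events
public class RequestTimerModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        application.BeginRequest += new EventHandler(this.OnBeginRequest);
        application.EndRequest += new EventHandler(this.OnEndRequest);
    }

    public void Dispose() { }

    private void OnBeginRequest(object sender, EventArgs e)
    {
        // Stamp the start time into the per-request Items collection
        HttpContext.Current.Items["Timer_Start"] = DateTime.Now;
    }

    private void OnEndRequest(object sender, EventArgs e)
    {
        HttpContext context = HttpContext.Current;
        if (context.Items["Timer_Start"] != null)
        {
            TimeSpan span = DateTime.Now.Subtract(
                (DateTime) context.Items["Timer_Start"]);

            // Report the elapsed time on a custom header (example only)
            context.Response.AppendHeader("X-Request-Time",
                span.TotalMilliseconds.ToString());
        }
    }
}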

 

Watch out for one important gotcha with HttpModules and HttpApplication events: Response.End() or HttpApplication.CompleteRequest() will shortcut the HttpApplication and Module event chain. See the sidebar “Watch out for Response.End() “ for more info.

 

HttpHandlers

Modules are fairly low level and fire against every inbound request to the ASP.NET application. Http Handlers are more focused and operate on a specific request mapping, usually a page extension that is mapped to the handler.

 

Http Handler implementations are very basic in their requirements, but through access of the HttpContext object a lot of power is available. Http Handlers are implemented through a very simple IHttpHandler interface (or its asynchronous cousin, IHttpAsyncHandler) which consists of merely a single method – ProcessRequest() – and a single property IsReusable. The key is ProcessRequest() which gets passed an instance of the HttpContext object. This single method is responsible for handling a Web request start to finish.

 

Single, simple method? Must be too simple, right? Well, it’s a simple interface, but not simplistic in what’s possible! Remember that WebForms and WebServices are both implemented as Http Handlers, so there’s a lot of power wrapped up in this seemingly simplistic interface. The key is the fact that by the time an Http Handler is reached, all of ASP.NET’s internal objects are set up and configured to start processing requests. Chief among them is the HttpContext object, which provides all of the relevant request functionality to retrieve input and send output back to the Web Server.

 

For an HTTP Handler all action occurs through this single call to ProcessRequest(). This can be as simple as:

 

public void ProcessRequest(HttpContext context)
{
    context.Response.Write("Hello World");
}

 

to a full implementation like the WebForms Page engine that can render complex forms from HTML templates. The point is that it’s up to you to decide what you want to do with this simple, but powerful interface!

 

Because the Context object is available to you, you get access to the Request, Response, Session and Cache objects, so you have all the key features of an ASP.NET request at your disposal to figure out what users submitted and return content you generate back to the client. Remember the Context object – it’s your friend throughout the lifetime of an ASP.NET request!
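To make that concrete, here’s a slightly fuller sketch than the snippet above: a hypothetical handler that streams a file back to the client, along with the web.config entry that maps an extension to it. The namespace, assembly name and *.download extension are examples only, and the extension must also be script-mapped to aspnet_isapi.dll in IIS as shown in Figure 1:

using System.Web;

namespace MyApp
{
    // Hypothetical file-serving handler that bypasses the WebForms framework
    public class FileDownloadHandler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            // Map the request to a physical file (example logic only)
            string path = context.Server.MapPath(context.Request.Path);

            context.Response.ContentType = "application/octet-stream";
            context.Response.WriteFile(path);   // streams the file to the client
        }

        // true: this instance holds no per-request state and can be reused
        public bool IsReusable
        {
            get { return true; }
        }
    }
}

And the registration in web.config:

<httpHandlers>
  <add verb="*" path="*.download"
       type="MyApp.FileDownloadHandler,MyApp" />
</httpHandlers>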

 

The key operation of the handler is to eventually write output into the Response object, or more specifically into the Response object’s OutputStream. This output is what actually gets sent back to the client. Behind the scenes the ISAPIWorkerRequest manages sending the OutputStream back into the ISAPI ecb.WriteClient method that actually performs the IIS output generation.

 

 

Figure 7 – The ASP.NET Request pipeline flows requests through a set of event interfaces that provide much flexibility. The Application acts as the hosting container that loads up the Web application and fires events as requests come in and pass through the pipeline. Each request follows a common path through the Http Filters and Modules configured. Filters can examine each request going through the pipeline and Handlers allow implementation of application logic or application level interfaces like Web Forms and Web Services. To provide Input and Output for the application the Context object provides request specific information throughout the entire process.

 

WebForms implements an Http Handler with a much more high level interface on top of this very basic framework, but eventually a WebForm’s Render() method simply ends up using an HtmlTextWriter object to write its final output to context.Response.OutputStream. So while very fancy, ultimately even a high level tool like Web Forms is just a high level abstraction on top of the Request and Response objects.

 

You might wonder at this point whether you need to deal with Http Handlers at all. After all WebForms provides an easily accessible Http Handler implementation, so why bother with something a lot more low level and give up that flexibility?

 

WebForms are great for generating complex HTML pages and business level logic that requires graphical layout tools and template backed pages. But the WebForms engine performs a lot of tasks that are overhead intensive. If all you want to do is read a file from the system and return it back through code it’s much more efficient to bypass the Web Forms Page framework and directly feed the file back. If you do things like Image Serving from a Database there’s no need to go into the Page framework – you don’t need templates and there surely is no Web UI that requires you to capture events off an Image served.

There’s no reason to set up a page object and session and hook up Page level events – all of that stuff requires execution of code that has nothing to do with your task at hand.

 

So handlers are more efficient. Handlers can also do things that aren’t possible with WebForms, such as processing requests without the need for a physical file on disk, which is known as a virtual Url. To do this, make sure you turn off the ‘Check that file exists’ checkbox in the Application Extension dialog shown in Figure 1.

 

This is common for content providers, such as dynamic image processing, XML servers, URL Redirectors providing vanity Urls, download managers and the like, none of which would benefit from the WebForm engine.

Have I stooped low enough for you?

Phew – we’ve come full circle here for the processing cycle of requests. That’s a lot of low level information and I haven’t even gone into great detail about how HTTP Modules and HTTP Handlers work. It took some time to dig up this information and I hope this gives you some of the same satisfaction it gave me in understanding how ASP.NET works under the covers.

 

Before I’m done let’s do the quick review of the event sequences I’ve discussed in this article from IIS to handler:

 

  • IIS gets the request
  • Looks up a script map extension and maps to aspnet_isapi.dll
  • Code hits the worker process (aspnet_wp.exe in IIS5 or w3wp.exe in IIS6)
  • .NET runtime is loaded
  • IsapiRuntime.ProcessRequest() called by non-managed code
  • IsapiWorkerRequest created once per request
  • HttpRuntime.ProcessRequest() called with Worker Request
  • HttpContext Object created by passing Worker Request as input
  • HttpApplicationFactory.GetApplicationInstance() called with the Context to retrieve an instance from the pool
  • HttpApplication.Init() called to start pipeline event sequence and hook up modules and handlers
  • HttpApplication.ProcessRequest() called to start processing
  • Pipeline events fire
  • Handlers are called and their ProcessRequest methods fire
  • Control returns to pipeline and post request events fire

 

It’s a lot easier to remember how all of the pieces fit together with this simple list handy. I look at it from time to time to remember. So now, get back to work and do something non-abstract…

 

Although what I discuss here is based on ASP.NET 1.1, it looks like the underlying processes described here haven’t changed in ASP.NET 2.0.

 

Many thanks to Mike Volodarsky from Microsoft for reviewing this article and providing a few additional hints and Michele Leroux Bustamante for providing the basis for the ASP.NET Pipeline Request Flow slide.

 

If you have any comments or questions feel free to post them on the Comment link below.
