Category Archives: 架构

ArchSummit北京2014大会(PPT)

1219日幻灯片资料下载(部分)
时间 2014年12月19日
9:30 论高度Web化之影响
Dave Arel
10:30 冯诺依曼等技术先贤留下的API规模经济启示
Mike Amundsen
11:30 大数据的十个技术前沿
英特尔中国研究院 吴甘沙
专题 转型中的 研发体系构建 互联网金融 疯狂双十一的全栈技术路线图 云平台技术全景剖析
13:30 大数据时代的feed流架构 互联网企业制胜利器——选对人、速度快 证券交易系统架构设计挑战与实施经验分享 京东商城交易系统双11总结 云计算架构的实战案例
新浪微博 杨卫华 奇虎360 谭晓生 上交所 陈晨 京东商城 王晓钟 阿里云 黄湘龙(龙觉)
14:30 Nice服务端架构变迁 远距离条件下的康威定律——分布式世界中实现团 站在电商平台上的互联网金融系统架构实践 双11——淘宝的下一代架构的成人礼 阿里云虚拟化技术研发之路
Nice技术合伙人 程㭎 Mike Amundsen 京东 赵海峰 阿里技术保障 梁耀斌 阿里云 张献涛(旭卿)
15:45 移动互联网海量访问系统设计 Tower团队24个月的远程协作实践 互联网金融系统中的资金正确性保障 OceanBase-支付宝交易O2O最佳实践 阿里云CDN技术演进
微信红包 张晋铭 彩程科技 沈学良 维金 陆怡 阿里巴巴 师文汇(虞舜) 阿里云 朱照远(叔度)
16:45 知乎从0到100 技术驱动型公司的精益实践 互联网金融下的智能客户服务探索 从双十一看天猫订单数据对接及订单流程优化 分析数据库ADS的产品化、服务化
知乎 李申申 LeanCloud CTO 丰俊文 蚂蚁金服 阮征 小米科技 杨大伟 阿里巴巴占超群(离哲)
1220日幻灯片资料下载(部分)
时间 2014年12月20日
专题 智能硬件,更懂你 电商,不是搭个平台就能赢 移动互联网,随时随地 云计算与大数据,从技术选型说起 真实的云计算服务
出品人 豌豆机器小组罗未 京东吴博 淘宝庄卓然 小米冯宏华 InfoQ 包研
9:30 开源硬件的智能之芯 重构再重构,当当网架构演进及规划实现 手机QQ的移动网络实践之路 豌豆荚分布式Redis的设计与实现 NoSQL数据库的事务机制实现
北京融汇科艺 肖文鹏 当当网 史海峰 腾讯手Q 范瑞彬 豌豆荚 刘奇 巨杉数据库 王涛
10:30 智能热潮推进家庭数字娱乐变革 Qunar酒店交易系统架构实践 HTML5移动Web:数百万用户的 MiCloud 挑战和架构演进 一个移动互联网数据分析公司的架构实践
小米 李创奇 去哪儿网 莫德友 Dave Arel 小米科技 张弘强/刘绍辉 友盟 叶谦
11:30 无人机的科幻与现实 千万级电商物流系统是怎么练成的— 来自京东青龙系统的最佳实践 去哪儿SPA HTML应用架构——让HTML应用体验更贴近于Native应用 大数据统计分析平台架构故事—TalkingData数据库架构变迁 七牛云存储产品的三年演进之路
极飞CEO 彭斌 京东 李鹏涛 去哪儿网 蔡欢 TalkingData 周海鹏 七牛云存储 韩拓
13:30 互联网企业做智能硬件的那些坑 多IDC部署的电商网站的缓存管理 手机淘宝架构演化实践 京东实时数据平台技术实践 Aurora、Lambda与亚马逊AWS高可用性架构设计最佳实践
暴风影音 毕先春 一号店 张珺 手机淘宝 李敏 京东 刘彦伟 AWS 张荣典
14:30 智能驾驶:过去、现在和未来 天猫“五化”战略下的电商架构 下一代移动云服务架构 大数据应用中数据安全和数据分析 如何在云上构建复杂的企业级应用
中国无人车 张天雷 天猫 何崚(大少) 雅虎 朱凌 英特尔中国研究院 周鑫 青云QingCloud 甘泉
15:45 智能硬件云平台的发展之路与展望 唯品会供应链技术架构的实践 移动音乐服务的竞争力 HBase上搭建广告实时数据处理平台 基于Ceph的UOSCloud 块存储方案
京东 任强 唯品会 梁德伟 虾米 郭彦儒 腾讯 李锐 UnitedStack 王豪迈
16:45 圆桌论坛 从技术细节看美团的架构 猿题库的技术演化 百度集群操作系统Matrix 从漏洞修复看各国网络战防御能力
谁最有可能成为智能硬件的BAT 美团网 夏华夏 猿题库 唐巧 百度 吕毅 知道创宇 杨冀龙

GUI Architectures

There have been many different ways to organize the code for a rich client system. Here I discuss a selection of those that I feel have been the most influential and introduce how they relate to the patterns.

Last significant update: 18 Jul 06


Graphical user interfaces have become a familiar part of our software landscape, both as users and as developers. Looking at it from a design perspective they represent a particular set of problems in system design – problems that have led to a number of different but similar solutions.

My interest is identifying common and useful patterns for application developers to use in rich-client development. I’ve seen various designs in project reviews and also various designs that have been written in a more permanent way. Inside these designs are the useful patterns, but describing them is often not easy. Take Model-View-Controller as an example. It’s often referred to as a pattern, but I don’t find it terribly useful to think of it as a pattern because it contains quite a few different ideas. Different people reading about MVC in different places take different ideas from it and describe these as ‘MVC’. If this doesn’t cause enough confusion you then get the effect of misunderstandings of MVC that develop through a system of Chinese whispers.

In this essay I want to explore a number of interesting architectures and describe my interpretation of their most interesting features. My hope is that this will provide a context for understanding the patterns that I describe.

To some extent you can see this essay as a kind of intellectual history that traces ideas in UI design through multiple architectures over the years. However I must issue a caution about this. Understanding architectures isn’t easy, especially when many of them change and die. Tracing the spread of ideas is even harder, because people read different things from the same architecture. In particular I have not done an exhaustive examination of the architectures I describe. What I have done is referred to common descriptions of the designs. If those descriptions miss things out, I’m utterly ignorant of that. So don’t take my descriptions as an authoritative description of the architectures I describe. Furthermore there things I’ve left out or simplified if I didn’t think they were particularly relevant. Remember my primary interest is the underlying patterns, not in the history of these designs.

(There is something of an exception here, in that I did have access to a running Smalltalk-80 to examine MVC. Again I wouldn’t describe my examination of it as exhaustive, but it did reveal things that common descriptions of it failed to – which even further makes me cautious about descriptions of other architectures that I have here. If you are familiar with one of these architectures and you see I have something important that is incorrect and missing I’d like to know about it. I also thing that a more exhaustive survey of this territory would be a good object of academic study.)


Forms and Controls

I shall begin this exploration with an architecture that is both simple and familiar. It doesn’t have a common name, so for the purposes of this essay I shall call it “Forms and Controls”. It’s a familiar architecture because it was the one encouraged by client-server development environments in the 90’s – tools like Visual Basic, Delphi, and Powerbuilder. It continues to be commonly used, although also often vilified by design geeks like me.

To explore it, and indeed the other architectures, I’ll use a common example. In New England, where I live, there is a government program that monitors the amount of ice-cream particulate in the atmosphere. If the concentration is too low, this indicates that we aren’t eating enough ice-cream – which poses a serious risk to our economy and public order. (I like to use examples that are no less realistic as you usually find in books like this.)

To monitor our ice-cream health, the government has set up monitoring stations all over the New England states. Using complex atmospheric modeling the department sets a target for each monitoring station. Every so often staffers go out on an assessment where they go to various stations and note the actual ice-cream particulate concentrations. This UI allows them to select a station, and enter the date and actual value. The system then calculates and displays the variance from the target. Furthermore if the actual is more than 10% below the target, the variance is highlighted in red, if above by more than 5% it’s highlighted in green.

Figure 1: The UI I’ll use as an example.

As we look at this screen we can see there is an important division as we put it together. The form is specific to our application, but it uses controls that are generic. Most GUI environments come with a hefty bunch of common controls that we can just use in our application. We can build new controls ourselves, and often it’s a good idea to do so, but there is still a distinction between generic reusable controls and specific forms. Even specially written controls can be reused across multiple forms.

The form contains two main responsibilities:

  • Screen layout: defining the arrangement of the controls on the screen, together with their hierarchic structure with other.
  • Form Logic: behavior that cannot be easily programmed into the controls themselves.

Most GUI development environments allow the developer to define screen layout with a graphical editor that allows you to drag and drop the controls onto a space for the form. This pretty much handles the form layout. This way it’s easy to setup a pleasing layout of controls on the form (although it isn’t always the best way to do it – we’ll come to that later.)

The controls display data – in this case about the reading. This data will pretty much always come from somewhere else, in this case let’s assume a SQL database as that’s the environment that most of these client server tools assume. In most situations there are three copies of the data involved:

  • One copy of data lies in the database itself. This copy is the lasting record of the data, so I call it the record state. The record state is usually shared and visible to multiple people via various mechanisms.
  • A further copy lies inside in-memory Record Sets within the application. Most client-server environments provided tools which made this easy to do. This data was only relevant for one particular session between the application and the database, so I call it session state. Essentially this provides a temporary local version of the data that the user works on until they save, or commit it, back to the database – at which point it merges with the record state. I won’t worry about the issues around coordinating record state and session state here: I did go into various techniques in [P of EAA].
  • The final copy lies inside the GUI components themselves. This, strictly, is the data they see on the screen, hence I call it the screen state. It is important to the UI how screen state and session state are kept synchronized.

Keeping screen state and session state synchronized is an important task. A tool that helped make this easier was Data Binding. The idea was that any change to either the control data, or the underlying record set was immediately propagated to the other. So if I alter the actual reading on the screen, the text field control effectively updates the correct column in the underlying record set.

In general data binding gets tricky because if you have to avoid cycles where a change to the control, changes the record set, which updates the control, which updates the record set…. The flow of usage helps avoid these – we load from the session state to the screen when the screen is opened, after that any changes to the screen state propagate back to the session state. It’s unusual for the session state to be updated directly once the screen is up. As a result data binding might not be entirely bi-directional – just confined to initial upload and then propagating changes from the controls to the session state.

Data Binding handles much of the functionality of a client sever application pretty nicely. If I change the actual value the column is updated, even changing the selected station alters the currently selected row in the record set, which causes the other controls to refresh.

Much of this behavior is built in by the framework builders, who look at common needs and make it easy to satisfy them. In particular this is done by setting values, usually called properties, on the controls. The control binds to a particular column in a record set by having its column name set through a simple property editor.

Using data binding, with the right kind of parameterization, can take you a long way. However it can’t take you all the way – there’s almost always some logic that won’t fit with the parameterization options. In this case calculating the variance is an example of something that doesn’t fit in this built in behavior – since it’s application specific it usually lies in the form.

In order for this to work the form needs to be alerted whenever the value of the actual field changes, which requires the generic text field to call some specific behavior on the form. This is a bit more involved than taking a class library and using it through calling it as Inversion of Control is involved.

There are various ways of getting this kind of thing to work – the common one for client-server tool-kits was the notion of events. Each control had a list of events it could raise. Any external object could tell a control that it was interested in an event – in which case the control would call that external object when the event was raised. Essentially this is just a rephrasing of the Observer pattern where the form is observing the control. The framework usually provided some mechanism where the developer of the form could write code in a subroutine that would be invoked when the event occurred. Exactly how the link was made between event and routine varied between platform and is unimportant for this discussion – the point is that some mechanism existed to make it happen.

Once the routine in the form has control, it can then do whatever it needed. It can carry out the specific behavior and then modify the controls as necessary, relying on data binding to propagate any of these changes back to the session state.

This is also necessary because data binding isn’t always present. There is a large market for windows controls, not all of them do data binding. If data binding isn’t present then it’s up to the form to carry out the synchronization. This could work by pulling data out of the record set into the widgets initially, and copying the changed data back to the record set when the save button was pressed.

Let’s examine our editing of the actual value, assuming that data binding is present. The form object holds direct references to the generic controls. There’ll be one for each control on the screen, but I’m just interested in the actual, variance, and target fields here.

Figure 2: Class diagram for forms and controls

The text field declares an event for text changed, when the form assembles the screen during initialization it subscribes itself to that event, binding it a method on itself – here actual_textChanged.

Figure 3: Sequence diagram for changing a genre with forms and controls.

When the user changes the actual value, the text field control raises its event and through the magic of framework binding the actual_textChanged is run. This method gets the text from the actual and target text fields, does the subtraction, and puts the value into the variance field. It also figures out what color the value should be displayed with and adjusts the text color appropriately.

We can summarize the architecture with a few soundbites:

  • Developers write application specific forms that use generic controls.
  • The form describes the layout of controls on it.
  • The form observes the controls and has handler methods to react to interesting events raised by the controls.
  • Simple data edits are handled through data binding.
  • Complex changes are done in the form’s event handling methods.

Model View Controller

Probably the widest quoted pattern in UI development is Model View Controller (MVC) – it’s also the most misquoted. I’ve lost count of the times I’ve seen something described as MVC which turned out to be nothing like it. Frankly a lot of the reason for this is that parts of classic MVC don’t really make sense for rich clients these days. But for the moment we’ll take a look at its origins.

As we look at MVC it’s important to remember that this was one of the first attempts to do serious UI work on any kind of scale. Graphical User Interfaces were not exactly common in the 70’s. The Forms and Controls model I’ve just described came after MVC – I’ve described it first because it’s simpler, not always in a good way. Again I’ll discuss Smalltalk 80’s MVC using the assessment example – but be aware that I am taking a few liberties with the actual details of Smalltalk 80 to do this – for start it was monochrome system.

At the heart of MVC, and the idea that was the most influential to later frameworks, is what I call Separated Presentation. The idea behind Separated Presentation is to make a clear division between domain objects that model our perception of the real world, and presentation objects that are the GUI elements we see on the screen. Domain objects should be completely self contained and work without reference to the presentation, they should also be able to support multiple presentations, possibly simultaneously. This approach was also an important part of the Unix culture, and continues today allowing many applications to be manipulated through both a graphical and command-line interface.

In MVC, the domain element is referred to as the model. Model objects are completely ignorant of the UI. To begin discussing our assessment UI example we’ll take the model as a reading, with fields for all the interesting data upon it. (As we’ll see in a moment the presence of the list box makes this question of what is the model rather more complex, but we’ll ignore that list box for a little bit.)

In MVC I’m assuming a Domain Model of regular objects, rather than the Record Set notion that I had in Forms and Controls. This reflects the general assumption behind the design. Forms and Controls assumed that most people wanted to easily manipulate data from a relational database, MVC assumes we are manipulating regular Smalltalk objects.

The presentation part of MVC is made of the two remaining elements: view and controller. The controller’s job is to take the user’s input and figure out what to do with it.

At this point I should stress that there’s not just one view and controller, you have a view-controller pair for each element of the screen, each of the controls and the screen as a whole. So the first part of reacting to the user’s input is the various controllers collaborating to see who got edited. In the case that’s it’s the actuals text field so that text field controller would now handle what happens next.

Figure 4: Essential dependencies between model, view, and controller. (I call this essential because in fact the view and controller do link to each other directly, but developers mostly don’t use this fact.)

Like later environments, Smalltalk figured out that you wanted generic UI components that could be reused. In this case the component would be the view-controller pair. Both were generic classes, so needed to be plugged into the application specific behavior. There would be an assessment view that would represent the whole screen and define the layout of the lower level controls, in that sense similar to a form in Forms and Controllers. Unlike the form, however, MVC has no event handlers on the assessment controller for the lower level components.

Figure 5: Classes for an MVC version of an ice-cream monitor display

The configuration of the text field comes from giving it a link to its model, the reading, and telling it what what method to invoke when the text changes. This is set to ‘#actual:’ when the screen is initialized (a leading ‘#’ indicates a symbol, or interned string, in Smalltalk). The text field controller then makes a reflective invocation of that method on the reading to make the change. Essentially this is the same mechanism as occurs for Data Binding, the control is linked to the underlying object (row) and told which method (column) it manipulates.

Figure 6: Changing the actual value for MVC.

So there is no overall object observing low level widgets, instead the low level widgets observe the model, which itself handles many of the decision that would be made by the form. In this case, when it comes to figuring out the variance, the reading object itself is the natural place to do that.

Observers do occur in MVC, indeed it’s one of the ideas credited to MVC. In this case all the views and controllers observe the model. When the model changes, the views react. In this case the actual text field view is notified that the reading object has changed, and invokes the method defined as the aspect for that text field – in this case #actual – and sets its value to the result. (It does something similar for the color, but this raises its own specters that I’ll get to in a moment.)

You’ll notice that the text field controller didn’t set the value in the view itself, it updated the model and then just let the observer mechanism take care of the updates. This is quite different to the forms and controls approach where the form updates the control and relies on data binding to update the underlying record-set. These two styles I describe as patterns: Flow Synchronization and Observer Synchronization. These two patterns describe alternative ways of handling the triggering of synchronization between screen state and session state. Forms and Controls do it through the flow of the application manipulating the various controls that need to be updated directly. MVC does it by making updates on the model and then relying of the observer relationship to update the views that are observing that model.

Flow Synchronization is even more apparent when data binding isn’t present. If the application needs to do synchronization itself, then it was typically done at important point in the application flow – such as when opening a screen or hitting the save button.

One of the consequences of Observer Synchronization is that the controller is very ignorant of what other widgets need to change when the user manipulates a particular widget. While the form needs to keep tabs on things and make sure the overall screen state is consistent on a change, which can get pretty involved with complex screens, the controller in Observer Synchronization can ignore all this.

This useful ignorance becomes particularly handy if there are multiple screens open viewing the same model objects. The classic MVC example was a spreadsheet like screen of data with a couple of different graphs of that data in separate windows. The spreadsheet window didn’t need to be aware of what other windows were open, it just changed the model and Observer Synchronization took care of the rest. With Flow Synchronization it would need some way of knowing which other windows were open so it tell them to refresh.

While Observer Synchronization is nice it does have a downside. The problem with Observer Synchronization is the core problem of the observer pattern itself – you can’t tell what is happening by reading the code. I was reminded of this very forcefully when trying to figure out how some Smalltalk 80 screens worked. I could get so far by reading the code, but once the observer mechanism kicked in the only way I could see what was going on was via a debugger and trace statements. Observer behavior is hard to understand and debug because it’s implicit behavior.

While the different approaches to synchronization are particularly noticeable from looking at the sequence diagram, the most important, and most influential, difference is MVC’s use of Separated Presentation. Calculating the variance between actual and target is domain behavior, it is nothing to do with the UI. As a result following Separated Presentation says we should place this in the domain layer of the system – which is exactly what the reading object represents. When we look at the reading object, the variance feature makes complete sense without any notion of the user interface.

At this point, however, we can begin to look at some complications. There’s two areas where I’ve skipped over some awkward points that get in the way of MVC theory. The first problem area is to deal with setting the color of the variance. This shouldn’t really fit into a domain object, as the color by which we display a value isn’t part of the domain. The first step in dealing with this is to realize that part of the logic is domain logic. What we are doing here is making a qualitative statement about the variance, which we could term as good (over by more than 5%), bad (under by more than 10%), and normal (the rest). Making that assessment is certainly domain language, mapping that to colors and altering the variance field is view logic. The problem lies in where we put this view logic – it’s not part of our standard text field.

This kind of problem was faced by early smalltalkers and they came up with some solutions. The solution I’ve shown above is the dirty one – compromise some of the purity of the domain in order to make things work. I’ll admit to the occasional impure act – but I try not to make a habit of it.

We could do pretty much what Forms and Controls does – have the assessment screen view observe the variance field view, when the variance field changes the assessment screen could react and set the variance field’s text color. Problems here include yet more use of the observer mechanism – which gets exponentially more complicated the more you use it – and extra coupling between the various views.

A way I would prefer is to build a new type of the UI control. Essentially what we need is a UI control that asks the domain for a qualitative value, compares it to some internal table of values and colors, and sets the font color accordingly. Both the table and message to ask the domain object would be set by the assessment view as it’s assembling itself, just as it sets the aspect for the field to monitor. This approach could work very well if I can easily subclass text field to just add the extra behavior. This obvious depends on how well the components are designed to enable sub-classing – Smalltalk made it very easy – other environments can make it more difficult.

Figure 7: Using a special subclass of text field that can be configured to determine the color.

The final route is to make a new kind of model object, one that’s oriented around around the screen, but is still independent of the widgets. It would be the model for the screen. Methods that were the same as those on the reading object would just be delegated to the reading, but it would add methods that supported behavior relevant only to the UI, such as the text color.

Figure 8: Using an intermediate Presentation Model to handle view logic.

This last option works well for a number of cases and, as we’ll see, became a common route for Smalltalkers to follow – I call this a Presentation Model because it’s a model that is really designed for and thus part of the presentation layer.

The Presentation Model works well also for another presentation logic problem – presentation state. The basic MVC notion assumes that all the state of the view can be derived from the state of the model. In this case how do we figure out which station is selected in the list box? The Presentation Model solves this for us by giving us a place to put this kind of state. A similar problem occurs if we have save buttons that are only enabled if data has changed – again that’s state about our interaction with the model, not the model itself.

So now I think it’s time for some soundbites on MVC.

  • Make a strong separation between presentation (view & controller) and domain (model) – Separated Presentation.
  • Divide GUI widgets into a controller (for reacting to user stimulus) and view (for displaying the state of the model). Controller and view should (mostly) not communicate directly but through the model.
  • Have views (and controllers) observe the model to allow multiple widgets to update without needed to communicate directly – Observer Synchronization.

VisualWorks Application Model

As I’ve discussed above, Smalltalk 80’s MVC was very influential and had some excellent features, but also some faults. As Smalltalk developed in the 80’s and 90’s this led to some significant variations on the classic MVC model. Indeed one could almost say that MVC disappeared, if you consider the view/controller separation to be an essential part of MVC – which the name does imply.

The things that clearly worked from MVC were Separated Presentation and Observer Synchronization. So these stayed as Smalltalk developed – indeed for many people they were the key element of MVC.

Smalltalk also fragmented in these years. The basic ideas of Smalltalk, including the (minimal) language definition remained the same, but we saw multiple Smalltalks develop with different libraries. From a UI perspective this became important as several libraries started using native widgets, the controls used by the Forms and Controls style.

Smalltalk was originally developed by Xerox Parc labs and they span off a separate company, ParcPlace, to market and develop Smalltalk. ParcPlace Smalltalk was called VisualWorks and made a point of being a cross-platform system. Long before Java you could take a Smalltalk program written in Windows and run it right away on Solaris. As a result VisualWorks didn’t use native widgets and kept the GUI completely within Smalltalk.

In my discussion of MVC I finished with some problems of MVC – particularly how to deal with view logic and view state. VisualWorks refined its framework to deal with this by coming up with a construct called the Application Model – a construct that moves towards Presentation Model. The idea of using something like a Presentation Model wasn’t new to VisualWorks – the original Smalltalk 80 code browser was very similar, but the VisualWorks Application Model baked it fully into the framework.

A key element of this kind of smalltalk was the idea of turning properties into objects. In our usual notion of objects with properties we think of a Person object having properties for name and address. These properties may be fields, but could be something else. There is usually a standard convention for accessing the properties: in Java we would see temp = aPerson.getName() and aPerson.setName("martin"), in C# it would temp = aPerson.name and aPerson.name = "martin".

A Property Object changes this by having the property return an object that wraps the actual value. So in VisualWorks when we ask for a name we get back a wrapping object. We then get the actual value by asking the wrapping object for its value. So accessing a person’s name would use temp = aPerson name value and aPerson name value: 'martin'

Property objects make the mapping between widgets and model a little easier. We just have to tell the widget what message to send to get the corresponding property, and the widget knows to access the proper value using value and value:. VisualWorks’s property objects also allow you to set up observers with the message onChangeSend: aMessage to: anObserver.

You won’t actually find a class called property object in Visual Works. Instead there were a number of classes that followed the value/value:/onChangeSend: protocol. The simplest is the ValueHolder – which just contains its value. More relevant to this discussion is the AspectAdaptor. The AspectAdaptor allowed a property object to wrap a property of another object completely. This way you could define a property object on a PersonUI class that wrapped a property on a Person object by code like

adaptor := AspectAdaptor subject: person
adaptor forAspect: #name
adaptor onChangeSend: #redisplay to: self

So let’s see how the application model fits into our running example.

Figure 9: Class diagram for visual works application model on the running example

The main difference between using an application model and classic MVC is that we now have an intermediate class between the domain model class (Reader) and the widget – this is the application model class. The widgets don’t access the domain objects directly – their model is the application model. Widgets are still broken down into views and controllers, but unless you’re building new widgets that distinction isn’t important.

When you assemble the UI you do so in a UI painter, while in that painter you set the aspect for each widget. The aspect corresponds to a method on the application model that returns a property object.

Figure 10: Sequence diagram showing how updating the actual value updates the variance text.

Figure 10 shows how the basic update sequence works. When I change a value in text field, that field then updates the value in the property object inside the application model. That update follows through to the underlying domain object, updating its actual value.

At this point the observer relationships kick in. We need to set things up so that updating the actual value causes the reading to indicate that it has changed. We do this by putting a call in the modifier for actual to indicate that the reading object has changed – in particular that the variance aspect has changed. When setting up the aspect adaptor for variance it’s easy to tell it to observe the reader, so it picks up the update message which it then forwards to its text field. The text field then initiates getting a new value, again through the aspect adaptor.

Using the application model and property objects like this helps us wire up the updates without having to write much code. It also supports fine-grained synchronization (which I don’t think is a good thing).

Application models allow us to separate behavior and state that’s particular to the UI from real domain logic. So one of the problems I mentioned earlier, holding the currently selected item in a list, can be solved by using a particular kind of aspect adaptor that wraps the domain model’s list and also stores the currently selected item.

The limitation of all this, however, is that for more complex behavior you need to construct special widgets and property objects. As an example the provided set of objects don’t provide a way to link the text color of the variance to the degree of variance. Separating the application and domain models does allow us to separate the decision making in the right way, but then to use widgets observing aspect adapters we need to make some new classes. Often this was seen as too much work, so we could make this kind of thing easier by allowing the application model to access the widgets directly, as in Figure 11.

Figure 11: Application Model updates colors by manipulating widgets directly.

Directly updating the widgets like this is not part of Presentation Model, which is why the visual works application model isn’t truly a Presentation Model. This need to manipulate the widgets directly was seen by many as a bit of dirty work-around and helped develop the Model-View-Presenter approach.

So now the soundbites on Application Model

  • Followed MVC in using Separated Presentation and Observer Synchronization.
  • Introduced an intermediate application model as a home for presentation logic and state – a partial development of Presentation Model.
  • Widgets do not observe domain objects directly, instead they observe the application model.
  • Made extensive use of Property Objects to help connect the various layers and to support the fine grained synchronization using observers.
  • It wasn’t the default behavior for the application model to manipulate widgets, but it was commonly done for complicated cases.

Model-View-Presenter (MVP)

MVP is an architecture that first appeared in IBM and more visibly at Taligent the 1990’s. It’s most commonly referred via the Potel paper. The idea was further popularized and described by the developers of Dolphin Smalltalk. As we’ll see the two descriptions don’t entirely mesh but the basic idea underneath it has become popular.

To approach MVP I find it helpful to think about a significant mismatch between two strands of UI thinking. On the one hand is the Forms and Controller architecture which was mainstream approach to UI design, on the other is MVC and its derivatives. The Forms and Controls model provides a design that is easy to understand and makes a good separation between reusable widgets and application specific code. What it lacks, and MVC has so strongly, is Separated Presentation and indeed the context of programming using a Domain Model. I see MVP as a step towards uniting these streams, trying to take the best from each.

The first element of Potel is to treat the view as structure of widgets, widgets that correspond to the controls of the Forms and Controls model and remove any view/controller separation. The view of MVP is a structure of these widgets. It doesn’t contain any behavior that describes how the widgets react to user interaction.

The active reaction to user acts lives in a separate presenter object. The fundamental handlers for user gestures still exist in the widgets, but these handlers merely pass control to the presenter.

The presenter then decides how to react to the event. Potel discusses this interaction primarily in terms of actions on the model, which it does by a system of commands and selections. A useful thing to highlight here is the approach of packaging all the edits to the model in a command – this provides a good foundation for providing undo/redo behavior.

As the Presenter updates the model, the view is updated through the same Observer Synchronization approach that MVC uses.

The Dolphin description is similar. Again the main similarity is the presence of the presenter. In the Dolphin description there isn’t the structure of the presenter acting on the model through commands and selections. There is also explicit discussion of the presenter manipulating the view directly. Potel doesn’t talk about whether presenters should do this or not, but for Dolphin this ability was essential to overcoming the kind of flaw in Application Model that made it awkward for me to color the text in the variation field.

One of the variations in thinking about MVP is the degree to which the presenter controls the widgets in the view. On one hand there is the case where all view logic is left in the view and the presenter doesn’t get involved in deciding how to render the model. This style is the one implied by Potel. The direction behind Bower and McGlashan was what I’m calling Supervising Controller, where the view handles a good deal of the view logic that can be describe declaratively and the presenter then comes in to handle more complex cases.

You can also move all the way to having the presenter do all the manipulation of the widgets. This style, which I call Passive View isn’t part of the original descriptions of MVP but got developed as people explored testability issues. I’m going to talk about that style later, but that style is one of the flavors of MVP.

Before I contrast MVP with what I’ve discussed before I should mention that both MVP papers here do this too – but not quite with the same interpretation I have. Potel implies that MVC controllers were overall coordinators – which isn’t how I see them. Dolphin talks a lot about issues in MVC, but by MVC they mean the VisualWorks Application Model design rather than classic MVC that I’ve described (I don’t blame them for that – trying to get information on classic MVC isn’t easy now let alone then.)

So now it’s time for some contrasts:

  • Forms and Controls: MVP has a model and the presenter is expected to manipulate this model with Observer Synchronizationthen updating the view. Although direct access to the widgets is allowed, this should be in addition to using the model not the first choice.
  • MVC: MVP uses a Supervising Controller to manipulate the model. Widgets hand off user gestures to the Supervising Controller. Widgets aren’t separated into views and controllers. You can think of presenters as being like controllers but without the initial handling of the user gesture. However it’s also important to note that presenters are typically at the form level, rather than the widget level – this is perhaps an even bigger difference.
  • Application Model: Views hand off events to the presenter as they do to the application model. However the view may update itself directly from the domain model, the presenter doesn’t act as a Presentation Model. Furthermore the presenter is welcome to directly access widgets for behaviors that don’t fit into the Observer Synchronization.

There are obvious similarities between MVP presenters and MVC controllers, and presenters are a loose form of MVC controller. As a result a lot of designs will follow the MVP style but use ‘controller’ as a synonym for presenter. There’s a reasonable argument for using controller generally when we are talking about handling user input.

Figure 12: Sequence diagram of the actual reading update in MVP.

Let’s look at an MVP (Supervising Controller) version of the ice-cream monitor ( Figure 12). It starts much the same as the Forms and Controls version – the actual text field raises an event when its text is changed, the presenter listens to this event and gets the new value of the field. At this point the presenter updates the reading domain object, which the variance field observes and updates its text with. The last part is the setting of the color for the variance field, which is done by the presenter. It gets the category from the reading and then updates the color of the variance field.

Here are the MVP soundbites:

  • User gestures are handed off by the widgets to a Supervising Controller.
  • The presenter coordinates changes in a domain model.
  • Different variants of MVP handle view updates differently. These vary from using Observer Synchronization to having the presenter doing all the updates with a lot of ground in-between.

Humble View

In the past few years there’s been a strong fashion for writing self-testing code. Despite being the last person to ask about fashion sense, this is a movement that I’m thoroughly immersed in. Many of my colleagues are big fans of xUnit frameworks, automated regression tests, Test-Driven Development, Continuous Integration and similar buzzwords.

When people talk about self-testing code user-interfaces quickly raise their head as a problem. Many people find that testing GUIs to be somewhere between tough and impossible. This is largely because UIs are tightly coupled into the overall UI environment and difficult to tease apart and test in pieces.

Sometimes this test difficulty is over-stated. You can often get surprisingly far by creating widgets and manipulating them in test code. But there are occasions where this is impossible, you miss important interactions, there are threading issues, and the tests are too slow to run.

As a result there’s been a steady movement to design UIs in such a way that minimizes the behavior in objects that are awkward to test. Michael Feathers crisply summed up this approach in The Humble Dialog Box. Gerard Meszaros generalized this notion to idea of a Humble Object – any object that is difficult to test should have minimal behavior. That way if we are unable to include it in our test suites we minimize the chances of an undetected failure.

The Humble Dialog Box paper uses a presenter, but in a much deeper way than the original MVP. Not just does the presenter decide how to react to user events, it also handles the population of data in the UI widgets themselves. As a result the widgets no longer have, nor need, visibility to the model; they form a Passive View, manipulated by the presenter.

This isn’t the only way to make the UI humble. Another approach is to use Presentation Model, although then you do need a bit more behavior in the widgets, enough for the widgets to know how to map themselves to the Presentation Model.

The key to both approaches is that by testing the presenter or by testing the presentation model, you test most of the risk of the UI without having to touch the hard-to-test widgets.

With Presentation Model you do this by having all the actual decision making made by the Presentation Model. All user events and display logic is routed to the Presentation Model, so that all the widgets have to do is map themselves to properties of the Presentation Model. You can then test most of the behavior of the Presentation Model without any widgets being present – the only remaining risk lies in the widget mapping. Provided that this is simple you can live with not testing it. In this case the screen isn’t quite as humble as with the Passive View approach, but the difference is small.

Since Passive View makes the widgets entirely humble, without even a mapping present, Passive View eliminates even the small risk present with Presentation Model. The cost however is that you need a Test Double to mimic the screen during your test runs – which is extra machinery you need to build.

A similar trade-off exists with Supervising Controller. Having the view do simple mappings introduces some risk but with the benefit (as with Presentation Model) of being able to specify simple mapping declaratively. Mappings will tend to be smaller for Supervising Controller than for Presentation Model as even complex updates will be determined by the Presentation Model and mapped, while a Supervising Controller will manipulate the widgets for complex cases without any mapping involved.



Further Reading

For recent articles that develop these ideas further, take a look at my bliki.


Acknowledgements

Vassili Bykov generously let me have a copy of Hobbes – his implementation of Smalltalk-80 version 2 (from the early 1980’s)which runs in modern VisualWorks. This provided me with a live example of Model-View-Controller which was extremely helpful in answering detailed questions of how it worked and how it was used in the default image. Many people in those days considered it impractical to use a virtual machine. I wonder what our prior selves would have thought to see me running Smalltalk 80 in a virtual machine written in VisualWorks running in the VisualWorks virtual machine on Windows XP running in a VMware virtual machine running on Ubuntu.

from:http://www.martinfowler.com/eaaDev/uiArchs.html

知名网站架构整理

Here are some of the favorite posts on HighScalability…

from:http://highscalability.com/all-time-favorites/

Tumblr:150亿月浏览量背后的架构挑战

导读:和许多新兴的网站一样,著名的轻博客服务 Tumblr 在急速发展中面临了系统架构的瓶颈。每天 5 亿次浏览量,峰值每秒 4 万次请求,每天 3TB 新的数据存储,超过 1000 台服务器,这样的情况下如何保证老系统平稳运行,平稳过渡到新的系统,Tumblr 正面临巨大的挑战。近日,HighScalability 网站的 Todd Hoff 采访了该公司的分布式系统工程师 Blake Matheny,撰文系统介绍了网站的架构,内容很有价值。我们也非常希望国内的公司和团队多做类似分享,贡献于社区的同时,更能提升自身的江湖地位,对招聘、业务发展都好处多多。

Tumblr 每月页面浏览量超过 150 亿次,已经成为火爆的博客社区。用户也许喜欢它的简约、美丽,对用户体验的强烈关注,或是友好而忙碌的沟通方式,总之,它深得人们的喜爱。

每月超过 30% 的增长当然不可能没有挑战,其中可靠性问题尤为艰巨。每天 5 亿次浏览量,峰值每秒 4 万次请求,每天 3TB 新的数据存储,并运行于超过 1000 台服务器上,所有这些帮助 Tumblr 实现巨大的经营规模。

创业公司迈向成功,都要迈过危险的迅速发展期这道门槛。寻找人才,不断改造基础架构,维护旧的架构,同时要面对逐月大增的流量,而且曾经只有 4 位工程师。这意味着必须艰难地选择应该做什么,不该做什么。这就是 Tumblr 的状况。好在现在已经有 20 位工程师了,可以有精力解决问题,并开发一些有意思的解决方案。

Tumblr 最开始是非常典型的 LAMP 应用。目前正在向分布式服务模型演进,该模型基于 ScalaHBaseRedis(著名开源K-V存储方案)、Kafka(Apache 项目,出自 LinkedIn 的分布式发布-订阅消息系统)、Finagle(由 Twitter 开源的容错、协议中立的 RPC 系统),此外还有一个有趣的基于 Cell 的架构,用来支持 Dashboard(CSDN 注:Tumblr 富有特色的用户界面,类似于微博的时间轴)。

Tumblr 目前的最大问题是如何改造为一个大规模网站。系统架构正在从 LAMP 演进为最先进的技术组合,同时团队也要从小的创业型发展为全副武装、随时待命的正规开发团队,不断创造出新的功能和基础设施。下面就是 Blake Matheny 对 Tumblr 系统架构情况的介绍。

网站地址

http://www.tumblr.com/

主要数据

  • 每天 5 亿次 PV(页面访问量)
  • 每月超过 150 亿 PV
  • 约 20 名工程师
  • 峰值请求每秒近 4 万次
  • 每天超过 1TB 数据进入 Hadoop 集群
  • MySQL/HBase/Redis/memcache 每天生成若干 TB 数据
  • 每月增长 30%
  • 近 1000 硬件节点用于生产环境
  • 平均每位工程师每月负责数以亿计的页面访问
  • 每天上传大约 50GB 的文章,每天跟帖更新数据大约2.7TB(CSDN 注:这两个数据的比例看上去不太合理,据 Tumblr 数据科学家 Adam Laiacano 在 Twitter 上解释,前一个数据应该指的是文章的文本内容和元数据,不包括存储在 S3 上的多媒体内容)

软件环境

  • 开发使用 OS X,生产环境使用 Linux(CentOS/Scientific)
  • Apache
  • PHP, Scala, Ruby
  • Redis, HBase, MySQL
  • Varnish, HAProxy, nginx
  • memcache, Gearman(支持多语言的任务分发应用框架), Kafka, Kestrel(Twitter 开源的分布式消息队列系统), Finagle
  • Thrift, HTTP
  • Func——一个安全、支持脚本的远程控制框架和 API
  • Git, Capistrano(多服务器脚本部署工具), Puppet, Jenkins

硬件环境

  • 500 台 Web 服务器
  • 200 台数据库服务器(47 pool,20 shard)
  • 30 台 memcache 服务器
  • 22 台 Redis 服务器
  • 15 台 Varnish 服务器
  • 25 台 HAproxy 节点
  • 8 台 nginx 服务器
  • 14 台工作队列服务器(Kestrel + Gearman)

架构

1. 相对其他社交网站而言,Tumblr 有其独特的使用模式:

  • 每天有超过 5 千万篇文章更新,平均每篇文章的跟帖又数以百计。用户一般只有数百个粉丝。这与其他社会化网站里少数用户有几百万粉丝非常不同,使得 Tumblr 的扩展性极具挑战性。
  • 按用户使用时间衡量,Tumblr 已经是排名第二的社会化网站。内容的吸引力很强,有很多图片和视频,文章往往不短,一般也不会太长,但允许写得很长。文章内容往往比较深入,用户会花费更长的时间来阅读。
  • 用户与其他用户建立联系后,可能会在 Dashboard 上往回翻几百页逐篇阅读,这与其他网站基本上只是部分信息流不同。
  • 用户的数量庞大,用户的平均到达范围更广,用户较频繁的发帖,这些都意味着有巨量的更新需要处理。

2. Tumblr 目前运行在一个托管数据中心中,已在考虑地域上的分布性。

3. Tumblr 作为一个平台,由两个组件构成:公共 Tumblelogs 和 Dashboard

  • 公共 Tumblelogs 与博客类似(此句请 Tumblr 用户校正),并非动态,易于缓存
  • Dashboard 是类似于 Twitter 的时间轴,用户由此可以看到自己关注的所有用户的实时更新。与博客的扩展性不同,缓存作用不大,因为每次请求都不同,尤其是活跃的关注者。而且需要实时而 且一致,文章每天仅更新 50GB,跟帖每天更新2.7TB,所有的多媒体数据都存储在 S3 上面。
  • 大多数用户以 Tumblr 作为内容浏览工具,每天浏览超过 5 亿个页面,70% 的浏览来自 Dashboard。
  • Dashboard 的可用性已经不错,但 Tumblelog 一直不够好,因为基础设施是老的,而且很难迁移。由于人手不足,一时半会儿还顾不上。

老的架构

Tumblr 最开始是托管在 Rackspace 上的,每个自定义域名的博客都有一个A记录。当 2007 年 Rackspace 无法满足其发展速度不得不迁移时,大量的用户都需要同时迁移。所以他们不得不将自定义域名保留在 Rackspace,然后再使用 HAProxy 和 Varnish 路由到新的数据中心。类似这样的遗留问题很多。

开始的架构演进是典型的 LAMP 路线:

  • 最初用 PHP 开发,几乎所有程序员都用 PHP
  • 最初是三台服务器:一台 Web,一台数据库,一台 PHP
  • 为了扩展,开始使用 memcache,然后引入前端 cache,然后在 cache 前再加 HAProxy,然后是 MySQL sharding(非常奏效)
  • 采用“在单台服务器上榨出一切”的方式。过去一年已经用C开发了两个后端服务:ID 生成程序Staircar(用 Redis 支持 Dashboard 通知)

Dashboard 采用了“扩散-收集”方式。当用户访问 Dashboard 时将显示事件,来自所关注的用户的事件是通过拉然后显示的。这样支撑了 6 个月。由于数据是按时间排序的,因此 sharding 模式不太管用。

新的架构

由于招人和开发速度等原因,改为以 JVM 为中心。目标是将一切从 PHP 应用改为服务,使应用变成请求鉴别、呈现等诸多服务之上的薄层。

这其中,非常重要的是选用了 Scala 和 Finagle

  • 在团队内部有很多人具备 Ruby 和 PHP 经验,所以 Scala 很有吸引力。
  • Finagle 是选择 Scala 的重要因素之一。这个来自 Twitter 的库可以解决大多数分布式问题,比如分布式跟踪、服务发现、服务注册等。
  • 转到 JVM 上之后,Finagle 提供了团队所需的所有基本功能(Thrift, ZooKeeper 等),无需再开发许多网络代码,另外,团队成员认识该项目的一些开发者。
  • Foursquare 和 Twitter 都在用 Finagle,Meetup 也在用 Scala。
  • 应用接口与 Thrift 类似,性能极佳。
  • 团队本来很喜欢 Netty(Java 异步网络应用框架,2月 4 日刚刚发布3.3.1最终版),但不想用 Java,Scala 是不错的选择。
  • 选择 Finagle 是因为它很酷,还认识几个开发者。

之所以没有选择 Node.js,是因为以 JVM 为基础更容易扩展。Node 的发展为时尚短,缺乏标准、最佳实践以及大量久经测试的代码。而用 Scala 的话,可以使用所有 Java 代码。虽然其中并没有多少可扩展的东西,也无法解决 5 毫秒响应时间、49秒 HA、4万每秒请求甚至有时每秒 40 万次请求的问题。但是,Java 的生态链要大得多,有很多资源可以利用。

内部服务从C/libevent 为基础正在转向 Scala/Finagle 为基础。

开始采用新的NoSQL 存储方案如 HBase 和 Redis。但大量数据仍然存储在大量分区的 MySQL 架构中,并没有用 HBase 代替 MySQL。HBase 主要支持短地址生产程序(数以十亿计)还有历史数据和分析,非常结实。此外,HBase 也用于高写入需求场景,比如 Dashboard 刷新时一秒上百万的写入。之所以还没有替换 HBase,是因为不能冒业务上风险,目前还是依靠人来负责更保险,先在一些小的、不那么关键的项目中应用,以获得经验。MySQL 和时间序列数据 sharding(分片)的问题在于,总有一个分片太热。另外,由于要在 slave 上插入并发,也会遇到读复制延迟问题。

此外,还开发了一个公用服务框架

  • 花了很多时间解决分布式系统管理这个运维问题。
  • 为服务开发了一种 Rails scaffolding,内部用模板来启动服务。
  • 所有服务从运维的角度来看都是一样的,所有服务检查统计数据、监控、启动和停止的方式都一样。
  • 工具方面,构建过程围绕 SBT(一个 Scala 构建工具),使用插件和辅助程序管理常见操作,包括在 Git 里打标签,发布到代码库等等。大多数程序员都不用再操心构建系统的细节了。

200台数据库服务器中,很多是为了提高可用性而设,使用的是常规硬件,但 MTBF(平均故障间隔时间)极低。故障时,备用充足。

为了支持 PHP 应用有 6 个后端服务,并有一个小组专门开发后端服务。新服务的发布需要两到三周,包 括 Dashboard 通知、Dashboard 二级索引、短地址生成、处理透明分片的 memcache 代理。其中在 MySQL 分片上耗时很多。虽然在纽约本地非常热,但并没有使用 MongoDB,他们认为 MySQL 的可扩展性足够了。

Gearman用于会长期运行无需人工干预的工作。

可用性是以达到范围(reach)衡量的。用户能够访问自定义域或者 Dashboard 吗?也会用错误率。

历史上总是解决那些最高优先级的问题,而现在会对故障模式系统地分析和解决,目的是从用户和应用的角度来定成功指标。(后一句原文似乎不全)

最开始 Finagle 是用于 Actor 模型的,但是后来放弃了。对于运行后无需人工干预的工作,使用任务队列。而且 Twitter 的 util 工具库中有 Future 实现,服务都是用 Future(Scala 中的无参数函数,在与函数关联的并行操作没有完成时,会阻塞调用方)实现的。当需要线程池的时候,就将 Future 传入 Future 池。一切都提交到 Future 池进行异步执行。

Scala 提倡无共享状态。由于已经在 Twitter 生产环境中经过测试,Finagle 这方面应该是没有问题的。使用 Scala 和 Finagle 中的结构需要避免可变状态,不使用长期运行的状态机。状态从数据库中拉出、使用再写回数据库。这样做的好处是,开发人员不需要操心线程和锁。

22台Redis服务器,每台的都有8-32个实例,因此线上同时使用了 100 多个 Redis 实例。

  • Redis 主要用于 Dashboard 通知的后端存储。
  • 所谓通知就是指某个用户 like 了某篇文章这样的事件。通知会在用户的 Dashboard 中显示,告诉他其他用户对其内容做了哪些操作。
  • 高写入率使 MySQL 无法应对。
  • 通知转瞬即逝,所以即使遗漏也不会有严重问题,因此 Redis 是这一场景的合适选择。
  • 这也给了开发团队了解 Redis 的机会。
  • 使用中完全没有发现 Redis 有任何问题,社区也非常棒。
  • 开发了一个基于 Scala Futures 的 Redis 接口,该功能现在已经并入了 Cell 架构。
  • 短地址生成程序使用 Redis 作为一级 Cache,HBase 作为永久存储。
  • Dashboard 的二级索引是以 Redis 为基础开发的。
  • Redis 还用作 Gearman 的持久存储层,使用 Finagle 开发的 memcache 代理。
  • 正在缓慢地从 memcache 转向 Redis。希望最终只用一个 cache 服务。性能上 Redis 与 memcache 相当。

内部的 firehose(通信管道)

  • 内部的应用需要活跃的信息流通道。这些信息包括用户创建/删除的信息,liking/unliking 的提示,等等。挑战在于这些数据要实时的分布式处理。我们希望能够检测内部运行状况,应用的生态系统能够可靠的生长,同时还需要建设分布式系统的控制中心。
  • 以前,这些信息是基于 Scribe (Facebook 开源的分布式日志系统。)/Hadoop 的分布式系统。服务会先记录在 Scribe 中,并持续的长尾形式写入,然后将数据输送给应用。这种模式可以立即停止伸缩,尤其在峰值时每秒要创建数以千计的信息。不要指望人们会细水长流式的发布文 件和 grep。
  • 内部的 firehose 就像装载着信息的大巴,各种服务和应用通过 Thrift 与消防管线沟通。(一个可伸缩的跨语言的服务开发框架。)
  • LinkedIn 的 Kafka 用于存储信息。内部人员通过 HTTP 链接 firehose。经常面对巨大的数据冲击,采用 MySQL 显然不是一个好主意,分区实施越来越普遍。
  • firehose 的模型是非常灵活的,而不像 Twitter 的 firehose 那样数据被假定是丢失的。
  • firehose 的信息流可以及时的回放。他保留一周内的数据,可以调出这期间任何时间点的数据。
  • 支持多个客户端连接,而且不会看到重复的数据。每个客户端有一个 ID。Kafka 支持客户群,每个群中的客户都用同一个 ID,他们不会读取重复的数据。可以创建多个客户端使用同一个 ID,而且不会看到重复的数据。这将保证数据的独立性和并行处理。Kafka 使用 ZooKeeper (Apache 推出的开源分布式应用程序协调服务。)定期检查用户阅读了多少。

为 Dashboard 收件箱设计的 Cell 架构

  • 现在支持 Dashboard 的功能的分散-集中架构非常受限,这种状况不会持续很久。
  • 解决方法是采用基于 Cell 架构的收件箱模型,与 Facebook Messages 非常相似。
  • 收件箱与分散-集中架构是对立的。每一位用户的 dashboard 都是由其追随者的发言和行动组成的,并按照时间顺序存储。
  • 就因为是收件箱就解决了分散-集中的问题。你可以会问到底在收件箱中放了些什么,让其如此廉价。这种方式将运行很长时间。
  • 重写 Dashboard 非常困难。数据已经分布,但是用户局部升级产生的数据交换的质量还没有完全搞定。
  • 数据量是非常惊人的。平均每条消息转发给上百个不同的用户,这比 Facebook 面对的困难还要大。大数据+高分布率+多个数据中心。
  • 每秒钟上百万次写入,5万次读取。没有重复和压缩的数据增长为2.7TB,每秒百万次写入操作来自 24 字节行键。
  • 已经流行的应用按此方法运行。
  • cell
  • 每个 cell 是独立的,并保存着一定数量用户的全部数据。在用户的 Dashboard 中显示的所有数据也在这个 cell 中。
  • 用户映射到 cell。一个数据中心有很多 cell。
  • 每个 cell 都有一个 HBase 的集群,服务集群,Redis 的缓存集群。
  • 用户归属到 cell,所有 cell 的共同为用户发言提供支持。
  • 每个 cell 都基于 Finagle(Twitter 推出的异步的远程过程调用库),建设在 HBase 上,Thrift 用于开发与 firehose 和各种请求与数据库的链接。(请纠错)
  • 一个用户进入 Dashboard,其追随者归属到特定的 cell,这个服务节点通过 HBase 读取他们的 dashboard 并返回数据。
  • 后台将追随者的 dashboard 归入当前用户的 table,并处理请求。
  • Redis 的缓存层用于 cell 内部处理用户发言。
  • 请求流:用户发布消息,消息将被写入 firehose,所有的 cell 处理这条消息并把发言文本写入数据库,cell 查找是否所有发布消息追随者都在本 cell 内,如果是的话,所有追随者的收件箱将更新用户的 ID。(请纠错
  • cell 设计的优点:
  • 大规模的请求被并行处理,组件相互隔离不会产生干扰。 cell 是一个并行的单位,因此可以任意调整规格以适应用户群的增长。
  • cell 的故障是独立的。一个 Cell 的故障不会影响其他 cell。
  • cell 的表现非常好,能够进行各种升级测试,实施滚动升级,并测试不同版本的软件。
  • 关键的思想是容易遗漏的:所有的发言都是可以复制到所有的 cell。
  • 每个 cell 中存储的所有发言的单一副本。 每个 cell 可以完全满足 Dashboard 呈现请求。应用不用请求所有发言者的 ID,只需要请求那些用户的 ID。(“那些用户”所指不清,请指正。)他可以在 dashboard 返回内容。每一个 cell 都可以满足 Dashboard 的所有需求,而不需要与其他 cell 进行通信。
  • 用到两个 HBase table :一个 table 用于存储每个发言的副本,这个 table 相对较小。在 cell 内,这些数据将与存储每一个发言者 ID。第二个 table 告诉我们用户的 dashboard 不需要显示所有的追随者。当用户通过不同的终端访问一个发言,并不代表阅读了两次。收件箱模型可以保证你阅读到。
  • 发言并不会直接进入到收件箱,因为那实在太大了。所以,发言者的 ID 将被发送到收件箱,同时发言内容将进入 cell。这个模式有效的减少了存储需求,只需要返回用户在收件箱中浏览发言的时间。而缺点是每一个 cell 保存所有的发言副本。令人惊奇的是,所有发言比收件箱中的镜像要小。(请纠错)每天每个 cell 的发言增长 50GB,收件箱每天增长2.7TB。用户消耗的资源远远超过他们制造的。
  • 用户的 dashboard 不包含发言的内容,只显示发言者的 ID,主要的增长来自 ID。(请 Tumblr 用户纠错)
  • 当追随者改变时,这种设计方案也是安全的。因为所有的发言都保存在 cell 中了。如果只有追随者的发言保存在 cell 中,那么当追随者改变了,将需要一些回填工作。
  • 另外一种设计方案是采用独立的发言存储集群。这种设计的缺点是,如果群集出现故障,它会影响整个网站。因此,使用 cell 的设计以及后复制到所有 cell 的方式,创建了一个非常强大的架构。
  • 一个用户拥有上百万的追随者,这带来非常大的困难,有选择的处理用户的追随者以及他们的存取模式(见 Feeding Frenzy
  • 不同的用户采用不同并且恰当的存取模式和分布模型,两个不同的分布模式包括:一个适合受欢迎的用户,一个使用大众。
  • 依据用户的类型采用不同的数据处理方式,活跃用户的发言并不会被真正发布,发言将被有选择的体现。(果真如此?请 Tumblr 用户纠错)
  • 追随了上百万用户的用户,将像拥有上百万追随者的用户那样对待。
  • cell 的大小非常难于决定。cell 的大小直接影响网站的成败。每个 cell 归于的用户数量是影响力之一。需要权衡接受怎样的用户体验,以及为之付出多少投资。
  • 从 firehose 中读取数据将是对网络最大的考验。在 cell 内部网络流量是可管理的。
  • 当更多 cell 被增添到网络中来,他们可以进入到 cell 组中,并从 firehose 中读取数据。一个分层的数据复制计划。这可以帮助迁移到多个数据中心。

在纽约启动运作

  • 纽约具有独特的环境,资金和广告充足。招聘极具挑战性,因为缺乏创业经验。
  • 在过去的几年里,纽约一直致力于推动创业。纽约大学和哥伦比亚大学有一些项目,鼓励学生到初创企业实习,而不仅仅去华尔街。市长建立了一所学院,侧重于技术。

团队架构

  • 团队:基础架构,平台,SRE,产品,web ops,服务;
  • 基础架构:5层以下,IP 地址和 DNS,硬件配置;
  • 平台:核心应用开发,SQL 分片,服务,Web 运营;
  • SRE:在平台和产品之间,侧重于解决可靠性和扩展性的燃眉之急;
  • 服务团队:相对而言更具战略性,
  • Web ops:负责问题检测、响应和优化。

软件部署

  • 开发了一套 rsync 脚本,可以随处部署 PHP 应用程序。一旦机器的数量超过 200 台,系统便开始出现问题,部署花费了很长时间才完成,机器处于部署进程中的各种状态。
  • 接下来,使用 Capistrano(一个开源工具,可以在多台服务器上运行脚本)在服务堆栈中构建部署进程(开发、分期、生产)。在几十台机器上部署可以正常工作,但当通过 SSH 部署到数百台服务器时,再次失败。
  • 现在,所有的机器上运行一个协调软件。基于 Redhat Func(一个安全的、脚本化的远程控制框架和接口)功能,一个轻量级的 API 用于向主机发送命令,以构建扩展性。
  • 建立部署是在 Func 的基础上向主机发送命令,避免了使用 SSH。比如,想在组A上部署软件,控制主机就可以找出隶属于组A的节点,并运行部署命令。
  • 部署命令通过 Capistrano 实施。Func API 可用于返回状态报告,报告哪些机器上有这些软件版本。
  • 安全重启任何服务,因为它们会关闭连接,然后重启。
  • 在激活前的黑暗模式下运行所有功能。

展望

  • 从哲学上将,任何人都可以使用自己想要的任意工具。但随着团队的发展壮大,这些工具出现了问题。新员工想要更好地融入团队,快速地解决问题,必须以他们为中心,建立操作的标准化。
  • 过程类似于 Scrum(一种敏捷管理框架),非常敏捷。
  • 每个开发人员都有一台预配置的开发机器,并按照控制更新。
  • 开发机会出现变化,测试,分期,乃至用于生产。
  • 开发者使用 VIM 和 TextMate。
  • 测试是对 PHP 程序进行代码审核。
  • 在服务方面,他们已经实现了一个与提交相挂钩的测试基础架构,接下来将继承并内建通知机制。

招聘流程

  • 面试通常避免数学、猜谜、脑筋急转弯等问题,而着重关注应聘者在工作中实际要做什么。
  • 着重编程技能。
  • 面试不是比较,只是要找对的人。
  • 挑战在于找到具有可用性、扩展性经验的人才,以应对 Tumblr 面临的网络拥塞。
  • 在 Tumblr 工程博客(Tumblr Engineering Blog),他们对已过世的 Dennis Ritchie 和 John McCarthy 予以纪念。

经验及教训

  • 自动化无处不在
  • MySQL(增加分片)规模,应用程序暂时还不行
  • Redis 总能带给人惊喜
  • 基于 Scala 语言的应用执行效率是出色的
  • 废弃项目——当你不确定将如何工作时
  • 不顾用在他们发展经历中没经历过技术挑战的人,聘用有技术实力的人是因为他们能适合你的团队以及工作。
  • 选择正确的软件集合将会帮助你找到你需要的人
  • 建立团队的技能
  • 阅读文档和博客文章。
  • 多与同行交流,可以接触一些领域中经验丰富的人,例如与在 Facebook、Twitter、LinkedIn 的工程师多交流,从他们身上可以学到很多
  • 对技术要循序渐进,在正式投入使用之前他们煞费苦心的学习 HBase 和 Redis。同时在试点项目中使用或将其控制在有限损害范围之内。

from:

Tumblr:150亿月浏览量背后的架构挑战(上)

Tumblr:150亿月浏览量背后的架构挑战(下)

英文原文:High Scalability

百万级访问网站前期的技术准备

开了自己域名的博客,第一篇就得来个重磅一点的才对得起这4美金的域名。作为一个技术从业者十年,逛了十年发现有些知识东一榔头西一棒槌的得满世界看个遍才整理出个头绪,那咱就系统点的从头一步一步的说,一个从日几千访问的小小网站,到日访问一两百万的小网站,怎么才能让它平滑的度过这个阶段,别在技术上出现先天不足,写给一些技术人员,也写给不懂技术的创业者。

转载请注明出自 http://zhiyi.us ,假如您还想从这转到好文章的话。

对互联网有了解的人都有自己的想法,有人就把想法付诸实现,做个网站然后开始运营。其实从纯网站技术上来说,因为开源模式的发展,现在建一个小网站已经很简单也很便宜。当访问量到达一定数量级的时候成本就开始飙升了,问题也开始显现了。因为带宽的增加、硬件的扩展、人员的扩张所带来的成本提高是显而易见的,而还有相当大的一部分成本是因为代码重构、架构重构,甚至底层开发语言更换引起的,最惨的就是数据丢失,辛辛苦苦好几年,一夜回到创业前。

减少成本就是增加利润。很多事情,我们在一开始就可以避免,先打好基础,往后可以省很多精力,少操很多心。

假设你是一个参与创业的技术人员,当前一穷二白,什么都要自己做,自己出钱,初期几十万的资金,做一个应用不是特别复杂的网站,那么就要注意以下几点:

一、开发语言

一般来说,技术人员(程序员)创业都是根据自己技术背景选择自己最熟悉的语言,不过考虑到不可能永远是您一个人写程序,这点还得仔细想想。无论用什么语言,最终代码质量是看管理,所以我们还是从纯语言层面来说实际一点。现在流行的javaphp.netpythonruby都有自己的优劣,python和ruby,现在人员还是相对难招一些,性能优化也会费些力气,.net平台买不起windows server。java、php用的还是最多。对于初期,应用几乎都是靠前端支撑的网站来说,php的优势稍大一些,入门简单、设计模式简单、写起来快、性能足够等,不过不注重设计模式也是它的劣势,容易变得松散,隐藏bug稍多、难以维护。java的优势在于整套管理流程已经有很多成熟工具来辅助,强类型也能避免一些弱智BUG,大多数JAVA程序员比较注重设计模式,别管实不实际,代码格式看起来还是不错的。这也是个劣势,初学者可能太注重模式而很难解决实际需求。

前端不只是html、css这类。整个负责跟用户交互的部分都是前端,包括处理程序。这类程序还是建议用php,主要原因就是开发迅速、从业人员广泛。至于后端例如行为分析、银行接口、异步消息处理等,随便用什么程序,那个只能是根据不同业务需求来选择不同语言了。

二、代码版本管理

如果开发人员之间的网络速度差不多,就SVN;比较分散例如跨国,就hg。大多数人还是svn的.

假设选了svn,那么有几点考虑。一是采用什么树结构。初期可能只有一条主干,往后就需要建立分支,例如一条开发分支,一条上线分支,再往后,可能要每个小组一个分支。建议一开始人少时选择两条分支,开发和线上,每个功能本地测试无误后提交到开发分支,最后统一测试,可以上线时合并到上线分支。如果喜欢把svn当做移动硬盘用,写一点就commit一次也无所谓,就是合并的时候头大一些,这些人可以自己建个分支甚至建立个本地代码仓库,随便往自己的分支提交,测试完毕后再提交到开发分支上。

部署,可以手工部署也可以自动部署。手工部署相对简单,一般是直接在服务器上svn update,或者找个新目录svn checkout,再把web root给ln -s过去。应用越复杂,部署越复杂,没有什么统一标准,只要别再用ftp上传那种形式就好,一是上传时文件引用不一致错误率增加,二是很容易出现开发人员的版本跟线上版本不一致,导致本来想改个错字结果变成回滚的杯具。如果有多台服务器还是建议自动部署,更换代码的机器从当前服务池中临时撤出,更新完毕后再重新加入。

不管项目多小,养成使用版本管理的好习惯,最起码还可以当做你的备份,我的 http://zhiyi.us 虽然就是一个wordpress,可还是svn了,只改动一两句css那也是劳动成果。

三、服务器硬件

别羡慕大客户和有钱人,看看机房散户区,一台服务器孤独的支撑的网站数不清。如果资金稍微充足,建议至少三台的标准配置,分别用作web处理、数据库、备份。web服务器至少要8G内存,双sata raid1,如果经济稍微宽松,或静态文件或图片多,则15k sas raid1+0。数据库至少16G内存,15k sas raid 1+0。备份服务器最好跟数据库服务器同等配置。硬件可以自己买品牌的底板,也就是机箱配主板和硬盘盒,CPU内存硬盘都自己配,也可以上整套品牌,也可以兼容机。三台机器,市场行情6、7万也就配齐了。

web服务器可以既跑程序又当内存缓存,数据库服务器则只跑主数据库(假如是MySQL的话),备份服务器干的活就相对多一些,web配置、缓存配置、数据库配置都要跟前两台一致,这样WEB和数据库任意一台出问题,把备份服务器换个ip就切换上去了。备份策略,可以drbd,可以rsync,或者其他的很多很多的开源备份方案可选择。rsync最简单,放cron里自己跑就行。备份和切换,建议多做测试,选最安全最适合业务的,并且尽可能异地备份。

四、机房

三种机房尽量不要选:联通访问特别慢的电信机房、电信访问特别慢的联通机房、电信联通访问特别慢的移动或铁通机房。那网通机房呢?亲,网通联通N久以前合并改叫联通了。多多寻找,实地参观,多多测试,多方打探,北京、上海、广州等各个主节点城市,还是有很多优质机房的,找个网络质量好,管理严格的机房,特别是管理要严格,千万别网站无法访问了,打个电话过去才知道别人维护时把你网线碰掉了,这比DOS都头疼。自己扯了几根光纤就称为机房的,看您抗风险程度和心理素质了。机房可以说是非常重要,直接关系到网站访问速度,网站访问速度直接关系到用户体验,我可以翻墙看风景,但买个网游vpn才能打开你这个还不怎么知名的网站就有难度了。或许您网站的ajax很出色,可是document怎么也不ready,一些代码永远绝缘于用户。

五、架构

初期架构一般比较简单,web负载均衡+数据库主从+缓存+分布式存储+队列。大方向上也确实就这几样东西,细节上也无数文章都重复过了,按照将来会有N多WEB,N多主从关系,N多缓存,N多xxx设计就行,基本方案都是现成的,只是您比其他人厉害之处就在于设计上考虑到缓存失效时的雪崩效应、主从同步的数据一致性和时间差、队列的稳定性和失败后的重试策略、文件存储的效率和备份方式等等意外情况。缓存总有一天会失效,数据库复制总有一天会断掉,队列总有一天会写不进去,电源总有一天会烧坏。根据墨菲定律,如果不考虑这些,网站早晚会成为茶几。

六、服务器软件

Linux、nginx、php、mysql,几乎是标配,我们除了看名字,还得选版本。Linux发行版众多,只要没特殊要求,就选个用的人最多的,社区最活跃的,配置最方便的,软件包最全最新的,例如debianubuntu。至于RHEL之类的嘛,你用只能在RHEL上才能运行的软件么?剩下的nginx、php、mysql、activemq、其他的等等,除非你改过这些软件或你的程序真的不兼容新版本,否则尽量版本越新越好,版本新,意味着新特性增多、BUG减少、性能增加。总有些道听途说的人跟你说老的版本稳定。所谓稳定,是相对于特殊业务来说的,而就一个php写的网站,大多数人都没改过任何服务器软件源代码,绝大多数情况是能平稳的升级到新版本的。类似于jdk5到jdk6,python2到python3这类变动比较大的升级还是比较少见的。看看ChangeLog,看看升级说明,结合自己情况评估一下,越早升级越好,别人家都用php6写程序了这边还php4的逛游呢。优秀的开源程序升级还是很负责任的,看好文档,别怕。

以上这六点准备完毕,现在我们有了运行环境,有了基本架构骨架,有了备份和切换方案,应该开始着手设计开发方面的事情了。开发方面的事情无数,下一篇会先说一些重点。

七、数据库

几乎所有操作最后都要落到数据库身上,它又最难扩展(存储也挺难)。对于mysql,什么样的表用myisam,什么样的表用innodb,在开发之前要确定。复制策略、分片策略,也要确定。表引擎方面,一般,更新不多、不需要事务的表可以用myisam,需要行锁定、事务支持的,用innodb。myisam的锁表不一定是性能低下的根源,innodb也不一定全是行锁,具体细节要多看相关的文档,熟悉了引擎特性才能用的更好。现代WEB应用越来越复杂了,我们设计表结构时常常设计很多冗余,虽然不符合传统范式,但为了速度考虑还是值得的,要求高的情况下甚至要杜绝联合查询。编程时得多注意数据一致性。 复制策略方面,多主多从结构也最好一开始就设计好,代码直接按照多主多从来编写,用一些小技巧来避免复制延时问题,并且还要解决多数据库数据是否一致,可以自己写或者找现成的运维工具。

分片策略。总会有那么几个表数据量超大,这时分片必不可免。分片有很多策略,从简单的分区到根据热度自动调整,依照具体业务选择一个适合自己的。避免自增ID作为主键,不利于分片。

用存储过程是比较难扩展的,这种情形多发生于传统C/S,特别是OA系统转换过来的开发人员。低成本网站不是一两台小型机跑一个数据库处理所有业务的模式,是机海作战。方便水平扩展比那点预分析时间和网络传输流量要重要的多的多。

NoSQL。这只是一个概念。实际应用中,网站有着越来越多的密集写操作、上亿的简单关系数据读取、热备等,这都不是传统关系数据库所擅长的,于是就产生了很多非关系型数据库,比如Redis/TC&TT/MongoDB/Memcachedb等,在测试中,这些几乎都达到了每秒至少一万次的写操作,内存型的甚至5万以上。例如MongoDB,几句配置就可以组建一个复制+自动分片+failover的环境,文档化的存储也简化了传统设计库结构再开发的模式。很多业务是可以用这类数据库来替代mysql的。

八、缓存。

数据库很脆弱,一定要有缓存在前面挡着,其实我们优化速度,几乎就是优化缓存,能用缓存的地方,就不要再跑到后端数据库那折腾。缓存有持久化缓存、内存缓存,生成静态页面是最容易理解的持久化缓存了,还有很多比如varnish的分块缓存、前面提到的memcachedb等,内存缓存,memcached首当其冲。缓存更新可用被动更新和主动更新。被动更新的好处是设计简单,缓存空了就自动去数据库取数据再把缓存填上,但容易引发雪崩效应,一旦缓存大面积失效,数据库的压力直线上升很可能挂掉。主动缓存可避免这点但是可能引发程序取不到数据的问题。这两者之间如何配合,程序设计要多动脑筋。

九、队列。

用户一个操作很可能引发一系列资源和功能的调动,这些调动如果同时发生,压力无法控制,用户体验也不好,可以把这样一些操作放入队列,由另几个模块去异步执行,例如发送邮件,发送手机短信。开源队列服务器很多,性能要求不高用数据库当做队列也可以,只要保证程序读写队列的接口不变,底层队列服务可随时更换就可以,类似Zend Framework里的Zend_Queue类,java.util.Queue接口等。

十、文件存储。

除了结构化数据,我们经常要存放其他的数据,像图片之类的。这类数据数量繁多、访问量大。典型的就是图片,从用户头像到用户上传的照片,还要生成不同的缩略图尺寸。存储的分布几乎跟数据库扩展一样艰难。不使用专业存储的情况下,基本都是靠自己的NAS。这就涉及到结构。拿图片存储举例,图片是非常容易产生热点的,有些图片上传后就不再有人看,有些可能每天被访问数十万次,而且大量小文件的异步备份也很耗费时间。

为了将来图片走cdn做准备,一开始最好就将图片的域名分开,且不用主域名。很多网站都将cookie设置到了.domain.ltd,如果图片也在这个域名下,很可能因为cookie而造成缓存失效,并且占多余流量,还可能因为浏览器并发线程限制造成访问缓慢。

如果用普通的文件系统存储图片,有一个简单的方法。计算文件的hash值,比如md5,以结果第一位作为第一级目录,这样第一级有16个目录。从0到F,可以把这个字母作为域名,0.yourimg.com到f.yourimg.com(客户端dns压力会增大),还可以扩展到最多16个NAS集群上。第二级可用年月例如,201011,第三级用日,第四级可选,根据上传量,比如am/pm,甚至小时。最终的目录结构可能会是 e/201008/25/am/e43ae391c839d82801920cf.jpg。rsync备份时可以用脚本只同步某年某日某时的文件,避免计算大量文件带来的开销。当然最好是能用专门的分布式文件系统或更专业点的存储解决方案。

下面,我们要谈谈代码了。

这一系列的最后一篇写给普通编程人员,如果不感兴趣可直接看本文最后几段。

开始设计代码结构之前,先回顾一下之前准备过的事情:我们有负载均衡的WEB服务器,有主从DB服务器并可能分片,有缓存,有可扩展的存储。在组织代码的各个方面,跟这些准备息息相关,我一二三的列出来分别说,并且每一条都以“前面讲到”这个经典句式开头,为了方便对照。 别着急看经典句式,我思维跳跃了,插一段。实际开发中,我们总会在性能和代码优雅性上作折中。对于当今的计算机和语言解释器,多几层少几层对象调用、声明变量为Map还是HashMap这种问题是最后才需要考虑的问题,永远要考虑系统最慢的部分,从最慢的部分解决。例如看看你用的ORM是不是做了很多你用不到的事情,是不是有重复的数据调用。我们做的是web应用开发,不是底层框架API,代码易读易懂是保证质量很重要的一方面,你的程序是为了什么而设计,有不同的方法……算了,这个话题另起一篇文章来说,扯远了,想交流可关注我的微博 http://t.sina.com.cn/liuzhiyi,咱继续……

前面讲到,WEB服务器是要做负载均衡的,图片服务器是要分开的。对于这点,代码在处理客户端状态时,不要把状态放到单机上,举例,不要用文件session,嗯,常识。如果有可能,最好在一开始就做好用户单点认证的统一接口,包括跨域如何判断状态、静态页面如何判断状态,需要登录时的跳转和返回参数定义,底层给好接口,应用层直接就用(可参考GAE的user服务)。登录方面的设计要考虑移动设备的特性,比如电脑可以用浮动层窗口,但NOKIA自带的浏览器或UCWEB就无法处理这种表现形式,程序一定既能处理AJAX请求又能直接通过URL来处理请求。图片服务器分开,资源文件最好也布局到图片服务器,也就是WEB服务器只服务动态程序。虽然开发测试时稍微复杂(因为需要绝对URI才能访问),但将来页面前端优化上会轻松许多,并且你的WEB服务器IO优化也轻松许多。程序引用资源文件时,要有一个统一的处理方法,在方法内部可以自动完成很多事情,例如将css/js根据组合,拼成一个文件,或者自动在生成的URI后面加上QUERYSTRING,如果将来前端用了缓存服务,那生成QUERYSTRING是最简单的刷新服务端缓存和客户端缓存的办法。

前面讲到,数据库会有复制,可能会多主多从,可能会分片。我们程序在处理数据的过程中,最好能抽象出来单独放做一层。拿现在流行的MVC模式来说,就是在M层下方再放一个数据层,这个数据层不是通常所说的JDBC/PDO/ActiveRecord等,而是你自己的存取数据层,仅对外暴露方法,隐藏数据存取细节。这个数据层内部不要怕写的难看,但一定要提供所有的数据存储功能,其他任何层次不要看到跟数据库打交道的字眼。之所以这样做,是因为在单关系数据库的情况下,可能会SELECT…JOIN…或直接INSERT…INTO…,可你可能会将一些表放到key-value数据库里存储,或者分片,这么做之后原来的语句和方式要全部改变,如果过于分散,则移植时会耗费很大精力,或得到一个很大的Model。在数据层面的设计上,尽量避免JOIN查询,我们可以多做冗余,多做缓存,每种数据尽量只需要一次查询,然后在你的程序里面进行组合。对于比较复杂的数据组合,在实时性要求不高的情况下,可采用异步处理,用户访问时只取处理后的结果。在对于主键的处理上,避免使用自增ID,可以用一定规则生成的唯一值当做主键,这种主键是最简单的分片分布策略。即使用自增ID,也最好用一个自增ID发生器,否则从数据库不小心被写了一下,那主键很容易冲突。

前面讲到,咱数据库前面还有某些缓存挡着。别把mysql的query cache当缓存,应用稍复杂的时候QUERY CACHE反而会成为累赘。缓存跟数据库和业务结合的很紧密,正因为跟业务关系紧密,所以这点没有放之四海而皆准的方法。但我们还是有一些规则可参照。规则一:越接近前端,缓存的颗粒度越大。例如在WEB最前端缓存整个页面,再往后一层缓存部分页面区域,再往后缓存区域内的单条记录。因为越靠近后端,我们的可操作性越灵活,并且变化最多的前端代码也比较方便编写。在实践中,因为产品需求变化速度非常快,迭代周期越来越短,有时很难将Controller和Model分的那么清楚,Controller层面处理部分缓存必不可免,但要保证如果出现这种情况,Controller所操作的缓存一定不要影响其他数据需求方,也就是要保证这个缓存数据只有这一个Controller在用。规则二:没有缓存时程序不能出错。在不考虑缓存失效引发的雪崩效应时,你的程序要有缓存跟没缓存一个样,不能像新浪微博一样,缓存一失效,粉丝微博全空,整个应用都乱套了。在缓存必不可少的情况下,给用户出错信息都比给一个让人误解的信息强。规则三,缓存更新要保证原子性或称作线程安全,特别是采用被动缓存的方式时,很可能两个用户访问时导致同一个缓存被更新,通常情况这不是大问题,可缓存失效后重建时很可能是引发连锁反应的原因之一。规则四:缓存也是有成本的。不只是技术成本,还有人工时间成本。如果一个功能使用缓存和不使用,在可预见的访问量情况下区别微小,但使用缓存会使复杂度增加,那就不用,我们可以加个TODO标注,在下次迭代的时候加上缓存处理。

前面讲到,文件存储是独立的,那么所有的文件操作就都是远程调用。可以在文件服务器上提供一个很简单的RESTful接口,也可以提供xmlrpc或json serveice,WEB服务器端所生成和处理的文件,全部通过接口通知文件服务器去处理,WEB服务器本身不要提供任何文件存储。你会发现很多大网站的上传图片跟保存文章是分两步完成的,就是基于这个原因。

以上几条“前面讲到”,其实无数人都讲过,我也只是结合前几篇文章用自己的话重复了一遍,真正分析起来精髓很简单——除了良好的功能逻辑分层,我们还要为数据库存储、缓存、队列、文件服务等程序外层资源调用单独设计接口,你可以把你的程序想象成是运行在 Amazon EC2 上并用他的所有web service服务,你的数据库就是它的SimpleDB,你的队列就是他的SQS,你的存储就是他的S3,唯一不同是amazon的接口是远程调用,你的是内部调用。

将支撑服务接口化,意味着将MySQL更换到PostgreSQL不需要更改业务处理程序,移植团队甚至不需要跟业务开发团队过多沟通;意味着业务开发团队是对接口编程而不是对数据库编程;意味着不会因为某个业务开发人员的失误而拖垮性能。

对程序扫盲不感兴趣的直接看这里——

产品设计完了,程序框架搭完了,可能有矛盾在这个节骨眼儿产生了。不断有产品设计抱怨说他的创意没实现到预期效果,有程序员抱怨说产品设计不切实际。这种抱怨多缘于产品人员不懂技术,技术人员不理解产品。从广义上来讲,产品包含市场策略、营销手段、功能设计,产品和技术在争论时往往把焦点放在功能上,而实际重点是,实现这个功能所消耗的成本跟能这个功能带来的利益能否换算,能否取其轻重。若可以,争议解决。若不能,则抛硬币看运气。因为一个功能的加强而引发指标井喷,或因项目拖延而导致贻误战机的例子比比皆是。激进的决策者注重利益,保守的决策者注重损失,聪明的决策者会考虑这个问题是否真的那么严重。

关系到未来的事情谁都说不准,要不怎么说创业一半靠运气呢。不过总有能说的准的事情,那就得靠数据说话。

没有100%也有99.9%的网站安装了访问统计代码,连我的 http://zhiyi.us 也不例外,新闻联播也总说科学决策科学发展的。有了统计,能确定的事情就很多了。例如,可以根据来源-目标转化率来分析哪类渠道的人均获取成本低,根据来源-内容访问猜测用户跳出率原因,根据用户点击行为判断链接位置是否合理等。将数据以不同方式组合起来,找到内在联系,分析内因外因,制定对应策略,减少拍脑门决策。靠数据支撑运营是个非常专业的事情,虽然不懂深奥的数学模型不会复杂的公式计算,渐渐学会因为A所以B,因为A和B所以C还是相对简单的。

全系列完毕。老话,大半夜连抽烟带码字的挺伤身,转载请注明出处.

from:

http://zhiyi.us/internet/thinking-twice-before-building-your-site-one.html

http://zhiyi.us/internet/thinking-twice-before-building-your-site-two.html

http://zhiyi.us/internet/thinking-twice-before-building-your-site-final.html