Category Archives: 高并发

Understanding transaction pitfalls

The most common reason for using transactions in an application is to maintain a high degree of data integrity and consistency. If you’re unconcerned about the quality of your data, you needn’t concern yourself with transactions. After all, transaction support in the Java platform can kill performance, introduce locking issues and database concurrency problems, and add complexity to your application.

But developers who don’t concern themselves with transactions do so at their own peril. Almost all business-related applications require a high degree of data quality. The financial investment industry alone wastes tens of billions of dollars on failed trades, with bad data being the second-leading cause. Although lack of transaction support is only one factor leading to bad data (albeit a major one), a safe inference is that billions of dollars are wasted in the financial investment industry alone as a result of nonexistent or poor transaction support.

Ignorance about transaction support is another source of problems. All too often I hear claims like “we don’t need transaction support in our applications because they never fail.” Right. I have witnessed some applications that in fact rarely or never throw exceptions. These applications bank on well-written code, well-written validation routines, and full testing and code coverage support to avoid the performance costs and complexity associated with transaction processing. The problem with this type of thinking is that it takes into account only one characteristic of transaction support: atomicity. Atomicity ensures that all updates are treated as a single unit and are either all committed or all rolled back. But rolling back or coordinating updates isn’t the only aspect of transaction support. Another aspect, isolation, ensures that one unit of work is isolated from other units of work. Without proper transaction isolation, other units of work can access updates made by an ongoing unit of work, even though that unit of work is incomplete. As a result, business decisions might be made on the basis of partial data, which could cause failed trades or other negative (or costly) outcomes.

So, given the high cost and negative impact of bad data and the basic knowledge that transactions are important (and necessary), you need to use transactions and learn how to deal with the issues that can arise. You press on and add transaction support to your applications. And that’s where the problem often begins. Transactions don’t always seem to work as promised in the Java platform. This article is an exploration of the reasons why. With the help of code examples, I’ll introduce some of the common transaction pitfalls I continually see and experience in the field, in most cases in production environments.

Although most of this article’s code examples use the Spring Framework (version 2.5), the transaction concepts are the same as for the EJB 3.0 specification. In most cases, it is simply a matter of replacing the Spring Framework @Transactional annotation with the @TransactionAttribute annotation found in the EJB 3.0 specification. Where the two frameworks differ in concept and technique, I have included both Spring Framework and EJB 3.0 source code examples.

Local transaction pitfalls

A good place to start is with the easiest scenario: the use of local transactions, also commonly referred to as database transactions. In the early days of database persistence (for example, JDBC), we commonly delegated transaction processing to the database. After all, isn’t that what the database is supposed to do? Local transactions work fine for logical units of work (LUW) that perform a single insert, update, or delete statement. For example, consider the simple JDBC code in Listing 1, which performs an insert of a stock-trade order to a TRADE table:

Listing 1. Simple database insert using JDBC
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
@Stateless
public class TradingServiceImpl implements TradingService {
   @Resource SessionContext ctx;
   @Resource(mappedName="java:jdbc/tradingDS") DataSource ds;
   public long insertTrade(TradeData trade) throws Exception {
      Connection dbConnection = ds.getConnection();
      try {
         Statement sql = dbConnection.createStatement();
         String stmt =
            "INSERT INTO TRADE (ACCT_ID, SIDE, SYMBOL, SHARES, PRICE, STATE)"
          + "VALUES ("
          + trade.getAcct() + "','"
          + trade.getAction() + "','"
          + trade.getSymbol() + "',"
          + trade.getShares() + ","
          + trade.getPrice() + ",'"
          + trade.getState() + "')";
         sql.executeUpdate(stmt, Statement.RETURN_GENERATED_KEYS);
         ResultSet rs = sql.getGeneratedKeys();
         if (rs.next()) {
            return rs.getBigDecimal(1).longValue();
         } else {
            throw new Exception("Trade Order Insert Failed");
         }
      } finally {
         if (dbConnection != null) dbConnection.close();
      }
   }
}

The JDBC code in Listing 1 includes no transaction logic, yet it persists the trade order in the TRADE table in the database. In this case, the database handles the transaction logic.

This is all well and good for a single database maintenance action in the LUW. But suppose you need to update the account balance at the same time you insert the trade order into the database, as shown in Listing 2:

Listing 2. Performing multiple table updates in the same method
1
2
3
4
5
6
7
8
9
10
public TradeData placeTrade(TradeData trade) throws Exception {
   try {
      insertTrade(trade);
      updateAcct(trade);
      return trade;
   } catch (Exception up) {
      //log the error
      throw up;
   }
}

In this case, the insertTrade() and updateAcct() methods use standard JDBC code without transactions. Once the insertTrade() method ends, the database has persisted (and committed) the trade order. If the updateAcct() method should fail for any reason, the trade order would remain in the TRADE table at the end of the placeTrade() method, resulting in inconsistent data in the database. If the placeTrade() method had used transactions, both of these activities would have been included in a single LUW, and the trade order would have been rolled back if the account update failed.

With the popularity of Java persistence frameworks like Hibernate, TopLink, and the Java Persistence API (JPA) on the rise, we rarely write straight JDBC code anymore. More commonly, we use the newer object-relational mapping (ORM) frameworks to make our lives easier by replacing all of that nasty JDBC code with a few simple method calls. For example, to insert the trade order from the JDBC code example in Listing 1, using the Spring Framework with JPA, you’d map the TradeData object to the TRADE table and replace all of that JDBC code with the JPA code in Listing 3:

Listing 3. Simple insert using JPA
1
2
3
4
5
6
7
8
public class TradingServiceImpl {
    @PersistenceContext(unitName="trading") EntityManager em;
    public long insertTrade(TradeData trade) throws Exception {
       em.persist(trade);
       return trade.getTradeId();
    }
}

Notice that Listing 3 invokes the persist() method on the EntityManager to insert the trade order. Simple, right? Not really. This code will not insert the trade order into the TRADE table as expected, nor will it throw an exception. It will simply return a value of 0 as the key to the trade order without changing the database. This is one of the first major pitfalls of transaction processing: ORM-based frameworks require a transaction in order to trigger the synchronization between the object cache and the database. It is through a transaction commit that the SQL code is generated and the database affected by the desired action (that is, insert, update, delete). Without a transaction there is no trigger for the ORM to generate SQL code and persist the changes, so the method simply ends — no exceptions, no updates. If you are using an ORM-based framework, you must use transactions. You can no longer rely on the database to manage the connections and commit the work.

These simple examples should make it clear that transactions are necessary in order to maintain data integrity and consistency. But they only begin to scratch the surface of the complexity and pitfalls associated with implementing transactions in the Java platform.

Spring Framework @Transactional annotation pitfalls

So, you test the code in Listing 3 and discover that the persist() method didn’t work without a transaction. As a result, you view a few links from a simple Internet search and find that with the Spring Framework, you need to use the @Transactional annotation. So you add the annotation to your code as shown in Listing 4:

Listing 4. Using the @Transactional annotation
1
2
3
4
5
6
7
8
9
public class TradingServiceImpl {
   @PersistenceContext(unitName="trading") EntityManager em;
   @Transactional
   public long insertTrade(TradeData trade) throws Exception {
      em.persist(trade);
      return trade.getTradeId();
   }
}

You retest your code, and you find it still doesn’t work. The problem is that you must tell the Spring Framework that you are using annotations for your transaction management. Unless you are doing full unit testing, this pitfall is sometimes hard to discover. It usually leads to developers simply adding the transaction logic in the Spring configuration files rather than through annotations.

When using the @Transactional annotation in Spring, you must add the following line to your Spring configuration file:

1
<tx:annotation-driven transaction-manager="transactionManager"/>

The transaction-manager property holds a reference to the transaction manager bean defined in the Spring configuration file. This code tells Spring to use the @Transaction annotation when applying the transaction interceptor. Without it, the @Transactional annotation is ignored, resulting in no transaction being used in your code.

Getting the basic @Transactional annotation to work in the code in Listing 4 is only the beginning. Notice that Listing 4 uses the @Transactional annotation without specifying any additional annotation parameters. I’ve found that many developers use the @Transactional annotation without taking the time to understand fully what it does. For example, when using the @Transactional annotation by itself as I do in Listing 4, what is the transaction propagation mode set to? What is the read-only flag set to? What is the transaction isolation level set to? More important, when should the transaction roll back the work? Understanding how this annotation is used is important to ensuring that you have the proper level of transaction support in your application. To answer the questions I’ve just asked: when using the @Transactional annotation by itself without any parameters, the propagation mode is set to REQUIRED, the read-only flag is set to false, the transaction isolation level is set to the database default (usually READ_COMMITTED), and the transaction will not roll back on a checked exception.

@Transactional read-only flag pitfalls

A common pitfall I frequently come across in my travels is the improper use of the read-only flag on the Spring @Transactional annotation. Here is a quick quiz for you: When using standard JDBC code for Java persistence, what does the @Transactional annotation in Listing 5 do when the read-only flag is set to true and the propagation mode set to SUPPORTS?

Listing 5. Using read-only with SUPPORTS propagation mode — JDBC
1
2
3
4
@Transactional(readOnly = true, propagation=Propagation.SUPPORTS)
public long insertTrade(TradeData trade) throws Exception {
   //JDBC Code...
}

When the insertTrade() method in Listing 5 executes, does it:

  • Throw a read-only connection exception
  • Correctly insert the trade order and commit the data
  • Do nothing because the propagation level is set to SUPPORTS

Give up? The correct answer is B. The trade order is correctly inserted into the database, even though the read-only flag is set to true and the transaction propagation set to SUPPORTS. But how can that be? No transaction is started because of the SUPPORTS propagation mode, so the method effectively uses a local (database) transaction. The read-only flag is applied only if a transaction is started. In this case, no transaction was started, so the read-only flag is ignored.

Okay, so if that is the case, what does the @Transactional annotation do in Listing 6 when the read-only flag is set and the propagation mode is set to REQUIRED?

Listing 6. Using read-only with REQUIRED propagation mode — JDBC
1
2
3
4
@Transactional(readOnly = true, propagation=Propagation.REQUIRED)
public long insertTrade(TradeData trade) throws Exception {
   //JDBC code...
}

When executed, does the insertTrade() method in Listing 6:

  • Throw a read-only connection exception
  • Correctly insert the trade order and commit the data
  • Do nothing because the read-only flag is set to true

This one should be easy to answer given the prior explanation. The correct answer here is A. An exception will be thrown, indicating that you are trying to perform an update operation on a read-only connection. Because a transaction is started (REQUIRED), the connection is set to read-only. Sure enough, when you try to execute the SQL statement, you get an exception telling you that the connection is a read-only connection.

The odd thing about the read-only flag is that you need to start a transaction in order to use it. Why would you need a transaction if you are only reading data? The answer is that you don’t. Starting a transaction to perform a read-only operation adds to the overhead of the processing thread and can cause shared read locks on the database (depending on what type of database you are using and what the isolation level is set to). The bottom line is that the read-only flag is somewhat meaningless when you use it for JDBC-based Java persistence and causes additional overhead when an unnecessary transaction is started.

What about when you use an ORM-based framework? In keeping with the quiz format, can you guess what the result of the @Transactional annotation in Listing 7 would be if the insertTrade() method were invoked using JPA with Hibernate?

Listing 7. Using read-only with REQUIRED propagation mode — JPA
1
2
3
4
5
@Transactional(readOnly = true, propagation=Propagation.REQUIRED)
public long insertTrade(TradeData trade) throws Exception {
   em.persist(trade);
   return trade.getTradeId();
}

Does the insertTrade() method in Listing 7:

  • Throw a read-only connection exception
  • Correctly insert the trade order and commit the data
  • Do nothing because the readOnly flag is set to true

The answer to this question is a bit more tricky. In some cases the answer is C, but in most cases (particularly when using JPA) the answer is B. The trade order is correctly inserted into the database without error. Wait a minute — the preceding example shows that a read-only connection exception would be thrown when the REQUIRED propagation mode is used. That is true when you use JDBC. However, when you use an ORM-based framework, the read-only flag works a bit differently. When you are generating a key on an insert, the ORM framework will go to the database to obtain the key and subsequently perform the insert. For some vendors, such as Hibernate, the flush mode will be set to MANUAL, and no insert will occur for inserts with non-generated keys. The same holds true for updates. However, other vendors, like TopLink, will always perform inserts and updates when the read-only flag is set to true. Although this is both vendor and version specific, the point here is that you cannot be guaranteed that the insert or update will not occur when the read-only flag is set, particularly when using JPA as it is vendor-agnostic.

Which brings me to another major pitfall I frequently encounter. Given all you’ve read so far, what do you suppose the code in Listing 8 would do if you only set the read-only flag on the @Transactional annotation?

Listing 8. Using read-only — JPA
1
2
3
4
@Transactional(readOnly = true)
public TradeData getTrade(long tradeId) throws Exception {
   return em.find(TradeData.class, tradeId);
}

Does the getTrade() method in Listing 8:

  • Start a transaction, get the trade order, then commit the transaction
  • Get the trade order without starting a transaction

The correct answer here is A. A transaction is started and committed. Don’t forget: the default propagation mode for the @Transactional annotation is REQUIRED. This means that a transaction is started when in fact one is not required (see Never say never). . Depending on the database you are using, this can cause unnecessary shared locks, resulting in possible deadlock situations in the database. In addition, unnecessary processing time and resources are being consumed starting and stopping the transaction. The bottom line is that when you use an ORM-based framework, the read-only flag is quite useless and in most cases is ignored. But if you still insist on using it, always set the propagation mode to SUPPORTS, as shown in Listing 9, so no transaction is started:

Listing 9. Using read-only and SUPPORTS propagation mode for select operation
1
2
3
4
@Transactional(readOnly = true, propagation=Propagation.SUPPORTS)
public TradeData getTrade(long tradeId) throws Exception {
   return em.find(TradeData.class, tradeId);
}

Better yet, just avoid using the @Transactional annotation altogether when doing read operations, as shown in Listing 10:

Listing 10. Removing the @Transactional annotation for select operations
1
2
3
public TradeData getTrade(long tradeId) throws Exception {
   return em.find(TradeData.class, tradeId);
}

REQUIRES_NEW transaction attribute pitfalls

Whether you’re using the Spring Framework or EJB, use of the REQUIRES_NEW transaction attribute can have negative results and lead to corrupt and inconsistent data. The REQUIRES_NEW transaction attribute always starts a new transaction when the method is started, whether or not an existing transaction is present. Many developers use the REQUIRES_NEW attribute incorrectly, assuming it is the correct way to make sure that a transaction is started. Consider the two methods in Listing 11:

Listing 11. Using the REQUIRES_NEW transaction attribute
1
2
3
4
5
@Transactional(propagation=Propagation.REQUIRES_NEW)
public long insertTrade(TradeData trade) throws Exception {...}
@Transactional(propagation=Propagation.REQUIRES_NEW)
public void updateAcct(TradeData trade) throws Exception {...}

Notice in Listing 11 that both of these methods are public, implying that they can be invoked independently from each other. Problems occur with the REQUIRES_NEW attribute when methods using it are invoked within the same logical unit of work via inter-service communication or through orchestration. For example, suppose in Listing 11 that you can invoke the updateAcct() method independently of any other method in some use cases, but there’s also the case where the updateAcct() method is also invoked in the insertTrade() method. Now, if an exception occurs after the updateAcct() method call, the trade order would be rolled back, but the account updates would be committed to the database, as shown in Listing 12:

Listing 12. Multiple updates using the REQUIRES_NEW transaction attribute
1
2
3
4
5
6
7
@Transactional(propagation=Propagation.REQUIRES_NEW)
public long insertTrade(TradeData trade) throws Exception {
   em.persist(trade);
   updateAcct(trade);
   //exception occurs here! Trade rolled back but account update is not!
   ...
}

This happens because a new transaction is started in the updateAcct() method, so that transaction commits once the updateAcct() method ends. When you use the REQUIRES_NEW transaction attribute, if an existing transaction context is present, the current transaction is suspended and a new transaction started. Once that method ends, the new transaction commits and the original transaction resumes.

Because of this behavior, the REQUIRES_NEW transaction attribute should be used only if the database action in the method being invoked needs to be saved to the database regardless of the outcome of the overlaying transaction. For example, suppose that every stock trade that was attempted had to be recorded in an audit database. This information needs to be persisted whether or not the trade failed because of validation errors, insufficient funds, or some other reason. If you did not use the REQUIRES_NEW attribute on the audit method, the audit record would be rolled back along with the attempted trade. Using the REQUIRES_NEW attribute guarantees that the audit data is saved regardless of the initial transaction’s outcome. The main point here is always to use either the MANDATORY or REQUIRED attribute instead of REQUIRES_NEW unless you have a reason to use it for reasons similar those to the audit example.

Transaction rollback pitfalls

I’ve saved the most common transaction pitfall for last. Unfortunately, I see this one in production code more times than not. I’ll start with the Spring Framework and then move on to EJB 3.

So far, the code you have been looking at looks something like Listing 13:

Listing 13. No rollback support
1
2
3
4
5
6
7
8
9
10
11
@Transactional(propagation=Propagation.REQUIRED)
public TradeData placeTrade(TradeData trade) throws Exception {
   try {
      insertTrade(trade);
      updateAcct(trade);
      return trade;
   } catch (Exception up) {
      //log the error
      throw up;
   }
}

Suppose the account does not have enough funds to purchase the stock in question or is not set up to purchase or sell stock yet and throws a checked exception (for example, FundsNotAvailableException). Does the trade order get persisted in the database or is the entire logical unit of work rolled back? The answer, surprisingly, is that upon a checked exception (either in the Spring Framework or EJB), the transaction commits any work that has not yet been committed. Using Listing 13, this means that if a checked exception occurs during the updateAcct() method, the trade order is persisted, but the account isn’t updated to reflect the trade.

This is perhaps the primary data-integrity and consistency issue when transactions are used. Run-time exceptions (that is, unchecked exceptions) automatically force the entire logical unit of work to roll back, but checked exceptions do not. Therefore, the code in Listing 13 is useless from a transaction standpoint; although it appears that it uses transactions to maintain atomicity and consistency, in fact it does not.

Although this sort of behavior may seem strange, transactions behave this way for some good reasons. First of all, not all checked exceptions are bad; they might be used for event notification or to redirect processing based on certain conditions. But more to the point, the application code may be able to take corrective action on some types of checked exceptions, thereby allowing the transaction to complete. For example, consider the scenario in which you are writing the code for an online book retailer. To complete the book order, you need to send an e-mail confirmation as part of the order process. If the e-mail server is down, you would send some sort of SMTP checked exception indicating that the message cannot be sent. If checked exceptions caused an automatic rollback, the entire book order would be rolled back just because the e-mail server was down. By not automatically rolling back on checked exceptions, you can catch that exception and perform some sort of corrective action (such as sending the message to a pending queue) and commit the rest of the order.

When you use the Declarative transaction model (described in more detail in Part 2 of this series), you must specify how the container or framework should handle checked exceptions. In the Spring Framework you specify this through the rollbackFor parameter in the @Transactional annotation, as shown in Listing 14:

Listing 14. Adding transaction rollback support — Spring
1
2
3
4
5
6
7
8
9
10
11
@Transactional(propagation=Propagation.REQUIRED, rollbackFor=Exception.class)
public TradeData placeTrade(TradeData trade) throws Exception {
   try {
      insertTrade(trade);
      updateAcct(trade);
      return trade;
   } catch (Exception up) {
      //log the error
      throw up;
   }
}

Notice the use of the rollbackFor parameter in the @Transactional annotation. This parameter accepts either a single exception class or an array of exception classes, or you can use the rollbackForClassName parameter to specify the names of the exceptions as Java String types. You can also use the negative version of this property (noRollbackFor) to specify that all exceptions should force a rollback except certain ones. Typically most developers specify Exception.class as the value, indicating that all exceptions in this method should force a rollback.

EJBs work a little bit differently from the Spring Framework with regard to rolling back a transaction. The @TransactionAttribute annotation found in the EJB 3.0 specification does not include directives to specify the rollback behavior. Rather, you must use the SessionContext.setRollbackOnly() method to mark the transaction for rollback, as illustrated in Listing 15:

Listing 15. Adding transaction rollback support — EJB
1
2
3
4
5
6
7
8
9
10
11
12
@TransactionAttribute(TransactionAttributeType.REQUIRED)
public TradeData placeTrade(TradeData trade) throws Exception {
   try {
      insertTrade(trade);
      updateAcct(trade);
      return trade;
   } catch (Exception up) {
      //log the error
      sessionCtx.setRollbackOnly();
      throw up;
   }
}

Once the setRollbackOnly() method is invoked, you cannot change your mind; the only possible outcome is to roll back the transaction upon completion of the method that started the transaction. The transaction strategies described in future articles in the series will provide guidance on when and where to use the rollback directives and on when to use the REQUIRED vs. MANDATORY transaction attributes.

Conclusion

The code used to implement transactions in the Java platform is not overly complex; however, how you use and configure it can get somewhat complex. Many pitfalls are associated with implementing transaction support in the Java platform (including some less common ones that I haven’t discussed here). The biggest issue with most of them is that no compiler warnings or run-time errors tell you that the transaction implementation is incorrect. Furthermore, contrary to the assumption reflected in the “Better late than never” anecdote at the start of this article, implementing transaction support is not only a coding exercise. A significant amount of design effort goes into developing an overall transaction strategy. The rest of the Transaction strategies series will help guide you in terms of how to design an effective transaction strategy for use cases ranging from simple applications to high-performance transaction processing.


Downloadable resources

from:http://www.ibm.com/developerworks/java/library/j-ts1/index.html

refer:Spring @Transactional – isolation, propagationrefer

关于Java并发编程的总结和思考

为什么需要并发

并发其实是一种解耦合的策略,它帮助我们把做什么(目标)和什么时候做(时机)分开。这样做可以明显改进应用程序的吞吐量(获得更多的CPU调度时间)和结构(程序有多个部分在协同工作)。做过Java Web开发的人都知道,Java Web中的Servlet程序在Servlet容器的支持下采用单实例多线程的工作模式,Servlet容器为你处理了并发问题。

误解和正解

最常见的对并发编程的误解有以下这些:
-并发总能改进性能(并发在CPU有很多空闲时间时能明显改进程序的性能,但当线程数量较多的时候,线程间频繁的调度切换反而会让系统的性能下降)
-编写并发程序无需修改原有的设计(目的与时机的解耦往往会对系统结构产生巨大的影响)
-在使用Web或EJB容器时不用关注并发问题(只有了解了容器在做什么,才能更好的使用容器)
下面的这些说法才是对并发客观的认识:
编写并发程序会在代码上增加额外的开销
-正确的并发是非常复杂的,即使对于很简单的问题
-并发中的缺陷因为不易重现也不容易被发现
-并发往往需要对设计策略从根本上进行修改

并发编程的原则和技巧

单一职责原则

分离并发相关代码和其他代码(并发相关代码有自己的开发、修改和调优生命周期)。

限制数据作用域

两个线程修改共享对象的同一字段时可能会相互干扰,导致不可预期的行为,解决方案之一是构造临界区,但是必须限制临界区的数量。

使用数据副本

数据副本是避免共享数据的好方法,复制出来的对象只是以只读的方式对待。Java 5的java.util.concurrent包中增加一个名为CopyOnWriteArrayList的类,它是List接口的子类型,所以你可以认为它是ArrayList的线程安全的版本,它使用了写时复制的方式创建数据副本进行操作来避免对共享数据并发访问而引发的问题。

线程应尽可能独立

让线程存在于自己的世界中,不与其他线程共享数据。有过Java Web开发经验的人都知道,Servlet就是以单实例多线程的方式工作,和每个请求相关的数据都是通过Servlet子类的service方法(或者是doGet或doPost方法)的参数传入的。只要Servlet中的代码只使用局部变量,Servlet就不会导致同步问题。springMVC的控制器也是这么做的,从请求中获得的对象都是以方法的参数传入而不是作为类的成员,很明显Struts 2的做法就正好相反,因此Struts 2中作为控制器的Action类都是每个请求对应一个实例。

Java 5以前的并发编程

Java的线程模型建立在抢占式线程调度的基础上,也就是说:

  • 所有线程可以很容易的共享同一进程中的对象。
  • 能够引用这些对象的任何线程都可以修改这些对象。
  • 为了保护数据,对象可以被锁住。

Java基于线程和锁的并发过于底层,而且使用锁很多时候都是很万恶的,因为它相当于让所有的并发都变成了排队等待。
在Java 5以前,可以用synchronized关键字来实现锁的功能,它可以用在代码块和方法上,表示在执行整个代码块或方法之前线程必须取得合适的锁。对于类的非静态方法(成员方法)而言,这意味这要取得对象实例的锁,对于类的静态方法(类方法)而言,要取得类的Class对象的锁,对于同步代码块,程序员可以指定要取得的是那个对象的锁。
不管是同步代码块还是同步方法,每次只有一个线程可以进入,如果其他线程试图进入(不管是同一同步块还是不同的同步块),JVM会将它们挂起(放入到等锁池中)。这种结构在并发理论中称为临界区(critical section)。这里我们可以对Java中用synchronized实现同步和锁的功能做一个总结:

  • 只能锁定对象,不能锁定基本数据类型
  • 被锁定的对象数组中的单个对象不会被锁定
  • 同步方法可以视为包含整个方法的synchronized(this) { … }代码块
  • 静态同步方法会锁定它的Class对象
  • 内部类的同步是独立于外部类的
  • synchronized修饰符并不是方法签名的组成部分,所以不能出现在接口的方法声明中
  • 非同步的方法不关心锁的状态,它们在同步方法运行时仍然可以得以运行
  • synchronized实现的锁是可重入的锁。

在JVM内部,为了提高效率,同时运行的每个线程都会有它正在处理的数据的缓存副本,当我们使用synchronzied进行同步的时候,真正被同步的是在不同线程中表示被锁定对象的内存块(副本数据会保持和主内存的同步,现在知道为什么要用同步这个词汇了吧),简单的说就是在同步块或同步方法执行完后,对被锁定的对象做的任何修改要在释放锁之前写回到主内存中;在进入同步块得到锁之后,被锁定对象的数据是从主内存中读出来的,持有锁的线程的数据副本一定和主内存中的数据视图是同步的 。
在Java最初的版本中,就有一个叫volatile的关键字,它是一种简单的同步的处理机制,因为被volatile修饰的变量遵循以下规则:

  • 变量的值在使用之前总会从主内存中再读取出来。
  • 对变量值的修改总会在完成之后写回到主内存中。

使用volatile关键字可以在多线程环境下预防编译器不正确的优化假设(编译器可能会将在一个线程中值不会发生改变的变量优化成常量),但只有修改时不依赖当前状态(读取时的值)的变量才应该声明为volatile变量。
不变模式也是并发编程时可以考虑的一种设计。让对象的状态是不变的,如果希望修改对象的状态,就会创建对象的副本并将改变写入副本而不改变原来的对象,这样就不会出现状态不一致的情况,因此不变对象是线程安全的。Java中我们使用频率极高的String类就采用了这样的设计。如果对不变模式不熟悉,可以阅读阎宏博士的《Java与模式》一书的第34章。说到这里你可能也体会到final关键字的重要意义了。

Java 5的并发编程

不管今后的Java向着何种方向发展或者灭亡,Java 5绝对是Java发展史中一个极其重要的版本,这个版本提供的各种语言特性我们不在这里讨论(有兴趣的可以阅读我的另一篇文章《Java的第20年:从Java版本演进看编程技术的发展》),但是我们必须要感谢Doug Lea在Java 5中提供了他里程碑式的杰作java.util.concurrent包,它的出现让Java的并发编程有了更多的选择和更好的工作方式。Doug Lea的杰作主要包括以下内容:

  • 更好的线程安全的容器
  • 线程池和相关的工具类
  • 可选的非阻塞解决方案
  • 显示的锁和信号量机制

下面我们对这些东西进行一一解读。

原子类

Java 5中的java.util.concurrent包下面有一个atomic子包,其中有几个以Atomic打头的类,例如AtomicInteger和AtomicLong。它们利用了现代处理器的特性,可以用非阻塞的方式完成原子操作,代码如下所示:

/**
 ID序列生成器
*/
public class IdGenerator {
    private final AtomicLong sequenceNumber = new AtomicLong(0);

    public long next() {
        return sequenceNumber.getAndIncrement(); 
    }
}

显示锁

基于synchronized关键字的锁机制有以下问题:

  • 锁只有一种类型,而且对所有同步操作都是一样的作用
  • 锁只能在代码块或方法开始的地方获得,在结束的地方释放
  • 线程要么得到锁,要么阻塞,没有其他的可能性

Java 5对锁机制进行了 重构,提供了显示的锁,这样可以在以下几个方面提升锁机制:

  • 可以添加不同类型的锁,例如读取锁和写入锁
  • 可以在一个方法中加锁,在另一个方法中解锁
  • 可以使用tryLock方式尝试获得锁,如果得不到锁可以等待、回退或者干点别的事情,当然也可以在超时之后放弃操作

显示的锁都实现了java.util.concurrent.Lock接口,主要有两个实现类:

  • ReentrantLock – 比synchronized稍微灵活一些的重入锁
  • ReentrantReadWriteLock – 在读操作很多写操作很少时性能更好的一种重入锁

对于如何使用显示锁,可以参考我的Java面试系列文章 《Java面试题集51-70》中第60题的代码。只有一点需要提醒,解锁的方法unlock的调用最好能够在finally块中,因为这里是释放外部资源最好的地方,当然也是释放锁的最佳位置,因为不管正常异常可能都要释放掉锁来给其他线程以运行的机会。

CountDownLatch

CountDownLatch是一种简单的同步模式,它让一个线程可以等待一个或多个线程完成它们的工作从而避免对临界资源并发访问所引发的各种问题。下面借用别人的一段代码(我对它做了一些重构)来演示CountDownLatch是如何工作的。

import java.util.concurrent.CountDownLatch;

/**
 * 工人类
 * @author 骆昊
 *
 */
class Worker {
    private String name;        // 名字
    private long workDuration;  // 工作持续时间

    /**
     * 构造器
     */
    public Worker(String name, long workDuration) {
        this.name = name;
        this.workDuration = workDuration;
    }

    /**
     * 完成工作
     */
    public void doWork() {
        System.out.println(name + " begins to work...");
        try {
            Thread.sleep(workDuration); // 用休眠模拟工作执行的时间
        } catch(InterruptedException ex) {
            ex.printStackTrace();
        }
        System.out.println(name + " has finished the job...");
    }
}

/**
 * 测试线程
 * @author 骆昊
 *
 */
class WorkerTestThread implements Runnable {
    private Worker worker;
    private CountDownLatch cdLatch;

    public WorkerTestThread(Worker worker, CountDownLatch cdLatch) {
        this.worker = worker;
        this.cdLatch = cdLatch;
    }

    @Override
    public void run() {
        worker.doWork();        // 让工人开始工作
        cdLatch.countDown();    // 工作完成后倒计时次数减1
    }
}

class CountDownLatchTest {

    private static final int MAX_WORK_DURATION = 5000;  // 最大工作时间
    private static final int MIN_WORK_DURATION = 1000;  // 最小工作时间

    // 产生随机的工作时间
    private static long getRandomWorkDuration(long min, long max) {
        return (long) (Math.random() * (max - min) + min);
    }

    public static void main(String[] args) {
        CountDownLatch latch = new CountDownLatch(2);   // 创建倒计时闩并指定倒计时次数为2
        Worker w1 = new Worker("骆昊", getRandomWorkDuration(MIN_WORK_DURATION, MAX_WORK_DURATION));
        Worker w2 = new Worker("王大锤", getRandomWorkDuration(MIN_WORK_DURATION, MAX_WORK_DURATION));

        new Thread(new WorkerTestThread(w1, latch)).start();
        new Thread(new WorkerTestThread(w2, latch)).start();

        try {
            latch.await();  // 等待倒计时闩减到0
            System.out.println("All jobs have been finished!");
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

ConcurrentHashMap

ConcurrentHashMap是HashMap在并发环境下的版本,大家可能要问,既然已经可以通过Collections.synchronizedMap获得线程安全的映射型容器,为什么还需要ConcurrentHashMap呢?因为通过Collections工具类获得的线程安全的HashMap会在读写数据时对整个容器对象上锁,这样其他使用该容器的线程无论如何也无法再获得该对象的锁,也就意味着要一直等待前一个获得锁的线程离开同步代码块之后才有机会执行。实际上,HashMap是通过哈希函数来确定存放键值对的桶(桶是为了解决哈希冲突而引入的),修改HashMap时并不需要将整个容器锁住,只需要锁住即将修改的“桶”就可以了。HashMap的数据结构如下图所示。
这里写图片描述

此外,ConcurrentHashMap还提供了原子操作的方法,如下所示:

  • putIfAbsent:如果还没有对应的键值对映射,就将其添加到HashMap中。
  • remove:如果键存在而且值与当前状态相等(equals比较结果为true),则用原子方式移除该键值对映射
  • replace:替换掉映射中元素的原子操作

CopyOnWriteArrayList

CopyOnWriteArrayList是ArrayList在并发环境下的替代品。CopyOnWriteArrayList通过增加写时复制语义来避免并发访问引起的问题,也就是说任何修改操作都会在底层创建一个列表的副本,也就意味着之前已有的迭代器不会碰到意料之外的修改。这种方式对于不要严格读写同步的场景非常有用,因为它提供了更好的性能。记住,要尽量减少锁的使用,因为那势必带来性能的下降(对数据库中数据的并发访问不也是如此吗?如果可以的话就应该放弃悲观锁而使用乐观锁),CopyOnWriteArrayList很明显也是通过牺牲空间获得了时间(在计算机的世界里,时间和空间通常是不可调和的矛盾,可以牺牲空间来提升效率获得时间,当然也可以通过牺牲时间来减少对空间的使用)。
这里写图片描述

可以通过下面两段代码的运行状况来验证一下CopyOnWriteArrayList是不是线程安全的容器。

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AddThread implements Runnable {
    private List<Double> list;

    public AddThread(List<Double> list) {
        this.list = list;
    }

    @Override
    public void run() {
        for(int i = 0; i < 10000; ++i) {
            list.add(Math.random());
        }
    }
}

public class Test05 {
    private static final int THREAD_POOL_SIZE = 2;

    public static void main(String[] args) {
        List<Double> list = new ArrayList<>();
        ExecutorService es = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
        es.execute(new AddThread(list));
        es.execute(new AddThread(list));
        es.shutdown();
    }
}

上面的代码会在运行时产生ArrayIndexOutOfBoundsException,试一试将上面代码25行的ArrayList换成CopyOnWriteArrayList再重新运行。

List<Double> list = new CopyOnWriteArrayList<>();

Queue

队列是一个无处不在的美妙概念,它提供了一种简单又可靠的方式将资源分发给处理单元(也可以说是将工作单元分配给待处理的资源,这取决于你看待问题的方式)。实现中的并发编程模型很多都依赖队列来实现,因为它可以在线程之间传递工作单元。
Java 5中的BlockingQueue就是一个在并发环境下非常好用的工具,在调用put方法向队列中插入元素时,如果队列已满,它会让插入元素的线程等待队列腾出空间;在调用take方法从队列中取元素时,如果队列为空,取出元素的线程就会阻塞。
这里写图片描述
可以用BlockingQueue来实现生产者-消费者并发模型(下一节中有介绍),当然在Java 5以前也可以通过wait和notify来实现线程调度,比较一下两种代码就知道基于已有的并发工具类来重构并发代码到底好在哪里了。

  • 基于wait和notify的实现
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * 公共常量
 * @author 骆昊
 *
 */
class Constants {
    public static final int MAX_BUFFER_SIZE = 10;
    public static final int NUM_OF_PRODUCER = 2;
    public static final int NUM_OF_CONSUMER = 3;
}

/**
 * 工作任务
 * @author 骆昊
 *
 */
class Task {
    private String id;  // 任务的编号

    public Task() {
        id = UUID.randomUUID().toString();
    }

    @Override
    public String toString() {
        return "Task[" + id + "]";
    }
}

/**
 * 消费者
 * @author 骆昊
 *
 */
class Consumer implements Runnable {
    private List<Task> buffer;

    public Consumer(List<Task> buffer) {
        this.buffer = buffer;
    }

    @Override
    public void run() {
        while(true) {
            synchronized(buffer) {
                while(buffer.isEmpty()) {
                    try {
                        buffer.wait();
                    } catch(InterruptedException e) {
                        e.printStackTrace();
                    }
                }
                Task task = buffer.remove(0);
                buffer.notifyAll();
                System.out.println("Consumer[" + Thread.currentThread().getName() + "] got " + task);
            }
        }
    }
}

/**
 * 生产者
 * @author 骆昊
 *
 */
class Producer implements Runnable {
    private List<Task> buffer;

    public Producer(List<Task> buffer) {
        this.buffer = buffer;
    }

    @Override
    public void run() {
        while(true) {
            synchronized (buffer) {
                while(buffer.size() >= Constants.MAX_BUFFER_SIZE) {
                    try {
                        buffer.wait();
                    } catch(InterruptedException e) {
                        e.printStackTrace();
                    }
                }
                Task task = new Task();
                buffer.add(task);
                buffer.notifyAll();
                System.out.println("Producer[" + Thread.currentThread().getName() + "] put " + task);
            }
        }
    }

}

public class Test06 {

    public static void main(String[] args) {
        List<Task> buffer = new ArrayList<>(Constants.MAX_BUFFER_SIZE);
        ExecutorService es = Executors.newFixedThreadPool(Constants.NUM_OF_CONSUMER + Constants.NUM_OF_PRODUCER);
        for(int i = 1; i <= Constants.NUM_OF_PRODUCER; ++i) {
            es.execute(new Producer(buffer));
        }
        for(int i = 1; i <= Constants.NUM_OF_CONSUMER; ++i) {
            es.execute(new Consumer(buffer));
        }
    }
}
  • 基于BlockingQueue的实现
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * 公共常量
 * @author 骆昊
 *
 */
class Constants {
    public static final int MAX_BUFFER_SIZE = 10;
    public static final int NUM_OF_PRODUCER = 2;
    public static final int NUM_OF_CONSUMER = 3;
}

/**
 * 工作任务
 * @author 骆昊
 *
 */
class Task {
    private String id;  // 任务的编号

    public Task() {
        id = UUID.randomUUID().toString();
    }

    @Override
    public String toString() {
        return "Task[" + id + "]";
    }
}

/**
 * 消费者
 * @author 骆昊
 *
 */
class Consumer implements Runnable {
    private BlockingQueue<Task> buffer;

    public Consumer(BlockingQueue<Task> buffer) {
        this.buffer = buffer;
    }

    @Override
    public void run() {
        while(true) {
            try {
                Task task = buffer.take();
                System.out.println("Consumer[" + Thread.currentThread().getName() + "] got " + task);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

/**
 * 生产者
 * @author 骆昊
 *
 */
class Producer implements Runnable {
    private BlockingQueue<Task> buffer;

    public Producer(BlockingQueue<Task> buffer) {
        this.buffer = buffer;
    }

    @Override
    public void run() {
        while(true) {
            try {
                Task task = new Task();
                buffer.put(task);
                System.out.println("Producer[" + Thread.currentThread().getName() + "] put " + task);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }

        }
    }

}

public class Test07 {

    public static void main(String[] args) {
        BlockingQueue<Task> buffer = new LinkedBlockingQueue<>(Constants.MAX_BUFFER_SIZE);
        ExecutorService es = Executors.newFixedThreadPool(Constants.NUM_OF_CONSUMER + Constants.NUM_OF_PRODUCER);
        for(int i = 1; i <= Constants.NUM_OF_PRODUCER; ++i) {
            es.execute(new Producer(buffer));
        }
        for(int i = 1; i <= Constants.NUM_OF_CONSUMER; ++i) {
            es.execute(new Consumer(buffer));
        }
    }
}

使用BlockingQueue后代码优雅了很多。

并发模型

在继续下面的探讨之前,我们还是重温一下几个概念:

概念 解释
临界资源 并发环境中有着固定数量的资源
互斥 对资源的访问是排他式的
饥饿 一个或一组线程长时间或永远无法取得进展
死锁 两个或多个线程相互等待对方结束
活锁 想要执行的线程总是发现其他的线程正在执行以至于长时间或永远无法执行

重温了这几个概念后,我们可以探讨一下下面的几种并发模型。

生产者-消费者

一个或多个生产者创建某些工作并将其置于缓冲区或队列中,一个或多个消费者会从队列中获得这些工作并完成之。这里的缓冲区或队列是临界资源。当缓冲区或队列放满的时候,生产这会被阻塞;而缓冲区或队列为空的时候,消费者会被阻塞。生产者和消费者的调度是通过二者相互交换信号完成的。

读者-写者

当存在一个主要为读者提供信息的共享资源,它偶尔会被写者更新,但是需要考虑系统的吞吐量,又要防止饥饿和陈旧资源得不到更新的问题。在这种并发模型中,如何平衡读者和写者是最困难的,当然这个问题至今还是一个被热议的问题,恐怕必须根据具体的场景来提供合适的解决方案而没有那种放之四海而皆准的方法(不像我在国内的科研文献中看到的那样)。

哲学家进餐

1965年,荷兰计算机科学家图灵奖得主Edsger Wybe Dijkstra提出并解决了一个他称之为哲学家进餐的同步问题。这个问题可以简单地描述如下:五个哲学家围坐在一张圆桌周围,每个哲学家面前都有一盘通心粉。由于通心粉很滑,所以需要两把叉子才能夹住。相邻两个盘子之间放有一把叉子如下图所示。哲学家的生活中有两种交替活动时段:即吃饭和思考。当一个哲学家觉得饿了时,他就试图分两次去取其左边和右边的叉子,每次拿一把,但不分次序。如果成功地得到了两把叉子,就开始吃饭,吃完后放下叉子继续思考。
把上面问题中的哲学家换成线程,把叉子换成竞争的临界资源,上面的问题就是线程竞争资源的问题。如果没有经过精心的设计,系统就会出现死锁、活锁、吞吐量下降等问题。
这里写图片描述
下面是用信号量原语来解决哲学家进餐问题的代码,使用了Java 5并发工具包中的Semaphore类(代码不够漂亮但是已经足以说明问题了)。

//import java.util.concurrent.ExecutorService;
//import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

/**
 * 存放线程共享信号量的上下问
 * @author 骆昊
 *
 */
class AppContext {
    public static final int NUM_OF_FORKS = 5;   // 叉子数量(资源)
    public static final int NUM_OF_PHILO = 5;   // 哲学家数量(线程)

    public static Semaphore[] forks;    // 叉子的信号量
    public static Semaphore counter;    // 哲学家的信号量

    static {
        forks = new Semaphore[NUM_OF_FORKS];

        for (int i = 0, len = forks.length; i < len; ++i) {
            forks[i] = new Semaphore(1);    // 每个叉子的信号量为1
        }

        counter = new Semaphore(NUM_OF_PHILO - 1);  // 如果有N个哲学家,最多只允许N-1人同时取叉子
    }

    /**
     * 取得叉子
     * @param index 第几个哲学家
     * @param leftFirst 是否先取得左边的叉子
     * @throws InterruptedException
     */
    public static void putOnFork(int index, boolean leftFirst) throws InterruptedException {
        if(leftFirst) {
            forks[index].acquire();
            forks[(index + 1) % NUM_OF_PHILO].acquire();
        }
        else {
            forks[(index + 1) % NUM_OF_PHILO].acquire();
            forks[index].acquire();
        }
    }

    /**
     * 放回叉子
     * @param index 第几个哲学家
     * @param leftFirst 是否先放回左边的叉子
     * @throws InterruptedException
     */
    public static void putDownFork(int index, boolean leftFirst) throws InterruptedException {
        if(leftFirst) {
            forks[index].release();
            forks[(index + 1) % NUM_OF_PHILO].release();
        }
        else {
            forks[(index + 1) % NUM_OF_PHILO].release();
            forks[index].release();
        }
    }
}

/**
 * 哲学家
 * @author 骆昊
 *
 */
class Philosopher implements Runnable {
    private int index;      // 编号
    private String name;    // 名字

    public Philosopher(int index, String name) {
        this.index = index;
        this.name = name;
    }

    @Override
    public void run() {
        while(true) {
            try {
                AppContext.counter.acquire();
                boolean leftFirst = index % 2 == 0;
                AppContext.putOnFork(index, leftFirst);
                System.out.println(name + "正在吃意大利面(通心粉)...");   // 取到两个叉子就可以进食
                AppContext.putDownFork(index, leftFirst);
                AppContext.counter.release();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

public class Test04 {

    public static void main(String[] args) {
        String[] names = { "骆昊", "王大锤", "张三丰", "杨过", "李莫愁" };   // 5位哲学家的名字
//      ExecutorService es = Executors.newFixedThreadPool(AppContext.NUM_OF_PHILO); // 创建固定大小的线程池
//      for(int i = 0, len = names.length; i < len; ++i) {
//          es.execute(new Philosopher(i, names[i]));   // 启动线程
//      }
//      es.shutdown();
        for(int i = 0, len = names.length; i < len; ++i) {
            new Thread(new Philosopher(i, names[i])).start();
        }
    }

}

现实中的并发问题基本上都是这三种模型或者是这三种模型的变体。

测试并发代码

对并发代码的测试也是非常棘手的事情,棘手到无需说明大家也很清楚的程度,所以这里我们只是探讨一下如何解决这个棘手的问题。我们建议大家编写一些能够发现问题的测试并经常性的在不同的配置和不同的负载下运行这些测试。不要忽略掉任何一次失败的测试,线程代码中的缺陷可能在上万次测试中仅仅出现一次。具体来说有这么几个注意事项:

  • 不要将系统的失效归结于偶发事件,就像拉不出屎的时候不能怪地球没有引力。
  • 先让非并发代码工作起来,不要试图同时找到并发和非并发代码中的缺陷。
  • 编写可以在不同配置环境下运行的线程代码。
  • 编写容易调整的线程代码,这样可以调整线程使性能达到最优。
  • 让线程的数量多于CPU或CPU核心的数量,这样CPU调度切换过程中潜在的问题才会暴露出来。
  • 让并发代码在不同的平台上运行。
  • 通过自动化或者硬编码的方式向并发代码中加入一些辅助测试的代码。

Java 7的并发编程

Java 7中引入了TransferQueue,它比BlockingQueue多了一个叫transfer的方法,如果接收线程处于等待状态,该操作可以马上将任务交给它,否则就会阻塞直至取走该任务的线程出现。可以用TransferQueue代替BlockingQueue,因为它可以获得更好的性能。
刚才忘记了一件事情,Java 5中还引入了Callable接口、Future接口和FutureTask接口,通过他们也可以构建并发应用程序,代码如下所示。

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Test07 {
    private static final int POOL_SIZE = 10;

    static class CalcThread implements Callable<Double> {
        private List<Double> dataList = new ArrayList<>();

        public CalcThread() {
            for(int i = 0; i < 10000; ++i) {
                dataList.add(Math.random());
            }
        }

        @Override
        public Double call() throws Exception {
            double total = 0;
            for(Double d : dataList) {
                total += d;
            }
            return total / dataList.size();
        }

    }

    public static void main(String[] args) {
        List<Future<Double>> fList = new ArrayList<>();
        ExecutorService es = Executors.newFixedThreadPool(POOL_SIZE);
        for(int i = 0; i < POOL_SIZE; ++i) {
            fList.add(es.submit(new CalcThread()));
        }

        for(Future<Double> f : fList) {
            try {
                System.out.println(f.get());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        es.shutdown();
    }
}

Callable接口也是一个单方法接口,显然这是一个回调方法,类似于函数式编程中的回调函数,在Java 8 以前,Java中还不能使用Lambda表达式来简化这种函数式编程。和Runnable接口不同的是Callable接口的回调方法call方法会返回一个对象,这个对象可以用将来时的方式在线程执行结束的时候获得信息。上面代码中的call方法就是将计算出的10000个0到1之间的随机小数的平均值返回,我们通过一个Future接口的对象得到了这个返回值。目前最新的Java版本中,Callable接口和Runnable接口都被打上了@FunctionalInterface的注解,也就是说它可以用函数式编程的方式(Lambda表达式)创建接口对象。
下面是Future接口的主要方法:

  • get():获取结果。如果结果还没有准备好,get方法会阻塞直到取得结果;当然也可以通过参数设置阻塞超时时间。
  • cancel():在运算结束前取消。
  • isDone():可以用来判断运算是否结束。

Java 7中还提供了分支/合并(fork/join)框架,它可以实现线程池中任务的自动调度,并且这种调度对用户来说是透明的。为了达到这种效果,必须按照用户指定的方式对任务进行分解,然后再将分解出的小型任务的执行结果合并成原来任务的执行结果。这显然是运用了分治法(divide-and-conquer)的思想。下面的代码使用了分支/合并框架来计算1到10000的和,当然对于如此简单的任务根本不需要分支/合并框架,因为分支和合并本身也会带来一定的开销,但是这里我们只是探索一下在代码中如何使用分支/合并框架,让我们的代码能够充分利用现代多核CPU的强大运算能力。

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;

class Calculator extends RecursiveTask<Integer> {
    private static final long serialVersionUID = 7333472779649130114L;

    private static final int THRESHOLD = 10;
    private int start;
    private int end;

    public Calculator(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public Integer compute() {
        int sum = 0;
        if ((end - start) < THRESHOLD) {    // 当问题分解到可求解程度时直接计算结果
            for (int i = start; i <= end; i++) {
                sum += i;
            }
        } else {
            int middle = (start + end) >>> 1;
            // 将任务一分为二
            Calculator left = new Calculator(start, middle);
            Calculator right = new Calculator(middle + 1, end);
            left.fork();
            right.fork();
            // 注意:由于此处是递归式的任务分解,也就意味着接下来会二分为四,四分为八...

            sum = left.join() + right.join();   // 合并两个子任务的结果
        }
        return sum;
    }

}

public class Test08 {

    public static void main(String[] args) throws Exception {
        ForkJoinPool forkJoinPool = new ForkJoinPool();
        Future<Integer> result = forkJoinPool.submit(new Calculator(1, 10000));
        System.out.println(result.get());
    }
}

伴随着Java 7的到来,Java中默认的数组排序算法已经不再是经典的快速排序(双枢轴快速排序)了,新的排序算法叫TimSort,它是归并排序和插入排序的混合体,TimSort可以通过分支合并框架充分利用现代处理器的多核特性,从而获得更好的性能(更短的排序时间)。

参考文献

  1. Benjamin J. Evans, etc, The Well-Grounded Java Developer. Jul 21, 2012
  2. Robert Martin, Clean Code. Aug 11, 2008.
  3. Doug Lea, Concurrent Programming in Java: Design Principles and Patterns. 1999

from:http://www.importnew.com/22286.html

扛住100亿次请求?我们来试一试

作者:ppmsn2005#gmail.com
项目: https://github.com/xiaojiaqi/10billionhongbaos
wiki: https://github.com/xiaojiaqi/10billionhongbaos/wiki/扛住100亿次请求?我们来试一试

1. 前言

前几天,偶然看到了 《扛住100亿次请求——如何做一个“有把握”的春晚红包系统”》(url)一文,看完以后,感慨良多,收益很多。正所谓他山之石,可以攻玉,虽然此文发表于2015年,我看到时已经是2016年末,但是其中的思想仍然是可以为很多后端设计借鉴,。同时作为一个工程师,看完以后又会思考,学习了这样的文章以后,是否能给自己的工作带来一些实际的经验呢?所谓纸上得来终觉浅,绝知此事要躬行,能否自己实践一下100亿次红包请求呢?否则读完以后脑子里能剩下的东西 不过就是100亿 1400万QPS整流 这样的字眼,剩下的文章将展示作者是如何以此过程为目标,在本地环境的模拟了此过程。

实现的目标: 单机支持100万连接,模拟了摇红包和发红包过程,单机峰值QPS 6万,平稳支持了业务。

注:本文以及作者所有内容,仅代表个人理解和实践,过程和微信团队没有任何关系,真正的线上系统也不同,只是从一些技术点进行了实践,请读者进行区分。因作者水平有限,有任何问题都是作者的责任,有问题请联系 ppmsn2005#gmail.com. 全文内容 扛住100亿次请求?我们来试一试

2. 背景知识

QPS: Queries per second 每秒的请求数目

PPS:Packets per second 每秒数据包数目

摇红包:客户端发出一个摇红包的请求,如果系统有红包就会返回,用户获得红包

发红包:产生一个红包里面含有一定金额,红包指定数个用户,每个用户会收到红包信息,用户可以发送拆红包的请求,获取其中的部分金额。

3. 确定目标

在一切系统开始以前,我们应该搞清楚我们的系统在完成以后,应该有一个什么样的负载能力。

3.1 用户总数:

通过文章我们可以了解到接入服务器638台, 服务上限大概是14.3亿用户, 所以单机负载的用户上限大概是14.3亿/638台=228万用户/台。但是目前中国肯定不会有14亿用户同时在线,参考 http://qiye.qianzhan.com/show/detail/160818-b8d1c700.html 的说法,2016年Q2 微信用户大概是8亿,月活在5.4 亿左右。所以在2015年春节期间,虽然使用的用户会很多,但是同时在线肯定不到5.4亿。

3.2. 服务器数量:

一共有638台服务器,按照正常运维设计,我相信所有服务器不会完全上线,会有一定的硬件冗余,来防止突发硬件故障。假设一共有600台接入服务器。

3.3 单机需要支持的负载数:

每台服务器支持的用户数:5.4亿/600 = 90万。也就是平均单机支持90万用户。如果真实情况比90万更多,则模拟的情况可能会有偏差,但是我认为QPS在这个实验中更重要。

3.4. 单机峰值QPS:

文章中明确表示为1400万QPS.这个数值是非常高的,但是因为有600台服务器存在,所以单机的QPS为 1400万/600= 约为2.3万QPS, 文章曾经提及系统可以支持4000万QPS,那么系统的QPS 至少要到4000万/600 = 约为 6.6万, 这个数值大约是目前的3倍,短期来看并不会被触及。但是我相信应该做过相应的压力测试。

3.5. 发放红包:

文中提到系统以5万个每秒的下发速度,那么单机每秒下发速度50000/600 =83个/秒,也就是单机系统应该保证每秒以83个的速度下发即可。
最后考虑到系统的真实性,还至少有用户登录的动作,拿红包这样的业务。真实的系统还会包括聊天这样的服务业务。

最后整体的看一下 100亿次摇红包这个需求,假设它是均匀地发生在春节联欢晚会的4个小时里,那么服务器的QPS 应该是10000000000/600/3600/4.0=1157. 也就是单机每秒1000多次,这个数值其实并不高。如果完全由峰值速度1400万消化 10000000000/(1400*10000) = 714秒,也就是说只需要峰值坚持11分钟,就可以完成所有的请求。可见互联网产品的一个特点就是峰值非常高,持续时间并不会很长。

总结:

从单台服务器看.它需要满足下面一些条件
1. 支持至少100万连接用户
2. 每秒至少能处理2.3万的QPS,这里我们把目标定得更高一些 分别设定到了3万和6万。
3. 摇红包:支持每秒83个的速度下发放红包,也就是说每秒有2.3万次摇红包的请求,其中83个请求能摇到红包,其余的2.29万次请求会知道自己没摇到。当然客户端在收到红包以后,也需要确保客户端和服务器两边的红包数目和红包内的金额要一致。因为没有支付模块,所以我们也把要求提高一倍,达到200个红包每秒的分发速度
4. 支持用户之间发红包业务,确保收发两边的红包数目和红包内金额要一致。同样也设定200个红包每秒的分发速度为我们的目标。

想完整模拟整个系统实在太难了,首先需要海量的服务器,其次需要上亿的模拟客户端。这对我来说是办不到,但是有一点可以确定,整个系统是可以水平扩展的,所以我们可以模拟100万客户端,在模拟一台服务器 那么就完成了1/600的模拟。

和现有系统区别:
和大部分高QPS测试的不同,本系统的侧重点有所不同。我对2者做了一些对比。

常见高QPS系统压力测试 本系统压力测试
连接数 一般<1000 (几百以内) 1000000 (1百万)
单连接吞吐量 非常大 每个连接几十M字节吞吐 非常小 每个连接每次几十个字节
需要的IO次数 不多 非常多

4. 基础软件和硬件

4.1软件:

Golang 1.8r3 , shell, python (开发没有使用c++ 而是使用了golang, 是因为使用golang 的最初原型达到了系统要求。虽然golang 还存在一定的问题,但是和开发效率比,这点损失可以接受)
服务器操作系统:
Ubuntu 12.04
客户端操作系统:
debian 5.0

4.2硬件环境

服务端: dell R2950。 8核物理机,非独占有其他业务在工作,16G内存。这台硬件大概是7年前的产品,性能应该不是很高要求。
服务器硬件版本:
machine
服务器CPU信息:
cpu

客户端: esxi 5.0 虚拟机,配置为4核 5G内存。一共17台,每台和服务器建立6万个连接。完成100万客户端模拟

5. 技术分析和实现

5.1) 单机实现100万用户连接

这一点来说相对简单,笔者在几年前就早完成了单机百万用户的开发以及操作。现代的服务器都可以支持百万用户。相关内容可以查看   github代码以及相关文档。
https://github.com/xiaojiaqi/C1000kPracticeGuide
系统配置以及优化文档:
https://github.com/xiaojiaqi/C1000kPracticeGuide/tree/master/docs/cn

5.2) 3万QPS

这个问题需要分2个部分来看客户端方面和服务器方面。

客户端QPS

因为有100万连接连在服务器上,QPS为3万。这就意味着每个连接每33秒,就需要向服务器发一个摇红包的请求。因为单IP可以建立的连接数为6万左右, 有17台服务器同时模拟客户端行为。我们要做的就保证在每一秒都有这么多的请求发往服务器即可。
其中技术要点就是客户端协同。但是各个客户端的启动时间,建立连接的时间都不一致,还存在网络断开重连这样的情况,各个客户端如何判断何时自己需要发送请求,各自该发送多少请求呢?

我是这样解决的:利用NTP服务,同步所有的服务器时间,客户端利用时间戳来判断自己的此时需要发送多少请求。
算法很容易实现:
假设有100万用户,则用户id 为0-999999.要求的QPS为5万, 客户端得知QPS为5万,总用户数为100万,它计算 100万/5万=20,所有的用户应该分为20组,如果 time() % 20 == 用户id % 20,那么这个id的用户就该在这一秒发出请求,如此实现了多客户端协同工作。每个客户端只需要知道 总用户数和QPS 就能自行准确发出请求了。
(扩展思考:如果QPS是3万 这样不能被整除的数目,该如何办?如何保证每台客户端发出的请求数目尽量的均衡呢?)

服务器QPS

服务器端的QPS相对简单,它只需要处理客户端的请求即可。但是为了客观了解处理情况,我们还需要做2件事情。
第一: 需要记录每秒处理的请求数目,这需要在代码里埋入计数器。
第二: 我们需要监控网络,因为网络的吞吐情况,可以客观的反映出QPS的真实数据。为此,我利用python脚本 结合ethtool 工具编写了一个简单的工具,通过它我们可以直观的监视到网络的数据包通过情况如何。它可以客观的显示出我们的网络有如此多的数据传输在发生。
工具截图: 工具截图

5.3) 摇红包业务

摇红包的业务非常简单,首先服务器按照一定的速度生产红包。红包没有被取走的话,就堆积在里面。服务器接收一个客户端的请求,如果服务器里现在有红包就会告诉客户端有,否则就提示没有红包。
因为单机每秒有3万的请求,所以大部分的请求会失败。只需要处理好锁的问题即可。
我为了减少竞争,将所有的用户分在了不同的桶里。这样可以减少对锁的竞争。如果以后还有更高的性能要求,还可以使用 高性能队列——Disruptor来进一步提高性能。

注意,在我的测试环境里是缺少支付这个核心服务的,所以实现的难度是大大的减轻了。另外提供一组数字:2016年淘宝的双11的交易峰值仅仅为12万/秒,微信红包分发速度是5万/秒,要做到这点是非常困难的。(http://mt.sohu.com/20161111/n472951708.shtml)

5.4) 发红包业务

发红包的业务很简单,系统随机产生一些红包,并且随机选择一些用户,系统向这些用户提示有红包。这些用户只需要发出拆红包的请求,系统就可以随机从红包中拆分出部分金额,分给用户,完成这个业务。同样这里也没有支付这个核心服务。

5.5)监控

最后 我们需要一套监控系统来了解系统的状况,我借用了我另一个项目(https://github.com/xiaojiaqi/fakewechat) 里的部分代码完成了这个监控模块,利用这个监控,服务器和客户端会把当前的计数器内容发往监控,监控需要把各个客户端的数据做一个整合和展示。同时还会把日志记录下来,给以后的分析提供原始数据。 线上系统更多使用opentsdb这样的时序数据库,这里资源有限,所以用了一个原始的方案

监控显示日志大概这样
监控日志

6. 代码实现及分析

在代码方面,使用到的技巧实在不多,主要是设计思想和golang本身的一些问题需要考虑。
首先golang的goroutine 的数目控制,因为至少有100万以上的连接,所以按照普通的设计方案,至少需要200万或者300万的goroutine在工作。这会造成系统本身的负担很重。
其次就是100万个连接的管理,无论是连接还是业务都会造成一些心智的负担。
我的设计是这样的:

架构图

首先将100万连接分成多个不同的SET,每个SET是一个独立,平行的对象。每个SET 只管理几千个连接,如果单个SET 工作正常,我只需要添加SET就能提高系统处理能力。
其次谨慎的设计了每个SET里数据结构的大小,保证每个SET的压力不会太大,不会出现消息的堆积。
再次减少了gcroutine的数目,每个连接只使用一个goroutine,发送消息在一个SET里只有一个gcroutine负责,这样节省了100万个goroutine。这样整个系统只需要保留 100万零几百个gcroutine就能完成业务。大量的节省了cpu 和内存
系统的工作流程大概如下:
每个客户端连接成功后,系统会分配一个goroutine读取客户端的消息,当消息读取完成,将它转化为消息对象放至在SET的接收消息队列,然后返回获取下一个消息
在SET内部,有一个工作goroutine,它只做非常简单而高效的事情,它做的事情如下,检查SET的接受消息,它会收到3类消息

1, 客户端的摇红包请求消息

2, 客户端的其他消息 比如聊天 好友这一类

3, 服务器端对客户端消息的回应

对于 第1种消息 客户端的摇红包请求消息 是这样处理的,从客户端拿到摇红包请求消息,试图从SET的红包队列里 获取一个红包,如果拿到了就把红包信息 返回给客户端,否则构造一个没有摇到的消息,返回给对应的客户端。
对于第2种消息 客户端的其他消息 比如聊天 好友这一类,只需简单地从队列里拿走消息,转发给后端的聊天服务队列即可,其他服务会把消息转发出去。
对于第3种消息 服务器端对客户端消息的回应。SET 只需要根据消息里的用户id,找到SET里保留的用户连接对象,发回去就可以了。

对于红包产生服务,它的工作很简单,只需要按照顺序在轮流在每个SET的红包产生对列里放至红包对象就可以了。这样可以保证每个SET里都是公平的,其次它的工作强度很低,可以保证业务稳定。

见代码
https://github.com/xiaojiaqi/10billionhongbaos

7实践

实践的过程分为3个阶段

阶段1:

分别启动服务器端和监控端,然后逐一启动17台客户端,让它们建立起100万的链接。在服务器端,利用ss 命令 统计出每个客户端和服务器建立了多少连接。
命令如下:
Alias ss2=Ss –ant | grep 1025 | grep EST | awk –F: “{print \$8}” | sort | uniq –c’

结果如下: 100万连接建立

阶段2:

利用客户端的http接口,将所有的客户端QPS 调整到3万,让客户端发出3W QPS强度的请求。
运行如下命令:
启动脚本

观察网络监控和监控端反馈,发现QPS 达到预期数据
网络监控截图
3万qps

在服务器端启动一个产生红包的服务,这个服务会以200个每秒的速度下发红包,总共4万个。此时观察客户端在监控上的日志,会发现基本上以200个每秒的速度获取到红包。

摇红包

等到所有红包下发完成后,再启动一个发红包的服务,这个服务系统会生成2万个红包,每秒也是200个,每个红包随机指定3位用户,并向这3个用户发出消息,客户端会自动来拿红包,最后所有的红包都被拿走。

发红包

阶段3

利用客户端的http接口,将所有的客户端QPS 调整到6万,让客户端发出6W QPS强度的请求。

6wqps

如法炮制,在服务器端,启动一个产生红包的服务,这个服务会以200个每秒的速度下发红包。总共4万个。此时观察客户端在监控上的日志,会发现基本上以200个每秒的速度获取到红包。
等到所有红包下发完成后,再启动一个发红包的服务,这个服务系统会生成2万个红包,每秒也是200个,每个红包随机指定3位用户,并向这3个用户发出消息,客户端会自动来拿红包,最后所有的红包都被拿走。

最后,实践完成。

8 分析数据

在实践过程中,服务器和客户端都将自己内部的计数器记录发往监控端,成为了日志。我们利用简单python 脚本和gnuplt 绘图工具,将实践的过程可视化,由此来验证运行过程。

第一张是 客户端的QPS发送数据
客户端qps
这张图的横坐标是时间,单位是秒,纵坐标是QPS,表示这时刻所有客户端发送的请求的QPS。
图的第一区间,几个小的峰值,是100万客户端建立连接的, 图的第二区间是3万QPS 区间,我们可以看到数据 比较稳定的保持在3万这个区间。最后是6万QPS区间。但是从整张图可以看到QPS不是完美地保持在我们希望的直线上。这主要是以下几个原因造成的

  1. 当非常多goroutine 同时运行的时候,依靠sleep 定时并不准确,发生了偏移。我觉得这是golang本身调度导致的。当然如果cpu比较强劲,这个现象会消失。
  2. 因为网络的影响,客户端在发起连接时,可能发生延迟,导致在前1秒没有完成连接。
  3. 服务器负载较大时,1000M网络已经出现了丢包现象,可以通过ifconfig 命令观察到这个现象,所以会有QPS的波动。

第二张是 服务器处理的QPS图
服务器qps

和客户端的向对应的,服务器也存在3个区间,和客户端的情况很接近。但是我们看到了在大概22:57分,系统的处理能力就有一个明显的下降,随后又提高的尖状。这说明代码还需要优化。

整体观察在3万QPS区间,服务器的QPS比较稳定,在6万QSP时候,服务器的处理就不稳定了。我相信这和我的代码有关,如果继续优化的话,还应该能有更好的效果。

将2张图合并起来 qps

基本是吻合的,这也证明系统是符合预期设计的。

这是红包生成数量的状态变化图
生成红包

非常的稳定。

这是客户端每秒获取的摇红包状态
获取红包

可以发现3万QPS区间,客户端每秒获取的红包数基本在200左右,在6万QPS的时候,以及出现剧烈的抖动,不能保证在200这个数值了。我觉得主要是6万QPS时候,网络的抖动加剧了,造成了红包数目也在抖动。

最后是golang 自带的pprof 信息,其中有gc 时间超过了10ms, 考虑到这是一个7年前的硬件,而且非独占模式,所以还是可以接受。
pprof

总结:

按照设计目标,我们模拟和设计了一个支持100万用户,并且每秒至少可以支持3万QPS,最多6万QPS的系统,简单模拟了微信的摇红包和发红包的过程。可以说达到了预期的目的。
如果600台主机每台主机可以支持6万QPS,只需要7分钟就可以完成 100亿次摇红包请求。

虽然这个原型简单地完成了预设的业务,但是它和真正的服务会有哪些差别呢?我罗列了一下

区别 真正服务 本次模拟
业务复杂 更复杂 非常简单
协议 Protobuf 以及加密 简单的协议
支付 复杂
日志 复杂
性能 更高
用户分布 用户id分散在不同服务器,需要hash以后统一, 复杂。 用户id 连续,很多优化使代码简单 非常高效
安全控制 复杂
热更新及版本控制 复杂
监控 细致 简单

Refers: