A Decade of DevOps Through the Tech Radar: Infrastructure as Code and Cloud Computing

In the previous article we covered the relationship between DevOps and continuous delivery. This article looks back at the technologies that first transformed operations work: infrastructure as code and cloud computing. We will track their trends through the movement of the related blips on the Technology Radar.

Like continuous delivery, infrastructure as code entered the "Adopt" ring the very first time it appeared on the Technology Radar.

(Technology Radar, October 2012, blip 28: Infrastructure as code, Adopt)

Ten years ago, cloud computing was far less widespread than it is today. Many enterprises were adopting virtualization (strictly speaking, it could not yet be called cloud) to address resource shortages and heterogeneous hardware. Simply put, virtualization builds a common adaptation layer on top of heterogeneous devices, so that different applications and devices can be managed uniformly through common operations. At the time, the organizations facing these problems were mostly in critical sectors such as telecommunications, banking, government, and oil. Even though IBM, Oracle, EMC, and Microsoft all offered "end-to-end solutions," governments preferred to "mix and match" to avoid vendor lock-in: spreading the work across vendors reduced risk, though it also reduced efficiency. Once virtualization had solved the heterogeneity problem, infrastructure resources were abstracted into three categories: network, compute, and storage. Because business workloads were themselves heterogeneous, enterprise-grade solutions were slow to arrive. After all, to get value out of virtualized resources as quickly as possible, the bulk of the work went into migrating workloads onto them and the associated integration effort.

So operations engineers and network engineers gradually left the machine room and sat down with system engineers and database administrators, together becoming "script engineers."

Meanwhile, Linux began eating into the market share of traditional UNIX vendors through Xen and KVM. The old model of SCO, AIX, and HP-UX, selling licenses bundled with support contracts, was simply too expensive. You could say that cloud computing built on Linux virtualization delivered the finishing blow to commercial UNIX; today you rarely see those systems at all.

Virtualization pooled all the idle resources, so more applications could run without any additional investment in infrastructure hardware. It could also aggregate small machines to match the performance of large ones.

But if you cannot use the idle resources that virtualization frees up, yet still have to pay the electricity bill for them, that is a huge waste. So some people thought of renting out that idle capacity as a standalone business. That is another story, which we will come back to later.

As VMware, Oracle, Cisco, and IBM rolled out their own solutions, the "script engineers" began thinking about how to manage large pools of idle resources. As agile software development went mainstream, the pace of infrastructure change clearly could not keep up with agile iteration speed: infrastructure changes carried far greater risk and far longer lead times than application changes. Making infrastructure agile became an urgent problem for the "last mile" of agile delivery.

At this point, given the scale and complexity involved, the first question the script engineers asked was: if we cannot change the scale, let's reduce the complexity.

Puppet's Brief Moment of Glory

Puppet was the first tool to smell this opportunity; it appeared in the "Trial" ring of the August 2010 Technology Radar.

(Technology Radar, August 2010, blip 29: Puppet, Trial)

Ruby is well suited to building domain-specific languages (DSLs). Following Cucumber's successful exploration in this area, script engineers hoped a DSL could narrow the gap between Dev and Ops. Chef, a competitor from the same era, positioned itself as more developer-friendly; one edge Chef had over Puppet was its support for Windows.

However, lacking established best practices, Puppet and Chef were soon abused, and the difficulty of taming complexity exceeded expectations. As the scope of what they managed grew, their negative side effects became apparent. Someone once mocked Puppet like this:

Puppet is like cockroaches. Once you start using Puppet, you slowly find Puppet all over your codebase.

Moreover, Ruby proved to be a language that is easy to develop in but hard to maintain. Frequent releases and breaking changes in Ruby and its ecosystem made life miserable for the script engineers who inherited the code, and Ruby engineers were more expensive to hire and train. Even though Puppet and Chef themselves have gentle learning curves, the legacy infrastructure code written with them had a very steep one. Infrastructure changes were risky and lacked the necessary quality practices; in particular, the centralized master-agent model introduced a single point of failure and extra complexity. All of this made infrastructure code increasingly hard to maintain.

Agile teams favor decentralized, autonomous setups. So Puppet introduced a standalone mode, and Chef gained chef-solo. The Radar soon picked up the corresponding decentralization practices: "Librarian-puppet and Librarian-Chef" and "Masterless Chef/Puppet."

So the spotlight shifted from Ruby to Python, and from centralized to decentralized. And when "stateless servers" reached the "Adopt" ring of the October 2012 Radar, a new way of thinking about infrastructure as code emerged with it.

From Cookbooks to Playbooks: Ansible

The best practices around Puppet and Chef did not create new market share for them; instead they created a new rival: Ansible. Ansible first appeared in the "Trial" ring of the Technology Radar in January 2014, and just half a year later reached the "Adopt" ring in the July 2014 edition.

(See the ThoughtWorks website for more details)

Ansible adopted Python plus YAML, a combination common in the Python community, using YAML as the Playbook format to store virtual machine configuration. By treating a VM as a state machine and versioning its desired state in Playbooks, Ansible separated "state" from "state changes" more thoroughly, greatly reducing both the amount of code and the amount of programming involved. Some even joked that Ansible turned operations engineers from script engineers into configuration management engineers, and infrastructure as code into infrastructure as configuration.
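To make this concrete, here is a minimal sketch of what such a Playbook looks like. The host group, package, and service names are illustrative assumptions, not taken from the article:

```yaml
# Hypothetical playbook: "webservers", "nginx" etc. are placeholders.
- hosts: webservers
  become: true
  tasks:
    - name: Ensure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Ensure nginx is running and enabled at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```

Note that the Playbook describes desired state ("installed", "started") rather than the commands to get there; Ansible computes the state change itself, which is exactly the state/state-change separation described above.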

Infrastructure as Code for the Cloud

Infrastructure-as-code technology was not originally designed for cloud computing. But as the cloud became widely adopted, programming was all that remained of managing a "machine room you can no longer see." Tools designed for traditional data centers and IaaS, however, come up short in today's PaaS world: a cloud platform's own CLI tools are designed for administrators, not developers. And although Puppet, Chef, and Ansible each added more cloud-friendly features, they are fundamentally designed around virtual machines rather than cloud platforms; operating a cloud platform still requires building an agent.

These demands drove the emergence of infrastructure-as-code tools designed for cloud platforms, the best known of which is Terraform.

"If Hashi makes it, it's good": HashiCorp is the Blizzard Entertainment of the DevOps world. In cloud computing and DevOps, every HashiCorp product has made it onto the Technology Radar and shaped the direction of DevOps technology for the following years.

When virtualization had just matured, HashiCorp released Vagrant. Vagrant appeared in the "Assess" ring of the Technology Radar in January 2011 and moved to the "Trial" ring in 2012.

(See the ThoughtWorks website for more details)

Practices for automating developer workstation setup soon followed on the Radar. And when Packer entered the "Adopt" ring in June 2014, image-building pipelines appeared on the Radar as well.

The Vagrant-plus-Packer combination deeply influenced Docker, but more on that later. Back to Terraform: in 2015, Terraform appeared in the "Assess" ring. The Radar described it like this:

With terraform, you can manage cloud infrastructure by writing declarative definitions. The configuration of servers instantiated by terraform is usually left to tools like Puppet, Chef or Ansible. We like terraform because its file syntax is quite readable, and because it supports multiple cloud providers without attempting to provide an artificial abstraction across those providers. At this stage terraform is new and not everything has been implemented yet. We have also found its state management to be fragile, often requiring awkward manual intervention to fix.
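As a hedged illustration of that declarative style, a minimal Terraform definition might look like the sketch below; the provider, region, AMI ID, and resource names are all made up for the example:

```hcl
# Hypothetical example: region, AMI ID and names are placeholders.
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890" # placeholder AMI ID
  instance_type = "t2.micro"

  tags = {
    Name = "example-web"
  }
}
```

The file declares what should exist (one EC2 instance of a given type); `terraform plan` and `terraform apply` then work out what API calls are needed to make reality match the declaration.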

Terraform had its problems, but they did not outweigh its merits, and HashiCorp kept improving it. A year later, in the November 2016 Radar, Terraform entered the "Trial" ring, and the Radar duly noted the improvements:

Since we first mentioned terraform, rather cautiously, almost two years ago, it has seen continued development and has evolved into a stable product that has proven its value on our projects. The problem of state file management can now be sidestepped by using what terraform calls "remote state backends."

To avoid repeating the abuse that befell Puppet and Chef, Terraform's best practices were distilled into the book Terraform: Up and Running, and a companion tool, Terragrunt, followed. Terragrunt appeared on the Radar in November 2018, embodying the "pipelines for infrastructure" idea introduced earlier.

(Technology Radar, November 2018, blip 72: Terragrunt, Assess)

Automated Testing for Infrastructure as Code

Testability and test automation are perennial Radar topics, and infrastructure as code is no exception. Once testability was demanded of infrastructure, Provisioning Testing emerged to verify that servers had been initialized correctly; it entered the "Trial" ring in January 2014. Puppet and Chef gained rspec-puppet and Test Kitchen, respectively, as testing frameworks supporting this practice.

But once more than one infrastructure-as-code tool is in play, relying on each tool's own test suite becomes awkward. A tool-agnostic testing approach is needed, since Chef, Puppet, and Ansible are merely means to an end, not the end itself.

Serverspec, written in Ruby, appeared in the "Trial" ring in November 2016. Half a year later, Testinfra, written in Python, followed in the "Trial" ring of the June 2017 Radar. Both verify the correctness of infrastructure through tool-agnostic descriptions.

With automated testing tools, we can develop infrastructure test-first: describe the server's expected specification in code, then verify it locally or remotely. Such tests can also double as monitoring when run periodically from a pipeline.
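As a sketch of that tool-agnostic style, here is a minimal provisioning check in plain Python, in the spirit of Serverspec and Testinfra: it asserts facts about the machine, not about the tool that built it. The paths and ports you would check are assumptions specific to your own servers:

```python
import os
import socket

# Tool-agnostic provisioning checks: verify outcomes on the host,
# regardless of whether Chef, Puppet or Ansible produced them.

def check_file_exists(path: str) -> bool:
    """True if the given path exists on the host."""
    return os.path.exists(path)

def check_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In practice you would use a real framework such as Testinfra (which wraps checks like these behind a `host` fixture for pytest) and run the suite from your pipeline, both at provisioning time and periodically as monitoring.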

Below is an overview of how the infrastructure-as-code blips evolved. Solid lines show movements of the same blip; dashed lines show related but distinct blips:

Related blips: Puppet, Librarian-puppet and Librarian-Chef, Masterless Chef/Puppet, Provisioning Testing, Testinfra, Serverspec, Terraform, Terragrunt

Raising the Curtain on Cloud Computing

Back to the idea of "turning idle virtualized resources into a standalone business." At the time, the war of words between grid computing and cloud computing was heating up, and most people could not see much difference between hosting virtual machines in an IDC (Internet Data Center) and cloud computing; the cloud sounded like nothing more than a marketing gimmick.

In the very first Technology Radar in 2010, cloud computing was already in the "Adopt" ring, described like this:

Google Cloud Platform, Amazon EC2 and salesforce.com all claim to be cloud providers, yet each of their offerings differs. "Cloud" applies to a broad classification of service offerings, divided into infrastructure as a service (e.g. Amazon EC2 and Rackspace), platform as a service (e.g. Google App Engine) and software as a service (e.g. salesforce.com). In some cases a provider may span multiple service categories, further diluting cloud as a label. Regardless, the value of infrastructure, platforms and software in the cloud is beyond doubt, and although many of these offerings have hit bumps along the way, they have certainly earned their place on the radar.

In those days IaaS, PaaS, and SaaS could all be called cloud computing; the vendors simply differed in capability. What they had in common was delivering services through APIs.

By the second Radar, in April 2010, SaaS was treated as the highest maturity level of cloud computing, with IaaS and PaaS as earlier stages, and the original cloud computing blip was split into three: EC2 & S3 (from AWS), Google Cloud Platform, and Azure, placed in the "Trial," "Assess," and "Hold" rings respectively. In other words, in 2010 ThoughtWorks would definitely use AWS, would consider GCP in some situations, and would essentially not consider Azure.

And so the three-kingdoms saga of the public cloud providers began.

AWS Takes the Lead

For years, AWS services have led the development of cloud computing, becoming the model that other providers imitate and the default choice for most enterprises. Although AWS formally appeared on the Radar in July 2011, the EC2 & S3 combination had been in the "Trial" ring since the second edition. The year after Docker appeared, AWS launched its managed Elastic Container Service (ECS), making it the first provider to integrate Docker into a cloud platform. To address testing across a huge variety of mobile devices, it introduced AWS Device Farm, which can simulate thousands of device models online. In the era of microservices, AWS not only launched its second-generation container infrastructure AWS Fargate and the layer-7 Application Load Balancer, but also struck first with Lambda, a Function-as-a-Service serverless computing offering that made developing and deploying applications more flexible, stable, and efficient.

However, as mature cloud platforms multiplied, AWS stopped being the default choice. In the November 2018 Radar, AWS dropped from the "Adopt" ring to the "Trial" ring. This does not mean AWS had declined; rather, the other public cloud providers had caught up technically. From 2018 on, AWS is not necessarily the best choice: depending on the scenario, Google Cloud Platform or Azure may be preferable.

(See the ThoughtWorks website for more details)

GCP Close Behind

The last thing developers want to face is infrastructure details. They want an application to run on the internet after some simple configuration, without worrying about networks, operating systems, or virtual machines; those details should be transparent to developers.

Google App Engine appeared early on in the "Assess" ring under the banner of cloud computing, lasted two editions, and disappeared. In that era, people were still skeptical of cloud platforms that gave them no control over infrastructure details. More importantly, the cost of rearchitecting an existing application for a new programming model far exceeded the cost of a lift-and-shift onto an IaaS platform: the former meant refactoring the entire application, while the latter was nearly seamless.

Yet in the new era of containers and SaaS, Google laughed last. Kubernetes-based container orchestration became a de facto industry standard, and Google Cloud Platform's well-timed Kubernetes service, GKE (Google Kubernetes Engine), brought GCP back onto the Radar. In November 2017, Google Cloud Platform entered the "Trial" ring, described like this:

As GOOGLE CLOUD PLATFORM (GCP) has expanded in available geographic regions and service maturity, customers worldwide can seriously consider it when planning their cloud strategy. In some areas GCP has achieved feature parity with its main competitor, Amazon Web Services, while in others it stands out, notably in accessible machine-learning platforms, data engineering tools, and a viable "Kubernetes as a service" solution (GKE). In practice, our teams have also praised the good developer experience of GCP's tools and APIs.

Even though AWS launched its own Kubernetes service, EKS (Amazon Elastic Container Service for Kubernetes; don't ask me why it isn't "ECSK," that's what the official site says), it could not shake GKE's lead. With more enterprises already embracing containers and orchestrating them with Kubernetes in their private clouds to practice DevOps, GKE lowered the cost of cloud migration considerably.

Azure Comes from Behind

Azure was placed in the "Hold" ring of the second Radar in 2010: in other words, when considering cloud platforms, don't consider Azure. Yet Azure did not stall just because it had been marginalized. Seven years on, it returned to the spotlight with a series of exciting new products. Starting in late 2017, Azure services began entering the "Assess" ring, the first being Azure Service Fabric:

AZURE SERVICE FABRIC is a distributed systems platform built for microservices and containers. It is not only comparable to container orchestrators such as Kubernetes; it also supports old-style services. It can be used in many ways: to run simple services written in any of the supported programming languages, to run Docker containers, or to run services built with its SDKs. Since its release a few years ago it has steadily added features, including support for Linux containers. Although Kubernetes has become the leading container orchestrator, Service Fabric can be the first choice for .NET applications.

Then in 2018, Azure's latecomer advantage kept surfacing on the Radar: besides Azure itself entering the "Trial" ring, there were two notable products, Azure Stack and Azure DevOps. The May 2018 Radar described Azure Stack like this:

With AZURE STACK, Microsoft offers an interesting product between a full-featured public cloud and simple on-premises virtualization: a slimmed-down version of the software that runs Microsoft's Azure Global cloud. It can be installed on preconfigured commodity hardware from the likes of HP and Lenovo, giving enterprises the core Azure experience on premises. By default, support from Microsoft and the hardware vendor is separate (they promise to cooperate), but systems integrators can offer complete Azure Stack solutions.

In my view, Azure Stack is the Windows of the cloud era. Just as hardware vendors were once constrained by Windows device support, future virtual-appliance vendors will be constrained by Azure Stack. Azure Stack is then not just a private cloud; it is a future channel for hardware vendors. There are many choices in the private cloud space, but in user experience, Microsoft's product is pulling ahead of its competitors.

The other strongly recommended service is Azure DevOps. Since the DevOps movement took off, companies have kept building "DevOps platform" products, hoping such products would cement their authority in the field, while many enterprises doing DevOps have assembled their own platforms by integrating different tools. The goal is to tie computing resources and the development process together with tooling, forming a workflow and set of norms embodied in tools, and to apply the inverse Conway maneuver, using system structure to reshape organizational structure and thereby achieve both the technical and managerial sides of a DevOps transformation.

But few products span a long enough stretch of the process to manage it end to end, and DevOps platforms limited in scope lead to incomplete transformations. Azure DevOps offers a complete end-to-end product solution. Its predecessor is Microsoft VSTS, with the on-premises TFS product available for enterprises. It covers product management, task boards, continuous delivery pipelines, and more; these services integrate organically with other Azure services and seamlessly with Visual Studio. It genuinely manages every activity from writing requirements to release, and you can build dashboards that use data from each activity to automatically measure your DevOps outcomes.

Private Cloud: From IaaS and PaaS to CaaS

Public and private clouds seem to live in two different worlds. For a long time it was even debated whether a private cloud counts as "cloud" at all; some called it "enterprise virtualization 2.0." Only when practices and tools from multiple public clouds could also work against an enterprise's private virtualization platform did the concept of the private cloud truly take hold. That is why private cloud appeared on the Radar later than virtualization tooling like OpenStack: OpenStack showed up in the second Radar in 2010, while private cloud took two more years, arriving in 2012.

OpenStack is a free, open-source software project under the Apache license, initiated jointly by NASA and Rackspace. It is an open-source cloud management platform made up of several main components working together. OpenStack supports almost all types of cloud environment; the project aims to provide a cloud management platform that is simple to deploy, massively scalable, feature-rich, and consistent in its standards. Through a set of complementary services it delivers an infrastructure-as-a-service (IaaS) solution, with each service offering an API for integration.

Although OpenStack appeared on the Radar early, it did not enter the "Trial" ring until May 2013, three years later. Even with many enterprises running it in production, the Radar's authors were cautious about such an open-source product: the bigger the potential blast radius, the more careful you have to be.

Among the many private cloud and virtualization platforms from large vendors, OpenStack, being open source, free, and backed by NASA and Rackspace, became the default choice for many enterprises building private clouds. However, getting from a working OpenStack IaaS to something that actually makes developers more productive costs a great deal. As OpenStack's influence grew, the technical support users needed gradually became a market of its own; some companies even built their own private cloud products on top of OpenStack to sell as services.

That said, OpenStack at the time had no particular advantage in developer experience. (Since OpenStack is written in Python, its popularity arguably drove Python's first wave of mass adoption; the second wave, if you were wondering, was big data and AI.) This opened the door for a generation of PaaS platforms built on DevOps ideas, the first and best known being Pivotal's Cloud Foundry. As a commercial organization, Pivotal cared about customer pain points and built many solutions around them, even deploying Cloud Foundry itself on OpenStack, which made OpenStack look rather less painful to use.

Much has changed in the PaaS space since we last mentioned Cloud Foundry in 2012. While there are various distributions of the open-source core, we have been impressed by the product and ecosystem assembled as Pivotal Cloud Foundry. Although we expect continued convergence between the unstructured approaches (Docker, Mesos, Kubernetes, etc.) and the more structured, opinionated buildpack style offered by Cloud Foundry and others, we see real benefits for organizations willing to accept the constraints and evolution speed that come with adopting a PaaS. Of particular interest is the development speed gained from simplifying and standardizing the interaction between development teams and platform operations.

But just as IaaS and PaaS were arguing over who was better suited to host SaaS, Docker arrived: another landmark event for both the cloud market and DevOps. Public and private, IaaS and PaaS, every product rushed to support Docker, and some argued that Docker would be cloud computing's next milestone and battleground. As mentioned above, AWS launched ECS, Google launched GKE, and Azure launched its own container service. A number of startups also pitched "Container as a Service" (CaaS), hoping to claim a share of the cloud market. We will cover Docker and container platforms in detail in the next article.

Hybrid Cloud

Hybrid cloud appeared alongside private cloud in the April 2012 Radar, but in the "Assess" ring. At the time it was merely a way to temporarily extend a private cloud when resources ran short:

Hybrid clouds describe a set of patterns combining the best features of public clouds and private data centers. They allow applications to run in a private data center during normal periods, then use rented space in a public cloud for overflow capacity during traffic peaks. Another way to combine public and private clouds in an agile manner is to use the public cloud's elasticity and malleability to develop an application and learn its production characteristics, then move it to permanent infrastructure in a private data center once it has stabilized.

Once they had tasted how good the public cloud was, most enterprises could not go back. Still, various constraints slowed the migration from private to public cloud, and this situation gave birth to the hybrid cloud business: not only did public cloud providers offer their own services, many startups joined in too. Half a year later, the Radar updated the hybrid cloud entry:

Hybrid clouds combine the best features of public clouds and private data centers. They allow applications to run in a private data center during normal periods, then use rented space in a public cloud for overflow capacity during traffic peaks. There are now many infrastructure solutions that enable automated, consistent deployment across hybrid clouds (such as Pallet and RightScale). With strong offerings from Amazon, Rackspace, and others, we are moving hybrid clouds into the "Trial" ring in this edition of the radar.

From another angle, the public cloud's pace of technical development and its economics far outstrip the private cloud's; that is the advantage of concentrated investment, which cuts waste in R&D and coordination. Once enterprises combine public and private clouds, they gradually discover the public cloud's cost and technology advantages, and private clouds and data centers are gradually displaced by the public cloud.

The Era of Polycloud

Polycloud differs from hybrid cloud: hybrid cloud mixes private and public clouds, while polycloud mixes different public cloud providers. Before the three big public cloud providers converged in the "Trial" ring in November 2018, the polycloud trend had already entered the "Assess" ring a year earlier:

The major cloud providers (Amazon, Microsoft, and Google) are locked in a fierce race to maintain parity on core capabilities, leaving their offerings only lightly differentiated. This has led a small number of organizations to adopt a polycloud strategy: rather than going "all in" with one provider, they pass different types of workload to whichever provider suits them best. This might mean, for example, putting standard services on AWS but using Google for machine learning, Azure for .NET applications that use SQL Server, or perhaps an Ethereum consortium blockchain solution. This differs from a cloud-agnostic strategy aimed at portability across vendors, which is costly and forces lowest-common-denominator thinking. Polycloud instead focuses on using the best each cloud has to offer.

Yet within just half a year, polycloud moved into the "Trial" ring. Less a Radar recommendation than a sign of the times, this was driven from two directions: enterprises that had adopted hybrid clouds wanted more cloud services, while product homogeneity forced the public cloud providers to play to their individual strengths. Besides, if one cloud provider fails, the others are still available. This raised a new concern: enterprises don't want vendor lock-in. Hence the discouraged practice the Radar called "generic cloud usage," which appeared alongside polycloud in the 2017 Radar, in the "Hold" ring:

The major cloud providers continue to add new features to their clouds at a rapid pace, and under the banner of polycloud we have recommended using multiple clouds in parallel, mixing and matching services according to each provider's strengths. Increasingly, however, we see organizations prepared to use multiple clouds not to benefit from individual vendors' strengths but to avoid vendor "lock-in" at all costs. This, of course, leads to generic cloud usage, using only the features every provider has in common, which reminds us of the lowest-common-denominator scenario we saw ten years ago, when companies strenuously avoided advanced features of relational databases in the name of vendor neutrality. The problem of lock-in is real. But instead of taking a sledgehammer to it, we recommend looking at it from the perspective of exit costs, and weighing those costs against the benefits of using cloud-specific features.

Such warnings rarely get attention early on, though, because the ill effects of large-scale generic cloud usage are slow to arrive.

The major cloud providers are increasingly competitive on pricing and on the rapid pace at which they release new features, which puts consumers in a difficult position when choosing and committing to a provider. More and more, we see organizations prepared to use "any cloud" and to avoid vendor lock-in at all costs. This, of course, leads to generic cloud usage: organizations restricting themselves to the features common to all cloud providers, thereby ignoring each provider's unique strengths; and organizations investing heavily in home-grown abstraction layers that are too complex to build and too costly to maintain, all to stay cloud-agnostic. The problem of lock-in is real. We recommend addressing it with a polycloud strategy that weighs the migration cost and effort of moving from one cloud to another against the benefits of using cloud-specific features. We recommend improving workload portability by shipping applications as widely adopted Docker containers, using open-source security and identity protocols to migrate workload identity easily, adopting a risk-commensurate vendor strategy that preserves cloud independence only where necessary, and using polycloud to mix and match services from different providers where it makes sense. In short, shift your approach from generic cloud usage to a sensible polycloud strategy.

Below is an overview of how the cloud computing blips evolved. Solid lines show movements of the same blip; dashed lines show related but distinct blips:

Once large-scale infrastructure can be managed through development practices, operations engineers become a kind of developer too: infrastructure developers, differing from application developers only in domain and tooling. Combining infrastructure as code with cloud computing greatly reduces infrastructure complexity, which lets us take on more complex applications, especially microservices. Look out for the next article: A Decade of DevOps Through the Tech Radar: Containers and Microservices.

Related blips: AWS ECS, AWS Device Farm, AWS Lambda, AWS Fargate, AWS Application Load Balancer, Google App Engine, Google Cloud Platform, GKE, Azure, Azure Service Fabric, Azure Stack, Azure DevOps, Private Clouds, Hybrid Clouds, PolyCloud, Generic Cloud Usage

from:https://insights.thoughtworks.cn/infrastructure-as-code-and-cloud-computing/

What’s Up With Floating Point?

I presume most people reading this have used floating-point numbers at some point, often not intentionally.

I’m also fairly sure a good number who have encountered them did so after trying to discover why the result of a simple computation is incorrect, e.g.:

0.1 + 0.2
// result: 0.30000000000000004
// Which I'm fairly sure
// should be 0.3 ...

The problem is that without understanding what a floating-point number is, you will often find yourself frustrated when the result you expected is ever-so-slightly different.

My goal here is to clarify what a floating-point number is, why we use them, and how they work.

Why do we even need floating point 🤔?

It’s not a bold statement to say that computers need to store numbers. We store these numbers in binary on our phones, laptops, fridges etc.

I hope that most people reading this are familiar with binary numbers in some form; if not, consider reading this blog post by Linda Vivah.

But what about decimal? Fractions, π, real numbers?

For any useful computation, we need computers to be able to represent the following:

  • Veryveryveryvery SMALL numbers,
  • Veryveryveryvery BIG numbers,
  • Everything in-between!

Starting with the veryveryveryvery small numbers to take us in the right direction; how do we store them?

Well, that’s simple. We store those using the equivalent of decimal representation in binary…

Binary fractions

An example should help here. Let’s choose a random binary fraction: 101011.101

 

This is very similar to how decimal numbers work. The only difference is that we have base 2 instead of base 10. If you chuck the above in a binary converter of your choice, you’ll find that it is correct!
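If you'd like to check this mechanically, here is a small Python sketch that evaluates a binary fraction the same way, digit by digit:

```python
# Evaluate a binary fraction by summing powers of two, mirroring how
# decimal place value works in base 10.
def binary_fraction_to_decimal(s: str) -> float:
    int_part, _, frac_part = s.partition(".")
    value = 0.0
    for i, bit in enumerate(reversed(int_part)):
        value += int(bit) * 2 ** i       # 2^0, 2^1, ... left of the point
    for i, bit in enumerate(frac_part, start=1):
        value += int(bit) * 2 ** -i      # 2^-1, 2^-2, ... right of the point
    return value

print(binary_fraction_to_decimal("101011.101"))  # 43.625
```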

So how might we store these binary fractions?

Let’s say we allocate one byte (8 bits) to store our binary fraction: 00000000. We then must choose a place to put our binary separator so that we can have the fractional part of our binary number.

Let’s try smack-bang in the middle!

0000.0000

What’s the biggest number we can represent with this?

1111.1111 = 15.9375

That’s… not very impressive. Neither is the smallest non-zero number we can represent, 0.0001 (0.0625).

There is a lot of wasted storage space here alongside a poor range of possible numbers. The issue is that choosing any place to put our point would leave us with either fractional precision or a larger integer range — not both.

If we could just move the fractional point around as we needed, we could get so much more out of our limited storage. If only the point could float around as we needed it, a floating point if you will…

(I’m sorry, I had to 😅.)

So what is floating point?

Floating point is exactly that, a floating (fractional) point number that gives us the ability to change the relative size of our number.

So how do we mathematically represent a number in such a way that we,

  1. store the significant digits of the number we want to represent (E.G. the 12 in 0.00000012);
  2. know where to put the fractional point in relation to the significant digits (E.G. all the 0’s in 0.00000012)?

To do this, let’s time travel (for some) back to secondary school…

Standard Form (Scientific Notation)

Anyone remember mathsisfun? I somehow feel old right now but either way, this is from their website:

 

You can do the exact same thing with binary! Instead of 7*10^2 = 700, we can write

1010111100 * 2^0 = 10101111 * 2^2 = 700

Which is equivalent to 175 * 4 = 700. This is a great way to represent a number based on the significant digits and the placement of the fractional point relative to said digits.

That’s it! Floating point is a binary standard-form representation of numbers.

If we want to formalise the representation a little, we need to account for positive and negative numbers. To do this, we also add a sign to our number by multiplying by \pm 1:

(\text{sign}) * (\text{significant digits}) * (\text{base})^{(\text{some power})}

And back to the example given by mathsisfun…

700 = (1) * (7) * (10)^{2} = (1) * (10101111) * (2)^{2}

If you are reading other literature, you’ll find the representation will look something like…

(-1)^s * c * b^{e}

Where s is the sign bit, c is the significand/mantissa, b is the base, and e is the exponent.
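We can peek at this representation directly. The sketch below unpacks the raw fields of a Python float using the standard IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 significand bits; a format this article hasn't formally introduced yet) and rebuilds the value from them:

```python
import struct

# Unpack the raw IEEE 754 binary64 fields of a float:
# 1 sign bit, 11 exponent bits, 52 significand bits.
def ieee754_fields(x: float):
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    significand = bits & ((1 << 52) - 1)
    return sign, exponent, significand

s, e, c = ieee754_fields(-700.0)

# Reconstruct a normal number: (-1)^s * (1 + c/2^52) * 2^(e - 1023).
# The exponent is stored with a bias of 1023; for 700 = 1.0101111 * 2^9
# the unbiased exponent e - 1023 is 9.
value = (-1) ** s * (1 + c / 2**52) * 2.0 ** (e - 1023)
print(value)  # -700.0
```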

So why am I getting weird errors 😡?

So, we know what floating point is now. But why can’t I do something as simple as add two numbers together without the risk of error?

Well the problem lies partially with the computer, but mostly with mathematics itself.

Recurring Fractions

Recurring fractions are an interesting problem in number representation systems.

Let’s choose any fraction \frac{x}{y}. If y has a prime factor that isn’t also a factor of the base, it will be a recurring fraction.

This is why numbers like 1/21 can’t be represented in a finite number of digits; 21 has 7 and 3 as prime factors, neither of which are a factor of 10.

Let’s work through an example in decimal.

Decimal

Say you want to add the numbers 1/3 and 2/3. We all know the answer is 1, but if we are working in decimal, this isn’t as obvious.

This is because 1/3 = 0.333333333333…

It isn’t a number that can be represented as a finite number of digits in decimal. As we can’t store infinite digits, we store an approximation accurate to 10 places.

The calculation becomes…

0.3333333333 + 0.6666666666 = 0.9999999999

Which is definitely not 1. It’s real close, but it’s not 1.

The finite nature in which we can store numbers doesn’t mesh well with the inevitable fact that we can’t easily represent all numbers in a finite number of digits.
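Python's decimal module lets us reproduce this exactly, by limiting precision to 10 digits and truncating like the worked example above:

```python
from decimal import Decimal, getcontext, ROUND_DOWN

# 10 significant digits, truncated (not rounded), as in the example.
getcontext().prec = 10
getcontext().rounding = ROUND_DOWN

third = Decimal(1) / Decimal(3)       # 0.3333333333
two_thirds = Decimal(2) / Decimal(3)  # 0.6666666666
total = third + two_thirds
print(total)  # 0.9999999999
```

(With the default rounding mode, 2/3 would be stored as 0.6666666667 and the sum would land back on 1 by luck; truncation makes the error visible.)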

Binary

This exact same problem occurs in binary, except it’s even worse! The reason for this is that 2 has one less prime factor than 10, namely 5.

Because of this, recurring fractions happen more commonly in base 2.

An example of this is 0.1:

In decimal, that’s easy. In binary however… 0.00011001100110011..., it’s another recurring fraction!
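You can generate that expansion yourself with exact rational arithmetic. This short sketch repeatedly doubles the fraction and reads off the integer part as the next binary digit:

```python
from fractions import Fraction

# Produce the first `bits` binary digits of a fraction in [0, 1):
# doubling shifts the binary point right by one place, so the integer
# part that appears is the next digit of the expansion.
def binary_expansion(value: Fraction, bits: int) -> str:
    digits = []
    for _ in range(bits):
        value *= 2
        if value >= 1:
            digits.append("1")
            value -= 1
        else:
            digits.append("0")
    return "0." + "".join(digits)

print(binary_expansion(Fraction(1, 10), 16))  # 0.0001100110011001
```

The `0011` block repeats forever: 1/10 never terminates in base 2.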

So trying to perform 0.1 + 0.2 becomes

  0.0001100110011
+ 0.0011001100110
= 0.0100110011001
= 0.2999267578125

Earlier we got 0.30000000000000004 rather than this truncated result; the difference comes down to things like rounding modes, which I won’t go into here (but will in a future post). Either way, the same underlying principle is causing the issue.

The errors introduced by this fact are lessened by rounding.

Precision

The other main issue comes in the form of precision.

We only have a certain number of bits dedicated to the significant digits of our floating point number.

Decimal

As an example, consider we have three decimal values to store our significant digits.

If we compare 333 and 1/3 * 10^3, we would find that in our system they are the exact same.

This is because we only have three values of precision to store the significant digits of our number, and that involves truncating the end off of our recurring fraction.

In an extreme example, adding 1 to 1 * 10^3 will result in 1 * 10^3, the number hasn’t changed. This is because you need four significant digits to represent 1001.

Binary

This exact same issue occurs in binary with veryveryveryvery small and veryveryveryvery big numbers. In a future post I will be talking more about the limits of floating point.

For completeness, consider the previous example in binary, where we now have 3 bits to represent our significant digits.

Using base 2 instead, with 1 * 2^3, adding 1 to our number again results in no change. This is because representing 1001 (now equivalent to 9 in decimal) requires 4 bits, so the less significant binary digit is lost in translation.

There is no solution here, the limits of precision are defined by the amount we can store. To get around this, use a larger floating point data type.

  • E.G. move from a single-precision floating-point number to a double-precision floating-point number.
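In Python (double precision) you can watch the same cliff at 2^53, the point where the 52-bit significand can no longer distinguish consecutive integers:

```python
# Above 2**53, a double's 52-bit significand cannot represent every
# integer, so adding 1 is silently lost to rounding.
big = float(2**53)
print(big + 1 == big)  # True: 2**53 + 1 rounds back to 2**53
print(big + 2 == big)  # False: 2**53 + 2 is representable
```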

Summary

TLDR

To bring it all together, floating-point numbers are a representation of binary values akin to standard-form or scientific notation.

It allows us to store BIG and small numbers with precision.

To do this we move the fractional point relative to the significant digits of the number to make that number bigger or smaller at a rate proportional to the base being used.

Most of the errors associated with floating point come in the form of representing recurring fractions in a finite number of bits. Rounding modes help to reduce these errors.

Thanks 🥰!

Thank you so much for reading! I hope this was helpful to those that needed a little refresher on floating point and also to those that are new to this concept.

I will be making one or two updates to this post explaining the in-depths of the IEEE 754-2008 Floating Point Standard, so if you have questions like:

  • “What are the biggest and smallest numbers I can use in floating point?”
  • “What do financial institutions do about these errors?
  • “Can we use other bases?”
  • “How do we actually perform floating-point arithmetic?”

Then feel free to follow to see an update! You can also follow me on twitter @tim_cb_roderick for updates. If you have any questions please feel free to leave a comment below.

from:https://timroderick.com/floating-point-introduction/

A Detailed Look at RFC 8446 (a.k.a. TLS 1.3)

For the last five years, the Internet Engineering Task Force (IETF), the standards body that defines internet protocols, has been working on standardizing the latest version of one of its most important security protocols: Transport Layer Security (TLS). TLS is used to secure the web (and much more!), providing encryption and ensuring the authenticity of every HTTPS website and API. The latest version of TLS, TLS 1.3 (RFC 8446) was published today. It is the first major overhaul of the protocol, bringing significant security and performance improvements. This article provides a deep dive into the changes introduced in TLS 1.3 and its impact on the future of internet security.

An evolution

One major way Cloudflare provides security is by supporting HTTPS for websites and web services such as APIs. With HTTPS (the “S” stands for secure) the communication between your browser and the server travels over an encrypted and authenticated channel. Serving your content over HTTPS instead of HTTP provides confidence to the visitor that the content they see is presented by the legitimate content owner and that the communication is safe from eavesdropping. This is a big deal in a world where online privacy is more important than ever.

The machinery under the hood that makes HTTPS secure is a protocol called TLS. It has its roots in a protocol called Secure Sockets Layer (SSL) developed in the mid-nineties at Netscape. By the end of the 1990s, Netscape handed SSL over to the IETF, who renamed it TLS and have been the stewards of the protocol ever since. Many people still refer to web encryption as SSL, even though the vast majority of services have switched over to supporting TLS only. The term SSL continues to have popular appeal and Cloudflare has kept the term alive through product names like Keyless SSL and Universal SSL.

Timeline

In the IETF, protocols are called RFCs. TLS 1.0 was RFC 2246, TLS 1.1 was RFC 4346, and TLS 1.2 was RFC 5246. Today, TLS 1.3 was published as RFC 8446. RFCs are generally published in order, keeping 46 as part of the RFC number is a nice touch.

TLS 1.2 wears parachute pants and shoulder pads

MC Hammer, like SSL, was popular in the 90s

Over the last few years, TLS has seen its fair share of problems. First of all, there have been problems with the code that implements TLS, including Heartbleed, BERserk, goto fail;, and more. These issues are not fundamental to the protocol and mostly resulted from a lack of testing. Tools like TLS Attacker and Project Wycheproof have helped improve the robustness of TLS implementations, but the more challenging problems faced by TLS have had to do with the protocol itself.

TLS was designed by engineers using tools from mathematicians. Many of the early design decisions from the days of SSL were made using heuristics and an incomplete understanding of how to design robust security protocols. That said, this isn’t the fault of the protocol designers (Paul Kocher, Phil Karlton, Alan Freier, Tim Dierks, Christopher Allen and others), as the entire industry was still learning how to do this properly. When TLS was designed, formal papers on the design of secure authentication protocols like Hugo Krawczyk’s landmark SIGMA paper were still years away. TLS was 90s crypto: It meant well and seemed cool at the time, but the modern cryptographer’s design palette has moved on.

Many of the design flaws were discovered using formal verification. Academics attempted to prove certain security properties of TLS, but instead found counter-examples that were turned into real vulnerabilities. These weaknesses range from the purely theoretical (SLOTH and CurveSwap), to feasible for highly resourced attackers (WeakDH/LogJam, FREAK, SWEET32), to practical and dangerous (POODLE, ROBOT).

TLS 1.2 is slow

Encryption has always been important online, but historically it was only used for things like logging in or sending credit card information, leaving most other data exposed. There has been a major trend in the last few years towards using HTTPS for all traffic on the Internet. This has the positive effect of protecting more of what we do online from eavesdroppers and injection attacks, but has the downside that new connections get a bit slower.

For a browser and web server to agree on a key, they need to exchange cryptographic data. The exchange, called the “handshake” in TLS, has remained largely unchanged since TLS was standardized in 1999. The handshake requires two additional round-trips between the browser and the server before encrypted data can be sent (or one when resuming a previous connection). The additional cost of the TLS handshake for HTTPS results in a noticeable hit to latency compared to HTTP alone. This additional delay can negatively impact performance-focused applications.

Defining TLS 1.3

Unsatisfied with the outdated design of TLS 1.2 and two-round-trip overhead, the IETF set about defining a new version of TLS. In August 2013, Eric Rescorla laid out a wishlist of features for the new protocol:
https://www.ietf.org/proceedings/87/slides/slides-87-tls-5.pdf

After some debate, it was decided that this new version of TLS was to be called TLS 1.3. The main issues that drove the design of TLS 1.3 were mostly the same as those presented five years ago:

  • reducing handshake latency
  • encrypting more of the handshake
  • improving resiliency to cross-protocol attacks
  • removing legacy features

The specification was shaped by volunteers through an open design process, and after four years of diligent work and vigorous debate, TLS 1.3 is now in its final form: RFC 8446. As adoption increases, the new protocol will make the internet both faster and more secure.

In this blog post I will focus on the two main advantages TLS 1.3 has over previous versions: security and performance.

Trimming the hedges


In the last two decades, we as a society have learned a lot about how to write secure cryptographic protocols. The parade of cleverly-named attacks from POODLE to Lucky13 to SLOTH to LogJam showed that even TLS 1.2 contains antiquated ideas from the early days of cryptographic design. One of the design goals of TLS 1.3 was to correct previous mistakes by removing potentially dangerous design elements.

Fixing key exchange

TLS is a so-called “hybrid” cryptosystem. This means it uses both symmetric key cryptography (encryption and decryption keys are the same) and public key cryptography (encryption and decryption keys are different). Hybrid schemes are the predominant form of encryption used on the Internet and are used in SSH, IPsec, Signal, WireGuard and other protocols. In hybrid cryptosystems, public key cryptography is used to establish a shared secret between both parties, and the shared secret is used to create symmetric keys that can be used to encrypt the data exchanged.

As a general rule, public key crypto is slow and expensive (microseconds to milliseconds per operation) and symmetric key crypto is fast and cheap (nanoseconds per operation). Hybrid encryption schemes let you send a lot of encrypted data with very little overhead by only doing the expensive part once. Much of the work in TLS 1.3 has been about improving the part of the handshake, where public keys are used to establish symmetric keys.

RSA key exchange

The public key portion of TLS is about establishing a shared secret. There are two main ways of doing this with public key cryptography. The simpler way is with public-key encryption: one party encrypts the shared secret with the other party’s public key and sends it along. The other party then uses its private key to decrypt the shared secret and … voila! They both share the same secret. This technique was discovered in 1977 by Rivest, Shamir and Adleman and is called RSA key exchange. In TLS’s RSA key exchange, the shared secret is decided by the client, who then encrypts it to the server’s public key (extracted from the certificate) and sends it to the server.


The other form of key exchange available in TLS is based on another form of public-key cryptography, invented by Diffie and Hellman in 1976, so-called Diffie-Hellman key agreement. In Diffie-Hellman, the client and server both start by creating a public-private key pair. They then send the public portion of their key share to the other party. When each party receives the public key share of the other, they combine it with their own private key and end up with the same value: the pre-master secret. The server then uses a digital signature to ensure the exchange hasn’t been tampered with. This key exchange is called “ephemeral” if the client and server both choose a new key pair for every exchange.
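Purely to illustrate the math, here is a toy finite-field Diffie-Hellman sketch in Python. The prime, generator, and key sizes are assumptions chosen for readability and are far too small to be secure; real TLS uses standardized, much larger groups (or elliptic curves):

```python
import secrets

# Toy Diffie-Hellman parameters: a small prime modulus and generator.
# Illustrative only; NOT secure at this size.
p = 0xFFFFFFFB  # 2**32 - 5, a prime
g = 5

# Each party generates a key pair and sends only the public half.
client_priv = secrets.randbelow(p - 2) + 1
server_priv = secrets.randbelow(p - 2) + 1
client_pub = pow(g, client_priv, p)   # g^a mod p
server_pub = pow(g, server_priv, p)   # g^b mod p

# Combining your own private key with the peer's public share yields
# the same value on both sides: g^(ab) mod p.
client_secret = pow(server_pub, client_priv, p)
server_secret = pow(client_pub, server_priv, p)
assert client_secret == server_secret
```

An eavesdropper sees only `g`, `p`, and the two public shares; recovering the shared value from those is the discrete logarithm problem discussed later in this article. Making the exchange "ephemeral" just means generating a fresh key pair like this for every connection.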


Both modes result in the client and server having a shared secret, but RSA mode has a serious downside: it’s not forward secret. That means that if someone records the encrypted conversation and then gets ahold of the RSA private key of the server, they can decrypt the conversation. This even applies if the conversation was recorded and the key is obtained some time well into the future. In a world where national governments are recording encrypted conversations and using exploits like Heartbleed to steal private keys, this is a realistic threat.

RSA key exchange has been problematic for some time, and not just because it’s not forward-secret. It’s also notoriously difficult to do correctly. In 1998, Daniel Bleichenbacher discovered a vulnerability in the way RSA encryption was done in SSL and created what’s called the “million-message attack,” which allows an attacker to perform an RSA private key operation with a server’s private key by sending a million or so well-crafted messages and looking for differences in the error codes returned. The attack has been refined over the years and in some cases only requires thousands of messages, making it feasible to do from a laptop. It was recently discovered that major websites (including facebook.com) were also vulnerable to a variant of Bleichenbacher’s attack called the ROBOT attack as recently as 2017.

To reduce the risks caused by non-forward secret connections and million-message attacks, RSA encryption was removed from TLS 1.3, leaving ephemeral Diffie-Hellman as the only key exchange mechanism. Removing RSA key exchange brings other advantages, as we will discuss in the performance section below.

Diffie-Hellman named groups

When it comes to cryptography, giving too many options leads to the wrong option being chosen. This principle is most evident when it comes to choosing Diffie-Hellman parameters. In previous versions of TLS, the choice of the Diffie-Hellman parameters was up to the participants. This resulted in some implementations choosing incorrectly, resulting in vulnerable implementations being deployed. TLS 1.3 takes this choice away.

Diffie-Hellman is a powerful tool, but not all Diffie-Hellman parameters are “safe” to use. The security of Diffie-Hellman depends on the difficulty of a specific mathematical problem called the discrete logarithm problem. If you can solve the discrete logarithm problem for a set of parameters, you can extract the private key and break the security of the protocol. Generally speaking, the bigger the numbers used, the harder it is to solve the discrete logarithm problem. So if you choose small DH parameters, you’re in trouble.

The LogJam and WeakDH attacks of 2015 showed that many TLS servers could be tricked into using small numbers for Diffie-Hellman, allowing an attacker to break the security of the protocol and decrypt conversations.

Diffie-Hellman also requires the parameters to have certain other mathematical properties. In 2016, Antonio Sanso found an issue in OpenSSL where parameters were chosen that lacked the right mathematical properties, resulting in another vulnerability.

TLS 1.3 takes the opinionated route, restricting the Diffie-Hellman parameters to ones that are known to be secure. However, it still leaves several options; permitting only one option makes it difficult to update TLS in case these parameters are found to be insecure some time in the future.

Fixing ciphers

The other half of a hybrid crypto scheme is the actual encryption of data. This is done by combining an authentication code and a symmetric cipher for which each party knows the key. As I’ll describe, there are many ways to encrypt data, most of which are wrong.

CBC mode ciphers

In the last section we described TLS as a hybrid encryption scheme, with a public key part and a symmetric key part. The public key part is not the only one that has caused trouble over the years. The symmetric key portion has also had its fair share of issues. In any secure communication scheme, you need both encryption (to keep things private) and integrity (to make sure people don’t modify, add, or delete pieces of the conversation). Symmetric key encryption is used to provide both encryption and integrity, but in TLS 1.2 and earlier, these two pieces were combined in the wrong way, leading to security vulnerabilities.

An algorithm that performs symmetric encryption and decryption is called a symmetric cipher. Symmetric ciphers usually come in two main forms: block ciphers and stream ciphers.

A stream cipher takes a fixed-size key and uses it to create a stream of pseudo-random data of arbitrary length, called a key stream. To encrypt with a stream cipher, you combine your message with the key stream by XORing each bit of the key stream with the corresponding bit of your message. To decrypt, you XOR the encrypted message with the same key stream. Examples of pure stream ciphers are RC4 and ChaCha20. Stream ciphers are popular because they’re simple to implement and fast in software.
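The XOR principle can be sketched in a few lines of Java. Note that java.util.Random below merely stands in for the key stream a real cipher such as ChaCha20 would produce; it is not cryptographically secure and is used here only to make the sketch self-contained.

```java
import java.util.Random;

// Stream-cipher principle: encryption and decryption are the SAME operation,
// an XOR of the message with a key stream derived from the key.
public class XorStreamCipher {
    // Same key -> same deterministic key stream (insecure stand-in generator).
    public static byte[] keyStream(long key, int length) {
        byte[] stream = new byte[length];
        new Random(key).nextBytes(stream);
        return stream;
    }

    // XOR each byte of the data with the corresponding key-stream byte.
    public static byte[] xor(byte[] data, long key) {
        byte[] stream = keyStream(key, data.length);
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ stream[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] msg = "attack at dawn".getBytes();
        byte[] ciphertext = xor(msg, 42L);      // encrypt
        byte[] recovered = xor(ciphertext, 42L); // decrypt: XOR again with the same stream
        System.out.println(new String(recovered)); // prints "attack at dawn"
    }
}
```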

A block cipher differs from a stream cipher in that it only encrypts fixed-size messages. If you want to encrypt a message that is shorter or longer than the block size, you have to do a bit of work. For shorter messages, you pad the message with extra data up to the block size. For longer messages, you can either split your message into blocks the cipher can encrypt and use a block cipher mode to combine the pieces, or you can turn your block cipher into a stream cipher by encrypting a sequence of counters and using the output as the key stream. This is called “counter mode”. One popular way of encrypting arbitrary-length data with a block cipher is a mode called cipher block chaining (CBC).
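Counter mode as described above can be sketched by encrypting an incrementing counter with AES and XORing the result with the message. This is a simplification for illustration: real CTR mode also mixes a unique per-message nonce into the counter block, which is omitted here.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;

// Turning a block cipher (AES, 16-byte blocks) into a stream cipher:
// the key stream is AES(counter 0), AES(counter 1), ... XORed with the data.
public class ToyCounterMode {
    public static byte[] crypt(byte[] key, byte[] data) throws Exception {
        // ECB is used here ONLY to encrypt single counter blocks, never the data itself.
        Cipher aes = Cipher.getInstance("AES/ECB/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] out = new byte[data.length];
        byte[] block = null;
        for (int i = 0; i < data.length; i++) {
            if (i % 16 == 0) {
                // Fresh key-stream block: encrypt the block index as a 128-bit counter.
                byte[] counter = ByteBuffer.allocate(16).putLong(8, i / 16).array();
                block = aes.doFinal(counter);
            }
            out[i] = (byte) (data[i] ^ block[i % 16]);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "0123456789abcdef".getBytes(); // 16-byte AES key
        byte[] ct = crypt(key, "counter mode turns blocks into a stream".getBytes());
        // Decryption is the same operation: XOR with the identical key stream.
        System.out.println(new String(crypt(key, ct)));
    }
}
```

Note how the message length no longer has to be a multiple of the block size: only as much key stream as needed is consumed, which is exactly why counter mode behaves like a stream cipher.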

[Figures: CBC-mode encryption and decryption]

In order to prevent people from tampering with data, encryption is not enough. Data also needs to be integrity-protected. For CBC-mode ciphers, this is done using something called a message-authentication code (MAC), which is like a fancy checksum with a key. Cryptographically strong MACs have the property that finding a MAC value that matches an input is practically impossible unless you know the secret key. There are two ways to combine MACs and CBC-mode ciphers. Either you encrypt first and then MAC the ciphertext, or you MAC the plaintext first and then encrypt the whole thing. In TLS, they chose the latter, MAC-then-Encrypt, which turned out to be the wrong choice.
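The safer ordering, Encrypt-then-MAC, can be sketched with HMAC-SHA256: the receiver verifies the MAC over the ciphertext before decrypting anything, comparing tags in constant time so no padding or timing oracle is exposed. (The key and "ciphertext" below are placeholder demo values.)

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Encrypt-then-MAC check: authenticate the ciphertext itself, so a tampered
// message is rejected before decryption ever looks at it.
public class EncryptThenMac {
    public static byte[] mac(byte[] key, byte[] ciphertext) throws Exception {
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(key, "HmacSHA256"));
        return hmac.doFinal(ciphertext);
    }

    // Constant-time comparison: never leak WHERE the mismatch occurred.
    public static boolean verify(byte[] key, byte[] ciphertext, byte[] tag) throws Exception {
        byte[] expected = mac(key, ciphertext);
        if (expected.length != tag.length) return false;
        int diff = 0;
        for (int i = 0; i < tag.length; i++) {
            diff |= expected[i] ^ tag[i];
        }
        return diff == 0;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "a 32-byte-long hmac demo key!!!!".getBytes();
        byte[] ct = "pretend-ciphertext".getBytes();
        byte[] tag = mac(key, ct);
        System.out.println(verify(key, ct, tag)); // true: untampered
        ct[0] ^= 1;                               // flip a single bit
        System.out.println(verify(key, ct, tag)); // false: rejected before decryption
    }
}
```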

You can blame this choice for BEAST, as well as a slew of padding oracle vulnerabilities such as Lucky 13 and Lucky Microseconds. Read my previous post on the subject for a comprehensive explanation of these flaws. The interaction between CBC mode and padding was also the cause of the widely publicized POODLE vulnerability in SSLv3 and some implementations of TLS.

RC4 is a classic stream cipher designed by Ron Rivest (the “R” of RSA) that was broadly supported since the early days of TLS. In 2013, it was found to have measurable biases that could be leveraged to allow attackers to decrypt messages.

AEAD Mode

In TLS 1.3, all the troublesome ciphers and cipher modes have been removed. You can no longer use CBC-mode ciphers or insecure stream ciphers such as RC4. The only type of symmetric crypto allowed in TLS 1.3 is a new construction called AEAD (authenticated encryption with additional data), which combines encryption and integrity into one seamless operation.
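Java’s standard library ships AES-GCM, one of the AEAD constructions TLS 1.3 permits. The sketch below (with a fixed all-zero IV for brevity, which real code must never reuse with the same key) shows encryption and integrity failing closed together: tampering makes decryption refuse to return any plaintext at all.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// AEAD in one operation: AES-GCM both encrypts and authenticates,
// so there is no MAC-then-Encrypt ordering to get wrong.
public class AeadExample {
    public static Cipher gcm(int mode, byte[] key, byte[] iv) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        // 128-bit authentication tag appended to the ciphertext.
        c.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
        return c;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "0123456789abcdef".getBytes(); // 16-byte AES key
        byte[] iv = new byte[12];                    // demo only: must be unique per message!
        byte[] ct = gcm(Cipher.ENCRYPT_MODE, key, iv).doFinal("hello tls 1.3".getBytes());
        System.out.println(new String(gcm(Cipher.DECRYPT_MODE, key, iv).doFinal(ct)));
        ct[3] ^= 1; // flip one ciphertext bit
        try {
            gcm(Cipher.DECRYPT_MODE, key, iv).doFinal(ct);
        } catch (AEADBadTagException e) {
            // The tag check fails, so no (corrupted) plaintext is ever released.
            System.out.println("tampering detected");
        }
    }
}
```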

Fixing digital signatures

Another important part of TLS is authentication. In every connection, the server authenticates itself to the client using a digital certificate, which has a public key. In RSA-encryption mode, the server proves its ownership of the private key by decrypting the pre-main secret and computing a MAC over the transcript of the conversation. In Diffie-Hellman mode, the server proves ownership of the private key using a digital signature. If you’ve been following this blog post so far, it should be easy to guess that this was done incorrectly too.

PKCS#1v1.5

Daniel Bleichenbacher has made a living identifying problems with RSA in TLS. In 2006, he devised a pen-and-paper attack against RSA signatures as used in TLS. It was later discovered that major TLS implementations, including those of NSS and OpenSSL, were vulnerable to this attack. This issue again had to do with how difficult it is to implement padding correctly, in this case, the PKCS#1 v1.5 padding used in RSA signatures. In TLS 1.3, PKCS#1 v1.5 is removed in favor of the newer design RSA-PSS.

Signing the entire transcript

We described earlier how the server uses a digital signature to prove that the key exchange hasn’t been tampered with. In TLS 1.2 and earlier, the server’s signature only covers part of the handshake. The other parts of the handshake, specifically the parts that are used to negotiate which symmetric cipher to use, are not signed by the private key. Instead, a symmetric MAC is used to ensure that the handshake was not tampered with. This oversight resulted in a number of high-profile vulnerabilities (FREAK, LogJam, etc.). In TLS 1.3 these are prevented because the server signs the entire handshake transcript.

[Figure: the TLS 1.2 handshake]

The FREAK, LogJam and CurveSwap attacks took advantage of two things:

  1. the fact that intentionally weak ciphers from the 1990s (called export ciphers) were still supported in many browsers and servers, and
  2. the fact that the part of the handshake used to negotiate which cipher was used was not digitally signed.

The on-path attacker can swap out the supported ciphers (or supported groups, or supported curves) from the client with an easily crackable choice that the server supports. They then break the key and forge two finished messages to make both parties think they’ve agreed on a transcript.

[Figure: the FREAK downgrade attack]

These attacks are called downgrade attacks, and they allow attackers to force two participants to use the weakest cipher supported by both parties, even if more secure ciphers are supported. In this style of attack, the perpetrator sits in the middle of the handshake and changes the list of supported ciphers advertised from the client to the server to only include weak export ciphers. The server then chooses one of the weak ciphers, and the attacker figures out the key with a brute-force attack, allowing the attacker to forge the MACs on the handshake. In TLS 1.3, this type of downgrade attack is impossible because the server now signs the entire handshake, including the cipher negotiation.

[Figure: the fully signed handshake transcript in TLS 1.3]

Better living through simplification

TLS 1.3 is a much more elegant and secure protocol with the removal of the insecure features listed above. This hedge-trimming allowed the protocol to be simplified in ways that make it easier to understand, and faster.

No more take-out menu

In previous versions of TLS, the main negotiation mechanism was the ciphersuite. A ciphersuite encompassed almost everything that could be negotiated about a connection:

  • type of certificates supported
  • hash function used for deriving keys (e.g., SHA1, SHA256, …)
  • MAC function (e.g., HMAC with SHA1, SHA256, …)
  • key exchange algorithm (e.g., RSA, ECDHE, …)
  • cipher (e.g., AES, RC4, …)
  • cipher mode, if applicable (e.g., CBC)

Ciphersuites in previous versions of TLS had grown into monstrously large alphabet soups. Examples of commonly used cipher suites are: DHE-RC4-MD5 or ECDHE-ECDSA-AES-GCM-SHA256. Each ciphersuite was represented by a code point in a table maintained by an organization called the Internet Assigned Numbers Authority (IANA). Every time a new cipher was introduced, a new set of combinations needed to be added to the list. This resulted in a combinatorial explosion of code points representing every valid choice of these parameters. It had become a bit of a mess.

[Figure: TLS 1.2 negotiation as a take-out menu vs. TLS 1.3 as a prix fixe menu]

TLS 1.3 removes many of these legacy features, allowing for a clean split between three orthogonal negotiations:

  • Cipher + HKDF Hash
  • Key Exchange
  • Signature Algorithm


This simplified cipher suite negotiation and radically reduced set of negotiation parameters open up a new possibility: the TLS 1.3 handshake can drop from two round-trips to one, providing the performance boost that should ensure TLS 1.3 becomes popular and widely adopted.

Performance

When establishing a new connection to a server that you haven’t seen before, it takes two round-trips before data can be sent on the connection. This is not particularly noticeable in locations where the server and client are geographically close to each other, but it can make a big difference on mobile networks where latency can be as high as 200ms, an amount that is noticeable for humans.

1-RTT mode

TLS 1.3 now has a radically simpler cipher negotiation model and a reduced set of key agreement options (no RSA, no user-defined DH parameters). This means that every connection will use a DH-based key agreement and the parameters supported by the server are likely easy to guess (ECDHE with X25519 or P-256). Because of this limited set of choices, the client can simply choose to send DH key shares in the first message instead of waiting until the server has confirmed which key shares it is willing to support. That way, the server can learn the shared secret and send encrypted data one round trip earlier. Chrome’s implementation of TLS 1.3, for example, sends an X25519 keyshare in the first message to the server.

[Figure: Diffie-Hellman key exchange in TLS 1.2 vs. TLS 1.3]

In the rare situation that the server does not support one of the key shares sent by the client, the server can send a new message, the HelloRetryRequest, to let the client know which groups it supports. Because the list has been trimmed down so much, this is not expected to be a common occurrence.

0-RTT resumption

A further optimization was inspired by the QUIC protocol. It lets clients send encrypted data in their first message to the server, resulting in no additional latency cost compared to unencrypted HTTP. This is a big deal, and once TLS 1.3 is widely deployed, the encrypted web is sure to feel much snappier than before.

In TLS 1.2, there are two ways to resume a connection, session ids and session tickets. In TLS 1.3 these are combined to form a new mode called PSK (pre-shared key) resumption. The idea is that after a session is established, the client and server can derive a shared secret called the “resumption main secret”. This can either be stored on the server with an id (session id style) or encrypted by a key known only to the server (session ticket style). This session ticket is sent to the client and redeemed when resuming a connection.

For resumed connections, both parties share a resumption main secret, so key exchange is not necessary except for providing forward secrecy. The next time the client connects to the server, it can take the secret from the previous session and use it to encrypt application data to send to the server, along with the session ticket. Sending encrypted data on the first flight is amazing, but it does come with downsides.

Replayability

There is no interactivity in 0-RTT data. It’s sent by the client, and consumed by the server without any interactions. This is great for performance, but comes at a cost: replayability. If an attacker captures a 0-RTT packet that was sent to server, they can replay it and there’s a chance that the server will accept it as valid. This can have interesting negative consequences.


An example of dangerous replayed data is anything that changes state on the server. If you increment a counter, perform a database transaction, or do anything that has a permanent effect, it’s risky to put it in 0-RTT data.

As a client, you can try to protect against this by only putting “safe” requests into the 0-RTT data. In this context, “safe” means that the request won’t change server state. In HTTP, different methods are supposed to have different semantics. HTTP GET requests are supposed to be safe, so a browser can usually protect HTTPS servers against replay attacks by only sending GET requests in 0-RTT. Since most page loads start with a GET of “/” this results in faster page load time.

Problems start to happen when data sent in 0-RTT is used for state-changing requests. To help guard against this failure case, TLS 1.3 also includes the time elapsed value in the session ticket. If this diverges too much, the client is either approaching the speed of light, or the value has been replayed. In either case, it’s prudent for the server to reject the 0-RTT data.
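The ticket-age check can be sketched as simple arithmetic. The tolerance window below is an arbitrary illustrative value, not one mandated by the RFC; real servers tune it to expected clock skew and network delay.

```java
// Server-side 0-RTT freshness check: compare the ticket age the client
// reports against the time actually elapsed since the ticket was issued.
public class TicketAgeCheck {
    // Assumed tolerance for clock skew plus network delay (illustrative value).
    static final long WINDOW_MS = 10_000;

    public static boolean accept0Rtt(long ticketIssuedAtMs, long clientReportedAgeMs, long nowMs) {
        long actualAgeMs = nowMs - ticketIssuedAtMs;
        // A replayed packet carries a stale reported age; if the divergence
        // exceeds the window, reject the 0-RTT data and fall back to 1-RTT.
        return Math.abs(actualAgeMs - clientReportedAgeMs) <= WINDOW_MS;
    }

    public static void main(String[] args) {
        long issued = 1_000_000;
        System.out.println(accept0Rtt(issued, 5_000, issued + 5_500));  // true: fresh
        System.out.println(accept0Rtt(issued, 5_000, issued + 90_000)); // false: likely replayed
    }
}
```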

For more details about 0-RTT, and the improvements to session resumption in TLS 1.3, check out this previous blog post.

Deployability

TLS 1.3 was a radical departure from TLS 1.2 and earlier, but in order to be deployed widely, it has to be backwards compatible with existing software. One of the reasons TLS 1.3 has taken so long to go from draft to final publication was the fact that some existing software (namely middleboxes) wasn’t playing nicely with the new changes. Even minor changes to the TLS 1.3 protocol that were visible on the wire (such as eliminating the redundant ChangeCipherSpec message, bumping the version from 0x0303 to 0x0304) ended up causing connection issues for some people.

Despite the fact that future flexibility was built into the TLS spec, some implementations made incorrect assumptions about how to handle future TLS versions. The phenomenon responsible for this change is called ossification and I explore it more fully in the context of TLS in my previous post about why TLS 1.3 isn’t deployed yet. To accommodate these changes, TLS 1.3 was modified to look a lot like TLS 1.2 session resumption (at least on the wire). This resulted in a much more functional, but less aesthetically pleasing protocol. This is the price you pay for upgrading one of the most widely deployed protocols online.

Conclusions

TLS 1.3 is a modern security protocol built with modern tools like formal analysis that retains its backwards compatibility. It has been tested widely and iterated upon using real world deployment data. It’s a cleaner, faster, and more secure protocol ready to become the de facto two-party encryption protocol online. Draft 28 of TLS 1.3 is enabled by default for all Cloudflare customers, and we will be rolling out the final version soon.

Publishing TLS 1.3 is a huge accomplishment. It is one of the best recent examples of how it is possible to take 20 years of deployed legacy code and change it on the fly, resulting in a better internet for everyone. TLS 1.3 has been debated and analyzed for the last three years and it’s now ready for prime time. Welcome, RFC 8446.

from:https://blog.cloudflare.com/rfc-8446-aka-tls-1-3/

Implementing Web Request Logging and a Distributed Lock with Spring Boot AOP

AOP

AOP stands for Aspect Oriented Programming. In practice, AOP is a technique that uses compile-time weaving or runtime dynamic proxies to maintain cross-cutting program functionality in one place. AOP has different implementations in different technology stacks, but the idea is the same everywhere: we define a pointcut in an existing program and insert behavior before and after it, so that common concerns (such as logging or distributed locking) can be handled uniformly without modifying the original business logic.

Why use AOP

In real-world development, an application is split into many layers. A typical Java web application has the following:

  • Web layer: exposes RESTful APIs for the front end to call.
  • Business layer: handles the concrete business logic.
  • Persistence layer: performs the database operations (create, read, update, delete).

Although each layer does something quite different, they all end up containing similar code, such as logging and security checks. If we write that code independently in every layer, it quickly becomes hard to maintain. AOP offers an alternative: the shared code is kept together in one place, and we can flexibly choose where it is applied.

Core AOP concepts

  • Aspect: usually a class in which pointcuts and advice are defined.
  • Join point: a point that can be intercepted. Because Spring only supports method-level join points, in Spring a join point means an intercepted method; in general AOP a join point can also be a field or a constructor.
  • Pointcut: the definition of which join points to intercept.
  • Advice: the code executed once a join point is intercepted; it comes in five kinds: before, after, after-throwing, after-returning, and around.
  • AOP proxy: the object created by the AOP framework; the proxy is an enhanced version of the target object. In Spring, an AOP proxy can be a JDK dynamic proxy (interface-based) or a CGLIB proxy (subclass-based).

Spring AOP

AOP proxies in Spring are still managed by the Spring IoC container, which is responsible for creating the proxies and wiring their dependencies. Spring uses JDK dynamic proxies by default and automatically switches to CGLIB when it needs to proxy a class rather than an interface; since most projects today program against interfaces, JDK dynamic proxies are the more common case. In this article we will combine annotations with AOP to implement web request logging and a distributed lock.

Spring AOP annotations

  • @Aspect: marks a Java class as an aspect.
  • @Pointcut: defines a pointcut; it can be a rule expression (for example, all methods under a given package, as in the example below) or an annotation.
  • @Before: runs the advice before the pointcut executes.
  • @After: runs the advice after the pointcut executes.
  • @AfterReturning: runs after the pointcut returns (useful for post-processing return values).
  • @Around: wraps the pointcut and controls when the pointcut itself executes.
  • @AfterThrowing: handles the case where the intercepted code throws an exception.

@Before, @After, @AfterReturning, @Around, and @AfterThrowing are all kinds of advice.

Aspect ordering

In practice we often apply several aspects to the same endpoint: logging, a distributed lock, permission checks, and so on. That raises a priority question: how do we tell Spring in what order to run them? We declare each aspect's priority with the @Order(i) annotation; the smaller i is, the higher the priority. Suppose we have two aspects: WebLogAspect with @Order(100) and DistributeLockAspect with @Order(99), so DistributeLockAspect has the higher priority. The execution order is then: in @Before, the @Order(99) advice runs before the @Order(100) advice, while in @After and @AfterReturning the @Order(100) advice runs before the @Order(99) advice. Think of it as first in, last out.

Annotation-driven AOP configuration

Annotations reduce configuration, can be validated at compile time (which makes errors easier to find), and are convenient to set up. As you have probably noticed, modern Spring projects replace most of the old XML configuration with annotations. Combining annotations with AOP is also the most common approach, and it is the one we use in this article to implement the web logging and distributed lock annotations. Let's start with the preparation.

Preparation

Create a Spring Boot web project

You can generate an empty Spring Boot project from the Spring Initializr page, or download the springboot-pom.xml file and build a Spring Boot project with Maven. Once the project is created, import it into your favorite IDE to make the coding easier; I opened it in IntelliJ IDEA.

Add dependencies

We need the web dependency and the AOP dependency; just add the following to pom.xml:

Listing 1. Adding the web dependency
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
Listing 2. Adding the AOP dependency
<dependency>
     <groupId>org.springframework.boot</groupId>
     <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

Other preparation

To make testing easier I also integrated Swagger documentation into the project; see the article on using Swagger in a Spring Boot project for the details. I also wrote two endpoints for testing, which you can find in this article's source code. Since the distributed lock in this tutorial is backed by Redis, you will need Redis installed locally or a Redis server available.

Implementing web request logging with AOP

Why centralize web request logging

During development we often need to log each endpoint's request parameters, response data, and even elapsed time for troubleshooting, and for some important endpoints this information must also be written to the database. That code looks much the same everywhere, so to improve reuse we can encapsulate it with AOP.

The web log annotation

Listing 3. The web log annotation
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface ControllerWebLog {
     String name();
     boolean intoDb() default false;
}

Here name is the name of the endpoint being called, and intoDb indicates whether the log entry should be persisted. For configuring the database connection in Spring Boot, see the article on configuring multiple data sources in a Spring Boot project; the database schema can be obtained here. With the annotation in place, we next need to write the matching AOP aspect.

Implementing the WebLogAspect aspect

  1. First we define the aspect class WebLogAspect, shown in Listing 4. The @Aspect annotation tells Spring to treat the class as an aspect, and @Component registers the class as a Spring component.
    Listing 4. WebLogAspect
    @Aspect
    @Component
    @Order(100)
    public class WebLogAspect {
    }
  2. Next we need to define a pointcut.
    Listing 5. The web log pointcut
    @Pointcut("execution(* cn.itweknow.sbaop.controller..*.*(..))")
    public void webLog() {}

    The official documentation describes the execution expression as follows:

    Listing 6. The execution expression format
    execution(<modifier pattern>? <return type pattern> <method name pattern>(<parameter pattern>) <exception pattern>?)

    Everything except the return type pattern, the method name pattern, and the parameter pattern is optional. That may sound abstract, so let's look at a concrete example. The pointcut in WebLogAspect uses the execution expression * cn.itweknow.sbaop.controller..*.*(..), which breaks down as follows:

    Table 1. Breakdown of the execution() expression
      • * — any return type
      • cn.itweknow.sbaop.controller.. — the controller package and all of its sub-packages
      • *.* — any method of any class in those packages
      • (..) — any number and type of parameters
  3. Code in a @Before-annotated method runs before the pointcut executes. Here we log the start of the call and stash the request parameters and the start time in a ThreadLocal, so that when the call ends we can log the parameters and compute how long the endpoint took.
    Listing 7. The @Before advice
    @Before(value = "webLog() && @annotation(controllerWebLog)")
        public void doBefore(JoinPoint joinPoint, ControllerWebLog controllerWebLog) {
            // Start time.
            long startTime = System.currentTimeMillis();
            Map<String, Object> threadInfo = new HashMap<>();
            threadInfo.put(START_TIME, startTime);
            // Request parameters.
            StringBuilder requestStr = new StringBuilder();
            Object[] args = joinPoint.getArgs();
            if (args != null && args.length > 0) {
                for (Object arg : args) {
                    requestStr.append(arg.toString());
                }
            }
            threadInfo.put(REQUEST_PARAMS, requestStr.toString());
            threadLocal.set(threadInfo);
            logger.info("{} endpoint called: requestData={}", controllerWebLog.name(), threadInfo.get(REQUEST_PARAMS));
     }
  4. @AfterReturning advice runs when the method returns normally. Here we log the end of the call, and we must not forget to clear the ThreadLocal afterwards.
    Listing 8. The @AfterReturning advice
    @AfterReturning(value = "webLog() && @annotation(controllerWebLog)", returning = "res")
    public void doAfterReturning(ControllerWebLog controllerWebLog, Object res) {
            Map<String, Object> threadInfo = threadLocal.get();
            long takeTime = System.currentTimeMillis() - (long) threadInfo.getOrDefault(START_TIME, System.currentTimeMillis());
            if (controllerWebLog.intoDb()) {
                insertResult(controllerWebLog.name(), (String) threadInfo.getOrDefault(REQUEST_PARAMS, ""),
                            JSON.toJSONString(res), takeTime);
            }
            threadLocal.remove();
            logger.info("{} endpoint finished: took {} ms, result={}", controllerWebLog.name(),
                    takeTime, res);
    }
  5. When the program throws an exception, we log that too:
    Listing 9. The @AfterThrowing advice
    @AfterThrowing(value = "webLog() && @annotation(controllerWebLog)", throwing = "throwable")
        public void doAfterThrowing(ControllerWebLog controllerWebLog, Throwable throwable) {
            Map<String, Object> threadInfo = threadLocal.get();
            if (controllerWebLog.intoDb()) {
                insertError(controllerWebLog.name(), (String) threadInfo.getOrDefault(REQUEST_PARAMS, ""),
                        throwable);
            }
            threadLocal.remove();
            logger.error("{} endpoint threw an exception: {}", controllerWebLog.name(), throwable);
    }
  6. Our aspect is now complete. Next we apply the ControllerWebLog annotation to one of our test endpoints; the endpoint body is omitted here, see this article's source code if you need it.
    Listing 10. The test endpoint
    @PostMapping("/post-test")
    @ApiOperation("POST endpoint logging test")
    @ControllerWebLog(name = "POST endpoint logging test", intoDb = true)
    public BaseResponse postTest(@RequestBody BaseRequest baseRequest) {
    }
  7. Finally, start the project and open the Swagger docs to test. After calling the endpoint you will see log output like Figure 1 in the console.
    Figure 1. Web log output in the console

Implementing a distributed lock with AOP

Why a distributed lock

Most programs have some shared resources or data, and at times we must guarantee that only one thread accesses or operates on them at a time. In a traditional single-node deployment we can simply use the concurrency APIs Java provides. But most services today are deployed in a distributed fashion, so we need a cross-process mutual exclusion mechanism to control access to shared resources. That mechanism is what we call a distributed lock.

Requirements

  1. Mutual exclusion: at any moment, only one client can hold the lock.
  2. No deadlock: even if a client crashes while holding the lock without releasing it, other clients must still be able to acquire it later; giving the lock a timeout is enough to guarantee this.
  3. Fault tolerance: as long as most of the Redis nodes are running, clients can acquire and release locks.
  4. Whoever locks must unlock: acquiring and releasing must be done by the same client; a client must not release a lock acquired by someone else.

The distributed lock annotation

Listing 11. The distributed lock annotation
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface DistributeLock {
    String key();
    long timeout() default 5;
    TimeUnit timeUnit() default TimeUnit.SECONDS;
}

Here key is the key of the distributed lock, timeout is the lock's expiry time (default 5), and timeUnit is the unit of that timeout (default seconds).

The annotation parameter resolver

Annotation attributes can only be constants, so we cannot reference method arguments in them directly. Yet in most cases the lock key needs to include one or more of the method's arguments, so we mark the positions of those arguments with a special placeholder string and use a parameter resolver to extract the actual values at runtime and splice them into the key. For this tutorial I wrote a resolver called AnnotationResolver; for brevity its source is not inlined here, interested readers can consult the source code.
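As a rough illustration of what such a resolver does (this is a hypothetical stand-in, not the article's actual AnnotationResolver, which works on the JoinPoint and supports nested properties like baseRequest.channel), a minimal version might substitute #{name} placeholders from a map of argument values:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replace each "#{name}" placeholder in the annotation's
// key template with the value of the matching method argument.
public class SimpleKeyResolver {
    static final Pattern PLACEHOLDER = Pattern.compile("#\\{(\\w+)}");

    public static String resolve(String keyTemplate, Map<String, Object> argsByName) {
        Matcher m = PLACEHOLDER.matcher(keyTemplate);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Look up the named argument captured in group 1.
            Object value = argsByName.get(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // A key template like the one used later in the article, simplified
        // to a flat argument name instead of a nested property path.
        System.out.println(resolve("post_test_#{channel}", Map.of("channel", "app")));
    }
}
```

The real resolver would pull argument names and values off the ProceedingJoinPoint and walk nested properties via reflection; the placeholder-substitution idea is the same.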

Acquiring the lock

Listing 12. Acquiring the lock
private String getLock(String key, long timeout, TimeUnit timeUnit) {
        try {
            String value = UUID.randomUUID().toString();
            Boolean lockStat = stringRedisTemplate.execute((RedisCallback<Boolean>) connection ->
                    connection.set(key.getBytes(Charset.forName("UTF-8")), value.getBytes(Charset.forName("UTF-8")),
                            Expiration.from(timeout, timeUnit), RedisStringCommands.SetOption.SET_IF_ABSENT));
            if (!lockStat) {
                // Failed to acquire the lock.
                return null;
            }
            return value;
        } catch (Exception e) {
            logger.error("Failed to acquire distributed lock, key={}", key, e);
            return null;
        }
}

RedisStringCommands.SetOption.SET_IF_ABSENT corresponds to the SETNX command: if the key already exists, nothing is done and failure is returned; if the key does not exist, the key is stored and success is returned (the command returns 1 on success and 0 on failure). We generate a random value and return it when the lock is acquired so that we can compare it when releasing the lock; that way only the client that created the lock can release it. We also set an expiry time, which guarantees that even if releasing the lock fails, the endpoint does not become permanently inaccessible.

Releasing the lock

Listing 13. Releasing the lock
private void unLock(String key, String value) {
        try {
            String script = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end";
            boolean unLockStat = stringRedisTemplate.execute((RedisCallback<Boolean>) connection ->
                    connection.eval(script.getBytes(), ReturnType.BOOLEAN, 1,
                            key.getBytes(Charset.forName("UTF-8")), value.getBytes(Charset.forName("UTF-8"))));
            if (!unLockStat) {
                logger.error("Failed to release distributed lock, key={}; it has expired and another thread may have re-acquired it", key);
            }
        } catch (Exception e) {
            logger.error("Failed to release distributed lock, key={}", key, e);
        }
}

The aspect

The pointcut is the same as for web logging, so we won't repeat it. This aspect uses @Around advice: before the join point we try to acquire the lock, and if that fails we return an error response immediately; if it succeeds we execute the method body, and when the method finishes (whether normally or with an exception) we release the lock.

Listing 14. The around advice
@Around(value = "distribute() && @annotation(distributeLock)")
public Object doAround(ProceedingJoinPoint joinPoint, DistributeLock distributeLock) throws Exception {
        String key = annotationResolver.resolver(joinPoint, distributeLock.key());
        String keyValue = getLock(key, distributeLock.timeout(), distributeLock.timeUnit());
        if (StringUtil.isNullOrEmpty(keyValue)) {
            // Failed to acquire the lock.
            return BaseResponse.addError(ErrorCodeEnum.OPERATE_FAILED, "Please do not repeat the operation");
        }
        // Lock acquired.
        try {
            return joinPoint.proceed();
        } catch (Throwable throwable) {
            return BaseResponse.addError(ErrorCodeEnum.SYSTEM_ERROR, "System error");
        } finally {
            // Release the lock.
            unLock(key, keyValue);
        }
}

Testing

Listing 15. Distributed lock test code
@PostMapping("/post-test")
@ApiOperation("POST endpoint logging test")
@ControllerWebLog(name = "POST endpoint logging test", intoDb = true)
@DistributeLock(key = "post_test_#{baseRequest.channel}", timeout = 10)
public BaseResponse postTest(@RequestBody BaseRequest baseRequest) {
        try {
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return BaseResponse.addResult();
}

In this test we set the lock timeout to 10 seconds and make the endpoint sleep for 10 seconds, which guarantees the lock is not released within those 10 seconds and makes testing easier. Start the project and call the endpoint twice in quick succession, making sure both requests pass the same channel value (since the lock key contains it); the second call returns the result shown in Figure 2:

Figure 2. The Redis-backed distributed lock in action

This shows that our distributed lock works.

Conclusion

In this tutorial we looked at AOP programming and why to use it, and showed how to use AOP in a Spring Boot project to implement centralized web request logging and a Redis-backed distributed lock. You can find the complete implementation on GitHub; if you would like to extend this tutorial, feel free to email me (gancy.programmer@gmail.com) or submit a pull request on GitHub directly.

References

from:https://www.ibm.com/developerworks/cn/java/j-spring-boot-aop-web-log-processing-and-distributed-locking/index.html

Git: Please make sure you have the correct access rights and the repository exists

1. First, reconfigure your Git identity (name and email):
git config --global user.name "yourname"
git config --global user.email "your@email.com"
2. Delete the known_hosts file under the .ssh folder (search for the folder directly).
3. Run ssh-keygen -t rsa -C "your@email.com"
This generates the files id_rsa and id_rsa.pub in the .ssh folder; copy the full contents of id_rsa.pub.
4. Open https://github.com/, log in to your account, go to [Settings] -> [SSH settings], paste the content copied in step 3 into the key field, and submit.
5. Run ssh -T git@github.com to verify the connection.