All posts by dotte

数据库架构的升级和变更

SQLServer2008在数据的高安全、高性能、高可用方面的技术已经比较成熟,这些技术和方案都是随着很多公司的业务和数据访问压力的增加而不断的升级和

变迁的,同时经历了方方面面的考验,证明了它们都是成熟可靠的,下面就这方面的技术方案和变迁过程来做一些分析。

阶段一:

裸奔时代:

优点:裸奔最大的好处就是简单,成本低。

缺点:一旦服务器出现问题,恢复起来比较麻烦;如果访问压力变大,服务器可能不堪重负。

 

阶段二:

单库+Mirror+BackUp方案:

说明:Mirror有两种方式,同步和异步;同步方式能保证主库和Mirror端数据的一致性,而且不需要使用企业版,但是对主库的性能影响也比较大;异步方式需要

企业版才支,绝大部分时刻能保证数据的一致性,但是也有丢失小部分的数据可能,不过它主库的影响比较小。

 

优点:此方案对主库的数据提供了可靠的保护,一旦主库出现问题,从库能在比较短的时间内恢复,尤其是数据库很大时(从备份恢复需要的时间会很长),能尽

快的恢复业务使用,而且Mirror端能生成快照,能给实时性要求不高的业务使用。

 

缺点:Mirror会影响主库的部分性能(异步方式影响比较小),主库出现问题后,前端需要更改访问的IP地址(或者将从服务器的IP地址改成主服务器的IP地址),

还需要账号、权限和作业等信息迁移过去。

 

单库+Replication+ BackUp方案:

优点:Replication端可以提供给前段访问,可以将读操作放到从库,分担主库的部分压力,还能提供数据库的备份功能,不过这种备份很可能数据会有丢失。

 

缺点:不能提供安全的数据保护功能,对主库有一些性能影响。

 

阶段三:

单库+Replication+Mirror+BackUp方案:

优点:这种方案是前面两种方案的结合,既能够解决数据保护的问题,也能够提供读写分离的功能。

缺点:主库上既有Mirror又有Replication,这种方式对主库影响会比较大,而且实际证明,Mirror和Replication在同一台机器上部署,在一个出现问题时,

会对另一成影响

 

阶段四:

Cluster(双A)+BackUp方案:

说明:图中矩形部分代表存储,两台服务器做了双A的群集。

 

优点:Cluster能确保其中的一个服务器出现问题时所有的数据和服务能切换到另外一台机器,切换的时间很短,能尽快的恢复业务访问。

缺点:双A群集一般要求配置比较好,价格比较高;因数据都存放在存储上,所以群集不能保护数据,一旦数据或者存储出现问题,需要从备份中恢复数据;

SQLServer的群不能提供负载均衡的功能

 

阶段五:

Cluster(双A)+Mirror+BackUp方案:

 

说明:双A群集再加两个服务器上库的Mirror保护。

 

优点:这个方案能对数据提供可靠的保护,无论是服务器故障还是存储故障,都能保证数据的安全,而且数据恢复的时间比较短。

缺点:Mirror会消耗主服务器的部分性能,多了两台Mirror机器,成本会增加,如果存储出现问题,快速恢复的方案是启用Mirror机器,后面可能需要重做群集。

 

阶段六:

Cluster(双A)+Mirror+BackUp+Replication+单分发方案:

 

说明:双A 群集,Mirror保护,单分发机器和读写分离方案。

 

优点:群集和Mirror能充分保护数据的安全,读写分离能提高系统整体的性能。

缺点:成本较高,单分发机存在单点故障,如果分发机器出现问题,将需要重建,此时读和写都将集中到主库,压力会比较大。

 

Cluster(双A)+Mirror+BackUp+Replication+双分发方案:

 

 

优点:与单分发机相比,没有单点故障,即使某台分发机出现问题,也能保证读写分离机制继续运行。

缺点:成本增加,维护方面更复杂。

 

阶段七:

Cluster(双A)+双存储+BackUp+Replication+双分发方案:

 

优点:双存储方案使得数据能得到有效的保护,而且避开了Mirror和Replication同时在主库运行对主服务器造成的影响,节省主服务器资源,而且恢复比较方便。

缺点:成本增加。

 

阶段八:

Cluster(双A)+双存储+BackUp+Replication+双分发+SSB异步方案:

 

此方式的主要优势是将数据流异步处理,缓解瞬时高流量主库的压力,因为此方案比较复杂,暂时不做说明,可以参考数据库架构

 

阶段九:

拆分业务和数据、采用分布式数据库、使用能负载均衡集群功能的数据库等。

 

此文档大致描述了随公司的发展、服务器压力的增加,数据库架构方面的变迁阶段,当然我们应该根据公司的具体情况,选择性的采用其中的技术,也可能是

直接跳过某些阶段,而上更高效的方案(如果成本能够接受),因此技术和方案的选择应该根据实际情况,灵活应对。

from:http://www.cnblogs.com/fygh/archive/2012/03/23/2413164.html

250+ SQL Queries Interview Question Answers

This is a never ending topic and there is no limit to questions that all depends on profile and interviewer.

I am trying my level best to divide questionnaire based on Experience in Query Writing skills

SQL Queries Interview Questions for Experts (5-10 Years of experience)

Suppose, you are build a application like FACEBOOK and as a business you need to suggest Friend by displaying Friends of Friend’s. So you need to build a query that should return resultset which should have Friend Friend name. Just for reference, this is how friend table looks like,

create table Friend
(
ProfileID uniqueidentifier,
Name varchar(50),
Friend_Name varchar(50)
)
insert Friend values (NEWid(),'RAM', 'Shyam')
insert Friend values ('FFCB96AD-0F16-4A77-B634-3CE1F697A3D1','RAM', 'Tony')
insert Friend values ('FFCB96AD-0F16-4A77-B634-3CE1F697A3D1','RAM', 'Vibha')

insert Friend values (NEWid(),'SHYAM', 'RAM')
insert Friend values ('42A6A0EC-8EE5-4611-92C7-A23B0970B463','SHYAM', 'SAM')
insert Friend values ('42A6A0EC-8EE5-4611-92C7-A23B0970B463','SHYAM', 'Vibha')
insert Friend values ('42A6A0EC-8EE5-4611-92C7-A23B0970B463','SHYAM', 'John')

insert Friend values (NEWid(),'VIBHA', 'RAM')
insert Friend values ('AC40284F-4E54-495A-BF62-9701474C44C0','VIBHA', 'SHYAM')
insert Friend values ('AC40284F-4E54-495A-BF62-9701474C44C0','VIBHA', 'George')
insert Friend values ('AC40284F-4E54-495A-BF62-9701474C44C0','VIBHA', 'TOM')

insert Friend values (NEWid(),'TOM', 'RAM')
insert Friend values ('DE86E5EC-9748-47A0-936E-0BB6BCBCA1A0','TOM', 'DNATAG')
insert Friend values ('DE86E5EC-9748-47A0-936E-0BB6BCBCA1A0','TOM', 'Reddy')

FACEBOOK Query Question

Solution

-- Query Solution
declare @user varchar(50)
set @user = 'RAM'
SELECT  f1.name,
        f2.friend_name as friend_of_friend
FROM    friend f1,
        friend f2
WHERE   f1.name = @user
AND     f1.friend_name = f2.name
AND     NOT EXISTS
        (SELECT 1 FROM friend f3
         WHERE f3.name = f1.name
         AND   f3.friend_name = f2.friend_name) and f2.friend_name  <>@user

solution

What is blocking and how would you troubleshoot it?

Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first.

What are the steps you will take to improve performance of a poor performing query?

This is a very open ended question and there could be a lot of reasons behind the poor performance of a query. But some general issues that you could talk about would be: No indexes, table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, procedures and triggers without SET NOCOUNT ON, poorly written query with unnecessarily complicated joins, too much normalization, excess usage of cursors and temporary tables.

Some of the tools/ways that help you troubleshooting performance problems are:

  • SET SHOWPLAN_ALL ON,
  • SET SHOWPLAN_TEXT ON,
  • SET STATISTICS IO ON,
  • SQL Server Profiler,
  • Windows NT /2000 Performance monitor,
  • Graphical execution plan in Query Analyzer.

You are being you being assigned a task to move 5 million rows from one server to another using T-SQL with a linked-server. What will you consider to avoid transaction log fill up at destination server?

Will prefer to use SET ROWCOUNT and a while loop to commit data in batches.

What is the optimal Disk configuration for a database server and what RAID configurations would you use if budget is not a constraint?

  1. RAID 1 for the OS / Applications
  2. RAID 1 for the page file
  3. RAID 10 for the Data file (or RAID 5 for few writes)
  4. RAID 1 (or 10) for the transaction log

What is a deadlock and what is a live lock? How will you go about resolving deadlocks?

Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other’s piece. Each process  would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user’s process.

A livelock is one, where a  request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.

Check out SET DEADLOCK_PRIORITY and “Minimizing Deadlocks”  in SQL Server books online. Also check out the article Q169960 from Microsoft knowledge base.

What is blocking and how would you troubleshoot it?

Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first.

Read up the following topics in SQL Server books online: Understanding and avoiding blocking, Coding efficient transactions.

What are statistics, under what circumstances they go out of date, how do you update them?

Statistics determine the selectivity of the indexes. If an indexed column has unique values then the selectivity of that index is more, as opposed to an index with non-unique values. Query optimizer uses these indexes in determining whether to choose an index or not while executing a query.

Some situations under which you should update statistics:

  1. If there is significant change in the key values in the index
  2. If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated
  3. Database is upgraded from a previous version

Look up SQL Server books online for the following commands:

UPDATE STATISTICS,
STATS_DATE,

DBCC SHOW_STATISTICS,

CREATE STATISTICS,

DROP STATISTICS,

sp_autostats,

sp_createstats,

sp_updatestats

Write SQL query to find the products which have continuous increase in sales every year considering the following Schema and tell which optimized query?

Table Structure

CREATE TABLE PRODUCTS
(
       PRODUCT_ID     INTEGER,
       PRODUCT_NAME   VARCHAR(30)
);
CREATE TABLE SALES
(
       SALE_ID        INTEGER,
       PRODUCT_ID     INTEGER,
       YEAR           INTEGER,
       Quantity       INTEGER,
       PRICE          INTEGER
);

This table, contains the following rows,

SQL Queries Interview Questions

Solution

SELECT PRODUCT_NAME
FROM
(
SELECT P.PRODUCT_NAME,
       S.QUANTITY -
       LEAD(S.QUANTITY,1,0) OVER (
                            PARTITION BY P.PRODUCT_ID
                            ORDER BY S.YEAR DESC
                            ) QUAN_DIFF
FROM   PRODUCTS P,
       SALES S
WHERE  P.PRODUCT_ID = S.PRODUCT_ID
)A
GROUP BY PRODUCT_NAME
HAVING MIN(QUAN_DIFF) >= 0;

Note – This Solution is using a LEAD fn, which is available in SQL Server 2012.

Based on above mentioned tables, Write two alternative SQL query to find the products which does not have sales at all and identify, which is more optimized.

 

 

-- OPTION 1
SELECT P.PRODUCT_NAME
FROM   PRODUCTS P
       LEFT OUTER JOIN
       SALES S
ON     (P.PRODUCT_ID = S.PRODUCT_ID)
WHERE  S.QUANTITY IS NULL

-- OPTION 2
SELECT P.PRODUCT_NAME
FROM   PRODUCTS P
WHERE  P.PRODUCT_ID NOT IN
       (SELECT DISTINCT PRODUCT_ID FROM SALES);

-- Option 3
SELECT P.PRODUCT_NAME
FROM   PRODUCTS P
WHERE  NOT EXISTS
       (SELECT 1 FROM SALES S WHERE S.PRODUCT_ID = P.PRODUCT_ID);

Based on above mentioned tables,Write two alternatives query to select the top product sold in each year?

 

-- OPTION 1
SELECT P.PRODUCT_NAME,
S.YEAR,
S.QUANTITY
FROM
(
SELECT YEAR,
MAX(QUANTITY) QUAN
FROM SALES
GROUP BY YEAR
)A, SALES S,
PRODUCTS P
WHERE A.YEAR = S.YEAR
AND A.QUAN = S.QUANTITY
AND S.PRODUCT_ID = P.PRODUCT_ID;

-- Option 2
SELECT PRODUCT_NAME,
       YEAR
FROM
(
SELECT P.PRODUCT_NAME,
       S.YEAR,
       RANK() OVER (
              PARTITION BY S.YEAR
              ORDER BY S.QUANTITY DESC
              ) RNK
FROM   PRODUCTS P,
       SALES S
WHERE  P.PRODUCT_ID = S.PRODUCT_ID
) A
WHERE RNK = 1;

OUTPUT

SQL Server Queries Interview Questions

Based on above mentioned tables structure, Write a query to find the products whose quantity sold in a year should be greater than the average quantity sold across all the years??

products whose quantity sold in a year should be greater than the average quantity sold across all the years

Based on above mentioned tables structure, Write a query to compare the products sales of “IPhone” and “Samsung” in each year?

 

compare the products sales each year

Could you please some items which you may see in an execution plan indicating the query
is not optimized.

  1. Index Scan or Table Scan
  2. Hash Joins
  3. Thick arrows (indicating large work tables)
  4. Parallel streams (Parallelism)
  5. Bookmark lookup (or key lookup)

What structure can you implement for the database to speed up table reads?

Follow the rules of DB tuning we have to:

1] properly use indexes ( different types of indexes)

2] properly locate different DB objects across different tablespaces, files and so on.

3] create a special space (tablespace) to locate some of the data with special datatype ( for example CLOB, LOB and …)

SQL Queries Interview Questions for Intermediate (2-5 Years of experience)

What is Lock Escalation?

Lock escalation is the process of converting a lot of low level locks (like row locks, page locks) into higher level locks (like table locks). Every lock is a memory structure too many locks would mean, more memory being occupied by locks. To prevent this from happening, SQL Server escalates the many fine-grain locks to fewer coarse-grain locks. Lock escalation threshold was definable in SQL Server 6.5, but from SQL Server 7.0 onwards it’s dynamically managed by SQL Server.

What is a bookmark lookup?

When a non clustered index is used for the seek and the data needed was not
at the leaf level.

What is a key lookup?

Same as bookmark lookup, when a non clustered index is used for the seek and the data needed was not
at the leaf level.

What new indexes are introduced in SQL Server onwards ?

– Spatial
– XML

Could you please describe some properties / Facts about NULL during comparisons.

NULL can not used with any comparing operator, NULL

  • NULL will never be true or false
  • NULL can not compared as =,<>,<,>, <=,<= etc.
  • NILL is always equates will NULL only

What are cursors? Explain different types of cursors. What are the disadvantages of cursors? How can you avoid cursors?

Cursors allow row-by-row processing of the resultsets.

Types of cursors:

Static,
Dynamic,

Forward-only,

Keyset-driven.

See books online for more information.

Disadvantages of cursors: Because we know cursor doing roundtrip it will make network line busy and also make time consuming methods. First of all select query gernate output and after that cursor goes one by one so roundtrip happen.Another disadvange of cursor are ther are too costly because they require lot of resources and temporary storage so network is quite busy. Moreover, there are restrictions on SELECT statements that can be used with some types of cursors.

Most of the times, set based operations can be used instead of cursors. Here is an example:

If you have to give a flat hike to your employees using the following criteria:

Salary between 30000 and 40000 — 5000 hike
Salary between 40000 and 55000 — 7000 hike

Salary between 55000 and 65000 — 9000 hike

In this situation many developers tend to use a cursor, determine each employee’s salary and update his salary according to the above formula. But the same can be achieved by multiple update statements or can be combined in a single UPDATE statement as shown below:

UPDATE tbl_emp SET salary =
CASE WHEN salary BETWEEN 30000 AND 40000 THEN salary + 5000

WHEN salary BETWEEN 40000 AND 55000 THEN salary + 7000

WHEN salary BETWEEN 55000 AND 65000 THEN salary + 10000

END

Another situation in which developers tend to use cursors: You need to call a stored procedure when a column in a particular row meets certain condition. You don’t have to use cursors for this. This can be achieved using WHILEloop, as long as there is a unique key to identify each row.

Could you please explain, how to use Cursors ?

  1. Declare a CURSOR
  2. OPEN a CURSOR
  3. FETCH data in CURSOR
  4. finally close CURSOR

Suppose, we are building a UAT environment and we need to build a algorithm to export exact 50% of the rows (I means to say alternative rows either even or ODD). So write a query to select prime number rows from table.

         select profileid,name,friend_name
  from (select f.*, row_number() over (order by profileid asc) rn
         from Friend f) a
where rn%2=1;

Testing Script

create table Friend
(
ProfileID uniqueidentifier,
Name varchar(50),
Friend_Name varchar(50)
)
insert Friend values (NEWid(),'RAM', 'Shyam')
insert Friend values ('FFCB96AD-0F16-4A77-B634-3CE1F697A3D1','RAM', 'Tony')
insert Friend values ('FFCB96AD-0F16-4A77-B634-3CE1F697A3D1','RAM', 'Vibha')

insert Friend values (NEWid(),'SHYAM', 'RAM')
insert Friend values ('42A6A0EC-8EE5-4611-92C7-A23B0970B463','SHYAM', 'SAM')
insert Friend values ('42A6A0EC-8EE5-4611-92C7-A23B0970B463','SHYAM', 'Vibha')
insert Friend values ('42A6A0EC-8EE5-4611-92C7-A23B0970B463','SHYAM', 'John')

insert Friend values (NEWid(),'VIBHA', 'RAM')
insert Friend values ('AC40284F-4E54-495A-BF62-9701474C44C0','VIBHA', 'SHYAM')
insert Friend values ('AC40284F-4E54-495A-BF62-9701474C44C0','VIBHA', 'George')
insert Friend values ('AC40284F-4E54-495A-BF62-9701474C44C0','VIBHA', 'TOM')

insert Friend values (NEWid(),'TOM', 'RAM')
insert Friend values ('DE86E5EC-9748-47A0-936E-0BB6BCBCA1A0','TOM', 'DNATAG')
insert Friend values ('DE86E5EC-9748-47A0-936E-0BB6BCBCA1A0','TOM', 'Reddy')
go

         select profileid,name,friend_name
  from (select e.*, row_number() over (order by profileid asc) rn
         from Friend e) a
where rn%2=1;

 

What is the system function to get the current user’s user id?

USER_ID(). Also check out other system functions like

USER_NAME(),
SYSTEM_USER,

SESSION_USER,

CURRENT_USER,

USER,

SUSER_SID(),

HOST_NAME().

 

What is precedence constraints and can you name some?

Precedence constraints are used in DTS / SSIS packages to move from one
task to another. We have three type of precedence constraints

  1. Success
  2. Failure
  3. Completion

What sort of resource contention we can have, which can cause database to run slow?

  1. CPU bottleneck
  2. Memory bottleneck
  3. Network IO bottleneck
  4. Disk IO bottleneck
  5. Paging File (process trimming)
  6. Lock contention
  7. Corrupt index
  8. Recompilation

Can you have a nested transaction?

Yes, SQL Server do support nested transaction up to 32 levels. Check out BEGIN TRAN, COMMIT, ROLLBACK, SAVE TRAN and @@TRANCOUNT

Which is faster a Table Scan, or a Clustered Index Scan?

Same speed in case a table has a clustered index that it’s always show index scan instead of table scan.

What is an extended stored procedure? Can you instantiate a COM object by using T-SQL?

An extended stored procedure is a function within a DLL (written in a programming language like C, C++ using Open Data Services (ODS) API) that can be called from T-SQL, just the way we call normal stored procedures using theEXEC statement. See books online to learn how to create extended stored procedures and how to add them to SQL Server.

Yes, you can instantiate a COM (written in languages like VB, VC++) object from T-SQL by using sp_OACreatestored procedure.

Also see books online for sp_OAMethod, sp_OAGetProperty, sp_OASetProperty, sp_OADestroy.

What is recompilation?

When the cached execution plan for a query cannot be used so the procedure
recompiles.

What is parallelism?

SQL Server can perform a query or index operation in parallel by using several operating system threads, the operation can be completed quickly. When a single query runs of multiple CPUs is known as query parallelism.

What is the default query threshold for parallelism?

The query optimizer decides to utilize multiple SPIDS running on different processors to query / transfer data. Default threshold is 5 seconds.

What are the main reasons for statement recompilation ?

Recompilation happen mainly because of

  • underlying statistics change
  • DDL changes within the procedure.
  • The parameters the procedure was compiled with vary from the recently passed in parameters.
  • The query plan was flushed from cache.

How will you handle exceptions in SQL Server programming

By using TRY-CATCH constructs,

What is difference between Co-related sub query and nested sub query?

Correlated subquery runs once for each row selected by the outer query. It contains a reference to a value from the row selected by the outer query.

Nested subquery runs only once for the entire nesting (outer) query. It does not contain any reference to the outer query row.

For example,

Correlated Subquery:

select e1.empname, e1.basicsal, e1.deptno from emp e1 where e1.basicsal = (select max(basicsal) from emp e2 where e2.deptno = e1.deptno)

Nested Subquery:

select empname, basicsal, deptno from emp where (deptno, basicsal) in (select deptno, max(basicsal) from emp group by deptno)

Let’s assume, you are working for online product selling company like AMAZON, now you need to write two alternatives query to return the firstname, lastname and the most recent OrderID for all customers.

-- OPTION 1
SELECT o.OrderID,c.FirstName,c.LastName FROM Orders o
JOIN Customers c
ON o.CustomerID = c.CustomerID
WHERE OrderDate = (SELECT MAX(OrderDate)
FROM Orders
WHERE CustomerID = o.CustomerID

-- OPTION 2
SELECT c.FirstName,c.LastName,o.OrderNumber FROM Orders o
JOIN
(
SELECT MAX(OrderDate) AS MaxOrderDate,
custid
FROM orders
GROUP BY custid
) o_2
ON o.custid = o_2.custid
AND o.orderdate = o_2.MaxOrderDate
JOIN customers c
ON c.CustID = o.CustID

What tools do you use for performance tuning?

Query Analyzer, Profiler, Index Wizard, Performance Monitor

What are extended stored procedures?

How can you execute a DOS command from SQL or through SQL query by using xp_cmdshell?

exec xp_cmdshell 'dir c:\*.exe'

What sp_MSforeachtable  does ?

You can use sp_MSforeachtable undocumented stored procedure to rebuild all indexes in your database. Try to schedule it to execute during CPU idle time and slow production periods.
sp_MSforeachtable @command1=”print ‘?’ DBCC DBREINDEX (‘?’)”

How do I prevent SQL injection in my applications?

Check my previous post, “How to secure against SQL injection“.

What you do when a particular query is slow?

  1. Run SQL profiler and determine if abnormal amounts of IO or CPU is used.
  2. Run profiler to determine if recompilation is a factor.
  3. Update the statistics.
  4. Check the execution plan

Can we convert the column datatype in to different datatype in a exiting table ?

YES, we can change data type of column for an exiting table too but we need to make sure the new datatype is compateble with old datatype. For Example, we can not convert a column which store varchar values and has textual data  in it to a int type data type. Following TSQL can be used to modify (alter) a exiting column

alter table tablename alter Column Columnname newdatatype

SQL Queries Interview Questions for Beginner (1-2 Years of experience)

In what sequence SQL statement are processed?

The clauses of the select are processed in the following sequence

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause
  6. ORDER BY clause
  7. TOP clause

Can we write a distributed query and get some data which is located on other server and on Oracle Database ?

SQL Server can be lined to any server provided it has OLE-DB provider from Microsoft to allow a link.
E.g. Oracle has a OLE-DB provider for oracle that Microsoft provides to add it as linked server to SQL Server group.

If we drop a table, does it also drop related objects like constraints, indexes, columns, defaults, Views and Stored Procedures ?

YES, SQL Server drops all related objects, which exists inside a table like, constraints, indexes, columns, defaults etc. BUT dropping a table will not drop Views and Stored Procedures as they exists outside the table.

How would you determine the time zone under which a database was operating?

 

Can we add identity column to decimal datatype?
YES
, SQL Server support this

What is the Difference between LEFT JOIN with WHERE clause & LEFT JOIN with no WHERE clause ?

OUTER LEFT/RIGHT JOIN with WHERE clause can act like an INNER JOIN if not used wisely or logically.

What are the Multiple ways to execute a dynamic query ?

EXEC sp_executesql, EXECUTE()

What is the Difference between COALESCE() & ISNULL() ?

ISNULL accepts only 2 parameters. The first parameter is checked for NULL value, if it is NULL then the second parameter is returned, otherwise it returns first parameter.
COALESCE accepts two or more parameters. One can apply 2 or as many parameters, but it returns only the first non NULL parameter,

How do you generate file output from SQL?

While using SQL Server Management Studio or Query Analyzer, we have an option in Menu BAR.QUERTY >> RESULT TO >> Result to FILE

How do you prevent SQL Server from giving you informational messages during and after a SQL statement execution?

SET NOCOUNT OFF

By Mistake, Duplicate records exists in a table, how can we delete copy of a record ?

;with T as
(
    select * , row_number() over (partition by Emp_ID order by Emp_ID) as rank
    from employee
)

delete
from T
where rank > 1

WHAT OPERATOR PERFORMS PATTERN MATCHING?

Pattern matching operator is LIKE and it has to used with two attributes

1. %  means matches zero or more characters and

2. _ ( underscore ) means matching exactly one character

What’s the logical difference, if any, between the following SQL expressions?

-- Statement 1
SELECT COUNT ( * ) FROM Employees

-- Statement 2
SELECT SUM ( 1 ) FROM Employees

They’re the same unless table Employee table is empty, in which case the first yields a one-column, one-row table containing a zero and the second yields a one-column, one-row table “containing a null.”

Is it possible to update Views? If yes, How, If Not, Why?

Yes, We can modify views but a DML statement on a join view can modify only one base table of the view (so even if the view is created upon a join of many tables, only one table, the key preserved table can be modified through the view).

Could you please name different kinds of Joins available in SQL Server ?

  • OUTER JOIN – LEFT, RIGHT, CROSS, FULL ;
  • INNER JOIN

How important do you consider cursors or while loops for a transactional database?

would like to avoid cursor in OLTP database as much as possible, Cursors are mainly only used for maintenance or warehouse operations.

What is a correlated sub query?

When a sub query is tied to the outer query. Mostly used in self joins.

What is faster, a correlated sub query or an inner join?

Correlated sub query.

You are supposed to work on SQL optimization and given a choice which one runs faster, a correlated sub query or an exists?

Exists

Can we call .DLL from SQL server?

YES, We can call .Dll from SQL Server. Please check my previous post, “How to call a .dll file in SQL Server

What are the pros and cons of putting a scalar function in a queries select list or in the where clause?

Should be avoided if possible as Scalar functions in these places make the query slow down dramatically.

What is difference between truncate and drop statement ?

Check my previous post difference between TRUNCATE & DELETE

What is difference between truncate and delete statement ?

Check my previous post difference between TRUNCATE & DELETE

What are user defined data types and when you should go for them?

User defined data types let you extend the base SQL Server data types by providing a descriptive name, and format to the database. Take for example, in your database, there is a column called Flight_Num which appears in many tables. In all these tables it should be varchar(8). In this case you could create a user defined data type calledFlight_num_type of varchar(8) and use it across all your tables.

See sp_addtype, sp_droptype in books online.

Can You Explain Integration Between SQL Server 2005 And Visual Studio 2005 ?

This integration provide wider range of development with the help of CLR for database server because CLR helps developers to get flexibility for developing database applications and also provides language interoperability just like Visual C++, Visual Basic .Net and Visual C# .Net. The CLR helps developers to get the arrays, classes and exception handling available through programming languages such as Visual C++ or Visual C# which is use in stored procedures, functions and triggers for creating database application dynamically and also provide more efficient reuse of code and faster execution of complex tasks. We particularly liked the error-checking powers of the CLR environment, which reduces run-time errors

Please check my previous post, “Can we call .DLL from SQL server?

You are being assigned to create a complex View and you have completed that task and that view is ready to be get pushed to production server now. you are supposed to fill a deployment form before any change is pushed to production server.

One of the Filed in that deployment form asked, “Expected Storage requirement”. What all factors you will consider to calculate storage requirement for that view ?

Very tricky, View, doesn’t takes space in Database, Views are virtual tables. Storage is required to store Index, incase you are developing a indexed view.

 

What is Index, cluster index and non cluster index ?

Clustered Index:- A Clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table may have only one clustered index.Non-Clustered Index:- A Non-Clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows in the disk. The leaf nodes of a non-clustered index does not consists of the data pages. instead the leaf node contains index rows.

Write down the general syntax for a SELECT statements covering all the options.

Here’s the basic syntax: (Also checkout SELECT in books online for advanced syntax).

SELECT select_list
[INTO new_table_]

FROM table_source

[WHERE search_condition]

[GROUP BY group_by__expression]

[HAVING search_condition]

[ORDER BY order__expression [ASC | DESC] ]

What is a join and explain different types of joins?

Joins are used in queries to explain how different tables are related. Joins also let you select data from a table depending upon data from another table.

Types of joins:

INNER JOINs,
OUTER JOINs,

CROSS JOINs

OUTER JOINs are further classified as

LEFT OUTER JOINS,
RIGHT OUTER JOINS and

FULL OUTER JOINS.

For more information see pages from books online titled: “Join Fundamentals” and “Using Joins“.

What is OSQL utility ?

OSQL is command line tool which is used execute query and display the result same a query analyzer but everything is in command prompt.

What Is Difference Between OSQL And Query Analyzer ?

OSQL is command line tool which executes query and display the result same a query analyzer but query analyzer is graphical and OSQL is a command line tool. OSQL is quite useful for batch processing or executing remote queries.

What Is Cascade delete / update  ?

CASCADE allows deletions or updates of key values to cascade through the tables defined to have foreign key relationships that can be traced back to the table on which the modification is performed.

What is a self join? Explain it with an example.

Self join is just like any other join, except that two instances of the same table will be joined in the query. Here is an example: Employees table which contains rows for normal employees as well as managers. So, to find out the managers of all the employees, you need a self join.

CREATE TABLE emp
(

empid int,

mgrid int,

empname char(10)

)

INSERT emp SELECT 1,2,’Vyas’
INSERT emp SELECT 2,3,’Mohan’

INSERT emp SELECT 3,NULL,’Shobha’

INSERT emp SELECT 4,2,’Shridhar’

INSERT emp SELECT 5,2,’Sourabh’

SELECT t1.empname [Employee], t2.empname [Manager]
FROM emp t1, emp t2

WHERE t1.mgrid = t2.empid

Here is an advanced query using a LEFT OUTER JOIN that even returns the employees without managers (super bosses)

SELECT t1.empname [Employee], COALESCE(t2.empname, ‘No manager’) [Manager]
FROM emp t1

LEFT OUTER JOIN

emp t2

ON

t1.mgrid = t2.empid

What are some of the join algorithms used when SQL Server joins tables.

  1. Loop Join (indexed keys unordered)
  2. Merge Join (indexed keys ordered)
  3. Hash Join (non-indexed keys)

 

What is maximum number of tables that can joins in a single query ?

256, check SQL Server Limits

What is Magic Tables in SQL Server ?

The MAGIC tables are automatically created and dropped, in case you use TRIGGERS. SQL Server has two magic tables named, INSERTED and DELETED

These are mantained by SQL server for there Internal processing. When we use update insert or delete on tables these magic tables are used.These are not physical tables but are Internal tables.When ever we use insert statement is fired the Inserted table is populated with newly inserted Row and when ever delete statement is fired the Deleted table is populated with the delete

d row.But in case of update statement is fired both Inserted and Deleted table used for records the Original row before updation get store in Deleted table and new row Updated get store in Inserted table.

Can we disable a triger?, if yes HOW ?

YES, we can disable a single trigger on the database by using  “DISABLE TRIGGER triggerName ON <<TableName>>”

we also have an option to disable all the trigger by using, “DISABLE Trigger ALL ON ALL SERVER”

Why you need indexing? where that is Stored and what you mean by schema object? For what purpose we are using view?

We can’t create an Index on Index.. Index is stoed in user_index table. Every object that has been created on Schema is Schema Object like Table, View etc. If we want to share the particular data to various users we have to use the virtual table for the Base table. So that is a view.


Indexing is used for faster search or to retrieve data faster from various table. Schema containing set of tables, basically schema means logical separation of the database. View is crated for faster retrieval of data. It’s customized virtual table. we can create a single view of multiple tables. Only the drawback is..view needs to be get refreshed for retrieving updated data.

What the difference between UNION and UNIONALL?

Union will remove the duplicate rows from the result set while Union all does’nt.

Which system table contains information on constraints on all the tables created ?

USER_CONSTRAINTS,

system table contains information on constraints on all the tables created

What are different Types of Join?

  1. Cross Join A cross join that does not have a WHERE clause produces the Cartesian product of the tables involved in the join. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table. The common example is when company wants to combine each product with a pricing table to analyze each product at each price.
  2. Inner Join A join that displays only the rows that have a match in both joined tables is known as inner Join. This is the default type of join in the Query and View Designer.
  3. Outer JoinA join that includes rows even if they do not have related rows in the joined table is an Outer Join. You can create three different outer join to specify the unmatched rows to be included:
    1. Left Outer Join: In Left Outer Join all rows in the first-named table i.e. “left” table, which appears leftmost in the JOIN clause are included. Unmatched rows in the right table do not appear.
    2. Right Outer Join: In Right Outer Join all rows in the second-named table i.e. “right” table, which appears rightmost in the JOIN clause are included. Unmatched rows in the left table are not included.
    3. Full Outer Join: In Full Outer Join all rows in all joined tables are included, whether they are matched or not.
  4. Self Join This is a particular case when one table joins to itself, with one or two aliases to avoid confusion. A self join can be of any type, as long as the joined tables are the same. A self join is rather unique in that it involves a relationship with only one table. The common example is when company has a hierarchal reporting structure whereby one member of staff reports to another. Self Join can be Outer Join or Inner Join.

What is Data-Warehousing?

  1. Subject-oriented, meaning that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
  2. Time-variant, meaning that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
  3. Non-volatile, meaning that data in the database is never over-written or deleted, once committed, the data is static, read-only, but retained for future reporting.
  4. Integrated, meaning that the database contains data from most or all of an organization’s operational applications, and that this data is made consistent.

What is a live lock?

A live lock is one, where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A live lock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.

How SQL Server executes a statement with nested subqueries?

When SQL Server executes a statement with nested subqueries, it always executes the innermost query first. This query passes its results to the next query and so on until it reaches the outermost query. It is the outermost query that returns a result set.

How do you add a column to a existing table?

ALTER TABLE Department ADD (AGE, NUMBER);

Can one drop a column from a table?

YES, to delete a column in a table, use  ALTER TABLE table_name DROP COLUMN column_name

Which statement do you use to eliminate padded spaces between the month and day values in a function TO_CHAR(SYSDATE,’Month, DD, YYYY’) ?

To remove padded spaces, you use the “fm” prefix before the date element that contains the spaces. TO_CHAR(SYSDATE,’fmMonth DD, YYYY’)

Which operator do you use to return all of the rows from one query except rows are returned in a second query?

You use the EXCEPT operator to return all rows from one query except where duplicate rows are found in a second query. The UNION operator returns all rows from both queries minus duplicates. The UNION ALL operator returns all rows from both queries including duplicates. The INTERSECT operator returns only those rows that exist in both queries. Check my previous post, “How to use EXCEPT Operator”  for learn more.

How you will create a column alias?

The AS keyword is optional when specifying a column alias.

In what sequence SQL statement are processed?

The clauses of the subselect are processed in the following sequence (DB2): 1. FROM clause 2. WHERE clause 3. GROUP BY clause 4. HAVING clause 5. SELECT clause 6. ORDER BY clause 7. FETCH FIRST clause

How can we determine what objects a user-defined function depends upon?

sp_depends system stored procedure or query the sysdepends system table to return a list of objects that a user-defined function depends upon

SELECT DISTINCT so1.name, so2.name FROM sysobjects so1
INNER JOIN sysdepends sd
ON so1.id = sd.id
INNER JOIN sysobjects so2
ON so2.id = sd.depid
WHERE so1.name = '<<OBJECT_NAME>>'

What is lock escalation ?

A query first takes the lowest level lock possible with the smallest footprint (row-level). When too many rows are locked (requiring too much RAM) the lock is escalated to a range or page lock. If too many pages  are locked, it may escalate to a table lock.

What are the main differences between #temp tables and @table variables and which one is preferred ?

  1. SQL Server can create column statistics on #temp tables
  2. Indexes can be created on #temp tables
  3. @table variables are stored in memory up to a certain threshold.

For many other question and more information on temporary table and variable, check my previous post.

What are Checkpoint In SQL Server ?

When we done operation on SQL SERVER that is not commited directly to the database.All operation must be logged in to Transaction Log files after that they should be done on to the main database.CheckPoint are the point which alert Sql Server to save all the data to main database if no check point is there then log files get full we can use Checkpoint command to commit all data in the SQL SERVER.When we stop the SQL Server it will take long time because Checkpoint is also fired.

Why we use OPENXML clause?

OPENXML parses the XML data in SQL Server in an efficient manner. It’s primary ability is to insert XML data to the DB.

Can we store we store PDF files inside SQL Server table ?

YES, we can store this sort of data using a blob datatype.

Can we store Videos inside SQL Server table ?

YES, we can store Videos inside SQL Server by using FILESTREAM datatype, which was introduced in SQL Server 2008.

Can we hide the definition of a stored procedure from a user ?

YES, while creating stored procedure we can use WITH ENCRYPTION which will convert the original text of the CREATE PROCEDURE statement to an encrypted format.

What are included columns when we talk about SQL Server indexing?

Indexed with included columns were developed in SQL Server 2005 that assists in covering queries. Indexes with Included Columns are non clustered indexes that
have the following benefits:

  • Columns defined in the include statement, called non-key columns, are not counted in the
    number of columns by the Database Engine.
  • Columns that previously could not be used in queries, like nvarchar(max), can be included
    as a non-key column.
  • A maximum of 1023 additional columns can be used as non-key columns.

What is an execution plan? How would you view the execution plan?

An execution plan is basically a road map that graphically or textually shows the data retrieval methods chosen by the SQL Server query optimizer for a stored procedure or ad-hoc query and is a very useful tool for a developer to understand the performance characteristics of a query or stored procedure since  the plan is the one that SQL Server will place in its cache and use to execute the stored procedure or query. From within Query Analyzer is an option called “Show Execution Plan” (located on the Query  drop-down menu). If this option is turned on it will display query execution plan in separate window  when query is ran again.

Explain UNION, MINUS, UNION ALL, INTERSECT ?

INTERSECT returns all distinct rows selected by both queries.
MINUS – returns all distinct rows selected by the first query but not by the second.

UNION – returns all distinct rows selected by either query

UNION  ALL  –  returns  all  rows  selected  by  either query, including all duplicates

What is ROWID ?

 

SQL Queries Interview Questions for Fresher’s (Starters)

How to find second highest value of a column in a table?

-- Option 1
SELECT max(SAL) FROM EMP  WHERE SAL < ( SELECT max(SAL) FROM EMP);

-- Option 2
SELECT max( SAL) FROM EMP WHERE SAL NOT IN( SELECT max(SAL) FROM EMP ); 

-- Option 3
SELECT max(value1) FROM ( SELECT value1 FROM val EXCEPT SELECT max(value1) FROM val );

-- Option 4
SELECT * FROM EMP
WHERE SAL IN (SELECT MAX(SAL) FROM EMP
WHERE SAL <> (SELECT MAX(SAL) FROM EMP))

What is the difference between DDL and DML commands?  Can you name some examples of each?

DDL Statements are data definition language commands.  Examples are CREATE,

ALTER and DROP. where as DML Statements are data manipulation language commands.  Examples are INSERT, UPDATE and DELETE.

What are the advantages of using Views ?

  • Views restrict access to the data because the view can display selective columns from  the table.
  • Views provide data independence for ad hoc users and application programs. One view can be used to retrieve data from several tables.
  • Views provide groups of users access to data according to their particular criteria.
  • Views provides an  additional level of table security, by restricting access to a predetermined set of rows and columns of a table.
  • Hide data complexity.
  • Simplify commands for the user.
  • Present the data in a different perpecetive from that of the base table.
  • Store complex queries.

What is query optimization?

Query optimization is the part of the query process in which the database system compares different query strategies and chooses the one with the least expected cost

What are the main components of Database management systems software.

The database management system software includes components for storage management, concurrency control, transaction processing, database manipulation interface, database definition interface, and database control interface.

What is transaction?

A transaction is a collection of applications code and database manipulation code bound into an indivisible unit of execution. it consists from: BEGIN-TRANSACTION Name Code END TRANSACTION Name

What is schema?

A schema is collection of database objects of a Use.

How to find structure of an existing table ?

What is difference between alias and synonym ?

Alias is a temporary used with in a query but Synonyms are permanent in a database.

Are Views automatically updated, when we insert / update the base table ?

YES, Views always displays the updated data and there is no need to manually update them. Views doesn’t store any data they refer base table for data.

What is the difference between a “where” clause and a “having” clause?

“Where” is a kind of restiriction statement. You use where clause to restrict all the data from DB.Where clause is using before result retrieving. But Having clause is using after retrieving the data.Having clause is a kind of filtering command.

Can you name some other alternatives to SQL Server ?

Informix, Oracle, DB2, MySQL

What is the basic form of a SQL statement to read data out of a table?

The basic form to read data out of table is ‘SELECT * FROM table_name; ‘ An answer: ‘SELECT * FROM table_name WHERE xyz= ‘whatever’;’ cannot be called basic form because of WHERE clause.

What structure can you implement for the database to speed up table reads?

Follow the rules of DB tuning we have to: 1] properly use indexes ( different types of indexes) 2] properly locate different DB objects across different tablespaces, files and so on.3] create a special space (tablespace) to locate some of the data with special datatype ( for example CLOB, LOB and …)

What are the tradeoffs with having indexes?

1. Faster selects, slower updates. 2. Extra storage space to store indexes. Updates are slower because in addition to updating the table you have to update the index.

What is a “join”?

Join used to connect two or more tables logically with or without common field.

What is normalization? “Denormalization”? Why do we sometimes want to denormalize?

Normalizing data means eliminating redundant information from a table and organizing the data so that future changes to the table are easier. Denormalization means allowing redundancy in a table. The main benefit of denormalization is improved performance with simplified data retrieval and manipulation. This is done by reduction in the number of joins needed for data processing.

What is a “constraint”?

A constraint allows you to apply simple referential integrity checks to a table. There are four primary types of constraints that are currently supported by SQL Server: PRIMARY/UNIQUE – enforces uniqueness of a particular table column. DEFAULT – specifies a default value for a column in case an insert operation does not provide one. FOREIGN KEY – validates that every value in a column exists in a column of another table. CHECK – checks that every value stored in a column is in some specified list. Each type of constraint performs a specific type of action. Default is not a constraint. NOT NULL is one more constraint which does not allow values in the specific column to be null. And also it the only constraint which is not a table level constraint.

What types of index data structures can you have?

An index helps to faster search values in tables. The three most commonly used index-types are: – B-Tree: builds a tree of possible values with a list of row IDs that have the leaf value. Needs a lot of space and is the default index type for most databases. – Bitmap: string of bits for each possible value of the column. Each bit string has one bit for each row. Needs only few space and is very fast.(however, domain of value cannot be large, e.g. SEX(m,f); degree(BS,MS,PHD) – Hash: A hashing algorithm is used to assign a set of characters to represent a text string such as a composite of keys or partial keys, and compresses the underlying data. Takes longer to build and is supported by relatively few databases.

What is a “primary key”?

A PRIMARY INDEX or PRIMARY KEY is something which comes mainly from
database theory. From its behavior is almost the same as an UNIQUE INDEX, i.e. there may only be one of each value in this column. If you call such an INDEX PRIMARY instead of UNIQUE, you say something about

your table design, which I am not able to explain in few words. Primary Key is a type of a constraint enforcing uniqueness and data integrity for each row of a table. All columns participating in a primary key constraint must possess the NOT NULL property.

What is a “functional dependency”? How does it relate to database table design?

Functional dependency relates to how one object depends upon the other in the database. for example, procedure/function sp2 may be called by procedure sp1. Then we say that sp1 has functional dependency on sp2.

What is a “trigger”?

Triggers are stored procedures created in order to enforce integrity rules in a database. A trigger is executed every time a data-modification operation occurs (i.e., insert, update or delete). Triggers are executed automatically on occurance of one of the data-modification operations. A trigger is a database object directly associated with a particular table. It fires whenever a specific statement/type of statement is issued against that table. The types of statements are insert,update,delete and query statements. Basically, trigger is a set of SQL statements A trigger is a solution to the restrictions of a constraint. For instance: 1.A database column cannot carry PSEUDO columns as criteria where a trigger can. 2. A database constraint cannot refer old and new values for a row where a trigger can.

What is “index covering” of a query?

Index covering means that “Data can be found only using indexes, without touching the tables”

What types of join algorithms can you have?

What is a SQL view?

An output of a query can be stored as a view. View acts like small table which meets our criterion. View is a precomplied SQL query which is used to select data from one or more tables. A view is like a table but it doesn’t physically take any space. View is a good way to present data in a particular format if you use that query quite often. View can also be used to restrict users from accessing the tables directly.

How do you implement one-to-one, one-to-many and many-to-many relationships while designing tables?

One-to-One relationship can be implemented as a single table and rarely as two tables with primary and foreign key relationships. One-to-Many relationships are implemented by splitting the data into two tables with primary key and foreign key relationships. Many-to-Many relationships are implemented using a junction table with the keys from both the tables forming the composite primary key of the junction table. It will be a good idea to read up a database designing fundamentals text book.

What’s the difference between a primary key and a unique key?

Both primary key and unique enforce uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a non-clustered index by default. Another major difference is that, primary key does not allow NULLs, but unique key allows one NULL only.

What is bit data type and what’s the information that can be stored inside a bit column?

Bit data type is used to store Boolean information like 1 or 0 (true or false). Until SQL Server 6.5 bit data type could hold either a 1 or 0 and there was no support for NULL. But from SQL Server 7.0 onwards, bit data type can represent a third state, which is NULL

Define candidate key, alternate key, composite key.

A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.

A key formed by combining at least two or more columns is called composite key.

What are defaults? Is there a column to which a default cannot be bound?

A default is a value that will be used by a column, if no value is supplied to that column while inserting data.IDENTITY columns and timestamp columns can’t have defaults bound to them. See CREATE DEFAULT in books online.

What is a transaction and what are ACID properties?

A transaction is a logical unit of work in which, all the steps must be performed or none. ACID stands for Atomicity, Consistency, Isolation, Durability. These are the properties of a transaction. For more information and explanation of these properties, see SQL Server books online or any RDBMS fundamentals text book.

Explain different isolation levels

An isolation level determines the degree of isolation of data between concurrent transactions. The default SQL Server isolation level is Read Committed. Here are the other isolation levels (in the ascending order of isolation): Read Uncommitted, Read Committed, Repeatable Read, Serializable. See SQL Server books online for an explanation of the isolation levels. Be sure to read about SET TRANSACTION ISOLATION LEVEL, which lets you customize the isolation level at the connection level.

CREATE INDEX myIndex ON myTable (myColumn)

What type of Index will get created after executing the above statement?

Non-clustered index. Important thing to note: By default a clustered index gets created on the primary key, unless specified otherwise.

What is the maximum size of a row?

8060 bytes. Do not be surprised with questions like ‘What is the maximum number of columns per table’. Check out SQL Server books online for the page titled: “Maximum Capacity Specifications”.

What are constraints? Explain different types of constraints.

Constraints enable the RDBMS enforce the integrity of the database automatically, without needing you to create triggers, rule or defaults.

Types of constraints: NOT NULL, CHECK, UNIQUE, PRIMARY KEY, FOREIGN KEY

For an explanation of these constraints see books online for the pages titled: “Constraints” and “CREATE TABLE”, “ALTER TABLE”

What is an index? What are the types of indexes? How many clustered indexes can be created on a table? I create a separate index on each column of a table. what are the advantages and disadvantages of this approach?

Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.

  • Indexes are of two types. Clustered indexes and non-clustered indexes. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage separate from the table data storage. Non-clustered indexes are stored as B-tree structures (so do clustered indexes), with the leaf level nodes having the index key and it’s row locater. The row located could be the RID or the Clustered index key, depending up on the absence or presence of clustered index on the table.If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same time, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that, indexes need disk space, the more indexes you have, more disk space is used.

    What are triggers? How many triggers you can have on a table? How to invoke a trigger on demand?

    Triggers are special kind of stored procedures that get executed automatically when an INSERT, UPDATE or DELETEoperation takes place on a table.

    In SQL Server 6.5 you could define only 3 triggers per table, one for INSERT, one for UPDATE and one for DELETE. From SQL Server 7.0 onwards, this restriction is gone, and you could create multiple triggers per each action. But in 7.0 there’s no way to control the order in which the triggers fire. In SQL Server 2000 you could specify which trigger fires first or fires last using sp_settriggerorder

    Triggers cannot be invoked on demand. They get triggered only when an associated action (INSERT, UPDATE, DELETE) happens on the table on which they are defined.

    Triggers are generally used to implement business rules, auditing. Triggers can also be used to extend the referential integrity checks, but wherever possible, use constraints for this purpose, instead of triggers, as constraints are much faster.

    Till SQL Server 7.0, triggers fire only after the data modification operation happens. So in a way, they are called post triggers. But in SQL Server 2000 you could create pre triggers also. Search SQL Server 2000 books online forINSTEAD OF triggers.

    Also check out books online for ‘inserted table’, ‘deleted table’ and COLUMNS_UPDATED()

    There is a trigger defined for INSERT operations on a table, in an OLTP system. The trigger is written to instantiate a COM object and pass the newly inserted rows to it for some custom processing.

    What is the difference between oracle, sql and sql server ?

    • Oracle is based on RDBMS.
    • SQL is Structured Query Language.
    • SQL Server is another tool for RDBMS provided by MicroSoft.

    Difference between Stored Procedure and Trigger?

    • we can call stored procedure explicitly.
    • but trigger is automatically invoked when the action defined in trigger is done.
      ex: create trigger after Insert on
    • this trigger invoked after we insert something on that table.
    • Stored procedure can’t be inactive but trigger can be Inactive.
    • Triggers are used to initiate a particular activity after fulfilling certain condition.It need to define and can be enable and disable according to need.

    What is the advantage to use trigger in your PL?

    A trigger is a database object directly associated with a particular table. It fires whenever a specific statement/type of statement is issued against that table. The types of statements are insert,update,delete and query statements. Basically, trigger is a set of SQL statements A trigger is a solution to the restrictions of a constraint. For instance: 1.A database column cannot carry PSEUDO columns as criteria where a trigger can. 2. A database constraint cannot refer old and new values for a row where a trigger can.

    Triggers are fired implicitly on the tables/views on which they are created. There are various advantages of using a trigger. Some of them are:

    • Suppose we need to validate a DML statement(insert/Update/Delete) that modifies a table then we can write a trigger on the table that gets fired implicitly whenever DML statement is executed on that table.
    • Another reason of using triggers can be for automatic updation of one or more tables whenever a DML/DDL statement is executed for the table on which the trigger is created.
    • Triggers can be used to enforce constraints. For eg : Any insert/update/ Delete statements should not be allowed on a particular table after office hours. For enforcing this constraint Triggers should be used.
    • Triggers can be used to publish information about database events to subscribers. Database event can be a system event like Database startup or shutdown or it can be a user even like User loggin in or user logoff.

    What is a SQL view?

    An output of a query can be stored as a view. View acts like small table which meets our criterion. View is a pre-complied SQL query which is used to select data from one or more tables. A view is like a table but it doesn’t physically take any space. View is a good way to present data in a particular format if you use that query quite often. View can also be used to restrict users from accessing the tables directly.

    What is GROUP BY?

    The GROUP BY keywords has been added to SQL because aggregate functions (like SUM) return the aggregate of all column values every time they are called. Without the GROUP BY functionality, finding the sum for each individual group of column values was not possible.

    What are defaults? Is there a column to which a default can’t be bound?

    A default is a value that will be used by a column, if no value is supplied to that column while inserting data. IDENTITY columns and timestamp columns can’t have defaults bound to them.

    What does COMMIT do?

    Saving all changes made by DML statements

    List all the possible values that can be stored in a BOOLEAN data field.

    There are only two values that can be stored in a BOOLEAN data field: -1(true) and 0(false).

    What is the highest value that can be stored in a BYTE data field?

    The highest value that can be stored in a BYTE field is 255. or from -128 to 127.

    Explain SQL SELECT example:

     

    select j.FILE_NUM
      from DB_name.job j, DB_name.address a
    where j.JOB_TYPE ='C'
      AND j.COMPANY_NAME = 'TEST6'
      AND j.OFFICE_ID = '101'
      AND j.ACTIVE_IND = 'Y'
      AND a.ADDRESS_STATUS_ID = 'H'
      AND a.OFFICE_ID = '101'
      AND a.FILE_NUM = j.FILE_NUM order by j.FILE_NUM;

    J and A are aliases for table names. this is outer join select statement from two tables.

    Describe some Group Functions that you know

    • The COUNT function tells you how many rows were in the result set. SELECT COUNT(*) FROM Employees
    • The AVG function tells you the average value of a numeric column. SELECT MAX(SALARY) FROM Employees
    • The MAX and MIN functions tell you the maximum and minimum value of a numeric column. SELECT MIN(SALARY) FROM Employees
    • The SUM function tells you the sum value of a numeric column. SELECT SUM(SALARY) FROM Employees

    What does DML stand for?  DML is Data Manipulation Language statements. (SELECT)

    What does DDL stand for?

    DDL is Data Definition Language statements. (CREATE)

    What does DCL stand for?

    DCL is Data Control Language statements. (COMMIT)

    What is SQL comments and how to inser comments in SQL statements?

    SQL comments are introduced by two consecutive hyphens (–) and ended by the end of the line.

    What is the difference between a return parameter and an OUTPUT parameter?

    A return parameter is always returned by a stored procedure,and it is meant to indicate the success or failure of the stored procedure. The return parameter is always an INT data type.

    An OUTPUT parameter is designated specifically by the developer, and it can return other types of data, such as characters and numeric values.

    You can use multiple OUTPUT parameters in a stored procedure,whereas you can only use one return parameter.

    What is the minimum number of tables required to perform a SQL join?

    One, you can join a table as a self join too, consider an example of employee table which has a column EMPID, NAME and Mgr ID and we need to display EMPID, name of the employee and MGRID of that employee,

    How can present Summarizing Data in SQL Server?

    CUBE or ROLLUP operators to generate summary reports. Both are part of the GROUP BY Clause

    What is Service Broker

    Its a message queuing technology in SQL to helps developer to develop fully ditributed applications.Its helps to send asynchronous, transactional message.Its also helps to send message to another database.

    What is SQL Profiler
    SQL Profiler is a graphical tool thats helps administrator to capture events in instance of Microsoft Sql Server. We can get all the events that done on file or on SQL Table.We can filter the events that we need for us.We can also get the subset of event that we need.

    Why not to use prefix sp in store procedure
    Thses prefix is used by master database so SQL server first searches in the master database and then in the current session database. So its time taken is much higher master database causes extra overhead and also get wrong result in case of same name in master database.

    What is the advantage of SET NOCOUNT ON
    When we use SELECT and DML statement in SQL .SQL server return a message which specify the number of rows effected by these statements. This information helps coder when they are debugging the code other wise this is not useful we can disable this by typing SET NOCOUNT ON. It is very helpful when we are doing on store procedure contains lots of statements,loops its also increase in performance and boost network traffic.

    Why to use SQL Sequence and its drawbacks
    In SQL Sequences are used for creating sequence numbers without any overhead of locking but one drawback is that when any of transaction is rolled back the sequence number is lost.

    What are the different ways of moving data from database
    There are different methods of moving data

    • BACKUP and RESTORE
    • detach and attach
    • Attaching databases
    • Replication
    • DTS
    • BCP
    • logshipping
    • INSERT…SELECT
    • SELECT…INTO
    • SQL Server 2012 HADR
    • creating INSERT scripts to generate data.

    How to get top two records without Top keyword
    set rowcount 2

    select column,column1 from tblEmployeeMaster

    What do you mean by KEYSET Cursor

    KEYSET Cursor uses the set of keys that are primary key or we can saw uniquely identify the cursor’s rows. SQL Server uses a table in tempdb to store keyset. The KEYSET cursor helps to updates non key values from being made through this cursor, but when inserts made by other users are not visible. Updates nonkey values made by other users are visible as the owner scrolls around the cursor, but updates key values made by other users are not visible.

    Difference between Set and Select

    • Set is a ANSI standard for variable assignment.
    • Select is a Non-ANSI standard when assigning variables.
    • Set  – We can assign only one variable at a time
    • Select – We can assign multiple variable at a time

    When assigning from a query that returns more than one value, SET will fail with an error.
    When assigning from a query that returns more than one value, SELECT will assign the last value returned by the query and hide the fact that the query returned

    What is Network Packet Size in SQL

    NPS(Network Packet Size) is the size of the TDS (tabular data stream) packets used to communicate between your applications and your relational database engine and default packet size is 4 kilobytes and its manily depends on network packet size configuration option.

    What are Sparse Columns in Sql Server2008

    Sparse column is a tool that helps to reduce amount of physical storage used in a database. These are ordinary columns that have an optimized storage for all null values.SPARSE column are better at managing NULL and ZERO values in SQL Server. It does not take any space in database at all.

    Can we create non primary key as cluster index
    Yes we can do this on non-primary key column but that column must be unique and the primary key column of that table must have non-clustered index because there is one cluster index in table. By default primary key column contains clustered index so its recommended to create such non-primary key clustered index column first and then should create primary key column so in such case the primary key on that column will be with non-clustered. But its highly recommended to create primary key column as a clustered indexed column.

    Could you please give some Optimization Tips in writing SQL Queries ?

    • Always try to use views and stored procedures instead of doing work with heavy queries.
    • Make a habit to use constraints instead of triggers whenever it is possible.
    • When you need n number of row from database try to use top keyword or SET ROWCOUNT statement
    • Always use table variables in place of temporary tables.
    • Avoid Union and try to use UNION ALL statement.
    • Always avoid using the DISTINCT clause, whenever possible.
    • Always try to avoid using SQL Server cursors.
    • Always try to avoid the HAVING clause.
    • Do not use select count(*) to get number of rows
    • Try to include SET NOCOUNT ON statement into your stored procedures to stop the message indicating the number of rows affected by a T-SQL statement.
    • Always use file system to store large binary objects and use the file path in database.
    • Sometimes we may have to apply more than one sub queries in our main query. Try to minimize the number of sub query block in your query.
    • Try to use column name instead of *

    Why Group BY and Order By clause are so expensive
    These both of these requires Temporary table to process the result of query so these are expensive

    What is the default value of int datatype

    The default value of all datatype is NULL

    Is it possible to create foreign key without primary key

    Yes we can do this by the help of Unique Key constraint . Means table must have atleast Primary key or Unique key.

    Does SQL Server supports Merge statement ?

    YES, This is newly introduced feature in SQL Server 2008

    What is Trace frag in SQL

    The Trace Tags is used to set temporary setting of specific server characteristics. DBCC TRACEON is the command to set the trace flags. Once activated, trace flag will be in effect until the server is restarted. Trace frags are frequently used for diagnosing performance issues.For example, the trace flag 3205 is used for disabling hard compression for tape drives, when an instance of SQL Server starts.

    What is Pivot and Unpivot

    We can sort, count, and total the data stored in one table or spreadsheet and create a second table displaying the summarized data with the Pivot tables. The PIVOT operator turns the values of a specified column into column names, effectively rotating a table.
    UNPIVOT table is reverse of PIVOT Table.

    Can we call a Trigger in store procedure

    A Trigger is also a special kind of Stored Procedure which will fire automatically on the happening of an event like before or after insert, update or delete. We cannot call a trigger explicitly from a Stored Procedure.

    Why we use SET ROWCOUNT in Sql

    This syntax is used in SQL Server to stop processing the query after the specified number of rows are returned.

    Why we use Unicode In Sql server
    Unicode data is stored using the nchar, nvarchar,and ntext data types in SQL Server. Use these data types for columns that store characters from more than one character set. The SQL Server Unicode data types are based on the National Character data types in the SQL-92 standard.

    What is SQL Cache Dependency in ASP.NET 2.0

    SQL cache dependencies is new technique in ASP.NET 2.0 which can automatically invalidate a cached data object just like a Dataset. when the related data is modified in the database. So for instance if you have a dataset which is tied up to a database tables any changes in the database table will invalidate the cached data object which can be a dataset or a data source.To enable this we need a syntax that is as follows:- aspnet_regsql -ed -E -d Northwind

    from:http://www.sqlserver-training.com/250-sql-queries-interview-question-answers/

Introduction to IIS Architecture

Introduction

Internet Information Services (IIS) 7 and above provides a request-processing architecture that includes:

  • The Windows Process Activation Service (WAS), which enables sites to use protocols other than HTTP and HTTPS.
  • A Web server engine that can be customized by adding or removing modules.
  • Integrated request-processing pipelines from IIS and ASP.NET.

This article describes the components, modules, and request-processing architecture in the following sections:

Components in IIS

IIS contains several components that perform important functions for the application and Web server roles in Windows Server® 2008 (IIS 7.0) and Windows Server 2008 R2 (IIS 7.5). Each component has responsibilities, such as listening for requests made to the server, managing processes, and reading configuration files. These components include protocol listeners, such as HTTP.sys, and services, such as World Wide Web Publishing Service (WWW service) and Windows Process Activation Service (WAS).

Protocol Listeners

Protocol listeners receive protocol-specific requests, send them to IIS for processing, and then return responses to requestors. For example, when a client browser requests a Web page from the Internet, the HTTP listener, HTTP.sys, picks up the request and sends it to IIS for processing. Once IIS processes the request, HTTP.sys returns a response to the client browser.

By default, IIS provides HTTP.sys as the protocol listener that listens for HTTP and HTTPS requests. HTTP.sys was introduced in IIS 6.0 as an HTTP-specific protocol listener for HTTP requests. HTTP.sys remains the HTTP listener in IIS 7 and above, but includes support for Secure Sockets Layer (SSL).

To support services and applications that use protocols other than HTTP and HTTPS, you can use technologies such as Windows Communication Foundation (WCF). WCF has listener adapters that provide the functionality of both a protocol listener and a listener adapter. Listener adapters are covered later in this document. For more information about WCF, see Windows Communication Foundation on MSDN.

Hypertext Transfer Protocol Stack (HTTP.sys)

The HTTP listener is part of the networking subsystem of Windows operating systems, and it is implemented as a kernel-mode device driver called the HTTP protocol stack (HTTP.sys). HTTP.sys listens for HTTP requests from the network, passes the requests onto IIS for processing, and then returns processed responses to client browsers.

In IIS 6.0, HTTP.sys replaced Windows Sockets API (Winsock), which was a user-mode component used by previous versions of IIS to receive HTTP requests and send HTTP responses. IIS 7 and above continue to rely on HTTP.sys for HTTP requests.

HTTP.sys provides the following benefits:

  • Kernel-mode caching. Requests for cached responses are served without switching to user mode.
  • Kernel-mode request queuing. Requests cause less overhead in context switching because the kernel forwards requests directly to the correct worker process. If no worker process is available to accept a request, the kernel-mode request queue holds the request until a worker process picks it up.
  • Request pre-processing and security filtering.

World Wide Web Publishing Service (WWW service)

In IIS 7 and above, functionality that was previously handled by the World Wide Web Publishing Service (WWW Service) alone is now split between two services: WWW Service and a new service, Windows Process Activation Service (WAS). These two services run as LocalSystem in the same Svchost.exe process, and share the same binaries.

Note   You may also see the WWW Service referred to as W3SVC in documentation.

How WWW Service works in IIS 6.0

In IIS 6.0, WWW Service manages the following main areas in IIS:

  • HTTP administration and configuration
  • Process management
  • Performance monitoring

HTTP Administration and Configuration

The WWW Service reads configuration information from the IIS metabase and uses that information to configure and update the HTTP listener, HTTP.sys. In addition, WWW service starts, stops, monitors, and manages worker processes that process HTTP requests.

Performance Monitoring

The WWW Service monitors performance and provides performance counters for Web sites and for the IIS cache.

Process Management

The WWW Service manages application pools and worker processes, such as starting, stopping, and recycling worker processes. Additionally, the WWW Service monitors the health of the worker processes, and invokes rapid fail detection to stop new processes from starting when several worker processes fail in a configurable amount of time.

How the WWW Service works in IIS

In IIS, the WWW service no longer manages worker processes. Instead, the WWW Service is the listener adapter for the HTTP listener, HTTP.sys. As the listener adapter, the WWW Service is primarily responsible for configuring HTTP.sys, updating HTTP.sys when configuration changes, and notifying WAS when a request enters the request queue.

Additionally, the WWW Service continues to collect the counters for Web sites. Because performance counters remain part of the WWW Service, they are HTTP specific and do not apply to WAS.

Windows Process Activation Service (WAS)

In IIS 7 and above, Windows Process Activation Service (WAS) manages application pool configuration and worker processes instead of the WWW Service. This enables you to use the same configuration and process model for HTTP and non-HTTP sites.

Additionally, you can run WAS without the WWW Service if you do not need HTTP functionality. For example, you can manage a Web service through a WCF listener adapter, such as NetTcpActivator, without running the WWW Service if you do not need to listen for HTTP requests in HTTP.sys. For information about WCF listener adapters and about how to host WCF applications in IIS 7 and above by using WAS, see Hosting in WCF on MSDN.

Configuration Management in WAS

On startup, WAS reads certain information from the ApplicationHost.config file, and passes that information to listener adapters on the server. Listener adapters are components that establish communication between WAS and protocol listeners, such as HTTP.sys. Once listener adapters receive configuration information, they configure their related protocol listeners and prepare the listeners to listen for requests.

In the case of WCF, a listener adapter includes the functionality of a protocol listener. So, a WCF listener adapter, such as NetTcpActivator, is configured based on information from WAS. Once NetTcpActivator is configured, it listens for requests that use the net.tcp protocol. For more information about WCF listener adapters, see WAS Activation Architecture on MSDN.

The following list describes the type of information that WAS reads from configuration:

  • Global configuration information
  • Protocol configuration information for both HTTP and non-HTTP protocols
  • Application pool configuration, such as the process account information
  • Site configuration, such as bindings and applications
  • Application configuration, such as the enabled protocols and the application pools to which the applications belong

If ApplicationHost.config changes, WAS receives a notification and updates the listener adapters with the new information.

Process Management

WAS manages application pools and worker processes for both HTTP and non-HTTP requests. When a protocol listener picks up a client request, WAS determines if a worker process is running or not. If an application pool already has a worker process that is servicing requests, the listener adapter passes the request onto the worker process for processing. If there is no worker process in the application pool, WAS will start a worker process so that the listener adapter can pass the request to it for processing.

Note: Because WAS manages processes for both HTTP and non-HTTP protocols, you can run applications with different protocols in the same application pool. For example, you can develop an application, such as an XML service, and host it over both HTTP and net.tcp.

 

Modules in IIS

IIS provides a new architecture that is different from previous versions of IIS. Instead of keeping the majority of functionality within the server itself, IIS include a Web server engine in which you can add or remove components, called modules, depending on your needs.

Modules are individual features that the server uses to process requests. For example, IIS uses authentication modules to authenticate client credentials, and cache modules to manage cache activity.

The new architecture provides the following advantages over previous versions of IIS:

  • You can control which modules you want on the server.
  • You can customize a server to a specific role in your environment.
  • You can use custom modules to replace existing modules or to introduce new features.

The new architecture also improves security and simplifies administration. By removing unnecessary modules, you reduce the server’s attack surface and memory footprint, which is the amount of memory that server worker processes use on the machine. You also eliminate the need to manage features that are unnecessary for your sites and applications.

Native Modules

The following sections describe the native modules that are available with a full installation of IIS 7 and above. You can remove them or replace them with custom modules, depending on your needs.

HTTP Modules

Several modules in IIS 7 and above perform tasks specific to Hypertext Transfer Protocol (HTTP) in the request-processing pipeline. HTTP modules include modules to respond to information and inquiries sent in client headers, to return HTTP errors, to redirect requests, and more.

Module Name Description Resource
CustomErrorModule Sends default and configured HTTP error messages when an error status code is set on a response. Inetsrv\Custerr.dll
HttpRedirectionModule Supports configurable redirection for HTTP requests. Inetsrv\Redirect.dll
ProtocolSupportModule Performs protocol-related actions, such as setting response headers and redirecting headers based on configuration. Inetsrv\Protsup.dll
RequestFilteringModule Added in IIS 7.5. Filters requests as configured to control protocol and content behavior. Inetsrv\modrqflt.dll
WebDAVModule Added in IIS 7.5. Allows more secure publishing of content by using HTTP over SSL. Inetsrv\WebDAV.dll

Security Modules

Several modules in IIS perform tasks related to security in the request-processing pipeline. In addition, there are separate modules for each of the authentication schemes, which enable you to select modules for the types of authentication you want on your server. There are also modules that perform URL authorization, and a module that filters requests.

Module Name Description Resource
AnonymousAuthenticationModule Performs Anonymous authentication when no other authentication method succeeds. Inetsrv\Authanon.dll
BasicAuthenticationModule Performs Basic authentication. Inetsrv\Authbas.dll
CertificateMappingAuthenticationModule Performs Certificate Mapping authentication using Active Directory. Inetsrv\Authcert.dll
DigestAuthenticationModule Performs Digest authentication. Inetsrv\Authmd5.dll
IISCertificateMappingAuthenticationModule Performs Certificate Mapping authentication using IIS certificate configuration. Inetsrv\Authmap.dll
RequestFilteringModule Performs URLScan tasks such as configuring allowed verbs and file name extensions, setting limits, and scanning for bad character sequences. Inetsrv\Modrqflt.dll
UrlAuthorizationModule Performs URL authorization. Inetsrv\Urlauthz.dll
WindowsAuthenticationModule Performs NTLM integrated authentication. Inetsrv\Authsspi.dll
IpRestrictionModule Restricts IPv4 addresses listed in the ipSecurity list in configuration. Inetsrv\iprestr.dll

Content Modules

Several modules in IIS perform tasks related to content in the request-processing pipeline. Content modules include modules to process requests for static files, to return a default page when a client doesn’t specify a resource in a request, to list the contents of a directory, and more.

Module Name Description Resource
CgiModule Executes Common Gateway Interface (CGI) processes to build response output. Inetsrv\Cgi.dll
DefaultDocumentModule Attempts to return a default document for requests made to the parent directory. Inetsrv\Defdoc.dll
DirectoryListingModule Lists the contents of a directory. Inetsrv\dirlist.dll
IsapiModule Hosts ISAPI extension DLLs. Inetsrv\Isapi.dll
IsapiFilterModule Supports ISAPI filter DLLs. Inetsrv\Filter.dll
ServerSideIncludeModule Processes server-side includes code. Inetsrv\Iis_ssi.dll
StaticFileModule Serves static files. Inetsrv\Static.dll
FastCgiModule Supports FastCGI, which provides a high-performance alternative to CGI. Inetsrv\iisfcgi.dll

Compression Modules

Two modules in IIS perform compression in the request-processing pipeline.

Module Name Description Resource
DynamicCompressionModule Compresses responses and applies Gzip compression transfer coding to responses. Inetsrv\Compdyn.dll
StaticCompressionModule Performs pre-compression of static content. Inetsrv\Compstat.dll

Caching Modules

Several modules in IIS perform tasks related to caching in the request-processing pipeline. Caching improves the performance of your Web sites and Web applications by storing processed information, such as Web pages, in memory on the server, and then reusing that information in subsequent requests for the same resource.

Module Name Description Resource
FileCacheModule Provides user mode caching for files and file handles. Inetsrv\Cachfile.dll
HTTPCacheModule Provides kernel mode and user mode caching in HTTP.sys. Inetsrv\Cachhttp.dll
TokenCacheModule Provides user mode caching of user name and token pairs for modules that produce Windows user principals. Inetsrv\Cachtokn.dll
UriCacheModule Provides user mode caching of URL information. Inetsrv\Cachuri.dll

Logging and Diagnostics Modules

Several modules in IIS perform tasks related to logging and diagnostics in the request-processing pipeline. The logging modules support loading of custom modules and passing information to HTTP.sys. The diagnostics modules follow and report events during request processing.

Module Name Description Resource
CustomLoggingModule Loads custom logging modules. Inetsrv\Logcust.dll
FailedRequestsTracingModule Supports the Failed Request Tracing feature. Inetsrv\Iisfreb.dll
HttpLoggingModule Passes information and processing status to HTTP.sys for logging. Inetsrv\Loghttp.dll
RequestMonitorModule Tracks requests currently executing in worker processes and reports information with Runtime Status and Control Application Programming Interface (RSCA). Inetsrv\Iisreqs.dll
TracingModule Reports events to Microsoft Event Tracing for Windows (ETW). Inetsrv\Iisetw.dll

Managed Support Modules

A couple of modules in IIS support managed integration in the IIS request-processing pipeline.

Module Name Description Resource
ManagedEngine Provides integration of managed code modules in the IIS request-processing pipeline. Microsoft.NET\Framework\v2.0.50727\webengine.dll
ConfigurationValidationModule Validates configuration issues, such as when an application is running in Integrated mode but has handlers or modules declared in the system.web section. Inetsrv\validcfg.dll

Managed Modules

In addition to native modules, IIS enables you to use managed code modules to extend IIS functionality. Some of the managed modules, such as UrlAuthorization, have a native module counterpart that provides a native alternative to the managed module.

Note   Managed modules depend on the ManagedEngine module.

The following table lists the managed modules that are available with a full installation of IIS 7 and above. For more information about the managed modules, see the .NET Framework SDK 2.0 on MSDN.

Module Name Description Resource
AnonymousIdentification Manages anonymous identifiers, which are used by features that support anonymous identification such as ASP.NET profile. System.Web.Security.AnonymousIdentificationModule
DefaultAuthentication Ensures that an authentication object is present in the context. System.Web.Security.DefaultAuthenticationModule
FileAuthorization Verifies that a user has permission to access the requested file. System.Web.Security.FileAuthorizationModule
FormsAuthentication Supports authentication by using Forms authentication. System.Web.Security.FormsAuthenticationModule
OutputCache Supports output caching. System.Web.Caching.OutputCacheModule
Profile Manages user profiles by using ASP.NET profile, which stores and retrieves user settings in a data source such as a database. System.Web.Profile.ProfileModule
RoleManager Manages a RolePrincipal instance for the current user. System.Web.Security.RoleManagerModule
Session Supports maintaining session state, which enables storage of data specific to a single client within an application on the server. System.Web.SessionState.SessionStateModule
UrlAuthorization Determines whether the current user is permitted access to the requested URL, based on the user name or the list of roles of which a user is a member. System.Web.Security.UrlAuthorizationModule
UrlMappingsModule Supports mapping a real URL to a more user-friendly URL. System.Web.UrlMappingsModule
WindowsAuthentication Sets the identity of the user for an ASP.NET application when Windows authentication is enabled. System.Web.Security.WindowsAuthenticationModule

 

Request Processing in IIS

In IIS, the IIS and ASP.NET request pipelines combine to process requests with an integrated approach. The new request-processing architecture consists of an ordered list of native and managed modules that perform specific tasks in response to requests.

This design provides several benefits over previous versions of IIS. First, all file types can use features that were originally available only to managed code. For example, you can now use ASP.NET Forms authentication and Uniform Resource Locator (URL) authorization for static files, Active Server Pages (ASP) files, and all other file types in your sites and applications.

Second, this design eliminates the duplication of several features in IIS and ASP.NET. For example, when a client requests a managed file, the server calls the appropriate authentication module in the integrated pipeline to authenticate the client. In previous versions of IIS, this same request would go through an authentication process in both the IIS pipeline and in the ASP.NET pipeline.

Third, you can manage all of the modules in one location, instead of managing some features in IIS and some in the ASP.NET configuration. This simplifies the administration of sites and applications on the server.

Application Pools in IIS

Application pools separate applications by process boundaries to prevent an application from affecting another application on the server. In IIS 7 and above, application pools continue to use IIS 6.0 worker process isolation mode. In addition, you can now specify a setting that determines how to process requests that involve managed resources: Integrated mode or Classic mode.

Note: In IIS 6.0, worker process isolation mode and IIS 5.0 isolation mode are set at the server level. This makes it impossible to run both isolation modes on the same server. However, in IIS 7 and above, Integrated mode and Classic mode are set at the application pool level, which enables you to run applications simultaneously in application pools with different process modes on the same server.

Integrated application pool mode

When an application pool is in Integrated mode, you can take advantage of the integrated request-processing architecture of IIS and ASP.NET. When a worker process in an application pool receives a request, the request passes through an ordered list of events. Each event calls the necessary native and managed modules to process portions of the request and to generate the response.

There are several benefits to running application pools in Integrated mode. First the request-processing models of IIS and ASP.NET are integrated into a unified process model. This model eliminates steps that were previously duplicated in IIS and ASP.NET, such as authentication. Additionally, Integrated mode enables the availability of managed features to all content types.

Classic application pool mode

When an application pool is in Classic mode, IIS 7 and above handles requests in the same way as in IIS 6.0 worker process isolation mode. ASP.NET requests first go through native processing steps in IIS and are then routed to Aspnet_isapi.dll for processing of managed code in the managed runtime. Finally, the request is routed back through IIS to send the response.

This separation of the IIS and ASP.NET request-processing models results in duplication of some processing steps, such as authentication and authorization. Additionally, managed code features, such as Forms authentication, are only available to ASP.NET applications or applications for which you have script mapped all requests to be handled by aspnet_isapi.dll.

Be sure to test your existing applications for compatibility in Integrated mode before upgrading a production environment to IIS 7 and above and assigning applications to application pools in Integrated mode. You should only add an application to an application pool in Classic mode if the application fails to work in Integrated mode. For example, your application might rely on an authentication token passed from IIS to the managed runtime, and, due to the new architecture in IIS 7 and above, the process breaks your application.

HTTP Request Processing in IIS

IIS 7 and above have a similar HTTP request-processing flow as IIS 6.0. The diagrams in this section provide an overview of an HTTP request in process.

The following list describes the request-processing flow that is shown in Figure 1:

  1. When a client browser initiates an HTTP request for a resource on the Web server, HTTP.sys intercepts the request.
  2. HTTP.sys contacts WAS to obtain information from the configuration store.
  3. WAS requests configuration information from the configuration store, applicationHost.config.
  4. The WWW Service receives configuration information, such as application pool and site configuration.
  5. The WWW Service uses the configuration information to configure HTTP.sys.
  6. WAS starts a worker process for the application pool to which the request was made.
  7. The worker process processes the request and returns a response to HTTP.sys.
  8. The client receives a response.

Figure 1: Overview of an HTTP Request In a worker process, an HTTP request passes through several ordered steps, called events, in the Web Server Core. At each event, a native module processes part of the request, such as authenticating the user or adding information to the event log. If a request requires a managed module, the native ManagedEngine module creates an AppDomain, where the managed module can perform the necessary processing, such as authenticating a user with Forms authentication. When the request passes through all of the events in the Web Server Core, the response is returned to HTTP.sys. Figure 2, below, shows an HTTP request entering the worker process.

Figure 2: Detail of a HTTP request inside the Worker Process

IIS是如何处理ASP.NET请求的

前言

这不是一篇描述asp.net生命周期的文章,仅仅是关于IIS操作的。在我们开始之前,先了解这些会有助于对全文的理解,同时欢迎反馈和建议。

什么是Web Server?

每当我们通过VS运行ASP.NET网站时,VS集成的ASP.NET引擎会响应各种请求,这个引擎的名字叫“WebDev.WebServer.exe”。

当我们配置一个Web程序时,总会涉及到一个词“Web Server”,它的功能便是会响应所有请求。

什么是IIS?

IIS(Internet Information Server)是微软Web Server的一种,用来配置ASP.NET站点。IIS拥有自己的ASP.NET处理引擎来处理请求,因此,当一个请求到达时,IIS接收并处理请求,然后返回内容。

请求处理过程

现在,你应能搞清楚Web Server和IIS的区别。现在我们来看一下核心部分。在继续之前,你需要搞清两个概念:

1、工作进程(Worker Process)

2、应用程序池(Application Pool)

工作进程:在IIS中,工作进程(w3wp.exe)运行着ASP.NET应用程序,管理并响应所有的请求,ASP.NET所有的功能都运行在工作进程下,当请求到来时,工作进程会生成Request和Response相关的信息。简而言之,工作进程就是ASP.NET程序的心脏。

应用程序池:应用程序池是工作进程的容器,通常用来隔开不同配置的工作进程。当一个程序出错或进程资源回收时,其他池中的程序不会受到影响。

:当一个应用程序池包含多个工作进程时,被叫做“Web Garden”。

如果我们看一下IIS 6.0的结构,就会发现,可以把它分成两部分:

1、内核模块(Kernel Mode)

2、用户模块(User Mode)

内核模式是从IIS 6.0被引入的,它包含了一个叫HTTP.SYS的文件,每当请求进来时,会首先触发该文件的响应。

HTTP.SYS文件负责把请求传入相应的应用程序池中。但HTTP.SYS如何知道应传给哪个应用程序池呢?当然不是随机抽取,每当创建一个应用程序池,该池的ID就会生成并在HTTP.SYS文件中注册,因此该文件才能确定将请求往哪传。

以上便是IIS处理请求的第一步。接着,我们来看一下请求如何从HTTP.SYS传入应用程序池。

在IIS的用户模块中,通过Web Admin Services (WAS)从HTTP.SYS接收请求,并传入相应的应用程序池中。

当应用程序池接收到请求,会接着传给工作进程(w3wp.exe),该进程检查来请求的URL后缀以确定加载哪个ISAPI扩展。ASP.NET加载时会附带自己的ISAPI扩展(aspnet_isapi.dll),以便在IIS中映射。

注意:如果先安装了asp.net,然后再安装IIS,就需要通过aspnet_regiis命令来注册ASP.NET中的ISAPI扩展。

一旦工作进程加载了aspnet_isapi.dll,就会构造一个HttpRuntime类,该类是应用程序的入口,通过ProcessRequest方法处理请求。

一旦这个方法被调用,一个HttpContext的实例就产生了。可通过HTTPContent.Current获取到这个实例,且该实例会在整个生命周期中存活,我们通过它可以获取到一些常用对象,如Request,Response,Session 等。

之后HttpRuntime会通过HttpApplicationFactory类加载一个HttpApplication对象。每一次请求都要穿过一堆HttpModule到达HttpHandler,以便被响应。而这些HttpModule就被配置在HttpApplication中。

有一个概念叫“Http管道”,被叫做管道是因为它包含了一系列的HttpModule,这些HttpModule拦截请求并将其导向相应的HttpHandler。我们也可自定义HttpModule,以便在请求响应之间做点特别的处理。

HttpHandler是“Http管道”的终点。所有请求穿过HttpModule需抵达相应的HttpHandler,然后HttpHandler根据请求资源,产生并输出内容。也正因此,我们请求任何aspx页面才会得到响应的Html内容。

结语

每当请求Web服务器上的某些信息时,该请求首先会到达Http.SYS,然后Http.SYS将其发送到相应的应用程序池,应用程序池传给工作进程并加载ISAPI扩展,然后HttpRuntime对象会被创建,并通过HttpModule和HttpHandler处理请求。

最后,ASP.NET页面生命周期就开始了。

这只是大致描述IIS处理过程的文章,如果你想进一步了解相应细节,请点击下面链接来进一步学习。

A low-level Look at the ASP.NET Architecture

IIS Architecture

Stack Overflow Architecture Update – Now at 95 Million Page Views a Month

A lot has happened since my first article on the Stack Overflow Architecture. Contrary to the theme of that last article, which lavished attention on Stack Overflow’s dedication to a scale-up strategy, Stack Overflow has both grown up and out in the last few years.

Stack Overflow has grown up by more then doubling in size to over 16 million users and multiplying its number of page views nearly 6 times to 95 million page views a month.

Stack Overflow has grown out by expanding into the Stack Exchange Network, which includes Stack Overflow, Server Fault, and Super User for a grand total of 43 different sites. That’s a lot of fruitful multiplying going on.

What hasn’t changed is Stack Overflow’s openness about what they are doing. And that’s what prompted this update. A recent series of posts talks a lot about how they’ve been handling their growth: Stack Exchange’s Architecture in Bullet Points, Stack Overflow’s New York Data Center, Designing For Scalability of Management and Fault Tolerance, Stack Overflow Search — Now 81% Less, Stack Overflow Network Configuration, Does StackOverflow use caching and if so, how?, Which tools and technologies build the Stack Exchange Network?.

Some of the more obvious differences across time are:

  • Just More. More users, more page views, more datacenters, more sites, more developers, more operating systems, more databases, more machines. Just a lot more of more.
  • Linux. Stack Overflow was known for their Windows stack, now they are using a lot more Linux machines for HAProxy, Redis, Bacula, Nagios, logs, and routers. All support functions seem to be handled by Linux, which has required the development of parallel release processes.
  • Fault Tolerance. Stack Overflow is now being served by two different switches on two different internet connections, they’ve added redundant machines, and some functions have moved to a second datacenter.
  • NoSQL. Redis is now used as a caching layer for the entire network. There wasn’t a separate caching tier before so this a big change, as is using a NoSQL database on Linux.

Unfortunately, I couldn’t find any coverage on some of the open questions I had last time, like how they were going to deal with multi-tenancy across so many diffrent properties, but there’s still plenty to learn from. Here’s a roll up a few different sources:

The Stats

  • 95 Million Page Views a Month
  • 800 HTTP requests a second
  • 180 DNS requests a second
  • 55 Megabits per second
  • 16 Million Users  – Traffic to Stack Overflow grew 131% in 2010, to 16.6 million global monthly uniques.

Data Centers

  • 1 Rack with Peak Internet in OR (Hosts our chat and Data Explorer)
  • 2 Racks with Peer 1 in NY (Hosts the rest of the Stack Exchange Network)

Hardware

  • 10 Dell R610 IIS web servers (3 dedicated to Stack Overflow):
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz Quad Core with 8 threads
    • 16 GB RAM
    • Windows Server 2008 R2
  • 2 Dell R710 database servers:
    • 2x Intel Xeon Processor X5680 @ 3.33 GHz
    • 64 GB RAM
    • 8 spindles
    • SQL Server 2008 R2
  • 2 Dell R610  HAProxy servers:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 4 GB RAM
    • Ubuntu Server
  • 2 Dell R610 Redis servers:
    • 2x Intel Xeon Processor E5640 @ 2.66 GHz
    • 16 GB RAM
    • CentOS
  • 1 Dell R610 Linux backup server running Bacula:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 32 GB RAM
  • 1 Dell R610 Linux management server for Nagios and logs:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 32 GB RAM
  • 2 Dell R610 VMWare ESXi domain controllers:
    • 1x Intel Xeon Processor E5640 @ 2.66 GHz
    • 16 GB RAM
  • 2 Linux routers
  • 5 Dell Power Connect switches

Dev Tools

  • C#: Language
  • Visual Studio 2010 Team Suite: IDE
  • Microsoft ASP.NET (version 4.0): Framework
  • ASP.NET MVC 3: Web Framework
  • Razor: View Engine
  • jQuery 1.4.2: Browser Framework:
  • LINQ to SQL, some raw SQL: Data Access Layer
  • Mercurial and Kiln: Source Control
  • Beyond Compare 3: Compare Tool

Software and Technologies Used

  • Stack Overflow uses a WISC stack via BizSpark
  • Windows Server 2008 R2 x64: Operating System
  • SQL Server 2008 R2 running Microsoft Windows Server 2008 Enterprise Edition x64: Database
  • Ubuntu Server
  • CentOS
  • IIS 7.0: Web Server
  • HAProxy: for load balancing
  • Redis: used as the distributed caching layer.
  • CruiseControl.NET: for builds and automated deployment
  • Lucene.NET:  for search
  • Bacula: for backups
  • Nagios: (with n2rrd and drraw plugins) for monitoring
  • Splunk: for logs
  • SQL Monitor: from Red Gate – for SQL Server monitoring
  • Bind: for DNS
  • Rovio:  a little robot (a real robot) allowing remote developers to visit the office “virtually.”
  • Pingdom:  an external monitor and alert service.

External Bits

Code that is not included as part of the development tools:

  • reCAPTCHA
  • DotNetOpenId
  • WMD – Now developed as open source. See github network graph
  • Prettify
  • Google Analytics
  • Cruise Control .NET
  • HAProxy
  • Cacti
  • MarkdownSharp
  • Flot
  • Nginx
  • Kiln
  • CDN: none, all static content is served off the sstatic.net, which is a fast, cookieless domain intended for static content delivered to the Stack Exchange family of websites.

Developers and System Administrators

  • 14 Developers
  • 2 System Administrators

Content

  • License: Creative Commons Attribution-Share Alike 2.5 Generic
  • Standards: OpenSearch, Atom
  • Host: PEAK Internet

More Architecture and Lessons Learned

  • HAProxy is used instead of Windows NLB because HAProxy is cheap, easy, free, works great as a 512MB VM “device” on a network via Hyper-V. It also works in front of the boxes so it’s completely transparent to them, and easier to troubleshoot as a different networking layer instead of being intermixed with all your windows configuration.
  • A CDN is not used because even “cheap” CDNs like Amazon one are very expensive relative to the bandwidth they get bundled into their existing host’s plan. The least they could pay is $1k/month based on Amazon’s CDN rates and their bandwidth usage.
  • Backup is to disk for fast retrieval and to tape for historical archiving.
  • Full Text Search in SQL Server is very badly integrated, buggy, deeply incompetent, so they went to Lucene.
  • Mostly interested in peak HTTP request figures as this is what they need to make sure they can handle.
  • All properties now run on the same Stack Exchange platform. That means Stack Overflow, Super User, Server Fault, Meta, WebApps, and Meta Web Apps are all running on the same software.
  • There are separate StackExchange sites because people have different sets of expertise that shouldn’t cross over to different topic sites. You can be the greatest chef in the world, but that doesn’t qualify you for fixing a server.
  • They aggressively cache everything.
  • All pages accessed by (and subsequently served to) annonymous users are cached via Output Caching.
  • Each site has 3 distinct caches: local, site, global.
  • local cache: can only be accessed from 1 server/site pair
    • To limit network latency they use a local “L1” cache, basically HttpRuntime.Cache, of recently set/read values on a server. This would reduce the cache lookup overhead to 0 bytes on the network.
    • Contains things like user sessions, and pending view count updates.
    • This resides purely in memory, no network or DB access.
  • site cache:  can be accessed by any instance (on any server) of a single site
    • Most cached values go here, things like hot question id lists and user acceptance rates are good examples
    • This resides in Redis (in a distinct DB, purely for easier debugging)
    • Redis is so fast that the slowest part of a cache lookup is the time spent reading and writing bytes to the network.
    • Values are compressed before sending them to Redis. They have plenty of CPU and most of their data are strings so they get a great compression ratio.
    • The CPU usage on their Redis machines is 0%.
  • global cache: which is shared amongst all sites and servers
    • Inboxes, API usage quotas, and a few other truly global things live here
    • This resides in Redis (in DB 0, likewise for easier debugging)
  • Most items in the cache expire after a timeout period (a few minutes usually) and are never explicitly removed. When a specific cache invalidation is required they use Redis messaging to publish removal notices to the “L1” caches.
  • Joel Spolsky is not a Microsoft Loyalist, he doesn’t make the technical decisions for Stack Overflow, and considers Microsoft licensing a rounding error. Consider yourself corrected Hacker News commentor.
  • For their IO system they selected a RAID 10 array of Intel X25 solid state drives . The RAID array eased any concerns about reliability and the SSD drives performed really well in comparision to FusionIO at a much cheaper price.
  • The full boat cost for their Microsoft licenses would be approximately $242K. Since Stack Overflow is using Bizspark they are not paying near the full sticker price, but that’s the max they could pay.
  • Intel NICs are replacing Broadcom NICs and their primary production servers. This solved problems they were having with  connectivity loss, packet loss, and corrupted arp tables.

Related Articles