SQL Server


SQL Server Books Online




Welcome to Microsoft SQL Server Books Online. This documentation set helps you understand SQL Server and shows how to implement data management and business intelligence projects. SQL Server includes several data management and analysis technologies; to learn more about these technologies, follow the links below.


Categories









Database Engine
The Database Engine is the core service for storing, processing, and securing data. The Database Engine provides controlled access and rapid transaction processing to meet the requirements of the most demanding data-consuming applications within your enterprise. The Database Engine also provides rich support for sustaining high availability.

Analysis Services - Multidimensional Data
Analysis Services supports OLAP by allowing you to design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases.

Analysis Services - Data Mining
Analysis Services enables you to design, create, and visualize data mining models. These mining models can be constructed from other data sources by using a wide variety of industry-standard data mining algorithms.

Integration Services
Integration Services is a platform for building high-performance data integration solutions, including packages that provide extract, transform, and load (ETL) processing for data warehousing.


Master Data Services
Master Data Services is the source of master data for your organization. By integrating disparate operational and analytic systems with Master Data Services, you ensure that all applications across the organization rely on a central, accurate source of information. Using Master Data Services, you create a single source of master data and maintain an auditable record of that data as it changes over time.


Replication
Replication is a set of technologies for copying and distributing data and database objects from one database to another, and then synchronizing between databases to maintain consistency. By using replication, you can distribute data to different locations and to remote or mobile users by means of local and wide area networks, dial-up connections, wireless connections, and the Internet.

Reporting Services
Reporting Services delivers enterprise, Web-enabled reporting functionality so you can create reports that draw content from a variety of data sources, publish reports in various formats, and centrally manage security and subscriptions.


SharePoint Integration
SQL Server 2008 R2 offers new self-service business intelligence capability through integration with SharePoint products and technologies. In this release, both Analysis Services and Reporting Services support deployment in a SharePoint farm.


Service Broker
Service Broker helps developers build scalable, secure database applications. This Database Engine technology provides a message-based communication platform that enables independent application components to perform as a functioning whole. Service Broker includes infrastructure for asynchronous programming that can be used for applications within a single database or a single instance, and also for distributed applications.


Getting Started (SQL Server 2008 R2)




Find sources of information, overviews, system requirements, installation instructions, and more.

Overview
  • Feature and Tools Overview
  • Initial Installation
  • Considerations for Installing SQL Server
  • Samples and Sample Databases
  • Getting Started with SQL Server Books Online

Getting Assistance
  • How Do I

Database Engine overview
The Database Engine is the core service for storing, processing, and securing data. The Database Engine provides controlled access and rapid transaction processing to meet the requirements of the most demanding data-consuming applications within your enterprise. Use the Database Engine to create relational databases for online transaction processing or online analytical processing data. This includes creating tables for storing data, and database objects such as indexes, views, and stored procedures for viewing, managing, and securing data. You can use SQL Server Management Studio to manage the database objects, and SQL Server Profiler to capture server events.

Browse Content by Life Cycle
  • Product Evaluation
  • Getting Started
  • Planning and Architecture (Database Engine)
  • Development
  • Deployment
  • Operations
  • Security and Protection
  • Troubleshooting
  • Technical Reference

Browse Content by Role
  • Information Worker
  • Administrator
  • Developer
  • Architect


Product Evaluation (Integration Services)


Determine whether Microsoft SQL Server 2008 Integration Services (SSIS) is right for your organization with these helpful product and feature overviews. For SQL Server 2008 product information, see Microsoft SQL Server 2008.

Product Features Overview
  • What's New (Integration Services)
  • Overview (Integration Services)
  • Microsoft Business Intelligence home page

News and Reviews
  • SQL Server News and Reviews Web page

Case Studies and White Papers
  • SQL Server 2008 Product Overview
  • Business Intelligence Conference resources site

Blogs and Newsgroup Postings
  • SQL Server 2008 blogs Web site

Stay Up to Date with Integration Services
For the latest downloads, articles, samples, and videos from Microsoft, as well as selected solutions from the community, visit the Integration Services page on MSDN or TechNet.

For automatic notification of these updates, subscribe to the RSS feeds available on the page.


Planning and Architecture (Integration Services)


Typical Uses of Integration Services




Integration Services provides a rich set of built-in tasks, containers, transformations, and data adapters that support the development of business applications. Without writing a single line of code, you can create SSIS solutions that solve complex business problems using ETL and business intelligence, manage SQL Server databases, and copy SQL Server objects between instances of SQL Server.

The following scenarios describe typical uses of SSIS packages.



Merging Data from Heterogeneous Data Stores







Data is typically stored in many different data storage systems, and extracting data from all sources and merging the data into a single, consistent dataset is challenging. This situation can occur for a number of reasons. For example:

  • Many organizations archive information that is stored in legacy data storage systems. This data may not be important to daily operations, but it may be valuable for trend analysis that requires data collected over a long period of time.

  • Branches of an organization may use different data storage technologies to store the operational data. The package may need to extract data from spreadsheets as well as relational databases before it can merge the data.

  • Data may be stored in databases that use different schemas for the same data. The package may need to change the data type of a column or combine data from multiple columns into one column before it can merge the data.


Integration Services can connect to a wide variety of data sources, including multiple sources in a single package. A package can connect to relational databases by using .NET and OLE DB providers, and to many legacy databases by using ODBC drivers. It can also connect to flat files, Excel files, and Analysis Services projects.

Integration Services includes source components that perform the work of extracting data from flat files, Excel spreadsheets, XML documents, and tables and views in relational databases from the data source to which the package connects.

Next, the data is typically transformed by using the transformations that Integration Services includes. After the data is transformed to compatible formats, it can be merged physically into one dataset.

After the data is merged successfully and transformations are applied, the data is usually loaded into one or more destinations. Integration Services includes destinations for loading data into flat files, raw files, and relational databases. The data can also be loaded into an in-memory recordset and accessed by other package elements.



Populating Data Warehouses and Data Marts







The data in data warehouses and data marts is usually updated frequently, and the data loads are typically very large.

Integration Services includes a task that bulk loads data directly from a flat file into SQL Server tables and views, and a destination component that bulk loads data into a SQL Server database as the last step in a data transformation process.
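As a rough illustration, the work that this bulk-load task performs corresponds to the Transact-SQL BULK INSERT statement; the table name and file path below are hypothetical.

```sql
-- Hypothetical sketch: the Bulk Insert task performs work equivalent to
-- this statement (dbo.OrderStaging and the file path are examples only).
BULK INSERT dbo.OrderStaging
FROM 'C:\Data\orders.txt'
WITH (FIELDTERMINATOR = ',',   -- columns separated by commas
      ROWTERMINATOR  = '\n',   -- one record per line
      FIRSTROW       = 2);     -- skip the header row
```

Because BULK INSERT bypasses much of the normal per-row processing, it is typically much faster than row-by-row inserts for large flat-file loads.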

An SSIS package can be configured to be restartable. This means you can rerun the package from a predetermined checkpoint, either a task or container in the package. The ability to restart a package can save a lot of time, especially if the package processes data from a large number of sources.

You can use SSIS packages to load the dimension and fact tables in the database. If the source data for a dimension table is stored in multiple data sources, the package can merge the data into one dataset and load the dimension table in a single process, instead of using a separate process for each data source.

Updating data in data warehouses and data marts can be complex, because both types of data stores typically include slowly changing dimensions that can be difficult to manage through a data transformation process. The Slowly Changing Dimension Wizard automates support for slowly changing dimensions by dynamically creating the SQL statements that insert and update records, update related records, and add new columns to tables.
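As a hedged sketch (not the wizard's actual generated code), a Type 2 slowly changing dimension update can be expressed in Transact-SQL along these lines; the table, columns, and variables are hypothetical.

```sql
-- Hypothetical Type 2 slowly changing dimension maintenance:
-- expire the current version of the row...
UPDATE dbo.DimCustomer
SET RowEndDate = GETDATE(),
    IsCurrent  = 0
WHERE CustomerBusinessKey = @BusinessKey
  AND IsCurrent = 1;

-- ...then insert the new version, preserving the history of changes.
INSERT INTO dbo.DimCustomer
    (CustomerBusinessKey, City, RowStartDate, RowEndDate, IsCurrent)
VALUES
    (@BusinessKey, @NewCity, GETDATE(), NULL, 1);
```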

Additionally, tasks and transformations in Integration Services packages can process Analysis Services cubes and dimensions. When the package updates tables in the database that a cube is built on, you can use Integration Services tasks and transformations to automatically process the cube and to process dimensions as well. Processing the cubes and dimensions automatically helps keep the data current for users in both environments: users who access information in the cubes and dimensions, and users who access data in the relational database.

Integration Services can also compute functions before the data is loaded into its destination. If your data warehouses and data marts store aggregated information, the SSIS package can compute functions such as SUM, AVERAGE, and COUNT. An SSIS transformation can also pivot relational data and transform it into a less-normalized format that is more compatible with the table structure in the data warehouse.
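As an illustration, the aggregations mentioned above can be written in Transact-SQL as follows (in a package, the Aggregate transformation computes them in the data flow; the staging table and columns are hypothetical).

```sql
-- Hypothetical aggregation over a staging table before loading a data mart.
SELECT ProductKey,
       SUM(SalesAmount) AS TotalSales,    -- SUM function
       AVG(SalesAmount) AS AverageSale,   -- AVERAGE function
       COUNT(*)         AS OrderCount     -- COUNT function
FROM dbo.FactSalesStaging                 -- example staging table
GROUP BY ProductKey;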



Cleaning and Standardizing Data







Whether data is loaded into an online transaction processing (OLTP) or online analytical processing (OLAP) database, an Excel spreadsheet, or a file, it needs to be cleaned and standardized before it is loaded. Data may need to be updated for the following reasons:

  • Data is contributed from multiple branches of an organization, each using different conventions and standards. Before the data can be used, it may need to be formatted differently. For example, you may need to combine the first name and the last name into one column.

  • Data is rented or purchased. Before it can be used, the data may need to be standardized and cleaned to meet business standards. For example, an organization wants to verify that all the records use the same set of state abbreviations or the same set of product names.

  • Data is locale-specific. For example, the data may use varied date/time and numeric formats. If data from different locales is merged, it must be converted to one locale before it is loaded to avoid corruption of data.
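A minimal Transact-SQL sketch of the locale problem: the same date string parses to different dates under different locale styles, which is why data from multiple locales must be converted to one locale before it is merged.

```sql
-- The same string, interpreted under two locale-specific date styles.
SELECT CONVERT(datetime, '12/06/2009', 101) AS US_Style,  -- mm/dd/yyyy: December 6, 2009
       CONVERT(datetime, '12/06/2009', 103) AS UK_Style;  -- dd/mm/yyyy: June 12, 2009
```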


Integration Services includes built-in transformations that you can add to packages to clean and standardize data, change the case of data, convert data to a different type or format, or create new column values based on expressions. For example, the package could concatenate first and last name columns into a single full name column, and then change the characters to uppercase.
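For instance, the cleanup just described can be sketched in Transact-SQL; in a package, a Derived Column transformation would typically perform this work, and the table name here is hypothetical.

```sql
-- Hypothetical cleanup: concatenate name columns, then convert to uppercase.
SELECT UPPER(FirstName + ' ' + LastName) AS FullName
FROM dbo.CustomerStaging;   -- example staging table
```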

An Integration Services package can also clean data by replacing the values in columns with values from a reference table, using either an exact lookup or fuzzy lookup to locate values in a reference table. Frequently, a package applies the exact lookup first, and if the lookup fails, it applies the fuzzy lookup. For example, the package first attempts to look up a product name in the reference table by using the primary key value of the product. When this search fails to return the product name, the package attempts the search again, this time using fuzzy matching on the product name.
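A rough Transact-SQL sketch of the exact-lookup step (table names are hypothetical); in a package, rows that fail the exact match would then be routed to the Fuzzy Lookup transformation.

```sql
-- Hypothetical exact lookup against a reference table by primary key.
SELECT s.ProductKey,
       r.ProductName
FROM dbo.ProductStaging  AS s
LEFT JOIN dbo.ProductReference AS r
       ON s.ProductKey = r.ProductKey;   -- exact match on the key
-- Rows where r.ProductName IS NULL found no exact match and are the
-- candidates for fuzzy matching on the product name.
```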

Another transformation cleans data by grouping values in a dataset that are similar. This is useful for identifying records that may be duplicates and therefore should not be inserted into your database without further evaluation. For example, by comparing addresses in customer records you may identify a number of duplicate customers.



Building Business Intelligence into a Data Transformation Process







A data transformation process requires built-in logic to respond dynamically to the data it accesses and processes.

The data may need to be summarized, converted, and distributed based on data values. The process may even need to reject data, based on an assessment of column values.

To address this requirement, the logic in the SSIS package may need to perform the following types of tasks:

  • Merging data from multiple data sources.

  • Evaluating data and applying data conversions.

  • Splitting a dataset into multiple datasets based on data values.

  • Applying different aggregations to different subsets of a dataset.

  • Loading subsets of the data into different or multiple destinations.


Integration Services provides containers, tasks, and transformations for building business intelligence into SSIS packages.

Containers support the repetition of workflows by enumerating across files or objects and by evaluating expressions. A package can evaluate data and repeat workflows based on results. For example, if the date is in the current month, the package performs one set of tasks; if not, the package performs an alternative set of tasks.

Tasks that use input parameters can also build business intelligence into packages. For example, the value of an input parameter can filter the data that a task retrieves.

Transformations can evaluate expressions and then, based on the results, send rows in a dataset to different destinations. After the data is divided, the package can apply different transformations to each subset of the dataset. For example, an expression can evaluate a date column, add the sales data for the appropriate period, and then store only the summary information.

It is also possible to send a dataset to multiple destinations, and then apply different sets of transformations to the same data. For example, one set of transformations can summarize the data, while another set expands the data by looking up values in reference tables and adding data from other sources.


Automating Administrative Functions and Data Loading







Administrators frequently want to automate administrative functions such as backing up and restoring databases, copying SQL Server databases and the objects they contain, copying SQL Server objects, and loading data. Integration Services packages can perform these functions.

Integration Services includes tasks that are specifically designed to copy SQL Server database objects such as tables, views, and stored procedures; copy SQL Server objects such as databases, logins, and statistics; and add, change, and delete SQL Server objects and data by using Transact-SQL statements.
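As a hedged example, an Execute SQL task might run administrative statements such as the following backup command; the database name and backup path are hypothetical.

```sql
-- Hypothetical administrative statement that a package task could run
-- to automate a database backup.
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks.bak'
WITH INIT;   -- overwrite any existing backup set in the file
```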

Administration of an OLTP or OLAP database environment frequently includes the loading of data. Integration Services includes several tasks that facilitate the bulk loading of data. You can use a task to load data from text files directly into SQL Server tables and views, or you can use a destination component to load data into SQL Server tables and views after applying transformations to the column data.

An Integration Services package can run other packages. A data transformation solution that includes many administrative functions can be separated into multiple packages so that managing and reusing the packages is easier.

If you need to perform the same administrative functions on different servers, you can use packages. A package can use looping to enumerate across the servers and perform the same functions on multiple computers. To support administration of SQL Server, Integration Services provides an enumerator that iterates across SQL Server Management Objects (SMO) objects. For example, a package can use the SMO enumerator to perform the same administrative functions on every job in the Jobs collection of a SQL Server installation.

SSIS packages can also be scheduled by using SQL Server Agent jobs.
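A minimal sketch of such a job step, assuming a job named 'Nightly ETL' has already been created with sp_add_job; the job name, step name, and package path are hypothetical.

```sql
-- Hypothetical Agent job step that runs a package through the SSIS subsystem.
EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'Nightly ETL',                        -- existing job (example)
    @step_name = N'Run load package',
    @subsystem = N'SSIS',                               -- Integration Services subsystem
    @command   = N'/FILE "C:\Packages\LoadWarehouse.dtsx"';  -- dtexec-style options
```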

International Considerations for Integration Services




Microsoft SQL Server Integration Services supports the parsing and manipulation of multilingual data, supports all Windows locales, and provides special comparison options for sorting and comparing string data.

The Integration Services transformations for text mining and fuzzy matching may not work as well with non-English languages as they do with English. However, both kinds of transformations may provide useful results with many non-English languages, depending on the language.



Locale Insensitive Parsing







Integration Services includes locale-insensitive parsing routines that you can use for data that is in certain formats. These parsing routines, collectively called Fast Parse, support only the most frequently used date format representations, do not perform locale-specific parsing, do not recognize special characters in currency data, and cannot convert hexadecimal or scientific representations of integers. Fast Parse can significantly improve the performance of Integration Services packages that do not have locale dependencies. For more information, see Parsing Data.



Locale Settings







Integration Services supports locales at the level of the package object, container, task, and data flow component. You can also set the locale of event handlers.

A package can use multiple locales. For example, the package may use the English (United States) locale while one task in the package uses the German (Germany) locale and another task uses the Japanese (Japan) locale.

You can use any Windows locale in an Integration Services package. You set the locale when you construct the package; unless the package uses configurations to update locale properties, the package behaves the same when deployed to computers whose regional and language options differ from those of the development environment.

However, if a package must use different locales when deployed to different servers, you can create configurations that provide the updated locales to use when the package is run. For more information, see Setting Package Properties and Package Configurations.



Comparison Options







The locale provides the basic rules for comparing string data in a data flow. For example, the locale specifies the sort position of each letter in the alphabet. However, these rules may not be sufficient for the comparisons that you want to perform, and Integration Services supports a set of advanced comparison options that go beyond the comparison rules of a locale. For example, if you choose to ignore nonspacing characters, "a" and "á" are equivalent for comparison purposes. For more information, see Comparing String Data.
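Transact-SQL collations expose analogous behavior; as a rough illustration (not the Integration Services comparison mechanism itself), an accent-insensitive collation treats "a" and "á" as equivalent.

```sql
-- Under an accent-insensitive (AI) collation, the nonspacing accent is ignored.
SELECT CASE WHEN 'a' = N'á' COLLATE Latin1_General_CI_AI
            THEN 'equivalent'
            ELSE 'different'
       END AS ComparisonResult;
```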



Text Mining







The transformations for text mining, Term Extraction and Term Lookup, use their own dictionary. This dictionary is only available in English, and the results from using the text mining transformations with languages other than English may be limited. Microsoft supports the use of these transformations only with English.

However, depending on the linguistic similarity between the non-English language and English, you may find that the Term Extraction transformation can extract terms in non-English languages, and that the Term Lookup transformation can be used to look up terms and calculate term frequency. The greater the similarity between the languages, the more successful the term mining will be. For example, using the Term Extraction transformation for text mining of Swedish strings could be effective because the Swedish language uses word and sentence delimiters that are similar to those in the English language. On the other hand, using the Term Extraction transformation is not likely to be as successful with Japanese. For more information, see Term Extraction Transformation and Term Lookup Transformation.


Fuzzy Matching







The two transformations, Fuzzy Grouping and Fuzzy Lookup, use fuzzy matching to group similar records in a dataset or perform lookups in a reference table. Both transformations can perform matching most effectively when the text data contains multiple, long words separated by white space or delimiters. The transformations may not be as tolerant of errors in logographic languages, such as Chinese, where words often consist of only a few characters and may not be separated by white space. Generally, the transformations may be less likely to catch spelling errors, extra words, and missing words in logographic languages. For more information, see Fuzzy Grouping Transformation and Fuzzy Lookup Transformation.

64-bit Considerations for Integration Services




The 64-bit editions of Microsoft SQL Server include Integration Services, but some Integration Services features are available only in 32-bit versions, or have limitations on 64-bit computers, or are not supported on Itanium-based operating systems.



Installing Integration Services on 64-bit Computers







Integration Services Features Installed on 64-bit Computers




When you run SQL Server Setup and select only Integration Services for installation, Setup installs all available 64-bit Integration Services features and tools.

However, if you require Integration Services design-time features, you must install Business Intelligence Development Studio. To install Business Intelligence Development Studio, select Business Intelligence Development Studio during setup.










Important: Business Intelligence Development Studio, the 32-bit development environment for Integration Services packages, is not supported on the Itanium 64-bit operating system and is not installed on Itanium servers.


If you have to run certain packages in 32-bit mode, you must also install the 32-bit versions of the Integration Services tools. To install them, select either Business Intelligence Development Studio or Management Tools - Complete during setup.

64-bit features are installed under the Program Files directory, and 32-bit features are installed separately under the Program Files (x86) directory. (This behavior is not specific to Integration Services or to SQL Server.)


64-bit Versions of Integration Services and Tools




The Integration Services runtime has a 64-bit version.

The following Integration Services tools also have 64-bit versions:

  • The dtexec utility (dtexec.exe)

  • The dtutil utility (dtutil.exe)

  • The SQL Server Import and Export Wizard (DTSWizard.exe)


On a 64-bit computer, selecting Integration Services during Setup installs only the 64-bit runtime and tools. If you have to run packages in 32-bit mode, you must also select an additional option to install the 32-bit runtime and tools:

  • If the 64-bit computer is running the x64 operating system, select Business Intelligence Development Studio or Management Tools - Complete.

  • If the 64-bit computer is running the Itanium operating system, select Management Tools - Complete.












Note: When you use the Execute Package Utility (dtexecui.exe) on a 64-bit computer to develop and test commands, remember that this 32-bit tool runs packages in 32-bit mode. Before you deploy or schedule these commands on a production server, use the 64-bit version of the dtexec utility to test the commands in 64-bit mode.


By default, a 64-bit computer that has both the 64-bit and 32-bit versions of an Integration Services command prompt utility installed will run the 32-bit version at the command prompt. The 32-bit version runs because the directory path for the 32-bit version appears in the PATH environment variable before the directory path for the 64-bit version. (Typically, the 32-bit directory path is <drive>:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn, while the 64-bit directory path is <drive>:\Program Files\Microsoft SQL Server\100\DTS\Binn.)










Note: If you use SQL Server Agent to run the utility, SQL Server Agent automatically uses the 64-bit version of the utility. SQL Server Agent uses the registry, not the PATH environment variable, to locate the correct executable for the utility.


To ensure that you run the 64-bit version of the utility at the command prompt, you can take one of the following actions:

  • Open a Command Prompt window, change to the directory that contains the 64-bit version of the utility (<drive>:\Program Files\Microsoft SQL Server\100\DTS\Binn), and then run the utility from that location.

  • At the command prompt, run the utility by entering the full path (<drive>:\Program Files\Microsoft SQL Server\100\DTS\Binn) to the 64-bit version of the utility.

  • Permanently change the order of the paths in the PATH environment variable by placing the 64-bit path (<drive>:\Program Files\Microsoft SQL Server\100\DTS\Binn) before the 32-bit path (<drive>:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn) in the variable.



Support for Data Transformation Services on 64-bit Computers




There is no 64-bit design-time or run-time support for Data Transformation Services (DTS) packages that were created in earlier versions of SQL Server. For more information, see Support for Data Transformation Services (DTS) in SQL Server 2008 R2.

On Itanium-based operating systems, there is also no 32-bit design-time or run-time support for DTS packages. Therefore you cannot create, view, modify, or run DTS packages on Itanium-based operating systems.




Designing Integration Services Packages on 64-bit Computers







You cannot design packages in BI Development Studio or debug scripts in the Script task on Itanium-based operating systems.

When you run a package in 64-bit mode, you might not be able to connect to as many data sources as you can when you run a package in 32-bit mode. Some .NET Framework Data Providers and native OLE DB providers might not be available in 64-bit versions. For example, the Microsoft OLE DB Provider for Jet, which connects to Access databases and Excel spreadsheets, is not available in a 64-bit version. Also, the SQL Server Compact Provider, which connects to SQL Server Compact data sources, is not available in a 64-bit version.

The 32-bit SSIS Designer displays only 32-bit providers that are installed on the local computer. To configure a connection manager to use a specific 64-bit provider, you must install the 32-bit version of the provider on the development computer for use at design time. Even though the 32-bit version of the provider is installed, you can still run the package in 64-bit mode, both in the development environment and after deployment. The 32-bit and 64-bit versions of a provider have the same ID. Therefore, the SSIS runtime will select the appropriate version of the provider to use. For example, you run the package in the development environment on a 64-bit computer. By default, the package uses the 64-bit version of the provider because the default value of the Run64BitRuntime project property is True.

You cannot use the Execute DTS 2000 Package task in a package that is running in 64-bit mode. The SQL Server 2000 Data Transformation Services runtime is not available in a 64-bit version.

For package logging, note that you cannot use the SQL Server Profiler log provider in a package that is running in 64-bit mode.


Running Integration Services Packages on 64-bit Computers







You might want to run packages in 32-bit mode on a 64-bit computer for the following reasons:

  • To run SQL Server 2000 DTS packages.

  • To use a managed .NET Framework Data Provider or native OLE DB provider that is not available in a 64-bit version.

  • To use a Script task or Script component that references other assemblies or COM components for which a 64-bit version is not available or not installed.


Selecting 32-bit or 64-bit Package Execution in SSIS Designer




In the Project Properties of an Integration Services package, you can select 32-bit or 64-bit execution by setting the value of the Run64BitRuntime property on the Debugging page. By default, the value of this property is True. When the 64-bit version of the Integration Services runtime is not installed, this setting is ignored.










Note: The Run64BitRuntime project property applies only at design time.



Selecting 32-bit or 64-bit Package Execution in a SQL Server Agent Job


When you configure a SQL Server Agent job with a job step type of SQL Server Integration Services Package, the job invokes the dtexec utility. However, the version of the dtexec utility that the job invokes depends on what versions of SQL Server and SQL Server Agent have been installed and are running on the 64-bit computer:

  • The 64-bit versions of SQL Server and SQL Server Agent have been installed and are running on the computer. In this case, a job step type of SQL Server Integration Services Package invokes the 64-bit version of the dtexec utility, and the package runs in 64-bit mode.










    Note: To run a package in 32-bit mode from a 64-bit version of SQL Server Agent, select Use 32 bit runtime on the Execution options tab of the New Job Step dialog box.


  • The 32-bit versions of SQL Server and SQL Server Agent have been installed and are running on the computer. In this case, a job step type of SQL Server Integration Services Package invokes the 32-bit version of the dtexec utility, and the package runs in 32-bit mode.



Architecture of Integration Services


As shown in the following diagram, Microsoft SQL Server Integration Services consists of diverse components.

[Figure: Integration Services architecture]

Of the components shown in the diagram, the following are some of the most important for using Integration Services successfully:

SSIS Designer
SSIS Designer is a graphical tool that you can use to create and maintain Integration Services packages. SSIS Designer is available in Business Intelligence Development Studio as part of an Integration Services project. For more information, see SSIS Designer and Integration Services in Business Intelligence Development Studio.
Runtime engine
The Integration Services runtime saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. For more information, see Integration Services Packages.
Tasks and other executables
The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes. Run-time executables also include custom tasks that you develop. For more information, see Integration Services Tasks, Integration Services Containers, and Integration Services Event Handlers.
Data Flow engine (also known as the pipeline) and Data Flow components
The Data Flow task encapsulates the data flow engine. The data flow engine provides the in-memory buffers that move data from source to destination, and calls the sources that extract data from files and relational databases. The data flow engine also manages the transformations that modify data, and the destinations that load data or make data available to other processes. Integration Services data flow components are the sources, transformations, and destinations that Integration Services includes. You can also include custom components in a data flow. For more information, see Data Flow Task and Data Flow Elements.
API or object model
The Integration Services object model includes managed application programming interfaces (APIs) for creating custom components for use in packages, or custom applications that create, load, run, and manage packages. Developers can write custom applications, tasks, or transformations by using any common language runtime (CLR) compliant language. For more information, see Developer's Guide (Integration Services).
Integration Services Service
The Integration Services service lets you use SQL Server Management Studio to monitor running Integration Services packages and to manage the storage of packages. For more information, see Managing Integration Services and Using SQL Server Management Studio.
SQL Server Import and Export Wizard
The SQL Server Import and Export Wizard can copy data to and from any data source for which a managed .NET Framework data provider or a native OLE DB provider is available. This wizard also offers the simplest method to create an Integration Services package that copies data from a source to a destination. For more information, see Importing and Exporting Data by Using the SQL Server Import and Export Wizard.
Other tools, wizards, and command prompt utilities
Integration Services includes additional tools, wizards, and command prompt utilities for running and managing Integration Services packages. For more information, see Integration Services Wizards and Command Prompt Utilities (Integration Services).
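The buffer-based flow from sources through transformations to destinations that the data flow engine provides can be conveyed with a small Python generator pipeline. This is a conceptual sketch only, not the SSIS object model; the row shape and function names are invented for illustration:

```python
def source(rows):
    # Source: extract rows (an in-memory list stands in for a file or table).
    for row in rows:
        yield row

def uppercase_name(rows):
    # Transformation: modify each row as it streams through the pipeline.
    for row in rows:
        yield dict(row, name=row["name"].upper())

def destination(rows):
    # Destination: collect the transformed rows (stands in for a load step).
    return list(rows)

loaded = destination(uppercase_name(source([{"name": "ada"}, {"name": "grace"}])))
print(loaded)  # [{'name': 'ADA'}, {'name': 'GRACE'}]
```

As in the real engine, rows move through the chain incrementally rather than being fully materialized at each stage.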

Architecture of an Integration Services Package


The objects that compose an Integration Services package include the following:

The package itself
The package, the unit of work that is retrieved, executed, and saved, and the most important Integration Services object. For more information, see Integration Services Packages.
Control flow elements
The control flow elements—tasks and containers—for building the control flow in a package. Control flow elements prepare or copy data, interact with other processes, or implement repeating workflow. Precedence constraints sequence the control flow elements into an ordered control flow and specify the conditions for executing tasks or containers. For more information, see Control Flow Elements.
Data flow components
The data flow components—sources, transformations, and destinations—for building data flows in a package that extract, transform, and load data. Paths sequence the data flow components into an ordered data flow. For more information, see Data Flow Elements.
Connection managers
The connection managers that connect to different types of data sources to extract and load data. For more information, see Integration Services Connections.
Variables
The variables that can be used in expressions to dynamically update column values and property expressions, control execution of repeating control flows, and define the conditions that precedence constraints apply. For more information, see Integration Services Variables.
Event handlers
The event handlers that run in response to the run-time events that packages, tasks, and containers raise. For more information, see Integration Services Event Handlers.
Log providers
The log providers that support logging of package run-time information such as the start time and the stop time of the package and its tasks and containers. For more information, see Integration Services Log Providers.

Understanding the Components of an Integration Services Package




This section describes the objects that Microsoft SQL Server Integration Services provides for building Integration Services packages, and explains how the objects work together in the package workflow and how they can extend package functionality.



In This Section







Integration Services Projects
Describes the projects which serve as containers for Integration Services packages at design time. 
Integration Services Packages
Describes packages. 
Control Flow Elements
Describes containers, tasks, and precedence constraints. 
Data Flow Elements
Describes sources, transformations, destinations, and paths. 
Integration Services Connections
Describes connection managers. 
Integration Services Variables
Describes user-defined variables and the system variables that Integration Services provides. 
Integration Services Event Handlers
Describes the run-time events and the event handlers you can build for the events. 
Integration Services Log Providers
Describes the log providers that Integration Services includes.

Typical Uses of Integration Services




Integration Services provides a rich set of built-in tasks, containers, transformations, and data adapters that support the development of business applications. Without writing a single line of code, you can create SSIS solutions that solve complex business problems using ETL and business intelligence, manage SQL Server databases, and copy SQL Server objects between instances of SQL Server.

The following scenarios describe typical uses of SSIS packages.



Merging Data from Heterogeneous Data Stores







Data is typically stored in many different data storage systems, and extracting data from all sources and merging the data into a single, consistent dataset is challenging. This situation can occur for a number of reasons. For example:

  • Many organizations archive information that is stored in legacy data storage systems. This data may not be important to daily operations, but it may be valuable for trend analysis that requires data collected over a long period of time.

  • Branches of an organization may use different data storage technologies to store the operational data. The package may need to extract data from spreadsheets as well as relational databases before it can merge the data.

  • Data may be stored in databases that use different schemas for the same data. The package may need to change the data type of a column or combine data from multiple columns into one column before it can merge the data.


Integration Services can connect to a wide variety of data sources, including multiple sources in a single package. A package can connect to relational databases by using .NET and OLE DB providers, and to many legacy databases by using ODBC drivers. It can also connect to flat files, Excel files, and Analysis Services projects.

Integration Services includes source components that perform the work of extracting data from flat files, Excel spreadsheets, XML documents, and tables and views in relational databases from the data source to which the package connects.

Next, the data is typically transformed by using the transformations that Integration Services includes. After the data is transformed to compatible formats, it can be merged physically into one dataset.

After the data is merged successfully and transformations are applied to the data, the data is usually loaded into one or more destinations. Integration Services includes destinations for loading data into flat files, raw files, and relational databases. The data can also be loaded into an in-memory recordset and accessed by other package elements.
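The merge pattern described above can be sketched in a few lines of Python. The source shapes and column names below are invented for illustration, not an Integration Services API; the point is normalizing differently shaped sources (combining columns, converting types) into one consistent dataset:

```python
# Two sources with different schemas for the same data (illustrative only).
spreadsheet_rows = [{"First": "Ada", "Last": "Lovelace", "Sales": "120.5"}]
database_rows = [{"full_name": "Grace Hopper", "sales": 300.0}]

def normalize_spreadsheet(row):
    # Combine two columns into one and convert the string column's data type.
    return {"full_name": f"{row['First']} {row['Last']}",
            "sales": float(row["Sales"])}

def normalize_database(row):
    # Already in the target schema; pass through.
    return {"full_name": row["full_name"], "sales": row["sales"]}

merged = ([normalize_spreadsheet(r) for r in spreadsheet_rows]
          + [normalize_database(r) for r in database_rows])
print(merged)
```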



Populating Data Warehouses and Data Marts







The data in data warehouses and data marts is usually updated frequently, and the data loads are typically very large.

Integration Services includes a task that bulk loads data directly from a flat file into SQL Server tables and views, and a destination component that bulk loads data into a SQL Server database as the last step in a data transformation process.

An SSIS package can be configured to be restartable. This means you can rerun the package from a predetermined checkpoint, either a task or container in the package. The ability to restart a package can save a lot of time, especially if the package processes data from a large number of sources.
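A minimal sketch of the restart idea, in Python rather than SSIS: record each task that completes, and on rerun skip anything already recorded. The task names and checkpoint format here are invented for illustration; real SSIS checkpoint files record considerably more package state:

```python
import json
import os
import tempfile

def run_with_checkpoints(tasks, checkpoint_file):
    # Run named tasks in order, skipping any already recorded as complete.
    done = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            done = set(json.load(f))
    executed = []
    for name, task in tasks:
        if name in done:
            continue  # restart point: already completed on a previous run
        task()
        executed.append(name)
        done.add(name)
        with open(checkpoint_file, "w") as f:
            json.dump(sorted(done), f)  # persist progress after each task
    return executed

ckpt = os.path.join(tempfile.mkdtemp(), "pkg.ckpt")
tasks = [("extract", lambda: None), ("transform", lambda: None), ("load", lambda: None)]
first_run = run_with_checkpoints(tasks, ckpt)
second_run = run_with_checkpoints(tasks, ckpt)  # everything already done
print(first_run, second_run)  # ['extract', 'transform', 'load'] []
```

If a task raised an exception partway through, rerunning would resume at the failed task rather than repeating the expensive completed ones.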

You can use SSIS packages to load the dimension and fact tables in the database. If the source data for a dimension table is stored in multiple data sources, the package can merge the data into one dataset and load the dimension table in a single process, instead of using a separate process for each data source.

Updating data in data warehouses and data marts can be complex, because both types of data stores typically include slowly changing dimensions that can be difficult to manage through a data transformation process. The Slowly Changing Dimension Wizard automates support for slowly changing dimensions by dynamically creating the SQL statements that insert and update records, update related records, and add new columns to tables.

Additionally, tasks and transformations in Integration Services packages can process Analysis Services cubes and dimensions. When the package updates tables in the database that a cube is built on, you can use Integration Services tasks and transformations to automatically process the cube and to process dimensions as well. Processing the cubes and dimensions automatically helps keep the data current for users in both environments: users who access information in the cubes and dimensions, and users who access data in a relational database.

Integration Services can also compute functions before the data is loaded into its destination. If your data warehouses and data marts store aggregated information, the SSIS package can compute functions such as SUM, AVERAGE, and COUNT. An SSIS transformation can also pivot relational data and transform it into a less-normalized format that is more compatible with the table structure in the data warehouse.
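The kind of pre-load summarization described here can be illustrated with a short Python sketch (the data and grouping column are invented; in SSIS this would be the work of an Aggregate transformation):

```python
from collections import defaultdict

# Detail rows to be summarized before loading (illustrative data).
rows = [
    {"region": "East", "amount": 100.0},
    {"region": "East", "amount": 50.0},
    {"region": "West", "amount": 75.0},
]

# Group the detail rows, then compute SUM, AVERAGE, and COUNT per group.
groups = defaultdict(list)
for row in rows:
    groups[row["region"]].append(row["amount"])

summary = {
    region: {"sum": sum(a), "average": sum(a) / len(a), "count": len(a)}
    for region, a in groups.items()
}
print(summary["East"])  # {'sum': 150.0, 'average': 75.0, 'count': 2}
```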



Cleaning and Standardizing Data







Whether data is loaded into an online transaction processing (OLTP) or online analytic processing (OLAP) database, an Excel spreadsheet, or a file, it needs to be cleaned and standardized before it is loaded. Data may need to be updated for the following reasons:

  • Data is contributed from multiple branches of an organization, each using different conventions and standards. Before the data can be used, it may need to be formatted differently. For example, you may need to combine the first name and the last name into one column.

  • Data is rented or purchased. Before it can be used, the data may need to be standardized and cleaned to meet business standards. For example, an organization wants to verify that all the records use the same set of state abbreviations or the same set of product names.

  • Data is locale-specific. For example, the data may use varied date/time and numeric formats. If data from different locales is merged, it must be converted to one locale before it is loaded to avoid corruption of data.


Integration Services includes built-in transformations that you can add to packages to clean and standardize data, change the case of data, convert data to a different type or format, or create new column values based on expressions. For example, the package could concatenate first and last name columns into a single full name column, and then change the characters to uppercase.
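Conceptually, that concatenate-then-uppercase step looks like the following Python sketch. The column names are invented for illustration; this is not the SSIS Derived Column or Character Map transformation itself, only the idea they implement:

```python
def full_name_upper(row):
    # Derive a new column from two existing columns, then change its case.
    return {**row, "full_name": f"{row['first_name']} {row['last_name']}".upper()}

row = full_name_upper({"first_name": "Ada", "last_name": "Lovelace"})
print(row["full_name"])  # ADA LOVELACE
```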

An Integration Services package can also clean data by replacing the values in columns with values from a reference table, using either an exact lookup or fuzzy lookup to locate values in a reference table. Frequently, a package applies the exact lookup first, and if the lookup fails, it applies the fuzzy lookup. For example, the package first attempts to look up a product name in the reference table by using the primary key value of the product. When this search fails to return the product name, the package attempts the search again, this time using fuzzy matching on the product name.
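The exact-then-fuzzy lookup pattern can be sketched in Python using the standard library's difflib for the fuzzy step. The reference data and threshold are invented for illustration, and SSIS's Fuzzy Lookup transformation uses its own, more sophisticated matching; only the fall-back structure is the same:

```python
import difflib

# Reference table: primary key -> canonical product name (illustrative data).
reference = {"P100": "Mountain Bike", "P200": "Road Bike"}
product_names = list(reference.values())

def lookup(key, name):
    # Try the exact lookup by primary key first...
    if key in reference:
        return reference[key]
    # ...and only if that fails, fall back to fuzzy matching on the name.
    matches = difflib.get_close_matches(name, product_names, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(lookup("P100", "anything"))      # exact hit on the key
print(lookup("P999", "Montain Bike"))  # fuzzy fallback catches the typo
```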

Another transformation cleans data by grouping values in a dataset that are similar. This is useful for identifying records that may be duplicates and therefore should not be inserted into your database without further evaluation. For example, by comparing addresses in customer records you may identify a number of duplicate customers.
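A toy illustration of grouping similar values, again using difflib. SSIS's Fuzzy Grouping transformation uses its own matching algorithm; this greedy sketch, with an invented similarity threshold, only conveys the idea of clustering near-duplicates for review:

```python
import difflib

addresses = [
    "12 Main Street",
    "12 Main St.",
    "500 Oak Avenue",
]

def group_similar(values, threshold=0.7):
    # Greedy grouping: each value joins the first group whose
    # representative it resembles closely enough, else starts a new group.
    groups = []
    for value in values:
        for group in groups:
            if difflib.SequenceMatcher(None, value, group[0]).ratio() >= threshold:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups

print(group_similar(addresses))
# [['12 Main Street', '12 Main St.'], ['500 Oak Avenue']]
```

Groups with more than one member are the duplicate candidates that warrant further evaluation before loading.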



Building Business Intelligence into a Data Transformation Process







A data transformation process requires built-in logic to respond dynamically to the data it accesses and processes.

The data may need to be summarized, converted, and distributed based on data values. The process may even need to reject data, based on an assessment of column values.

To address this requirement, the logic in the SSIS package may need to perform the following types of tasks:

  • Merging data from multiple data sources.

  • Evaluating data and applying data conversions.

  • Splitting a dataset into multiple datasets based on data values.

  • Applying different aggregations to different subsets of a dataset.

  • Loading subsets of the data into different or multiple destinations.


Integration Services provides containers, tasks, and transformations for building business intelligence into SSIS packages.

Containers support the repetition of workflows by enumerating across files or objects and by evaluating expressions. A package can evaluate data and repeat workflows based on results. For example, if the date is in the current month, the package performs one set of tasks; if not, the package performs an alternative set of tasks.

Tasks that use input parameters can also build business intelligence into packages. For example, the value of an input parameter can filter the data that a task retrieves.

Transformations can evaluate expressions and then, based on the results, send rows in a dataset to different destinations. After the data is divided, the package can apply different transformations to each subset of the dataset. For example, an expression can evaluate a date column, add the sales data for the appropriate period, and then store only the summary information.

It is also possible to send a dataset to multiple destinations, and then apply different sets of transformations to the same data. For example, one set of transformations can summarize the data, while another set expands the data by looking up values in reference tables and adding data from other sources.
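The split-then-transform pattern can be sketched as follows (illustrative data; in an SSIS package this would typically be a Conditional Split followed by separate downstream transformations for each output):

```python
# Rows to route by an expression on a column value (illustrative data).
orders = [
    {"region": "domestic", "amount": 120.0},
    {"region": "export", "amount": 80.0},
    {"region": "domestic", "amount": 30.0},
]

# Split the dataset into subsets based on data values...
domestic = [o for o in orders if o["region"] == "domestic"]
export = [o for o in orders if o["region"] != "domestic"]

# ...then apply a different aggregation to each subset.
domestic_total = sum(o["amount"] for o in domestic)
export_count = len(export)
print(domestic_total, export_count)  # 150.0 1
```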


Automating Administrative Functions and Data Loading







Administrators frequently want to automate administrative functions such as backing up and restoring databases, copying SQL Server databases and the objects they contain, copying SQL Server objects, and loading data. Integration Services packages can perform these functions.

Integration Services includes tasks that are specifically designed to copy SQL Server database objects such as tables, views, and stored procedures; copy SQL Server objects such as databases, logins, and statistics; and add, change, and delete SQL Server objects and data by using Transact-SQL statements.

Administration of an OLTP or OLAP database environment frequently includes the loading of data. Integration Services includes several tasks that facilitate the bulk loading of data. You can use a task to load data from text files directly into SQL Server tables and views, or you can use a destination component to load data into SQL Server tables and views after applying transformations to the column data.

An Integration Services package can run other packages. A data transformation solution that includes many administrative functions can be separated into multiple packages so that managing and reusing the packages is easier.

If you need to perform the same administrative functions on different servers, you can use packages. A package can use looping to enumerate across the servers and perform the same functions on multiple computers. To support administration of SQL Server, Integration Services provides an enumerator that iterates across SQL Server Management Objects (SMO) objects. For example, a package can use the SMO enumerator to perform the same administrative functions on every job in the Jobs collection of a SQL Server installation.

SSIS packages can also be scheduled using SQL Server Agent Jobs.

 


