World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Thursday, October 12, 2017

Temporal table maker

This post is another in the continuing theme of "making things consistent." We were voluntold to help another team get their staging environment set up. Piece of cake, SQL Compare made it trivial to snap the tables over.

Oh, we don't want these tables in the Custom schema, we want them in dbo. No problem, SQL Compare again, change the owner mappings and bam, out come all the tables.

Oh, can we get this in near real-time? Say every 15 minutes. ... Transactional replication to the rescue!

Oh, we don't know what data we need yet so could you keep it all, forever? ... Temporal tables to the rescue?

Yes, temporal tables are perfect. But don't put the history table in the same schema as the table, put it in this one. And put all of that in its own filegroup.

And that's what this script does. It

  • generates a table definition for an existing table, copying it into a new schema while also adding in the start/stop columns for temporal tables
  • creates the clustered columnstore index command
  • creates a non-clustered index against the start/stop columns and the natural key(s)
  • alters the original table to add in our start/stop columns with defaults and the period
  • alters the original table to turn on versioning

    How does it do all that? It finds all the tables that exist in our source schema and don't yet exist in the target schema. I build out a SELECT * query against each table and feed it into sys.dm_exec_describe_first_result_set to identify the columns. And since sys.dm_exec_describe_first_result_set so nicely brings back the data type with length, precision and scale specified, we might as well use that too. By specifying a value of 1 for the browse_information_mode parameter, we get the key columns defined for us, which is handy when we want to make our non-clustered index.
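
    Before the full script, here's a minimal sketch of what that browse information looks like, against a hypothetical dbo.MyTable; is_part_of_unique_key is the column that drives the index's key list.

    SELECT
        DEDFRS.name
    ,   DEDFRS.system_type_name
    ,   DEDFRS.is_nullable
    ,   DEDFRS.is_part_of_unique_key
    FROM
        sys.dm_exec_describe_first_result_set(N'SELECT * FROM dbo.MyTable', N'', 1) AS DEDFRS
    ORDER BY
        DEDFRS.column_ordinal;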

    DECLARE
        @query nvarchar(4000)
    ,   @targetSchema sysname = 'dbo_HISTORY'
    ,   @tableName sysname
    ,   @targetFileGroup sysname = 'History'
    
    DECLARE
        CSR CURSOR
    FAST_FORWARD
    FOR
    SELECT ALL
        CONCAT(
        'SELECT * FROM '
        ,   s.name
        ,   '.'
        ,   t.name) 
    ,   t.name
    FROM 
        sys.schemas AS S
        INNER JOIN sys.tables AS T
        ON T.schema_id = S.schema_id
    WHERE
        1=1
        AND S.name = 'dbo'
        AND T.name NOT IN
        (SELECT TI.name FROM sys.schemas AS SI INNER JOIN sys.tables AS TI ON TI.schema_id = SI.schema_id WHERE SI.name = @targetSchema)
    
    ;
    OPEN CSR;
    FETCH NEXT FROM CSR INTO @query, @tableName;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Emit the DDL: the history table, its indexes, and the ALTERs for the source table
        SELECT
            CONCAT
        (
            'CREATE TABLE '
        ,   @targetSchema
        ,   '.'
        ,   @tableName
        ,   '('
        ,   STUFF
            (
                (
                SELECT
                    CONCAT
                    (
                        ','
                    ,   DEDFRS.name
                    ,   ' '
                    ,   DEDFRS.system_type_name
                    ,   ' '
                    ,   CASE DEDFRS.is_nullable
                        WHEN 1 THEN ''
                        ELSE 'NOT '
                        END
                    ,   'NULL'
                    )
                FROM
                    sys.dm_exec_describe_first_result_set(@query, N'', 1) AS DEDFRS
                ORDER BY
                    DEDFRS.column_ordinal
                FOR XML PATH('')
                )
            ,   1
            ,   1
            ,   ''
            )
            ,   ', SysStartTime datetime2(7) NOT NULL'
            ,   ', SysEndTime datetime2(7) NOT NULL'
            ,   ')'
            ,   ' ON '
            ,   @targetFileGroup
            ,   ';'
            ,   CHAR(13)
            ,   'CREATE CLUSTERED COLUMNSTORE INDEX CCI_'
            ,   @targetSchema
            ,   '_'
            ,   @tableName
            ,   ' ON '
            ,   @targetSchema
            ,   '.'
            ,   @tableName
            ,   ' ON '
            ,   @targetFileGroup
            ,   ';'
            ,   CHAR(13)
            ,   'CREATE NONCLUSTERED INDEX IX_'
            ,   @targetSchema
            ,   '_'
            ,   @tableName
            ,   '_PERIOD_COLUMNS '
            ,   ' ON '
            ,   @targetSchema
            ,   '.'
            ,   @tableName
    
            ,   '('
            ,   'SysEndTime'
            ,   ',SysStartTime'
            ,   (
                    SELECT
                        CONCAT
                        (
                            ','
                        ,   DEDFRS.name
                        )
                    FROM
                        sys.dm_exec_describe_first_result_set(@query, N'', 1) AS DEDFRS
                    WHERE
                        DEDFRS.is_part_of_unique_key = 1
                    ORDER BY
                        DEDFRS.column_ordinal
                    FOR XML PATH('')
                    )
            ,   ')'
            ,   ' ON '
            ,   @targetFileGroup
            ,   ';'
            ,   CHAR(13)
            ,   'ALTER TABLE '
            ,   'dbo'
            ,   '.'
            ,   @tableName
            ,   ' ADD '
            ,   'SysStartTime datetime2(7) GENERATED ALWAYS AS ROW START HIDDEN'
            ,   ' CONSTRAINT DF_'
            ,   'dbo_'
            ,   @tableName
            ,   '_SysStartTime DEFAULT SYSUTCDATETIME()'
            ,   ', SysEndTime datetime2(7) GENERATED ALWAYS AS ROW END HIDDEN'
            ,   ' CONSTRAINT DF_'
            ,   'dbo_'
            ,   @tableName
            ,   '_SysEndTime DEFAULT DATETIME2FROMPARTS(9999, 12, 31, 23,59, 59,9999999,7)'
            ,   ', PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime);'
            ,   CHAR(13)
            ,   'ALTER TABLE '
            ,   'dbo'
            ,   '.'
            ,   @tableName
            ,   ' SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = '
            ,   @targetSchema
            ,   '.'
            ,   @tableName
            ,   '));'
    
        )
    
    FETCH NEXT FROM CSR INTO @query, @tableName;
    END
    CLOSE CSR;
    DEALLOCATE CSR;

    Lessons learned

    The examples I cobbled together from MSDN were great, until they weren't. Be wary of anyone who doesn't specify lengths - one example used datetime2 for the start/stop columns, the other specified datetime2(0). The default precision of datetime2 is 7, which is very much not 0. Those data type differences made the base table and its history table incompatible for temporal use.
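
    If you want to see the incompatibility for yourself, here's a minimal sketch with hypothetical table names; the history table's period columns default to datetime2(7) while the base table declares datetime2(0), and linking them fails because the schemas must match exactly.

    CREATE TABLE dbo.T_History
    (
        Id int NOT NULL
    ,   SysStartTime datetime2 NOT NULL -- defaults to datetime2(7)
    ,   SysEndTime datetime2 NOT NULL
    );
    CREATE TABLE dbo.T
    (
        Id int NOT NULL PRIMARY KEY
    ,   SysStartTime datetime2(0) GENERATED ALWAYS AS ROW START NOT NULL
    ,   SysEndTime datetime2(0) GENERATED ALWAYS AS ROW END NOT NULL
    ,   PERIOD FOR SYSTEM_TIME (SysStartTime, SysEndTime)
    );
    -- Fails: the period columns' data types must match the history table exactly
    ALTER TABLE dbo.T SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.T_History));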

    Cleaning up from that mess was ugly. I couldn't drop the start/stop columns until I dropped the PERIOD. One doesn't drop a PERIOD though; one has to DROP PERIOD FOR SYSTEM_TIME, as the sketch below shows.
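
    The undo, sketched against a hypothetical dbo.T wired up by the script above (using the DF_dbo_<table>_* constraint naming pattern the script generates): versioning off first, then the period, then the defaults and columns.

    ALTER TABLE dbo.T SET (SYSTEM_VERSIONING = OFF);
    ALTER TABLE dbo.T DROP PERIOD FOR SYSTEM_TIME;
    -- the default constraints the script created must go before their columns can
    ALTER TABLE dbo.T DROP CONSTRAINT DF_dbo_T_SysStartTime, DF_dbo_T_SysEndTime;
    ALTER TABLE dbo.T DROP COLUMN SysStartTime, SysEndTime;
    DROP TABLE dbo_HISTORY.T;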

    I prefer to use the *FromParts methods where I can, so that's what my defaults use instead of casting strings. Out, ambiguity of internationalization!

    This doesn't account for tables with bad names, or for tables without primary/unique keys defined. My domain was clean, so beware of treating this as a general purpose temporal table maker.

    Improvements

    How can you make this better? My hard coded dbo should have been abstracted out to a @sourceSchema variable. I should have used QUOTENAME for all my entity names. I could have stuffed all those commands into a table or invoked them directly with sp_executesql calls. I should have abused CONCAT more. Wait, that's done. That's very well done.
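
    A sketch of the first three of those improvements together; the variable values here are hypothetical stand-ins for what the cursor would supply.

    DECLARE
        @sourceSchema sysname = N'dbo'   -- no more hard coded dbo
    ,   @targetSchema sysname = N'dbo_HISTORY'
    ,   @tableName sysname = N'MyTable'
    ,   @ddl nvarchar(max);

    -- QUOTENAME guards against troublesome entity names like [Order] or [My Table]
    SET @ddl = CONCAT(N'CREATE TABLE '
        , QUOTENAME(@targetSchema), N'.', QUOTENAME(@tableName)
        , N' (Id int NOT NULL);'); -- the real column list comes from the DMF as above

    -- ...and invoke the generated DDL directly instead of copying results around
    EXECUTE sys.sp_executesql @ddl;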

    Finally, you are responsible for the results of this script. Don't run it anywhere without evaluating and understanding the consequences.

    Thursday, October 5, 2017

    Broken View Finder

    Shh, shhhhhh, we're being very very quiet, we're hunting broken views. Recently, we were asked to migrate some code changes and after doing so, the requesting team told us we had broken all of their views, but they couldn't tell us what was broken, just that everything was. After a quick rollback to snapshot, thank you Red Gate SQL Compare, I thought it'd be enlightening to see whether anything was broken before our code had been deployed.

    You'll never guess what we discovered </clickbait>

    How can you tell a view is broken?

    The easiest way is SELECT TOP 1 * FROM dbo.MyView; but then you need to figure out all of your views.

    That's easy enough, SELECT * FROM sys.schemas AS S INNER JOIN sys.views AS V ON V.schema_id = S.schema_id;

    But you know, there's something built into SQL Server that will actually test your views - sys.sp_refreshview. That's much cleaner than running sys.sp_executesql with our SELECT TOP 1s.

    -- This script identifies broken views
    -- and at least the first error with it
    SET NOCOUNT ON;
    DECLARE
        CSR CURSOR
    FAST_FORWARD
    FOR
    SELECT
        CONCAT(QUOTENAME(S.name), '.', QUOTENAME(V.name)) AS vname
    FROM
        sys.views AS V
        INNER JOIN
            sys.schemas AS S
            ON S.schema_id = V.schema_id;
    
    DECLARE
        @viewname nvarchar(776);
    DECLARE
        @BROKENVIEWS table
    (
        viewname nvarchar(776)
    ,   ErrorMessage nvarchar(4000)
    ,   ErrorLine int
    );
    
    OPEN
        CSR;
    FETCH
        NEXT FROM CSR INTO @viewname;
    
    WHILE
        @@FETCH_STATUS = 0
    BEGIN
    
        BEGIN TRY
            EXECUTE sys.sp_refreshview
                @viewname;
        END TRY
        BEGIN CATCH
            INSERT INTO @BROKENVIEWS(viewname, ErrorMessage, ErrorLine)
            VALUES
            (
                @viewname
            ,   ERROR_MESSAGE()
            ,   ERROR_LINE()
            );
            
        END CATCH
    
        FETCH
            NEXT FROM CSR INTO @viewname;
    END
    
    CLOSE CSR;
    DEALLOCATE CSR;
    
    SELECT
        B.*
    FROM
        @BROKENVIEWS AS B
    

    Can you think of ways to improve this? Either way, happy hunting!
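
    One possibility, sketched: sys.sp_refreshsqlmodule does what sp_refreshview does but for any non-schema-bound module, so the same TRY/CATCH cursor loop could hunt broken functions, procedures, and triggers too. Only the enumeration query changes.

    SELECT
        CONCAT(QUOTENAME(S.name), '.', QUOTENAME(O.name)) AS modulename
    FROM
        sys.objects AS O
        INNER JOIN
            sys.schemas AS S
            ON S.schema_id = O.schema_id
    WHERE
        -- views, scalar/inline/table valued functions, procedures, triggers
        O.type IN ('V', 'FN', 'IF', 'TF', 'P', 'TR');
    -- then EXECUTE sys.sp_refreshsqlmodule @modulename; inside the loop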

    Tuesday, September 5, 2017

    Biml Query Table Builder

    We previously noted the awesomeness that is SQL Server 2012+'s ability to expose a query's metadata. Let's look at how we can couple that information with creating Biml Table objects.

    Prep work

    Add a static Biml file to your project that defines an OLE DB Connection and then a database and schema, e.g.

    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
        <Connections>
            <OleDbConnection Name="msdb" ConnectionString="Provider=SQLNCLI11;Data Source=localhost\dev2016;Integrated Security=SSPI;Initial Catalog=msdb" />
        </Connections>
        <Databases>
            <Database Name="msdb" ConnectionName="msdb" />
        </Databases>
        <Schemas>
            <Schema Name="dbo" DatabaseName="msdb" />
        </Schemas>
    </Biml>

    Now that we have a database connection named msdb and a valid database and schema, save the file and let's get into the good stuff.

    Given the reference query in the previous post, Drive level latency information, we would need to declare the following Biml within our Tables collection.

            <Table Name="Query 28" SchemaName="msdb.dbo">
                <Columns>
                    <Column Name="Drive" DataType="String" Length="4" Precision="0" Scale="0" IsNullable="true" />
                    <Column Name="Volume Mount Point" DataType="String" Length="512" Precision="0" Scale="0" IsNullable="true" />
                    <Column Name="Read Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                    <Column Name="Write Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                    <Column Name="Overall Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                    <Column Name="Avg Bytes/Read" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                    <Column Name="Avg Bytes/Write" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                    <Column Name="Avg Bytes/Transfer" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                </Columns>
            </Table>

    That could be accomplished purely within the declarative nature of Biml, wherein we do lots of text nuggets <#= "foo" #>, but that's going to be ugly to maintain as there's a lot of conditional logic to muck with. Instead, I'm going to create a C# method that returns the Biml table object (AstTableNode). To do that, we will need to create a Biml class nugget <#+ #>. I ended up creating two methods: GetAstTableNodeFromQuery and a helper method to translate the SQL Server data types into something Biml understands.

    <#+
        /// <summary>
        /// Build out a Biml table based on the supplied query and connection. 
        /// This assumes a valid SQL Server 2012+ OLEDB Connection is provided but the approach
        /// can be adapted based on providers and information schemas.
        /// We further assume that column names in the query are unique.
        /// </summary>
        /// <param name="connection">An OleDbConnection</param>
        /// <param name="query">A SQL query</param>
        /// <param name="schemaName">The schema our table should be created in</param>
        /// <param name="queryName">A name for our query</param>
    /// <returns>An AstTableNode modeling the query's result set</returns>
        public AstTableNode GetAstTableNodeFromQuery(AstOleDbConnectionNode connection, string query, string schemaName, string queryName)
        {
            string template = @"SELECT
        DEDFRS.name
    ,   DEDFRS.is_nullable
    ,   DEDFRS.system_type_name
    ,   DEDFRS.max_length
    ,   DEDFRS.precision
    ,   DEDFRS.scale
    FROM
        sys.dm_exec_describe_first_result_set(N'{0}', NULL, NULL) AS DEDFRS ORDER BY DEDFRS.column_ordinal;";
            AstTableNode atn = null;
    
            atn = new AstTableNode(null);
            atn.Name = queryName;
            atn.Schema = this.RootNode.Schemas[schemaName];
            string queryActual = string.Format(template, query.Replace("'", "''"));
            
            string colName = string.Empty;
            string typeText = string.Empty;
            System.Data.DbType dbt = DbType.UInt16;
            int length = 0;
            int precision = 0;
            int scale = 0;
    
            try
            {
                System.Data.DataTable dt = null;
                dt = ExternalDataAccess.GetDataTable(connection, queryActual);
                foreach (System.Data.DataRow row in dt.Rows)
                {
                    try
                    {
                        AstTableColumnBaseNode col = new AstTableColumnNode(atn);
                        // This can be empty -- see DBCC TRACESTATUS (-1)
                        if(row[0] == DBNull.Value)
                        {
                            atn.Annotations.Add(new AstAnnotationNode(atn){Tag = "Invalid", Text = "No Metadata generated"});
                            break;
                        }
                        else
                        {
                            colName = row[0].ToString();
                        }
                        
                        typeText = row[2].ToString();
                        dbt = TranslateSqlServerTypes(row[2].ToString());
                        length = int.Parse(row[3].ToString());
                        precision = int.Parse(row[4].ToString());
                        scale = int.Parse(row[5].ToString());
                        
                        col.Name = colName;
                        col.IsNullable = (bool)row[1];
                        col.DataType = dbt;
                        col.Length = length;
                        col.Precision = precision;
                        col.Scale = scale;
                        
                        atn.Columns.Add(col);
                    }
                    catch (Exception ex)
                    {
                        // Something went awry with making a column for our table
                        AstTableColumnBaseNode col = new AstTableColumnNode(atn);
                        col.Name = "FailureColumn";
                        col.Annotations.Add(new AstAnnotationNode(col){Tag = "colName", Text = colName});
                        col.Annotations.Add(new AstAnnotationNode(col){Tag = "typeText", Text = typeText});
                        col.Annotations.Add(new AstAnnotationNode(col){Tag = "dbtype", Text = dbt.ToString()});
                        col.Annotations.Add(new AstAnnotationNode(col){Tag = "Error", Text = ex.Message});
                        col.Annotations.Add(new AstAnnotationNode(col){Tag = "Stack", Text = ex.StackTrace});
                        atn.Columns.Add(col);
                    }
                }
            }
            catch (Exception ex)
            {
                // Table level failures
                AstTableColumnBaseNode col = new AstTableColumnNode(atn);
                col.Name = "Failure";
                col.Annotations.Add(new AstAnnotationNode(col){Tag = "Error", Text = ex.ToString()});
                col.Annotations.Add(new AstAnnotationNode(col){Tag = "SourceQuery", Text = query});
                col.Annotations.Add(new AstAnnotationNode(col){Tag = "QueryActual", Text = queryActual});
                atn.Columns.Add(col);
            }
            return atn;
        }
        
        /// <summary>
    /// A rudimentary method to convert SQL Server data types to Biml types. Doesn't cover
    /// UDDTs and only loosely covers sql_variant.
        /// </summary>
        /// <param name="typeName">Data type with optional length/scale/precision</param>
        /// <returns>Best approximation of a SQL Server data type</returns>
        public DbType TranslateSqlServerTypes(string typeName)
        {
            // typeName might contain length - strip it
            string fixedName = typeName;
            if(typeName.Contains("("))
            {
                fixedName = typeName.Substring(0, typeName.IndexOf("("));
            }
            // Approximate translation of https://msdn.microsoft.com/en-us/library/System.Data.DbType.aspx
            // https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-data-type-mappings
            Dictionary<string, DbType> translate = new Dictionary<string, DbType> {
                
                {"bigint", DbType.Int64 }
            ,   {"binary", DbType.Binary }
            ,   {"bit", DbType.Boolean }
            ,   {"char", DbType.AnsiStringFixedLength }
            ,   {"date", DbType.Date }
            ,   {"datetime", DbType.DateTime }
            ,   {"datetime2", DbType.DateTime2 }
            ,   {"datetimeoffset", DbType.DateTimeOffset }
            ,   {"decimal", DbType.Decimal }
            ,   {"float", DbType.Double }
            //,   {"geography", 
            //,   {"geometry", 
            //,   {"hierarchyid", 
            ,   {"image", DbType.Binary }
            ,   {"int", DbType.Int32 }
            ,   {"money", DbType.Decimal }
            ,   {"nchar", DbType.StringFixedLength }
            ,   {"ntext", DbType.String }
            ,   {"numeric", DbType.Decimal }
            ,   {"nvarchar", DbType.String }
            ,   {"real", DbType.Single }
            ,   {"smalldatetime", DbType.DateTime }
            ,   {"smallint", DbType.Int16 }
            ,   {"smallmoney", DbType.Decimal }
            ,   {"sql_variant", DbType.Object }
            ,   {"sysname", DbType.String }
            ,   {"text", DbType.String }
            ,   {"time", DbType.Time }
            ,   {"timestamp", DbType.Binary }
            ,   {"tinyint", DbType.Byte }
            ,   {"uniqueidentifier", DbType.Guid }
            ,   {"varbinary", DbType.Binary }
            ,   {"varchar", DbType.AnsiString }
            ,   {"xml", DbType.Xml }
            };
            
            try
            {
                return translate[fixedName];
            }
            catch
            {
                return System.Data.DbType.UInt64;
            }
        }
    #>
    

    Good grief, that's a lot of code - how do I use it? The basic usage would be something like

        <Tables>
            <#=  GetAstTableNodeFromQuery(this.RootNode.OleDbConnections["msdb"], "SELECT 100 AS demo", "dbo", "DemoQuery").GetBiml() #>
        </Tables>

    The call to GetAstTableNodeFromQuery returns an AstTableNode, which is great, but what we really want is the Biml behind it, so we chain a call to .GetBiml() onto the end.

    What would make that better, though, is to make it a little more dynamic. Let's improve the code to create tables based on pairs of names and queries. I'm going to use a Dictionary called namedQueries to hold the names and queries and then enumerate through them, calling our GetAstTableNodeFromQuery for each entry.

    <#
        Dictionary<string, string> namedQueries = new Dictionary<string,string>{{"Query 28", @"-- Drive level latency information (Query 28) (Drive Level Latency)
    -- Based on code from Jimmy May
    SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point], 
        CASE 
            WHEN num_of_reads = 0 THEN 0 
            ELSE (io_stall_read_ms/num_of_reads) 
        END AS [Read Latency],
        CASE 
            WHEN num_of_writes = 0 THEN 0 
            ELSE (io_stall_write_ms/num_of_writes) 
        END AS [Write Latency],
        CASE 
            WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
            ELSE (io_stall/(num_of_reads + num_of_writes)) 
        END AS [Overall Latency],
        CASE 
            WHEN num_of_reads = 0 THEN 0 
            ELSE (num_of_bytes_read/num_of_reads) 
        END AS [Avg Bytes/Read],
        CASE 
            WHEN num_of_writes = 0 THEN 0 
            ELSE (num_of_bytes_written/num_of_writes) 
        END AS [Avg Bytes/Write],
        CASE 
            WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
            ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes)) 
        END AS [Avg Bytes/Transfer]
    FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
                 SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
                 SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
                 SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point 
          FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
          INNER JOIN sys.master_files AS mf WITH (NOLOCK)
          ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
          CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs 
          GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
    ORDER BY [Overall Latency] OPTION (RECOMPILE);"}};
    #>
        <Tables>
            <# foreach(var kvp in namedQueries){ #>
            <#=  GetAstTableNodeFromQuery(this.RootNode.OleDbConnections["msdb"], kvp.Value, "dbo", kvp.Key).GetBiml() #>
            <# } #>
        </Tables>
    

    How can we improve this? Let's get rid of the hard coded query names and actual queries. Tune in to the next installment to see how we'll make that work.

    Full code is over on GitHub.

    Tuesday, August 29, 2017

    SQL Server Query Metadata

    Pop quiz: how do you determine the metadata of a query in SQL Server? For a table, you can query sys.schemas/sys.tables/sys.columns, but a query? You might start pulling the query apart and looking up each column and its metadata, but then you have to factor in function calls, and suddenly you're writing a parser within your query and you have an infinite recursion error.

    But, if you're on SQL Server 2012+, you have a friend in sys.dm_exec_describe_first_result_set.

    Let's start with a random query from Glenn Berry's diagnostic query set

    -- Drive level latency information (Query 28) (Drive Level Latency)
    -- Based on code from Jimmy May
    SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point], 
        CASE 
            WHEN num_of_reads = 0 THEN 0 
            ELSE (io_stall_read_ms/num_of_reads) 
        END AS [Read Latency],
        CASE 
            WHEN num_of_writes = 0 THEN 0 
            ELSE (io_stall_write_ms/num_of_writes) 
        END AS [Write Latency],
        CASE 
            WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
            ELSE (io_stall/(num_of_reads + num_of_writes)) 
        END AS [Overall Latency],
        CASE 
            WHEN num_of_reads = 0 THEN 0 
            ELSE (num_of_bytes_read/num_of_reads) 
        END AS [Avg Bytes/Read],
        CASE 
            WHEN num_of_writes = 0 THEN 0 
            ELSE (num_of_bytes_written/num_of_writes) 
        END AS [Avg Bytes/Write],
        CASE 
            WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
            ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes)) 
        END AS [Avg Bytes/Transfer]
    FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
                 SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
                 SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
                 SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point 
          FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
          INNER JOIN sys.master_files AS mf WITH (NOLOCK)
          ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
          CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs 
          GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
    ORDER BY [Overall Latency] OPTION (RECOMPILE);

    (Results: a single row for the C: drive with its latency and average bytes per read/write/transfer figures.)

    The results of the query aren't exciting, but what are the columns and expected data types? Pre-2012, most people dump the query results into a table with an impossible filter like WHERE 1=2 and then query the above system tables.
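
    That old trick looked something like this sketch: the impossible filter creates an empty table shaped like the query, and then tempdb's catalog views give up the metadata.

    SELECT *
    INTO #QueryShape
    FROM (SELECT 1 AS DemoColumn) AS Q
    WHERE 1 = 2;

    SELECT
        C.name
    ,   T.name AS type_name
    ,   C.max_length
    ,   C.precision
    ,   C.scale
    FROM
        tempdb.sys.columns AS C
        INNER JOIN
            tempdb.sys.types AS T
            ON T.user_type_id = C.user_type_id
    WHERE
        C.object_id = OBJECT_ID(N'tempdb..#QueryShape');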

    With the power of SQL Server 2012+, let's see what we can do. I'm going to pass in as the first argument our query and specify NULL for the next two parameters.

    SELECT
        DEDFRS.column_ordinal
    ,   DEDFRS.name
    ,   DEDFRS.is_nullable
    ,   DEDFRS.system_type_name
    ,   DEDFRS.max_length
    ,   DEDFRS.precision
    ,   DEDFRS.scale
    FROM
        sys.dm_exec_describe_first_result_set(N'
    SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point], 
        CASE 
            WHEN num_of_reads = 0 THEN 0 
            ELSE (io_stall_read_ms/num_of_reads) 
        END AS [Read Latency],
        CASE 
            WHEN num_of_writes = 0 THEN 0 
            ELSE (io_stall_write_ms/num_of_writes) 
        END AS [Write Latency],
        CASE 
            WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
            ELSE (io_stall/(num_of_reads + num_of_writes)) 
        END AS [Overall Latency],
        CASE 
            WHEN num_of_reads = 0 THEN 0 
            ELSE (num_of_bytes_read/num_of_reads) 
        END AS [Avg Bytes/Read],
        CASE 
            WHEN num_of_writes = 0 THEN 0 
            ELSE (num_of_bytes_written/num_of_writes) 
        END AS [Avg Bytes/Write],
        CASE 
            WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
            ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes)) 
        END AS [Avg Bytes/Transfer]
    FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
                 SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
                 SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
                 SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point 
          FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
          INNER JOIN sys.master_files AS mf WITH (NOLOCK)
          ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
          CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs 
          GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
    ORDER BY [Overall Latency] OPTION (RECOMPILE);', NULL, NULL) AS DEDFRS;

    Look at our results. Now you can see the column names from our query, their basic type and whether they're nullable. That's pretty freaking handy.

    column_ordinal | name               | is_nullable | system_type_name | max_length | precision | scale
    1              | Drive              | 1           | nvarchar(2)      | 4          | 0         | 0
    2              | Volume Mount Point | 1           | nvarchar(256)    | 512        | 0         | 0
    3              | Read Latency       | 1           | bigint           | 8          | 19        | 0
    4              | Write Latency      | 1           | bigint           | 8          | 19        | 0
    5              | Overall Latency    | 1           | bigint           | 8          | 19        | 0
    6              | Avg Bytes/Read     | 1           | bigint           | 8          | 19        | 0
    7              | Avg Bytes/Write    | 1           | bigint           | 8          | 19        | 0
    8              | Avg Bytes/Transfer | 1           | bigint           | 8          | 19        | 0

    I'm thinking that I can use this technique against an arbitrary source of queries to build out the result tables and then ETL data into them. That should simplify my staging step for table loads. What can you use this for? Add links in the comments showing how you use sys.dm_exec_describe_first_result_set.

    Monday, August 28, 2017

    Python pyinstaller erroring with "takes 4 positional arguments but 5 were given"

    PyInstaller is a program for turning Python files into executable programs. This is helpful as it removes the requirement for having the Python interpreter installed on a target computer. What was really weird was that I could generate a multi-file package (pyinstaller .\MyFile.py) but not a onefile version.

    C:\tmp>pyinstaller -onefile .\MyFile.py
    
    Traceback (most recent call last):
      File "C:\Program Files (x86)\Python35-32\Scripts\pyinstaller-script.py", line 9, in 
        load_entry_point('PyInstaller==3.2.1', 'console_scripts', 'pyinstaller')()
      File "c:\program files (x86)\python35-32\lib\site-packages\PyInstaller\__main__.py", line 73, in run
        args = parser.parse_args(pyi_args)
      File "c:\program files (x86)\python35-32\lib\argparse.py", line 1727, in parse_args
        args, argv = self.parse_known_args(args, namespace)
      File "c:\program files (x86)\python35-32\lib\argparse.py", line 1759, in parse_known_args
        namespace, args = self._parse_known_args(args, namespace)
      File "c:\program files (x86)\python35-32\lib\argparse.py", line 1967, in _parse_known_args
        start_index = consume_optional(start_index)
      File "c:\program files (x86)\python35-32\lib\argparse.py", line 1907, in consume_optional
        take_action(action, args, option_string)
      File "c:\program files (x86)\python35-32\lib\argparse.py", line 1835, in take_action
        action(self, namespace, argument_values, option_string)
    TypeError: __call__() takes 4 positional arguments but 5 were given
    

    What's the root cause? The argument is --onefile, not -onefile.
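
    With the doubled hyphen, the onefile build proceeds:

    C:\tmp>pyinstaller --onefile .\MyFile.py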

    Tuesday, August 15, 2017

    Biml build Database collection nodes aka what good are Annotations

    In many of the walkthroughs on creating relational objects via Biml, it seems like people skim over the Databases collection. There's nothing built into the language to really support the creation of database nodes. The import database operations are focused on tables and schemas and assume the database node(s) have been created. I hate assumptions.

    Connections.biml

    Add a Biml file to your project to house your connection information. I'm going to create two connections to the same AdventureWorks 2014 database.

    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
        <Connections>
            <OleDbConnection Name="Aw2014" ConnectionString="Provider=SQLNCLI11;Server=localhost\dev2016;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;" />
            <OleDbConnection Name="Aw2014Annotated" ConnectionString="Provider=SQLNCLI11;Server=localhost\dev2016;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;">
                <Annotations>
                    <Annotation Tag="DatabaseName">AdventureWorks2014</Annotation>
                    <Annotation Tag="ServerName">localhost\dev2016</Annotation>
                    <Annotation Tag="Provider">SQLNCLI11</Annotation>
                </Annotations>
            </OleDbConnection>
        </Connections>
    </Biml>

    The only material difference between the first and second OleDbConnection node is the declaration of Annotations on the second instance. Annotations are free form entities that allow you to enrich your Biml with more metadata. Here I've duplicated the DatabaseName, ServerName and Provider properties from my connection string. As you'll see in the upcoming BimlScript, prior proper planning prevents poor performance.

    Databases.biml

    The Databases node is a collection of AstDatabaseNode. They require a Name and a ConnectionName to be valid. Let's look at how we can construct these nodes based on information in our project. The first question I'd ask is what metadata do I readily have available? My Connections collection - that is already built out so I can enumerate through the items in there to populate the value of the ConnectionName. The only remaining item then is the Name for our database. I can see three ways of populating it: parsing the connection string for the database name, instantiating the connection manager and querying the database name from the RDBMS, or pulling the database name from our Annotations collection.

    The basic approach would take the form

    <Databases>
    <#
        string databaseName = string.Empty;
    
        foreach(AstOleDbConnectionNode conn in this.RootNode.OleDbConnections)
        {
            databaseName = "unknown";
            // Logic goes here!
    #>
            <Database Name="<#= conn.Name #>.<#= databaseName#>" ConnectionName="<#= conn.Name #>" />
    <#
        }
    #>
    </Databases>

    The logic we stuff in there can be as simple or complex as needed but the end result would be a well formed Database node.

    Parsing

    Connection strings are delimited (semicolon) key value pairs unique to the connection type. In this approach we'll split the connection string by semicolon and then split each resulting entity by the equals sign.

             // use string parsing and prayer
             try
             {
                string [] kVpish;
                KeyValuePair<string, string> kvp ;
                // This is probably the most fragile as it needs to take into consideration all the
                // possible connection styles. Get well acquainted with ConnectionStrings.com
                // Split our connnection string based on the delimiter of a semicolon
                foreach (var element in conn.ConnectionString.Split(new char [] {';'}))
                {
                    kVpish = element.Split(new char[]{'='});
                    if(kVpish.Count() > 1)
                    {
                        kvp = new KeyValuePair<string, string>(kVpish[0], kVpish[1]);
                        if (String.Compare("Initial Catalog", kvp.Key, StringComparison.InvariantCultureIgnoreCase) == 0)
                        {
                            databaseName = kvp.Value;
                        }
                    }
                }
             }
             catch (Exception ex)
             {
                 databaseName = string.Format("{0}_{1}", "Error", ex.ToString());
             }
    

    The challenge with this approach is that it's fragile. As ConnectionStrings.com can attest, there are a lot of ways of constructing a connection string, and the key that denotes the database name varies among them.

    Query

    Another approach would be to instantiate the connection manager, query the information schema equivalent, and ask the database for its name. SQL Server makes this easy

                // Or we can use database connectivity
                // This query would need to have intelligence built into it to issue correct statement
                // per database. This works for SQL Server only
                string queryDatabaseName = @"SELECT db_name(db_id()) AS CurrentDB;";
                System.Data.DataTable dt = null;
                dt = ExternalDataAccess.GetDataTable(conn, queryDatabaseName);
                foreach (System.Data.DataRow row in dt.Rows)
                {
                    databaseName = row[0].ToString();
                }

    The downside to this is that, much like parsing the connection string, it's going to be provider specific. Plus, this will be the slowest approach due to the cost of instantiating connections. Not to mention ensuring the build server has all the correct drivers installed.

    Annotations

    "Prior proper planning prevents poor performance" so let's spend a moment up front to define our metadata and then use Linq to extract that information. Since we can't guarantee that a connection node has an Annotation tag named DatabaseName, we need to test for the existence (Any) and if we find one, we'll extract the value.

                if (conn.Annotations.Any(an => an.Tag=="DatabaseName"))
                {
                    // We use the Select method to pull out only the thing, Text, that we are interested in
                    databaseName = conn.Annotations.Where(an => an.Tag=="DatabaseName").Select(t => t.Text).First().ToString();
                }
    

    The downside to this approach is that it requires planning and a bit of double entry, as you need to keep the metadata (annotations) synchronized with the actual connection string. But since we're the automating kind of people, that shouldn't be a problem...

    Databases.biml

    Putting it all together, our Databases.biml file becomes

    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <#@ template tier="10" #>
        <Databases>
        <#
        foreach(AstOleDbConnectionNode conn in this.RootNode.OleDbConnections)
        {
            string databaseName = "unknown";
            
            // Test whether the annotation collection contains a tag named DatabaseName
            if (conn.Annotations.Any(an => an.Tag=="DatabaseName"))
            {
                // We use the Select method to pull out only the thing, Text, that we are interested in
                databaseName = conn.Annotations.Where(an => an.Tag=="DatabaseName").Select(t => t.Text).First().ToString();
            }
            else
            {
                // No annotations found
                bool useStringParsing = true;
                if (useStringParsing)
                {
             // use string parsing and prayer
             try
             {
                string [] kVpish;
                KeyValuePair<string, string> kvp ;
                // This is probably the most fragile as it needs to take into consideration all the
                // possible connection styles. Get well acquainted with ConnectionStrings.com
                // Split our connnection string based on the delimiter of a semicolon
                foreach (var element in conn.ConnectionString.Split(new char [] {';'}))
                {
                    kVpish = element.Split(new char[]{'='});
                    if(kVpish.Count() > 1)
                    {
                        kvp = new KeyValuePair<string, string>(kVpish[0], kVpish[1]);
                        if (String.Compare("Initial Catalog", kvp.Key, StringComparison.InvariantCultureIgnoreCase) == 0)
                        {
                            databaseName = kvp.Value;
                        }
                    }
                }
             }
             catch (Exception ex)
             {
                 databaseName = string.Format("{0}_{1}", "Error", ex.ToString());
             }
                }
                else
                {
                    // Or we can use database connectivity
                    // This query would need to have intelligence built into it to issue correct statement
                    // per database. This works for SQL Server only
                    string queryDatabaseName = @"SELECT db_name(db_id()) AS CurrentDB;";
                    System.Data.DataTable dt = null;
                    dt = ExternalDataAccess.GetDataTable(conn, queryDatabaseName);
                    foreach (System.Data.DataRow row in dt.Rows)
                    {
                        databaseName = row[0].ToString();
                    }
                }
                
            }
            
        #>
            <Database Name="<#= conn.Name #>.<#= databaseName#>" ConnectionName="<#= conn.Name #>" />
            <#
            }
            #>
        </Databases>
    </Biml>

    And that's what it takes to use BimlScript to build out the Database collection nodes based on Annotations, string parsing, or database querying. Use Annotations to enrich your Biml objects with good metadata, and then you can use that metadata to simplify future operations.

    Friday, July 28, 2017

    Generate TSQL time slices

    I had some log data I wanted to bucket into 15 second time slices, and I figured that since I've solved this once, I'll need to do it again, so to the blog machine! This uses LEAD, TIMEFROMPARTS, and ROW_NUMBER to accomplish it.
    SELECT
        D.Slice AS SliceStart
    ,   LEAD
        (
            D.Slice
        ,   1
            -- Default to midnight
        ,   TIMEFROMPARTS(0,0,0,0,0)
        )
        OVER (ORDER BY D.Slice) AS SliceStop
    ,   ROW_NUMBER() OVER (ORDER BY D.Slice) AS SliceLabel
    FROM
    (
        -- Generate 15 second time slices
        SELECT 
            TIMEFROMPARTS(A.rn, B.rn, C.rn, 0, 0) AS Slice
        FROM
            (SELECT TOP (24) -1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM sys.all_objects AS AO) AS A(rn)
            CROSS APPLY (SELECT TOP (60) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) FROM sys.all_objects AS AO) AS B(rn)
            -- 4 values since we'll aggregate to 15 seconds
            CROSS APPLY (SELECT TOP (4) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) * 15  FROM sys.all_objects AS AO) AS C(rn)
    ) D
    

    That looks like a lot, but it really isn't. Starting with the innermost query, we select the top 24 rows from sys.all_objects and use the ROW_NUMBER function to generate a monotonically increasing set of values, thus 1...24. However, since the allowable range of hours is 0 to 23, I deduct one from this value (A). I repeat this pattern to generate minutes (B), except we get the top 60. Since I want 15 second intervals, for the seconds query I only get the top 4 values. I deduct one so we have {0,1,2,3} and then multiply by 15 to get my increments (C). If you want different time slices, that's how I would modify this pattern - see the sketch that follows.
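
    For example, here's a minimal variation of that pattern for 5 minute buckets instead of 15 second ones: 24 hours crossed with 12 five-minute increments, and no seconds component needed.

    SELECT
        TIMEFROMPARTS(A.rn, B.rn, 0, 0, 0) AS Slice
    FROM
        (SELECT TOP (24) -1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM sys.all_objects AS AO) AS A(rn)
        -- 12 values, each multiplied by 5, yields minutes {0, 5, ... 55}
        CROSS APPLY (SELECT TOP (12) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) * 5 FROM sys.all_objects AS AO) AS B(rn);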

    Finally, having 3 columns of numbers, I use TIMEFROMPARTS to build a time data type with the least amount of precision, present that as "Slice", and encapsulate it all in a derived table (D). Running that query gets me a list of periods, but I don't know what the end period is.

    We can calculate the end period by using the LEAD function. I present my original Slice as SliceStart. I then use the LEAD function to calculate the next (1) value based on the Slice column. In the case of 23:59:45, the "next" value in our data set is NULL. To address that scenario, we pass in the default value for the LEAD function: midnight.
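
    To close the loop on the original problem, a hypothetical usage sketch: bucketing rows from an imaginary dbo.AppLog table (with an EventTime datetime2 column) into the slices. The table and column names here are mine, not anything from above; the OR clause handles the final slice, whose stop time wraps to midnight.

    WITH SLICES AS
    (
        SELECT
            D.Slice AS SliceStart
        ,   LEAD(D.Slice, 1, TIMEFROMPARTS(0,0,0,0,0)) OVER (ORDER BY D.Slice) AS SliceStop
        FROM
        (
            SELECT
                TIMEFROMPARTS(A.rn, B.rn, C.rn, 0, 0) AS Slice
            FROM
                (SELECT TOP (24) -1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM sys.all_objects AS AO) AS A(rn)
                CROSS APPLY (SELECT TOP (60) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) FROM sys.all_objects AS AO) AS B(rn)
                CROSS APPLY (SELECT TOP (4) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) * 15 FROM sys.all_objects AS AO) AS C(rn)
        ) D
    )
    SELECT
        S.SliceStart
    ,   COUNT(L.EventTime) AS EventCount
    FROM
        SLICES AS S
        LEFT OUTER JOIN dbo.AppLog AS L
            ON CAST(L.EventTime AS time(0)) >= S.SliceStart
            -- the last slice's stop wraps to midnight, so treat 00:00:00 as end of day
            AND (CAST(L.EventTime AS time(0)) < S.SliceStop
                 OR S.SliceStop = TIMEFROMPARTS(0,0,0,0,0))
    GROUP BY
        S.SliceStart
    ORDER BY
        S.SliceStart;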