World of Whatever

A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Tuesday, September 5, 2017

Biml Query Table Builder

We previously noted the awesomeness that is SQL Server 2012+'s ability to expose a query's metadata. Let's look at how we can couple that information with creating Biml Table objects.

Prep work

Add a static Biml file to your project that defines an OLE DB Connection and then a database and schema. e.g.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="msdb" ConnectionString="Provider=SQLNCLI11;Data Source=localhost\dev2016;Integrated Security=SSPI;Initial Catalog=msdb" />
    </Connections>
    <Databases>
        <Database Name="msdb" ConnectionName="msdb" />
    </Databases>
    <Schemas>
        <Schema Name="dbo" DatabaseName="msdb" />
    </Schemas>
</Biml>

Now that we have a database connection named msdb and a valid database and schema, save the file and let's get into the good stuff.

Given the reference query in the previous post, Drive level latency information, we would need to declare the following Biml within our Tables collection.

        <Table Name="Query 28" SchemaName="msdb.dbo">
            <Columns>
                <Column Name="Drive" DataType="String" Length="4" Precision="0" Scale="0" IsNullable="true" />
                <Column Name="Volume Mount Point" DataType="String" Length="512" Precision="0" Scale="0" IsNullable="true" />
                <Column Name="Read Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                <Column Name="Write Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                <Column Name="Overall Latency" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                <Column Name="Avg Bytes/Read" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                <Column Name="Avg Bytes/Write" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
                <Column Name="Avg Bytes/Transfer" DataType="Int64" Length="8" Precision="19" Scale="0" IsNullable="true" />
            </Columns>
        </Table>

That could be accomplished purely within the declarative nature of Biml wherein we do lots of text nuggets <#= "foo" #>, but that's going to be ugly to maintain as there's a lot of conditional logic to muck with. Instead, I'm going to create a C# method that returns the Biml table object (AstTableNode). To do that, we will need to create a Biml class nugget <#+ #>. I ended up creating two methods: GetAstTableNodeFromQuery and a helper method to translate the SQL Server data types into something Biml understands.

<#+
    /// <summary>
    /// Build out a Biml table based on the supplied query and connection. 
    /// This assumes a valid SQL Server 2012+ OLEDB Connection is provided but the approach
    /// can be adapted based on providers and information schemas.
    /// We further assume that column names in the query are unique.
    /// </summary>
    /// <param name="connection">An OleDbConnection</param>
    /// <param name="query">A SQL query</param>
    /// <param name="schemaName">The schema our table should be created in</param>
    /// <param name="queryName">A name for our query</param>
    /// <returns>An AstTableNode describing the query's result set</returns>
    public AstTableNode GetAstTableNodeFromQuery(AstOleDbConnectionNode connection, string query, string schemaName, string queryName)
    {
        string template = @"SELECT
    DEDFRS.name
,   DEDFRS.is_nullable
,   DEDFRS.system_type_name
,   DEDFRS.max_length
,   DEDFRS.precision
,   DEDFRS.scale
FROM
    sys.dm_exec_describe_first_result_set(N'{0}', NULL, NULL) AS DEDFRS ORDER BY DEDFRS.column_ordinal;";
        AstTableNode atn = null;

        atn = new AstTableNode(null);
        atn.Name = queryName;
        atn.Schema = this.RootNode.Schemas[schemaName];
        string queryActual = string.Format(template, query.Replace("'", "''"));
        
        string colName = string.Empty;
        string typeText = string.Empty;
        System.Data.DbType dbt = DbType.UInt16;
        int length = 0;
        int precision = 0;
        int scale = 0;

        try
        {
            System.Data.DataTable dt = null;
            dt = ExternalDataAccess.GetDataTable(connection, queryActual);
            foreach (System.Data.DataRow row in dt.Rows)
            {
                try
                {
                    AstTableColumnBaseNode col = new AstTableColumnNode(atn);
                    // This can be empty -- see DBCC TRACESTATUS (-1)
                    if(row[0] == DBNull.Value)
                    {
                        atn.Annotations.Add(new AstAnnotationNode(atn){Tag = "Invalid", Text = "No Metadata generated"});
                        break;
                    }
                    else
                    {
                        colName = row[0].ToString();
                    }
                    
                    typeText = row[2].ToString();
                    dbt = TranslateSqlServerTypes(row[2].ToString());
                    length = int.Parse(row[3].ToString());
                    precision = int.Parse(row[4].ToString());
                    scale = int.Parse(row[5].ToString());
                    
                    col.Name = colName;
                    col.IsNullable = (bool)row[1];
                    col.DataType = dbt;
                    col.Length = length;
                    col.Precision = precision;
                    col.Scale = scale;
                    
                    atn.Columns.Add(col);
                }
                catch (Exception ex)
                {
                    // Something went awry with making a column for our table
                    AstTableColumnBaseNode col = new AstTableColumnNode(atn);
                    col.Name = "FailureColumn";
                    col.Annotations.Add(new AstAnnotationNode(col){Tag = "colName", Text = colName});
                    col.Annotations.Add(new AstAnnotationNode(col){Tag = "typeText", Text = typeText});
                    col.Annotations.Add(new AstAnnotationNode(col){Tag = "dbtype", Text = dbt.ToString()});
                    col.Annotations.Add(new AstAnnotationNode(col){Tag = "Error", Text = ex.Message});
                    col.Annotations.Add(new AstAnnotationNode(col){Tag = "Stack", Text = ex.StackTrace});
                    atn.Columns.Add(col);
                }
            }
        }
        catch (Exception ex)
        {
            // Table level failures
            AstTableColumnBaseNode col = new AstTableColumnNode(atn);
            col.Name = "Failure";
            col.Annotations.Add(new AstAnnotationNode(col){Tag = "Error", Text = ex.ToString()});
            col.Annotations.Add(new AstAnnotationNode(col){Tag = "SourceQuery", Text = query});
            col.Annotations.Add(new AstAnnotationNode(col){Tag = "QueryActual", Text = queryActual});
            atn.Columns.Add(col);
        }
        return atn;
    }
    
    /// <summary>
    /// A rudimentary method to convert SQL Server data types to Biml (System.Data.DbType) types.
    /// Doesn't cover UDDTs and doesn't handle sql_variant well.
    /// </summary>
    /// <param name="typeName">Data type with optional length/scale/precision</param>
    /// <returns>Best approximation of a SQL Server data type</returns>
    public DbType TranslateSqlServerTypes(string typeName)
    {
        // typeName might contain length - strip it
        string fixedName = typeName;
        if(typeName.Contains("("))
        {
            fixedName = typeName.Substring(0, typeName.IndexOf("("));
        }
        // Approximate translation of https://msdn.microsoft.com/en-us/library/System.Data.DbType.aspx
        // https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-data-type-mappings
        Dictionary<string, DbType> translate = new Dictionary<string, DbType> {
            
            {"bigint", DbType.Int64 }
        ,   {"binary", DbType.Binary }
        ,   {"bit", DbType.Boolean }
        ,   {"char", DbType.AnsiStringFixedLength }
        ,   {"date", DbType.Date }
        ,   {"datetime", DbType.DateTime }
        ,   {"datetime2", DbType.DateTime2 }
        ,   {"datetimeoffset", DbType.DateTimeOffset }
        ,   {"decimal", DbType.Decimal }
        ,   {"float", DbType.Double }
        //,   {"geography", 
        //,   {"geometry", 
        //,   {"hierarchyid", 
        ,   {"image", DbType.Binary }
        ,   {"int", DbType.Int32 }
        ,   {"money", DbType.Decimal }
        ,   {"nchar", DbType.StringFixedLength }
        ,   {"ntext", DbType.String }
        ,   {"numeric", DbType.Decimal }
        ,   {"nvarchar", DbType.String }
        ,   {"real", DbType.Single }
        ,   {"smalldatetime", DbType.DateTime }
        ,   {"smallint", DbType.Int16 }
        ,   {"smallmoney", DbType.Decimal }
        ,   {"sql_variant", DbType.Object }
        ,   {"sysname", DbType.String }
        ,   {"text", DbType.String }
        ,   {"time", DbType.Time }
        ,   {"timestamp", DbType.Binary }
        ,   {"tinyint", DbType.Byte }
        ,   {"uniqueidentifier", DbType.Guid }
        ,   {"varbinary", DbType.Binary }
        ,   {"varchar", DbType.AnsiString }
        ,   {"xml", DbType.Xml }
        };
        
        try
        {
            return translate[fixedName];
        }
        catch
        {
            return System.Data.DbType.UInt64;
        }
    }
#>

Good grief, that's a lot of code. How do I use it? The basic usage would be something like

    <Tables>
        <#=  GetAstTableNodeFromQuery(this.RootNode.OleDbConnections["msdb"], "SELECT 100 AS demo", "dbo", "DemoQuery").GetBiml() #>
    </Tables>

The call to GetAstTableNodeFromQuery returns an AstTableNode, which is great, but what we really want is the Biml behind it, so we chain a call to .GetBiml() onto the end.

What would make that better, though, is to make it a little more dynamic. Let's improve the code to create tables based on pairs of names and queries. I'm going to use a Dictionary called namedQueries to hold the names and queries and then enumerate through them, calling our GetAstTableNodeFromQuery for each entry.

<#
    Dictionary<string, string> namedQueries = new Dictionary<string,string>{{"Query 28", @"-- Drive level latency information (Query 28) (Drive Level Latency)
-- Based on code from Jimmy May
SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point], 
    CASE 
        WHEN num_of_reads = 0 THEN 0 
        ELSE (io_stall_read_ms/num_of_reads) 
    END AS [Read Latency],
    CASE 
        WHEN num_of_writes = 0 THEN 0 
        ELSE (io_stall_write_ms/num_of_writes) 
    END AS [Write Latency],
    CASE 
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
        ELSE (io_stall/(num_of_reads + num_of_writes)) 
    END AS [Overall Latency],
    CASE 
        WHEN num_of_reads = 0 THEN 0 
        ELSE (num_of_bytes_read/num_of_reads) 
    END AS [Avg Bytes/Read],
    CASE 
        WHEN num_of_writes = 0 THEN 0 
        ELSE (num_of_bytes_written/num_of_writes) 
    END AS [Avg Bytes/Write],
    CASE 
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
        ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes)) 
    END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
             SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
             SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
             SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point 
      FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      INNER JOIN sys.master_files AS mf WITH (NOLOCK)
      ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
      CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs 
      GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);"}};
#>
    <Tables>
        <# foreach(var kvp in namedQueries){ #>
        <#=  GetAstTableNodeFromQuery(this.RootNode.OleDbConnections["msdb"], kvp.Value, "dbo", kvp.Key).GetBiml() #>
        <# } #>
    </Tables>

How can we improve this? Let's get rid of the hard coded query names and actual queries. Tune in to the next installment to see how we'll make that work.

Full code is over on GitHub.

Tuesday, August 29, 2017

SQL Server Query Metadata

Pop quiz: how do you determine the metadata of a query in SQL Server? For a table, you can query sys.schemas/sys.tables/sys.columns, but for a query? You might start pulling the query apart and looking up each column and its metadata, but then you have to factor in function calls and suddenly you're writing a parser within your query and you have an infinite recursion problem.
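
For a table, at least, the lookup is easy. A minimal sketch against those catalog views (using msdb's dbo.sysjobs as a stand-in table) looks like this:

-- Column metadata for a known table, straight from the catalog views
SELECT
    S.name AS schema_name
,   T.name AS table_name
,   C.name AS column_name
,   TYPE_NAME(C.user_type_id) AS type_name
,   C.max_length
,   C.precision
,   C.scale
,   C.is_nullable
FROM
    sys.schemas AS S
    INNER JOIN
        sys.tables AS T
        ON T.schema_id = S.schema_id
    INNER JOIN
        sys.columns AS C
        ON C.object_id = T.object_id
WHERE
    S.name = N'dbo'
    AND T.name = N'sysjobs';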

But, if you're on SQL Server 2012+, you have a friend in sys.dm_exec_describe_first_result_set.

Let's start with a random query from Glenn Berry's diagnostic query set

-- Drive level latency information (Query 28) (Drive Level Latency)
-- Based on code from Jimmy May
SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point], 
    CASE 
        WHEN num_of_reads = 0 THEN 0 
        ELSE (io_stall_read_ms/num_of_reads) 
    END AS [Read Latency],
    CASE 
        WHEN num_of_writes = 0 THEN 0 
        ELSE (io_stall_write_ms/num_of_writes) 
    END AS [Write Latency],
    CASE 
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
        ELSE (io_stall/(num_of_reads + num_of_writes)) 
    END AS [Overall Latency],
    CASE 
        WHEN num_of_reads = 0 THEN 0 
        ELSE (num_of_bytes_read/num_of_reads) 
    END AS [Avg Bytes/Read],
    CASE 
        WHEN num_of_writes = 0 THEN 0 
        ELSE (num_of_bytes_written/num_of_writes) 
    END AS [Avg Bytes/Write],
    CASE 
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
        ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes)) 
    END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
             SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
             SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
             SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point 
      FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      INNER JOIN sys.master_files AS mf WITH (NOLOCK)
      ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
      CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs 
      GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);

Drive | Volume Mount Point | Read Latency | Write Latency | Overall Latency | Avg Bytes/Read | Avg Bytes/Write | Avg Bytes/Transfer
C:    | C:\                | 0            | 0             | 0               | 64447          | 44933           | 1990

The results of the query aren't exciting, but what are the columns and their expected data types? Pre-2012, most people dumped the query results into a table with an impossible filter like WHERE 1=2 and then queried the system tables above.
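
A sketch of that old trick, with dbo.QueryShape as a hypothetical scratch table and SELECT 100 AS demo standing in for the real query:

-- The impossible filter yields an empty table whose columns mirror the query
SELECT
    q.*
INTO
    dbo.QueryShape
FROM
    (SELECT 100 AS demo) AS q
WHERE
    1 = 2;

-- Now the catalog views can describe the "query"
SELECT
    C.name
,   TYPE_NAME(C.user_type_id) AS type_name
,   C.is_nullable
FROM
    sys.columns AS C
WHERE
    C.object_id = OBJECT_ID(N'dbo.QueryShape');

DROP TABLE dbo.QueryShape;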

With the power of SQL Server 2012+, let's see what we can do. I'm going to pass our query in as the first argument and specify NULL for the next two parameters.

SELECT
    DEDFRS.column_ordinal
,   DEDFRS.name
,   DEDFRS.is_nullable
,   DEDFRS.system_type_name
,   DEDFRS.max_length
,   DEDFRS.precision
,   DEDFRS.scale
FROM
    sys.dm_exec_describe_first_result_set(N'
SELECT tab.[Drive], tab.volume_mount_point AS [Volume Mount Point], 
    CASE 
        WHEN num_of_reads = 0 THEN 0 
        ELSE (io_stall_read_ms/num_of_reads) 
    END AS [Read Latency],
    CASE 
        WHEN num_of_writes = 0 THEN 0 
        ELSE (io_stall_write_ms/num_of_writes) 
    END AS [Write Latency],
    CASE 
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
        ELSE (io_stall/(num_of_reads + num_of_writes)) 
    END AS [Overall Latency],
    CASE 
        WHEN num_of_reads = 0 THEN 0 
        ELSE (num_of_bytes_read/num_of_reads) 
    END AS [Avg Bytes/Read],
    CASE 
        WHEN num_of_writes = 0 THEN 0 
        ELSE (num_of_bytes_written/num_of_writes) 
    END AS [Avg Bytes/Write],
    CASE 
        WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0 
        ELSE ((num_of_bytes_read + num_of_bytes_written)/(num_of_reads + num_of_writes)) 
    END AS [Avg Bytes/Transfer]
FROM (SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive, SUM(num_of_reads) AS num_of_reads,
             SUM(io_stall_read_ms) AS io_stall_read_ms, SUM(num_of_writes) AS num_of_writes,
             SUM(io_stall_write_ms) AS io_stall_write_ms, SUM(num_of_bytes_read) AS num_of_bytes_read,
             SUM(num_of_bytes_written) AS num_of_bytes_written, SUM(io_stall) AS io_stall, vs.volume_mount_point 
      FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
      INNER JOIN sys.master_files AS mf WITH (NOLOCK)
      ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
      CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.[file_id]) AS vs 
      GROUP BY LEFT(UPPER(mf.physical_name), 2), vs.volume_mount_point) AS tab
ORDER BY [Overall Latency] OPTION (RECOMPILE);', NULL, NULL) AS DEDFRS;

Look at our results. Now you can see the column names from our query, their basic type and whether they're nullable. That's pretty freaking handy.

column_ordinal | name               | is_nullable | system_type_name | max_length | precision | scale
1              | Drive              | 1           | nvarchar(2)      | 4          | 0         | 0
2              | Volume Mount Point | 1           | nvarchar(256)    | 512        | 0         | 0
3              | Read Latency       | 1           | bigint           | 8          | 19        | 0
4              | Write Latency      | 1           | bigint           | 8          | 19        | 0
5              | Overall Latency    | 1           | bigint           | 8          | 19        | 0
6              | Avg Bytes/Read     | 1           | bigint           | 8          | 19        | 0
7              | Avg Bytes/Write    | 1           | bigint           | 8          | 19        | 0
8              | Avg Bytes/Transfer | 1           | bigint           | 8          | 19        | 0

I'm thinking I can use this technique against an arbitrary source of queries to build out the result tables and then ETL data into them. That should simplify my staging step for table loads. What can you use this for? Add links in the comments showing how you use sys.dm_exec_describe_first_result_set.

Monday, August 28, 2017

Python pyinstaller erroring with takes 4 positional arguments but 5 were given

Pyinstaller is a program for turning Python files into executable programs. This is helpful as it removes the requirement for having the Python interpreter installed on a target computer. What was really weird was that I could generate a multi-file package (pyinstaller .\MyFile.py) but not a onefile version.

C:\tmp>pyinstaller -onefile .\MyFile.py

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python35-32\Scripts\pyinstaller-script.py", line 9, in 
    load_entry_point('PyInstaller==3.2.1', 'console_scripts', 'pyinstaller')()
  File "c:\program files (x86)\python35-32\lib\site-packages\PyInstaller\__main__.py", line 73, in run
    args = parser.parse_args(pyi_args)
  File "c:\program files (x86)\python35-32\lib\argparse.py", line 1727, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "c:\program files (x86)\python35-32\lib\argparse.py", line 1759, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "c:\program files (x86)\python35-32\lib\argparse.py", line 1967, in _parse_known_args
    start_index = consume_optional(start_index)
  File "c:\program files (x86)\python35-32\lib\argparse.py", line 1907, in consume_optional
    take_action(action, args, option_string)
  File "c:\program files (x86)\python35-32\lib\argparse.py", line 1835, in take_action
    action(self, namespace, argument_values, option_string)
TypeError: __call__() takes 4 positional arguments but 5 were given

What's the root cause? The argument is --onefile, not -onefile.
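
With the double-dash spelling, the same invocation parses cleanly:

C:\tmp>pyinstaller --onefile .\MyFile.py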

Tuesday, August 15, 2017

Biml build Database collection nodes aka what good are Annotations

In many of the walkthroughs on creating relational objects via Biml, it seems like people skim over the Databases collection. There's nothing built into the language to really support the creation of database nodes. The import database operations are focused on tables and schemas and assume the database node(s) have been created. I hate assumptions.

Connections.biml

Add a Biml file to your project to house your connection information. I'm going to create two connections to the same AdventureWorks 2014 database.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Aw2014" ConnectionString="Provider=SQLNCLI11;Server=localhost\dev2016;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;" />
        <OleDbConnection Name="Aw2014Annotated" ConnectionString="Provider=SQLNCLI11;Server=localhost\dev2016;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;">
            <Annotations>
                <Annotation Tag="DatabaseName">AdventureWorks2014</Annotation>
                <Annotation Tag="ServerName">localhost\dev2016</Annotation>
                <Annotation Tag="Provider">SQLNCLI11</Annotation>
            </Annotations>
        </OleDbConnection>
    </Connections>
</Biml>

The only material difference between the first and second OleDbConnection node is the declaration of Annotations on the second instance. Annotations are free form entities that allow you to enrich your Biml with more metadata. Here I've duplicated the DatabaseName, ServerName and Provider properties from my connection string. As you'll see in the upcoming BimlScript, prior proper planning prevents poor performance.

Databases.biml

The Databases node is a collection of AstDatabaseNode. They require a Name and a ConnectionName to be valid. Let's look at how we can construct these nodes based on information in our project. The first question I'd ask is what metadata do I readily have available? My Connections collection - that is already built out so I can enumerate through the items in there to populate the value of the ConnectionName. The only remaining item then is the Name for our database. I can see three ways of populating it: parsing the connection string for the database name, instantiating the connection manager and querying the database name from the RDBMS, or pulling the database name from our Annotations collection.

The basic approach would take the form

<Databases>
<#
    string databaseName = string.Empty;

    foreach(AstOleDbConnectionNode conn in this.RootNode.OleDbConnections)
    {
        databaseName = "unknown";
        // Logic goes here!
#>
        <Database Name="<#= conn.Name #>.<#= databaseName#>" ConnectionName="<#= conn.Name #>" />
<#
    }
#>
</Databases>

The logic we stuff in there can be as simple or complex as needed, but the end result will be a well-formed Database node.

Parsing

Connection strings are semicolon-delimited key/value pairs unique to the connection type. In this approach we'll split the connection string on semicolons and then split each resulting entry on the equals sign.

         // use string parsing and prayer
         try
         {
            string [] kVpish;
            KeyValuePair<string, string> kvp ;
            // This is probably the most fragile as it needs to take into consideration all the
            // possible connection styles. Get well acquainted with ConnectionStrings.com
            // Split our connection string based on the delimiter of a semicolon
            foreach (var element in conn.ConnectionString.Split(new char [] {';'}))
            {
                kVpish = element.Split(new char[]{'='});
                if(kVpish.Count() > 1)
                {
                    kvp = new KeyValuePair<string, string>(kVpish[0], kVpish[1]);
                    if (String.Compare("Initial Catalog", kvp.Key, StringComparison.InvariantCultureIgnoreCase) == 0)
                    {
                        databaseName = kvp.Value;
                    }
                }
            }
         }
         catch (Exception ex)
         {
             databaseName = string.Format("{0}_{1}", "Error", ex.ToString());
         }

The challenge with this approach is its fragility. As ConnectionStrings.com can attest, there are a lot of ways of constructing a connection string and a lot of names for the property that denotes the database.

Query

Another approach would be to instantiate the connection manager, query the information schema equivalent, and ask the database what its name is. SQL Server makes this easy:

            // Or we can use database connectivity
            // This query would need to have intelligence built into it to issue correct statement
            // per database. This works for SQL Server only
            string queryDatabaseName = @"SELECT db_name(db_id()) AS CurrentDB;";
            System.Data.DataTable dt = null;
            dt = ExternalDataAccess.GetDataTable(conn, queryDatabaseName);
            foreach (System.Data.DataRow row in dt.Rows)
            {
                databaseName = row[0].ToString();
            }

The downside to this is that, much like parsing the connection string, it's going to be provider-specific. Plus, this will be the slowest option due to the cost of instantiating connections, to say nothing of ensuring the build server has all the correct drivers installed.

Annotations

"Prior proper planning prevents poor performance" so let's spend a moment up front to define our metadata and then use Linq to extract that information. Since we can't guarantee that a connection node has an Annotation tag named DatabaseName, we need to test for the existence (Any) and if we find one, we'll extract the value.

            if (conn.Annotations.Any(an => an.Tag=="DatabaseName"))
            {
                // We use the Select method to pull out only the thing, Text, that we are interested in
                databaseName = conn.Annotations.Where(an => an.Tag=="DatabaseName").Select(t => t.Text).First().ToString();
            }

The downside to this approach is that it requires planning and a bit of double entry, as you need to keep the metadata (annotations) synchronized with the actual connection string. But since we're the automating kind of people, that shouldn't be a problem...

Databases.biml

Putting it all together, our Databases.biml file becomes

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<#@ template tier="10" #>
    <Databases>
    <#
    foreach(AstOleDbConnectionNode conn in this.RootNode.OleDbConnections)
    {
        string databaseName = "unknown";
        
        // Test whether the annotation collection contains a tag named DatabaseName
        if (conn.Annotations.Any(an => an.Tag=="DatabaseName"))
        {
            // We use the Select method to pull out only the thing, Text, that we are interested in
            databaseName = conn.Annotations.Where(an => an.Tag=="DatabaseName").Select(t => t.Text).First().ToString();
        }
        else
        {
            // No annotations found
            bool useStringParsing = true;
            if (useStringParsing)
            {
         // use string parsing and prayer
         try
         {
            string [] kVpish;
            KeyValuePair<string, string> kvp ;
            // This is probably the most fragile as it needs to take into consideration all the
            // possible connection styles. Get well acquainted with ConnectionStrings.com
            // Split our connection string based on the delimiter of a semicolon
            foreach (var element in conn.ConnectionString.Split(new char [] {';'}))
            {
                kVpish = element.Split(new char[]{'='});
                if(kVpish.Count() > 1)
                {
                    kvp = new KeyValuePair<string, string>(kVpish[0], kVpish[1]);
                    if (String.Compare("Initial Catalog", kvp.Key, StringComparison.InvariantCultureIgnoreCase) == 0)
                    {
                        databaseName = kvp.Value;
                    }
                }
            }
         }
         catch (Exception ex)
         {
             databaseName = string.Format("{0}_{1}", "Error", ex.ToString());
         }
            }
            else
            {
                // Or we can use database connectivity
                // This query would need to have intelligence built into it to issue correct statement
                // per database. This works for SQL Server only
                string queryDatabaseName = @"SELECT db_name(db_id()) AS CurrentDB;";
                System.Data.DataTable dt = null;
                dt = ExternalDataAccess.GetDataTable(conn, queryDatabaseName);
                foreach (System.Data.DataRow row in dt.Rows)
                {
                    databaseName = row[0].ToString();
                }
            }
            
        }
        
    #>
        <Database Name="<#= conn.Name #>.<#= databaseName#>" ConnectionName="<#= conn.Name #>" />
        <#
        }
        #>
    </Databases>
</Biml>

And that's what it takes to use BimlScript to build out the Database collection nodes based on annotations, string parsing, or database querying. Use Annotations to enrich your Biml objects with good metadata, and then you can use them to simplify future operations.

Friday, July 28, 2017

Generate TSQL time slices

I had some log data I wanted to bucket into 15 second time slices, and I figured that if I've solved this once, I'll need to solve it again, so: to the blog machine! This uses LEAD, TIMEFROMPARTS and ROW_NUMBER to accomplish the task.
SELECT
    D.Slice AS SliceStart
,   LEAD
    (
        D.Slice
    ,   1
        -- Default to midnight
    ,   TIMEFROMPARTS(0,0,0,0,0)
    )
    OVER (ORDER BY D.Slice) AS SliceStop
,   ROW_NUMBER() OVER (ORDER BY D.Slice) AS SliceLabel
FROM
(
    -- Generate 15 second time slices
    SELECT 
        TIMEFROMPARTS(A.rn, B.rn, C.rn, 0, 0) AS Slice
    FROM
        (SELECT TOP (24) -1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM sys.all_objects AS AO) AS A(rn)
        CROSS APPLY (SELECT TOP (60) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) FROM sys.all_objects AS AO) AS B(rn)
        -- 4 values since we'll aggregate to 15 seconds
        CROSS APPLY (SELECT TOP (4) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) * 15  FROM sys.all_objects AS AO) AS C(rn)
) D

That looks like a lot, but it really isn't. Starting from the innermost query: we select the top 24 rows from sys.all_objects and use the ROW_NUMBER function to generate a monotonically increasing set of values, thus 1...24. However, since the allowable range of hours is 0 to 23, I deduct one from this value (A). I repeat this pattern to generate minutes (B), except there we take the top 60. Since I want 15 second intervals, the seconds query only takes the top 4 values. I deduct one so we have {0,1,2,3} and then multiply by 15 to get my increments (C). If you want different time slices, that's where I would modify this pattern.

Finally, having three columns of numbers, I use TIMEFROMPARTS to build a time data type with the least amount of precision, present that as "Slice", and encapsulate it all in a derived table (D). Running that query gets me a list of periods, but I don't know what the end period is.

We can calculate the end period by using the LEAD function. I present my original Slice as SliceStart. I then use the LEAD function to calculate the next (1) value based on the Slice column. In the case of 23:59:45, the "next" value in our data set is NULL. To address that scenario, we pass in a default value for the LEAD function.
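
To change the slice width, only the innermost derived tables need adjusting. A sketch of the same pattern at five-minute granularity (dropping the seconds dimension entirely; the LEAD/ROW_NUMBER wrapper applies unchanged):

-- 24 hours x 12 five-minute buckets = 288 slices
SELECT 
    TIMEFROMPARTS(A.rn, B.rn, 0, 0, 0) AS Slice
FROM
    (SELECT TOP (24) -1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM sys.all_objects AS AO) AS A(rn)
    CROSS APPLY (SELECT TOP (12) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) * 5 FROM sys.all_objects AS AO) AS B(rn);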

Friday, June 30, 2017

Rename default constraints

This week I'm dealing with synchronizing tables between environments, and it seems that regardless of what tool I use for schema compare, it still gets hung up on the differences in names for default constraints. Rather than fight that battle, I figured it'd greatly simplify my life to systematically rename all my constraints to non-default names. The naming convention I went with is DF__schema name_table name_column name. I know that my schemas/tables/columns don't have spaces or "weird" characters in them, so this works for me. Use this at your own risk, and if you are on pre-2012, the CONCAT calls will need to be adjusted to classic string concatenation, a.k.a. +
DECLARE @query nvarchar(4000);
DECLARE
    CSR CURSOR
FAST_FORWARD
FOR
SELECT
    CONCAT('ALTER TABLE ', QUOTENAME(S.name), '.', QUOTENAME(T.name), ' DROP CONSTRAINT [', DC.name, '];', CHAR(10)
    , 'ALTER TABLE ', QUOTENAME(S.name), '.', QUOTENAME(T.name)
    , ' ADD CONSTRAINT [', 'DF__', (S.name), '_', (T.name), '_', C.name, ']'
    , ' DEFAULT ', DC.definition, ' FOR ', QUOTENAME(C.name)) AS Query
FROM
    sys.schemas AS S
    INNER JOIN
        sys.tables AS T
        ON T.schema_id = S.schema_id
    INNER JOIN
        sys.columns AS C
        ON C.object_id = T.object_id
    INNER JOIN
        sys.default_constraints AS DC
        ON DC.parent_object_id = T.object_id
        AND DC.object_id = C.default_object_id
WHERE
    DC.name LIKE 'DF__%'
    AND DC.name <> CONCAT('DF__', (S.name), '_', (T.name), '_', C.name);

OPEN CSR
FETCH NEXT FROM CSR INTO @query;
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        EXECUTE sys.sp_executesql @query, N'';
    END TRY
    BEGIN CATCH
        PRINT ERROR_MESSAGE()
        PRINT @query;
    END CATCH
    FETCH NEXT FROM CSR INTO @query;
END
CLOSE CSR;
DEALLOCATE CSR;
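
If the sweep worked, re-running the detection logic should come back empty. A quick sanity check (a sketch reusing the same joins as the cursor's SELECT):

-- Zero rows expected: every default constraint now matches the convention
SELECT
    CONCAT(QUOTENAME(S.name), '.', QUOTENAME(T.name)) AS TableName
,   DC.name AS ConstraintName
FROM
    sys.schemas AS S
    INNER JOIN
        sys.tables AS T
        ON T.schema_id = S.schema_id
    INNER JOIN
        sys.columns AS C
        ON C.object_id = T.object_id
    INNER JOIN
        sys.default_constraints AS DC
        ON DC.parent_object_id = T.object_id
        AND DC.object_id = C.default_object_id
WHERE
    DC.name <> CONCAT('DF__', (S.name), '_', (T.name), '_', C.name);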

Wednesday, March 22, 2017

Variable scoping in TSQL isn't a thing

It's a pop quiz kind of day: run the code through your mental parser.

BEGIN TRY
    DECLARE @foo varchar(30) = 'Created in try block';
    DECLARE @i int = 1 / 0;
END TRY
BEGIN CATCH
    PRINT @foo;
    SET @foo = 'Catch found';
END CATCH;

PRINT @foo;

1. It won't compile since @foo goes out of scope for both the catch and the final line
2. It won't compile since @foo goes out of scope for the final line
3. It prints "Created in try block" and then "Catch found"
4. I am too fixated on your form not having a submit button

Crazy enough, the last two are correct. It seems that, unlike every other language I've worked with, T-SQL variables are scoped to the entire batch, not the block in which they're declared, regardless of where in the script the declaration sits. Demo the first
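
Another way to see the batch-level scoping, independent of TRY/CATCH (a sketch): a variable declared in a branch that never executes still exists for the rest of the batch.

IF 1 = 0
BEGIN
    -- This branch never runs, yet the DECLARE still registers @bar for the whole batch
    DECLARE @bar varchar(30) = 'never assigned';
END

-- Prints the fallback text since the assignment never executed
PRINT COALESCE(@bar, '@bar exists but is NULL');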

Wanna see something even more crazy? Check this version out

BEGIN TRY
    DECLARE @i int = 1 / 0;
    DECLARE @foo varchar(30) = 'Created in try block';
END TRY
BEGIN CATCH
    PRINT @foo;
    SET @foo = 'Catch found';
END CATCH;

PRINT @foo;

As above, the scoping of variables remains the same, but the forced divide-by-zero error occurs before the declaration and initialization of our variable @foo. The result? @foo remains uninitialized, as evidenced by the first PRINT in the CATCH block printing nothing. The variable still exists because the declaration was parsed, but the value assignment never executed. Second demo

What's all this mean? SQL's weird.