Parse a string and insert the results into 3 fields
0 votes
/ November 6, 2019

I have a field in a SQL table that looks like this:

dbfs:/mnt/rawvtogetdata/2019/06/30/placemt/CZZZ0630.M.00308286.txt

It is literally a file name. I need to create 3 new fields and split the string above into them:

Mth     Dy    Fl
06      30    CZZZ0630.M.00308286.txt

How can I do this?

I have this basic SQL that does the select and the parsing, but I'm not sure how to insert the results into the new fields.

Select *,
   SUBSTRING(filename, 30, 2) AS Mth,
   SUBSTRING(filename, 33, 2) AS Dy,
   SUBSTRING(filename, 44, 99) AS Fl
from REUTERS_CPDG_2019

I'm on SQL Server 2019.

Answers [ 2 ]

1 vote
/ November 7, 2019

Assuming those are the names of the new columns, I would simply say:

UPDATE
    REUTERS_CPDG_2019
SET
    Mth = SUBSTRING(filename, 30, 2),
    Dy = SUBSTRING(filename, 33, 2),
    Fl = SUBSTRING(filename, 44, 99);
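
The UPDATE above assumes that Mth, Dy and Fl already exist on the table. If they do not, they could be added first. A minimal sketch, where the data types (char(2) and varchar(99)) are my assumption and not something stated in the question:

```sql
-- Assumed column types; adjust them to match the real data.
ALTER TABLE REUTERS_CPDG_2019
  ADD Mth char(2)     NULL,
      Dy  char(2)     NULL,
      Fl  varchar(99) NULL;
-- Start a new batch so that later statements compile against the altered table.
GO
```

Run the UPDATE in a separate batch after this; in the same batch it would fail to compile because the new column names are not known yet.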
0 votes
/ November 7, 2019

This is easy with NGrams8k (see the DDL at the end of this post). Using NGrams8k, I built a function called SubstringBetweenChar8K. Here are a few examples of how to use it:

 DECLARE @string varchar(100) = 'abc.defg.hi.jk.lmnop.qrs.tuv';
-- beginning of string to 2nd delimiter, 2nd delimiter to end of the string
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,0,2, '.');
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,2,0, '.');

-- Between the 1st & 2nd, then 2nd & 5th delimiters
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,1,2, '.');
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,2,5, '.');

-- dealing with NULLS, delimiters that don't exist and when @first = @last
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,2,10,'.');
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,1,NULL,'.');
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,NULL,1,'.');

Results:

string                            item                  itemIndex
--------------------------------- --------------------- --------------------
abc.defg.hi.jk.lmnop.qrs.tuv      abc.defg              1

string                            item                  itemIndex
--------------------------------- --------------------- --------------------
abc.defg.hi.jk.lmnop.qrs.tuv      hi.jk.lmnop.qrs.tuv   10

string                            item                  itemIndex
--------------------------------- --------------------- --------------------
abc.defg.hi.jk.lmnop.qrs.tuv      defg                  5

string                            item                  itemIndex
--------------------------------- --------------------- --------------------
abc.defg.hi.jk.lmnop.qrs.tuv      hi.jk.lmnop           10

string                            item                  itemIndex
--------------------------------- --------------------- --------------------
abc.defg.hi.jk.lmnop.qrs.tuv      NULL                  NULL

string                            item                  itemIndex
--------------------------------- --------------------- --------------------
abc.defg.hi.jk.lmnop.qrs.tuv      NULL                  NULL

string                            item                  itemIndex
--------------------------------- --------------------- --------------------
abc.defg.hi.jk.lmnop.qrs.tuv      NULL                  NULL
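
To see where the itemIndex values come from, it can help to list the position of every delimiter. This is only a sketch of the idea the function builds on, not the function itself; it assumes dbo.NGrams8k from the DDL at the end of this post has already been created:

```sql
DECLARE @string varchar(100) = 'abc.defg.hi.jk.lmnop.qrs.tuv';

-- Split the string into unigrams and keep only the delimiter characters.
SELECT delimiterNo = ROW_NUMBER() OVER (ORDER BY ng.position),
       ng.position
FROM dbo.NGrams8k(@string, 1) AS ng
WHERE ng.token = '.';
-- The delimiters sit at positions 4, 9, 12, 15, 21 and 25.
```

The text "between the 1st and 2nd delimiter" therefore starts at position 4+1 = 5, which matches the itemIndex returned for defg above.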

For what you are doing:

DECLARE @string VARCHAR(8000) = 'dbfs:/mnt/rawvtogetdata/2019/06/30/placemt/CZZZ0630.M.00308286.txt';

SELECT      
  Mth = m.Item, 
  Dy  = d.Item, 
  Fl  = fl.item
FROM        dbo.substringBetweenChar8k(@string,4,5,'/') AS m
CROSS APPLY dbo.substringBetweenChar8k(@string,5,6,'/') AS d
CROSS APPLY dbo.substringBetweenChar8k(@string,7,0,'/') AS fl;

Results:

Mth    Dy   Fl
------ ---- ---------------------------------
06     30   CZZZ0630.M.00308286.txt
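
The same calls could be pointed at the table instead of a variable. A minimal sketch, assuming the Mth, Dy and Fl columns already exist on REUTERS_CPDG_2019 and that filename fits in varchar(8000):

```sql
-- Parse each row's filename by delimiter position and store the three parts.
UPDATE t
SET    Mth = m.item,
       Dy  = d.item,
       Fl  = fl.item
FROM   REUTERS_CPDG_2019 AS t
CROSS APPLY dbo.SubstringBetweenChar8K(t.filename, 4, 5, '/') AS m
CROSS APPLY dbo.SubstringBetweenChar8K(t.filename, 5, 6, '/') AS d
CROSS APPLY dbo.SubstringBetweenChar8K(t.filename, 7, 0, '/') AS fl;
```

Because this counts delimiters rather than characters, it keeps working if the directory names change length, as long as the number of '/' characters in front of the month stays the same.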

Function DDL:

CREATE FUNCTION dbo.NGrams8k
(
  @string varchar(8000), -- Input string
  @N      int            -- requested token size
)
/****************************************************************************************
Purpose:
 A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens
 based on an input string (@string). Accepts strings up to 8000 varchar characters long.
 For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram.
Compatibility:
 SQL Server 2008+, Azure SQL Database
Syntax:
--===== Autonomous
 SELECT position, token FROM dbo.NGrams8k(@string,@N);
--===== Against a table using APPLY
 SELECT s.SomeID, ng.position, ng.token
 FROM dbo.SomeTable s
 CROSS APPLY dbo.NGrams8K(s.SomeValue,@N) ng;
Parameters:
 @string  = The input string to split into tokens.
 @N       = The size of each token returned.
Returns:
 Position = bigint; the position of the token in the input string
 token    = varchar(8000); a @N-sized character-level N-Gram token
Developer Notes: 
 1. NGrams8k is not case sensitive
 2. Many functions that use NGrams8k will see a huge performance gain when the optimizer
    creates a parallel execution plan. One way to get a parallel query plan (if the
    optimizer does not choose one) is to use make_parallel by Adam Machanic which can be
    found here:
 sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx
 3. When @N is less than 1 or greater than the datalength of the input string then no
    tokens (rows) are returned. If either @string or @N is NULL no rows are returned.
    This is a debatable topic but the thinking behind this decision is that: because you
    can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you
    can't turn anything into NULL-grams, no rows should be returned.
    For people who would prefer that a NULL input forces the function to return a single
    NULL output you could add this code to the end of the function:
    UNION ALL
    SELECT 1, NULL
    WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL)
 4. NGrams8k can also be used as a Tally Table with the position column being your "N"
    row. To do so use REPLICATE to create an imaginary string, then use NGrams8k to split
    it into unigrams then only return the position column. NGrams8k will get you up to
    8000 numbers. There will be no performance penalty for sorting by position in
    ascending order but there is for sorting in descending order. To get the numbers in
    descending order without forcing a sort in the query plan use the following formula:
    N = <highest number>-position+1.
 Pseudo Tally Table Examples:
    --===== (1) Get the numbers 1 to 100 in ascending order:
    SELECT N = position
    FROM dbo.NGrams8k(REPLICATE(0,100),1);
    --===== (2) Get the numbers 1 to 100 in descending order:
    DECLARE @maxN int = 100;
    SELECT N = @maxN-position+1
    FROM dbo.NGrams8k(REPLICATE(0,@maxN),1)
    ORDER BY position;
 5. NGrams8k is deterministic. For more about deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx
Usage Examples:
--===== Turn the string, 'abcd' into unigrams, bigrams and trigrams
 SELECT position, token FROM dbo.NGrams8k('abcd',1); -- unigrams (@N=1)
 SELECT position, token FROM dbo.NGrams8k('abcd',2); -- bigrams  (@N=2)
 SELECT position, token FROM dbo.NGrams8k('abcd',3); -- trigrams (@N=3)
--===== How many times the substring "AB" appears in each record
 DECLARE @table TABLE(stringID int identity primary key, string varchar(100));
 INSERT @table(string) VALUES ('AB123AB'),('123ABABAB'),('!AB!AB!'),('AB-AB-AB-AB-AB');
 SELECT string, occurrences = COUNT(*)
 FROM @table t
 CROSS APPLY dbo.NGrams8k(t.string,2) ng
 WHERE ng.token = 'AB'
 GROUP BY string;
----------------------------------------------------------------------------------------
Revision History:
 Rev 00 - 20140310 - Initial Development - Alan Burstein
 Rev 01 - 20150522 - Removed DQS N-Grams functionality, improved iTally logic. Also Added
                     conversion to bigint in the TOP logic to remove implicit conversion
                     to bigint - Alan Burstein
 Rev 03 - 20150909 - Added logic to only return values if @N is greater than 0 and less
                     than the length of @string. Updated comment section. - Alan Burstein
 Rev 04 - 20151029 - Added ISNULL logic to the TOP clause for the @string and @N
                     parameters to prevent a NULL string or NULL @N from causing "an
                     improper value" being passed to the TOP clause. - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH
L1(N) AS
(
  SELECT 1
  FROM (VALUES    -- 90 NULL values used to create the CTE Tally Table
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
        (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
       ) t(N)
),
iTally(N) AS                                   -- my cte Tally Table
(
  SELECT TOP(ABS(CONVERT(BIGINT,(DATALENGTH(ISNULL(@string,''))-(ISNULL(@N,1)-1)),0)))
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -- Order by a constant to avoid a sort
  FROM L1 a CROSS JOIN L1 b                    -- cartesian product for 8100 rows (90^2)
)
SELECT
  position = N,                                   -- position of the token in the string(s)
  token    = SUBSTRING(@string,CAST(N AS int),@N) -- the @N-Sized token
FROM iTally
WHERE @N > 0 AND @N <= DATALENGTH(@string);       -- Protection against bad parameter values
GO

CREATE FUNCTION dbo.SubstringBetweenChar8K
(
  @string    varchar(8000),
  @first     int,
  @last      int,
  @delimiter varchar(100)
)
/*****************************************************************************************
Purpose:
 Takes an input string (@string) and returns the text between two instances of a delimiter
 (@delimiter); the location of the delimiters is defined by @first and @last.
 For example: if @string = 'xx.yy.zz.abc', @first=1, @last=3, and @delimiter = '.' the
 function will return the text: yy.zz; this is the text between the first and third
 instance of "." in the string "xx.yy.zz.abc".

Compatibility:
 SQL Server 2008+

Syntax:
--===== Autonomous use
 SELECT sb.item, sb.itemIndex
 FROM dbo.SubstringBetweenChar8K(@string, @first, @last, @delimiter) sb;

--===== Use against a table
 SELECT sb.item, sb.itemIndex
 FROM SomeTable st
 CROSS APPLY dbo.SubstringBetweenChar8K(st.SomeColumn1, 1, 2, '.') sb;

Parameters:
 @string    = varchar(8000); Input string to parse
 @first     = int; the instance of @delimiter to search for; this is where the output
              should start. When @first is 0 then the function will return everything from
              the beginning of @string until @last.
 @last      = int; the last instance of @delimiter to search for; this is where the output
              should end. When @last is 0 then the function will return everything from
              @first until the end of the string.
 @delimiter = varchar(100); The delimiter used to determine where the output starts/ends

Return Types:
 Inline Table Valued Function returns:
   item      = varchar(8000); the substring between the two instances of @delimiter
               defined by @first and @last
   itemIndex = bigint; the position in @string where the substring begins
------------------------------------------------------------------------------------------
Developer Notes:
 1. Requires NGrams8K. The code for NGrams8K can be found here:
    http://www.sqlservercentral.com/articles/Tally+Table/142316/

 2. This function is what is referred to as an "inline" scalar UDF. Technically it's an
    inline table valued function (iTVF) but performs the same task as a scalar valued user
    defined function (UDF); the difference is that it requires the APPLY table operator
    to accept column values as a parameter. For more about "inline" scalar UDFs see this
    article by SQL MVP Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/91724/
    and for more about how to use APPLY see this article by SQL MVP Paul White:
    http://www.sqlservercentral.com/articles/APPLY/69953/.

    Note the above syntax example and usage examples below to better understand how to
    use the function. Although the function is slightly more complicated to use than a
    scalar UDF it will yield notably better performance for many reasons. For example,
    unlike scalar UDFs or multi-statement table valued functions, the inline scalar UDF does
    not restrict the query optimizer's ability to generate a parallel query execution plan.

 3. dbo.SubstringBetweenChar8K generally performs better with a parallel execution plan 
    but the optimizer is sometimes stingy about assigning one. Consider performance 
    testing using Traceflag 8649 in Development environments and Adam Machanic's 
    make_parallel in production environments. 

 4. dbo.SubstringBetweenChar8K returns NULL when supplied with a NULL input string and/or
    NULL delimiter.

 5. dbo.SubstringBetweenChar8K is deterministic; for more about deterministic and
    nondeterministic functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx

Examples:
 DECLARE @string varchar(100) = 'abc.defg.hi.jk.lmnop.qrs.tuv';
-- beginning of string to 2nd delimiter, 2nd delimiter to end of the string
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,0,2, '.');
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,2,0, '.');

-- Between the 1st & 2nd, then 2nd & 5th delimiters
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,1,2, '.');
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,2,5, '.');

-- dealing with NULLS, delimiters that don't exist and when @first = @last
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,2,10,'.');
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,1,NULL,'.');
  SELECT string=@string, item, itemIndex FROM dbo.SubstringBetweenChar8K(@string,NULL,1,'.');
---------------------------------------------------------------------------------------
Revision History:
 Rev 00 - 20160720 - Initial Creation - Alan Burstein
 Rev 01 - 20180613 - Complete re-design, including multi-char delimiters
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT 
  item = 
    CASE WHEN @first >= 0 AND @last >=0 THEN
      CASE WHEN @first+@last=0 THEN @string
           WHEN @last=0        THEN SUBSTRING(@string, p.mn+LEN(@delimiter), 8000)
           WHEN @first<@last   THEN SUBSTRING(@string, p.mn+LEN(@delimiter), 
                                      NULLIF(p.mx,p.mn)-p.mn-LEN(@delimiter)) END END,
  itemIndex = 
    CASE WHEN @first >= 0 AND @last >=0 THEN
      CASE WHEN @first+@last=0 THEN 1
           WHEN @last=0        THEN (p.mn+LEN(@delimiter))
           WHEN @first<@last   THEN (p.mn+LEN(@delimiter))*SIGN(NULLIF(p.mx,p.mn)) END END
FROM
(
  SELECT MIN(d.de), MAX(d.de)
  FROM
  (
    SELECT CHECKSUM(0),0 WHERE @first = 0 UNION ALL
    SELECT CHECKSUM(ROW_NUMBER() OVER (ORDER BY ng.position)), ng.position
    FROM dbo.ngrams8k(@string, LEN(@delimiter)) ng
    WHERE ng.token = @delimiter
  ) d(ds,de)
  WHERE ds IN (@first,@last)
) p(mn,mx);
GO