BCP выдает "String data, right truncation @ #", даже если типы данных совпадают - PullRequest
1 голос
/ 02 октября 2019

Попытка скопировать данные файла TSV в базу данных MSSQL через bcp из пакета mssql-tools. Получение журналов ошибок bcp вида:

#@ Row 1, Column 20: String data, right truncation @#
"1234"  "ASDFG Z ZXCCVVB"   "Physician" "ASDFG ZXCVB"   ""  ""  ""  ""  ""  ""  "10"    ""  ""  ""  ""  ""  "1234"  "1234"  "Provider"  "Y" "1234"  "1234"  ""  ""  "Person"    "MD"    "ASDFG N ZXCVB" "Active"    "Internal"  <NULL>  ""  ""  ""  ""  ""  "xxxxxx"    "xxxxxx"    ""  "1234"  "M" <NULL>  ""  ""  ""  ""  "Y" "Y" "Y" "Y" <NULL>  ""  ""  <NULL>  <NULL>  ""  ""  <NULL>  <NULL>  ""  ""  <NULL>  ""  ""  ""  ""  ""  ""  ""  ""  <NULL>  <NULL>  100110  ""  ""  ""  ""  ""  "1" ""  ""  ""  ""  ""  ""  ""  4   "Y" <NULL>  <NULL>  <NULL>  ""  ""  <NULL>  ""  <NULL>  ""  ""  ""  "MD"    <NULL>  ""  ""  ""  <NULL>  <NULL>  <NULL>  <NULL>  ""  <NULL>  ""  ""  "1234"  ""  ""  <NULL>  ""  ""  <NULL>  "1" "Y" ""  "2" 1   1   "1" "1" 2019-10-01
"1234"  "HEALTH CENTER" ""  "WOMEN'S"   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "1234"  "Provider"  "Y" "1234"  ""  ""  ""  "Person"    ""  "HEALTH CENTER" "Active"    ""      ""  ""  ""  ""  ""  "xxxxxx"    "xxxxxx"    ""  ""  ""      ""  ""  ""  ""  ""  ""  ""  ""      ""  ""          ""  ""          ""  ""      ""  ""  ""  ""  ""  ""  ""  ""              ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""      ""              ""  ""      ""      ""  ""  ""  ""      ""  ""  ""                  ""      ""  ""  "1234"  ""  ""      ""  ""      ""  ""  ""  ""  1   1   ""  "1" 2019-10-01

С помощью команды bcp ...

RECOMMEDED_IMPORT_MODE="-c"
/opt/mssql-tools/bin/bcp "$TABLENAME" in "$EXPORT_STAGE/$TABLENAME/$TABLENAME.TSV" \
        -S $DEST_IP \
        -U $USER -P $PASSWORD \
        -d $DEST_DB \
        $RECOMMEDED_IMPORT_MODE \
        -t "\t" \
        -e $EXPORT_STAGE/$TABLENAME/$TABLENAME.bcperror.log

Обратите внимание, что ошибка первой строки связана со столбцом 20, соответствующим «Y»характер. Глядя на схему БД назначения, я вижу, что тип varchar(1). Таблица DDl от

SELECT ORDINAL_POSITION, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
       , IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'mytable'
ORDINAL_POSITION    COLUMN_NAME DATA_TYPE   CHARACTER_MAXIMUM_LENGTH    IS_NULLABLE
1   PROV_ID varchar 18  NO
2   PROV_NAME   varchar 254 YES
3   PROV_TYPE   varchar 254 YES
4   PROV_ABBR   varchar 254 YES
5   GL_PREFIX   varchar 15  YES
6   RPT_GRP_ONE varchar 80  YES
7   RPT_GRP_TWO varchar 80  YES
8   RPT_GRP_THREE   varchar 80  YES
9   RPT_GRP_FOUR    varchar 80  YES
10  RPT_GRP_FIVE    varchar 80  YES
11  RPT_GRP_SIX varchar 66  YES
12  RPT_GRP_SEVEN   varchar 66  YES
13  RPT_GRP_EIGHT   varchar 66  YES
14  RPT_GRP_NINE    varchar 66  YES
15  RPT_GRP_TEN varchar 66  YES
16  IS_RESIDENT varchar 3   YES
17  USER_ID varchar 18  YES
18  EPIC_PROV_ID    varchar 18  YES
19  REFERRAL_SRCE_TYPE  varchar 66  YES
20  IS_VERIFIED_YN  varchar 1   YES
21  SER_REF_SRCE_ID varchar 18  YES
22  UPIN    varchar 30  YES
23  SSN varchar 192 YES
24  EMP_STATUS  varchar 40  YES
25  STAFF_RESOURCE  varchar 20  YES
26  CLINICIAN_TITLE varchar 100 YES
27  EXTERNAL_NAME   varchar 80  YES
28  ACTIVE_STATUS   varchar 20  YES
29  REFERRAL_SOURCE_TYPE    varchar 66  YES
30  RECORD_TYPE numeric NULL    YES
31  BILL_PROV_YN    varchar 1   YES
32  BILL_UNDER_PROV_ID  varchar 18  YES
33  SUP_PROV_ID varchar 18  YES
34  COUNTY_C    varchar 66  YES
35  COUNTRY_C   varchar 66  YES
36  OFFICE_PHONE_NUM    varchar 50  YES
37  OFFICE_FAX_NUM  varchar 25  YES
38  EMAIL   varchar 128 YES
39  DEA_NUMBER  varchar 30  YES
40  SEX varchar 66  YES
41  BIRTH_DATE  date    NULL    YES
42  MEDICARE_PROV_ID    varchar 12  YES
43  MEDICAID_PROV_ID    varchar 12  YES
44  IS_PRIV_REVOKED varchar 1   YES
45  NURSE_EMP_ID    varchar 18  YES
46  EPICCARE_PROV_YN    varchar 1   YES
47  MEDS_AUTH_PROV_YN   varchar 1   YES
48  ORDS_AUTH_PROV_YN   varchar 1   YES
49  TRANS_INTF_USER_YN  varchar 1   YES
50  PEER_REV_LAST_DATE  date    NULL    YES
51  TAKING_NEW_PAT_YN   varchar 1   YES
52  TAKING_WALKINS_YN   varchar 1   YES
53  LAST_RECOMMENDED_DATE   date    NULL    YES
54  BASE_COST   numeric NULL    YES
55  SURG_REC_POOL_YN    varchar 1   YES
56  INSTRUMENT_TYPE_C   varchar 66  YES
57  EQUIP_SERVICE_DATE  date    NULL    YES
58  EQUIP_LASTSVC_DATE  date    NULL    YES
59  CLM_POS_REQD_YN varchar 1   YES
60  DEFAULT_POS_CLM_YN  varchar 1   YES
61  MODALITY_TYPE_C numeric NULL    YES
62  MODALITY_YN varchar 1   YES
63  SUPERV_POOL_ID  varchar 18  YES
64  SUPERV_POOL_NAME    varchar 254 YES
65  FLASH_CARD_PRT_ROU  varchar 255 YES
66  CTRL_SHEET_PRT_ROU  varchar 255 YES
67  PIN_ID  varchar 18  YES
68  PROV_ATTR_ID    varchar 18  YES
69  ATTND_PRIM_PAGER    varchar 50  YES
70  OO_OFFICE_FROM_DTE  date    NULL    YES
71  OO_OFFICE_TO_DTE    date    NULL    YES
72  DEF_DEPARTMENT_ID   numeric NULL    YES
73  CM_PHY_OWNER_ID varchar 25  YES
74  CM_LOG_OWNER_ID varchar 25  YES
75  RPT_GRP_ELEVEN_C    varchar 66  YES
76  RPT_GRP_TWELVE_C    varchar 66  YES
77  RPT_GRP_THIRTEEN_C  varchar 66  YES
78  RPT_GRP_FOURTEEN_C  varchar 66  YES
79  RPT_GRP_FIFTEEN_C   varchar 66  YES
80  RPT_GRP_SIXTEEN_C   varchar 66  YES
81  RPT_GRP_SEVNTEEN_C  varchar 66  YES
82  RPT_GRP_EIGHTEEN_C  varchar 66  YES
83  RPT_GRP_NINETEEN_C  varchar 66  YES
84  RPT_GRP_TWENTY_C    varchar 66  YES
85  HOSPITALIST_YN  varchar 254 YES
86  DEF_DIVISION_C  numeric NULL    YES
87  DEF_PROVIDER_YN varchar 254 YES
88  PROV_REC_STATE_C    numeric NULL    YES
89  X_IP_ORD_PROV_C varchar 1   YES
90  PROV_START_DATE date    NULL    YES
91  PRACTICE_NAME_C numeric NULL    YES
92  X_SUPERVISING_PROV  varchar 254 YES
93  X_SUPERVISION_REQD  varchar 50  YES
94  X_INPAT_DISCIPLINE  varchar 18  YES
95  X_ED_PROVIDER   varchar 1   YES
96  X_IS_EMPLOYED_CRNA  varchar 254 YES
97  X_REVENUE_DEPT_ID   numeric NULL    YES
98  SURG_SCHED_OUT_YN   varchar 1   YES
99  SURG_EQP_SVCDAT_YN  varchar 1   YES
100 SURG_COST_TBL_ID    numeric NULL    YES
101 TEAM_LEADER_ID  varchar 18  YES
102 TEAM_C  numeric NULL    YES
103 SUP_PROV_YN varchar 1   YES
104 EMPLOYED_CRNA_YN    varchar 1   YES
105 IS_INTERPRETER_YN   varchar 1   YES
106 DOCTORS_DEGREE  varchar 254 YES
107 REVENUE_DEPT_ID numeric NULL    YES
108 ENC_PROV_YN varchar 1   YES
109 OR_VLD_DT_OFST  numeric NULL    YES
110 OR_CHARGE_CODE_ID   numeric NULL    YES
111 RES_POOL_TYPE_C numeric NULL    YES
112 PHARMACIST_YN   varchar 1   YES
113 LAB_FAX_NUMBER  varchar 254 YES
114 PROV_PHOTO  varchar 120 YES
115 USE_DEPT_VT_LIM_YN  varchar 1   YES
116 VERIFYING_PERSON_ID varchar 18  YES
117 DIRECTORY_INFO  varchar 254 YES
118 DBC_EXT_POS_ID  numeric NULL    YES
119 EDI_CLM_ACTIVE_YN   varchar 1   YES
120 PROV_CLM_PROC_STA_C numeric NULL    YES
121 PAYEE_NUM_DEFAULT   varchar 254 YES
122 SER_CLM_ID  varchar 254 YES
123 MCD_PROF_CD_C   numeric NULL    YES
124 OP_ORD_PROV_YN  varchar 1   YES
125 IS_SUP_PROV_REQ_C   numeric NULL    YES
126 PROVIDER_TYPE_C varchar 66  YES
127 EPRESCRIBING_YN varchar 1   YES
128 EP_FLAG_YN  varchar 1   YES
129 SEX_C   varchar 66  YES
130 ACTIVE_STATUS_C numeric NULL    YES
131 REFERRAL_SOURCE_TYPE_C  varchar 66  YES
132 STAFF_RESOURCE_C    numeric NULL    YES
133 REFERRAL_SRCE_TYPE_C    varchar 66  YES
134 etl_date    datetime    NULL    YES

Таким образом, символ из TSV должен соответствовать. Используя cat -ne datafile.tsv можно увидеть, что строки заканчиваются на $, и предварительная обработка so, которую я делаю для файла:

# remove header
sed -i 1d "$EXPORT_STAGE/$TABLENAME/$TABLENAME.TSV"
# remove DOS carriage return
sed -i "s/\r//g" "$EXPORT_STAGE/$TABLENAME/$TABLENAME.TSV"
# append etl date to data records
sed -i "s/$/\t$(date +%Y-%m-%d)/" "$EXPORT_STAGE/$TABLENAME/$TABLENAME.TSV"

И заголовок файла необработанных данных выглядит примерно так (некоторые поля заменены случайными бессмысленными значениями) ...

"PROV_ID"   "PROV_NAME" "PROV_TYPE" "PROV_ABBR" "GL_PREFIX" "RPT_GRP_ONE"   "RPT_GRP_TWO"   "RPT_GRP_THREE" "RPT_GRP_FOUR"  "RPT_GRP_FIVE"  "RPT_GRP_SIX"   "RPT_GRP_SEVEN" "RPT_GRP_EIGHT" "RPT_GRP_NINE"  "RPT_GRP_TEN"   "IS_RESIDENT"   "USER_ID"   "EPIC_PROV_ID"  "REFERRAL_SRCE_TYPE"    "IS_VERIFIED_YN"    "SER_REF_SRCE_ID"   "UPIN"  "SSN"   "EMP_STATUS"    "STAFF_RESOURCE"    "CLINICIAN_TITLE"   "EXTERNAL_NAME" "ACTIVE_STATUS" "REFERRAL_SOURCE_TYPE"  "RECORD_TYPE"   "BILL_PROV_YN"  "BILL_UNDER_PROV_ID"    "SUP_PROV_ID"   "COUNTY_C"  "COUNTRY_C" "OFFICE_PHONE_NUM"  "OFFICE_FAX_NUM"    "EMAIL" "DEA_NUMBER"    "SEX"   "BIRTH_DATE"    "MEDICARE_PROV_ID"  "MEDICAID_PROV_ID"  "IS_PRIV_REVOKED"   "NURSE_EMP_ID"  "EPICCARE_PROV_YN"  "MEDS_AUTH_PROV_YN" "ORDS_AUTH_PROV_YN" "TRANS_INTF_USER_YN"    "PEER_REV_LAST_DATE"    "TAKING_NEW_PAT_YN" "TAKING_WALKINS_YN" "LAST_RECOMMENDED_DATE" "BASE_COST" "SURG_REC_POOL_YN"  "INSTRUMENT_TYPE_C" "EQUIP_SERVICE_DATE"    "EQUIP_LASTSVC_DATE"    "CLM_POS_REQD_YN"   "DEFAULT_POS_CLM_YN"    "MODALITY_TYPE_C"   "MODALITY_YN"   "SUPERV_POOL_ID"    "SUPERV_POOL_NAME"  "FLASH_CARD_PRT_ROU"    "CTRL_SHEET_PRT_ROU"    "PIN_ID"    "PROV_ATTR_ID"  "ATTND_PRIM_PAGER"  "OO_OFFICE_FROM_DTE"    "OO_OFFICE_TO_DTE"  "DEF_DEPARTMENT_ID" "CM_PHY_OWNER_ID"   "CM_LOG_OWNER_ID"   "RPT_GRP_ELEVEN_C"  "RPT_GRP_TWELVE_C"  "RPT_GRP_THIRTEEN_C"    "RPT_GRP_FOURTEEN_C"    "RPT_GRP_FIFTEEN_C" "RPT_GRP_SIXTEEN_C" "RPT_GRP_SEVNTEEN_C"    "RPT_GRP_EIGHTEEN_C"    "RPT_GRP_NINETEEN_C"    "RPT_GRP_TWENTY_C"  "HOSPITALIST_YN"    "DEF_DIVISION_C"    "DEF_PROVIDER_YN"   "PROV_REC_STATE_C"  "PROV_START_DATE"   "PRACTICE_NAME_C"   "SURG_SCHED_OUT_YN" "SURG_EQP_SVCDAT_YN"    "SURG_COST_TBL_ID"  "TEAM_LEADER_ID"    "TEAM_C"    "SUP_PROV_YN"   "EMPLOYED_CRNA_YN"  "IS_INTERPRETER_YN" "DOCTORS_DEGREE"    "REVENUE_DEPT_ID"   "ENC_PROV_YN"   "PHARMACIST_YN" "LAB_FAX_NUMBER"    "RES_POOL_TYPE_C"   "OR_VLD_DT_OFST"    "OR_CHARGE_CODE_ID" "DBC_EXT_POS_ID"    "OP_ORD_PROV_YN"    "IS_SUP_PROV_REQ_C" "PROV_PHOTO"    "USE_DEPT_VT_LIM_YN"    "VERIFYING_PERSON_ID"   "DIRECTORY_INFO"    "EDI_CLM_ACTIVE_YN" "PROV_CLM_PROC_STA_C"   "PAYEE_NUM_DEFAULT" "SER_CLM_ID"    "MCD_PROF_CD_C" "PROVIDER_TYPE_C"   "EPRESCRIBING_YN"   "EP_FLAG_YN"    "SEX_C" "STAFF_RESOURCE_C"  "ACTIVE_STATUS_C"   "REFERRAL_SOURCE_TYPE_C"    "REFERRAL_SRCE_TYPE_C"
"1234"  "ASDF Z XZXCV"  "Physician" "ASD ZXCV"  ""  ""  ""  ""  ""  ""  "10"    ""  ""  ""  ""  ""  "1234"  "1234"  "Provider"  "Y" "1234"  "1234"  ""  ""  "Person"    "MD"    "ASDFG Z ZXV"   "Active"    "Internal"      ""  ""  ""  ""  ""  "xxxxxx"    "xxxxxx"    ""  "1234"  "M"     ""  ""  ""  ""  "Y" "Y" "Y" "Y"     ""  ""          ""  ""          ""  ""      ""  ""  ""  ""  ""  ""  ""  ""          1234    ""  ""  ""  ""  ""  "1" ""  ""  ""  ""  ""  ""  ""  4   "Y"             ""  ""      ""      ""  ""  ""  "MD"        ""  ""  ""                  ""      ""  ""  "1234"  ""  ""      ""  ""      "1" "Y" ""  "2" 1   1   "1" "1"
"1234"  "HEALTH CENTER" ""  "WOMEN'S"   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "1234"  "Provider"  "Y" "1234"  ""  ""  ""  "Person"    ""  "HEALTH CENTER" "Active"    ""      ""  ""  ""  ""  ""  xxxxxx" "xxxxxx"    ""  ""  ""      ""  ""  ""  ""  ""  ""  ""  ""      ""  ""          ""  ""          ""  ""      ""  ""  ""  ""  ""  ""  ""  ""              ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""      ""              ""  ""      ""      ""  ""  ""  ""      ""  ""  ""                  ""      ""  ""  "1234"  ""  ""      ""  ""      ""  ""  ""  ""  1   1   ""  "1"

и вот первая строка со скрытыми символами, обнаруженными в vi (через :set list):

1234^IASFD^I ASDFG Z^IASDFG^IZXCV^I ASDFG^I^I^I^I^I^I^I10^I^I^I^I^I^I1234^I1234^IProvider^IY^I1234^I1234^I^I^IPerson^IMD^IADFG Z ZXCV^IActive^IInternal^I^I^I^I^I^I^Ixxxxxx^Ixxxxxx^I^I1234^IM^I^I^I^I^I^IY^IY^IY^IY^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I^I1234^I^I^I^I^I^I1^I^I^I^I^I^I^I^I4^IY^I^I^I^I^I^I^I^I^I^I^I^IMD^I^I^I^I^I^I^I^I^I^I^I^I^I1234^I^I^I^I^I^I^I1^IY^I^I2^I1^I1^I1^I1^I2019-10-01$

Может кто-нибудь помочь определить, что здесь может происходить

1 Ответ

1 голос
/ 02 октября 2019

Как сказал Джейми и Алекс, BCP интерпретировал двойные кавычки как часть текстового значения, поэтому поля, которые должны были быть одиночными, например. Y считывались как строки шириной 3 "Y" и, таким образом, переполняли поле в целевой БД (что в данном случае ожидало varchar(1)).

* Расширенный разговор в комментариях был полезен для различных других проблем в данных, и я думаю, что все еще полезен для других отладочных проблем BCP, но эта проблема двойных кавычек была главной проблемой для этого вопроса.

...