SQL Server Unable to return query Thai data

Posted by knpiaxh1 5 months ago in Other

I have a table with columns that contain both Thai and English text data in NVARCHAR(255). In SSMS I can query the table and return all the rows easily enough. But if I then query specifically for one of the Thai values, it returns no rows.

SELECT TOP 1000
    [Province]
    , [District]
    , [SubDistrict]
    , [Branch ]
FROM [THDocuworldRego].[dbo].[allDistricsBranches]

Returns

Province    District    SubDistrict Branch 
อุตรดิตถ์   ลับแล   ศรีพนมมาศ   Northern
Bangkok  Khlong Toei    Khlong Tan  SSS1

But this query:

SELECT
    [Province]
    , [District]
    , [SubDistrict]
    , [Branch ]
FROM [THDocuworldRego].[dbo].[allDistricsBranches]
WHERE [Province] LIKE 'อุตรดิตถ์'

Returns no rows. What do I need to do to get the expected results? The collation is Latin1_General_CI_AS. The data inserts and displays without errors; I just can't search it.

Answer 1 (6psbrbz9)

Two problems:

  • The string being passed into the LIKE clause is VARCHAR because it is not prefixed with a capital "N". For example:
SELECT 'อุตรดิตถ์' AS [VARCHAR], N'อุตรดิตถ์' AS [NVARCHAR]
-- ?????????        อุตรดิตถ์

What is happening here is that when SQL Server parses the query batch, it needs to determine the exact type and value of every literal / constant. So it figures out that 12 is an INT, 12.0 is a NUMERIC, etc. It knows that N'ดิ' is NVARCHAR, which is Unicode and can represent any character, so it takes the value as is. BUT, as noted above, 'ดิ' is VARCHAR, which is an 8-bit encoding, meaning its character set is controlled by a Code Page. For string literals and variables / parameters, the Code Page used for VARCHAR data is the one belonging to the Database's default Collation. Any character in the string that is not available on that Code Page is converted to a "best fit" mapping if one exists; otherwise it becomes the default replacement character, "?".
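
The literal-to-code-page conversion can be mimicked outside SQL Server. Below is a minimal Python sketch, assuming the Database's default collation is Latin1_General_CI_AS (Windows code page 1252) and that no best-fit mapping exists for these Thai characters. The helper name `as_varchar_literal` is hypothetical and only models the encoding step, not SQL Server itself.

```python
def as_varchar_literal(text: str, code_page: str = "cp1252") -> str:
    """Approximate what happens to an un-prefixed literal: every character
    the code page cannot represent is replaced with '?' (no best-fit step)."""
    return text.encode(code_page, errors="replace").decode(code_page)


thai = "อุตรดิตถ์"
print(len(thai))                 # 9 code points, combining marks included
print(as_varchar_literal(thai))  # ?????????
print(as_varchar_literal("Bangkok"))  # unchanged: plain ASCII fits every code page
```

Nine question marks come out because each Thai vowel and tone mark is a separate code point, which matches the nine "?" in the SELECT output above. The WHERE clause in the question is therefore searching for that row of question marks, not for the Thai text.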

Technically speaking, since the Database's default Collation controls string literals (and variables), and since there is a Code Page for "Thai" (available in Windows Collations), it would be possible to have a VARCHAR string containing Thai characters (meaning that 'ดิ', without the "N" prefix, would work). But that would require changing the Database's default Collation, and that is A LOT more work than simply prefixing the string literal with "N".
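
To see why a Thai code page would let the un-prefixed literal survive, the same kind of round trip can be pointed at Windows code page 874, the code page used by the Thai_* Windows collations (for example Thai_CI_AS). This is again only a sketch of the 8-bit encoding step; it does not model collation or comparison rules, and `as_varchar_literal` is a hypothetical helper.

```python
def as_varchar_literal(text: str, code_page: str) -> str:
    # Round-trip through an 8-bit code page; unmappable characters become '?'.
    return text.encode(code_page, errors="replace").decode(code_page)


thai = "อุตรดิตถ์"
for cp in ("cp1252", "cp874"):  # Latin1_General code page vs. Thai code page
    print(cp, as_varchar_literal(thai, cp))

# Code page 874 keeps all nine characters, using one byte each.
print(len(thai.encode("cp874")))  # 9
```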

For an in-depth look at this behavior, please see my two-part series:

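  • The LIKE pattern contains no wildcards, so the predicate is effectively an exact-match comparison against the whole column value. Surround the text with % to match it anywhere within the value.

The end result will look like: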

WHERE [Province] LIKE N'%อุตรดิตถ์%'

EDIT:
I just edited the question to format the "results" to be more readable. It now appears that the following might also work (since no wildcards are being used in the LIKE predicate in the question):

WHERE [Province] = N'อุตรดิตถ์'
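
Why the original un-prefixed LIKE finds nothing, and why LIKE without wildcards behaves like =, can be illustrated with a deliberately simplified model of LIKE. Assumptions: only the % and _ wildcards are handled (no [ ] character ranges and no ESCAPE clause), matching is by exact code point rather than by collation rules, and `sql_like` is a hypothetical helper, not a SQL Server API.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Simplified LIKE: '%' matches any run of characters, '_' exactly one.
    Every other character (including '?') must match literally."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


stored = "อุตรดิตถ์"                    # value held in the NVARCHAR column
print(sql_like(stored, "?????????"))    # False: what the un-prefixed literal became
print(sql_like(stored, "อุตรดิตถ์"))      # True: no wildcards, so effectively '='
print(sql_like(stored, "%ตรดิ%"))        # True: substring match via wildcards
```

Note that "?" is not a wildcard in T-SQL LIKE (that role belongs to "_"), so the converted pattern can only match values made of literal question marks.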

EDIT 2:
A string (i.e. something inside single quotes) is VARCHAR if there is no "N" prefixed to the string literal. It doesn't matter what the destination datatype is (e.g. an NVARCHAR(255) column). The issue here is the datatype of the source data, and that source is a string literal. Unlike a string in .NET, SQL Server treats 'string' as an 8-bit encoding (VARCHAR: ASCII values 0 - 127 are the same across all Code Pages, Extended ASCII values 128 - 255 are determined by the Code Page, and Double-Byte Character Sets can use 2-byte sequences) and N'string' as UTF-16 Little Endian (NVARCHAR: the Unicode character set, with one 2-byte sequence for BMP characters 0 - 65535 and two 2-byte sequences for Code Points above 65535). Using 'string' is the same as passing in a VARCHAR variable. For example:

DECLARE @ASCII VARCHAR(20);
SET @ASCII = N'อุตรดิตถ์';
SELECT @ASCII AS [ImplicitlyConverted]
-- ?????????
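
The byte-level difference described above can be checked with Python's codecs: UTF-16LE (the NVARCHAR encoding) uses 2 bytes for a BMP character and a 4-byte surrogate pair above U+FFFF, while an 8-bit code page uses 1 byte for each character it can represent and "?" for the rest. This is an illustration only; U+1F600 is just an arbitrary character above U+FFFF, and code page 1252 is assumed for the VARCHAR side.

```python
samples = {
    "A": "ASCII letter",
    "อ": "Thai letter (BMP)",
    "\U0001F600": "emoji (above U+FFFF)",
}

for ch, label in samples.items():
    utf16 = ch.encode("utf-16-le")  # NVARCHAR storage format
    print(f"{label:22} U+{ord(ch):04X}  UTF-16LE: {utf16.hex(' ').upper()}")

# VARCHAR under code page 1252: one byte per character, '?' for the Thai letter.
print("cp1252:", "Aอ".encode("cp1252", errors="replace"))  # b'A?'
```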

Answer 2 (qvk1mo1f)

Could be a number of things!

First, print out the value of the column and your query string in hex.

SELECT
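    convert(varbinary(20), Province) as stored,            -- NVARCHAR column: UTF-16LE bytes
    convert(varbinary(20), 'อุตรดิตถ์') as query_varchar,   -- no N prefix: converted via the code page
    convert(varbinary(20), N'อุตรดิตถ์') as query_nvarchar  -- N prefix: UTF-16LE bytes
from allDistricsBranches;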

This should give you some insight into the problem. I think the most likely cause is that combining characters such as ั and ิ were typed in the wrong sequence. They are displayed as part of the main letter but are stored internally as separate characters.
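
The hex comparison can be reproduced in Python to see what it would reveal. `convert(varbinary, ...)` on an NVARCHAR value exposes its UTF-16LE bytes, which `str.encode("utf-16-le")` also produces. The helper `varbinary_hex` is hypothetical, and the sample syllable is an assumption chosen for illustration (it is not from the question): one base letter with a vowel mark and a tone mark entered in two different orders, which may look alike on screen but are different code-point sequences.

```python
def varbinary_hex(text: str) -> str:
    """Mimic SQL Server's 0x... display of an NVARCHAR value (UTF-16LE bytes)."""
    return "0x" + text.encode("utf-16-le").hex().upper()


# Hypothetical example: THO THAHAN + SARA II + MAI EK, typed in two orders.
typed_a = "\u0e17\u0e35\u0e48"  # base letter, vowel mark, tone mark
typed_b = "\u0e17\u0e48\u0e35"  # base letter, tone mark, vowel mark
print(varbinary_hex(typed_a))   # 0x170E350E480E
print(varbinary_hex(typed_b))   # 0x170E480E350E
print(typed_a == typed_b)       # False: different code-point sequences

# The province name from the question: 9 code points -> 18 bytes,
# so it still fits in varbinary(20) without truncation.
print(len("อุตรดิตถ์".encode("utf-16-le")))  # 18
```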
