Parsing a JSON response from a text summarization API: encoding error in the response
0 votes
/ August 7, 2020

I use the https://www.meaningcloud.com/products/automatic-summarization service to summarize text. I am using .NET Core 5.

For example, I want to shorten this news article: https://e.vnexpress.net/news/business/economy/vn-index-rises-for-third-straight-session-4141865.html

using System.Text.Json;      // JsonEncodedText
using Newtonsoft.Json.Linq;  // JObject
using RestSharp;             // RestClient, RestRequest, IRestResponse (RestSharp 106.x API)

string input = "..."; // long content of news post.
var client = new RestClient("https://api.meaningcloud.com/summarization-1.0");
client.Timeout = -1;
var request = new RestRequest(Method.POST);
request.AddParameter("key", "25870359b682ec3c93f9becd850eb459");  // fake token because this content is public.
request.AddParameter("sentences", 4);
request.AddParameter("txt", JsonEncodedText.Encode(input));

IRestResponse response = client.Execute(request);
System.Threading.Thread.Sleep(3000);
var res = JObject.Parse(response.Content);
// Need to convert \r\n and \r\n\r\n to spaces.
string short_content = res["summary"].ToString();
// SysUtil.StringEncodingConvert(short_content, "ISO-8859-1", "UTF-8");
string result = short_content.Replace(" [...] ", " ");
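
The comment in the snippet above asks for the \r\n and \r\n\r\n breaks in the summary to become spaces, alongside the existing " [...] " replacement. A minimal sketch of that cleanup, assuming those two separators are the only ones that need flattening. SummaryText.Normalize is a hypothetical helper, not part of RestSharp or the MeaningCloud API:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of RestSharp or the MeaningCloud API) that
// flattens a multi-line summary into one line of text.
public static class SummaryText
{
    // Matches either a "[...]" marker or a run of CR/LF characters,
    // together with any whitespace on both sides of it.
    private static readonly Regex Separator =
        new Regex(@"\s*(?:\[\.\.\.\]|[\r\n]+)\s*", RegexOptions.Compiled);

    public static string Normalize(string summary)
    {
        // Each separator becomes a single space; Trim drops any leftover at the ends.
        return Separator.Replace(summary, " ").Trim();
    }
}
```

For example, SummaryText.Normalize("First.\r\n\r\nSecond. [...] Third.") returns "First. Second. Third.", so one call could stand in for the final Replace on short_content. Using a single regular expression also absorbs stray spaces left around each separator.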

Input

The benchmark VN-Index saw steady growth throughout the day, gradually gaining a total of 10.23 points by the end of the session. The Ho Chi Minh Stock Exchange (HoSE), on which the index is based, saw 300 stocks gain and 78 lose. Total trading volume improved 48 percent over the previous session, reaching VND6.2 trillion ($269 million). The VN30-Index, a basket of HoSE’s 30 largest capped stocks, rose 1.63 percent, with 27 gaining and 2 losing. Its top gainers were SAB of Vietnam’s largest brewer Sabeco, up 4.8 percent, followed by VJC of budget airline Vietjet, up 2.8 percent, and MWG of electronics retailer Mobile World, up 2.2 percent. Of Vietnam’s biggest state-owned lenders by assets, BID of BIDV climbed 0.85 percent, VCB of Vietcombank 0.8 percent, and CTG of VietinBank 0.6 percent. HDB of HDBank and TCB of Techcombank led gains of private banks at 0.85 percent and 0.6 percent respectively. Other gainers included PNJ of Phu Nhuan Jewelry with 1.4 percent, HPG of steel producer Hoa Phat, 1.1 percent, and MSN of conglomerate Masan, 1 percent. The only two VN30 tickers that ended in the red were VIC of conglomerate Vingroup, down 1 percent, and PLX of fuel distributor Petrolimex, down 0.05 percent. The HNX-Index for stocks on the Hanoi Stock Exchange, home to mid and small caps, rose 1.35 percent, and the UPCoM-Index for stocks on the Unlisted Public Companies Market added 0.3 percent. Foreign investors turned net buyers to the tune of VND15.7 billion ($681,600), with buying pressure focused mainly on HPG and VHM of real estate giant Vinhomes.

Output after summarization (4 sentences)

The benchmark VN-Index saw steady growth throughout the day, gradually gaining a total of 10.23 points by the end of the session. The VN30-Index, a basket of HoSE\u2019s 30 largest capped stocks, rose 63 percent, with 27 gaining and 2 losing. Of Vietnam\u2019s biggest state-owned lenders by assets, BID of BIDV climbed 0.85 percent, VCB of Vietcombank 0.8 percent, and CTG of VietinBank 0.6 percent. The HNX-Index for stocks on the Hanoi Stock Exchange, home to mid and small caps, rose 1.35 percent, and the UPCoM-Index for stocks on the Unlisted Public Companies Market added 0.3 percent.


I also tried this utility method:

using System;

namespace myproj.Controllers
{

    public class SysUtil
    {
        public static String StringEncodingConvert(String strText, String strSrcEncoding, String strDestEncoding)
        {
            System.Text.Encoding srcEnc = System.Text.Encoding.GetEncoding(strSrcEncoding);
            System.Text.Encoding destEnc = System.Text.Encoding.GetEncoding(strDestEncoding);
            byte[] bData = srcEnc.GetBytes(strText);
            byte[] bResult = System.Text.Encoding.Convert(srcEnc, destEnc, bData);
            return destEnc.GetString(bResult);
        }
    }

}

but it didn't work.

Even a direct replacement still doesn't work:

string result2 = result.Replace("\u2019s", "'s");

I'm stuck on this.

I need \u2019s to become 's. How can I achieve this?
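
One likely reason the string Replace above does nothing: in C# source, "\u2019s" is an escape sequence that the compiler turns into two characters, the real ’ (U+2019) followed by s. If the summary instead holds the six-character text backslash, u, 2, 0, 1, 9 (as the printed output suggests), that literal never matches. A small sketch for telling the two cases apart; the QuoteCheck class is hypothetical:

```csharp
// Hypothetical diagnostic helpers: which form of the apostrophe does a string hold?
public static class QuoteCheck
{
    // True if the string contains the actual character U+2019 (’).
    public static bool HasRealQuote(string s) => s.Contains("\u2019");

    // True if the string contains the literal text \u2019 (backslash, 'u', four hex digits).
    // The @ prefix makes a verbatim string, so the backslash is not treated as an escape.
    public static bool HasLiteralEscape(string s) => s.Contains(@"\u2019");
}
```

Checking QuoteCheck.HasLiteralEscape(result) in the debugger shows which replacement is actually needed: "\u2019s" is 2 characters long, while @"\u2019s" is 7.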

1 Answer

1 vote
/ August 7, 2020

\u2019 is the Unicode character for a smart quote (RIGHT SINGLE QUOTATION MARK). Just replace it:

string result2 = result.Replace('\u2019', '\'');
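
The char-based Replace above handles the real U+2019 character. If the summary holds the literal text \u2019 instead, that text has to be decoded first. This seems plausible in the question's code, because JsonEncodedText.Encode (with its default encoder) escapes non-ASCII characters such as ’ to \u2019 before the text is posted as what appears to be an ordinary form parameter; dropping that call may avoid the escapes altogether. A sketch that covers both forms, assuming only \uXXXX sequences need decoding; the Apostrophes class is hypothetical:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: turns typographic apostrophes into ASCII ones whether the
// text contains the real character U+2019 or the literal six characters "\u2019".
public static class Apostrophes
{
    // Matches a literal backslash-u escape such as \u2019 and captures the 4 hex digits.
    private static readonly Regex LiteralEscape =
        new Regex(@"\\u([0-9A-Fa-f]{4})", RegexOptions.Compiled);

    public static string ToAscii(string text)
    {
        // Step 1: decode literal \uXXXX sequences into the characters they name.
        string decoded = LiteralEscape.Replace(text, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());

        // Step 2: the answer's fix, replacing U+2019 with a plain apostrophe.
        return decoded.Replace('\u2019', '\'');
    }
}
```

With this, Apostrophes.ToAscii returns "HoSE's" both for "HoSE’s" and for the literal text HoSE\u2019s. Note that step 1 decodes every \uXXXX sequence, not only \u2019, which is usually what is wanted for text that was JSON-escaped by mistake.
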
...