Splitting a large string into lines with a maximum length in Java
19 votes
/ 23 September 2011
String input = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";

//text copied from http://www.nationalgeographic.com/community/terms/

I want to split this large string into lines, and no line should contain more than MAX_LINE_LENGTH characters.

What I have tried so far:

int MAX_LINE_LENGTH = 20;
System.out.print(Arrays.toString(input.split("(?<=\\G.{" + MAX_LINE_LENGTH + "})")));
//maximum length of line 20 characters

Output:

[THESE TERMS AND COND, ITIONS OF SERVICE (t, he Terms) ARE A LEGA, L AND B ...

This breaks words apart. I do not want that. Instead, I would like to get output like this:

[THESE TERMS AND , CONDITIONS OF , SERVICE (the Terms) , ARE A LEGAL AND B ...

One more condition added: if a word is longer than MAX_LINE_LENGTH, the word should be split.

And the solution should not rely on external JARs.

Answers [ 8 ]

24 votes
/ 23 September 2011

Just iterate through the string word by word and break whenever a word would exceed the limit.

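import java.util.StringTokenizer;

public String addLinebreaks(String input, int maxLineLength) {
    StringTokenizer tok = new StringTokenizer(input, " ");
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    while (tok.hasMoreTokens()) {
        String word = tok.nextToken();

        // Start a new line if the word plus a separating space does not fit.
        if (lineLen > 0 && lineLen + 1 + word.length() > maxLineLength) {
            output.append("\n");
            lineLen = 0;
        }
        // Words on the same line are separated by a single space.
        if (lineLen > 0) {
            output.append(" ");
            lineLen++;
        }
        output.append(word);
        lineLen += word.length();
    }
    return output.toString();
}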

I just typed this freehand; you may need to push and prod a bit to get it to compile.

Flaw: a word in the input that is longer than maxLineLength is not split; it ends up on a too-long line of its own. I assume your line length is around 80 or 120 characters, in which case this is unlikely to be a problem.
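The flaw noted above can be avoided by cutting an over-long word into chunks of at most maxLineLength characters. The following is only a minimal sketch of that idea, not the answerer's code: the class name GreedyWrap, the List<String> return type, and the choice to start an over-long word on a fresh line are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class GreedyWrap {

    /** Wraps text at whitespace; words longer than maxLineLength are cut into chunks. */
    static List<String> wrap(String input, int maxLineLength) {
        if (maxLineLength < 1) {
            throw new IllegalArgumentException("maxLineLength must be at least 1");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            // Flush the current line if the word plus a separating space does not fit.
            if (line.length() > 0 && line.length() + 1 + word.length() > maxLineLength) {
                lines.add(line.toString());
                line.setLength(0);
            }
            // Hard-split a word that cannot fit on a line of its own.
            while (word.length() > maxLineLength) {
                lines.add(word.substring(0, maxLineLength));
                word = word.substring(maxLineLength);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints [THESE TERMS AND, CONDITIONS OF, SERVICE (the Terms)]
        System.out.println(wrap("THESE TERMS AND CONDITIONS OF SERVICE (the Terms)", 20));
    }
}
```

Returning a list avoids a second split on "\n" when the caller needs the lines separately; String.join("\n", lines) gives the single-string form.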

10 votes
/ 09 January 2014

Better: use Apache Commons Lang:

org.apache.commons.lang.WordUtils

/**
 * <p>Wraps a single line of text, identifying words by <code>' '</code>.</p>
 * 
 * <p>New lines will be separated by the system property line separator.
 * Very long words, such as URLs will <i>not</i> be wrapped.</p>
 * 
 * <p>Leading spaces on a new line are stripped.
 * Trailing spaces are not stripped.</p>
 *
 * <pre>
 * WordUtils.wrap(null, *) = null
 * WordUtils.wrap("", *) = ""
 * </pre>
 *
 * @param str  the String to be word wrapped, may be null
 * @param wrapLength  the column to wrap the words at, less than 1 is treated as 1
 * @return a line with newlines inserted, null if null input
 */
public static String wrap(String str, int wrapLength) {
    return wrap(str, wrapLength, null, false);
}
6 votes
/ 04 January 2013

Thanks to Barend Garvelink for the answer. I modified the code above to fix the flaw "if a word in the input is longer than maxCharInLine":

public String[] splitIntoLine(String input, int maxCharInLine) {

    StringTokenizer tok = new StringTokenizer(input, " ");
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    while (tok.hasMoreTokens()) {
        String word = tok.nextToken();

        // Split words that are longer than a whole line.
        while (word.length() > maxCharInLine) {
            // lineLen counts the trailing space, so it can exceed the limit by one;
            // clamp the remaining room to avoid a negative substring index.
            int room = Math.max(0, maxCharInLine - lineLen);
            output.append(word.substring(0, room) + "\n");
            word = word.substring(room);
            lineLen = 0;
        }

        if (lineLen + word.length() > maxCharInLine) {
            output.append("\n");
            lineLen = 0;
        }
        output.append(word + " ");

        lineLen += word.length() + 1;
    }
    return output.toString().split("\n");
}
5 votes
/ 16 September 2015

You can use the WordUtils.wrap method from Apache Commons Lang:

 import java.util.*;
 import org.apache.commons.lang3.text.WordUtils;
 public class test3 {


public static void main(String[] args) {

    String S = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";
    String F = WordUtils.wrap(S, 20);
    String[] F1 =  F.split(System.lineSeparator());
    System.out.println(Arrays.toString(F1));

}}

Output:

   [THESE TERMS AND, CONDITIONS OF, SERVICE (the Terms), ARE A LEGAL AND, BINDING AGREEMENT, BETWEEN YOU AND, NATIONAL GEOGRAPHIC, governing your use, of this site,, www.nationalgeographic.com,, which includes but, is not limited to, products, software, and services offered, by way of the, website such as the, Video Player,, Uploader, and other, applications that, link to these Terms, (the Site). Please, review the Terms, fully before you, continue to use the, Site. By using the, Site, you agree to, be bound by the, Terms. You shall, also be subject to, any additional terms, posted with respect, to individual, sections of the, Site. Please review, our Privacy Policy,, which also governs, your use of the, Site, to understand, our practices. If, you do not agree,, please discontinue, using the Site., National Geographic, reserves the right, to change the Terms, at any time without, prior notice. Your, continued access or, use of the Site, after such changes, indicates your, acceptance of the, Terms as modified., It is your, responsibility to, review the Terms, regularly. The Terms, were last updated on, 18 July 2011.]
4 votes
/ 08 January 2014

Based on @Barend's suggestion, below is my final version with minor changes:

private static final char NEWLINE = '\n';
private static final String SPACE_SEPARATOR = " ";
//if text has \n, \r or \t symbols it's better to split by \s+
private static final String SPLIT_REGEXP= "\\s+";

public static String breakLines(String input, int maxLineLength) {
    String[] tokens = input.split(SPLIT_REGEXP);
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    for (int i = 0; i < tokens.length; i++) {
        String word = tokens[i];

        if (lineLen + (SPACE_SEPARATOR + word).length() > maxLineLength) {
            if (i > 0) {
                output.append(NEWLINE);
            }
            lineLen = 0;
        }
        if (i < tokens.length - 1 && (lineLen + (word + SPACE_SEPARATOR).length() + tokens[i + 1].length() <=
                maxLineLength)) {
            word += SPACE_SEPARATOR;
        }
        output.append(word);
        lineLen += word.length();
    }
    return output.toString();
}

System.out.println(breakLines("THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A     LEGAL AND BINDING " +
                "AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing     your use of this site, " +
            "www.nationalgeographic.com, which includes but is not limited to products, " +
            "software and services offered by way of the website such as the Video Player.", 20));

Output:

THESE TERMS AND
CONDITIONS OF
SERVICE (the Terms)
ARE A LEGAL AND
BINDING AGREEMENT
BETWEEN YOU AND
NATIONAL GEOGRAPHIC
governing your use
of this site,
www.nationalgeographic.com,
which includes but
is not limited to
products, software
and services 
offered by way of
the website such as
the Video Player.
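Note that in the output above the line "www.nationalgeographic.com," is 27 characters long, so this version keeps over-long words whole instead of splitting them. A quick way to check the result of any wrapping method is to measure its longest line. The helper below is a small sketch; the names LineLengthCheck and longestLine are made up for this example.

```java
final class LineLengthCheck {

    /** Returns the length of the longest line in a '\n'-separated string. */
    static int longestLine(String wrapped) {
        int longest = 0;
        // The -1 limit keeps trailing empty lines instead of dropping them.
        for (String line : wrapped.split("\n", -1)) {
            longest = Math.max(longest, line.length());
        }
        return longest;
    }

    public static void main(String[] args) {
        String sample = "of this site,\nwww.nationalgeographic.com,\nwhich includes but";
        System.out.println(longestLine(sample)); // prints 27, above the limit of 20
    }
}
```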
1 vote
/ 19 October 2016

Since Java 8, you can also use Streams to solve such problems.

Below is a complete example that performs a reduction using the .collect() method.

I think this should be shorter than the other solutions that do not rely on third-party libraries.

private static String multiLine(String longString, String splitter, int maxLength) {
    return Arrays.stream(longString.split(splitter))
            .collect(
                ArrayList<String>::new,     
                (l, s) -> {
                    Function<ArrayList<String>, Integer> id = list -> list.size() - 1;
                    if(l.size() == 0 || (l.get(id.apply(l)).length() != 0 && l.get(id.apply(l)).length() + s.length() >= maxLength)) l.add("");
                    l.set(id.apply(l), l.get(id.apply(l)) + (l.get(id.apply(l)).length() == 0 ? "" : splitter) + s);
                },
                (l1, l2) -> l1.addAll(l2))
            .stream().reduce((s1, s2) -> s1 + "\n" + s2).get();
}

public static void main(String[] args) {
    String longString = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";
    String SPLITTER = " ";
    int MAX_LENGTH = 20;
    System.out.println(multiLine(longString, SPLITTER, MAX_LENGTH));
}
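The same reduction may be easier to follow when the accumulator is pulled out into a named method. This is only a sketch of that variant, not the answerer's code: the names StreamWrap and addWord are invented here, the text is split on runs of whitespace rather than on a caller-supplied splitter, and words longer than maxLength are still left whole.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StreamWrap {

    /** Wraps text greedily and joins the resulting lines with '\n'. */
    static String multiLine(String text, int maxLength) {
        List<String> lines = Arrays.stream(text.trim().split("\\s+"))
                .collect(ArrayList::new,
                         (acc, word) -> addWord(acc, word, maxLength),
                         ArrayList::addAll);
        return String.join("\n", lines);
    }

    /** Appends the word to the last line if it fits, otherwise starts a new line. */
    private static void addWord(List<String> lines, String word, int maxLength) {
        int last = lines.size() - 1;
        if (last >= 0 && lines.get(last).length() + 1 + word.length() <= maxLength) {
            lines.set(last, lines.get(last) + " " + word);
        } else {
            lines.add(word);
        }
    }

    public static void main(String[] args) {
        System.out.println(multiLine("THESE TERMS AND CONDITIONS OF SERVICE (the Terms)", 20));
    }
}
```

The stream should stay sequential: the combiner only concatenates partial lists, so a parallel run could leave lines shorter than necessary at the seams between partial results.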
1 vote
/ 15 January 2016

My version (the previous ones did not work):

public static List<String> breakSentenceSmart(String text, int maxWidth) {

    StringTokenizer stringTokenizer = new StringTokenizer(text, " ");
    List<String> lines = new ArrayList<String>();
    StringBuilder currLine = new StringBuilder();
    while (stringTokenizer.hasMoreTokens()) {
        String word = stringTokenizer.nextToken();

        boolean wordPut=false;
        while (!wordPut) {
            if(currLine.length()+word.length()==maxWidth) { //exactly fits -> dont add the space
                currLine.append(word);
                wordPut=true;
            }
            else if(currLine.length()+word.length()<=maxWidth) { //whole word can be put
                if(stringTokenizer.hasMoreTokens()) {
                    currLine.append(word + " ");
                }else{
                    currLine.append(word);
                }
                wordPut=true;
            }else{
                if(word.length()>maxWidth) {
                    int lineLengthLeft = maxWidth - currLine.length();
                    String firstWordPart = word.substring(0, lineLengthLeft);
                    currLine.append(firstWordPart);
                    //lines.add(currLine.toString());
                    word = word.substring(lineLengthLeft);
                    //currLine = new StringBuilder();
                }
                lines.add(currLine.toString());
                currLine = new StringBuilder();
            }

        }
        //
    }
    if(currLine.length()>0) { //add whats left
        lines.add(currLine.toString());
    }
    return lines;
}
1 vote
/ 12 June 2012

I recently wrote a few methods for this which, if one of the lines contains no whitespace characters, choose to split on other non-alphanumeric characters before resorting to a mid-word split.

Here is how it turned out:

(Uses the lastIndexOfRegex() methods that I posted here.)

/**
 * Indicates that a String search operation yielded no results.
 */
public static final int NOT_FOUND = -1;



/**
 * Version of lastIndexOf that uses regular expressions for searching.
 * By Tomer Godinger.
 * 
 * @param str String in which to search for the pattern.
 * @param toFind Pattern to locate.
 * @return The index of the requested pattern, if found; NOT_FOUND (-1) otherwise.
 */
public static int lastIndexOfRegex(String str, String toFind)
{
    Pattern pattern = Pattern.compile(toFind);
    Matcher matcher = pattern.matcher(str);

    // Default to the NOT_FOUND constant
    int lastIndex = NOT_FOUND;

    // Search for the given pattern
    while (matcher.find())
    {
        lastIndex = matcher.start();
    }

    return lastIndex;
}

/**
 * Finds the last index of the given regular expression pattern in the given string,
 * starting from the given index (and conceptually going backwards).
 * By Tomer Godinger.
 * 
 * @param str String in which to search for the pattern.
 * @param toFind Pattern to locate.
 * @param fromIndex Maximum allowed index.
 * @return The index of the requested pattern, if found; NOT_FOUND (-1) otherwise.
 */
public static int lastIndexOfRegex(String str, String toFind, int fromIndex)
{
    // Limit the search by searching on a suitable substring
    return lastIndexOfRegex(str.substring(0, fromIndex), toFind);
}

/**
 * Breaks the given string into lines as best possible, each of which no longer than
 * <code>maxLength</code> characters.
 * By Tomer Godinger.
 * 
 * @param str The string to break into lines.
 * @param maxLength Maximum length of each line.
 * @param newLineString The string to use for line breaking.
 * @return The resulting multi-line string.
 */
public static String breakStringToLines(String str, int maxLength, String newLineString)
{
    StringBuilder result = new StringBuilder();
    while (str.length() > maxLength)
    {
        // Attempt to break on whitespace first,
        int breakingIndex = lastIndexOfRegex(str, "\\s", maxLength);

        // Then on other non-alphanumeric characters,
        if (breakingIndex == NOT_FOUND) breakingIndex = lastIndexOfRegex(str, "[^a-zA-Z0-9]", maxLength);

        // And if all else fails, break in the middle of the word
        // (maxLength - 1 because the substring below includes the character at breakingIndex)
        if (breakingIndex == NOT_FOUND) breakingIndex = maxLength - 1;

        // Append each prepared line to the builder
        result.append(str.substring(0, breakingIndex + 1));
        result.append(newLineString);

        // And start the next line
        str = str.substring(breakingIndex + 1);
    }

    // Check if there are any residual characters left
    if (str.length() > 0)
    {
        result.append(str);
    }

    // Return the resulting string
    return result.toString();
}
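A related approach that needs no external JARs is java.text.BreakIterator, whose line instance reports the positions where a line break is permitted. The sketch below swaps that in for the regular expressions used above and falls back to a mid-word cut when a line has no break opportunity. The class name BreakIteratorWrap and the choice of Locale.ROOT are assumptions for illustration; trailing whitespace is kept on each line and counts toward its length.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BreakIteratorWrap {

    /** Splits text into pieces of at most maxLength characters, preferring legal break positions. */
    static List<String> wrap(String text, int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be at least 1");
        }
        BreakIterator breaks = BreakIterator.getLineInstance(Locale.ROOT);
        breaks.setText(text);
        List<String> lines = new ArrayList<>();
        int lineStart = 0;
        while (text.length() - lineStart > maxLength) {
            int limit = lineStart + maxLength;
            // preceding(n) returns the last boundary strictly before n,
            // so this finds the last break position at or before the limit.
            int breakPos = breaks.preceding(limit + 1);
            if (breakPos <= lineStart) {
                breakPos = limit; // no break opportunity on this line: cut mid-word
            }
            lines.add(text.substring(lineStart, breakPos));
            lineStart = breakPos;
        }
        if (lineStart < text.length()) {
            lines.add(text.substring(lineStart));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : wrap("Please review the Terms fully before you continue to use the Site.", 20)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Because nothing is trimmed, concatenating the returned pieces reproduces the input exactly; call trim() on each piece if the trailing spaces are unwanted.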
...