How can I strip all non-alphanumeric characters from the beginning and end of words using a delimiter?
0 votes
/ November 6, 2018

I want to print a list of the words I read from a document using a Scanner. The file contains a passage from a book, for example:

'Oh, I BEG your pardon!' she exclaimed in a tone of great dismay, and
began picking them up again as quickly as she could, for the accident of
the goldfish kept running in her head, and she had a vague sort of idea
that they must be collected at once and put back into the jury-box, or
they would die.

For the assignment I have to use a delimiter, so I need to build a pattern. I want to strip all non-alphanumeric characters from the beginning and end of each word. How do I write the correct pattern?

I think it should be something with [^a-zA-Z0-9], but I need to apply it only to the head and tail of words.

This simply removes all the unwanted characters, including those in the middle of words, which is not what I need:

Scanner string=openTextFile(fileName).useDelimiter("[^a-zA-Z0-9]");
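
To make the problem concrete, here is a minimal sketch of what that delimiter does to a hyphenated word. The class name `MidWordSplitDemo`, the helper `tokenize`, and the sample sentence are made up for illustration; only the pattern `[^a-zA-Z0-9]` comes from the attempt above. Empty tokens, which `Scanner` can return between two adjacent single-character delimiters, are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class MidWordSplitDemo {

    // Hypothetical helper: splits text with the given delimiter regex
    // and drops the empty tokens found between adjacent delimiters.
    static List<String> tokenize(String text, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter(delimiterRegex)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Every non-alphanumeric character is a delimiter here, so the
        // hyphen inside "jury-box" is treated like any other separator.
        System.out.println(tokenize("into the jury-box, or they would die.", "[^a-zA-Z0-9]"));
    }
}
```

Because the hyphen is a delimiter like any other character, `jury-box` comes out as the two tokens `jury` and `box`, which is the behaviour the question wants to avoid.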

Here is my code. This question is about the 'content' command in the switch.

package nl.ru.ai.SjoerdSam.exercise7;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Scanner;

public class Concordances
{

  final static int MAX_NR_OF_WORDS=20000;
  public static void main(String[] args) throws IOException
  {
    try
    {
      String[] words=new String[MAX_NR_OF_WORDS];
      int[] freqs=new int[MAX_NR_OF_WORDS];
      boolean terminate=true;
      while(terminate)
      {
        System.out.println("Please enter 'read' to start reading a file and display number of words read, "+"'content' to display content (all currently stored words in the order of appearance), "+"'stop' to stop the program or "+"'count'+ the word you want to count to count the number of occurrences of a word, "+"the total number of words and the percentage of occurrences. Followed by the filename.");
        Scanner input=new Scanner(System.in);
        String userInput=input.nextLine();
        String command=assignTask(userInput);
        String fileName=assignFilename(userInput);
        Scanner string=openTextFile(fileName).useDelimiter("\\s|\\W(?=\\s)|(?<=\\s)\\W|^\\W|\\W$|!");

        switch(command)
        {
          case "read":
            FileInputStream inputStream=new FileInputStream(fileName);
            BufferedReader bufferedReader=new BufferedReader(new InputStreamReader(inputStream));
            String line=bufferedReader.readLine();

            int allWords=0;

            while(line!=null)
            {
              String[] wordsInLine=line.split(" ");
              allWords=allWords+wordsInLine.length;
              line=bufferedReader.readLine();
            }
            System.out.println("The number of words in this file is: "+allWords+"\n");
            break;
          case "content":
            int nr=findAndCountWords(string,words,freqs);
            displayWords(nr,words,freqs);
            break;
          case "stop":
            terminate=false;
            System.out.println("Program terminated");
            break;
          //            case "count":
          //// Bit stuck here on how to do the count and show the frequency of a single word. if i would actually get the frequency the percentage could be found by dividing the frequency with total number of words found above
          //              Scanner single=new Scanner(System.in);
          //              System.out.println("Please type in the word you want to know data of");
          //              String word= single.nextLine();
          //              findAndCountWord(scanner,words,word);
          //              System.out.println("The frequency for the word"+" "+ single +" "+"is" + findAndCountWord(single,words,word) );
          //              break;

        }
      }
    }
    catch(IllegalArgumentException e)
    {
      System.out.print(e);
    }
  }

  private static String assignFilename(String input)
  {
    // The file name is everything after the first space. Return an empty
    // string when the input holds only a command (for example "stop").
    int i=input.indexOf(' ');
    if(i<0)
      return "";
    return input.substring(i+1);
  }

  private static String assignTask(String input)
  {
    int i;
    for(i=0;i<input.length();i++)
      if(input.charAt(i)==' ')
        break;
    input=input.substring(0,i);
    return input;
  }

  static Scanner openTextFile(String input) throws FileNotFoundException
  {
    assert (true);
    FileInputStream fileName=new FileInputStream(input);
    BufferedReader bufferedReader=new BufferedReader(new InputStreamReader(fileName));
    return new Scanner(bufferedReader);
  }
  static int findAndCountWords(Scanner scanner, String[] words, int[] freqs)
  {
    assert words!=null&&freqs!=null;
    int nr=0;
    while(scanner.hasNext())
    {
      String word=scanner.next();
      if(updateWord(word,words,freqs,nr))
        nr++;
    }
    return nr;
  }

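  static boolean updateWord(String word, String[] words, int[] freqs, int nr)
  {
    assert nr>=0&&words!=null&&freqs!=null;
    int pos=sequentialSearch(words,0,nr,word);
    if(pos<nr)
    {
      // Known word: only bump its frequency.
      freqs[pos]++;
      return false;
    } else if(!word.isEmpty())
    {
      // New word: store it at the first free position.
      words[pos]=word;
      freqs[pos]=1;
      return true;
    }
    // Empty tokens (found between adjacent delimiters) are not stored,
    // so the caller must not advance its word counter.
    return false;
  }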
  static int sequentialSearch(String[] array, int from, int to, String searchValue)
  {
    assert 0<=from&&0<=to : "Invalidbounds";
    assert array!=null : "Array shouldbeinitialized";
    if(from>to)
      return to;
    int position=from;
    while(position<to&&(!array[position].equals(searchValue)))
      position++;
    return position;
  }
  static void displayFrequencies(int nr, String[] words, int[] freqs)
  {
    assert nr>=0&&words!=null&&freqs!=null;

    for(int i=0;i<nr;i++)
    {
      System.out.println(words[i]+" "+freqs[i]);
    }
  }
  static void displayWords(int nr, String[] words, int[] freqs)
  {
    assert nr>=0&&words!=null&&freqs!=null;

    for(int i=0;i<nr;i++)
    {
      System.out.println(words[i]);
    }
  }

  static int findAndCountWord(Scanner scanner, String[] words, String word)
  {
    assert words!=null;
    int wordCount=0;
    while(scanner.hasNext())
    {
      // Consume each token; without next() the loop would never advance.
      if(word.equals(scanner.next()))
      {
        wordCount++;
      }
    }
    return wordCount;
  }
}

Here is another sample I am using at the moment:

'Well!' thought Alice to herself, 'after such a fall as this, I shall
think nothing of tumbling down stairs! How brave they'll all think me at
home! Why, I wouldn't say anything about it, even if I fell off the top
of the house!' (Which was very likely true.)

Down, down, down. Would the fall NEVER come to an end! 'I wonder how
many miles I've fallen by this time?' she said aloud. 'I must be getting
somewhere near the centre of the earth. Let me see: that would be four
thousand miles down, I think--' (for, you see, Alice had learnt several
things of this sort in her lessons in the schoolroom, and though this
was not a VERY good opportunity for showing off her knowledge, as there
was no one to listen to her, still it was good practice to say it over)
'--yes, that's about the right distance--but then I wonder what Latitude
or Longitude I've got to?' (Alice had no idea what Latitude was, or
Longitude either, but thought they were nice grand words to say.)

1 Answer

0 votes
/ November 6, 2018

Use whitespace, or punctuation adjacent to whitespace or to the start or end of the input, asserted with look-arounds:

"\\s|\\W+(?=\\s)|(?<=\\s)\\W+|^\\W+|\\W+$"

See the live demo.

\W means a non-word character, where a word character is any letter or digit (or an underscore, but that should not cause problems here).
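
As a rough check of the pattern above, the following sketch runs it through a `Scanner` on a sentence adapted from the question's sample text. `EdgePunctuationDemo`, `tokenize`, and the sample string are illustrative names and are not part of the original answer. One thing to be aware of: a comma followed by a space is matched as two separate delimiters (the comma by `\W+(?=\s)`, then the space by `\s`), so `Scanner` can hand back an empty token between them. The sketch simply skips those.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class EdgePunctuationDemo {

    // The delimiter from the answer: whitespace, or non-word characters
    // that touch whitespace or the start/end of the input.
    static final String DELIMITER = "\\s|\\W+(?=\\s)|(?<=\\s)\\W+|^\\W+|\\W+$";

    // Hypothetical helper: collects the non-empty tokens of the text.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter(DELIMITER)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                // Adjacent delimiters (for example ", ") yield empty tokens.
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "'Oh, I BEG your pardon!' she said, near the jury-box.";
        for (String word : tokenize(sample)) {
            System.out.println(word);
        }
    }
}
```

With this input, the quotes, commas, and the final period at the edges of words are dropped, while the hyphen inside `jury-box` is kept because it has word characters on both sides.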

...