I just wrote a Flex application that processes Wikipedia text content as strings.
I'm trying to use a RegExp to strip out all of the Wikipedia markup. Here is an example.
I would like this:
var pageText:String = new String("was an [[People of the United States|American]] [[film director]], writer, [[Film producer|producer]], and [[photographer]] who lived in England during most of the last four decades of his career. Kubrick was noted for the scrupulous care with which he chose his subjects, his slow method of working, the variety of genres he worked in, his technical perfectionism, and his reclusiveness about his films and personal life. He maintained almost complete artistic control, making movies according to his own whims and time constraints, but with the rare advantage of big-[[Movie studio|studio]] [[financial support]] for all his endeavors.");
to look like this:
var pageText:String = new String("was an American film director, writer, producer, and photographer who lived in England during most of the last four decades of his career. Kubrick was noted for the scrupulous care with which he chose his subjects, his slow method of working, the variety of genres he worked in, his technical perfectionism, and his reclusiveness about his films and personal life. He maintained almost complete artistic control, making movies according to his own whims and time constraints, but with the rare advantage of big-studio financial support for all his endeavors.");
So I need to write a RegExp that handles links of the form [[Remove this part|but keep this]].
I have tried these, among others:
var pattern:RegExp = new RegExp(/\[\[(.+)\|/);
var pattern2:RegExp = new RegExp(/^\[\[\|/);
var pattern3:RegExp = new RegExp(/^\[\[[A-Z].*\|$/);
var pageTextCleaned:String = pageText.replace(pattern, " ");
After that, it would be easy to just remove the remaining [[ and ]].
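To make the goal concrete, here is a rough sketch of the one-pass version I am aiming for. It is only my guess, not something I have verified against real articles: `sample`, `linkPattern`, and `cleaned` are names I made up for illustration, and I am assuming that String.replace() accepts "$1" as a reference to the first capture group and that links are never nested.

```actionscript
// Short sample using the same kind of markup as the text above.
var sample:String = "an [[People of the United States|American]] [[film director]]";

// Hypothetical pattern (a guess, not a tested solution):
//   \[\[              opening brackets
//   (?:[^\]|]*\|)?    optional "target|" prefix, which should be dropped
//   ([^\]]*)          the label to keep (capture group 1)
//   \]\]              closing brackets
var linkPattern:RegExp = /\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/g;

// "$1" stands for capture group 1; the g flag applies it to every link.
var cleaned:String = sample.replace(linkPattern, "$1");
trace(cleaned);
```

If this works the way I expect, both the piped links and the plain [[film director]] style links would be reduced to their visible text in a single replace call, so the separate clean-up step would not be needed.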
I don't use RegExp much, so any help would be great!
Thanks!