Unable to get speech recognition when using OpenEars with a video player
1 vote
/ July 16, 2011

In my app I am using OpenEars as the speech engine. I use a movie player to play the movie that corresponds to the word I speak, as recognized by OpenEars. But every time my first movie loads, the app switches to thread number 13059, and after that it can no longer recognize speech. I even tried creating a new instance of the audio session manager and then calling its method for starting the audio session. This is the console output I get when I run the app on my device:

[Session started at 2011-07-16 11:11:47 +0530.]
GNU gdb 6.3.50-20050815 (Apple version gdb-1516) (Fri Feb 11 06:19:43 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "--host=i386-apple-darwin --target=arm-apple-darwin".tty /dev/ttys000
Loading program into debugger…
Program loaded.
target remote-mobile /tmp/.XcodeGDBRemote-173-53
Switching to remote-macosx protocol
mem 0x1000 0x3fffffff cache
mem 0x40000000 0xffffffff none
mem 0x00000000 0x0fff none
run
Running…
[Switching to thread 11523]
[Switching to thread 11523]
sharedlibrary apply-load-rules all
continue
2011-07-16 11:12:51.335 App[568:307] /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.sqlite
2011-07-16 11:12:51.371 App[568:307] /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.dic
2011-07-16 11:12:51.376 App[568:307] /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.languagemodel
2011-07-16 11:12:51.382 App[568:1807] OPENEARSLOGGING: Recognition loop has started
INFO: cmd_ln.c(512): Parsing command line:
\
    -lm /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.languagemodel \
    -dict /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.dic \
    -fdict /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/noisedict \
    -hmm /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app \
    -maxhmmpf 3000 \
    -maxwpf 5 

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-argfile            
-ascale     20.0        2.000000e+01
-backtrace  no      no
-beam       1e-48       1.000000e-48
-bestpath   yes     yes
-bestpathlw 9.5     9.500000e+00
-bghist     no      no
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     8.0
-compallsen no      no
-debug              0
-dict               /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.dic
-dictcase   no      no
-dither     no      no
-doublebw   no      no
-ds     1       1
-fdict              /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/noisedict
-feat       1s_c_d_dd   1s_c_d_dd
-featparams         
-fillprob   1e-8        1.000000e-08
-frate      100     100
-fsg                
-fsgusealtpron  yes     yes
-fsgusefiller   yes     yes
-fwdflat    yes     yes
-fwdflatbeam    1e-64       1.000000e-64
-fwdflatefwid   4       4
-fwdflatlw  8.5     8.500000e+00
-fwdflatsfwin   25      25
-fwdflatwbeam   7e-29       7.000000e-29
-fwdtree    yes     yes
-hmm                /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app
-input_endian   little      little
-jsgf               
-kdmaxbbi   -1      -1
-kdmaxdepth 0       0
-kdtree             
-latsize    5000        5000
-lda                
-ldadim     0       0
-lextreedump    0       0
-lifter     0       0
-lm             /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.languagemodel
-lmctl              
-lmname     default     default
-logbase    1.0001      1.000100e+00
-logfn              
-logspec    no      no
-lowerf     133.33334   1.333333e+02
-lpbeam     1e-40       1.000000e-40
-lponlybeam 7e-29       7.000000e-29
-lw     6.5     6.500000e+00
-maxhmmpf   -1      3000
-maxnewoov  20      20
-maxwpf     -1      5
-mdef               
-mean               
-mfclogdir          
-mixw               
-mixwfloor  0.0000001   1.000000e-07
-mllr               
-mmap       yes     yes
-ncep       13      13
-nfft       512     512
-nfilt      40      40
-nwpen      1.0     1.000000e+00
-pbeam      1e-48       1.000000e-48
-pip        1.0     1.000000e+00
-pl_beam    1e-10       1.000000e-10
-pl_pbeam   1e-5        1.000000e-05
-pl_window  0       0
-rawlogdir          
-remove_dc  no      no
-round_filters  yes     yes
-samprate   16000       1.600000e+04
-seed       -1      -1
-sendump            
-senmgau            
-silprob    0.005       5.000000e-03
-smoothspec no      no
-svspec             
-tmat               
-tmatfloor  0.0001      1.000000e-04
-topn       4       4
-topn_beam  0       0
-toprule            
-transform  legacy      legacy
-unit_area  yes     yes
-upperf     6855.4976   6.855498e+03
-usewdphones    no      no
-uw     1.0     1.000000e+00
-var                
-varfloor   0.0001      1.000000e-04
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wbeam      7e-29       7.000000e-29
-wip        0.65        6.500000e-01
-wlen       0.025625    2.562500e-02

INFO: cmd_ln.c(512): Parsing command line:
\
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 39 \
    -varnorm no 

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     39
-dither     no      no
-doublebw   no      no
-feat       1s_c_d_dd   1s_c_d_dd
-frate      100     100
-input_endian   little      little
-lda                
-ldadim     0       0
-lifter     0       0
-logspec    no      no
-lowerf     133.33334   1.000000e+00
-ncep       13      13
-nfft       512     512
-nfilt      40      20
-remove_dc  no      yes
-round_filters  yes     no
-samprate   16000       1.600000e+04
-seed       -1      -1
-smoothspec no      no
-svspec             0-12/13-25/26-38
-transform  legacy      dct
-unit_area  yes     yes
-upperf     6855.4976   4.000000e+03
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wlen       0.025625    2.500000e-02

INFO: acmod.c(238): Parsed model-specific feature parameters from /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/feat.params
INFO: feat.c(848): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(520): Reading model definition: /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(330): Reading binary model definition: /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/mdef
2011-07-16 11:12:51.521 App[568:307] string: -------- App
2011-07-16 11:12:51.537 App[568:307] bonjourTypeFromIdentifier App
INFO: bin_mdef.c(508): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size

INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size

INFO: ms_gauden.c(358): 0 variance values floored
INFO: s2_semi_mgau.c(897): Loading senones from dump file /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/sendump
INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(294): Allocating 4193 * 20 bytes (81 KiB) for word entries
INFO: dict.c(306): Reading main dictionary: /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.dic
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(309): 86 words read
INFO: dict.c(314): Reading filler dictionary: /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/App.app/noisedict
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(476): ngrams 1=76, 2=146, 3=86
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(515):       76 = #unigrams created
INFO: ngram_model_arpa.c(194): Reading bigrams
INFO: ngram_model_arpa.c(531):      146 = #bigrams created
INFO: ngram_model_arpa.c(532):        9 = #prob2 entries
INFO: ngram_model_arpa.c(539):        3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(291): Reading trigrams
INFO: ngram_model_arpa.c(552):       86 = #trigrams created
INFO: ngram_model_arpa.c(553):        5 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 57 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 281
INFO: ngram_search_fwdtree.c(333): after: 57 root, 153 non-root channels, 11 single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
2011-07-16 11:12:52.237 App[568:1807] OPENEARSLOGGING: Starting openAudioDevice on the device.
2011-07-16 11:12:52.240 App[568:1807] OPENEARSLOGGING: Audio unit wrapper successfully created.
2011-07-16 11:12:52.250 App[568:1807] OPENEARSLOGGING: Set audio route to SpeakerAndMicrophone
2011-07-16 11:12:52.253 App[568:1807] OPENEARSLOGGING: Setting the variables for the device and starting it.
2011-07-16 11:12:52.255 App[568:1807] OPENEARSLOGGING: Looping through ringbuffer sections and pre-allocating them.
2011-07-16 11:12:52.434 App[568:307]  own Device :Mark’s iPad
2011-07-16 11:12:52.811 App[568:1807] OPENEARSLOGGING: Started audio output unit.
2011-07-16 11:12:52.820 App[568:1807] OPENEARSLOGGING: Calibration has started
2011-07-16 11:12:55.516 App[568:307] /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.sqlite
2011-07-16 11:12:57.040 App[568:1807] OPENEARSLOGGING: Calibration has completed
2011-07-16 11:12:57.052 App[568:1807] OPENEARSLOGGING: Project has these words in its dictionary:
AROUND
AROUND(2)
BEHIND
BLINK
BOUNCE
BRUSH
BUCKLE
BUMP
SLEEP
SPILL
SPLASH
STACK
UNTIE
UNZIP
VACCUM
WAIT
WAKE
WALK
WALK(2)
WASH
WASH(2)
WIPE
WRITE
YES
ZIP

2011-07-16 11:12:57.053 App[568:1807] OPENEARSLOGGING: Listening.
2011-07-16 11:12:57.042 App[568:307] :-)
2011-07-16 11:12:57.055 App[568:307] Did start listening
2011-07-16 11:12:57.963 App[568:1807] OPENEARSLOGGING: Speech detected...
2011-07-16 11:12:59.470 App[568:1807] OPENEARSLOGGING: Stopping audio unit.
2011-07-16 11:12:59.601 App[568:1807] OPENEARSLOGGING: Audio Output Unit stopped, cleaning up variable states.
2011-07-16 11:12:59.606 App[568:1807] OPENEARSLOGGING: Processing speech, please wait...
INFO: cmn_prior.c(121): cmn_prior_update: from < 39.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_prior.c(139): cmn_prior_update: to   < 55.07  1.28 -0.44 -1.36 -1.51 -1.02 -1.49 -0.38 -0.87 -0.73 -0.22 -0.40  0.07 >
INFO: ngram_search_fwdtree.c(1513):     1717 words recognized (20/fr)
INFO: ngram_search_fwdtree.c(1515):    67993 senones evaluated (773/fr)
INFO: ngram_search_fwdtree.c(1517):    48722 channels searched (553/fr), 4788 1st, 34707 last
INFO: ngram_search_fwdtree.c(1521):     2394 words for which last channels evaluated (27/fr)
INFO: ngram_search_fwdtree.c(1524):     2597 candidate words for entering last phone (29/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 42 words
INFO: ngram_search_fwdflat.c(912):      989 words recognized (11/fr)
INFO: ngram_search_fwdflat.c(914):    76756 senones evaluated (872/fr)
INFO: ngram_search_fwdflat.c(916):    75060 channels searched (852/fr)
INFO: ngram_search_fwdflat.c(918):     3486 words searched (39/fr)
INFO: ngram_search_fwdflat.c(920):     2315 word transitions (26/fr)
INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.76
INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(</s>:76:86) = -674533
INFO: ps_lattice.c(1266): Joint P(O,S) = -687162 P(S|O) = -12629
2011-07-16 11:13:00.129 App[568:1807] OPENEARSLOGGING: Pocketsphinx heard "NEXT TO" with a score of (-12629) and an utterance ID of 000000000.
2011-07-16 11:13:00.131 App[568:1807] OPENEARSLOGGING: Setting the variables for the device and starting it.
2011-07-16 11:13:00.134 App[568:1807] OPENEARSLOGGING: Looping through ringbuffer sections and pre-allocating them.
2011-07-16 11:13:00.133 App[568:307] NEXT TO
2011-07-16 11:13:00.237 App[568:1807] OPENEARSLOGGING: Started audio output unit.
2011-07-16 11:13:00.241 App[568:1807] OPENEARSLOGGING: Stopping audio unit.
2011-07-16 11:13:00.367 App[568:1807] OPENEARSLOGGING: Audio Output Unit stopped, cleaning up variable states.
2011-07-16 11:13:00.377 App[568:1807] OPENEARSLOGGING: This device is not recording, so first we will set its recording status to 0
2011-07-16 11:13:00.382 App[568:1807] OPENEARSLOGGING: The audio unit is running so we are going to dispose of its instance
2011-07-16 11:13:00.384 App[568:1807] OPENEARSLOGGING: No longer listening.
2011-07-16 11:13:00.444 App[568:307] Movie loading
2011-07-16 11:13:00.450 App[568:307] OPENEARSLOGGING: The audio session has already been initialized, continuing to set its properties.
2011-07-16 11:13:00.457 App[568:180b] OPENEARSLOGGING: Recognition loop has started


[Switching to thread 13059]
2011-07-16 11:13:01.291 App[568:180b] OPENEARSLOGGING: Starting openAudioDevice on the device.
2011-07-16 11:13:01.301 App[568:180b] OPENEARSLOGGING: Audio unit wrapper successfully created.
2011-07-16 11:13:01.856 App[568:180b] OPENEARSLOGGING: Set audio route to SpeakerAndMicrophone
2011-07-16 11:13:01.866 App[568:180b] OPENEARSLOGGING: Setting the variables for the device and starting it.
2011-07-16 11:13:01.868 App[568:180b] OPENEARSLOGGING: Looping through ringbuffer sections and pre-allocating them.
2011-07-16 11:13:02.493 App[568:180b] OPENEARSLOGGING: Started audio output unit.
2011-07-16 11:13:02.497 App[568:180b] OPENEARSLOGGING: Calibration has started
2011-07-16 11:13:06.713 App[568:180b] OPENEARSLOGGING: Calibration has completed
2011-07-16 11:13:06.722 App[568:180b] OPENEARSLOGGING: Project has these words in its dictionary:
AROUND
AROUND(2)
BEHIND
BLINK
BOUNCE
BRUSH
BUCKLE
BUMP
OFF
ON
ON(2)
OPEN
OUT
PUT
READ
READ(2)
SEND
SHOWLIST
SLEEP
SPILL
SPLASH
STACK
STIR
SWEEP
SWING
TAKE
TALK
TIE
TO
TO(2)
TO(3)
TURN
UNBUCKLE
UNBUTTON
UNCOVER
UNDERLINE
UNFOLD
UNTIE
UNZIP
VACCUM
WAIT
WAKE
WALK
WALK(2)


2011-07-16 11:13:06.723 App[568:180b] OPENEARSLOGGING: Listening.
2011-07-16 11:13:06.715 App[568:307] :-)
2011-07-16 11:13:06.724 App[568:307] Did start listening
2011-07-16 11:14:04.907 App[568:307] /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.sqlite
2011-07-16 11:14:05.389 App[568:307] /var/mobile/Applications/8B771C61-EBC1-422E-91A7-B8FAA8A306C7/Documents/App.sqlite




This is the code where I play the video:


if(moviePath)
    {

        NSLog(@"Movie loading");
        //MPMoviePlayerController *moviePlayer;
        NSURL *movieURL = [NSURL fileURLWithPath:moviePath];

        moviePlayer=[[MPMoviePlayerController alloc] initWithContentURL:movieURL];
        moviePlayer.useApplicationAudioSession=NO;
        [[moviePlayer view] setFrame: CGRectMake(30,70,385,225)];
        [self.view  addSubview:moviePlayer.view];
        [moviePlayer setContentURL:movieURL];
        moviePlayer.shouldAutoplay=NO;

    }

Please help. Why am I not getting any output?

Thanks, Sachin

1 Answer

1 vote
/ July 16, 2011

I am the developer of OpenEars. You can't use AVPlayer, MPMoviePlayer, or MPMusicPlayerController at the same time as PocketsphinxController. You should be able to use them, stop them, release them, then reset the audio session settings with AudioSessionManager, and then create and start PocketsphinxController. As far as I can tell, too many of the audio session settings that the low-latency audio unit driver requires are automatically overridden by AVPlayer, MPMoviePlayer, and MPMusicPlayerController objects, and AudioSessionManager cannot set them back while those objects are still in use (which seems to mean: still instantiated). This is not a problem with AVAudioPlayer.

A lot of people are currently discussing this on the OpenEars forums, and I am in the early stages of figuring out what can be done about it, so you may want to stop by and add your data. That would help me make sure I give the right guidance on this in the docs and/or possibly find a workaround or the underlying bug.

EDIT: I found one issue that was making this not very reliable, and I have just released .912, which includes a fix. It is no longer necessary to release the media player objects. It is enough to stop their playback, run [audioSessionManager startAudioSession] a second time, and (only if it was interrupted, not necessarily in every case) restart the PocketsphinxController listening loop.
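
The sequence described in the edit could look roughly like the sketch below. It assumes OpenEars .912 or later and that `moviePlayer`, `audioSessionManager`, and `pocketsphinxController` already exist as instance variables. The method name `resumeListeningAfterMovie:` is hypothetical, and `startListening` is an assumed name for the call that restarts the listening loop, so check it against the PocketsphinxController header of the version in use.

```objc
// Sketch only, not a confirmed OpenEars recipe.
// Assumes moviePlayer (MPMoviePlayerController), audioSessionManager
// (AudioSessionManager) and pocketsphinxController (PocketsphinxController)
// are existing instance variables of this view controller.
- (void)resumeListeningAfterMovie:(BOOL)recognitionWasInterrupted
{
    // 1. Stop playback. With .912 the player does not need to be released.
    [moviePlayer stop];

    // 2. Start the audio session a second time so that OpenEars can
    //    reapply the settings the media player overrode.
    [audioSessionManager startAudioSession];

    // 3. Restart the listening loop only if it was actually interrupted.
    //    startListening is an assumed method name (see the note above).
    if (recognitionWasInterrupted) {
        [pocketsphinxController startListening];
    }
}
```

A natural place to call such a method would be a handler for the movie player's playback-finished notification, so that recognition resumes only once the video is done.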

...