I have a sharded MongoDB cluster running on Docker. It has 6 shards, each a replica set (PSA: primary, secondary, arbiter), a config server replica set, and 2 mongos servers.
When I stop or remove a shard's primary container, the secondary becomes primary for a short time and then goes back to secondary. I can never reach the databases, and I always get the error below:
**Unable to reach primary for set shard01**
The logs of the secondary and arbiter containers are below. Could you please help me?
Thanks.
The secondary container's log is below:
2020-07-14T14:06:39.446+0300 I COMMAND [conn15] Received replSetStepUp request
2020-07-14T14:06:39.446+0300 I ELECTION [conn15] Starting an election due to step up request
2020-07-14T14:06:39.446+0300 I ELECTION [conn15] skipping dry run and running for election in term 2
2020-07-14T14:06:39.448+0300 I REPL [replexec-1145] Scheduling remote command request for vote request: RemoteCommand 940662 -- target:srd01-primary:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "shard01", dryRun: false, term: 2, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:06:39.448+0300 I REPL [replexec-1145] Scheduling remote command request for vote request: RemoteCommand 940663 -- target:srd01-arbiter:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "shard01", dryRun: false, term: 2, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:06:39.449+0300 I ELECTION [replexec-1151] VoteRequester(term 2) received a no vote from srd01-arbiter:27017 with reason "can see a healthy primary (srd01-primary:27017) of equal or greater priority"; response message: { term: 2, voteGranted: false, reason: "can see a healthy primary (srd01-primary:27017) of equal or greater priority", ok: 1.0 }
2020-07-14T14:06:39.450+0300 I ELECTION [replexec-1155] VoteRequester(term 2) received an invalid response from srd01-primary:27017: ShutdownInProgress: In the process of shutting down; response message: { operationTime: Timestamp(1594724796, 1), ok: 0.0, errmsg: "In the process of shutting down", code: 91, codeName: "ShutdownInProgress", $gleStats: { lastOpTime: Timestamp(0, 0), electionId: ObjectId('7fffffff0000000000000001') }, lastCommittedOpTime: Timestamp(1594724796, 1), $configServerState: { opTime: { ts: Timestamp(1594724798, 2), t: 1 } }, $clusterTime: { clusterTime: Timestamp(1594724798, 2), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } } }
2020-07-14T14:06:39.450+0300 I ELECTION [replexec-1155] not becoming primary, we received insufficient votes
2020-07-14T14:06:39.450+0300 I ELECTION [replexec-1155] Lost election due to internal error
2020-07-14T14:06:39.450+0300 I COMMAND [conn15] replSetStepUp request failed :: caused by :: CommandFailed: Election failed.
2020-07-14T14:06:39.780+0300 I NETWORK [conn15] end connection 10.0.41.4:58820 (22 connections now open)
2020-07-14T14:06:39.789+0300 I NETWORK [conn38] end connection 10.0.41.4:59336 (21 connections now open)
2020-07-14T14:06:39.802+0300 I REPL [replication-2] Restarting oplog query due to error: InterruptedAtShutdown: error in fetcher batch callback :: caused by :: interrupted at shutdown. Last fetched optime: { ts: Timestamp(1594724796, 1), t: 1 }. Restarts remaining: 1
2020-07-14T14:06:39.803+0300 I REPL [replication-2] Scheduled new oplog query Fetcher source: srd01-primary:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1594724796, 1) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 2, readConcern: { afterClusterTime: Timestamp(0, 1) } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 940664 -- target:srd01-primary:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1594724796, 1) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 2, readConcern: { afterClusterTime: Timestamp(0, 1) } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: {type: "NoRetryPolicy"}
2020-07-14T14:06:39.803+0300 I REPL [replication-1] Error returned from oplog query (no more query restarts left): InterruptedAtShutdown: error in fetcher batch callback :: caused by :: interrupted at shutdown
2020-07-14T14:06:39.803+0300 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InterruptedAtShutdown: error in fetcher batch callback :: caused by :: interrupted at shutdown
2020-07-14T14:06:39.803+0300 I REPL [rsBackgroundSync] Clearing sync source srd01-primary:27017 to choose a new one.
2020-07-14T14:06:39.804+0300 I REPL [rsBackgroundSync] could not find member to sync from
2020-07-14T14:06:39.807+0300 I REPL_HB [replexec-1156] Heartbeat to srd01-primary:27017 failed after 2 retries, response status: InterruptedAtShutdown: interrupted at shutdown
2020-07-14T14:06:39.807+0300 I REPL [replexec-1156] Member srd01-primary:27017 is now in state RS_DOWN - interrupted at shutdown
2020-07-14T14:06:40.307+0300 I CONNPOOL [replexec-1155] dropping unhealthy pooled connection to srd01-primary:27017
2020-07-14T14:06:40.308+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:06:40.313+0300 I REPL_HB [replexec-1155] Heartbeat to srd01-primary:27017 failed after 2 retries, response status: HostUnreachable: Error connecting to srd01-primary:27017 (10.0.41.29:27017) :: caused by :: Connection refused
2020-07-14T14:06:41.222+0300 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to srd01-primary:27017: InvalidSyncSource: Sync source was cleared. Was srd01-primary:27017
2020-07-14T14:06:56.613+0300 I CONNPOOL [ReplicaSetMonitor-TaskExecutor] dropping unhealthy pooled connection to srd01-primary:27017
2020-07-14T14:06:56.613+0300 I CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to srd01-primary:27017
2020-07-14T14:07:00.813+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:07:01.025+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:07:01.026+0300 I ELECTION [replexec-1156] Starting an election, since we've seen no PRIMARY in the past 10000ms
2020-07-14T14:07:01.026+0300 I ELECTION [replexec-1156] conducting a dry run election to see if we could be elected. current term: 2
2020-07-14T14:07:01.026+0300 I REPL [replexec-1156] Scheduling remote command request for vote request: RemoteCommand 940693 -- target:srd01-primary:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "shard01", dryRun: true, term: 2, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:07:01.026+0300 I REPL [replexec-1156] Scheduling remote command request for vote request: RemoteCommand 940694 -- target:srd01-arbiter:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "shard01", dryRun: true, term: 2, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:07:01.026+0300 I CONNPOOL [Replication] Connecting to srd01-arbiter:27017
2020-07-14T14:07:01.027+0300 I ELECTION [replexec-1157] VoteRequester(term 2 dry run) received a yes vote from srd01-arbiter:27017; response message: { term: 2, voteGranted: true, reason: "", ok: 1.0 }
2020-07-14T14:07:01.027+0300 I ELECTION [replexec-1157] dry election run succeeded, running for election in term 3
2020-07-14T14:07:01.028+0300 I REPL [replexec-1157] Scheduling remote command request for vote request: RemoteCommand 940695 -- target:srd01-primary:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "shard01", dryRun: false, term: 3, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:07:01.028+0300 I REPL [replexec-1157] Scheduling remote command request for vote request: RemoteCommand 940696 -- target:srd01-arbiter:27017 db:admin cmd:{ replSetRequestVotes: 1, setName: "shard01", dryRun: false, term: 3, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:07:01.029+0300 I COMMAND [conn42] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4415ms
2020-07-14T14:07:01.029+0300 I COMMAND [conn26] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4270ms
2020-07-14T14:07:01.029+0300 I COMMAND [conn25] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4266ms
2020-07-14T14:07:01.030+0300 I COMMAND [conn28] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4210ms
2020-07-14T14:07:01.030+0300 I COMMAND [conn27] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4209ms
2020-07-14T14:07:01.030+0300 I COMMAND [conn29] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4172ms
2020-07-14T14:07:01.030+0300 I COMMAND [conn30] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4145ms
2020-07-14T14:07:01.031+0300 I COMMAND [conn53] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4092ms
2020-07-14T14:07:01.031+0300 I COMMAND [conn54] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4092ms
2020-07-14T14:07:01.031+0300 I COMMAND [conn33] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 3975ms
2020-07-14T14:07:01.031+0300 I COMMAND [conn34] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 3974ms
2020-07-14T14:07:01.031+0300 I COMMAND [conn17] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 3900ms
2020-07-14T14:07:01.032+0300 I COMMAND [conn35] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 3760ms
2020-07-14T14:07:01.033+0300 I COMMAND [conn24] command admin.$cmd command: isMaster { isMaster: 1, $db: "admin" } numYields:0 reslen:908 locks:{} protocol:op_msg 4345ms
2020-07-14T14:07:01.037+0300 I ELECTION [replexec-1145] VoteRequester(term 3) received a yes vote from srd01-arbiter:27017; response message: { term: 3, voteGranted: true, reason: "", ok: 1.0 }
2020-07-14T14:07:01.037+0300 I ELECTION [replexec-1145] election succeeded, assuming primary role in term 3
2020-07-14T14:07:01.037+0300 I REPL [replexec-1145] transition to PRIMARY from SECONDARY
2020-07-14T14:07:01.037+0300 I REPL [replexec-1145] Resetting sync source to empty, which was :27017
2020-07-14T14:07:01.037+0300 I REPL [replexec-1145] Entering primary catch-up mode.
2020-07-14T14:07:01.613+0300 I NETWORK [ReplicaSetMonitor-TaskExecutor] Marking host srd01-primary:27017 as failed :: caused by :: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2020-07-14T14:07:01.613+0300 W NETWORK [ReplicaSetMonitor-TaskExecutor] Unable to reach primary for set shard01
The arbiter container's log is below:
2020-07-14T14:06:39.448+0300 I ELECTION [conn10] Sending vote response: { term: 2, voteGranted: false, reason: "can see a healthy primary (srd01-primary:27017) of equal or greater priority" }
2020-07-14T14:06:39.780+0300 I NETWORK [conn3] end connection 10.0.41.4:38140 (1 connection now open)
2020-07-14T14:06:40.950+0300 I CONNPOOL [replexec-1394] dropping unhealthy pooled connection to srd01-primary:27017
2020-07-14T14:06:40.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:07:00.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:07:01.022+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:07:01.024+0300 I REPL [replexec-1395] Member srd01-primary:27017 is now in state RS_DOWN - Couldn't get a connection within the time limit
2020-07-14T14:07:01.027+0300 I ELECTION [conn10] Received vote request: { replSetRequestVotes: 1, setName: "shard01", dryRun: true, term: 2, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:07:01.027+0300 I ELECTION [conn10] Sending vote response: { term: 2, voteGranted: true, reason: "" }
2020-07-14T14:07:01.029+0300 I ELECTION [conn10] Received vote request: { replSetRequestVotes: 1, setName: "shard01", dryRun: false, term: 3, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:07:01.029+0300 I ELECTION [conn10] Sending vote response: { term: 3, voteGranted: true, reason: "" }
2020-07-14T14:07:20.851+0300 I NETWORK [listener] connection accepted from 10.0.41.4:54514 #12 (2 connections now open)
2020-07-14T14:07:20.851+0300 I NETWORK [conn12] received client metadata from 10.0.41.4:54514 conn12: { driver: { name: "NetworkInterfaceTL", version: "4.2.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2020-07-14T14:07:20.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:07:22.023+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:07:22.025+0300 I - [conn12] operation was interrupted because a client disconnected
2020-07-14T14:07:22.025+0300 W COMMAND [conn12] Unable to gather storage statistics for a slow operation due to lock aquire timeout
2020-07-14T14:07:22.026+0300 I COMMAND [conn12] command admin.$cmd command: isMaster { isMaster: 1, client: { driver: { name: "NetworkInterfaceTL", version: "4.2.2" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }, compression: [ "snappy", "zstd", "zlib" ], internalClient: { minWireVersion: 8, maxWireVersion: 8 }, hangUpOnStepDown: false, saslSupportedMechs: "local.__system", $db: "admin" } numYields:0 reslen:685 locks:{} protocol:op_query 1174ms
2020-07-14T14:07:22.026+0300 I NETWORK [conn12] end connection 10.0.41.4:54514 (1 connection now open)
2020-07-14T14:07:22.027+0300 I REPL [replexec-1394] Member srd01-secondary:27017 is now in state PRIMARY
2020-07-14T14:07:40.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:07:43.023+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:07:43.036+0300 I ELECTION [conn10] Received vote request: { replSetRequestVotes: 1, setName: "shard01", dryRun: true, term: 3, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:07:43.036+0300 I ELECTION [conn10] Sending vote response: { term: 3, voteGranted: true, reason: "" }
2020-07-14T14:07:43.038+0300 I REPL [replexec-1395] Member srd01-secondary:27017 is now in state SECONDARY
2020-07-14T14:07:43.038+0300 I ELECTION [conn10] Received vote request: { replSetRequestVotes: 1, setName: "shard01", dryRun: false, term: 4, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:07:43.038+0300 I ELECTION [conn10] Sending vote response: { term: 4, voteGranted: true, reason: "" }
2020-07-14T14:08:00.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:08:04.022+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:08:20.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:08:25.024+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:08:25.027+0300 I ELECTION [conn10] Received vote request: { replSetRequestVotes: 1, setName: "shard01", dryRun: true, term: 4, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:08:25.027+0300 I ELECTION [conn10] Sending vote response: { term: 4, voteGranted: true, reason: "" }
2020-07-14T14:08:25.032+0300 I ELECTION [conn10] Received vote request: { replSetRequestVotes: 1, setName: "shard01", dryRun: false, term: 5, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:08:25.032+0300 I ELECTION [conn10] Sending vote response: { term: 5, voteGranted: true, reason: "" }
2020-07-14T14:08:40.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:08:46.025+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:09:00.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:09:05.197+0300 I CONTROL [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: sharding state is not yet initialized
2020-07-14T14:09:05.199+0300 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: sharding state is not yet initialized
2020-07-14T14:09:07.024+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:09:07.026+0300 I ELECTION [conn10] Received vote request: { replSetRequestVotes: 1, setName: "shard01", dryRun: true, term: 5, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:09:07.026+0300 I ELECTION [conn10] Sending vote response: { term: 5, voteGranted: true, reason: "" }
2020-07-14T14:09:07.027+0300 I ELECTION [conn10] Received vote request: { replSetRequestVotes: 1, setName: "shard01", dryRun: false, term: 6, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp(1594724796, 1), t: 1 } }
2020-07-14T14:09:07.027+0300 I ELECTION [conn10] Sending vote response: { term: 6, voteGranted: true, reason: "" }
2020-07-14T14:09:20.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:09:28.026+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:09:28.028+0300 I REPL [replexec-1394] Member srd01-secondary:27017 is now in state PRIMARY
2020-07-14T14:09:40.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:09:49.024+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:10:00.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
2020-07-14T14:10:10.022+0300 I NETWORK [ftdc] getaddrinfo("srd01-primary") failed: Temporary failure in name resolution
2020-07-14T14:10:20.950+0300 I CONNPOOL [Replication] Connecting to srd01-primary:27017
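
To make the flapping easier to follow, the member state changes and vote requests can be pulled out of a log like the ones above with a small script. This is only a minimal sketch: it assumes the plain-text log line format shown in this post (timestamp, severity, component, context in brackets, message), and the function name `summarize` and the regular expressions are my own, not part of any MongoDB tooling.

```python
import re

# Plain-text log prefix as seen in the logs above:
# timestamp, severity letter, component, [context], then the message.
LINE_RE = re.compile(r"^(\S+)\s+([A-Z])\s+(\S+)\s+\[([^\]]+)\]\s+(.*)$")
# e.g. "Member srd01-secondary:27017 is now in state PRIMARY"
STATE_RE = re.compile(r"Member (\S+) is now in state (\S+)")
# e.g. "Received vote request: { ..., dryRun: true, term: 3, ... }"
VOTE_RE = re.compile(r"Received vote request: .*dryRun: (true|false), term: (\d+)")


def summarize(lines):
    """Return a list of (timestamp, kind, detail, value) tuples.

    kind is "state" (detail = member, value = new state) or
    "vote" (detail = "dry-run" or "real", value = term as int).
    Lines that match neither pattern are skipped.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, _severity, _component, _context, msg = m.groups()
        state = STATE_RE.search(msg)
        if state:
            events.append((ts, "state", state.group(1), state.group(2)))
            continue
        vote = VOTE_RE.search(msg)
        if vote:
            kind = "dry-run" if vote.group(1) == "true" else "real"
            events.append((ts, "vote", kind, int(vote.group(2))))
    return events


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_log.py < arbiter.log
    for event in summarize(sys.stdin):
        print(*event)
```

Run against the arbiter log, this keeps only the lines where `srd01-secondary` changes between PRIMARY and SECONDARY and the vote requests with their terms, so the repeated elections (terms 3, 4, 5, 6) can be read in order.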