I have a .txt file with several lines that look like this:
> X 147010263 SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1),EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
> X 147010263 SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1),EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
Fields are separated by tabs, and the fourth field contains several comma-separated entries. I know I can split it with tr , '\n', which gives the following:
X 147010263 SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1)
EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
X 147010263 SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1)
EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1)
NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
But what I would like is the following:
X 147010263 SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
X 147010263 SNP EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA 11161.p1 NA A/A 77 A/A 87 A/C 97 A/C 0
X 147010263 SNP EXON(MODIFIER|||||FMR1||CODING|NR_033699.1|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
X 147010263 SNP EXON(MODIFIER|||||FMR1||CODING|NR_033700.1|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|516|FMR1||CODING|NM_001185081.1|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|537|FMR1||CODING|NM_001185075.1|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|586|FMR1||CODING|NM_001185082.1|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|611|FMR1||CODING|NM_001185076.1|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
X 147010263 SNP NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aaA/aaC|K119N|632|FMR1||CODING|NM_002024.5|5|1) NA NA 13829.p1 A/A 46 A/A 83 A/C 17 A/C 0
Note that the start of the line (X 147010263, the chromosome and position) can also differ, for example 3 41278119 or 4 114275304.
How can I achieve this?
Thanks!
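
One possible approach, sketched in Python under a few assumptions: the file really is tab-separated, the comma-separated list always sits in the fourth field, and no comma appears inside the parentheses of an annotation (as in the sample above). The names `expand_line`, `expand_lines`, `field_index` and `sep` are made up for this illustration. Because every line is handled on its own, the leading chromosome and position are copied as they are, whatever their values.

```python
def expand_line(line, field_index=3, sep="\t"):
    """Return one copy of `line` per comma-separated entry in the chosen field.

    field_index is zero-based, so 3 means the fourth field.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) <= field_index:
        # Line is too short to have the annotation field: keep it unchanged.
        return [sep.join(fields)]
    before = fields[:field_index]
    after = fields[field_index + 1:]
    return [
        sep.join(before + [entry] + after)
        for entry in fields[field_index].split(",")
    ]


def expand_lines(lines, field_index=3, sep="\t"):
    """Lazily expand every line of an iterable (for example an open file)."""
    for line in lines:
        for row in expand_line(line, field_index, sep):
            yield row
```

To use it on a file, iterate over `expand_lines(open("input.txt"))` and print each row; `input.txt` is a placeholder for the real file name.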