How do I extract path information?
2 votes
/ October 17, 2019

For the following JSON, I would like to extract something like this (<TAB> stands for the TAB character).

CHROMOSOMES<TAB>HUMAN<TAB>1<TAB>1
...
STATUSES<TAB>name<TAB>Approved
...
ATTRIBUTES<TAB>HGNC<TAB>HGNC ID<TAB>gd_hgnc_id
...
ATTRIBUTES<TAB>EXTERNAL<TAB>NCBI Gene ID<TAB>md_eg_id<TAB>NCBI
...
ORDER_BY<TAB>HGNC ID<TAB>gd_hgnc_id
...

I would like a smart way to extract the path information from this tree structure. Could you show me the best way to do this? Thanks.

{
  "CHROMOSOMES": {
    "HUMAN": [
      {
        "name": "1",
        "value": "1"
      },
      {
        "name": "2",
        "value": "2"
      },
      {
        "name": "3",
        "value": "3"
      },
      {
        "name": "4",
        "value": "4"
      },
      {
        "name": "5",
        "value": "5"
      },
      {
        "name": "6",
        "value": "6"
      },
      {
        "name": "7",
        "value": "7"
      },
      {
        "name": "8",
        "value": "8"
      },
      {
        "name": "9",
        "value": "9"
      },
      {
        "name": "10",
        "value": "10"
      },
      {
        "name": "11",
        "value": "11"
      },
      {
        "name": "12",
        "value": "12"
      },
      {
        "name": "13",
        "value": "13"
      },
      {
        "name": "14",
        "value": "14"
      },
      {
        "name": "15",
        "value": "15"
      },
      {
        "name": "16",
        "value": "16"
      },
      {
        "name": "17",
        "value": "17"
      },
      {
        "name": "18",
        "value": "18"
      },
      {
        "name": "19",
        "value": "19"
      },
      {
        "name": "20",
        "value": "20"
      },
      {
        "name": "21",
        "value": "21"
      },
      {
        "name": "22",
        "value": "22"
      },
      {
        "name": "X",
        "value": "X"
      },
      {
        "name": "Y",
        "value": "Y"
      },
      {
        "name": "reserved loci",
        "value": "reserved"
      },
      {
        "name": "mitochondrial",
        "value": "mito"
      },
      {
        "name": "pseudoautosomal",
        "value": "XandY"
      }
    ]
  },
  "STATUSES": [
    {
      "name": "Approved",
      "value": "Approved"
    },
    {
      "name": "Entry and symbol withdrawn",
      "value": "Entry Withdrawn"
    }
  ],
  "ATTRIBUTES": {
    "HGNC": [
      {
        "name": "HGNC ID",
        "value": "gd_hgnc_id"
      },
      {
        "name": "Approved symbol",
        "value": "gd_app_sym"
      },
      {
        "name": "Approved name",
        "value": "gd_app_name"
      },
      {
        "name": "Status",
        "value": "gd_status"
      },
      {
        "name": "Locus type",
        "value": "gd_locus_type"
      },
      {
        "name": "Locus group",
        "value": "gd_locus_group"
      },
      {
        "name": "Previous symbols",
        "value": "gd_prev_sym"
      },
      {
        "name": "Previous name",
        "value": "gd_prev_name"
      },
      {
        "name": "Synonyms",
        "value": "gd_aliases"
      },
      {
        "name": "Name synonyms",
        "value": "gd_name_aliases"
      },
      {
        "name": "Chromosome",
        "value": "gd_pub_chrom_map"
      },
      {
        "name": "Date approved",
        "value": "gd_date2app_or_res"
      },
      {
        "name": "Date modified",
        "value": "gd_date_mod"
      },
      {
        "name": "Date symbol changed",
        "value": "gd_date_sym_change"
      },
      {
        "name": "Date name changed",
        "value": "gd_date_name_change"
      },
      {
        "name": "Accession numbers",
        "value": "gd_pub_acc_ids"
      },
      {
        "name": "Enzyme IDs",
        "value": "gd_enz_ids"
      },
      {
        "name": "NCBI Gene ID",
        "value": "gd_pub_eg_id"
      },
      {
        "name": "Ensembl gene ID",
        "value": "gd_pub_ensembl_id"
      },
      {
        "name": "Mouse genome database ID",
        "value": "gd_mgd_id"
      },
      {
        "name": "Specialist database links",
        "value": "gd_other_ids"
      },
      {
        "name": "Specialist database IDs",
        "value": "gd_other_ids_list"
      },
      {
        "name": "Pubmed IDs",
        "value": "gd_pubmed_ids"
      },
      {
        "name": "RefSeq IDs",
        "value": "gd_pub_refseq_ids"
      },
      {
        "name": "Gene group ID",
        "value": "family.id"
      },
      {
        "name": "Gene group name",
        "value": "family.name"
      },
      {
        "name": "CCDS IDs",
        "value": "gd_ccds_ids"
      },
      {
        "name": "Vega IDs",
        "value": "gd_vega_ids"
      },
      {
        "name": "Locus specific databases",
        "value": "gd_lsdb_links"
      }
    ],
    "EXTERNAL": [
      {
        "name": "NCBI Gene ID",
        "source": "NCBI",
        "value": "md_eg_id"
      },
      {
        "name": "OMIM ID",
        "source": "OMIM",
        "value": "md_mim_id"
      },
      {
        "name": "RefSeq",
        "source": "NCBI",
        "value": "md_refseq_id"
      },
      {
        "name": "UniProt ID",
        "source": "UniProt",
        "value": "md_prot_id"
      },
      {
        "name": "Ensembl ID",
        "source": "Ensembl",
        "value": "md_ensembl_id"
      },
      {
        "name": "Vega ID",
        "source": "Vega",
        "value": "md_vega_id"
      },
      {
        "name": "UCSC ID",
        "source": "UCSC",
        "value": "md_ucsc_id"
      },
      {
        "name": "Mouse genome database ID",
        "source": "MGI",
        "value": "md_mgd_id"
      },
      {
        "name": "Rat genome database ID",
        "source": "RGD",
        "value": "md_rgd_id"
      },
      {
        "name": "LNCipedia",
        "source": "LNCipedia",
        "value": "md_lncipedia"
      },
      {
        "name": "GtRNAdb",
        "source": "GtRNAdb",
        "value": "md_gtrnadb"
      }
    ]
  },
  "ORDER_BY": [
    {
      "name": "HGNC ID",
      "value": "gd_hgnc_id"
    },
    {
      "name": "Approved symbol",
      "value": "gd_app_sym_sort"
    },
    {
      "name": "Approved name",
      "value": "gd_app_name"
    },
    {
      "name": "Status",
      "value": "gd_status"
    },
    {
      "name": "Locus type",
      "value": "gd_locus_type"
    },
    {
      "name": "Locus group",
      "value": "gd_locus_group"
    },
    {
      "name": "Previous symbols",
      "value": "gd_prev_sym"
    },
    {
      "name": "Previous name",
      "value": "gd_prev_name"
    },
    {
      "name": "Synonyms",
      "value": "gd_aliases"
    },
    {
      "name": "Name synonyms",
      "value": "gd_name_aliases"
    },
    {
      "name": "Chromosome",
      "value": "gd_pub_chrom_map_sort"
    },
    {
      "name": "Date approved",
      "value": "gd_date2app_or_res"
    },
    {
      "name": "Date modified",
      "value": "gd_date_mod"
    },
    {
      "name": "Date symbol changed",
      "value": "gd_date_sym_change"
    },
    {
      "name": "Date name changed",
      "value": "gd_date_name_change"
    },
    {
      "name": "Accession numbers",
      "value": "gd_pub_acc_ids"
    },
    {
      "name": "Enzyme IDs",
      "value": "gd_enz_ids"
    },
    {
      "name": "NCBI Gene ID",
      "value": "gd_pub_eg_id"
    },
    {
      "name": "Ensembl gene ID",
      "value": "gd_pub_ensembl_id"
    },
    {
      "name": "Mouse genome database ID",
      "value": "gd_mgd_id"
    },
    {
      "name": "Specialist database links",
      "value": "gd_other_ids"
    },
    {
      "name": "Specialist database IDs",
      "value": "gd_other_ids_list"
    },
    {
      "name": "Pubmed IDs",
      "value": "gd_pubmed_ids"
    },
    {
      "name": "RefSeq IDs",
      "value": "gd_pub_refseq_ids"
    },
    {
      "name": "Gene group ID",
      "value": "family.id"
    },
    {
      "name": "Gene group name",
      "value": "family.name"
    },
    {
      "name": "CCDS IDs",
      "value": "gd_ccds_ids"
    },
    {
      "name": "Vega IDs",
      "value": "gd_vega_ids"
    },
    {
      "name": "Locus specific databases",
      "value": "gd_lsdb_links"
    },
    {
      "name": "NCBI Gene ID (supplied by NCBI)",
      "value": "md_eg_id"
    },
    {
      "name": "OMIM ID (supplied by OMIM)",
      "value": "md_mim_id"
    },
    {
      "name": "RefSeq (supplied by NCBI)",
      "value": "md_refseq_id"
    },
    {
      "name": "UniProt ID (supplied by UniProt)",
      "value": "md_prot_id"
    },
    {
      "name": "Ensembl ID (supplied by Ensembl)",
      "value": "md_ensembl_id"
    },
    {
      "name": "Vega ID (supplied by Vega)",
      "value": "md_vega_id"
    },
    {
      "name": "UCSC ID (supplied by UCSC)",
      "value": "md_ucsc_id"
    },
    {
      "name": "Mouse genome database ID (supplied by MGI)",
      "value": "md_mgd_id"
    },
    {
      "name": "Rat genome database ID (supplied by RGD)",
      "value": "md_rgd_id"
    },
    {
      "name": "LNCipedia ID (supplied by LNCipedia)",
      "value": "md_lncipedia"
    },
    {
      "name": "GtRNAdb ID (supplied by GtRNAdb)",
      "value": "md_gtrnadb"
    }
  ],
  "OUTPUT": [
    "Text",
    "Make URL for text"
  ]
}

1 Answer

1 vote
/ October 17, 2019

I would like a smart way to extract the path information from this tree structure.

paths is your friend.
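
To make this concrete: jq's paths emits, for every value nested inside the input, the array of object keys and array indices that leads to it. The Python sketch below mimics that traversal on a one-item excerpt of the question's data. The function name iter_paths and the tiny document are illustrative assumptions and are not part of jq.

```python
def iter_paths(node, prefix=()):
    """Yield the path (tuple of keys and indices) of every nested value, pre-order."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        return  # scalars have no children, so there is nothing to descend into
    for key, child in children:
        path = prefix + (key,)
        yield path                          # the path to this child ...
        yield from iter_paths(child, path)  # ... followed by everything below it


# Hypothetical one-item excerpt of the question's input.
doc = {"CHROMOSOMES": {"HUMAN": [{"name": "1", "value": "1"}]}}
paths = list(iter_paths(doc))
for p in paths:
    print(p)
```

The third path printed is ('CHROMOSOMES', 'HUMAN', 0). Dropping the integer index from it leaves CHROMOSOMES and HUMAN, which is the job of the helper s in the filter below.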

Given certain irregularities in the input data, the exact requirements are not entirely clear, but the following may be what you are looking for. Even if it is not, it should be easy to adapt to your detailed requirements.

totsv.jq

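# Keep only the string components of a path (drop array indices).
def s: map(select(type=="string"));

# Emit one TSV row per name/value object (with source, if present)
# and one for the array of plain strings that starts with "Text".
paths as $p
| getpath($p)
| if type == "object" and has("name")
  then ($p|s) + [.name, .value, (.source // empty)]
  elif type == "array" and .[0] == "Text" then ($p|s) + .
  else empty
  end
| @tsv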

Invocation

jq -crf totsv.jq chromosomes.json

Excerpt from the output

CHROMOSOMES HUMAN   1   1
CHROMOSOMES HUMAN   2   2
...
STATUSES    Approved    Approved
STATUSES    Entry and symbol withdrawn  Entry Withdrawn
ATTRIBUTES  HGNC    HGNC ID gd_hgnc_id
...
ORDER_BY    GtRNAdb ID (supplied by GtRNAdb)    md_gtrnadb
OUTPUT  Text    Make URL for text

For future reference

Rather than giving a very long sample input, it would be better to give a small sample that is closely tied to the detailed requirements.
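
As an illustration of what such a small sample could look like, the sketch below pairs a three-entry input with the rows expected from it, one entry per distinct shape in the data (a plain name/value item, an item with a source, and the array of plain strings). The sample, the function name rows, and the Python re-expression of the filter above are assumptions made for illustration. The jq filter remains the actual answer.

```python
import json

# Hypothetical minimal input: one entry per distinct shape in the full data.
SAMPLE = json.loads("""
{
  "CHROMOSOMES": {"HUMAN": [{"name": "1", "value": "1"}]},
  "ATTRIBUTES": {"EXTERNAL": [
    {"name": "NCBI Gene ID", "source": "NCBI", "value": "md_eg_id"}
  ]},
  "OUTPUT": ["Text", "Make URL for text"]
}
""")


def rows(node, prefix=()):
    """Yield one list of TSV fields per matching node, mirroring totsv.jq.

    Unlike jq's paths, this also inspects the root node; that is harmless
    here because the root object has no "name" key.
    """
    if isinstance(node, dict):
        if "name" in node:
            row = list(prefix) + [node["name"], node.get("value", "")]
            if node.get("source") is not None:
                row.append(node["source"])  # only EXTERNAL items carry a source
            yield row
        for key, child in node.items():
            yield from rows(child, prefix + (key,))
    elif isinstance(node, list):
        if node and node[0] == "Text":
            yield list(prefix) + node
        for child in node:
            # Array indices are not added to the prefix, like the helper s.
            yield from rows(child, prefix)


lines = ["\t".join(r) for r in rows(SAMPLE)]
print("\n".join(lines))
```

With a sample this size, the three expected rows can be written out by hand next to the input and compared with the output directly.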

...