Drumbeat/MoJo/hackfest/berlin/projects/MetaMetaProject
About The Project
Name: Meta Meta Project
Code Repository: On GitHub
The Meta Meta Project is a tool which provides a simple service: take in any piece of media and spit out all the metadata possible.
Project Status
- Much of the API is designed and documented.
- Much of the API is stubbed out in code, ready to have the "brains" inserted.
- Keyword extraction is implemented, in addition to a front-facing "test shell" which can easily be modified to show off the new features as they are added.
Collaborators
Oh so many folks at the hackfest helped in discussions, brainstorms, fleshing out the wish lists, and in some cases even in the code. Shout-outs in particular go to:
- [Raynor Vliegendhart] who helped design the Python server template and served as a spectacular Python resource.
- [Tathagata Dasgupta] who has been particularly enthusiastic about contributing his entity extraction work.
- [Mark Boas] who is going to be a key player in the incorporation of microformats transcription features.
- Laurian Gridinoc whose comments and advice helped shape the API design.
Next steps
There are some clear next steps:
- Continue fleshing out the API, particularly for Text and Audio formats.
- Continue to code the specific tasks in the API.
- Flesh out and possibly streamline the installation process.
- Encapsulate library includes so that, when setting up a server, it is possible to only set up specific portions (for instance maybe someone doesn't need the identify_keywords task so ideally they wouldn't have to install nltk).
- Design a "test script" which will make it clear what tasks are functional and what tasks don't have their dependencies properly installed
- Design new media type "Web Site" which will focus on component extraction (e.g. "identify_videos" "identify_content" etc.)
Places where this project might be tested include:
- Any project that hacks something together using media: the tool can be used (and contributed to) by folks in a newsroom, professionals in a company, or hobby coders.
Meta Standards Resources
(Add links and summaries to documents discussing metadata)
- rNews is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.
- Metafragments is a proposed metadata markup for audio and video. (Julien Dorra)
Known APIs and Tools
(Add links and summaries of toolkits and APIs which can help generate data!)
- http://m.vid.ly/user/ - won't generate metadata itself, but can help with format conversions
Desired Functionality
TEXT
Valid Inputs: URL, Plain Text, HTML
Optional Inputs: Known Metadata
Desired Metadata:
- Primary Themes (Document-wide)
- Primary Themes (Per-paragraph)
- Suggested Tags
- Entities (Names, Locations, Dates, Organizations) and their locations in text
- Author
- Publishing organization (if any)
- Date initially published and date last updated
- Names of people who are quoted
- Quotes
- Other texts cited and/or linked (books, articles, urls)
- All other numbers (that aren't dates) and their units (i.e. data points cited)
- Corrections
VIDEO
Valid Inputs: URL, Video (.mov, .mp4, VP8)
Optional Inputs: Transcript, Faces, Known Metadata
Desired Metadata:
- Transcript
- Moments of audio transition (new speaker)
- Moments of video transition (new scene)
- OCR data (any text that appears on image) and their timestamps
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Face identification and their timestamp ranges [only done if faces are provided]
- caption/summary
- author and job title
- headline
- keywords
- location
- date
- copyright
- news org name
- URL to related word story
AUDIO
Valid Inputs: URL, Audio (mp3, wav)
Optional Inputs: Transcript, Voice Samples, Known Metadata
Desired Metadata:
- Transcript
- Moments of audio transition (new speaker)
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Voice identification and their timestamp ranges [only done if voice samples are provided]
IMAGE
Valid Inputs: URL, Image (jpg, gif, bmp, png, tif)
Optional Inputs: Faces, Known Metadata
Desired Metadata:
- OCR data and its coordinate location
- Object identification
- Face identification [only done if faces are provided]
- Location identification
Embedded in the photo itself we may have:
- caption
- author and job title
- headline
- keywords
- location
- date
- copyright
- news org name
INTERACTIVE
Valid Inputs: URL
Optional Inputs: None
Desired Metadata: ???
WEB PAGE
Valid Inputs: URL
Optional Inputs: None
Desired Metadata:
- images
- audio
- videos
- content
- title
- author
- last update
- meta tags
API
The API should be as RESTful as possible. The current thinking is that POST will be used to upload the media item (if needed) and will return a Media Item ID (MIID); GET will be used to perform the actual analysis, taking in either an external URL or the MIID returned from a POST.
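That upload-then-analyze flow can be sketched from the client side. This is a sketch only: the host in `API_BASE` is a placeholder and the helper names are illustrative, though the field names (`text`, `url`, `ttl`, `miid`, `tasks`) follow the spec below.

```python
import json
from urllib.parse import urlencode

API_BASE = "http://localhost:8000/api"  # placeholder host, not a real deployment

def build_post_payload(text=None, url=None, ttl=180):
    """Build the POST body for /api/text: exactly one of text or url
    (file upload is omitted here) identifies the media item."""
    if (text is None) == (url is None):
        raise ValueError("provide exactly one of text or url")
    payload = {"ttl": ttl}
    payload["text" if text is not None else "url"] = text if text is not None else url
    return payload

def build_get_url(tasks, miid=None, url=None):
    """Build the GET analysis URL: a MIID from a prior POST, or an
    external URL, plus the tasks dictionary (JSON-encoded here)."""
    if (miid is None) == (url is None):
        raise ValueError("provide exactly one of miid or url")
    params = {"tasks": json.dumps(tasks)}
    if miid is not None:
        params["miid"] = miid
    else:
        params["url"] = url
    return API_BASE + "/text?" + urlencode(params)
```

For example, `build_get_url({"identify_keywords": {}}, miid=7)` produces an analysis URL against a previously uploaded item.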
Entity Types
- text
- image
- video
- audio
Text
URL: /api/text
POST
Inputs
- text_file:file // text file to store on the server
- url:str // url containing the text to store on the server
- text:str // text to store on the server
- ttl:int {D: 180} // number of seconds until the file will be removed from the system (0 means indefinitely)
Note: Either text_file, url, or text must be provided
Outputs
- miid:int // unique media item id assigned to this item
GET
Inputs
- miid:int // server-provided media item id to be analyzed
- url:str // url containing the text to be analyzed
- text:str // text to be analyzed
- tasks:dictionary // list of tasks to perform
- results:dictionary {D: null} // list of results from past tasks
Note: Either miid, url, or text must be provided
Outputs
- results:dictionary // list of task results (one result object per task).
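A request/response pair for this endpoint might look like the following. The shapes follow the spec above, but the specific task options and result values are purely illustrative.

```python
# Illustrative GET request parameters for /api/text analysis:
# an external URL plus the tasks dictionary.
request = {
    "url": "http://example.com/article.html",
    "tasks": {
        "identify_keywords": {"type": "document", "klen": 1, "kcount": 5},
        "identify_entities": {},
    },
}

# The server answers with one result object per requested task.
response = {
    "results": {
        "identify_keywords": {"document_keywords": ["media", "metadata"]},
        "identify_entities": {"entities": [[0, "Berlin", "location"]]},
    }
}
```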
Tasks
identify_entities
Identify entities (e.g. people, organizations, and locations) found in the text, either document wide, per paragraph, or both
Powered by [???]
Inputs
None
Outputs
- entities:array // array of [position, entity, type] tuples in the document
identify_keywords
Identify main keywords found in the text, either document wide, per paragraph, or both
Powered by nltk
Inputs
- type:enum('document', 'paragraph', 'both') {D: 'document'} // The scope of keywords to be extracted
- klen:int {D: 1} // The number of words per "keyword"
- kcount:int {D: 5} // The number of keywords to return
Outputs
- document_keywords:array // list of keywords for the entire document
- paragraph_keywords:array // list of keywords for each paragraph
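The real task is powered by nltk, but the frequency-based idea behind it can be sketched without the dependency. The stopword list and scoring below are simplified stand-ins, not the project's actual implementation; only the `klen`/`kcount` parameter names mirror the API spec.

```python
import re
from collections import Counter

# Toy stopword list; the real task would use nltk's corpus instead.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "it", "for"}

def identify_keywords(text, klen=1, kcount=5):
    """Return the kcount most frequent klen-word phrases, skipping
    phrases made entirely of stopwords (defaults mirror the API spec)."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = [
        " ".join(words[i:i + klen])
        for i in range(len(words) - klen + 1)
        if not all(w in STOPWORDS for w in words[i:i + klen])
    ]
    return [g for g, _ in Counter(grams).most_common(kcount)]
```

With `klen=2` the same function returns two-word phrases, matching the "number of words per keyword" knob in the spec.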
Video
URL: /api/video
POST
Inputs
- video_file:file // video file to store on the server
- url:str // url containing the video to store on the server
- ttl:int {D: 180} // number of seconds until the file will be removed from the system (0 means indefinitely)
Note: Either video_file or url must be provided
Outputs
- miid:int // unique media item id assigned to this item
GET
Inputs
- miid:int // server-provided media item id to be analyzed
- url:str // url containing the video to be analyzed
- tasks:dictionary // list of tasks to perform
- results:dictionary {D: null} // list of results from past tasks
Note: Either miid or url must be provided
Outputs
- results:dictionary // list of task results (one result object per task)
Tasks
identify_audio_transitions
Identify moments of distinct changes in audio content (e.g. speaker changes).
Powered by [???]
Inputs
None
Outputs
- audio_transitions:array // list of [HH:MM:SS, sound_id] tuples
identify_entities
Identify entities (e.g. people, organizations, and locations) found in the video transcript
Powered by [???]
Inputs
None
Outputs
- entities:array // array of [HH:MM:SS, entity, type] tuples in the document
identify_faces
Identify faces that appear in the video
Powered by [???]
Inputs
- sample_rate:int {D: 1} // number of frames per second to sample for analysis
Outputs
- faces:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], miid] tuples
identify_keywords
Identify main keywords found in the video, either video wide or per time segment
Powered by nltk
Inputs
- block_size:int {D: 0} // size of the time blocks in seconds (0 means entire video)
Outputs
- video_keywords:array // list of [start HH:MM:SS, [keywords]] tuples for each time block
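Grouping a timestamped transcript into fixed-size blocks, as `block_size` describes, can be sketched like this. The transcript format follows the transcribe task below; the helper names are illustrative.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' timestamp to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def bucket_transcript(transcript, block_size):
    """Group [HH:MM:SS, text] transcript tuples into block_size-second
    blocks; block_size == 0 treats the whole video as one block."""
    blocks = {}
    for hms, text in transcript:
        if block_size == 0:
            start = 0
        else:
            start = (hms_to_seconds(hms) // block_size) * block_size
        blocks.setdefault(start, []).append(text)
    return blocks
```

Keyword extraction would then run once per block, yielding the [start HH:MM:SS, [keywords]] tuples the spec describes.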
identify_video_transitions
Identify moments of distinct changes in video content (e.g. scene changes).
Powered by [???]
Inputs
None
Outputs
- video_transitions:array // list of [HH:MM:SS, scene_id] tuples
ocr
Attempt to extract any digital characters found in the video.
Powered by [???]
Inputs
- focus_blocks:array {D: null} // list of [x, y, h, w] boxes that contain specific segments of OCR
- sample_rate:int {D: 1} // number of frames per second to sample for analysis
Outputs
- ocr_results:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], string] tuples
transcribe
Attempt to create a timestamped transcript for the video. The transcript will either be ripped from CC data or estimated using speech to text algorithms.
Powered by [???]
Inputs
None
Outputs
- transcript:array // list of [HH:MM:SS, transcript] tuples
- transcription_method:enum('cc','stt') // method used to generate the transcript
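A consumer of this output might render the result into readable lines like so. This is a client-side sketch, not part of the API; the function name is illustrative.

```python
def render_transcript(result):
    """Render a transcribe result into readable lines, noting whether it
    came from closed captions ('cc') or speech-to-text ('stt')."""
    header = {"cc": "[closed captions]", "stt": "[speech-to-text estimate]"}
    lines = [header[result["transcription_method"]]]
    lines += ["%s  %s" % (ts, text) for ts, text in result["transcript"]]
    return "\n".join(lines)
```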
Audio
URL: /api/audio
Image
URL: /api/image