Text Analysis API

The purpose of this API is to provide various text analytics tools. We are starting with a lemmatizer for 10 languages, plus term frequency, term density, the automated readability index, and more.

Access is provided through the RapidAPI service.

Connect on RapidAPI

Description

The purpose of this API is to provide various text analytics tools. We are starting with a lemmatizer for Bulgarian, Czech, English, Estonian, French, Hungarian, Romanian, Slovak, Slovene, and Ukrainian with automatic language detection, together with various statistics such as term frequency, term density, automated readability index, reading time estimate, and speaking time estimate.
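The API does not document exactly how it computes the automated readability index (ARI). The usual formula is 4.71 × characters/words + 0.5 × words/sentences − 21.43, rounded up. The Python sketch below applies it to the terms statistics in the example response further down (512 characters, 91 tokens, 3 sentences). That gives a raw score of about 20.2, while the API reports "index": 14 with level "Professor". This is consistent with capping the score at the top band of the ARI table, but both the cap and the choice of counts are assumptions.

```python
import math

def automated_readability_index(chars, words, sentences, cap=14):
    """Standard ARI formula, rounded up.

    Capping at 14 (the top "Professor" band of the usual ARI table) is an
    assumption inferred from the example response, not documented API behaviour.
    Returns (capped_index, raw_score).
    """
    raw = 4.71 * chars / words + 0.5 * words / sentences - 21.43
    return min(math.ceil(raw), cap), raw

# Counts taken from stats.terms and stats.sentences in the example response.
index, raw = automated_readability_index(512, 91, 3)
```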

The lemmatizer is based on the LemmaGen project and uses lexicons derived from the MULTEXT-East free dictionaries, published under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. The source dictionaries contain triplets of wordform, lemma, and morphosyntactic description (MSD). These triplets are used to generate the lemmatization rules (Ripple Down Rules), so the quality of the derived lexicons depends on the quality of the source triplets.
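LemmaGen's Ripple Down Rules are more elaborate than this. Still, the basic idea of turning (wordform, lemma) pairs into suffix-replacement rules can be sketched in Python. This is a simplified longest-suffix illustration, not LemmaGen's RDR algorithm. The triplets and their "msd" placeholders are made up, and every function name here is hypothetical.

```python
from collections import Counter, defaultdict

def suffix_rule(wordform, lemma):
    """Return (strip, append): the wordform ending to remove and the lemma ending to add."""
    i = 0
    # Length of the longest common prefix of wordform and lemma.
    while i < min(len(wordform), len(lemma)) and wordform[i] == lemma[i]:
        i += 1
    return wordform[i:], lemma[i:]

def learn_rules(triplets):
    """Map each observed word ending to its most frequent (strip, append) rule."""
    votes = defaultdict(Counter)
    for wordform, lemma, _msd in triplets:  # the MSD is ignored in this sketch
        strip, append = suffix_rule(wordform, lemma)
        # Key the rule on the stripped suffix plus one character of left context.
        ending = wordform[-(len(strip) + 1):]
        votes[ending][(strip, append)] += 1
    return {ending: c.most_common(1)[0][0] for ending, c in votes.items()}

def lemmatize(word, rules):
    """Apply the rule whose ending is the longest suffix of the word; else return the word."""
    for k in range(len(word), 0, -1):
        rule = rules.get(word[-k:])
        if rule is not None:
            strip, append = rule
            return word[: len(word) - len(strip)] + append
    return word

# Made-up training triplets (wordform, lemma, MSD placeholder).
TRIPLETS = [
    ("mining", "mine", "msd"),
    ("dining", "dine", "msd"),
    ("walking", "walk", "msd"),
    ("sources", "source", "msd"),
]
```

For example, `lemmatize("shining", learn_rules(TRIPLETS))` picks the rule learned for the ending "ning" and returns "shine". Any wrong rule in the training data would propagate the same way, which is why lexicon quality matters.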

NOTE: The lemmatizer is not context-aware. The text is first tokenized, then each token is analyzed individually with LemmaGen and its lemma is returned. For example, "mining" in "text mining" is lemmatized to "mine" in the response below.
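Here is a minimal Python sketch of per-token lemmatization. The `\w+` tokenizer, the `LEXICON` table and the function names are hypothetical stand-ins for LemmaGen. Only the overall shape follows the note above: tokenize first, then look up each token on its own, so neighbouring words never change a lemma.

```python
import re

# Hypothetical lexicon for illustration only; the API derives its rules with LemmaGen.
LEXICON = {"mining": "mine", "is": "be", "used": "use", "techniques": "technique"}

def tokenize(text):
    """Split on non-word characters, so "life-sciences" yields two tokens."""
    return re.findall(r"\w+", text)

def lemmatize_tokens(text):
    """Return (token, lemma) pairs; each token is looked up in isolation."""
    return [(tok, LEXICON.get(tok, tok)) for tok in tokenize(text)]
```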

Disclaimer

API Hood does not warrant that the provided data will be free from errors or omissions, because the lexicons used vary in quality in terms of completeness and precision.

Example

Request

curl --request POST \
     --include \
     --url https://texts.p.rapidapi.com/texts/analyze \
     --header 'accept: application/json' \
     --header 'content-type: application/json' \
     --header 'x-rapidapi-host: texts.p.rapidapi.com' \
     --header 'x-rapidapi-key: YOUR_API_KEY' \
     --data '{
       "text": "The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation. The term is roughly synonymous with text mining; indeed, Ronen Feldman modified a 2000 description of \"text mining\" in 2004 to describe \"text analytics\". The latter term is now used more frequently in business settings while \"text mining\" is used in some of the earliest application areas, dating to the 1980s, notably life-sciences research and government intelligence.",
       "language": "en",
       "rules": ["term => 🐜"]
     }'
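The same request can be built from Python with only the standard library. This sketch builds the request without sending it. The helper name `build_analyze_request` is hypothetical. The URL and headers are copied from the curl example, and `language` is left out when not given so that the API can detect it.

```python
import json
import urllib.request

API_URL = "https://texts.p.rapidapi.com/texts/analyze"

def build_analyze_request(text, api_key, language=None, rules=None):
    """Build (but do not send) a POST request equivalent to the curl example."""
    payload = {"text": text}
    if language is not None:
        payload["language"] = language  # omit to let the API detect the language
    if rules is not None:
        payload["rules"] = rules
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "accept": "application/json",
            "content-type": "application/json",
            "x-rapidapi-host": "texts.p.rapidapi.com",
            "x-rapidapi-key": api_key,
        },
        method="POST",
    )

# Sending it (requires a valid key and network access):
# with urllib.request.urlopen(build_analyze_request("...", "YOUR_API_KEY", "en")) as resp:
#     result = json.load(resp)
```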

Response

HTTP/1.1 200 OK
Cache-Control: max-age=0, private, must-revalidate
Content-Type: application/json; charset=utf-8
Date: Mon, 05 Aug 2019 15:20:11 GMT
Server: RapidAPI-1.0.27
X-RapidAPI-Region: AWS - eu-central-1
X-RapidAPI-Version: 1.0.27
X-Request-Id: FbgP9AP4gpeDUzAAADIh
X-Request-Took-Ms: 43.829
Content-Length: 6371
Connection: keep-alive

{
    "disclaimer": "API Hood does not warrant that the provided data will be free from errors or omissions, because used lexicons have a different quality in terms of completeness and precision.",
    "language": {
        "code": "en",
        "detected": false,
        "probability": 1
    },
    "tokens": [
        {
            "lemma": "The",
            "position": 0,
            "range": [
                0,
                3
            ],
            "term": "The"
        },
        {
            "lemma": "🐜",
            "position": 1,
            "range": [
                4,
                8
            ],
            "term": "term"
        },
        {
            "lemma": "text",
            "position": 2,
            "range": [
                9,
                13
            ],
            "term": "text"
        },
        {
            "lemma": "analytic",
            "position": 3,
            "range": [
                14,
                23
            ],
            "term": "analytics"
        },
        {
            "lemma": "describe",
            "position": 4,
            "range": [
                24,
                33
            ],
            "term": "describes"
        },
        {
            "lemma": "a",
            "position": 5,
            "range": [
                34,
                35
            ],
            "term": "a"
        },
        {
            "lemma": "set",
            "position": 6,
            "range": [
                36,
                39
            ],
            "term": "set"
        },
        {
            "lemma": "of",
            "position": 7,
            "range": [
                40,
                42
            ],
            "term": "of"
        },
        {
            "lemma": "linguistic",
            "position": 8,
            "range": [
                43,
                53
            ],
            "term": "linguistic"
        },
        {
            "lemma": "statistical",
            "position": 9,
            "range": [
                55,
                66
            ],
            "term": "statistical"
        },
        {
            "lemma": "and",
            "position": 10,
            "range": [
                68,
                71
            ],
            "term": "and"
        },
        {
            "lemma": "machine",
            "position": 11,
            "range": [
                72,
                79
            ],
            "term": "machine"
        },
        {
            "lemma": "learn",
            "position": 12,
            "range": [
                80,
                88
            ],
            "term": "learning"
        },
        {
            "lemma": "technique",
            "position": 13,
            "range": [
                89,
                99
            ],
            "term": "techniques"
        },
        {
            "lemma": "that",
            "position": 14,
            "range": [
                100,
                104
            ],
            "term": "that"
        },
        {
            "lemma": "model",
            "position": 15,
            "range": [
                105,
                110
            ],
            "term": "model"
        },
        {
            "lemma": "and",
            "position": 16,
            "range": [
                111,
                114
            ],
            "term": "and"
        },
        {
            "lemma": "structure",
            "position": 17,
            "range": [
                115,
                124
            ],
            "term": "structure"
        },
        {
            "lemma": "the",
            "position": 18,
            "range": [
                125,
                128
            ],
            "term": "the"
        },
        {
            "lemma": "information",
            "position": 19,
            "range": [
                129,
                140
            ],
            "term": "information"
        },
        {
            "lemma": "content",
            "position": 20,
            "range": [
                141,
                148
            ],
            "term": "content"
        },
        {
            "lemma": "of",
            "position": 21,
            "range": [
                149,
                151
            ],
            "term": "of"
        },
        {
            "lemma": "textual",
            "position": 22,
            "range": [
                152,
                159
            ],
            "term": "textual"
        },
        {
            "lemma": "source",
            "position": 23,
            "range": [
                160,
                167
            ],
            "term": "sources"
        },
        {
            "lemma": "for",
            "position": 24,
            "range": [
                168,
                171
            ],
            "term": "for"
        },
        {
            "lemma": "business",
            "position": 25,
            "range": [
                172,
                180
            ],
            "term": "business"
        },
        {
            "lemma": "intelligence",
            "position": 26,
            "range": [
                181,
                193
            ],
            "term": "intelligence"
        },
        {
            "lemma": "exploratory",
            "position": 27,
            "range": [
                195,
                206
            ],
            "term": "exploratory"
        },
        {
            "lemma": "data",
            "position": 28,
            "range": [
                207,
                211
            ],
            "term": "data"
        },
        {
            "lemma": "analysis",
            "position": 29,
            "range": [
                212,
                220
            ],
            "term": "analysis"
        },
        {
            "lemma": "research",
            "position": 30,
            "range": [
                222,
                230
            ],
            "term": "research"
        },
        {
            "lemma": "or",
            "position": 31,
            "range": [
                232,
                234
            ],
            "term": "or"
        },
        {
            "lemma": "investigation",
            "position": 32,
            "range": [
                235,
                248
            ],
            "term": "investigation"
        },
        {
            "lemma": "The",
            "position": 33,
            "range": [
                250,
                253
            ],
            "term": "The"
        },
        {
            "lemma": "🐜",
            "position": 34,
            "range": [
                254,
                258
            ],
            "term": "term"
        },
        {
            "lemma": "be",
            "position": 35,
            "range": [
                259,
                261
            ],
            "term": "is"
        },
        {
            "lemma": "roughly",
            "position": 36,
            "range": [
                262,
                269
            ],
            "term": "roughly"
        },
        {
            "lemma": "synonymous",
            "position": 37,
            "range": [
                270,
                280
            ],
            "term": "synonymous"
        },
        {
            "lemma": "with",
            "position": 38,
            "range": [
                281,
                285
            ],
            "term": "with"
        },
        {
            "lemma": "text",
            "position": 39,
            "range": [
                286,
                290
            ],
            "term": "text"
        },
        {
            "lemma": "mine",
            "position": 40,
            "range": [
                291,
                297
            ],
            "term": "mining"
        },
        {
            "lemma": "indeed",
            "position": 41,
            "range": [
                299,
                305
            ],
            "term": "indeed"
        },
        {
            "lemma": "Ronen",
            "position": 42,
            "range": [
                307,
                312
            ],
            "term": "Ronen"
        },
        {
            "lemma": "Feldman",
            "position": 43,
            "range": [
                313,
                320
            ],
            "term": "Feldman"
        },
        {
            "lemma": "modify",
            "position": 44,
            "range": [
                321,
                329
            ],
            "term": "modified"
        },
        {
            "lemma": "a",
            "position": 45,
            "range": [
                330,
                331
            ],
            "term": "a"
        },
        {
            "lemma": "2000",
            "position": 46,
            "range": [
                332,
                336
            ],
            "term": "2000"
        },
        {
            "lemma": "description",
            "position": 47,
            "range": [
                337,
                348
            ],
            "term": "description"
        },
        {
            "lemma": "of",
            "position": 48,
            "range": [
                349,
                351
            ],
            "term": "of"
        },
        {
            "lemma": "text",
            "position": 49,
            "range": [
                353,
                357
            ],
            "term": "text"
        },
        {
            "lemma": "mine",
            "position": 50,
            "range": [
                358,
                364
            ],
            "term": "mining"
        },
        {
            "lemma": "in",
            "position": 51,
            "range": [
                366,
                368
            ],
            "term": "in"
        },
        {
            "lemma": "2004",
            "position": 52,
            "range": [
                369,
                373
            ],
            "term": "2004"
        },
        {
            "lemma": "to",
            "position": 53,
            "range": [
                374,
                376
            ],
            "term": "to"
        },
        {
            "lemma": "describe",
            "position": 54,
            "range": [
                377,
                385
            ],
            "term": "describe"
        },
        {
            "lemma": "text",
            "position": 55,
            "range": [
                387,
                391
            ],
            "term": "text"
        },
        {
            "lemma": "analytic",
            "position": 56,
            "range": [
                392,
                401
            ],
            "term": "analytics"
        },
        {
            "lemma": "The",
            "position": 57,
            "range": [
                404,
                407
            ],
            "term": "The"
        },
        {
            "lemma": "latter",
            "position": 58,
            "range": [
                408,
                414
            ],
            "term": "latter"
        },
        {
            "lemma": "🐜",
            "position": 59,
            "range": [
                415,
                419
            ],
            "term": "term"
        },
        {
            "lemma": "be",
            "position": 60,
            "range": [
                420,
                422
            ],
            "term": "is"
        },
        {
            "lemma": "now",
            "position": 61,
            "range": [
                423,
                426
            ],
            "term": "now"
        },
        {
            "lemma": "use",
            "position": 62,
            "range": [
                427,
                431
            ],
            "term": "used"
        },
        {
            "lemma": "more",
            "position": 63,
            "range": [
                432,
                436
            ],
            "term": "more"
        },
        {
            "lemma": "frequently",
            "position": 64,
            "range": [
                437,
                447
            ],
            "term": "frequently"
        },
        {
            "lemma": "in",
            "position": 65,
            "range": [
                448,
                450
            ],
            "term": "in"
        },
        {
            "lemma": "business",
            "position": 66,
            "range": [
                451,
                459
            ],
            "term": "business"
        },
        {
            "lemma": "setting",
            "position": 67,
            "range": [
                460,
                468
            ],
            "term": "settings"
        },
        {
            "lemma": "while",
            "position": 68,
            "range": [
                469,
                474
            ],
            "term": "while"
        },
        {
            "lemma": "text",
            "position": 69,
            "range": [
                476,
                480
            ],
            "term": "text"
        },
        {
            "lemma": "mine",
            "position": 70,
            "range": [
                481,
                487
            ],
            "term": "mining"
        },
        {
            "lemma": "be",
            "position": 71,
            "range": [
                489,
                491
            ],
            "term": "is"
        },
        {
            "lemma": "use",
            "position": 72,
            "range": [
                492,
                496
            ],
            "term": "used"
        },
        {
            "lemma": "in",
            "position": 73,
            "range": [
                497,
                499
            ],
            "term": "in"
        },
        {
            "lemma": "some",
            "position": 74,
            "range": [
                500,
                504
            ],
            "term": "some"
        },
        {
            "lemma": "of",
            "position": 75,
            "range": [
                505,
                507
            ],
            "term": "of"
        },
        {
            "lemma": "the",
            "position": 76,
            "range": [
                508,
                511
            ],
            "term": "the"
        },
        {
            "lemma": "early",
            "position": 77,
            "range": [
                512,
                520
            ],
            "term": "earliest"
        },
        {
            "lemma": "application",
            "position": 78,
            "range": [
                521,
                532
            ],
            "term": "application"
        },
        {
            "lemma": "area",
            "position": 79,
            "range": [
                533,
                538
            ],
            "term": "areas"
        },
        {
            "lemma": "date",
            "position": 80,
            "range": [
                540,
                546
            ],
            "term": "dating"
        },
        {
            "lemma": "to",
            "position": 81,
            "range": [
                547,
                549
            ],
            "term": "to"
        },
        {
            "lemma": "the",
            "position": 82,
            "range": [
                550,
                553
            ],
            "term": "the"
        },
        {
            "lemma": "1980",
            "position": 83,
            "range": [
                554,
                559
            ],
            "term": "1980s"
        },
        {
            "lemma": "notably",
            "position": 84,
            "range": [
                561,
                568
            ],
            "term": "notably"
        },
        {
            "lemma": "life",
            "position": 85,
            "range": [
                569,
                573
            ],
            "term": "life"
        },
        {
            "lemma": "science",
            "position": 86,
            "range": [
                574,
                582
            ],
            "term": "sciences"
        },
        {
            "lemma": "research",
            "position": 87,
            "range": [
                583,
                591
            ],
            "term": "research"
        },
        {
            "lemma": "and",
            "position": 88,
            "range": [
                592,
                595
            ],
            "term": "and"
        },
        {
            "lemma": "government",
            "position": 89,
            "range": [
                596,
                606
            ],
            "term": "government"
        },
        {
            "lemma": "intelligence",
            "position": 90,
            "range": [
                607,
                619
            ],
            "term": "intelligence"
        }
    ],
    "sentences": [
        "The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.",
        "The term is roughly synonymous with text mining; indeed, Ronen Feldman modified a 2000 description of \"text mining\" in 2004 to describe \"text analytics\".",
        "The latter term is now used more frequently in business settings while \"text mining\" is used in some of the earliest application areas, dating to the 1980s, notably life-sciences research and government intelligence."
    ],
    "readability": {
        "ari": {
            "age": {
                "from": 24,
                "to": null
            },
            "index": 14,
            "level": "Professor"
        },
        "lexical_richness": 1.44,
        "reading_time": 1,
        "speaking_time": 1.09
    },
    "stats": {
        "lemmas": {
            "chars": {
                "count": 476,
                "per_token": 5.23
            },
            "count": 91,
            "density": {
                "1980": 0.01,
                "2000": 0.01,
                "2004": 0.01,
                "that": 0.01,
                "source": 0.01,
                "a": 0.02,
                "application": 0.01,
                "intelligence": 0.02,
                "structure": 0.01,
                "while": 0.01,
                "now": 0.01,
                "The": 0.03,
                "mine": 0.03,
                "linguistic": 0.01,
                "business": 0.02,
                "frequently": 0.01,
                "use": 0.02,
                "setting": 0.01,
                "latter": 0.01,
                "machine": 0.01,
                "set": 0.01,
                "🐜": 0.03,
                "more": 0.01,
                "content": 0.01,
                "modify": 0.01,
                "Feldman": 0.01,
                "text": 0.05,
                "early": 0.01,
                "analysis": 0.01,
                "with": 0.01,
                "notably": 0.01,
                "describe": 0.02,
                "exploratory": 0.01,
                "the": 0.03,
                "data": 0.01,
                "model": 0.01,
                "analytic": 0.02,
                "textual": 0.01,
                "science": 0.01,
                "roughly": 0.01,
                "synonymous": 0.01,
                "life": 0.01,
                "and": 0.03,
                "date": 0.01,
                "of": 0.04,
                "statistical": 0.01,
                "information": 0.01,
                "some": 0.01,
                "description": 0.01,
                "indeed": 0.01,
                "learn": 0.01,
                "in": 0.03,
                "be": 0.03,
                "technique": 0.01,
                "Ronen": 0.01,
                "research": 0.02,
                "area": 0.01,
                "to": 0.02,
                "or": 0.01,
                "investigation": 0.01,
                "government": 0.01,
                "for": 0.01
            },
            "frequency": {
                "1980": 1,
                "2000": 1,
                "2004": 1,
                "that": 1,
                "source": 1,
                "a": 2,
                "application": 1,
                "intelligence": 2,
                "structure": 1,
                "while": 1,
                "now": 1,
                "The": 3,
                "mine": 3,
                "linguistic": 1,
                "business": 2,
                "frequently": 1,
                "use": 2,
                "setting": 1,
                "latter": 1,
                "machine": 1,
                "set": 1,
                "🐜": 3,
                "more": 1,
                "content": 1,
                "modify": 1,
                "Feldman": 1,
                "text": 5,
                "early": 1,
                "analysis": 1,
                "with": 1,
                "notably": 1,
                "describe": 2,
                "exploratory": 1,
                "the": 3,
                "data": 1,
                "model": 1,
                "analytic": 2,
                "textual": 1,
                "science": 1,
                "roughly": 1,
                "synonymous": 1,
                "life": 1,
                "and": 3,
                "date": 1,
                "of": 4,
                "statistical": 1,
                "information": 1,
                "some": 1,
                "description": 1,
                "indeed": 1,
                "learn": 1,
                "in": 3,
                "be": 3,
                "technique": 1,
                "Ronen": 1,
                "research": 2,
                "area": 1,
                "to": 2,
                "or": 1,
                "investigation": 1,
                "government": 1,
                "for": 1
            },
            "top": [
                {
                    "count": 5,
                    "term": "text"
                },
                {
                    "count": 4,
                    "term": "of"
                },
                {
                    "count": 3,
                    "term": "be"
                },
                {
                    "count": 3,
                    "term": "in"
                },
                {
                    "count": 3,
                    "term": "and"
                },
                {
                    "count": 3,
                    "term": "the"
                },
                {
                    "count": 3,
                    "term": "🐜"
                },
                {
                    "count": 3,
                    "term": "mine"
                },
                {
                    "count": 3,
                    "term": "The"
                },
                {
                    "count": 2,
                    "term": "to"
                }
            ],
            "uniq_count": 62
        },
        "sentences": {
            "count": 3
        },
        "terms": {
            "chars": {
                "count": 512,
                "per_token": 5.63
            },
            "count": 91,
            "density": {
                "2000": 0.01,
                "2004": 0.01,
                "that": 0.01,
                "describes": 0.01,
                "a": 0.02,
                "application": 0.01,
                "intelligence": 0.02,
                "structure": 0.01,
                "while": 0.01,
                "now": 0.01,
                "The": 0.03,
                "linguistic": 0.01,
                "areas": 0.01,
                "business": 0.02,
                "settings": 0.01,
                "sources": 0.01,
                "frequently": 0.01,
                "techniques": 0.01,
                "modified": 0.01,
                "latter": 0.01,
                "machine": 0.01,
                "sciences": 0.01,
                "analytics": 0.02,
                "set": 0.01,
                "more": 0.01,
                "content": 0.01,
                "Feldman": 0.01,
                "text": 0.05,
                "analysis": 0.01,
                "with": 0.01,
                "notably": 0.01,
                "describe": 0.01,
                "exploratory": 0.01,
                "the": 0.03,
                "data": 0.01,
                "model": 0.01,
                "textual": 0.01,
                "roughly": 0.01,
                "synonymous": 0.01,
                "life": 0.01,
                "and": 0.03,
                "used": 0.02,
                "1980s": 0.01,
                "of": 0.04,
                "statistical": 0.01,
                "information": 0.01,
                "some": 0.01,
                "dating": 0.01,
                "description": 0.01,
                "indeed": 0.01,
                "earliest": 0.01,
                "term": 0.03,
                "in": 0.03,
                "Ronen": 0.01,
                "is": 0.03,
                "research": 0.02,
                "to": 0.02,
                "or": 0.01,
                "investigation": 0.01,
                "government": 0.01,
                "for": 0.01,
                "learning": 0.01,
                "mining": 0.03
            },
            "frequency": {
                "2000": 1,
                "2004": 1,
                "that": 1,
                "describes": 1,
                "a": 2,
                "application": 1,
                "intelligence": 2,
                "structure": 1,
                "while": 1,
                "now": 1,
                "The": 3,
                "linguistic": 1,
                "areas": 1,
                "business": 2,
                "settings": 1,
                "sources": 1,
                "frequently": 1,
                "techniques": 1,
                "modified": 1,
                "latter": 1,
                "machine": 1,
                "sciences": 1,
                "analytics": 2,
                "set": 1,
                "more": 1,
                "content": 1,
                "Feldman": 1,
                "text": 5,
                "analysis": 1,
                "with": 1,
                "notably": 1,
                "describe": 1,
                "exploratory": 1,
                "the": 3,
                "data": 1,
                "model": 1,
                "textual": 1,
                "roughly": 1,
                "synonymous": 1,
                "life": 1,
                "and": 3,
                "used": 2,
                "1980s": 1,
                "of": 4,
                "statistical": 1,
                "information": 1,
                "some": 1,
                "dating": 1,
                "description": 1,
                "indeed": 1,
                "earliest": 1,
                "term": 3,
                "in": 3,
                "Ronen": 1,
                "is": 3,
                "research": 2,
                "to": 2,
                "or": 1,
                "investigation": 1,
                "government": 1,
                "for": 1,
                "learning": 1,
                "mining": 3
            },
            "top": [
                {
                    "count": 5,
                    "term": "text"
                },
                {
                    "count": 4,
                    "term": "of"
                },
                {
                    "count": 3,
                    "term": "mining"
                },
                {
                    "count": 3,
                    "term": "is"
                },
                {
                    "count": 3,
                    "term": "in"
                },
                {
                    "count": 3,
                    "term": "term"
                },
                {
                    "count": 3,
                    "term": "and"
                },
                {
                    "count": 3,
                    "term": "the"
                },
                {
                    "count": 3,
                    "term": "The"
                },
                {
                    "count": 2,
                    "term": "to"
                }
            ],
            "uniq_count": 63
        }
    },
    "copyright": "©2019 API Hood. Generated with LemmaGen using MULTEXT-East free lexicons distributed under CC BY-SA 4.0"
}

The JSON response above was pretty-printed manually.
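In the example, each token's "range" matches [start, end) character offsets into the submitted text. For instance, [14, 23] slices out "analytics". The Python sketch below relies on that, which makes it easy to map tokens back to the input or to rebuild a lemmatized copy that keeps the original spacing and punctuation. Treating the offsets as Python string indices is an assumption, so check it with non-ASCII input. The function names are hypothetical.

```python
def terms_from_ranges(text, tokens):
    """Slice each token's [start, end) range out of the original text."""
    return [text[tok["range"][0]:tok["range"][1]] for tok in tokens]

def replace_with_lemmas(text, tokens):
    """Rebuild the text with every token replaced by its lemma."""
    out, last = [], 0
    for tok in sorted(tokens, key=lambda t: t["range"][0]):
        start, end = tok["range"]
        out.append(text[last:start])  # untouched gap: spaces, punctuation
        out.append(tok["lemma"])
        last = end
    out.append(text[last:])
    return "".join(out)

TEXT = "The term text analytics describes a set of linguistic"
TOKENS = [  # first four tokens copied from the example response
    {"lemma": "The", "position": 0, "range": [0, 3], "term": "The"},
    {"lemma": "🐜", "position": 1, "range": [4, 8], "term": "term"},
    {"lemma": "text", "position": 2, "range": [9, 13], "term": "text"},
    {"lemma": "analytic", "position": 3, "range": [14, 23], "term": "analytics"},
]
```

With these four tokens, `replace_with_lemmas(TEXT, TOKENS)` returns "The 🐜 text analytic describes a set of linguistic". Words without a token are left as they are.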

Documentation

For the full documentation and an interactive console, please see the Endpoints and API Details pages on RapidAPI.

Changelog

Language detection (August 2019)

When the language parameter is not provided, the API tries to detect the language of the analyzed text. The detected language is returned in the response together with its probability.

{
  //...
  "language": {
    "code": "en",
    "detected": true,
    "probability": 0.9999938244532147
  }
  //...
}
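A client may want to distrust low-confidence detections. The Python sketch below reads the "language" object shown above. The helper name and the `min_probability` threshold are hypothetical client-side choices, not API parameters. When "detected" is false, the code was supplied by the caller and is returned as is.

```python
def detected_language(response, min_probability=0.9):
    """Return the language code of an analyze response, or None if detection looks unreliable."""
    lang = response["language"]
    if lang["detected"] and lang["probability"] < min_probability:
        return None  # detected, but not confidently enough for this client
    return lang["code"]
```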

Text statistics (August 2019)

The API now returns various text statistics, such as term frequency, term density, automated readability index, reading time estimate, speaking time estimate, top lemmas, top terms, and lexical richness.
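The example response suggests how several of these statistics relate to the token list, although the formulas are not documented. Density looks like frequency / count rounded to two decimals ("text": 5 / 91 ≈ 0.05). "per_token" looks like total characters / count (512 / 91 ≈ 5.63). "lexical_richness" of 1.44 matches terms count / uniq_count (91 / 63 ≈ 1.44). The Python sketch below recomputes them from the "tokens" array under those assumptions. `text_stats` is a hypothetical name.

```python
from collections import Counter

def text_stats(tokens, key="term", top_n=10):
    """Recompute statistics from the "tokens" array of an analyze response.

    key="term" mirrors stats.terms, key="lemma" mirrors stats.lemmas.
    The formulas are inferred from the example response, not documented by the API.
    """
    values = [tok[key] for tok in tokens]
    count = len(values)
    if count == 0:
        raise ValueError("tokens must not be empty")
    frequency = Counter(values)
    chars = sum(len(v) for v in values)
    return {
        "count": count,
        "uniq_count": len(frequency),
        "frequency": dict(frequency),
        # Share of all tokens taken by each value, rounded like the API output.
        "density": {v: round(n / count, 2) for v, n in frequency.items()},
        "chars": {"count": chars, "per_token": round(chars / count, 2)},
        # Counter breaks ties by first occurrence; the API may order ties differently.
        "top": [{"count": n, "term": v} for v, n in frequency.most_common(top_n)],
        # Tokens per distinct value; reported under "readability" in the response.
        "lexical_richness": round(count / len(frequency), 2),
    }
```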

Planned features

Pricing

We provide a free, rate-limited plan (60 requests per minute). For other plans, please see the RapidAPI pricing page.
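To stay within the free plan's 60 requests per minute, a client can throttle itself before each call. The Python sketch below uses a sliding window. How the server counts the limit (fixed or sliding window) is not documented here, so this is a conservative client-side assumption. The `RateLimiter` class is hypothetical.

```python
import time
from collections import deque

class RateLimiter:
    """Client-side sliding-window throttle: at most `limit` calls per `period` seconds."""

    def __init__(self, limit=60, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested without real waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def wait(self):
        """Block until another request fits in the window, then record it."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Create one `RateLimiter()` per API key and call `wait()` immediately before each request.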

Research purposes

We want to support researchers interested in using the API. If you want to use the API for research and need more requests, contact us for a custom plan and we will do our best to support your research.

Unlimited access

If you need a plan with unlimited requests, contact us for a custom plan.
