Aho,, A. V., & Corasick,, M. J. (1975). Efficient string matching: An aid to bibliographic search. Communications of the ACM, 18, 333–340.
Aki,, S. (1992). Waiting time problems for a sequence of discrete random variables. Annals of the Institute of Statistical Mathematics, 44, 363–378.
Aki,, S., & Hirano,, K. (1993). Discrete distributions related to succession events in a two‐state Markov chain. In K. Matusita,, M. L. Puri,, & T. Hayakawa, (Eds.), Statistical science and data analysis (pp. 464–474). Amsterdam, The Netherlands: VSP International Science Publishers.
Antzoulakos,, D. L. (1999). On waiting time problems associated with runs in Markov dependent trials. Annals of the Institute of Statistical Mathematics, 51(2), 323–330.
Antzoulakos,, D. L. (2001). Waiting times for patterns in a sequence of multistate trials. Journal of Applied Probability, 38, 508–518.
Aston,, J. A. D., & Martin,, D. E. K. (2005). Waiting time distribution of competing patterns in higher‐order Markovian sequences. Journal of Applied Probability, 42, 977–988.
Aston,, J. A. D., & Martin,, D. E. K. (2007). Waiting time distributions of general runs and patterns in hidden Markov models. Annals of Applied Statistics, 1(2), 585–611.
Balakrishnan,, N., & Koutras,, M. V. (2002). Runs and scans with applications. New York, NY: John Wiley %26 Sons, Inc.
Balasubramanian,, N., Viveros,, R., & Balakrishnan,, N. (1993). Sooner and later waiting time problems for Markovian Bernoulli trials. Statistics %26 Probability Letters, 18, 153–161.
Bassino,, F., Clément,, J., Fayolle,, J., & Nicodéme,, P. (2012). Counting occurrences for a finite set of words. ACM Transactions on Algorithms, 8(3), 358–370.
Begleiter,, R., El‐Yaniv,, R., & Yona,, G. (2004). On prediction using variable length Markov models. Journal of Artificial Intelligence, 22, 385–421.
Benson,, G., & Mak,, D. Y. F. (2009). Exact distribution of a spaced seed statistic for DNA homology detection. String Processing and Information Retrieval, Lecture Notes in Computer Science, 5280, 283–293.
Boeva,, V., Clément,, J., Régnier,, M., & Vandenbogaert,, M. (2005). Assessing the significance of sets of words. In A. Apostolico,, M. Crochemore,, & K. Park, (Eds.), Combinatorial pattern matching. CPM 2005. Lecture notes in computer science (Vol. 3537, pp. 358–370). Berlin, Germany: Springer.
Bourguignon,, P.‐Y. & Robelin,, D. (2004). Modeles de Markov parcimonieux: selection de modele et estimation. Paper presented at: Proceedings of the 5e edition des Journees Ouvertes en Biologie, Informatique et Mathematiques (JOBIM), Montreal.
Brookner,, E. (1966). Recurrent events in a Markov chain. Information and Control, 9, 215–229.
Buhler,, J., Keich,, U., & Sun,, Y. (2005). Designing seeds for similarity search in genomic DNA. Journal of Computer and System Sciences, 70, 342–363.
Bühlmann,, P. (2000). Model selection for variable length Markov chains and tuning the context algorithm. Annals of the Institute of Statistical Mathematics, 52(2), 287–315.
Bühlmann,, P., & Wyner,, A. J. (1999). Variable length Markov chains. Annals of Statistics, 27(2), 480–513.
Cheung,, L. W. K. (2004). Use of runs statistics for pattern recognition in genomic DNA sequences. Journal of Computational Biology, 11, 107–124.
Cochran,, W. G. (1938). An extension of Gold`s method for examining the apparent persistence of one type of weather. Quarterly Journal of the Royal Meteorological Society, 64, 631–634.
Csiszár,, I., & Talata,, Z. (2006). Context tree estimation for not necessarily finite memory processes, via BIC and MDL. IEEE Transactions on Information Theory, 52, 1007–1016.
Ebnesharashoob,, M., & Sobel,, M. (1990). Sooner and later waiting time problems for Bernoulli trials: Frequency and run quotas. Statistics %26 Probability Letters, 9(1), 5–11.
Eggeling,, R., & Koivisto,, M. K. H. (2016). Pruning rules for learning parsimonious context trees. In A. Ihler, & D. Janzing, (Eds.), Uncertainty in artificial intelligence: Proceedings of the thirty‐second conference, Jersey City, New Jersey, USA (pp. 152–161). Corvallis, OR: AUAI Press.
Eggeling,, R., Koivisto,, M. K. H. & Grosse,, I. (2015). Dealing with small data: On the generalization of context trees. Paper presented at: Proceedings of the 32nd International Conference on Machine Learning (ICML).
Eryilmaz,, S. (2016). Generalized waiting time distributions associated with runs. Metrika, 79(3), 357–368.
Feller,, W. (1968). An introduction to probabiiltiy theory and its applications (Vol. 1, 3rd ed.). New York, NY: Wiley.
Fernández,, M., García,, J. E., & González‐López,, V. A. (2018). A copula‐based partition Markov procedure. Communications in Statistics ‐Theory and Methods, 47(14), 3408–3417.
Flajolet,, P., & Sedgewick,, R. (2009). Analytic combinatorics. Cambridge, England: Cambridge University Press.
Fu,, J. C., & Koutras,, M. V. (1994). Distribution theory of runs: A Markov chain approach. Journal of the American Statistical Association, 89, 1050–1058.
Fu,, J. C., & Lou,, W. Y. W. (2003). Distribution theory of runs and patterns and its applications: A finite Markov chain imbedding approach. River Edge, NJ: World Scientific Publishing Co. Pte. Ltd.
Gabadinho,, A., & Ritschard,, G. (2016). Analyzing state sequences with probabilistic suffix trees. Journal of Statistical Software, 72(3), 1–39.
García,, J.E. & González‐López,, V. A. (2010). Minimal Markov models.Retrieved from https://arxiv.org/abs/1002.0729.
García,, J. E., & González‐López,, V. A. (2017). Consistent estimation of partition Markov models. Entropy, 19, 1050–1058.
Gerber,, H. U., & Li,, S.‐Y. R. (1981). The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain. Stochastic Processes and their Applications, 11, 101–108.
Glaz,, J., Naus,, J., & Wallenstein,, S. (2001). Scan statistics. New York, NY: Springer‐Verlag.
Godbole,, A. P., & Papastavridis,, S. G. (1994). Runs and patterns in probability: Selected papers. Dordrecht, The Netherlands: Kluwer.
Goulden,, I., & Jackson,, D. M. (1979). An inversion theorem for cluster decompositions of sequences with distinguished subsequences. Journal of the London Mathematical Society, 20, 567–576.
Graham,, R. L., Knuth,, D. E., & Patashnik,, O. (1994). Concrete Mathematics. Boston, MA: Addison‐Wesley.
Guibas,, L. J., & Odlyzko,, A. M. (1981). String overlaps, pattern matching, and nontransitive games. Journal of Combinatorial Theory, Series A, 30, 183–208.
Hirano,, K., Aki,, S., Kashiwagi,, N., & Kuboki,, H. (1991). On Ling`s binomial and negative binomial distributions of order k. Statistics %26 Probability Letters, 11(6), 503–509.
Hopcroft,, J. E. (1971). An n log n algorithm for minimizing states in a finite automaton. In Z. Kohavi, & A. Paz, (Eds.), Theory of machines and computation (pp. 189–196). New York, NY: Academic Press.
Hopcroft,, J. E., Motwani,, R., & Ullman,, J. D. (2000). Introduction to automaton theory, languages, and computation (2nd ed.). Boston, MA: Addison Wesley.
Jääskinen,, V., Xiong,, J., Koski,, T., & Corander,, J. (2014). Sparse Markov chains for sequence data. Scandinavian Journal of Statistics, 41, 641–655.
Keich,, U., Li,, M., Ma,, B., & Tromp,, J. (2004). On spaced seeds for similarity search. Discrete Applied Mathematics, 138(3), 253–263.
Kong,, Y. (2007). Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. Journal of Computational Biology, 14(2), 238–254.
Kong,, Y. (2015). Distribution of runs revisited. Communications in Statistics ‐ Theory and Methods, 44(22), 4663–4678. https://doi.org/10.1080/03610926.2013.793350
Kong,, Y. (2017). Number of appearances of events in random sequences: A new generating function approach to type II and type III runs. Annals of the Institute of Statistical Mathematics, 69, 489–495.
Kong,, Y. (2018). Joint distribution of rises, falls, and number of runs in random sequences. Communications in Statistics ‐ Theory and Methods, 48, 493–499. https://doi.org/10.1080/03610926.2017.1414261
Koutras,, M. (1997). Waiting times and number of appearances of events in a sequence of discrete random variables. In N. Balakrishnan, (Ed.), Advances in combinatorial methods and applications to probability and statistics (pp. 363–384). Boston, MA: Birkhäuser.
Koutras,, M. V., & Alexandrou,, V. A. (1995). Runs, scans and urn models: A unified Markov chain approach. Annals of the Institute of Statistical Mathematics, 47, 743–766.
Li,, H., & Homer,, N. (2010). A survey of sequence alignment algorithms for next‐generation sequencing. Briefings in Bioinformatics, 2(5), 473–483.
Li,, S.‐Y. R. (1980). A martingale approach to the study of occurrence of sequence patterns in repeated experiments. Annals of Probability, 8, 1171–1176.
Ling,, K. D. (1989). A new class of negative binomial distributions. Statistics %26 Probability Letters, 7, 371–376.
Lladser,, M., Betterton,, M. D., & Knight,, R. (2008). Multiple pattern matching: A Markov chain approach. Journal of Mathematical Biology, 56(1–2), 51–92.
Lladser,, M. E. (2007). Minimal Markov chain embeddings of pattern problems. In Proceedings of the 2007 information theory and applications workshop. San Diego: University of California.
Ma,, B., Tromp,, J., & Li,, M. (2002). PatternHunter: Faster and more sensitive homology search. Bioinformatics, 18(3), 440–445.
Marsan,, L., & Sagot,, M. F. (2000). Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. Journal of Computational Biology, 7(3–4), 345–362.
Marshall,, T., & Rahmann,, S. (2008). Probabilistic arithmetic automata and their application to pattern matching statistics. In P. Ferragina, & G. M. Landau, (Eds.), Proceedings of the 19th annual symposium on combinatorial pattern matching (CPM) Lecture notes in computer science (Vol. 5029, pp. 95–106). Heidelberg, Germany: Springer.
Martin,, D. E. K. (2018). Minimal auxiliary Markov chains through sequential elimination of states. Communications in Statistics: Simulation and Computation, 48, 1040–1054. https://doi.org/10.1080/03610918.2017.1406505
Martin,, D. E. K. (2019). Distributions of patterns statistics in sparse Markov models. Annals of the Institute of Statistical Mathematics, (in press). https://doi.org/10.1007/s10463-019-00714-6)
Martin,, D. E. K., & Aston,, J. A. D. (2008). Waiting time distribution of generalized later patterns. Computational Statistics and Data Analysis, 52, 4879–4890.
Martin,, D. E. K., & Aston,, J. A. D. (2013). Distribution of statistics of hidden state sequences through the sum‐product algorithm. Methodology and Computing in Applied Probability, 15(4), 897–918.
Martin,, D. E. K., & Coleman,, D. A. (2011). Distributions of clump statistics for a collection of words. Journal of Applied Probability, 48, 1049–1059.
Martin,, D. E. K., & Noé,, L. (2017). Faster exact probabilities for statistics of overlapping pattern occurrences. Annals of the Institute of Statistical Mathematics, 69(1), 231–248.
Mohanty,, S. G. (1994). Success runs of length‐k in Markov dependent trials. Annals of the Institute of Statistical Mathematics, 46, 777–798.
Mood,, A. M. (1940). The distribution theory of runs. Annals of Mathematical Statistics, 11, 367–392.
Mosteller,, F. (1941). Note on an application of runs to quality control charts. Annals of Mathematical Statistics, 12, 228–232.
Nicodéme,, P. (2003). Regexpcount, a symbolic package for counting problems on regular expressions and words. Fundamenta Informaticae, 56(1–2), 71–88.
Nicodéme,, P., Salvy,, B., & Flajolet,, P. (2002). Motif statistics. Theoretical Computer Science, 287(2), 593–618.
Noé,, L. (2017). Best hits of 11110110111: Model‐free selection and parameter‐free sensitivity calculation of spaced seeds. Algorithms for Molecular Biology, 12(1). https://doi.org/10.1186/s13015-017-0092-1
Noé,, L., & Martin,, D. E. K. (2014). A coverage criterion for spaced seeds and its applications to SVM string‐kernels and k‐mer distances. Journal of Computational Biology, 21(12), 947–963.
Noonan,, J., & Zeilberger,, D. (1999). The Goulden‐Jackson cluster method: Extensions, applications and implementations. Journal of Difference Equations and Applications, 5, 355–377.
Nuel,, G. (2006). Numerical solutions for pattern statistics on Markov chains. Statistical Applications in Genetics and Molecular Biology, 5(1), 26.
Nuel,, G. (2008). Pattern Markov chains: Optimal Markov chain embedding through deterministic finite automata. Journal of Applied Probability, 45, 226–243.
Philippou,, A. N. (1984). The negative binomial distribution of order k and some of its properties. Biometrical Journal, 26, 789–794.
Régnier,, M., Furletova,, E., Yakovlev,, V., & Roytberg,, M. (2014). Analysis of pattern overlaps and exact computation of P‐values of pattern occurrences numbers: Case of hidden Markov models. Algorithms for Molecular Biology, 9(25), 25. https://doi.org/10.1186/s13015-014-0025-1
Régnier,, M., & Szpankowski,, W. (1998). On pattern frequency occurrences in a Markovian sequence. Algorithmica, 22(4), 631–649.
Reinert,, G., Schbath,, S., & Waterman,, M. S. (2000). Probabilistic and statistical properties of words: An overview. Journal of Computational Biology, 7(1–2), 1–46.
Ribeca,, P., & Raineri,, E. (2008). Faster exact Markovian probability functions for motif occurrences: A DFA‐only approach. Bioinformatics, 24(24), 2839–2848.
Rissanen,, J. (1983). A universal data compression system. IEEE Transactions on Information Theory, 29, 656–664.
Rissanen,, J. (1986). Complexity of strings in the class of Markov sources. IEEE Transactions on Information Theory, 32(4), 526–532.
Robin,, S., & Daudin,, J.‐J. (1999). Exact distribution of word occurrences in a random sequence of letters. Journal of Applied Probability, 36, 179–193.
Ron,, D., Singer,, Y., & Tishby,, N. (1996). The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25(2–3), 117–149.
Roos,, T. & Yu,, B. (2009). Sparse Markov source estimation via transformed Lasso. Paper presented at: Proceedings of the IEEE Information Theory Workshop (ITW‐2009) (pp. 241–245). Taormina, Sicily, Italy.
Schwager,, S. (1983). Run probabilities in sequences of Markov dependent trials. Journal of the American Statistical Association, 78, 168–175.
Smith,, M. L. D., & Griffith,, W. S. (2008). The analysis and comparison of start‐up demonstration tests. European Journal of Operational Research, 186(3), 1029–1045.
Stefanov,, V., & Pakes,, A. G. (1997). Explicit distributional results in pattern formation. Annals of Applied Probability, 7, 666–678.
Uchida,, M. (1998). On generating functions of waiting time problems for sequence patterns of discrete random variables. Annals of the Institute of Statistical Mathematics, 50, 655–671.
Uchida,, M., & Aki,, S. (1995). Sooner and later waiting time problems in a two‐state Markov chain. Annals of the Institute of Statistical Mathematics, 47, 415–433.
Weinberger,, M., Lempel,, A., & Ziv,, J. (1992). A sequential algorithm for the universal coding of finite memory sources. IEEE Transactions on Information Theory, 38, 1002–1024.
Weinberger,, M., Rissanen,, J., & Feder,, M. (1995). A universal finite memory source. IEEE Transactions on Information Theory, 41(3), 643–652.
Wishart,, J., & Hirshfield,, H. O. (1936). A theorem concerning the distribution of joins between line segments. Journal of the London Mathematical Society, 11, 227–235.
Wolfowitz,, J. (1943). On the theory of runs with some applications to quality control. Annals of Mathematical Statistics, 14, 280–288.
Xiong,, J., Jääskinen,, V., & Corander,, J. (2016). Recursive learning for sparse Markov models. Bayesian Analysis, 11(1), 247–263.
Yalçin,, F., & Eryilmaz,, S. (2014). Q‐geometric and q‐binomial distributions of order k. Journal of Computational and Applied Mathematics, 271, 31–38.